Conjoined Genes as Common Events in Childhood Acute Lymphoblastic Leukemia

Simple Summary: Acute lymphoblastic leukemia (ALL) is the most frequent childhood cancer. In recent years, broad application of NGS technologies has enabled the discovery of novel genomically defined ALL subtypes. In this study, as a proof-of-principle, we applied RNA-seq technology to comprehensively profile the transcriptional landscape of a collection of 10 childhood BCP-ALL cases, and performed a deep bioinformatics analysis including several publicly available datasets, in order to characterize their full spectrum of transcriptional events. The paired-end RNA sequencing of our pediatric BCP-ALL cohort revealed a total of 9001 raw fusion events, which, after filtering, resulted in 245 candidate fusions. Overall, 235 out of 245 events were intra-chromosomal fusions, among which 229 involved two contiguous or overlapping genes, also known as conjoined genes (CGs). Among them, we identified a subset of 14 CGs (6.1%) expressed exclusively in leukemic cases and in neither solid cancers nor normal samples. These events could be suggestive of a novel mechanism of transcriptional regulation in childhood leukemia and may represent novel potential leukemia-specific biomarkers.
Abstract: Acute lymphoblastic leukemia (ALL) is the most frequent childhood cancer. For the last three decades, conventional cytogenetic and molecular approaches have allowed the identification of genetic abnormalities with prognostic and therapeutic relevance. Although the current cure rate in pediatric B cell acute leukemia is approximately 90%, it remains one of the leading causes of mortality in childhood. Furthermore, in contemporary protocols, chemotherapy intensity has been raised to the maximal levels of tolerability, and further improvements in outcome will depend on the characterization and reclassification of the disease, as well as on the development of new targeted drugs. The recent technological advances in genome-wide profiling techniques have allowed the exploration of the molecular heterogeneity of this disease, even though some potentially interesting biomarkers, such as conjoined genes, have not yet been investigated in depth. In the present study, we performed transcriptome sequencing (RNA-seq) of 10 pediatric B cell precursor (BCP)-ALL cases with different risk (four standard- and six high-risk patients) enrolled in the Italian AIEOP-BFM ALL2000 protocol, in order to characterize the full spectrum of transcriptional events and to identify novel potential genetic mechanisms sustaining their different early response to therapy. Total RNA was extracted from primary leukemic blasts and RNA-seq was performed by Illumina technology. Bioinformatics analysis focused on fusion transcripts originating from either inter- or intra-chromosomal structural rearrangements. Starting from a raw list of 9001 candidate events, by employing a custom-made bioinformatics pipeline, we obtained a short list of 245 candidate fusions. Among them, 10 events were compatible with chromosomal translocations. Strikingly, 235/245 events were intra-chromosomal fusions, 229 of which involved two contiguous or overlapping genes, resulting in the so-called conjoined genes (CGs). To explore the specificity of these events in leukemia, we performed an extensive bioinformatics meta-analysis and evaluated the presence of the fusions identified in our cohort of 10 BCP-ALL cases in several other publicly available RNA-seq datasets, including leukemic, solid tumor and normal sample collections. Overall, 14/229 (6.1%) CGs were found to be exclusively expressed in leukemic cases, suggesting an association between CGs and leukemia. Moreover, CGs were found to be common events in both standard- and high-risk BCP-ALL patients, which might be suggestive of a novel potential transcriptional regulation mechanism active in leukemic cells.

Introduction
Acute lymphoblastic leukemia (ALL) is the most frequent childhood cancer. ALL onset is due to a complex multi-step process, characterized by the expansion of a pre-leukemic clone which accumulates cooperative genetic events required for full malignant transformation and clinical manifestation [1,2]. For the last three decades, conventional cytogenetic studies of genetic aberrations, including chromosomal translocations and alterations in chromosome number, have provided information on the pathogenesis of ALL. Common translocations in children with B-ALL include t(12;21) [ETV6-RUNX1] (25%), t(1;19) [TCF3-PBX1] (5%), t(9;22) [BCR-ABL1] (3%) and translocations involving the MLL (KMT2A) gene with various fusion partner genes (5%). Gains of whole chromosomes, or high hyperdiploidy (>50 chromosomes), account for 25% of childhood ALL, whereas hypodiploidy (<44 chromosomes) accounts for approximately 1% of cases. Several of these genetic changes have prognostic and therapeutic implications and are important in risk stratification schemas, providing more intensive and/or targeted treatment for patients at risk of developing a relapse (e.g., BCR-ABL1-positive or KMT2A-rearranged cases), while limiting toxic effects for patients with favorable prognosis (e.g., ETV6-RUNX1-positive or hyperdiploid cases). However, despite this remarkable progress, B cell precursor (BCP)-ALL remains one of the leading causes of mortality in childhood [3-5]. Furthermore, in contemporary protocols, chemotherapy intensity has been raised to the maximal levels of tolerability, and further improvements in outcome will depend on the characterization and re-classification of cases, in particular in the subset of BCP-ALL in which no major genetic alteration can be detected with conventional cytogenetic and molecular approaches.
Technological advances in the genomics field over the last decade, particularly the advent of next-generation sequencing (NGS), have helped unravel the genomic landscape and biology of ALL. In recent years, broad application of NGS technologies, notably whole-transcriptome sequencing (RNA-seq), has redefined the molecular taxonomy of ALL. RNA-seq enabled the discovery of novel genomically defined ALL subtypes, characterized by chromosomal rearrangements cryptic on karyotyping (e.g., DUX4-rearranged ALL), new fusion genes (e.g., MEF2D, ZNF384, or NUTM1-R ALL) and expression profiles similar to classic BCP-ALL subtypes (e.g., BCR-ABL-like, ETV6-RUNX1-like) [6,7]. In addition, NGS technology offered the ability to simultaneously identify heterogeneous genetic alterations that have emerged as prognostically relevant, such as IKZF1 deletions and sequence mutations, as well as complexly rearranged transcripts that were generally neglected or underestimated in previous reports. One intriguing example is that of "conjoined genes" (CGs), also called "read-through transcripts" or "co-transcribed genes", which derive from non-traditional splicing between two or more adjacent or overlapping genes lying on the same chromosome. As a result, the adjacent genes fuse together at the transcript level, with no alteration in chromosome structure. In some cases, the transcripts formed by CGs are translated into chimeric or completely novel proteins [8-10]. Therefore, they represent a new repertoire for the discovery of novel candidate biomarkers and drug targets [11].
Although CGs have been identified in several cancer types, no reports have dealt with CGs in pediatric BCP-ALL up to now [11-13].
In this study, we applied RNA-seq technology as a proof-of-principle to comprehensively profile the transcriptional landscape of a collection of childhood BCP-ALL cases with different recurrence risk, and performed a deep bioinformatics analysis that also included several publicly available datasets, in order to characterize their full spectrum of transcriptional events and to reveal the comprehensive expression of CGs.

Materials and Methods
Ten pediatric BCP-ALL cases were profiled by whole-transcriptome RNA-seq technology. All the selected cases were already enrolled in the Italian AIEOP-BFM ALL2000 clinical protocol. They were homogeneous for all clinical or genetic risk factors but differed by minimal residual disease (MRD) after induction (four standard-risk (SR) and six high-risk (HR) patients, according to MRD at day 33 and 78 of treatment) (Table 1). Total RNA was extracted from primary bone marrow (BM) leukemic blasts using the guanidine thiocyanate-phenol-chloroform method and checked for integrity by microcapillary electrophoresis on the 2100 Bioanalyzer instrument (Agilent Technologies, Santa Clara, CA, USA). Starting from 2-3 µg of total RNA per sample, poly-A+ RNA-seq libraries were prepared using the TruSeq RNA Sample Prep Kit (Illumina, San Diego, CA, USA), according to the manufacturer's instructions, and sequenced on the Genome Analyzer IIx platform (Illumina) in 76-cycle paired-end runs, generating a mean of 80 M raw reads/sample. After fastq quality control using the FastQC tool (https://www.bioinformatics.babraham.ac.uk/projects/fastqc/, accessed on 15 April 2022), candidate fusions were searched on raw sequencing reads by FusionMap (v.10.0, 9) [14], using human HG38 as the reference genome and Gencode v28 (excluding known read-through transcripts) as the gene model. The "FilterUnlikelyFusionReads" parameter was set to "False" in order to also retrieve read-through events. Then, we implemented a custom bioinformatics pipeline to identify all the putative fusions, to filter them according to stringent qualitative criteria and remove false positives, and, thus, to identify a subset of confident fusion events ("candidate fusions") either originating from chromosomal rearrangements (inter- or intra-chromosomal translocations) or not (conjoined genes) (Figure 1).
Briefly, the raw list of events was filtered by excluding those present in FusionMap's own blacklist, those with the same chromosomal breakpoint, those having a complete match of the candidate junction on the genome, those with a read depth <10 in the region flanking the rearrangement or with reads supporting the event accounting for <5% of the total, and those with no reads spanning the junction after remapping the original reads on the FusionMap-generated transcript sequence. Candidate fusions of interest were validated in the original samples by RT-PCR and/or FISH assays. SNP array copy number profiles already produced for these samples on Cytogenetics Whole Genome 2.7 M Arrays (Affymetrix, Santa Clara, CA, USA) were exploited to assess the presence of chromosomal imbalances accompanying fusion events.
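The filtering criteria listed above can be sketched as a simple cascade. This is a minimal illustration under stated assumptions, not the pipeline actually used in the study: the `FusionEvent` record, its field names and the `filter_candidates` function are hypothetical, and only the two numeric thresholds (read depth <10 and supporting-read fraction <5%) come from the text. The sketch further assumes that "same chromosomal breakpoint" refers to a breakpoint pair shared by more than one raw event, in which case all such events are discarded.

```python
from collections import Counter
from dataclasses import dataclass

MIN_FLANKING_DEPTH = 10       # from the text: read depth <10 is excluded
MIN_SUPPORT_FRACTION = 0.05   # from the text: supporting reads <5% of total is excluded


@dataclass(frozen=True)
class FusionEvent:
    """Hypothetical record for one raw fusion call (field names are assumptions)."""
    name: str
    breakpoint: tuple              # e.g. (chrom_5p, pos_5p, chrom_3p, pos_3p)
    in_blacklist: bool             # present in the fusion caller's blacklist
    junction_matches_genome: bool  # junction sequence aligns completely to the genome
    flanking_depth: int            # read depth in the region flanking the junction
    supporting_reads: int          # reads supporting the fusion
    total_reads: int               # all reads at the locus (fusion + linear)
    spanning_reads_after_remap: int  # junction-spanning reads after remapping


def filter_candidates(events):
    """Return the events that pass every exclusion criterion."""
    # Assumption: a breakpoint seen in more than one raw event is suspicious.
    breakpoint_counts = Counter(ev.breakpoint for ev in events)
    kept = []
    for ev in events:
        if ev.in_blacklist:
            continue
        if breakpoint_counts[ev.breakpoint] > 1:
            continue
        if ev.junction_matches_genome:
            continue
        if ev.flanking_depth < MIN_FLANKING_DEPTH:
            continue
        fraction = ev.supporting_reads / ev.total_reads if ev.total_reads else 0.0
        if fraction < MIN_SUPPORT_FRACTION:
            continue
        if ev.spanning_reads_after_remap == 0:
            continue
        kept.append(ev)
    return kept
```

Because each criterion is tested independently, a single raw event may fail several of them, which is why per-criterion exclusion rates need not sum to 100%.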
To explore the specificity of these events in leukemia, a representative selection of these candidate fusion transcripts was experimentally evaluated in a commercial RNA library of human normal tissues derived from healthy donors (Human Total RNA Master Panel II, Diatech LabLine, Jesi, Italy), including lung, trachea, skeletal muscle, brain, prostate, testis, uterus, adrenal gland, spleen, thymus, salivary gland, stomach, thyroid and kidney tissues. Moreover, we performed an extensive bioinformatics meta-analysis to evaluate the incidence of these candidate fusions in other publicly available RNA-seq datasets, including: 1 AML study (27 cases from the Leucegene project); 2 T-ALL studies (12 cases from the Leucegene project, and 14 cases from a COG study); 1 B-ALL study (10 cases at diagnosis and 10 cases at relapse); 10 solid cancer types from the TCGA project (bladder urothelial carcinoma (BLCA), breast carcinoma (BRCA), cervical squamous cell carcinoma (CESC), colon adenocarcinoma (COAD), kidney renal clear cell carcinoma (KIRC), low grade glioma (LGG), lung adenocarcinoma (LUAD), prostate adenocarcinoma (PRAD), skin cutaneous melanoma (SKCM) and thyroid carcinoma (THCA); 20 cases for each cancer type, for a total of 200 cases); and 1 CEU population dataset from the 1000 Genomes sample collection (91 samples from the Geuvadis consortium). All the downloaded cases are listed in Table S1. Fastq files were checked with the FastQC tool and then mapped to the junctions of the candidate fusions, in order to find reads spanning them. A specific filter was implemented to exclude false positive matches from reads mapping on only one side of the junction. Finally, events were screened against the FusionHub [15] and the Atlas of Genetics Oncology [16] databases, in order to filter out candidates already described as associated with non-tumoral samples, and annotated by searching for support in already known gene models (Gencode, GenBank, RefSeq, UCSC genes and VEGA).
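The one-sided-match filter mentioned above can be illustrated as follows. This is only a sketch of the general idea: the coordinate convention, the function names and the `min_anchor` overhang are assumptions introduced for illustration, since the text states only that reads mapping on a single side of the junction were excluded.

```python
def spans_junction(read_start, read_end, junction_pos, min_anchor=10):
    """Return True if an aligned read covers both sides of a fusion junction.

    Coordinates are 0-based positions on the fusion transcript sequence,
    with read_end exclusive. junction_pos is the index of the first base
    contributed by the 3' partner gene. min_anchor (hypothetical default)
    is the minimum number of bases required on each side.
    """
    bases_on_5p_side = junction_pos - read_start
    bases_on_3p_side = read_end - junction_pos
    return bases_on_5p_side >= min_anchor and bases_on_3p_side >= min_anchor


def count_spanning_reads(alignments, junction_pos, min_anchor=10):
    """Count reads, given as (start, end) tuples, that truly span the junction."""
    return sum(
        spans_junction(start, end, junction_pos, min_anchor)
        for start, end in alignments
    )
```

For instance, with 76-base reads and a junction at position 50, a read aligned at (0, 76) has 50 bases on the 5' side and 26 on the 3' side and is counted, whereas a read aligned at (45, 121) has only 5 bases on the 5' side and is rejected as a likely one-sided match.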

Results
The paired-end RNA sequencing of our pediatric BCP-ALL cohort revealed a total of 9001 raw fusion events, which, after filtering, resulted in 245 candidate fusions. On average, over the 10 BCP-ALL cases, 52% of the raw events were filtered out because they were in the FusionMap blacklist, 19% shared the same breakpoints across multiple events, 38% had a complete match of the candidate junction on the genome, 22% had <10× coverage, 37% had <5% incidence relative to the linear transcript and 8% were excluded because no support was found when remapping the original sequencing reads on the junction; since these percentages sum to more than 100%, a single event could fail more than one criterion. Overall, 235 out of 245 events were intra-chromosomal fusions. Among them, 229 involved two contiguous or overlapping genes (CGs), with 221 (97%) generated by genes in the same orientation and 8 (3%) by genes in opposite orientations. Of these, 204 CGs (89%) were identified both in cancer patients and in normal tissues, whereas 11 were detected exclusively in cancer, both in leukemia and solid tumors (Table 2).
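The counts reported in this study can be cross-checked with a few lines of arithmetic. The snippet below is only a consistency check on figures quoted in the text (245 candidate fusions, 235 intra-chromosomal events, 229 CGs, and the 204/11/14 split of CGs by specificity); the variable names and the `pct` helper are introduced here for illustration.

```python
def pct(part, whole, digits=0):
    """Percentage of `part` over `whole`, rounded to `digits` decimals."""
    return round(100 * part / whole, digits)


candidate_fusions = 245
intra_chromosomal = 235
conjoined_genes = 229

same_orientation = 221
opposite_orientation = 8

cg_in_normal_and_cancer = 204   # found in cancer patients and normal tissues
cg_cancer_only = 11             # found in leukemia and solid tumors
cg_leukemia_only = 14           # found exclusively in leukemic cases

# Events compatible with translocations, and intra-chromosomal non-CG events.
translocations = candidate_fusions - intra_chromosomal      # 10
intra_non_cg = intra_chromosomal - conjoined_genes          # 6

# The three specificity classes partition the 229 CGs.
assert cg_in_normal_and_cancer + cg_cancer_only + cg_leukemia_only == conjoined_genes
assert same_orientation + opposite_orientation == conjoined_genes

print(pct(same_orientation, conjoined_genes))             # 97.0
print(pct(opposite_orientation, conjoined_genes))         # 3.0
print(pct(cg_in_normal_and_cancer, conjoined_genes))      # 89.0
print(pct(cg_leukemia_only, conjoined_genes, digits=1))   # 6.1
```

The 10 translocations plus the 6 remaining intra-chromosomal events give the 16 fusion transcripts discussed later, and the 11 cancer-only plus 14 leukemia-only CGs give the 25 cancer-specific CGs.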
In particular, KLHL22::SCARF2 was found as a transcript variant (ENST00000429594) of the SCARF2 gene, annotated as "nonsense-mediated decay", and was also present in the ConjoinG database (CGHSA0597); at the same time, PPP1R3F::LL0XNC01-7P3.1 was previously reported in GenBank (LF211393) as a "Polycomb-Associated Non-Coding RNA" [17]. Some CGs identified only in leukemia samples involved long non-coding RNAs. Furthermore, by transcriptomic analysis we were able to identify 10 fusions compatible with inter-chromosomal translocations not previously shown by conventional methods (Table 4). Details about reads supporting both CGs and fusions are depicted in Figure S1.
To explore the specificity of these events, a representative selection of CGs and translocations was evaluated by RT-PCR, SNP arrays and Sanger sequencing (Figure 2A-C).
The novel PAX5::POM121C fusion was identified in one SR BII ALL patient and confirmed by RT-PCR and Sanger sequencing. Additionally, the NUP214::ABL1 fusion was identified in only one HR BI ALL case and confirmed by the same methods. Most of the remaining rearrangements were not experimentally investigated; however, they were identified in silico in other public RNA-seq datasets, which can be considered an indirect validation. Considering the total of 16 fusion transcripts and 25 cancer-specific CGs, all our BCP-ALL patients, except for one, carried more than one transcriptional rearrangement; the mean number of events was 4.5 per patient, with a range from 1 to 9, demonstrating the complex transcriptional landscape of leukemia. According to MRD classification, the four SR patients presented a mean number of transcriptional rearrangements of 5 (range 3-7), while in the HR group the mean was 4 but with a wider range (1-9 events). No evident differences were observed either in the entire cohort or between the two MRD groups, probably due to the low number of cases.

Discussion
Pediatric cancers, including leukemia, differ from adult tumors, especially in their very low mutational rate. Therefore, their etiology may involve further oncogenic mechanisms, such as the development of chimeric transcripts [18,19]. Here, by applying RNA-seq technology, we comprehensively profiled the transcriptional landscape of 10 childhood BCP-ALL patients negative for recurrent translocations and with different recurrence risk. Overall, we identified 16 different transcriptional fusions not previously identified by conventional cytogenetics, comprising 10 translocations and 6 intra-chromosomal events compatible with deletions. In particular, RNA-seq allowed us to identify a new PAX gene translocation (PAX5::POM121C, as we already reported in [20]) and a fusion involving the NUP214 and ABL1 genes, in one SR and one HR B-ALL case, respectively. NUP214::ABL1 was originally reported as a recurrent abnormality in T-ALL, accounting for 6% of adult and less than 2% of pediatric T-ALL, as recently reported by the Associazione Italiana di Onco-Ematologia Pediatrica [21]. By contrast, only a few cases have been reported in B-ALL [22-24]. Moreover, the MAEA::CTBP1 fusion (detected in one SR B-ALL case from our cohort) has already been described, although with different breakpoints, in colon adenocarcinoma and acute myeloid leukemia [25].
Besides chromosomal rearrangements, RNA processing events, such as cis- and trans-splicing, also contribute to the formation of chimeric RNAs. Alternative splicing between exons of neighboring genes is an RNA processing event that occurs within a single pre-mRNA, where the transcription machinery reads through the intergenic region between the two genes. Although only a few examples of spliced RNA chimeras have been experimentally confirmed in mammalian cells, bioinformatics analyses of paired-end RNA-seq data have successfully identified many chimeric RNAs composed of two adjacent genes, which could originate from transcriptional read-through [11]. Since CGs do not appear to be mere artifacts of transcription, they are likely to result from specific genomic requirements and to have well-defined functional roles. This idea is further strengthened by the fact that some CGs have endured purifying evolutionary selective pressure and are conserved in different animal species. In addition to protein evolution, CGs can be responsible for gene regulation by preventing the expression of one or more of the parent genes [10].
Until recently, the assumption was that all gene fusions and fusion products (RNAs and proteins) were exclusive to cancer. This dogma has been challenged as more groups demonstrated the presence of fusion RNAs and proteins in non-pathological situations. Thus, many CGs have been discovered in both normal and cancer cells and collected in different databases, such as the ConjoinG database, dedicated to the 800 CGs identified in the human genome up to now (https://metasystems.riken.jp/conjoing/, accessed on 15 April 2022) [26-30]. The implications of these intergenic spliced chimeric RNAs are multifaceted. On one hand, their presence in normal tissues and cells casts doubt on the use of fusion RNAs in cancer diagnosis and treatment. On the other hand, even though they represent potential new biomarkers and drug targets, no reports have dealt with CGs in pediatric BCP-ALL.
In our cohort, 229 out of the 245 fusion events identified by RNA-seq were CGs involving two contiguous or overlapping genes. As previously reported in the literature, most of them (89%) were also expressed in normal samples and were therefore not specific for leukemia. Nonetheless, we identified a subset of 14 CGs (6.1%) expressed exclusively in leukemic cases and in neither solid cancers nor normal samples. These events could be suggestive of a novel mechanism of transcriptional regulation in childhood leukemia and may represent novel potential leukemia-specific biomarkers. Some of them, even when involving cancer genes, were only indirectly validated by identification in other public RNA-seq datasets. This could be related to the higher sensitivity of RNA-seq technology, as compared to other molecular assays (RT-PCR, FISH and SNP array), in identifying sub-clonal genetic events, and would confirm leukemia as an oligoclonal disease. Some of the leukemia-specific CGs identified in our cohort involved long non-coding RNAs. Although discoveries on long non-coding RNAs are increasing, their function in both normal and malignant tissue remains unclear [31]. As previously reported in other hematological malignancies, non-coding RNAs involved in chimeric transcripts might lead to dysregulation of fusion partner gene expression [32,33]. In particular, in B-other ALL, this genetic mechanism could be of interest to explain some gene expression alterations in leukemic cells. We also observed that CGs were present and recurrent in both MRD-stratified standard- and high-risk patients, suggesting a possible role in disease onset rather than in treatment response. However, due to the small size of the cohort analyzed, no conclusion can be reached, and a larger leukemia cohort is needed to define the possible role of CGs in chemotherapy response or risk stratification at diagnosis. Most of our cases carried more than one transcriptional event. This observation corroborates the complex and heterogeneous transcriptional landscape of acute leukemia, in particular for patients not carrying recurrent translocations, as in our cohort. Moreover, the co-presence of multiple CGs in the same leukemia case suggests their possible role as cooperative rather than primary genetic events in leukemogenesis.

Conclusions
In conclusion, even in a very small cohort of children, we demonstrated the presence of CGs specific for acute lymphoblastic leukemia, most of them involving non-coding RNAs. This study is a proof of principle for further studies needed to confirm and expand the knowledge on the possible role of CGs in pediatric and adult ALL. In particular, the evaluation of CGs in patients without any recurrent translocation at diagnosis, in an extensive cohort of adult and pediatric patients, might be of interest to dissect the role of these rearrangements in leukemogenesis and risk assessment.
In this setting, the extensive use of NGS approaches at ALL diagnosis, in particular whole transcriptome analysis, could be a useful tool to further dissect and identify potential novel mechanisms of transcriptional regulation in leukemic cells, in order to clarify the relevant genetic mechanisms leading to the development of the disease.
Supplementary Materials: The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/cancers14143523/s1. Table S1: Publicly available RNA-seq datasets; Figure S1: reads supporting CGs and fusion genes.
Funding: The authors deeply thank the "Comitato Maria Letizia Verga" for its support with the "Passaporto Genetico" project. The project has also been funded by Associazione Italiana per la Ricerca sul Cancro (AIRC) IG2015 no. 17593 (to G.C.).
Institutional Review Board Statement: BCP-ALL cases enrolled in Italy in the AIEOP-BFM ALL2000 /R2006 protocols (Eudract number: 2007-004270-43). Investigation has been conducted in accordance with the ethical standards of the Declaration of Helsinki and to national and international guidelines. The study is approved by each institutional review board. A written informed consent was obtained from patients or legal representatives.

Informed Consent Statement: Informed consent was obtained from all subjects involved in the study.