Identification and Characterization of Novel Fusion Genes with Potential Clinical Applications in Mexican Children with Acute Lymphoblastic Leukemia

Acute lymphoblastic leukemia (ALL) is the most common type of childhood cancer worldwide, and Mexico City has one of the highest incidence and mortality rates of this cancer. Chromosomal translocations are recognized as important in cancer etiology, and specific fusion genes are considered important treatment targets in childhood ALL. This research aimed to identify and characterize novel fusion genes with potential clinical implications in Mexican children with ALL, using RNA sequencing (RNA-seq). Four previously unreported fusion genes were identified: CREBBP-SRGAP2B, DNAH14-IKZF1, ETV6-SNUPN, and ETV6-NUFIP1. Although a fusion gene alone is not sufficient to cause leukemia, it could be involved in the pathogenesis of the disease. Notably, these new translocations involve genes encoding hematopoietic transcriptional regulators, such as IKZF1, CREBBP, and ETV6, which are known to play important roles in leukemogenesis and disease prognosis. These fusions may therefore affect the prognosis of Mexican pediatric patients with ALL and could potentially be included in current risk stratification schemes or used as therapeutic targets.


Introduction
Acute lymphoblastic leukemia (ALL) is the most common cancer in children under 15 years old. Children in Mexico City, and Hispanic children in general, have some of the highest incidence and mortality rates of this cancer. Additionally, more than half of these children (58.8%) are classified as being at high risk of relapse when the diagnosis is confirmed [1,2]. In Mexico City, ALL is the second leading cause of death in children aged 1 to 14 years [3], with an annual incidence of 49.5 cases per million children under 15 years old [1]. ALL has been reported to be a genetically heterogeneous disease; we therefore hypothesize that the high incidence of childhood ALL in our population may be due, among other factors, to distinctive genetic variation [4].
Several studies have performed genomic characterization of children with ALL, finding distinct patterns of genes and pathways altered by somatic variants, deletions, chromosomal translocations, and DNA copy number alterations. These studies have shown that the genomic heterogeneity of ALL arises largely because each patient has a unique genome with specific genomic aberrations, which are now used to reclassify ALL into subtypes [4,5]. We approached this study from the perspective of chromosomal translocations because they are considered initiating events in leukemogenesis (giving rise to a premalignant clone) and play a central role in the malignant transformation of many cancers, including ALL. Nevertheless, translocations alone are not sufficient to cause leukemia, which requires the accumulation of complementary genetic lesions [6,7]. Chromosomal translocations result from the recombination or juxtaposition of separate genes, which can dysregulate the genes involved, activate oncogenes, or produce hybrid proteins with altered properties [8].
To date, advances in the molecular characterization of ALL have identified chromosomal translocations and fusion genes associated with prognosis and with targeted therapy. For example, the BCR-ABL1 fusion, or Philadelphia chromosome (Ph+), results from a recombination between the ABL1 and BCR genes on chromosomes 9 and 22, respectively. It encodes a constitutively active tyrosine kinase with transforming capacity, occurs in 3-5% of pediatric ALL cases, and is associated with poor prognosis [4]. Patients with this translocation receive specific treatment with tyrosine kinase inhibitors (TKIs), and their survival rates have improved in recent years in developed countries. In contrast, KMT2A (MLL) fusions are present in 60-80% of children with ALL under 1 year of age, whereas their prevalence is low (3%) in older children; they are associated with a high incidence of early relapse and low survival rates [9]. The most frequent translocation in children with ALL is ETV6-RUNX1 (TEL-AML1), which is associated with the highest survival rates observed in this disease [10]. Nonetheless, its prevalence varies across populations, and the Mexican population has shown the lowest detection frequencies (3.8-7.4%) regardless of the methodology used [11]. This was one of the main reasons for conducting the present research: to investigate which other, as yet unknown, clinically relevant translocations occur in Mexican children with ALL and could aid diagnosis, risk stratification, and targeted therapy.
Detection of chromosomal rearrangements requires molecular methods such as fluorescence in situ hybridization (FISH), comparative genomic hybridization (CGH), and/or RT-PCR. Next-generation sequencing (NGS) technologies, especially RNA sequencing (RNA-seq), make it possible to sequence many DNA or cDNA fragments simultaneously at high depth, increasing the capacity to detect and characterize genomic aberrations in greater detail [12]. RNA-seq has been used to discover novel chromosomal translocations in hematologic malignancies with greater sensitivity and specificity [13]. In this study, we characterized novel fusion genes present in Mexican children with ALL and analyzed the relationship between the functions of the genes that make up these fusions and the patients' clinical features.

Clinical Features of Patients
The clinical data of the 27 patients included are displayed in Table 1. Of these, 23 were diagnosed with ALL. We also included four patients with a presumptive diagnosis of leukemia; three of them were subsequently diagnosed with infectious diseases and one with bicytopenia. Thirteen patients were male and 14 were female, with a mean age of 6.8 years (range 0.6-13 years). Among the ALL patients, 22 had a pre-B lineage and 1 had a T-cell immunophenotype. All ALL patients achieved complete remission after induction therapy and were treated according to the chemotherapy protocols used in the participating hospitals; follow-up information was obtained only for ALL patients (Table 1). Three patients died without relapse (199MO, 63MO, and 28MO), and one relapsed and died (74MO). The other 19 patients remained in complete remission throughout the follow-up period.

Novel Fusion Transcripts
In this study, RNA-seq was performed on bone marrow samples to identify translocations. Twelve different fusion transcripts were identified in nine samples, but two of them (GLYR1-SLC9A8 and WDR74-RCC1) could not be amplified by PCR. As expected, no fusion genes were detected in the cases without leukemia. All fusions were found to be in-frame; five of them (E2A-PBX1, ETV6-RUNX1, BCR-ABL, MLL-AF4, and EP300-ZNF384 [14]) had been reported previously, and four (CREBBP-SRGAP2B, DNAH14-IKZF1, ETV6-SNUPN, and ETV6-NUFIP1) had not (Tables 1 and 2). We validated the bioinformatic results by PCR amplification and Sanger sequencing, using cDNA or genomic DNA from the original sample and specific primers (Table S1).
The CREBBP-SRGAP2B fusion transcript was detected in one case (74MO), in which BCR-ABL (minor) was also identified (Table 1). This patient received molecularly targeted therapy directed at the minor BCR-ABL fusion. Nevertheless, the patient had an isolated central nervous system (CNS) relapse two years after diagnosis was confirmed and died two months later of septic shock. Sequencing of the RT-PCR product confirmed that exon 1 of CREBBP, together with its promoter region, was fused to exons 2-1 of SRGAP2B, resulting in loss of the coding sequence (CDS) of both genes. The predicted structure is shown in Figure 1, coverage and read depth are displayed in Figure S1, and PCR with genomic DNA suggests the existence of more than one breakpoint site (Figure 1, lane 2).
The DNAH14-IKZF1 fusion gene was observed in a 1.8-year-old girl with standard-risk pre-B ALL (179MO) who achieved complete remission after remission induction. The bioinformatic analysis reported different fusion sites between DNAH14 (exons 29, 33, 34, and 36) and IKZF1 (exons 4 and 5; coverage and read depth are displayed in Figure S2), suggesting that these fusions resulted from alternative splicing. By PCR validation on genomic DNA from the original sample, we identified the breakpoint between exon 36 of DNAH14 and exon 4 of IKZF1 (Figure 2), resulting in a fusion constituted by the first 36 exons of DNAH14 and exon 4 and upstream exons of IKZF1. This DNAH14-IKZF1 fusion was in-frame and is predicted to encode a chimeric protein in which the final region of IKZF1, comprising both the four zinc fingers and the dimerization domain, is truncated.
The ETV6-SNUPN and ETV6-NUFIP1 gene rearrangements were detected in a female adolescent diagnosed with high-risk ALL (28MO). Two months after diagnosis was confirmed and one month after she achieved complete remission, she developed severe chemotherapy-related neutropenia, septic shock, pneumonia, and multiple organ failure, and died. RT-PCR and Sanger sequencing indicated that the breakpoints of the ETV6-NUFIP1 translocation were located in intron 1 of ETV6 and intron 5 of NUFIP1, whereas those of the ETV6-SNUPN rearrangement were located in intron 2 of ETV6 and intron 1 of SNUPN. Both fusions involved the first exon and the promoter region of ETV6 (Figure 3; coverage and read depth are displayed in Figures S3 and S4).
The EP300-ZNF384 fusion has been reported previously in other populations and associated with the prognosis of children with the pre-B ALL subtype; notably, it had not been reported in Mexican children. This fusion was identified in case 197MO, a 9.4-year-old girl diagnosed with pre-B ALL who achieved complete remission after induction therapy (Table 1). The structure and sequences of the EP300-ZNF384 fusion are depicted in Figure 4. Exon 6 of EP300 was fused in-frame to exon 3 of ZNF384 (Figure 4B; coverage and read depth are displayed in Figure S5), the same structure as previously reported. The fusion included the initiation codon of ZNF384, and the fusion transcript is predicted to encode a 110-kDa fusion protein of 1027 amino acids containing the transcriptional adapter zinc-finger 1 (TAZ1) domain in the cysteine-histidine-rich region 1 (CH1) of EP300 and the entire ZNF384 protein.
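Whether a fusion is "in-frame" comes down to codon phase: the 3' partner keeps its native reading frame only if the number of coding nucleotides contributed by the 5' partner and the coding offset of the first retained 3' exon agree modulo 3. The Python sketch below illustrates that rule, together with a rough protein size estimate based on an assumed average residue mass of about 110 Da. The function names and example coordinates are hypothetical, not values from this study, and the frame rule assumes the 5' partner contributes coding sequence from its own start codon, so it does not apply directly to fusions that retain only a 5' untranslated exon and promoter.

```python
"""Sketch: reading-frame check for a fusion junction and a rough mass estimate.

Assumptions (illustrative only): coordinates are counted in CDS nucleotides,
and ~110 Da is used as a rule-of-thumb average mass per amino acid.
"""

AVG_RESIDUE_MASS_DA = 110.0  # assumed average residue mass, not a measured value


def is_in_frame(five_prime_cds_nt: int, three_prime_cds_offset: int) -> bool:
    """Return True if the 3' partner keeps its native reading frame.

    five_prime_cds_nt: coding nucleotides contributed by the 5' partner,
        counted from its start codon up to the junction.
    three_prime_cds_offset: coding nucleotides of the 3' partner that
        precede its first retained exon (0 if that exon starts the CDS).
    """
    if five_prime_cds_nt < 0 or three_prime_cds_offset < 0:
        raise ValueError("nucleotide counts must be non-negative")
    # The first retained 3' nucleotide has codon phase offset % 3 in its own
    # gene and phase five_prime_cds_nt % 3 in the fusion; frames match if equal.
    return five_prime_cds_nt % 3 == three_prime_cds_offset % 3


def approx_mass_kda(n_residues: int) -> float:
    """Rough protein mass in kDa from the number of amino acids."""
    return n_residues * AVG_RESIDUE_MASS_DA / 1000.0


if __name__ == "__main__":
    # Hypothetical coordinates, for illustration only.
    print(is_in_frame(300, 0))               # True: both at phase 0
    print(is_in_frame(301, 0))               # False: one-nucleotide frameshift
    print(round(approx_mass_kda(1027), 1))   # 113.0
```

By this rule of thumb, 1027 amino acids correspond to roughly 113 kDa, in line with the predicted 110-kDa EP300-ZNF384 protein; the exact value depends on amino acid composition.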

Discussion
In this study, we identified new fusion genes that may provide prognostic information because they involve key transcriptional regulators with essential roles in lymphoid differentiation: CREBBP, IKZF1, and ETV6. Moreover, mutations and deletions of these genes have been associated with prognosis in children with ALL (Table 2).
For example, CREB-binding protein (CBP) is a transcriptional coactivator with intrinsic histone acetyltransferase (HAT) activity that interacts with numerous transcription factors and has important functions in cell growth, division, and differentiation [20,21]. In the novel fusion, CREBBP is fused with SRGAP2B, which encodes a member of the SLIT-ROBO Rho GTPase-activating protein family that has been reported to participate in transcriptional regulation pathways of brain development [22]. The CREBBP-SRGAP2B fusion retains only the first exon of CREBBP and loses important structural regions such as the C-terminal region (Figure 1), which is necessary for interaction with the basal transcription factor TFIIB, a key component of the transcriptional machinery [23]. This translocation could be associated with relapse in Mexican children with ALL, given that the child carrying it had a CNS relapse, although this observation comes from a single case. Other studies have reported that deletions or mutations in CREBBP are associated with relapse and glucocorticoid resistance [15]. It will therefore be important to evaluate, in future studies, the impact of this novel translocation on the prognosis of children with ALL and its association with drug resistance and relapse [24]. Notably, BCR-ABL1 (minor) was also detected in this patient. In this regard, several molecular studies have provided evidence that the most frequent mode of acquired resistance to TKIs is the acquisition of point mutations. We hypothesize that the coexistence of different fusion genes could also induce mechanisms of drug resistance, as has been reported for tyrosine kinase and BRAF fusions [25].
Furthermore, in the present research, coexisting fusion genes were observed in two ALL cases (8.7%), a frequency higher than that reported in other populations (0.24%) [26]. Conventional molecular methods have important limitations for detecting multiple genetic alterations in the same patient, whereas NGS allows the coexistence of different fusions to be identified with high sensitivity and specificity [12,13]. Using NGS may therefore improve the current risk stratification and chemotherapy assignment of children with ALL. On the other hand, because genomic heterogeneity in ALL is large, different mutations, deletions, and fusion genes can co-occur in the same patient. We did not analyze whether the fusion genes found in the present study co-occurred with other known mutations because this was not the main objective; this is a limitation of our study, and this association should be explored in depth in future investigations.
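The frequencies discussed here are simple proportions of the 23 ALL cases: 9/23 (39.13%) carried at least one fusion and 2/23 (about 8.7%) carried co-occurring fusions. Because the cohort is small, an interval estimate helps put the comparison with the 0.24% reported elsewhere [26] in context. The Python sketch below uses the Wilson score interval; this is an illustrative choice, not an analysis reported in the study.

```python
"""Sketch: fusion frequencies in the ALL cohort with Wilson score intervals."""
from __future__ import annotations

import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width


if __name__ == "__main__":
    n_all = 23  # ALL patients in this cohort
    for label, k in [("any fusion", 9), ("co-occurring fusions", 2)]:
        lo, hi = wilson_interval(k, n_all)
        print(f"{label}: {k}/{n_all} = {100 * k / n_all:.2f}% "
              f"(95% CI {100 * lo:.1f}-{100 * hi:.1f}%)")
```

For 2/23, the interval spans roughly 2.4% to 26.8%: its lower bound stays above 0.24%, but its width reflects how few patients were studied.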
To the best of our knowledge, there is no information regarding the impact of the DNAH14-IKZF1 translocation on the prognosis of leukemia patients. It would therefore be relevant to determine the frequency of this translocation in a larger number of patients and whether it is related to prognosis. The only patient positive for DNAH14-IKZF1 was classified as standard risk and is currently alive and in complete remission 2 years after treatment initiation (179MO, Table 1). However, longer follow-up is required to rule out an association with late relapse, because IKZF1 deletions have been associated with high relapse rates in pediatric patients with ALL [16]. In addition, there is evidence supporting a plausible role of this alteration in the prognosis of children with ALL. IKZF1 is a transcription factor of the zinc-finger DNA-binding protein family and is critical for normal hematopoiesis and lymphoid system development. Different isoforms have been described, most of which share a common C-terminal domain (exon 8) containing two zinc fingers necessary for dimerization and interaction with other proteins [27,28]. The DNAH14-IKZF1 fusion, by contrast, generates a protein in which the four zinc fingers encoded in exons 4 and 6 and the dimerization sites are disrupted, making it similar to the IK6 isoform of IKZF1 [17]. In mouse models of BCR-ABL1-positive leukemia, the IK6 isoform of IKZF1 and point mutations in its DNA-binding domain resulted in dasatinib resistance, cellular mislocalization, and the induction of stem cell-like features [18]. Considering this information, the DNAH14-IKZF1 fusion could produce a functionally inactive (loss-of-function) IKZF1 allele, and this rearrangement could have prognostic and therapeutic implications; however, further study is required.
Two different ETV6 translocations (ETV6-SNUPN and ETV6-NUFIP1) were detected in the same patient (Table 1, Figure 3). SNUPN (snurportin 1) is an snRNP-specific nuclear import receptor essential for proliferation and chromosome region maintenance [29], while NUFIP1 encodes a nuclear RNA-binding protein that contains a C2H2 zinc finger motif and a nuclear localization signal. As previously described, ETV6 encodes an ETS (E-twenty-six) family transcription factor that plays an important role in hematopoiesis, as well as in malignant transformation when ETV6 translocations, insertions, or inversions are present [19]. ETV6 has been described as a highly promiscuous gene, with 30 partner genes reported in cancer [19]. The most common ETV6 fusion in children with ALL is ETV6-RUNX1, which is associated with a good prognosis. However, the patient positive for ETV6-SNUPN and ETV6-NUFIP1 died of chemotherapy-related toxicity two months after diagnosis. The role of ETV6-SNUPN and ETV6-NUFIP1 in the prognosis of Mexican children with ALL should be determined in future investigations.
ZNF384 is a C2H2-type zinc finger protein that functions as a transcription factor, and ZNF384 fusions have been reported in ~3% of children with B-cell precursor ALL. Specifically, EP300-ZNF384 has been associated with a relatively older age at diagnosis, no marked elevation of the white blood cell (WBC) count at presentation, and a favorable response to conventional chemotherapy compared with patients with MLL translocations [14,30]. These characteristics were observed in case 197MO, which was positive for the EP300-ZNF384 fusion (age 9.4 years at diagnosis; WBC count 2430 × 10^6 cells/L). Additionally, the patient is alive and in complete remission 4 years after treatment initiation. Incorporating EP300-ZNF384 detection into the routine diagnostic translocation panel for Mexican children with ALL could help achieve better risk stratification and treatment.
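As a unit check, the WBC count reported for case 197MO, 2430 × 10^6 cells/L, converts to the per-microliter units used in many clinical reports, since 1 L = 10^6 µL:

$$2430 \times 10^{6}\ \text{cells/L} = 2.43 \times 10^{9}\ \text{cells/L} = \frac{2430 \times 10^{6}\ \text{cells}}{10^{6}\ \mu\text{L}} = 2430\ \text{cells}/\mu\text{L}$$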

Patients and Samples
This study was conducted in accordance with the principles of the Declaration of Helsinki and was approved on 24 September 2013 by El Comité de Ética en Investigación del Instituto Mexicano del Seguro Social (IMSS), project identification code R-2013-785-068. We examined 27 bone marrow (BM) samples from pediatric patients diagnosed and treated in public hospitals of Mexico City during 2014-2016: 23 from patients diagnosed with acute lymphoblastic leukemia (ALL) and 4 from patients without leukemia. ALL was diagnosed on the basis of histochemical tests and cytometric evaluation (monoclonal antibodies directed against lineage-associated antigens) of the bone marrow. Informed consent was obtained from each child's parents. Clinical details and follow-up information were obtained from the medical records, and a database was constructed to register the age, sex, residence, year of diagnosis, and clinical manifestations of the patients. Follow-up information during treatment was obtained for the ALL patients.

RNA-Seq Libraries and Sequencing
RNA was extracted from mononuclear cell suspensions of diagnostic BM aspirates using the Direct-zol RNA kit with on-column DNase digestion to remove DNA (Zymo Research, Irvine, CA, USA), and RNA integrity was assessed with the 4200 TapeStation (Agilent Technologies, Santa Clara, CA, USA). Libraries were constructed using TruSeq Stranded Total RNA with Ribo-Zero Gold (Illumina, San Diego, CA, USA) according to the manufacturer's instructions. Briefly, rRNA was removed from the RNA samples, the remaining RNA was fragmented, cDNA was synthesized with reverse transcriptase, and sequencing adapters were ligated. Libraries were evaluated with the 4200 TapeStation and sequenced for 2 × 75 cycles (paired-end sequencing) on an Illumina NextSeq500 (Illumina, San Diego, CA, USA). Raw reads were preprocessed with a standard pipeline consisting of quality control with FastQC v0.11.5 (http://www.bioinformatics.babraham.ac.uk/projects/fastqc/; accessed in January 2018) and trimming and adapter removal with Trimmomatic-0.36 (ww.usadellab.org/cms/index.php?page=trimmomatic; accessed in January 2018). TopHat (https://ccb.jhu.edu/software/tophat/index.shtml; accessed in January 2018) was used to align reads to the hg19 human reference genome, and TopHat-Fusion was used for gene fusion analysis.
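The preprocessing and fusion-calling steps described above can be sketched as command lines. The Python helper below only assembles argument lists and runs nothing; the file names, output layout, Bowtie index name, adapter file, and trimming thresholds (taken from the example in the Trimmomatic manual) are assumptions, not the parameters used in this study. TopHat-Fusion candidates are normally filtered further with tophat-fusion-post, which is not shown, and depending on the TopHat version additional fusion-search options may be required.

```python
"""Hypothetical sketch of the QC, trimming and fusion-search steps as commands.

Only builds argument lists; nothing is executed. All file names, the adapter
file, trimming thresholds and the Bowtie index name are placeholders.
"""
from __future__ import annotations

from pathlib import Path


def build_commands(r1: str, r2: str, index: str, out: str = "out") -> list[list[str]]:
    """Return one argument list per pipeline step for a paired-end sample."""
    o = Path(out)
    # Trimmomatic PE output order: R1 paired, R1 unpaired, R2 paired, R2 unpaired.
    trimmed = [str(o / name) for name in ("R1.paired.fq.gz", "R1.unpaired.fq.gz",
                                          "R2.paired.fq.gz", "R2.unpaired.fq.gz")]
    return [
        # 1) Quality-control report (FastQC expects the -o directory to exist).
        ["fastqc", "-o", str(o / "fastqc"), r1, r2],
        # 2) Adapter removal and quality trimming in paired-end mode;
        #    thresholds follow the Trimmomatic manual example, not the study.
        ["java", "-jar", "trimmomatic-0.36.jar", "PE", "-phred33", r1, r2, *trimmed,
         "ILLUMINACLIP:TruSeq3-PE.fa:2:30:10", "LEADING:3", "TRAILING:3",
         "SLIDINGWINDOW:4:15", "MINLEN:36"],
        # 3) Spliced alignment to an hg19 Bowtie index with fusion search enabled.
        ["tophat", "-o", str(o / "tophat_fusion"), "--fusion-search",
         index, trimmed[0], trimmed[2]],
    ]


if __name__ == "__main__":
    for cmd in build_commands("sample_R1.fastq.gz", "sample_R2.fastq.gz", "hg19"):
        print(" ".join(cmd))
```

Keeping each step as an argument list (rather than a shell string) makes it straightforward to pass to subprocess.run later without shell quoting issues.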

Reverse Transcription-Polymerase Chain Reaction (RT-PCR) for Fusion Genes
Total RNA previously isolated from the patients' bone marrow was used to synthesize cDNA with SuperScript III Reverse Transcriptase (Invitrogen, Waltham, MA, USA) according to the manufacturer's instructions. Genomic DNA was extracted from the previously obtained mononuclear cell suspensions by proteinase K digestion followed by phenol-chloroform extraction. To detect ALL fusions, we performed PCR or RT-PCR using specific primers designed with Primer3Plus software (v. 0.4.0) [31] (Duke-NUS Medical School, Singapore), listed in Table S1. PCR was performed with Taq DNA Polymerase (Invitrogen). PCR products were electrophoresed on 2% agarose gels and cloned into the pJET1.2 vector with the CloneJET PCR Cloning Kit (Thermo Fisher Scientific, Waltham, MA, USA). The cloned inserts were confirmed by Sanger sequencing on an ABI3500xL genetic analyzer (Applied Biosystems, Waltham, MA, USA). The nucleotide sequences of each fusion gene have been deposited in GenBank under accession numbers MK172836, MK172837, MK172838, MK172839, and MK172840.
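Sanger confirmation of a fusion amounts to checking that the sequenced product reads across the predicted junction. A minimal Python sketch of that check follows. It assumes the exon-boundary sequences of both partners are known and treats a read as junction-spanning if it contains a fixed number of bases on each side of the junction on either strand; the sequences in the example are hypothetical placeholders, and the 12-nt flank is an arbitrary choice.

```python
"""Sketch: does a Sanger read span a predicted fusion junction?

Example sequences are hypothetical placeholders, not data from this study.
"""

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T/N only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def spans_junction(read: str, upstream: str, downstream: str, flank: int = 12) -> bool:
    """True if `read` (either strand) contains the junction with `flank` nt each side.

    upstream: 3' end of the 5' partner exon; downstream: 5' start of the 3' partner exon.
    """
    if len(upstream) < flank or len(downstream) < flank:
        raise ValueError("exon sequences must be at least `flank` nt long")
    junction = (upstream[-flank:] + downstream[:flank]).upper()
    read = read.upper()
    # A cloned insert may be sequenced from either strand, so check both.
    return junction in read or revcomp(junction) in read


if __name__ == "__main__":
    up = "GATTACAGATTACAGGCCTT"    # hypothetical end of the 5' partner exon
    down = "CCGGAATTCCGGTTAAGGCC"  # hypothetical start of the 3' partner exon
    read = "TTTT" + up + down + "AAAA"
    print(spans_junction(read, up, down))                  # True
    print(spans_junction(revcomp(read), up, down))         # True
    print(spans_junction("TTTT" + up + "AAAA", up, down))  # False
```

Exact substring matching is deliberately strict; real chromatogram reads with base-calling errors near the junction would need an alignment-based comparison instead.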

Conclusions
The findings of the present study suggest that the novel fusion genes identified could be involved in the pathogenesis of childhood leukemia. Additionally, the relatively low frequency of translocations found in this subgroup of Mexican children with ALL (39.13%) is consistent with the high genetic heterogeneity of this disease. It will be important to identify the main oncogenic lesions related to the evolution of the disease. In this work, we advanced the characterization of important genetic variations potentially related to the prognosis of ALL in Mexican children, and the new fusion genes could contribute to better risk stratification and treatment.