Occurrence of L1M Elements in Chromosomal Rearrangements Associated to Chronic Myeloid Leukemia (CML): Insights from Patient-Specific Breakpoints Characterization

Chronic myeloid leukemia (CML) is a rare myeloproliferative disorder caused by the reciprocal translocation t(9;22)(q34;q11) in hematopoietic stem cells (HSCs). This chromosomal translocation results in the formation of an extra-short chromosome 22, called a Philadelphia chromosome (Ph), containing the BCR-ABL1 fusion gene responsible for the expression of a constitutively active tyrosine kinase that causes uncontrolled growth and replication of leukemic cells. Mechanisms behind the formation of this chromosomal rearrangement are not well known, even if, as observed in tumors, repetitive DNA may be involved as core elements in chromosomal rearrangements. We have participated in the explorative investigations of the PhilosoPhi34 study to evaluate residual Ph+ cells in patients with negative FISH analysis on CD34+/lin- cells with gDNA qPCR. Using targeted next-generation deep sequencing strategies, we analyzed the genomic region around the t(9;22) translocations of 82 CML patients and one CML cell line and assessed the relevance of interspersed repeat elements at breakpoints (BP). We found a statistically higher presence of LINE elements, in particular belonging to the subfamily L1M, in BP cluster regions of both chromosome 22 and 9 compared to the whole human genome. These data suggest that L1M elements could be potential drivers of t(9;22) translocation leading to the generation of the BCR-ABL1 chimeric gene and the expression of the active BCR-ABL1-controlled tyrosine kinase chimeric protein responsible for CML.


Introduction
Genomic instability is a hallmark of most types of cancer and leads to structural variants (SVs) including translocations, inversions, deletions, and duplications [1]. Genomic changes, ranging from simple translocations to complex karyotypes that arise from tangled phenomena such as chromothripsis, reshape cancer genomes and create de novo fusion genes with oncogenic properties [2]. Genomic rearrangements originate from DNA double strand breaks (DSBs) and misrepair through non-homologous end joining (NHEJ), which requires little or no homology, or homologous recombination (HR) [3].
Repeated DNA sequences are implicated as promoting elements of some of these recurrent chromosomal rearrangements [4]. The high density of repetitive DNA in some BP regions suggests that these sequences may provide "hot spots" for recombination and mediate translocation processes, increasing the likelihood of chromosomal rearrangement phenomena [4]. Four classes of repeated and interspersed sequences have been identified in the mammalian genome: DNA transposons, and LINE (long interspersed nuclear element), SINE (short interspersed nuclear element), and LTR (long terminal repeat) retrotransposons. The last three classes are the most represented categories of retrotransposons in mammals and constitute about 20%, 13%, and 8% of the human genome, respectively [5]. These elements, previously described as "junk DNA", are now considered key components for the evolution and plasticity of the genome [5]. Their involvement in genetic dysfunctions is now evident, as they can trigger chromosomal rearrangements and cause mutations, resulting in the development of different pathologies including cancer [5]. The most common LINE family is L1, while Alu sequences are the most represented SINE elements. Both these elements are classified into subfamilies based on their sequence variants and evolution: L1 can be divided into L1M (mammalian-specific, oldest), L1P (primate-specific, intermediate), and L1H (human-specific, youngest) subfamilies; Alu elements are classified as AluJ (oldest), AluS (intermediate), and AluY (youngest) [6]. Recombination between Alu elements has been reported to be responsible for the partial duplication of the MLL gene in acute myeloid leukemia as well as for the generation of reciprocal translocations in tumors [7].
With the development of next generation sequencing (NGS) technologies (whole genome sequencing, exome, and targeted panels), a deeper understanding of the genetic mechanisms involved in aberrant genetic alterations has become possible [8]. Nevertheless, limited efforts have been made to clarify the molecular phenomena underlying the formation of such nonrandom genomic rearrangements, and little is known regarding the mechanisms responsible for a variety of translocations [9]. Precise knowledge of the sequences surrounding BPs is fundamental to exploring genetic causes in patients with balanced translocations or inversions.
Chronic myeloid leukemia is a clonal myeloproliferative disorder characterized by the reciprocal translocation t(9;22)(q34;q11), involving the proto-oncogene ABL1 (Abelson leukemia kinase proto-oncogene) on chromosome 9 and the BCR (breakpoint cluster region) gene on chromosome 22. The translocation generates the characteristic Ph chromosome and the BCR-ABL1 fusion gene [10], which encodes a constitutively active tyrosine kinase that promotes proliferation and survival of leukemic cells through the activation of downstream pathways such as RAS, RAF, JUN kinase, MYC, and STAT [11].
Three well-defined BCR BP regions in chromosome 22 have been characterized: the most frequent, named the Major BP cluster region (M-bcr), occurs between exons 12 and 16. CML patients show mostly M-bcr BPs resulting in a b2a2 or b3a2 transcript, containing BCR exon 13 (b2) or 14 (b3) and ABL1 exon 2 (a2) [12]. In a small subset of patients, the region between exons 19 and 20 (micro-bcr, µ-bcr) or the region distal to exon 1 (minor-bcr, m-bcr) in BCR is involved. The BP region in ABL1 is more variable and extends over about 200 kb, from 10 kb upstream of the 5′ end of the gene to exon 2 [12]. The M-bcr, µ-bcr, and m-bcr BP regions are associated with the production of the p210, p230, and p190 BCR-ABL fusion protein variants, respectively.
Previously, in a cohort of 27 CML patients, we identified the presence of Alu at the BCR-ABL1 BP junctions, underlying the evidence that repeated sequences may facilitate the pairing process and the resulting chromosomal translocation [9].
As participants in the exploratory investigations of the PhilosoPhi34 study, which investigates the efficacy of nilotinib 300 mg BID in depleting bone marrow (BM) leukemic stem cells (CD34+/lin-Ph+) in newly diagnosed chronic-phase (CP) CML patients at specific time points of treatment [13], we analyzed a cohort of 82 CML patients. The goal was to identify the patient-specific BCR-ABL1 rearrangement in order to evaluate minimal residual disease (MRD) in Ph+ CML patients after 6 or 12 months of nilotinib treatment. The aim of that study was to evaluate, through a gDNA-qPCR assay designed on patient-specific BP sequences, the number of residual Ph+ cells [9,14-17].
We identified the patients' BPs with a target enrichment approach and next generation sequencing (NGS) [14-17]. We also included the KCL22 CML cell line as a positive control in our analyses. The analysis showed that BCR-ABL1 junctions were clustered within 9 kbs on chromosome 22, mostly including the M-bcr BP cluster region, while on chromosome 9 they were clustered in a wider region of 154 kbs. We therefore investigated the causes behind the generation of the Ph chromosome by analyzing the sequences flanking each breakpoint.
The analysis of repetitive elements in these regions showed a high presence of LINE elements belonging to the subfamily L1M. Comparing the frequency of L1M elements at the BP regions with their distribution throughout the entire human genome showed that L1M elements were considerably more frequent at the BP regions than in the genome as a whole. Therefore, we hypothesized, for the first time, the involvement of L1M in the translocation t(9;22)(q34;q11) and in the consequent creation of the Ph chromosome and BCR-ABL1 fusion gene responsible for CML.

Materials and Methods
For specific details of the analyses we performed, see Appendix A.

Patient Samples
The PhilosoPhi34 study, which included 15 centers in Italy, collected bone marrow (BM) samples from 87 consecutive patients with CML on behalf of the Rete Ematologica Lombarda (REL). The study enrolled newly diagnosed Ph+ CML patients in the chronic phase (CP-CML), aged ≥ 18 years, either male or female. BM samples were obtained and analyzed in accordance with the Declaration of Helsinki, after written consent. We analyzed 82 CML patients of the PhilosoPhi34 study.

Selection of BM CD34+/lin- Cells
Mononuclear cells (MNCs) from bone marrow (BM) blood samples of the CML patients were isolated, and BM CD34+/lin- cells were selected using a Diamond CD34 Isolation kit and an autoMACS Pro separator (Miltenyi Biotec, Bologna, Italy) according to the manufacturer's instructions. Method details are described at http://dx.doi.org/10.17504/protocols.io.yncfvaw (accessed on 19 July 2019) and in our previous study [18].

CML Cell Line
An aliquot of the leukemic cell line KCL22 was kindly provided by Papa Giovanni XXIII Hospital (Bergamo, Italy).

NGS and DELLY Analysis
Target enrichment and NGS characterization of 82 CML patients and one cell line showed distinct BP DNA coordinates. DELLY analysis and visual inspection with the Integrative Genomics Viewer (IGV) identified a total of 51,556 SVs, among which 3248 were translocation events, in the 83 samples. We identified both the Ph and the reciprocal der9 chromosomes in 53 samples (106 translocations), whereas the Ph alone was identified in 26 samples and der9 alone in four (Figure 1), resulting in 136 translocation rearrangements and 272 BPs in total. DELLY reported 102 of 136 BP junction consensus sequences: 86 were characterized by micro-homologous regions (ranging from one to twenty nucleotides) and two BPs showed non-template insertion sequences of one base at the joined ends of the rearrangements (88%). The remaining 14 junction sequences were blunt (12%) (Tables 1 and S1).
Based on the BP coordinates, all samples except one showed a loss of genomic material at the BP regions on either chr9 or chr22. Additionally, 36% of the total BPs (98/272) mapped into a repetitive element (Figure 1). The most represented classes were SINE (43.9%) and LINE (35.7%), followed by DNA transposons, LTR, and simple repeats (20.4% in total) (Figure 1).
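The counts reported above can be cross-checked with a few lines of arithmetic. The following is a minimal sketch in Python; the variable names are illustrative and not taken from the study, and the only inputs are the sample counts and the number of BPs in repeats stated in the text.

```python
# Sample counts as reported in the text; variable names are illustrative.
both_derivatives = 53   # samples with both the Ph and der9 junctions
ph_only = 26            # samples with the Ph junction only
der9_only = 4           # samples with the der9 junction only

samples = both_derivatives + ph_only + der9_only
# Samples with both derivatives contribute two translocation junctions each.
translocations = 2 * both_derivatives + ph_only + der9_only
# Each junction joins two breakpoints, one per chromosome.
breakpoints = 2 * translocations

bps_in_repeats = 98     # BPs mapping into a repetitive element
repeat_percent = 100 * bps_in_repeats / breakpoints

print(samples, translocations, breakpoints, round(repeat_percent))
```

Under these inputs the tally reproduces the 83 samples, 136 translocations, 272 BPs, and 36% reported in the text.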
Due to the BPs clustering in the M-bcr and ABL1 regions, we evaluated the genomic content of these intervals of ~9 kbs and ~154 kbs, respectively. Both regions were characterized by the predominant presence of LINEs and SINEs. In detail, based on the UCSC's RepeatMasker annotation, the ABL1 region contains 101 LINE elements and 75 of them are L1M* type (Figure 2). Likewise, in the small M-bcr region, five LINEs are annotated and three of them belong to two different L1M* subfamilies (Figure 2).
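Counting the annotated repeats that fall in an interval amounts to an interval-overlap query against the annotation. The sketch below illustrates the idea in Python; it is not the pipeline used in the study. The function name `count_repeats`, the record layout (chromosome, start, end, repeat name, repeat class), the half-open coordinate convention, and the toy records are all assumptions made for illustration and are not real RepeatMasker coordinates.

```python
from collections import Counter

def count_repeats(region, annotations):
    """Count annotated repeats overlapping a half-open region (chrom, start, end)."""
    chrom, start, end = region
    by_class = Counter()
    l1m_names = Counter()
    for a_chrom, a_start, a_end, name, rep_class in annotations:
        # Two half-open intervals overlap when each starts before the other ends.
        if a_chrom == chrom and a_start < end and a_end > start:
            by_class[rep_class] += 1
            if rep_class == "LINE" and name.startswith("L1M"):
                l1m_names[name] += 1
    return by_class, l1m_names

# Toy records for illustration only (hypothetical coordinates).
toy_annotations = [
    ("chr22", 100, 400, "L1ME1", "LINE"),
    ("chr22", 350, 600, "AluSx", "SINE"),
    ("chr22", 900, 1200, "L1MC1", "LINE"),
    ("chr22", 2000, 2300, "L2b", "LINE"),
    ("chr9", 150, 300, "L1MB4", "LINE"),
]

classes, l1m = count_repeats(("chr22", 0, 1000), toy_annotations)
print(dict(classes), dict(l1m))
```

A linear scan is sufficient for a single region; for many intervals a sorted index or a dedicated tool such as bedtools would be the more practical choice.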
The screening of the random intervals (RIs, see Materials and Methods) confirmed an enrichment of LINEs as well as SINEs in the regions of BP clustering in our samples. We focused on the LINE elements because they showed the highest level of enrichment (Table 1). In detail, considering a FC value greater than two, the ABL1 region was characterized by the enrichment of 13 different LINEs: HAL1ME, L1MCc, L1MEc, L1ME1, L1MEf, L1Med, L1MC5, L1P4, L1ME4a, L1MB4, HAL1, L1M5, and L1MC4. With the exception of HAL1ME, L1P4, and HAL1, all elements are included in the L1M subfamily. Similarly, in M-bcr, we found an enrichment of three LINE elements; two of them are members of the L1M subfamily (L1MC1 and L1ME), and one of the L2b family. L1M copy number values in the samples' M-bcr and ABL1 intervals were significantly higher than the corresponding L1M element mean values in RIs (Figure S1).
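The fold-change criterion described above can be sketched as follows. This is a minimal Python illustration, assuming FC is the ratio between the copy number of an element in the region of interest and its mean copy number across RIs, as stated in the Methods; the function names and the toy counts are hypothetical and do not reproduce the study's data.

```python
from statistics import mean

def fold_change(region_count, ri_counts):
    """Ratio of the count in the region of interest to the mean count in RIs."""
    ri_mean = mean(ri_counts)
    if ri_mean == 0:
        # Element never seen in the RIs: FC is undefined, reported here as inf.
        return float("inf") if region_count > 0 else 0.0
    return region_count / ri_mean

def enriched_elements(region_counts, ri_counts_by_element, threshold=2.0):
    """Return {element: FC} for the elements whose FC exceeds the threshold."""
    result = {}
    for name, count in region_counts.items():
        fc = fold_change(count, ri_counts_by_element[name])
        if fc > threshold:
            result[name] = fc
    return result

# Toy counts for illustration only (not the study's data).
toy_region = {"L1ME1": 3, "AluSx": 4}
toy_ris = {"L1ME1": [0, 1, 0, 1], "AluSx": [3, 4, 5, 4]}

hits = enriched_elements(toy_region, toy_ris)
print(hits)
```

In the toy input only the hypothetical L1ME1 count passes the threshold, because its mean occurrence across the RIs is well below its count in the region of interest.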
Pairwise alignment identified 12 DNA blocks ranging from 119 to 250 bps in length and an average sequence identity of 81.5%. In M-bcr, 11 out of 12 sequences mapped closely to a L1ME1 element, while the remaining one was in proximity of a L1MC1 repeat, and along the ABL1-enriched region the blocks showed a scattered distribution; like L1M repeat elements, the identified sequence similarity was derived from Alu sequences (Figure 3 and Supplementary Files).
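For readers unfamiliar with the metric, sequence identity over an aligned block can be computed as the share of alignment columns in which both sequences carry the same base. The sketch below assumes this definition (matches divided by alignment length, gaps counted as mismatches); the text does not state which definition was used, and the function name and the example strings are illustrative.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity of two aligned sequences of equal length.

    Assumed definition: identical non-gap columns / all alignment columns.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Ignore columns where both sequences have a gap.
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if not (a == gap and b == gap)]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b and a != gap)
    return 100.0 * matches / len(columns)

# Illustrative aligned pair (not taken from the study).
identity = percent_identity("ACGT-A", "ACCTTA")
print(round(identity, 1))
```

Other conventions, for example dividing by the length of the shorter sequence, give different values for gapped alignments, so the definition should be stated alongside any reported identity.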
The finding of an enrichment of the same type of repetitive elements (L1M) in both BP cluster regions potentially involved in the rearrangements is an interesting result described in detail for the first time.

Discussion
Transposons represent around 54% of the mammalian genome, of which almost 21% is represented by long interspersed nuclear elements (631.64 Mb) [19]. Most of these elements are silent and inactive, but 10-15% of them play an active role in the regulation of gene expression. In this study, we explored the role of transposable elements in the rearrangements of BCR and ABL1, investigating whether a topological feature or a specific sequence close to these genes could promote their fusion. We characterized patient-specific BP sequences on chr9 and chr22 and found that almost all of the 82 CML samples had a BP in the M-bcr (mostly in introns 13, 14, and 15). The analysis of the regions spanning the BPs, of ~9 kbs for chr22 and ~154 kbs for chr9, showed a predominant presence of SINEs and LINEs. Indications that Alu elements may have a role in the generation of BCR-ABL have already been reported [20]. With a fine mapping of the breakpoints, we narrowed the BP regions involved in the t(9;22) in CML and found a high presence of LINE1 elements belonging to the L1M subfamily: 10 different L1M elements in the ABL1 cluster region (L1MCc, L1MEc, L1ME1, L1MEf, L1Med, L1MC5, L1ME4a, L1MB4, L1M5, and L1MC4) and two L1M elements in the M-bcr cluster region (L1MC1 and L1ME). A previous study on a large cohort of samples was not able to identify consensus sequences around breakpoints, but it analyzed only a small region across the breakpoints [21], whereas in our study we analyzed a larger region spanning all the breakpoints of our cohort.
According to the literature, a LINE1 insertion can affect the genome, the epigenome, and the whole transcriptome of cells. Studies revealed that LINE1s increase gene mobilization, leading to the formation of chimeric genes that generate new transcripts and could give rise to chimeric proteins. An in silico study found 988 genes with LINE1 insertions that could generate chimeric transcripts, of which twenty have been associated with cancer [22].
The involvement of LINE1s in both gene deletion and chromosomal translocations was demonstrated by Rodriguez-Martinez and colleagues in esophageal adenocarcinoma and in head-and-neck and colorectal cancers. The authors reported that aberrant L1 integrations can delete large regions of chromosomes, leading to the removal of tumor-suppressor genes and inducing complex translocations and large-scale duplications [23].
Repetitive elements are known to be heavily methylated in normal somatic tissues, but they are methylated to a lesser extent in malignant tissues, driving global genomic hypomethylation [24]. The cancer genome is frequently characterized by promoter hypomethylation of specific genes with an overall decrease in the level of 5-methylcytosine. This hypomethylation, which affects repeat sequences and transposable elements, results in chromosomal instability and mutation events [25]. Previous studies demonstrated that hypomethylation of LINE1 promoters and other transposable elements appears at early stages of CML development. Roman-Gomez et al. hypothesized that destabilization of repetitive sequences (i.e., L1 hypomethylation) could be one of the mechanisms employed by BCR-ABL to generate genomic instability in the malignant cell, suggesting that repetitive DNA hypomethylation is closely associated with CML progression [26]. In addition, hypomethylation is associated with open chromatin compartments, which are easily accessible. Engreitz et al. reported that translocations whose partners lie in open chromatin regions are significantly more proximal than translocations with one or both partners in closed compartments. By means of Hi-C experiments using a karyotypically normal lymphoblastoid cell line (GM06990), they demonstrated a significant contact frequency between the BCR and ABL loci [27]. Considering all these data, we could speculate that the higher density of hypomethylated L1M in these regions might be responsible for their nuclear proximity and open chromatin structure, thus allowing more frequent chromosomal rearrangements.

Conclusions
Our data highlighted the high density of L1M in the BCR and ABL1 gene regions, suggesting that L1M elements could be potential drivers of the t(9;22) translocation leading to the generation of the BCR-ABL1 gene and the consequent expression of the constitutively active BCR-ABL1 tyrosine kinase chimeric protein. Although it is not possible to confirm that L1M elements are the main actors in mediating the t(9;22) rearrangement, we can argue that they play a synergistic role with the Alu elements, which can trigger the translocation rearrangement in association with the tendency of the BCR and ABL loci to be spatially proximal [28].
Supplementary Materials: The following supporting information can be downloaded at https://www.mdpi.com/article/10.3390/genes14071351/s1: Figure S1: L1M occurrence in random intervals; Table S1: List of breakpoints and associated features.

We analyzed the genomic composition and structure of the BP-enriched M-bcr (chr22:23287159-23296156, hg38-based) and ABL1 (chr9:130701881-130855096, hg38-based) regions. In detail, we screened these regions for the presence of interspersed repeats and low complexity DNA sequences by intersecting them with the RepeatMasker track from the UCSC Genome Browser (http://genome-euro.ucsc.edu/cgi-bin/hgTables). In order to evaluate a potential enrichment of the repetitive elements enclosed in the regions of BP clustering, we generated random intervals (RIs) of the same size as the M-bcr (~9 kbs) and ABL1 (~154 kbs) regions, using bedtools' random (v2.29.0) [34]. Specifically, we intersected a total of 170,000 RIs of 9 kbs and 10,000 RIs of 154 kbs, each set covering almost half of the human genome, with the RepeatMasker track coordinates. We considered as enriched the repetitive elements in the intervals of interest with a fold change (FC) greater than 2. FC was calculated as the ratio between the copy number of the element in the region of interest and the mean copy number of the same element in the RIs. Assuming that the occurrence of the repetitive elements in RIs follows a Poisson distribution, we calculated the mean occurrence 'λ' of each repetitive element per interval in the RIs and then the probability of having 'x' occurrences (corresponding to the repeat element value of interest) in the M-bcr and ABL1 regions using the 'ppois' R function. We considered as significant the elements in the M-bcr and ABL1 enriched regions with a p-value < 0.05.
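The Poisson step can be illustrated with a short Python sketch that mirrors the cumulative probability returned by R's ppois. It assumes that the enrichment p-value is the upper-tail probability P(X ≥ x); the text does not specify which tail was used, so this choice, the function names, and the example values are assumptions made for illustration.

```python
import math

def poisson_cdf(q, lam):
    """P(X <= q) for X ~ Poisson(lam); analogous to ppois(q, lam) in R."""
    if q < 0:
        return 0.0
    term = math.exp(-lam)      # P(X = 0)
    total = term
    for k in range(1, q + 1):
        term *= lam / k        # P(X = k) from P(X = k - 1)
        total += term
    return min(total, 1.0)

def enrichment_p_value(x, lam):
    """Upper-tail probability P(X >= x), assumed here as the enrichment test."""
    return 1.0 - poisson_cdf(x - 1, lam)

# Illustrative values only: 3 copies observed in the region of interest,
# mean occurrence of 0.5 copies per random interval.
p = enrichment_p_value(3, 0.5)
print(round(p, 4), p < 0.05)
```

Computing the tail as one minus the cumulative probability loses precision for very small p-values; a production analysis would rely on a statistics library's survival function instead.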