Translocation Breakpoints Preferentially Occur in Euchromatin and Acrocentric Chromosomes

Chromosomal translocations drive the development of many hematological and some solid cancers. Several factors have been identified to explain the non-random occurrence of translocation breakpoints in the genome. These include chromatin density, gene density and CCCTC-binding factor (CTCF)/cohesin binding site density. However, such factors are at least partially interdependent. Using 13,844 and 1563 karyotypes from human blood and solid cancers, respectively, our multiple regression analysis only identified chromatin density as the primary statistically significant predictor. Specifically, translocation breakpoints preferentially occur in open chromatin. Also, blood and solid tumors show markedly distinct translocation signatures. Strikingly, translocation breakpoints occur significantly more frequently in acrocentric chromosomes than in non-acrocentric chromosomes. Thus, translocations are probably often generated around nucleoli in the inner nucleoplasm, away from the nuclear envelope. Importantly, our findings remain true both in multivariate analyses and after removal of highly recurrent translocations. Finally, we applied pairwise probabilistic co-occurrence modeling. In addition to well-known highly prevalent translocations, such as those resulting in BCR-ABL1 (BCR-ABL) and RUNX1-RUNX1T1 (AML1-ETO) fusion genes, we identified significantly underrepresented translocations with putative fusion genes, which are probably subject to strong negative selection during tumor evolution. Taken together, our findings provide novel insights into the generation and selection of translocations during cancer development.


Introduction
Chromosome instability (CIN) is a hallmark of cancer [1,2]. It refers to an increased gain of chromosomal abnormalities. CIN typically predicts poor cancer patient survival and increased drug-resistance [2][3][4]. Numerical CIN (n-CIN) comprises the gain or loss of whole chromosomes and leads to aneuploidy. N-CIN is common in solid tumors [5] and is often caused by aberrant expression of cell cycle, centrosome or centromere proteins, which in turn leads to centrosome amplification or mitotic aberrations followed by chromosome missegregation [6][7][8][9][10]. Structural CIN (s-CIN) refers to the gain, loss or rearrangement of fractions of chromosomes. Copy number changes can be focal or involve large chromosomal segments, such as entire chromosome arms. Translocations are a form of chromosomal rearrangements, which are particularly common in leukemias, lymphomas and some

Translocation Breakpoints Preferentially Occur in Longer Cytogenetic Chromosome Bands
Using quality-filtered karyotypes from 13,844 human blood cancers and 1563 human solid tumors [35], we considered all chromosomal breakpoints that resulted in a translocation and analyzed the rate at which translocation breakpoints occur in each cytogenetic chromosome band. We first tested whether there is a correlation between translocation frequency in a cytogenetic band and that band's physical length. Not surprisingly, this revealed that breakpoints preferentially occur in longer cytogenetic bands in both blood and solid tumors (r = 0.259, p < 0.0001 and r = 0.229, p < 0.0001, respectively, Spearman's correlation tests; Figure 1a).
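As an illustration of the rank correlation used here, the following minimal sketch computes Spearman's r as the Pearson correlation of average ranks. The band lengths and breakpoint counts are hypothetical values chosen for the example, not data from the study, and the function names are our own.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def spearman_r(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical band lengths (Mb) and breakpoint counts, NOT data from the study.
band_length_mb = [2.3, 5.1, 7.8, 3.0, 12.4, 9.9]
breakpoints = [4, 10, 9, 6, 30, 12]
r = spearman_r(band_length_mb, breakpoints)
print(round(r, 3))
```

Because only ranks enter the calculation, a single band with an extremely high breakpoint count influences r less than it would influence a Pearson correlation on the raw values.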

Figure 1. (a) Scatter plots of cytogenetic band length against the translocation frequencies within these bands in blood and solid tumors. Shown are analyses including all data, as well as analyses with data from which statistically identified outliers were removed (see main text). The latter was done to rule out the possibility that statistical significance was reached solely due to one or several highly frequent events, which could skew the analyses. p and r values: Spearman correlations. D.p., data points. (b) Scatter plots as in (a) but at chromosome arm level.
As some specific translocations may occur at extremely high frequencies, they could bias the analyses and heavily skew the outcome. To address this possibility, we used ROUT [36], a method that combines robust regression and outlier removal, to identify outliers. After removal of the identified outliers (at a false-discovery rate (FDR) of q = 0.01), the correlations remained highly statistically significant (p < 0.01 and p < 0.001, respectively; Figure 1a).
Similarly, we observed strong correlations between chromosome arm length and the translocation frequencies in both blood and solid cancers, irrespective of whether identified outliers were removed (all p-values < 0.01) (Figure 1b). Together, these data indicate that there is a strong positive association between the translocation breakpoint frequency and the length of the cytogenetic band or chromosome arm in which the breakpoint occurs.

Translocation Breakpoints Preferentially Occur in Open Chromatin
Next, we assessed whether translocations are more likely to occur in open chromatin (euchromatin) or densely packed, "closed" chromatin (heterochromatin). To investigate this, we assigned an average chromatin density (ACD) score to each cytogenetic band. On a scale of 0 to 100, the ACD score indicates how loosely or tightly the chromatin is packed. Cytogenetic bands that were entirely open (euchromatic) received a score of 0, while bands whose chromatin was completely closed (heterochromatic) received a score of 100. For each cytogenetic band, the ACD score was calculated (see Methods). We independently plotted the translocation frequency of chromosome bands against their ACD scores. This showed that lower ACD scores are statistically significantly associated with higher frequencies of translocation breakpoints (r = −0.434, p < 0.0001 in blood tumors; r = −0.426, p < 0.0001 in solid tumors, Spearman's correlation tests; Figure 2a). These correlations also remained statistically significant after removal of outliers identified by the ROUT method (at FDR q = 0.01; r = −0.428, p < 0.0001 in blood tumors; r = −0.451, p < 0.0001 in solid tumors; Figure 2a). Thus, these analyses indicate that translocation breakpoints are more likely to occur in loosely packed chromatin than in dense chromatin.
Importantly, our analyses above indicated that breakpoints more frequently occur in longer cytogenetic bands (Figure 1a). To account for this, we normalized breakpoint frequencies to the length of the cytogenetic bands by dividing the breakpoint frequency of each band by its proportion of the whole genome length. We then plotted length-adjusted breakpoint frequencies of each band against their ACD scores for the blood or solid tumor cohorts. The relationships between length-adjusted breakpoint frequencies and ACD scores were stronger than for non-length-adjusted frequencies for both the blood and solid cancer cohorts (r = −0.572, p < 0.0001 and r = −0.537, p < 0.0001, respectively, Spearman's correlation tests; Figure 2b). In addition, these remained highly significant after exclusion of outliers (r = −0.525, p < 0.0001 and r = −0.507, p < 0.0001, respectively; Figure 2b). Taken together, we conclude that translocation breakpoints preferentially occur in more open, euchromatic chromatin.
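The length adjustment described above can be sketched as follows. This is a minimal illustration under stated assumptions: the band names, frequencies and lengths are hypothetical, and the "whole genome length" is taken to be the sum of the band lengths supplied to the function.

```python
def length_adjusted_frequencies(breakpoint_freq, band_length):
    """Divide each band's breakpoint frequency by its share of total length.

    breakpoint_freq: dict mapping band name -> breakpoint frequency
    band_length:     dict mapping band name -> physical length (any unit)
    Assumption: the genome length is the sum of all supplied band lengths.
    """
    total_length = sum(band_length.values())
    adjusted = {}
    for band, freq in breakpoint_freq.items():
        genome_fraction = band_length[band] / total_length
        adjusted[band] = freq / genome_fraction
    return adjusted


# Hypothetical bands, NOT data from the study.
freq = {"bandA": 0.2, "bandB": 0.3, "bandC": 0.5}
length_mb = {"bandA": 10, "bandB": 30, "bandC": 60}
adjusted = length_adjusted_frequencies(freq, length_mb)
for band, value in adjusted.items():
    print(band, round(value, 3))
```

A value above 1 means the band carries more breakpoints than its share of the genome would predict; a value below 1 means it carries fewer. In this toy example the shortest band has the highest adjusted value even though its raw frequency is the lowest.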

Translocation Breakpoints Preferentially Occur in Regions Rich in CTCF/Cohesin Binding Sites
We next asked whether translocation breakpoints might be more common in regions in which gene regulation occurs. As a surrogate for this, for each cytogenetic band, we calculated the density of the DNA binding sites for CTCF/cohesin, a DNA-binding protein complex that regulates transcription [28]. We found a significant correlation between breakpoint frequency and CTCF/cohesin binding site density for both blood and solid tumors (r = 0.523, p < 0.0001 and r = 0.379, p < 0.0001, respectively, Spearman's correlation tests; Figure 3a). Removal of outliers only marginally affected the strength of the correlations between these two parameters (r = 0.438, p < 0.0001 and r = 0.361, p < 0.0001, respectively, Spearman's correlation tests; Figure 3a).
Additionally, we independently plotted the length-adjusted breakpoint frequencies against the CTCF/cohesin binding site densities of the cytogenetic bands. These analyses indicate that translocation breakpoints are more likely to occur in regions rich in CTCF/cohesin binding sites (r = 0.408, p < 0.0001 for blood tumors; r = 0.268, p < 0.0001 for solid tumors; Figure 3b). Similar to previous analyses, the strength of the correlations was only slightly affected after removal of outliers (r = 0.366, p < 0.0001 and r = 0.325, p < 0.0001, respectively, Spearman's correlation tests; Figure 3b). Thus, translocation breakpoints preferentially occur in regions of gene regulation.

Translocation Breakpoints Preferentially Occur in Gene-Rich Regions
We next asked whether there is a relationship between the translocation rate and gene density. Direct comparison of these parameters showed that they are positively correlated in both blood tumors and solid tumors (r = 0.514, p < 0.0001 and r = 0.388, p < 0.0001, respectively, Spearman's correlation tests; Figure 4a), and removal of outliers only slightly affected the strength of these correlations (r = 0.429, p < 0.0001 and r = 0.386, p < 0.0001, respectively, Spearman's correlation tests; Figure 4a). Similarly, a significant association was observed after length-adjustment (r = 0.408, p < 0.0001 in blood tumors; r = 0.277, p < 0.0001 in solid tumors, Spearman's correlation tests; Figure 4b), and this remained significant after outliers were removed (r = 0.362, p < 0.0001 and r = 0.320, p < 0.0001, respectively, Spearman's correlation tests; Figure 4b). This indicates that translocation breakpoints preferentially occur in gene-rich regions.

Chromatin Density is the Primary Predictor for Translocation Breakpoints
Above, we found that translocation breakpoints preferentially occur in chromosomal regions that are longer and harbor more open chromatin, more CTCF/cohesin binding sites and more genes. Importantly, these factors are often associated with each other [32]. Multiple (linear) regression analysis is often applied to test the individual contributions of multiple, potentially dependent, factors [37]. Thus, we performed multiple regression analysis to investigate which parameters are most significantly associated with translocation breakpoints. In human blood tumors, the length of cytogenetic bands, chromatin density and CTCF/cohesin binding site density contributed significantly to the multiple regression model (all p < 0.01; Table 1), whereas gene density did not. We also performed the multiple regression analysis on data excluding highly recurrent outlier translocations. Interestingly, CTCF/cohesin binding site density then no longer contributed significantly. This suggests that the likelihood of translocation breakpoints depends more strongly on cytogenetic band length and chromatin density than on CTCF/cohesin binding site density or gene density (Table 1).
We next analyzed the parameters in blood cancers using a multiple regression model with length-adjusted translocation frequencies. Chromatin density and CTCF/cohesin binding site density contributed significantly to the model (p = 8.5 × 10⁻⁸ and p = 0.032, respectively). However, CTCF/cohesin binding site density no longer did so after we excluded outliers (Table 1).
For solid cancers, we performed multiple regression analyses in the same way. This yielded similar results. In non-length-adjusted analyses, length and chromatin density strongly contributed to the model irrespective of whether outliers were removed (all p < 0.0001; Table 1). However, after length-adjustment, only chromatin density significantly contributed to the model (p = 1 × 10⁻⁹; Table 1).
Taken together, we identify chromatin density as the primary predictor for translocation breakpoints in both blood and solid tumors. Our data indicate that translocation breakpoints preferentially occur in loosely packed chromatin.
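To illustrate why multiple regression can separate interdependent predictors, the following minimal sketch fits an ordinary least-squares model by solving the normal equations. It is not the analysis pipeline used for Table 1: the data are hypothetical, the variable names are our own, and only coefficients are computed (the per-predictor p-values reported in Table 1 would additionally require standard errors and a t-distribution).

```python
def solve_linear_system(a, b):
    """Solve a * x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def ols_coefficients(predictors, response):
    """Least-squares fit of response ~ intercept + predictors.

    Returns [intercept, coefficient_1, coefficient_2, ...].
    """
    rows = [[1.0] + list(p) for p in predictors]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, response)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Hypothetical per-band values, NOT data from the study. gene_density tracks
# openness closely, but the response is constructed from openness alone.
openness = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
gene_density = [1.5, 1.5, 3.5, 3.5, 5.5, 5.5]
frequency = [2.0 + 3.0 * x for x in openness]

coef = ols_coefficients(list(zip(openness, gene_density)), frequency)
print([round(c, 6) for c in coef])
```

In this constructed example, gene density correlates with the response on its own, yet its fitted coefficient is essentially zero once openness is in the model. This is the kind of situation in which a predictor that is significant in a pairwise test no longer contributes in a multivariate one.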

Translocation Breakpoints Preferentially Occur in Acrocentric Chromosome Arms
We next determined the translocation frequencies for each chromosome arm, irrespective of its translocation partner, to identify specific arms that are recurrently involved in translocations. In blood cancers, seven chromosome arms are involved in translocations at significantly increased frequencies compared to the frequencies of all other arms (ROUT test at FDR q = 0.01) (Figure 5a). Above, we found that translocations preferentially occur in longer cytogenetic bands, or longer chromosome arms, and open chromatin (Table 1). Following adjustment for chromosome arm length or arm length and chromatin density, this finding remained largely unchanged (Figure 5a). However, removal of outlier translocations had a considerable impact, leaving only translocations in chromosome arm 21q as significantly recurrent (ROUT test at q = 0.01) (Figure 5a). Similar analyses for solid cancers identified translocations in two to four arms as significantly recurrent, yet none of these remained significant following removal of highly frequent outlier translocations (ROUT test at q = 0.01) (Figure 5b).
Strikingly, the data in Figure 5a suggested that acrocentric chromosome arms are preferentially involved in translocations, even though they are typically shorter. Indeed, the average translocation frequency in acrocentric chromosome arms was significantly higher than the average for non-acrocentric chromosome arms (p = 0.0027; Mann-Whitney U test) (Figure 5c), and this difference remained highly significant after adjusting for arm length or arm length and chromatin density (p = 0.0081 and p = 0.0061, respectively) (Figure 5c). Thus, we conclude that translocations preferentially occur in acrocentric chromosome arms.

Translocation Breakpoints Preferentially Occur in Acrocentric Chromosomes
We wondered whether chromosome translocations preferentially occur in metacentric, submetacentric and/or acrocentric whole chromosomes. To assess this, we calculated the expected percentages at which each of these types of chromosomes would be involved in translocations, taking into account the fraction of the cumulative length of each chromosome type within the whole genome (see Methods). Next, we compared these expected frequencies to our observed rates. We found that acrocentric chromosomes are involved in translocations nearly twice as often as expected (p < 0.0001; binomial test), while metacentric and submetacentric chromosomes are significantly less often involved in translocations (p < 0.0001; Figure 5d). Blood cancers predominantly contribute to this phenomenon, as they show a 2.2-fold higher than expected involvement of acrocentric chromosomes (p < 0.0001; Figure 5d). In solid cancers, acrocentric chromosomes are also more often involved in translocations than expected, but this increase is not statistically significant (p = 0.2414; Figure 5d).
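The comparison of observed and expected involvement can be sketched as follows. The expected fraction for each chromosome type is its share of the cumulative genome length, and the observed count is tested against it with an exact binomial test. The lengths and counts below are hypothetical, not values from the study, and the two-sided form of the test is an assumption, as the text does not state which variant was used.

```python
from math import exp, lgamma, log


def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p), in log space (requires 0 < p < 1)."""
    log_coef = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return exp(log_coef + k * log(p) + (n - k) * log(1 - p))


def binomial_test_two_sided(k, n, p):
    """Exact two-sided p-value: total probability of all outcomes that are
    no more likely than the observed count k."""
    pmf = [binomial_pmf(i, n, p) for i in range(n + 1)]
    threshold = pmf[k] * (1 + 1e-7)  # small tolerance for floating-point ties
    return min(1.0, sum(q for q in pmf if q <= threshold))


def expected_fractions(length_by_type):
    """Share of the cumulative genome length held by each chromosome type."""
    total = sum(length_by_type.values())
    return {t: length / total for t, length in length_by_type.items()}


# Hypothetical cumulative lengths (Mb) and breakpoint counts, NOT study values.
length_mb = {"metacentric": 1200, "submetacentric": 1400, "acrocentric": 500}
observed = {"metacentric": 70, "submetacentric": 70, "acrocentric": 60}

expected = expected_fractions(length_mb)
n_total = sum(observed.values())
fold = (observed["acrocentric"] / n_total) / expected["acrocentric"]
p_value = binomial_test_two_sided(observed["acrocentric"], n_total,
                                  expected["acrocentric"])
print(f"fold over expected: {fold:.2f}, p = {p_value:.2e}")
```

The probabilities are computed in log space so that the sketch stays numerically stable for cohorts with many thousands of breakpoints.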
It is possible that these observations are skewed by one or several translocations that occur at very high frequencies and hence represent outliers. However, after removal of outliers (using the ROUT test at FDR q = 0.01), acrocentric chromosomes still showed a significantly higher than expected involvement in translocations (p < 0.001) (Figure 5e). In fact, in solid cancers acrocentric chromosomes were now also involved in translocations significantly more often than expected (p = 0.0394), indicating that outlier translocations, in particular those involving metacentric chromosomes, introduced a bias that masked the preferential involvement of acrocentric chromosomes in these cancers (Figure 5d,e). Thus, we conclude that translocation breakpoints preferentially occur in acrocentric chromosomes in both hematological and solid cancers.
We compared this observation to translocations included in the "Atlas of Genetics and Cytogenetics in Oncology and Haematology" [38] (Table S1). However, this resource only lists unique translocations that have been reported in the literature. It notably does not include translocation frequencies. Hence, this precluded direct comparison of our observations to those in this Atlas (Table S1).
Above, we found that translocations preferentially occur in open chromatin and our multiple regression analysis indicated that chromatin density is the primary predictor for the occurrence of translocation breakpoints. Thus, if acrocentric chromosomes had more open chromatin (that is, a lower chromatin density score), this could explain why they are more often involved in translocations. To test this, we compared the chromatin density scores of acrocentric chromosomes to those of non-acrocentric chromosomes. This indicated that acrocentric chromosomes are in fact significantly more chromatin-dense than non-acrocentric chromosomes (p = 0.0418; t-test) (Figure 5f). Even if this difference had not reached statistical significance, chromatin density could skew analyses on a per-chromosome basis. However, the current observation means that our data in Figure 5d,e are an underestimation and that acrocentric chromosomes are preferentially involved in translocations despite the fact that they are more chromatin-dense.
To account for the more heterochromatic state of acrocentric chromosomes, we adjusted the expected rates at which the chromosome types are involved in translocations to their respective chromatin densities. This showed that in both blood and solid cancers, acrocentric chromosomes are preferentially involved in translocations (all p values < 0.001) (Figure 5g), irrespective of whether outliers are removed (all p values < 0.001) (Figure 5h). Our data also strongly suggest that this occurs mostly at the expense of translocations involving metacentric chromosomes (Figure 5d,e,g,h).

Identification of Significantly Recurrent and Underrepresented Translocations
A considerable number of highly recurrent translocations, including the respective fusion genes, have previously been identified [15,38]. In addition to these, we aimed to identify less common translocations, as well as significantly underrepresented translocations, as the latter could reveal strong negative selection. To do so, we used a previously described probabilistic model developed to identify statistically significant pair-wise patterns of species co-occurrence [39]. Each translocation can be considered a co-occurring pair of chromosome arms or chromosome bands. Accordingly, we performed a co-occurrence analysis for translocations in blood and solid tumors.
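A pairwise probabilistic co-occurrence model of this kind can be sketched with a hypergeometric calculation. Given N units, of which N1 contain item 1 and N2 contain item 2, the probability that exactly j units contain both is C(N1, j) · C(N − N1, N2 − j) / C(N, N2) if the two items are distributed independently. The mapping used below is an assumption made for illustration only: each unit is taken to be one recorded event and each item a chromosome band. The counts are hypothetical, and the choice to include the observed value in both tail probabilities is ours.

```python
from math import comb


def cooccurrence_probabilities(n_units, n1, n2, observed):
    """Expected co-occurrence count and tail probabilities under independence.

    n_units:  total number of units (assumed here: recorded events)
    n1, n2:   number of units containing item 1 and item 2, respectively
    observed: number of units containing both items
    Returns (expected, p_at_most_observed, p_at_least_observed).
    """
    total = comb(n_units, n2)

    def p_exact(j):
        return comb(n1, j) * comb(n_units - n1, n2 - j) / total

    lowest = max(0, n1 + n2 - n_units)   # smallest possible overlap
    highest = min(n1, n2)                # largest possible overlap
    p_at_most = sum(p_exact(j) for j in range(lowest, observed + 1))
    p_at_least = sum(p_exact(j) for j in range(observed, highest + 1))
    expected = n1 * n2 / n_units
    return expected, p_at_most, p_at_least


# Hypothetical counts, NOT data from the study: two bands that are each often
# involved in events but are rarely found together.
expected, p_under, p_over = cooccurrence_probabilities(
    n_units=1000, n1=200, n2=150, observed=5
)
print(f"expected {expected:.1f}, P(at most 5) = {p_under:.2e}")
```

A small probability of seeing at most the observed count flags a pair as underrepresented ("negative"), while a small probability of seeing at least the observed count flags it as overrepresented ("positive").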
We built matrices for the co-occurrences/translocations. Statistically significant pairs were identified based on p-values smaller than 0.05 (Veech's probabilistic model [39]). At the chromosome arm level, there were 299 significant pairs in blood tumors, compared to 133 pairs in solid tumors, whereas at the cytogenetic chromosome band level, we identified 298 significant pairs for blood tumors and 25 significant over- or underrepresented pairs for solid tumors (Figure 6a, Supplementary Figure S1, Tables S2-S5). Next, to better visualize the co-occurrences, we generated networks of the most frequent co-occurrences (Figure 6b, Supplementary Figure S1), as well as volcano plots, which included all translocations (Figure 6c, Supplementary Figure S2). This led to a number of observations.
Figure 6 (partial caption). Co-occurrence matrix of translocations, analyzed with the probabilistic model [39]. Translocations observed statistically significantly (p < 0.05) more often than expected ("positive"), taking into account the frequencies at which each individual cytogenetic chromosome band is involved in translocations, are shown in blue. Translocations occurring at significantly lower than expected frequencies ("negative") are shown in orange. Non-statistically significant pairs ("random") are shown in grey. Only part of the matrix is shown. The full matrix is shown in Supplementary Figure S1a.
First, consistent with previous findings, these analyses indicate that chromosomal translocations are much more prevalent in blood tumors than in solid cancers (Figure 6, Supplementary Figure S1). Second, not surprisingly, t(9;22)(q34;q11) and t(8;21)(q22;q22), which correspond to the Philadelphia chromosome/BCR-ABL1 and RUNX1-RUNX1T1/AML1-ETO fusion genes [12,15], were the most frequent translocations in blood cancers, accounting for 18.8% and 7.4% of all translocations, respectively (Figure 6d, Supplementary Figure S1, Table S2).
Strikingly, however, the highly prevalent occurrence of translocation breakpoints at these and several other locations predicted fusion genes such as "IGH-BCR", "BCR-RUNX1", "RUNX1-ABL1", "IGH-ABL1" and "BCR-RUNX1T1" at frequencies up to 5.2%, which would have represented the third most common translocation in blood cancers (Figure 6d, Supplementary Table S2). However, these were identified at significantly lower than expected frequencies, ranging from only 0.04% to 0.5% (all p < 0.00001) (Figure 6d, Table S2). This strongly suggests that such translocations do not provide a survival advantage for hematological cancer cells or that there is strong negative selection against them.
Finally, for solid cancers, the numbers of significant positive and negative correlation pairs were about equally distributed. However, we observed a considerably lower number of significant positive than significant negative correlation pairs in blood cancers (p < 0.0001, Fisher's exact test; Figure 6a,c,e, Supplementary Figure S1, Tables S2 and S3). Yet, a small number of the most frequent translocations in blood cancers showed the strongest significance. Notably, the top four most frequent translocations represented more than a third of all translocations and the top ten constituted half of all translocations (Supplementary Table S2). This strongly suggests that a few specific translocations confer considerable malignant advantages on blood cancer cells.

Discussion
Chromosomal translocations have been shown to promote tumorigenesis in many types of cancer, including leukemia [12], lymphoma [13], sarcoma [14], breast carcinoma [40] and lung carcinoma [15,41]. A variety of mechanisms have been proposed to underlie chromosomal translocations, involving both the generation of DNA double-strand breaks (DSBs) and the fusion of breakpoint sites on heterologous chromosomes [42,43]. These relate to V(D)J recombination, gene expression and chromatin density. However, where in the genome the translocation breakpoints are most likely to occur remains incompletely understood. This may in part be due to interdependencies of proposed factors (see also below). Here, using 1563 karyotypes of solid tumors and 13,844 karyotypes of blood tumors, we assessed the associations of several parameters with the frequency of translocation breakpoints in blood and solid tumors.
We find that translocations more often occur in longer cytogenetic chromosome bands. While this might have a biological cause, we believe that this simply reflects an increased mathematical probability. Hence, we also performed our analyses on chromosome band length-adjusted translocation frequencies.
Transcription has been linked to genome instability. It may alter the DNA sequence or promote chromosomal rearrangement [44]. Using genome-wide translocation sequencing to analyze DSBs as translocation hotspots, two comprehensive studies found that translocations are strongly associated with transcription start sites in the genome [45,46]. These observations are consistent with our finding that translocation breakpoints occur more frequently in regions enriched in genes and CTCF/cohesin binding sites, the latter of which are important for both transcription and enhancer-promoter interactions [28,32].
Chromatin density has also previously been linked to translocation frequency [42,47]. DSBs are the first requirement for the generation of translocations, and open chromatin is thought to be more susceptible to DNA DSBs than heterochromatin, as the latter is protected by proteins that mediate higher order chromatin condensation. Consistent with this thesis, we also find that translocation breakpoints preferentially occur in open chromatin.
Adding to the complexity of identifying which factors promote translocations, a number of parameters are often associated with each other. For example, chromatin density, gene density and CTCF/cohesin binding site density are linked. After all, CTCF/cohesin affects transcriptional activity and chromatin density [32], open chromatin is required for transcription and transcription occurs where genes are located. Importantly, however, multiple regression analysis enabled us to take such interdependencies into account. This indicated that chromatin density is a more significant predictor for translocation breakpoints than CTCF/cohesin binding site density or gene density.
We observed vastly distinct chromosomal translocation signatures in blood and solid tumors. This may be attributed to the profound differences between hematological cell types and those of mesenchymal or epithelial origin, for example in chromosomal organization or dynamics [48]. Some studies showed that the spatial proximity of heterologous loci undergoing DSBs promotes ligation, and hence translocation, between them [48][49][50]. This phenomenon may partly explain why some translocations occur at extremely high frequencies.
More broadly, the forms of genomic instability that drive blood and solid tumorigenesis are also markedly different. Hematological cancer development is typically facilitated by the expression of fusion genes as a result of translocations [11,15]. In contrast, solid tumorigenesis is more often promoted by common aberrations in tumor suppressor pathways, which in turn lead to whole-chromosome instability or forms of structural chromosome instability that may or may not include translocations [1,5,6,51].
We identified acrocentrism as a novel chromosomal attribute that predisposes to translocations. In blood tumors, nearly a third of all translocations involve acrocentric chromosomes. After removal of highly prevalent translocations, this observation remained highly statistically significant. In solid cancers, acrocentric chromosomes were also involved in translocations more frequently than expected. However, this increase was statistically significant only in multivariate analyses or after removal of outliers.
Our observations provide insights into where translocations may be generated subcellularly. Within the nucleus, chromosomes are organized in territories [52,53]. Also, the short arms of acrocentric chromosomes harbor ribosomal DNA, which is organized in nucleolar organiser regions (NORs) [54]. These NORs, and hence the short arms of acrocentric chromosomes, consistently localize to nucleoli, which are located in the inner nuclear space, rather than at the nuclear periphery. Accordingly, acrocentric chromosomes localize to the core of the nucleoplasm, away from the nuclear lamina, where larger chromosomes in particular are located [53]. Thus, our finding that translocations preferentially occur in acrocentric chromosomes suggests that cancerous chromosomal translocations are often generated perinucleolarly, in the inner nuclear space, away from the nuclear lamina.
Interestingly, translocation breakpoints in the germline often overlap with those in tumors [55]. Consistent with this, germline translocations also frequently involve acrocentric chromosomes. In fact, almost all translocations in the germline are Robertsonian translocations, involving two acrocentric chromosomes. For example, the Robertsonian translocation between chromosomes 14 and 21 is sometimes detected in the germline [56]. Offspring of such translocation carriers may be trisomic for chromosome 21 and affected by Down syndrome.
Our pairwise probabilistic co-occurrence modeling identified highly significant translocations involving chromosome bands 9q34 and 22q11, 8q24 and 14q32, as well as 15q22 and 17q21. It is well documented that the formation of fusion oncogenes from those translocations, involving ABL1 and BCR [57], MYC [58], and PML and RARA [59], causes blood tumorigenesis [11,15]. Yet, our identification of significantly underrepresented translocations and putative fusion genes is novel and suggests that these are probably strongly selected against during tumor evolution.
Lastly, the generation of "fusion mRNAs" has also been proposed as a potential tumorigenic mechanism [60]. Such fusion mRNAs are generated from early-terminated transcripts, rather than from rearranged genomic loci. This suggests that malignant fusion transcripts may be more common in cancer cells than expected based on translocation frequencies alone.
Taken together, we find that cancerous translocations preferentially occur in euchromatin and acrocentric chromosomes. Probabilistic co-occurrence modeling identified well-known recurrent translocations, as well as markedly underrepresented translocations, which either do not provide proliferative advantages, or against which strong negative selection occurs during tumor progression. Thus, our findings generated novel insights into the mechanisms and selection of translocations during tumorigenesis.

Karyotype Selection and Translocation Frequencies
Karyotypes from human tumors were collected from the Mitelman Database of Chromosome Aberrations in Cancer [61]. Biases in the Mitelman database karyotypes were previously reported [35]. Hence, our analyses only included quality-checked karyotypes, as described by Ozery-Flato and colleagues [35]. A total of 1563 karyotypes from solid tumors and 13,844 karyotypes from blood cancers were analyzed (also available from the "STACK" database at http://acgt.cs.tau.ac.il/stack [35]). Translocation frequencies within each cytogenetic chromosome band, chromosome arm or whole chromosome were determined using these data.

Definitions of Parameters
Physical lengths of cytogenetic chromosome bands, chromosome arms or whole chromosomes, as well as the numbers of genes and CTCF/cohesin DNA binding sites in each of these, were obtained from the Human Genome Browser (https://genome.ucsc.edu), University of California Santa Cruz (Santa Cruz, CA, USA). Gene density and CTCF/cohesin binding site density were calculated by dividing the number of genes or number of CTCF/cohesin binding sites in each cytogenetic band by the physical length of the cytogenetic band. The average chromatin density (ACD) score was calculated using the intensities of Giemsa staining of each chromosome band, as depicted in shades of grey in the conventional ideogram, ranging from white (euchromatin, score 0) to black (heterochromatin, score 100). Each chromosome sub-band received a discrete score of 0, 25, 50, 75, or 100. The ACD score of each chromosome band was calculated as the weighted average of the discrete scores of its sub-bands, with each sub-band weighted by the proportion of the cytogenetic band's length that it occupies. The cytogenetic type of each chromosome (i.e., acrocentric, sub-metacentric or metacentric) was determined by the position of the centromere within the chromosome.
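The density and ACD definitions above can be illustrated with a minimal Python sketch. The names `SubBand`, `acd_score` and `density`, the per-megabase scaling and the example band are hypothetical and serve only as an illustration; they are not taken from the original analysis.

```python
from dataclasses import dataclass


@dataclass
class SubBand:
    length_bp: int    # physical length of the sub-band in base pairs
    stain_score: int  # discrete Giemsa score: 0, 25, 50, 75 or 100


def acd_score(sub_bands):
    """Length-weighted average of the discrete sub-band scores."""
    total_length = sum(sb.length_bp for sb in sub_bands)
    if total_length == 0:
        raise ValueError("band has zero total length")
    weighted = sum(sb.length_bp * sb.stain_score for sb in sub_bands)
    return weighted / total_length


def density(count, length_bp, per_bp=1_000_000):
    """Number of features (genes or binding sites) per `per_bp` base pairs.

    Scaling to one megabase is an assumption made for readability.
    """
    return count * per_bp / length_bp


# Hypothetical band made of three sub-bands (values invented for illustration)
example_band = [
    SubBand(2_000_000, 0),    # white, euchromatin
    SubBand(1_000_000, 50),
    SubBand(1_000_000, 100),  # black, heterochromatin
]
example_acd = acd_score(example_band)                # 37.5
example_gene_density = density(40, 4_000_000)        # 10 genes per Mb
```

A lower ACD score corresponds to more open chromatin, so in this example the band is on average closer to euchromatin than to heterochromatin.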

Data Distributions, Outliers and Linear Regression
Statistical assessment of translocation frequencies, cytogenetic band lengths, ACD scores, CTCF/cohesin binding site densities and gene densities was performed as described [62,63]. D'Agostino-Pearson normality tests showed that none of these parameters were normally distributed (all p < 0.0001). The non-Gaussian distribution of the translocation frequencies, as well as the observation of several extremely high translocation frequencies within the dataset, prompted us to identify outliers. Where indicated, data were re-analyzed after outlier removal to ensure that conclusions were not reached mostly or exclusively due to the strong contributions of outliers. In all analyses, outliers were identified using the ROUT method [36] with a false discovery rate of q = 0.01 using GraphPad Prism software (GraphPad Software, Inc., La Jolla, CA, USA). To reduce the skew of highly skewed distributions, translocation frequency data were log10-transformed. Spearman's rank-order correlations were used to assess the extent to which parameters were associated with chromosome translocation frequency. Spearman correlation coefficients (r) are presented to describe the relationship between translocation frequencies and factors of interest. The p values express the probability of observing a correlation at least as strong as the observed one if there were no true association.
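As an illustration of the transformation and rank correlation described above, the following is a minimal sketch in plain Python. The function names, the pseudocount used to avoid log10(0) and the example values are assumptions and are not part of the original analysis, which was performed in GraphPad Prism.

```python
import math


def log10_transform(freqs, pseudocount=1):
    """log10-transform frequencies; the pseudocount is an assumption."""
    return [math.log10(f + pseudocount) for f in freqs]


def average_ranks(values):
    """Rank values from 1 to n, assigning tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman's r: the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Invented example: ACD scores and translocation counts for five bands
acd_scores = [0, 25, 50, 75, 100]
translocations = [900, 120, 40, 45, 3]
r_raw = spearman(acd_scores, translocations)
r_log = spearman(acd_scores, log10_transform(translocations))
```

Because Spearman's coefficient depends only on ranks, a monotonic transformation such as log10 leaves it unchanged; in this sketch `r_raw` and `r_log` are identical. The transformation matters for analyses that depend on the shape of the distribution, such as linear regression.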

Multiple Regression Analysis
Multiple regression analysis was used to determine the contribution of each variable to chromosomal translocation frequency. The regression coefficient of each predictor indicated the mean change in translocation frequency for one unit of change in that predictor while holding the other factors of interest in the model constant. The goodness-of-fit of the multiple linear regression model was expressed by R² and adjusted R² values. The p values express the probability of obtaining a regression coefficient at least as extreme as the observed one if the true coefficient were zero.
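To make the quantities above concrete, the following sketch fits an ordinary least squares model by solving the normal equations and reports R² and adjusted R². It is an illustration under stated assumptions and not the software used in the original analysis; the function names and the synthetic data are hypothetical, and p values for the coefficients are not computed here.

```python
def solve_linear_system(a, b):
    """Solve a * x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear; system is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_ols(predictors, response):
    """Return (coefficients, r2, adjusted_r2); coefficients[0] is the intercept."""
    rows = [[1.0] + list(p) for p in predictors]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, response)) for i in range(k)]
    coefs = solve_linear_system(xtx, xty)
    fitted = [sum(c * v for c, v in zip(coefs, r)) for r in rows]
    mean_y = sum(response) / len(response)
    ss_res = sum((y - f) ** 2 for y, f in zip(response, fitted))
    ss_tot = sum((y - mean_y) ** 2 for y in response)
    r2 = 1 - ss_res / ss_tot
    n, p = len(response), k - 1
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)
    return coefs, r2, adj_r2


# Synthetic data: (ACD score, gene density) per band, with a response built
# from known coefficients so that the fit can be checked by eye.
bands = [(0, 10), (25, 30), (50, 5), (75, 20), (100, 15), (50, 25)]
log_freq = [2.0 - 0.01 * acd + 0.05 * gd for acd, gd in bands]
coefs, r2, adj_r2 = fit_ols(bands, log_freq)
```

Each fitted coefficient is the change in the response per unit change in one predictor with the other held constant, which is how correlated predictors such as chromatin density and gene density can be compared within one model.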

Chromosome Type Analyses
For chromosome arms, absolute translocation frequencies were presented. To compensate for arm length, the absolute translocation frequencies were divided by the respective physical chromosome arm lengths. To additionally account for chromatin density, the length-adjusted frequencies were multiplied by the respective ACD scores. For whole chromosomes, absolute and adjusted translocation frequencies were calculated similarly.
The expected rates at which translocations occur in each of the chromosome types, i.e., E_type, referring to E_acrocentric, E_metacentric and E_submetacentric, were calculated using Equation (1).
Herein, O_total is the observed total translocation frequency and n the number of chromosomes in the group. Subscript "chromosome" refers to any chromosome. The fact that the genome is diploid is accounted for in Equation (1). Expected frequencies adjusted to physical chromosome length (l) were computed according to Equation (2).
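The two adjustment steps for chromosome arms can be sketched as follows. This is a minimal illustration that assumes "the latter" refers to the length-adjusted frequency; the function name, the use of megabases as the length unit and the example numbers are hypothetical.

```python
def adjust_frequency(absolute_freq, length_mb, acd=None):
    """Adjust an absolute translocation frequency for length and, optionally, ACD.

    absolute_freq: observed number of translocations in the arm or chromosome
    length_mb:     physical length in megabases (unit chosen for illustration)
    acd:           average chromatin density score; if given, the
                   length-adjusted frequency is multiplied by it
    """
    if length_mb <= 0:
        raise ValueError("length must be positive")
    per_length = absolute_freq / length_mb
    if acd is None:
        return per_length
    return per_length * acd


# Invented example: 120 translocations in a 60 Mb arm with an ACD score of 25
length_adjusted = adjust_frequency(120, 60)                # 2.0 per Mb
length_and_acd_adjusted = adjust_frequency(120, 60, 25)    # 50.0
```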
Additional adjustment to chromatin density occurred according to Equation (3).

Probabilistic Co-Occurrence Modeling
Pairwise probabilistic co-occurrence modeling, or Veech's probabilistic modeling, was performed as previously described [39]. Briefly, each translocation was considered the co-occurrence of two chromosomal breakpoints. Determination of the observed co-occurrence frequencies (O_cooccur), calculation of the expected co-occurrence frequencies (E_cooccur, based on the frequencies of the individual breakpoints), computation of the p values, reflecting the probability that O_cooccur > E_cooccur ("positive" co-occurrence) or O_cooccur < E_cooccur ("negative" co-occurrence) occurred by chance, and generation of matrices, networks and volcano plots were performed in the R programming environment.
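The logic of the pairwise model can be sketched with the hypergeometric distribution on which Veech's approach is based. The following Python sketch is not the R code used for the analysis. The function name, the choice of sampling unit and the inclusive definition of both tail probabilities are assumptions made for illustration.

```python
from math import comb


def pairwise_cooccurrence(n_units, n_a, n_b, observed):
    """Expected co-occurrence and tail probabilities for one breakpoint pair.

    n_units:  total number of sampling units (assumed, e.g., events or karyotypes)
    n_a, n_b: number of units containing breakpoint A or breakpoint B
    observed: number of units containing both breakpoints
    Returns (expected, p_lt, p_gt), where p_lt is the probability of
    co-occurring `observed` times or fewer and p_gt the probability of
    co-occurring `observed` times or more, if A and B were independent.
    """
    lowest = max(0, n_a + n_b - n_units)
    highest = min(n_a, n_b)
    if not lowest <= observed <= highest:
        raise ValueError("observed co-occurrence is outside the possible range")
    denominator = comb(n_units, n_b)

    def prob(j):
        # Probability that exactly j units contain both breakpoints
        return comb(n_a, j) * comb(n_units - n_a, n_b - j) / denominator

    expected = n_a * n_b / n_units
    p_lt = sum(prob(j) for j in range(lowest, observed + 1))
    p_gt = sum(prob(j) for j in range(observed, highest + 1))
    return expected, p_lt, p_gt


# Invented example: 10 units, breakpoint A in 4, breakpoint B in 3, both in 3
expected, p_lt, p_gt = pairwise_cooccurrence(10, 4, 3, 3)
```

A small `p_gt` flags a pair that co-occurs more often than expected ("positive" co-occurrence), whereas a small `p_lt` flags a pair that is underrepresented ("negative" co-occurrence).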

Conclusions
The existence of chromosomal translocations in human tumors has been known for many decades [15]. More recently, a number of factors have been identified that influence where in the genome translocation breakpoints occur. These include chromatin density, gene density and CTCF/cohesin binding site density. However, interdependence of these factors has considerably complicated deciphering the precise contribution of each of these. Our multiple linear regression analyses on thousands of blood and solid cancers identified chromatin density as the primary contributor with breakpoints preferentially occurring in open chromatin. We also identified acrocentrism as a novel predisposing factor. As the short arms of acrocentric chromosomes localize to the nucleoli, this suggests that translocations are often generated around nucleoli and hence in the inner nucleoplasm, rather than close to the nuclear envelope. Using pairwise probabilistic co-occurrence modeling, we identified both highly prevalent and significantly underrepresented translocations with putative fusion genes. The latter are probably strongly selected against during tumor development. Thus, our discoveries have shed new light on both the generation and selection of translocations during tumorigenesis.
Supplementary Materials: The following are available online at http://www.mdpi.com/2072-6694/10/1/13/s1. Figure S1: Matrices and networks of translocation/co-occurrence analyses, Figure S2: Volcano plots for translocation/co-occurrence analyses, Table S1: Numbers of unique translocations listed in the Atlas of Genetics and Cytogenetics in Oncology and Haematology, Table S2: Veech's probabilistic model of translocations in blood cancers at the cytogenetic chromosome band level, Table S3: Veech's probabilistic model of translocations in solid cancers at the cytogenetic chromosome band level, Table S4: Veech's probabilistic model of translocations in blood cancers at the chromosome arm level, Table S5: Veech's probabilistic model of translocations in solid cancers at the chromosome arm level.