Detection of rare germline variants in the genomes of B cell neoplasms

Simple Summary: The global importance of rare variants in tumorigenesis has been addressed by pan-cancer analysis, revealing significant enrichments of protein truncating variants in genes such as ATM, BRCA1/2, BRIP1 and MSH6. Germline variants can influence treatment response and contribute to the development of treatment-related second neoplasms, especially in childhood leukemia. We aimed to analyze the genomes of patients with B-cell lymphoproliferative disorders for the discovery of genes enriched in rare pathogenic variants. We discovered a significant enrichment of 26 genes in germline protein truncating variants (PTVs), affecting cell signaling (MET, JAK2, ANGPT2), energy metabolism (ACO1) and nucleic acid metabolism and repair pathways (NT5E, DCK). Additionally, we detected rare and likely pathogenic variants associated with tumor subtype, disease prognosis and potential druggability, indicating a relevant role of these events in the variability of cancer phenotypes.
Abstract: Growing evidence has revealed the implication of germline variation in cancer predisposition and prognostication. Here, we describe an analysis of putatively disruptive rare variants across the genomes of 726 patients with B-cell lymphoid neoplasms. We discovered a significant enrichment of 26 genes in germline protein truncating variants (PTVs), affecting cell signaling (MET, JAK2, ANGPT2), energy metabolism (ACO1) and nucleic acid metabolism and repair pathways (NT5E, DCK). Interestingly, some of these variants were restricted to either chronic lymphocytic leukemia (CLL) (i.e., ANGPT2 and AKR1C3) or B-cell lymphoma cases (PNMT, TPT1 and IGHMBP2). Additionally, we detected 1,675 likely disrupting variants in genes associated with cancer, of which 44.75% were novel events and 7.88% were PTVs. Among these, the most frequently affected genes were ATM, BIRC6, CLTCL1A and TSC2. Homozygous or compound heterozygous variants were detected in 28 cases, and coexisting somatic events were observed in 17 patients, some of which affected key lymphoma drivers such as ATM, KMT2D and MYC. Finally, we observed that variants in the helicase gene WRN were independently associated with shorter survival in CLL. Our study results support an important role for rare germline variation in the pathogenesis, clinical presentation and disease outcome of B-cell lymphoid neoplasms.

Next-generation sequencing (NGS) technologies have deconvoluted the genomic complexity of B-cell lymphoid tumors to a great extent, revealing the most frequent molecular drivers of disease and the interplay among them. Non-Hodgkin lymphoma (NHL) cases show familial predisposition, and much of the heritability of these diseases is still unexplained [2]. Genome-wide association studies (GWAS) have identified polymorphisms significantly associated with risk of CLL [3], DLBCL [4] and follicular lymphoma [5]. Similarly, some polymorphisms are also related to the outcome of B-cell lymphomas [6][7][8] and CLL [9], and it has also been shown that some variants cooperate with somatic events in shaping the clinical outcomes of cancer patients [10]. Another source of germline variation consists of rare variants (allele frequency < 0.1-1%). The global importance of such rare variants in tumorigenesis has been addressed by pan-cancer analysis, revealing significant enrichments of protein truncating variants in genes such as ATM, BRCA1/2, BRIP1 and MSH6 [11]. Indeed, some of these variants predispose to cancer development through the acquisition of second somatic hits [12], such as point mutations or loss of heterozygosity (LOH) [13]. Additionally, germline variation can influence treatment response and contribute to the development of treatment-related second neoplasms, especially in childhood leukemia [14]. Many such rare variants in cancer-related genes have been associated with particular cancer subtypes [15][16][17], but until now little attention has been paid to the genome-wide frequency, pathogenicity and clinical implications of rare variants in lymphoid malignancies. Rare variants in ATM and CDK1 have been associated with CLL risk in genome-wide analysis [18], whereas evidence for the implication of infrequent events in other genes comes from familial studies or single-gene analysis [19][20][21].
In this report we performed an exploratory analysis of the frequency and distribution of rare and putatively pathogenic germline variants in the genome of several mature B cell lymphoid neoplasms using high-throughput sequencing data produced by the International Cancer Genome Consortium (ICGC) [22]. Our results indicate the existence of multiple genes affected by highly pathogenic germline variants in the genome of these patients, some of which seem to condition phenotypic expression and patient survival.

Data source
We processed germline next-generation sequencing data obtained from 726 patients with B-cell lymphoid malignancies that were included in the International Cancer Genome Consortium. Briefly, 504 cases belonged to the Spanish Chronic Lymphocytic Leukemia project, and 222 were retrieved from the German Malignant Lymphoma project. Overall, there were 504 chronic lymphocytic leukemia (CLL) or small lymphocytic lymphoma (SLL) cases (including 54 monoclonal B-cell lymphocytosis (MBL) cases), 97 follicular lymphoma cases, 85 diffuse large B-cell lymphoma (DLBCL) cases, 36 Burkitt lymphoma cases and 4 unclassified B-cell lymphoma cases. CLL control samples were derived from non-tumoral leukocytes (<2% tumor contamination), whereas lymphoma controls originated from whole blood or buffy coats that tested negative in clonality analysis.

Germline variant identification and annotation
Most CLL germline samples (440 out of 502) were processed using exome-sequencing kits (Agilent SureSelect Human All Exon V4 and V4+UTRs), whereas whole genome sequencing was done on a group of 262 CLL cases and all B-cell lymphoma cases included in the MALY-DE project. We restricted our analysis to protein-coding regions covered by the exome-sequencing kits. Variants were detected using the optimized bcbio-nextgen (version 1.1.5) pipeline [23], and the GRCh37.75

Burden test against public controls
We used the Testing Rare vAriants using Public Data (TRAPD) software to test whether our patient cohort was enriched in PTVs relative to 15,708 public controls from the gnomAD v2 whole genome sequencing dataset [42].
Importantly, none of these controls originated from cancer studies. PTVs with a maximum population allele frequency (popmax) < 0.5% were selected. Multiple testing correction was performed with the false discovery rate (FDR) method.
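The logic of a gene-level burden test followed by FDR correction can be illustrated with a minimal sketch. It assumes a one-sided Fisher's exact test on carrier counts per gene and the Benjamini-Hochberg procedure; the actual TRAPD implementation may differ (for example, in how control carriers are estimated from public allele counts). The gene names and carrier counts below are hypothetical, and only the cohort sizes (726 cases, 15,708 controls) come from the text.

```python
from math import comb


def fisher_one_sided(case_carriers, case_total, ctrl_carriers, ctrl_total):
    """One-sided Fisher's exact p-value for enrichment of carriers in cases.

    Sums hypergeometric probabilities of observing at least `case_carriers`
    carriers among the cases, given the table margins.
    """
    carriers = case_carriers + ctrl_carriers
    total = case_total + ctrl_total
    denom = comb(total, case_total)
    upper = min(carriers, case_total)
    numer = sum(
        comb(carriers, k) * comb(total - carriers, case_total - k)
        for k in range(case_carriers, upper + 1)
    )
    return min(1.0, numer / denom)


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg q-values in the same order as `pvalues`."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonic q-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


N_CASES = 726       # cohort size reported in the text
N_CONTROLS = 15708  # gnomAD v2 genome controls reported in the text

# Hypothetical PTV carrier counts: gene -> (case carriers, control carriers)
carrier_counts = {"GENE_A": (6, 3), "GENE_B": (1, 20), "GENE_C": (0, 5)}

genes = list(carrier_counts)
pvals = [
    fisher_one_sided(carrier_counts[g][0], N_CASES, carrier_counts[g][1], N_CONTROLS)
    for g in genes
]
for gene, p, q in zip(genes, pvals, benjamini_hochberg(pvals)):
    print(f"{gene}\tp={p:.3g}\tq={q:.3g}")
```

Exact integer arithmetic is used here so that the sketch has no dependencies; in practice a statistics library would normally be preferred for both steps.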

Compound heterozygotes and germline-somatic double hit event detection
To identify putative compound heterozygotes, we selected concurrent rare, heterozygous and putatively damaging variants affecting the same gene in the same individual. Variants in linkage disequilibrium (R² > 0.2 according to 1000 Genomes data) were discarded from this analysis, as they could belong to the same haplotype.
Second-hit somatic mutations were detected by comparing germline variants with somatic mutations for the same set of individuals present in the ICGC database.
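The selection of candidate compound heterozygotes described above can be sketched as follows. This is an illustrative outline rather than the pipeline actually used: it assumes the input has already been restricted to rare, putatively damaging variants, that pairwise R² values from 1000 Genomes are available in a lookup table (`pairwise_r2`, a hypothetical structure), and that pairs without linkage information are treated as unlinked. Without phasing data, the retained pairs remain candidates only.

```python
from collections import defaultdict
from itertools import combinations

R2_THRESHOLD = 0.2  # linkage disequilibrium cut-off stated in the text


def candidate_compound_hets(variants, pairwise_r2):
    """Return {(sample, gene): [(variant_a, variant_b), ...]} for unlinked pairs.

    variants    : iterable of dicts with keys 'sample', 'gene',
                  'variant_id' and 'genotype' ('het' or 'hom').
    pairwise_r2 : dict mapping frozenset({id_a, id_b}) -> R^2
                  (hypothetical lookup; missing pairs are assumed unlinked).
    """
    het_by_sample_gene = defaultdict(list)
    for v in variants:
        if v["genotype"] == "het":
            het_by_sample_gene[(v["sample"], v["gene"])].append(v["variant_id"])

    hits = {}
    for key, ids in het_by_sample_gene.items():
        # Keep only pairs that are not in linkage disequilibrium.
        pairs = [
            (a, b)
            for a, b in combinations(sorted(ids), 2)
            if pairwise_r2.get(frozenset((a, b)), 0.0) <= R2_THRESHOLD
        ]
        if pairs:
            hits[key] = pairs
    return hits


# Hypothetical example data
example_variants = [
    {"sample": "P1", "gene": "GENE1", "variant_id": "v1", "genotype": "het"},
    {"sample": "P1", "gene": "GENE1", "variant_id": "v2", "genotype": "het"},
    {"sample": "P2", "gene": "GENE2", "variant_id": "v3", "genotype": "het"},
    {"sample": "P2", "gene": "GENE2", "variant_id": "v4", "genotype": "het"},
    {"sample": "P3", "gene": "GENE2", "variant_id": "v5", "genotype": "het"},
]
example_r2 = {frozenset(("v1", "v2")): 0.9}  # v1 and v2 are tightly linked

example_hits = candidate_compound_hets(example_variants, example_r2)
print(example_hits)
```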

Myeloid clonal hematopoiesis filtering
The blood controls could contain mosaic somatic mutations due to myeloid clonal hematopoiesis of indeterminate potential (CHIP). To assess this issue, we first identified a list of 22 genes recurrently mutated in clonal hematopoiesis that had at least one putatively rare germline variant in the final dataset [43-45]. For these genes, we analyzed whether the variants were present in both the control and tumor (lymphoid) compartments; mutations that were not found (or were found at a very low variant allele frequency, VAF) in the tumor compartment were catalogued as likely myeloid CHIP events.
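A minimal sketch of this filtering rule is given below. The text does not specify what counts as a very low VAF, so the 5% threshold used here is an assumption, as are the field names and the placeholder gene names.

```python
def classify_chip_candidates(variants, chip_genes, low_vaf=0.05):
    """Label variants in CHIP-associated genes by their presence in the tumor.

    variants   : iterable of dicts with keys 'variant_id', 'gene' and
                 'tumor_vaf' (None when the variant is absent in the tumor).
    chip_genes : set of genes recurrently mutated in clonal hematopoiesis.
    low_vaf    : assumed cut-off for "very low VAF" (not given in the text).
    """
    labels = {}
    for v in variants:
        if v["gene"] not in chip_genes:
            continue  # only CHIP-associated genes are evaluated
        tumor_vaf = v.get("tumor_vaf")
        if tumor_vaf is None or tumor_vaf < low_vaf:
            labels[v["variant_id"]] = "likely_myeloid_CHIP"
        else:
            labels[v["variant_id"]] = "likely_germline"
    return labels


# Hypothetical example: placeholder gene names, not the 22-gene list of the study
chip_gene_set = {"CHIP_GENE_1", "CHIP_GENE_2"}
calls = [
    {"variant_id": "a", "gene": "CHIP_GENE_1", "tumor_vaf": 0.48},
    {"variant_id": "b", "gene": "CHIP_GENE_2", "tumor_vaf": 0.01},
    {"variant_id": "c", "gene": "CHIP_GENE_2", "tumor_vaf": None},
    {"variant_id": "d", "gene": "OTHER_GENE", "tumor_vaf": 0.50},
]
chip_labels = classify_chip_candidates(calls, chip_gene_set)
print(chip_labels)
```

A true germline heterozygous variant is expected at a VAF near 50% in both compartments, which is why absence or near-absence in the lymphoid tumor points to a somatic event confined to the myeloid lineage.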

Rare variants overview
A total of 1,665 rare germline variants with likely disruptive activity (CADD scores > 20 or protein truncating) were detected in 559 cancer-related genes across 693 (95.45%) patients (Figure 1, Supplementary Table 1). Overall, the frequency of these rare and likely disrupting mutations in cancer-related genes was higher than that found in non-cancer-related genes (4.25 × 10⁻³ vs 3.61 × 10⁻³ mutations per gene and patient). Most of these were missense variants (1,559 events, 93.01%).
Overall, 113 patients (15.56%) harbored 126 PTVs in 103 different loci, which included frameshift, splice donor, splice acceptor, nonsense, stop loss and start loss variants (Supplementary Table 3). The frequency of PTVs in this gene list was markedly higher than that observed in the remaining genes (2.11 × 10⁻³ vs 7.33 × 10⁻⁴ mutations per gene and patient), suggesting an enrichment in loss-of-function mutations among cancer-related genes. The most frequently affected genes were ATM (5 cases), SETDB1 (5 cases in a single locus), ISX (4 cases) and POLQ (4 cases).
Some of the missense variants showed a remarkably increased frequency in patients with lymphoid neoplasia compared with the non-Finnish European (NFE) population in the gnomAD database. This was the case for the variants rs199502695 in PRPF40B (4 cases, 71.17 times more frequent), rs191413750 in DOCK8 (5 cases, 55.55 times more frequent), rs377188372 in N4BP2 (4 cases, 34.66 times more frequent) and rs146946726 in MLLT10 (6 cases, 8.10 times more frequent).
A total of 227 different variants have also been described as pathogenic or likely pathogenic somatic mutations in cancer (Supplementary Table 4). Finally, 11 variants in homozygosis were observed, one of which (c.1642C>T in ZCCHC8) was present in 2 different patients (Table 1). Similarly, 15 patients harbored two variants in the same gene, probably in the form of compound heterozygotes (Table 1). Interestingly, these compound heterozygotes were observed twice in FAT1 and ZFHX3.
Moreover, one homozygous nonsense variant and a compound heterozygote were detected in the gene GLI1, and one homozygous missense variant plus a compound heterozygote were detected in MYH9.

Rare variants affecting genes involved in cancer syndromes with germline inheritance
A total of 84 genes associated with inherited cancer syndromes were affected by 372 occurrences of 225 different rare variants (Supplementary Table 7), of which 19 were PTVs and affected 22 patients (3%). In total, 131 variants were observed in genes linked with autosomal dominant syndromic cancer, affecting 168 patients. Among these, the most frequently mutated genes were TSC2 (22 cases), linked to tuberous sclerosis, APC (16 cases), linked to hereditary colon cancer, and the DNA polymerase POLE (16 cases), involved in predisposition to multiple cancers (Table 2). Similarly, 94 variants in 32 genes linked to autosomal recessive cancer were observed, which affected 149 patients. The most commonly affected among these were ATM (25 cases), NBN (12 cases), BLM (12 cases), DOCK8 (12 cases) and WRN (12 cases) (Table 2).
Some of these variants were labelled as pathogenic in ClinVar (Supplementary Table 6; Table 2).

Differential distribution of rare variants and association with patient survival
We did not identify any gene significantly enriched in rare variants in CLL vs B-cell lymphoma cases (Fisher test, FDR < 5%). Nevertheless, we discovered that some variants were only detected in one subgroup. For example, the missense variant rs1800729 in TSC2 was exclusively present in CLL (8 cases), and the missense variant rs139075637 in POLE was exclusively present in non-CLL B lymphoid tumors (7 cases). Further analysis is needed to confirm these findings and rule out population substructure biases.
Thereafter, we tested whether rare variants were associated with adverse patient outcomes. Due to the heterogeneity of the dataset and sample size limitations, we restricted our analysis to CLL cases, and considered variants present in at least 1% of cases. Interestingly, rare variants in the DNA helicase WRN (8 cases) were significantly associated with shorter overall survival (Cox p-value 1.16 × 10⁻⁴, q-value 0.01, HR [2.35, 14.59], Figure 3B). Indeed, this association was independent of age at diagnosis and CLL/MBL status (p-value 1.97 × 10⁻⁷, HR [5.03, 35.48]). Moreover, these variants were also linked to shorter time to first treatment (Cox p-value 6.15 × 10⁻⁴, HR [1.85, 9.48], Figure 3A), which remained significant after adjusting for age at diagnosis and CLL/MBL status (p-value 1.69 × 10⁻³, HR [1.64, 8.48]). These patients tended to harbor high-risk karyotype anomalies in the tumor cells: 11q deletion (3 cases, one as an isolated anomaly, one co-occurring with 13q deletion and one co-occurring with 3 other karyotype anomalies), 17p deletion (1 case, co-occurring with an 18p deletion), 8q deletion (1 case, co-occurring with 21q gain), and 6q deletion (1 case, co-occurring with 13q deletion).
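The hazard ratios above are reported as bracketed intervals without a point estimate. As a rough reading aid, and assuming that each interval is a 95% Wald confidence interval that is symmetric on the log scale (an assumption, since the text does not state how the intervals were computed), the implied point estimate is the geometric mean of the two bounds. The helper below is hypothetical and only illustrates that arithmetic.

```python
import math


def hr_from_ci(lower, upper, z=1.959964):
    """Recover the implied hazard ratio and log-scale standard error.

    Assumes the interval is exp(beta +/- z * SE), i.e. a Wald interval on the
    log-hazard scale, with z = 1.959964 for 95% coverage.
    """
    log_hr = (math.log(lower) + math.log(upper)) / 2
    se = (math.log(upper) - math.log(lower)) / (2 * z)
    return math.exp(log_hr), se


# Interval reported in the text for WRN variants and overall survival
hr, se = hr_from_ci(2.35, 14.59)
print(f"implied HR = {hr:.2f}, log-scale SE = {se:.2f}")
```

With only 8 carriers, intervals of this width are expected, so the implied point estimates should be read with corresponding caution.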
Curiously, ATM germline variants did not appear to be associated with survival in CLL. We reasoned that this could be due to the inclusion of missense variants in the model, since ATM is a gene with great variability in the population. Therefore, we restricted the analysis to patients with truncating variants in ATM (4 cases), and discovered that these few patients had a significantly shorter overall survival (p-value 0.02, HR [1.28, 21.53]).

Association of rare germline variants with somatic mutations
Concurrent rare and likely disruptive germline variants and somatic mutations were detected in 17 cases (Table 3).

Burden test of protein truncating variants using public controls
Germline PTVs were selected for association with risk of B-cell neoplasms using the burden test against public whole genome sequencing controls. We selected PTVs because these are the variants most likely to be pathogenic. As a result, 25 genes were significantly enriched in PTVs among patients affected by B-cell lymphoid neoplasms (q-value < 0.05), and one additional gene (LPIN3) showed a trend towards association (q-value 6.83 × 10⁻²) (Table 4). Importantly, the inflation statistic was low (λ = 1.05; Supplementary Figure 1). Overall, we identified 56 different events occurring in 205 cases (Supplementary Table 10). The most significantly enriched genes were PPIL1 (q-value 1.49 × 10⁻⁹), JAK2 (q-value 3.01 × 10⁻⁹), NT5E/CD73 (q-value 3.02 × 10⁻⁷), TPT1 (q-value 1.52 × 10⁻⁵) and TLR4 (q-value 1.93 × 10⁻⁵) (Figure 4). [...] and UAP1 participate in oncogenic metabolism rewiring. Additionally, TPT1 regulates cellular growth and proliferation, whereas IGHMBP2 encodes a member of the helicase superfamily that binds to a specific DNA sequence from the immunoglobulin mu chain switch region.

Differential distribution of PTVs discovered in burden test and association with patient survival
PTVs affecting TPT1, PNMT and IGHMBP2 were significantly enriched in patients affected by lymphomas compared to CLL cases (q-value < 0.05, Fisher test), whereas a tendency for an enrichment of AKR1C3 variants in the CLL patients [...] development of metastatic disease in prostate cancer [15]. Therefore, we reasoned that the analysis of such variants in patients affected by B-cell lymphoid neoplasms could provide new clues about their pathogenesis and prognostication.

Indeed, our results indicate an increased frequency of protein truncating rare variants in 26 genes. Notably, most of these are clearly linked to oncogenesis. For example, the list included 5 members of the oncogenic PI3K-Akt pathway [47], such as the oncogenes JAK2 [48] and MET [49]; and the TP53 regulator TPT1, which promotes p53 degradation in an MDM2-dependent manner [50]. Another group of affected genes plays a role in the metabolic rewiring associated with oncogenesis, such as aconitase 1 (ACO1), an enzyme that participates in the tricarboxylic acid cycle upstream of IDH [51]. Additionally, we detected a significant number of events in genes that regulate DNA metabolism, such as the oncogene ecto-5′-nucleotidase (NT5E/CD73), which catalyzes AMP breakdown to adenosine [52], and the gene deoxycytidine kinase (DCK), which is required for the phosphorylation of several deoxyribonucleosides and their nucleoside analogs [53]. Curiously, some of these disruptive variants showed a differential distribution between CLL and B-cell lymphoma patients. The most significantly enriched genes in B-cell lymphoma patients were IGHMBP2 (a helicase gene implicated in DNA repair [54]) and PNMT (an enzyme involved in catecholamine biosynthesis associated with cancer predisposition [55]). Conversely, truncating mutations in other genes were restricted to CLL patients, such as those of AKR1C3 (an oncogenic enzyme catalyzing the conversion of aldehydes and ketones [56]) and FBXO44 (a mediator of BRCA1 proteasomal degradation [57]). Finally, we also detected a trend towards an association of truncating variants in ANGPT2 with shorter time to first treatment in CLL. Not surprisingly, the expression of this member of the PI3K-Akt pathway has been previously associated with CLL clinical evolution [58].
Altogether, these results support a role for rare germline variants in the pathogenesis of B-cell lymphoid neoplasms, and they also anticipate their importance as drivers of clinical presentation.
In a different approach, we focused our research on the detection of rare and likely disruptive mutations (both PTVs and non-PTVs) in a set of genes involved in cancer pathways, and particularly in lymphoid neoplasms. The collective high frequency of these rare germline variants in cancer genes poses a challenge for personalized genomics, as many of these are probably non-functional whereas others play a pathogenic or prognostic role. We identified recurrent highly pathogenic variants affecting important drivers of hematological cancer (ATM [59]), epigenetic regulators (ISX [60] and SETDB1 [61]) and mediators of DNA replication (POLQ [62]). Recurrent variants were also observed in drug targets, particularly in the crizotinib targets ALK, MET and ROS1, as well as the everolimus target TSC2, which suggests new therapeutic strategies for these patients [63][64][65]. Additionally, several variants were previously catalogued as pathogenic (such as the E318K variant in the transcription factor MITF [66]); others affected strong mediators of inherited predisposition to lymphomas (i.e., DOCK8, EXT1, MSH6 and SOS1 [67][68][69][70]); and others have been flagged as pathogenic somatic mutations in cancer, such as NOTCH1 R912W [71,72] and CNOT3 E20K [73,74]. Importantly, we observed that variants in the DNA helicase WRN were significantly associated with shorter overall survival and time to first treatment in CLL. WRN-mutated CLL cases tended to harbor high-risk karyotypic anomalies, suggesting an increased genomic instability [75] mediated by altered DNA repair mechanisms [76].
Germline-germline or germline-somatic "double-hit" events were identified in cancer driver genes. Germline-germline "double-hit" events were detected in 28 cases (3.85% of cases), and curiously 6 genes were affected in more than one patient, including the Hedgehog signalling gene GLI1 [77] and the homeobox tumor suppressor ZFHX3 [78].
This study has several limitations. First, some background heterogeneity could exist between the Spanish CLL and German lymphoma populations, although we believe this should be minimal. Secondly, many relevant oncogenes and tumor suppressors were very rarely mutated, and the interpretation of these variants in terms of survival will require the sequencing of thousands of cases. Additionally, the presence of mosaic somatic mutations in the controls due to clonal hematopoiesis could have led to some false positives. In this regard, we observed that only a minority of variants in genes associated with CHIP were likely somatic events, but our results should nevertheless be interpreted with caution for this group of genes. Finally, another limitation arises from the heterogeneity and limited sample size of the B-cell lymphoma dataset, which dissuaded us from performing a survival analysis in such cases.

Conclusions
Our results indicate the existence of multiple genes affected by highly pathogenic germline variants in the genomes of patients with B-cell neoplasms, including a significant enrichment of 26 genes in protein-truncating variants.
Additionally, the differential distribution of some of these variants suggests a contribution to the phenotypic variability