Germline Variants Associated with Nasopharyngeal Carcinoma Predisposition Identified through Whole-Exome Sequencing

Simple Summary
The aim of this study was to identify the germline genetic variants associated with an increased risk of developing nasopharyngeal carcinoma (NPC). DNA samples from 119 Singaporean NPC patients were sequenced, with 17 pathogenic variants in 17 genes found to be enriched in NPC patients as compared to unaffected controls. Five of these variants (in the JAK2, PRDM16, LRP1B, NIN, and NKX2-1 genes) were supported by repeated testing on an independent set of Singaporean NPC patients and unaffected Singaporean controls. A FANCE variant was observed in two siblings with NPC, but not in three unaffected siblings of the same family. Gene-based burden testing recapitulated the association of NKX2-1 and FANCE variants with NPC risk. Pathway analysis revealed a higher frequency of germline mutations in endocytosis and immune-modulating pathways. Our research has identified new variants and genes associated with susceptibility to NPC, which are relevant for an improved understanding of the genetic predisposition to NPC.

Abstract
The current understanding of genetic susceptibility factors for nasopharyngeal carcinoma (NPC) is still incomplete. To identify novel germline variants associated with NPC predisposition, we analysed whole-exome sequencing data from 119 NPC patients from Singapore with a family history of NPC and/or with early-onset NPC, together with 1337 Singaporean participants without NPC. Variants were prioritised and filtered by selecting variants with minor allele frequencies of <1% in both local control (n = 1337) and gnomAD non-cancer (EAS) (n = 9626) cohorts and a high pathogenicity prediction (CADD score > 20). Using single-variant testing, we identified 17 rare pathogenic variants in 17 genes that were associated with NPC. Consistent evidence of enrichment in NPC patients was observed for five of these variants (in JAK2, PRDM16, LRP1B, NIN, and NKX2-1) from an independent case-control comparison of 156 NPC patients and 9770 unaffected individuals. In a family with five siblings, a FANCE variant (p.P445S) was detected in two affected members, but not in three unaffected members. Gene-based burden testing recapitulated variants in NKX2-1 and FANCE as being associated with NPC risk. Using pathway analysis, endocytosis and immune-modulating pathways were found to be enriched for mutation burden. This study has identified NPC-predisposing variants and genes which could provide new insights into the genetic predisposition to NPC.


Introduction
Nasopharyngeal carcinoma (NPC) afflicted 129,000 new patients globally in 2018 [1,2]. However, the global incidence of NPC is not homogeneous, with more than two-thirds of new NPC cases occurring in East Asia and Southeast Asia. Epstein-Barr virus (EBV) infection is the most common causative factor of NPC, although other environmental risk factors, such as the consumption of preserved foods and alcohol, and poor oral hygiene, have also been associated with NPC risk [3]. Recently, a meta-analysis of 334,935 men demonstrated a dose-response relationship between smoking and NPC risk, providing support for smoking as yet another risk factor for NPC [4,5]. Prospective studies in Singapore and China have shown a two-fold to more than ten-fold increased risk of NPC in first-degree relatives of patients [6][7][8][9], suggesting a genetic component in the development of NPC.
Various models of tumorigenesis have implicated potential genetic factors for NPC, including developmental genes such as CRIP2 and MIPOL1, the EBV oncoproteins LMP1 and LMP2, and increased EBV-associated tumorigenesis via specific EBV-infection-prone HLA haplotypes [10]. More recent studies have employed next-generation sequencing (NGS) to examine the genetics of NPC and identify germline variants in NPC. Using whole-exome sequencing (WES) of NPC tumour DNA samples, somatic variants have been identified in the Ras and cell cycle pathways, NF-κB pathway and MLL3 gene [11][12][13][14]. WES of germline DNA from NPC patients of southern Chinese descent has also identified germline variants associating MST1R and RPA1 with a genetic predisposition for NPC and aggressive disease, respectively [15,16]. In addition, germline mutations associated with increased cancer risk have been reported in over 100 genes [17], and targeted sequencing has found variants in suspected familial or sporadic NPC susceptibility genes CDKN2A/2B, BRD2, TNFRSF19, and CLPTM1L/TERT [18].
Here, we performed WES on germline DNA from 119 NPC patients from Singapore, to determine the prevalence of mutations in previously reported NPC-associated genes and identify new NPC susceptibility variants. The variants identified in this cohort were also examined in an independent cohort of 156 young NPC cases who were diagnosed before they reached 40 years old. In addition, case-control association analyses were performed against local control and gnomAD non-cancer East Asian control cohorts. By using a stringent filtering and prioritisation strategy, we identified germline variants in FANCE, JAK2, PRDM16, LRP1B, NIN, and NKX2-1 that may be implicated in the pathogenesis of NPC. Pathway analysis revealed that the endocytosis and immune-modulating pathways were enriched for mutation burden.
patients either had early-onset NPC (at or below 40 years of age) and/or a family history of NPC in first- and/or second-degree relatives (Supplementary Table S1). Blood samples were also obtained from 38 family members (2 affected, 36 unaffected) of 16 probands from the discovery cohort. Unaffected control individuals comprised 1337 Singaporean participants who also underwent whole-exome sequencing.
For the validation cohort, blood samples were obtained from 156 patients diagnosed with NPC at or below 40 years of age from the National Cancer Centre Singapore (Supplementary Table S1). Unaffected control individuals comprised 9770 healthy Singaporean participants (SG10K_Health) [19]. All study participants provided written informed consent and the study was approved by the Institutional Review Boards at the study sites.

Figure 1. Study design and steps taken for the selection of candidate NPC susceptibility variants. For each variant filtering step, the number of variants remaining after filtering is given within the curly brackets.

Whole-Exome Sequencing
For the discovery cohort and their family members, genomic DNA was extracted from peripheral blood mononuclear cells using routine laboratory methods [20]. Sequencing libraries were prepared from the DNA samples using the Agilent SureSelect Human All Exon V6 kit (Agilent Technologies, Santa Clara, CA, USA) and were sequenced with 150 bp paired-end reads on the Illumina NovaSeq 6000 platform (Illumina, San Diego, CA, USA). For the validation cohort, sequencing libraries were prepared using the Agilent SureSelect Human All Exon V6 +UTR kit (Agilent Technologies, Santa Clara, CA, USA). Control cohorts were prepared using similar, well-described protocols and sequenced using 150 bp paired-end reads at 100X on Illumina high-throughput instruments (NovaSeq 6000 or HiSeq 4000) (Illumina, San Diego, CA, USA).

Germline Variant Discovery and Annotation
For each sequenced sample, read pairs were aligned to the human reference genome (b37) using BWA-MEM (v0.7.17) [21]. The reads were sorted, and reads from multiple lanes were merged, with SAMtools (v1.9) [22]. PCR duplicates were flagged for downstream filtering using MarkDuplicates in GATK v4.1.9.0 [23]. Base quality recalibration was carried out and applied using GATK's BaseRecalibrator and ApplyBQSR. Subsequently, variant calling was performed with the GATK HaplotypeCaller, producing a GVCF file for each sample. Joint genotyping was performed alongside GVCFs of the local control cohort with the GenotypeGVCFs function. Low-quality variants were removed using the recommended hard filters from gnomAD v2.1 [24].

Prioritisation and Filtering of Variants
Candidate NPC variants present in two or more affected patients were first selected from the jointly genotyped variants. Rare variants were selected by filtering for variants with a minor allele frequency (MAF) of less than 1% in both the gnomAD non-cancer (EAS) and the local control cohorts. Then, pathogenic variants were selected from nonsynonymous variants with a CADD v1.3 PHRED score greater than 20. This stringent CADD threshold corresponds to the top 1% of variants ranked by CADD-predicted deleteriousness. Loss-of-function variants (frameshift insertions and deletions, stop-gains, stop-losses, or start-losses) were retained. Finally, rare pathogenic variants in known cancer genes were selected by choosing variants in genes appearing in at least two of the following cancer gene databases or literature sources: Network of Cancer Genes (NCG) 6.0 [33], COSMIC Cancer Gene Census v94 (CGC) [34], germline cancer predisposition genes [17], cancer driver genes [35], or cancer driver genes inferred from nucleotide context [36].
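The filtering cascade described above can be summarised as a single predicate over annotated variants. The following Python sketch is illustrative only and is not the pipeline used in the study: the `Variant` record, its field names, and the consequence labels are hypothetical, and a variant absent from a reference cohort is assumed to be stored with a MAF of 0.

```python
from dataclasses import dataclass

# Hypothetical annotated-variant record; field names are illustrative only.
@dataclass
class Variant:
    gene: str
    consequence: str       # e.g. "nonsynonymous", "frameshift", "stopgain"
    n_cases: int           # affected patients carrying the variant
    maf_local: float       # MAF in the local control cohort
    maf_gnomad_eas: float  # MAF in gnomAD non-cancer (EAS)
    cadd_phred: float      # CADD PHRED-scaled score
    n_cancer_sources: int  # cancer gene databases/literature sources listing the gene

# Loss-of-function consequence labels (hypothetical spellings).
LOSS_OF_FUNCTION = {"frameshift", "stopgain", "stoploss", "startloss"}

def is_candidate(v, maf_cutoff=0.01, cadd_cutoff=20.0):
    """Return True if the variant passes every prioritisation step."""
    recurrent = v.n_cases >= 2
    rare = v.maf_local < maf_cutoff and v.maf_gnomad_eas < maf_cutoff
    pathogenic = v.consequence in LOSS_OF_FUNCTION or (
        v.consequence == "nonsynonymous" and v.cadd_phred > cadd_cutoff
    )
    known_cancer_gene = v.n_cancer_sources >= 2
    return recurrent and rare and pathogenic and known_cancer_gene

# Made-up example records, not variants from the study.
example = [
    Variant("GENE_A", "nonsynonymous", 2, 0.001, 0.0, 25.0, 2),
    Variant("GENE_B", "nonsynonymous", 3, 0.020, 0.0, 30.0, 3),  # too common locally
    Variant("GENE_C", "frameshift", 2, 0.0, 0.0, 0.0, 2),        # loss-of-function
]
kept = [v.gene for v in example if is_candidate(v)]
```

Expressing the steps as one predicate keeps the thresholds (1% MAF, CADD PHRED 20, two supporting sources) in one place, which makes it easy to see how each cut-off affects the number of variants retained.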

Case-Control Association Analysis
Principal component analysis (PCA) was performed to verify that participants with NPC and unaffected local control participants were of similar genetic ancestry (Supplementary Figure S1). PCA was performed with SNPRelate using default parameters [37]. Case-control association analysis was performed with variants in known cancer genes by comparing their allele frequency in the discovery cohort with that in the local control cohort and gnomAD non-cancer (EAS) (n = 9626). Variants not reported in gnomAD were assumed to have zero allele count with a linearly interpolated allele number, if they were within 300 nucleotides of a valid gnomAD variant. Variants which were significantly more common (FDR-adjusted p-value less than 0.10) in both comparisons were selected.
Case-control association analysis was repeated using a validation cohort of 156 NPC patients from Singapore, and the SG10K_Health control cohort (release 5.3) comprising 9770 Chinese, Malay, and Indian healthy volunteers from Singapore [19].
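The text does not spell out how the allele number was interpolated for variants missing from gnomAD, so the following Python sketch shows only one plausible reading. It assumes that the nearest reported gnomAD variants on either side are used when they lie within 300 nucleotides, that a single flanking variant is used directly when only one is in range, and that no value is assigned otherwise. The function name and input layout are hypothetical.

```python
from bisect import bisect_left

def interpolate_allele_number(pos, known, max_dist=300):
    """Estimate the allele number (AN) at `pos` from flanking gnomAD variants.

    `known` is a list of (position, allele_number) tuples on the same
    chromosome, sorted by position. Returns None if no reported variant
    lies within `max_dist` nucleotides (an assumption of this sketch).
    """
    positions = [p for p, _ in known]
    i = bisect_left(positions, pos)
    left = known[i - 1] if i > 0 else None
    right = known[i] if i < len(known) else None
    # Discard flanking variants that are too far away.
    if left is not None and pos - left[0] > max_dist:
        left = None
    if right is not None and right[0] - pos > max_dist:
        right = None
    if left is None and right is None:
        return None
    if left is None:
        return right[1]
    if right is None:
        return left[1]
    # Linear interpolation between the two flanking allele numbers.
    frac = (pos - left[0]) / (right[0] - left[0])
    return round(left[1] + frac * (right[1] - left[1]))

# Made-up positions and allele numbers for illustration.
known_sites = [(100, 19000), (300, 19200)]
estimate = interpolate_allele_number(200, known_sites)
```

With an allele count of zero and an interpolated allele number, a missing variant can still enter the 2x2 allele-count table used for the case-control comparison.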

Segregation Analysis
Germline variants from family members of discovery cohort probands were called using DRAGEN v3.8.4 with hg38 as the reference genome [38]. Variants were filtered for quality control using default DRAGEN filters, then lifted over to hg19. Variant annotation and filtering for pathogenicity were performed using the same methods as described for the discovery and local control cohorts.

Gene-Based Burden Testing
Gene-based burden testing was used to compare the proportion of affected patients with rare pathogenic variants in known cancer genes in the discovery cohort versus local control cohort. Differences in sequencing coverage were controlled for by setting an average per-cohort minimum read depth cut-off. A minimum average cut-off of 25.1 reads per sample in the local control cohort was chosen to balance between test validity, as quantified by QQ-plot R², and the number of variants to be filtered (Supplementary Figure S2A). PCA covariates were not included in the test as they did not appear to improve the validity of the test (Supplementary Figure S2B). The test was performed using the combined multivariate and collapsing test implemented by EPACTS' emmaxCMC [39]. To verify variants identified in 17 genes prioritised from variant-based analysis, the gene-based burden test was also performed on 16 probands and their 38 family members for all pathogenic variants.
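The collapsing idea behind a gene-based burden test can be illustrated with a short Python sketch. It shows only the collapsing step, in which an individual counts as a carrier of a gene if they hold at least one qualifying variant in it, together with the read depth cut-off. It does not reproduce the statistical model fitted by EPACTS emmaxCMC. All function names, identifiers, and data are hypothetical.

```python
from collections import defaultdict

def collapse_carriers(genotypes, variant_gene, mean_control_depth, min_depth=25.1):
    """Collapse variants to genes: map each gene to the set of carrier samples.

    genotypes:          {sample_id: set of variant ids carried}
    variant_gene:       {variant_id: gene symbol}
    mean_control_depth: {variant_id: mean read depth in the control cohort}
    Variants below `min_depth` are ignored to limit coverage artefacts.
    """
    carriers = defaultdict(set)
    for sample, variants in genotypes.items():
        for variant in variants:
            if mean_control_depth.get(variant, 0.0) >= min_depth:
                carriers[variant_gene[variant]].add(sample)
    return dict(carriers)

def carrier_counts(carriers, cases, controls):
    """Return {gene: (carriers among cases, carriers among controls)}."""
    return {
        gene: (len(samples & cases), len(samples & controls))
        for gene, samples in carriers.items()
    }

# Made-up data for illustration.
genotypes = {
    "case1": {"v1"},
    "case2": {"v1", "v2"},
    "ctrl1": {"v3"},
    "ctrl2": set(),
}
variant_gene = {"v1": "GENE_A", "v2": "GENE_A", "v3": "GENE_B"}
depth = {"v1": 40.0, "v2": 30.0, "v3": 10.0}

carriers = collapse_carriers(genotypes, variant_gene, depth)
counts = carrier_counts(carriers, {"case1", "case2"}, {"ctrl1", "ctrl2"})
```

Here `case2` carries two variants in the same hypothetical gene but is counted once, which is the property that distinguishes a collapsing test from summing single-variant results.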

Pathway Analysis
The gene-based burden test on discovery and local controls was repeated with the same parameters, without filtering for known cancer genes. Then, 559 genes with p < 0.05 were analysed using QIAGEN Ingenuity Pathway Analysis (IPA) (QIAGEN, Redwood City, CA, USA) for enriched pathways. Enrichment p-values for canonical pathways were calculated using the right-tailed Fisher's Exact Test.
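A right-tailed Fisher's exact test for pathway enrichment is equivalent to the upper tail of a hypergeometric distribution. The Python sketch below shows that calculation with the standard library only. It is not IPA's implementation: the function name is hypothetical, and the size of the background gene set, which IPA takes from its own knowledge base, must be supplied by the user.

```python
from math import comb

def right_tailed_fisher(overlap, list_size, pathway_size, background_size):
    """P(X >= overlap) when drawing `list_size` genes from a background of
    `background_size` genes, of which `pathway_size` belong to the pathway."""
    denom = comb(background_size, list_size)
    upper = min(list_size, pathway_size)
    tail = sum(
        comb(pathway_size, k) * comb(background_size - pathway_size, list_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / denom

# Toy example: 3 of 4 listed genes fall in a 3-gene pathway, background of 10 genes.
p_toy = right_tailed_fisher(overlap=3, list_size=4, pathway_size=3, background_size=10)
```

In the setting described above, `list_size` would be the 559 genes with p < 0.05, while the pathway and background sizes depend on the pathway database used.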

Variant Quality Checks with Integrative Genomics Viewer (IGV)
Low-quality variants in both variant- and gene-based results were identified by checking their alignments in IGV [40] (Supplementary Figure S3). For the variant-based results, the problematic variants were removed from the list of results. For the gene-based results, the gene-based tests were re-run with the exclusion of the problematic variants.

Statistical Analysis
Variant-based case-control analyses were performed using a two-tailed Fisher's Exact Test [41]. Gene-based burden tests were performed via the combined multivariate and collapsing test using EPACTS emmaxCMC [39]. p-values were corrected for multiple testing using the Benjamini-Hochberg method to control the false discovery rate [42].
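For readers unfamiliar with these two procedures, the following Python sketch implements a two-tailed Fisher's exact test on a 2x2 table of allele counts and the Benjamini-Hochberg adjustment, using only the standard library. It is a minimal illustration under the usual textbook definitions, not the software used in the study, and the function names are hypothetical.

```python
from math import comb

def fisher_two_tailed(a, b, c, d):
    """Two-tailed Fisher's exact test for the table [[a, b], [c, d]].

    For allele counts: a/b = alternate/reference alleles in cases,
    c/d = alternate/reference alleles in controls. The p-value sums the
    probabilities of all tables no more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    total = sum(
        p for p in (prob(x) for x in range(lo, hi + 1))
        if p <= p_obs * (1 + 1e-7)  # small tolerance for floating-point ties
    )
    return min(1.0, total)

def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Toy inputs for illustration.
p_table = fisher_two_tailed(3, 1, 1, 3)
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

A variant would then be retained if its adjusted p-value falls below the chosen false discovery rate threshold in each comparison.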

Variant Filtering
The 119 discovery cohort patients and 1337 local controls were jointly genotyped using GATK (Figure 1). In total, 1,680,087 variants in either the discovery cohort or local controls passed the filtering criteria of excess heterozygosity (relative to Hardy-Weinberg equilibrium expectations), read depth, allele balance, and genotype quality. Of these, 272,536 variants were recurrent, being present in two or more cases. We then retained variants with a minor allele frequency (MAF) of less than 1% in both gnomAD (EAS) and local control cohorts that were either nonsynonymous with a CADD v1.3 PHRED score greater than 20 or loss-of-function (frameshift insertions and deletions, stop-gains, stop-losses, or start-losses). From these, a final list of 188 rare pathogenic variants in known cancer genes, according to cancer gene databases (COSMIC, NCG) [33,34] or literature sources [17,35,36], was selected for further case-control association analysis.
A list of singleton variants, each present in only one case, is shown in Supplementary Table S2. These variants were excluded from the variant-based tests, but not the gene-based tests.

Variant-Based Case-Control Association Analysis
We performed case-control association analysis on 188 rare pathogenic variants in known cancer genes, comparing their allele frequencies in our discovery case cohort versus both local controls and gnomAD (EAS). Of these 188 variants, we shortlisted 17 variants, all of which were non-synonymous SNVs with a CADD PHRED score greater than 20, in 17 cancer-associated genes with substantially higher allele frequencies in patients with NPC as compared to both local control and gnomAD cohorts (Tables 1 and S3). These include variants in genes encoding FANCE, a subunit of the Fanconi Anaemia (FA) nuclear complex; NKX2-1, a transcription factor and negative regulator of the NF-κB signalling pathway [43]; and other recognized oncogenes JAK2, PRDM16, BMPR1A and tumor-suppressor genes KMT2C, FAT4, and LRP1B [34] (Supplementary Tables S3 and S4). A plot showing the frequency of these 17 variants, together with the age group and family history for each case, is shown in Supplementary Figure S4. An independent case-control association was repeated for these 17 variants, using a validation cohort of early-onset NPC patients (n = 156) and healthy controls from SG10K_Health (n = 9770). Five of the 17 variants, in JAK2, PRDM16, LRP1B, NIN, and NKX2-1, were also present in the validation cohort. All five variants were more common in patients with NPC compared to the SG10K_Health controls (Table 2).

Gene-Based Burden Testing Shows Increased Mutation Burden for NKX2-1 and FANCE
To determine if the same case-control associations are reflected at the gene level, we performed gene-based burden testing on rare pathogenic variants in known cancer genes, comparing the germline mutation burden in discovery cases versus local controls (Figure 1). Gene-based burden testing results showed an association between NKX2-1 and NPC risk, as two of 119 cases (1.7%) but none of the 1337 controls had variants in NKX2-1 (FDR-adjusted p = 0.0144) (Table 3).
For 16 probands from the discovery cohort, DNA samples were available from two NPC-affected family members and 36 unaffected family members. The gene-based burden test was used to verify variants from the variant-based association test results. In this test, FANCE had a significantly larger germline mutation burden, where two affected individuals (11.1%) but no unaffected individuals had a FANCE variant (rs141551053) (Table 4). The two affected individuals with this FANCE variant are siblings: the proband A0118 and affected brother A0118-4. A0118 has three other siblings, all of whom are unaffected by NPC and did not carry this FANCE variant.

Differential Gene Expression in Primary Tumor Versus Normal Tissue
We further checked gene expression in primary tumor versus normal tissue using the TCGA database (Supplementary Figure S5) for the genes with variants repeatedly enriched in variant-based analysis (JAK2, PRDM16, LRP1B, NIN, and NKX2-1) and for FANCE, which was enriched in gene-based burden testing. Except for JAK2 and NKX2-1, all genes were differentially expressed in primary head and neck tumors as compared to normal tissue. All six genes were differentially expressed between primary tumor and normal tissue in at least one of five commonly diagnosed cancers [1].

Pathway Analysis Suggests the Involvement of the Endocytosis and Immune-Modulating Pathways
We performed pathway analysis using significant genes in a gene-based burden test of variants between discovery case and local control cohorts. The top ten canonical pathways based on the significance of enrichment p-values are shown in Table 5 and Supplementary Figure S6. Two possible mechanisms for EBV entry into the cell, the clathrin- and caveolar-mediated endocytosis signalling pathways [44,45], were enriched for mutation burden (p = 0.0274 and p = 0.0441, respectively). Immune-modulating pathways were also significantly enriched, particularly GM-CSF signalling, "JAK1 and JAK3 in γc cytokine signalling", and IL-15 production pathways (p = 0.0092, p = 0.0291, and p = 0.0357, respectively).
Table 5 footnotes.
a The IPA enrichment p-value and IPA overlap test for over-represented biological pathways in the list of genes with significantly different germline mutation burden in cases as compared to controls. The IPA enrichment p-values were calculated using Fisher's exact test. IPA overlap represents the number of genes in our dataset over the total number of genes that make up the pathway in the Ingenuity Knowledge Base.
b The odds ratio and odds ratio p-value test whether case or control individuals are over-represented in the list of individuals with any rare pathogenic variant in each pathway.
Odds ratios and p-values were also calculated for NPC patients and controls with variants in any genes in each of the implicated pathways. Our results show that all ten pathways were significantly enriched in our dataset (OR = 3.1-70.4, p < 0.05) (Table 5).
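The per-pathway odds ratio compares the odds of carrying any rare pathogenic variant in the pathway between cases and controls. The Python sketch below shows the standard calculation from carrier counts. The function name is hypothetical, and the continuity correction applied when a cell is zero (adding 0.5 to every cell) is an assumption of this sketch, since the text does not state how such tables were handled.

```python
def odds_ratio(case_carriers, case_total, control_carriers, control_total,
               correction=0.5):
    """Odds ratio for carrying a pathway variant in cases versus controls.

    If any cell of the 2x2 table is zero, `correction` is added to every
    cell (an assumption of this sketch, not taken from the study).
    """
    a = case_carriers
    b = case_total - case_carriers
    c = control_carriers
    d = control_total - control_carriers
    if 0 in (a, b, c, d):
        a, b, c, d = (x + correction for x in (a, b, c, d))
    return (a * d) / (b * c)

# Made-up counts for illustration: 10 of 100 cases and 20 of 400 controls carry a variant.
example_or = odds_ratio(10, 100, 20, 400)
```

An odds ratio above 1 indicates that carriers are over-represented among cases, and the accompanying p-value can be obtained from a Fisher's exact test on the same table.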

Variants and Genes Implicated in Prior Literature
We were able to replicate six variants and four genes implicated in NPC by previous studies [11,12,13,15,16,18,46,47,48,49,50,51]. Four common SNVs in GABBR1, which encodes a subunit of the GABA receptor, were previously associated with NPC and were replicated in our cohort at p = 0.05 [46,47]. We also replicated two variants in the transcription regulator gene BRD2 [18,48], though these variants were not included in our primary variant-based case-control association test as BRD2 did not satisfy the criterion of being a known cancer gene in cancer gene databases or the literature sources (Supplementary Table S5). Four genes previously associated with NPC, BRD2, CTNNB1, TRMT10B, and IRF5, were also replicated in our cohort's gene-based burden test (p < 0.0394) [11,15,18] (Supplementary Table S6).

Discussion
To identify novel germline variants predisposing one to NPC, we first examined WES of 119 Singaporean patients who had early-onset NPC and/or a family history of NPC, followed by an independent set of 156 early-onset NPC patients. Here, we discovered an initial list of 17 unique variants in 17 genes associated with NPC, five of which (in JAK2, PRDM16, LRP1B, NIN, and NKX2-1) were also associated with NPC in the validation case cohort.
Of these 17 variants, two were previously reported in cancer-related studies. The BMPR1A nonsynonymous SNV (rs55932635) was identified in one of 56 BRCA-negative breast cancer patients in Puerto Rico [52], while the JAK2 nonsynonymous SNV (rs200018153) was found in one of 1487 acute myeloid leukaemia (AML) patients in the United States [53]. The remaining 15 variants, to the best of our knowledge, have not been implicated in cancer, based on a literature search of their RefSNP numbers (Table 3).
We observed that some variants were absent from control cohorts. For example, the APOB variant was absent from all three control cohorts (Tables 1 and 2). Additionally, the FAT3 and ZEB1 variants were absent from two of three control cohorts. In addition, 12 of the 17 variants have not been reported in ClinVar. This could be due to the underrepresentation of Asian variants in the ClinVar database and underscores the necessity for more extensive sequencing of genomes from Asian populations.
Notably, in a family with five siblings, the FANCE nonsynonymous SNV (rs141551053) was detected in two affected siblings but not in three unaffected siblings. Furthermore, gene-based burden testing on 16 probands and 38 family members also showed an association between the FANCE gene and NPC. FANCE encodes a critical subunit of the Fanconi Anaemia (FA) nuclear complex [54], which facilitates DNA repair, replication, and chromosome segregation. The rs141551053 SNV alters an amino acid (P445S) in its C-terminal domain. Heterozygous mutations in FA genes have been associated with various cancer predispositions, including breast, ovarian, brain, and soft tissue cancers [54]. While heterozygous FA gene mutations have not been directly linked to NPC, patients with the autosomal recessive FA syndrome have a much higher risk of developing head and neck squamous cell carcinomas [55].
NKX2-1 was identified from both variant-based case-control association analysis and gene-based burden testing of the discovery cohort, suggesting that it is associated with NPC predisposition. NKX2-1 is a homeobox transcription factor expressed in the adult thyroid, lung, bronchus, and nasopharynx [56]. In lung adenocarcinoma, NKX2-1 has a dual, context-dependent tumour-suppressive or tumour-promoting role [57]. Although NKX2-1 has not been linked to NPC predisposition, its genomic locus on chromosome arm 14q lies in a region that is commonly deleted in NPC [51,58,59,60,61]. Furthermore, NKX2-1 has been observed to downregulate IKKβ in lung adenocarcinoma [43]. IKKβ is an activator of the NF-κB signaling pathway, and the NF-κB signaling pathway is a commonly activated pathway in NPC [62].
Finally, we identified a JAK2 nonsynonymous SNV c.1174G>A (rs200018153) that was enriched in our discovery cohort. This variant was also detected in an independent validation cohort but did not reach statistical significance. Nonetheless, this JAK2 variant had the highest allele frequency of all our 17 variants, in both discovery and validation cohorts. JAK2 is a non-receptor tyrosine kinase, and plays an important role in regulating the JAK/STAT signalling pathway that controls cell proliferation, differentiation, survival, and cytokine-mediated immune responses [63]. Mutations in JAK2, which lead to the hyperactivation of the JAK/STAT pathway, have been observed in many cancer types [64][65][66]. The rs200018153 SNV alters an amino acid (V392M) in the Src homology 2 (SH2) domain of JAK2. The most frequent and well-studied JAK2 mutation is JAK2-V617F, which has been reported to be associated with predisposition to myeloproliferative disorders [67,68]. Other JAK2 somatic mutations found in exon 12, R683 and T875, have also been linked to hematological malignancies [69][70][71]. Although little has been reported on JAK2 mutations in NPC, two research groups have identified amplifications in the JAK2 gene that are responsible for promoting cell proliferation and cell signalling in NPC [50,72]. Moreover, JAK2 has been found to be overexpressed in NPC tissues and high JAK2 expression correlates with poor clinical outcome [73].
There is conflicting evidence on whether the cellular entry of EBV is facilitated by clathrin-mediated or caveolar-mediated endocytosis, or both [44,45]. In our pathway analysis, both clathrin- and caveolar-mediated endocytosis signalling pathways were enriched for germline mutation burden in NPC. We also found enrichment in important immune-modulating pathways: the GM-CSF signalling pathway, which has been implicated in the recruitment of tumor-associated macrophages in NPC [74], and IL-15 production and its downstream JAK1/JAK3-related γc cytokine signalling pathways, which modulate both antiviral and antitumor effects [75] (Table 5).
In recent years, various genomic approaches have been used to interrogate the genetic landscape of NPC susceptibility. For example, SNP genotyping studies have reported germline polymorphisms in the MHC class I and nearby genes such as GABBR1 and HLA-F, where such polymorphisms correlated with NPC risk [46,47]. A more recent exome-wide association study of 31,870 common SNPs involving 5553 patients with NPC has also identified a novel germline polymorphism in RPA1 (rs1131636) conferring tumor progression and therapeutic resistance in NPC, which ultimately affects the patient's survival [16]. BRCA2 germline alterations related to homologous recombination deficiency were found to be associated with poor clinical outcome in patients with NPC [76]. Recently, WES has been increasingly applied to the discovery of cancer predisposition genes associated with NPC. A Japanese study has identified familial NPC-predisposing germline mutations in MLL3 (KMT2C) in three family members of Italian descent [14]. Another WES analysis of germline DNA has also found NPC-associated rare variants in several genes, including BRD2, a chromatin remodelling gene, in Taiwanese NPC families and sporadic NPC cases [48]. Furthermore, two WES studies have revealed germline variants suggesting MST1R as a candidate susceptibility gene for NPC [15,77]. Along with MST1R, Dai and colleagues from Hong Kong identified several candidate NPC-susceptibility genes, including TRMT10B. In this study, we evaluated the prevalence of these variants and genes in our NPC discovery cohort, replicating the association between NPC and six variants in BRD2 and GABBR1, and four genes (BRD2, CTNNB1, TRMT10B, and IRF5).
As a study of rare variants, our analysis is limited by its sample size. This study had a discovery cohort of 119 NPC cases, and a validation cohort of 156 cases. Further, larger association studies will be necessary to detect rarer variants with greater certainty. For example, some of the variants and genes from previous NPC studies were present in our cohorts but failed to reach statistical significance (p < 0.05), perhaps due to an insufficient sample size. Additional studies in larger cohorts of different genetic ancestry and meta-analyses are required to assess the frequency and importance of these NPC susceptibility variants.

Conclusions
In summary, we identified 17 germline variants in 17 genes that are associated with NPC predisposition. Six of these variants, in the six genes JAK2, PRDM16, LRP1B, NIN, NKX2-1, and FANCE, were further supported by at least one of four additional tests: a variant-based case-control association analysis on an independent validation cohort; gene-based testing on the original discovery cohort; gene-based testing of family members of the discovery cohort; and co-segregation analysis of family members of the discovery cohort. Our study provides new insights into the genetic susceptibility to NPC, which will facilitate further investigations with the potential to translate the findings into future clinical practice. These findings warrant further functional characterization of the FANCE, NKX2-1, and JAK2 variants and elucidation of the mechanisms by which these genes contribute to NPC development.
Supplementary Materials: The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/cancers14153680/s1, Figure S1: Principal component analysis (PCA) plot of germline genotypes for the discovery and local control cohorts, with genotypes from the 1000 Genomes Project (1KGP) as reference; Figure S2: QQ-plot R² values for different read depth cut-offs and number of PCA covariates; Figure S3: Representative IGV alignments of problematic variants manually removed via IGV checks; Figure S4: Oncoplot of 17 variants in 17 prioritized candidate genes, showing the frequency of each variant; Figure S5: Gene expression for six result genes in primary tumors versus normal tissue in six cancers; Figure S6: Top 10 canonical pathways identified by IPA Pathway Analysis; Table S1: Demographic and family history characteristics of NPC patients in the discovery and validation cohorts; Table S2: Table of rare predicted pathogenic variants in prioritised genes found only in single patients; Table S3: Table of 17 prioritised genes with variants in the NPC discovery cohort (n = 119), annotated with supporting information from cancer gene databases; Table S4: Pathogenicity of 17 variants in 17 known or candidate cancer genes using in silico prediction tools and database classifications; Table S5: Variants associated with NPC in the prior literature; Table S6: Genes associated with NPC in the prior literature.