Rare Germline Variants in Chordoma-Related Genes and Chordoma Susceptibility

Simple Summary Chordoma is an extremely rare bone cancer that has not been fully characterized and few risk factors have been identified, highlighting the need for improving our understanding of the disease biology. Our study aims to identify chordoma susceptibility genes by investigating 265 genes involved in chordoma-related signaling pathways and other biological processes on germline DNA of 138 chordoma patients of European ancestry compared to internal control datasets and general population databases. Results were intersected with whole genome sequencing data from 80 skull-base chordoma patients of Chinese ancestry. Several rare loss-of-function and predicted deleterious missense variants were enriched in chordoma cases in both datasets, suggesting a complex model of pathways potentially involved in chordoma development and susceptibility, warranting further investigation in larger studies. Abstract Background: Chordoma is a rare bone cancer with an unknown etiology. TBXT is the only chordoma susceptibility gene identified to date; germline single nucleotide variants and copy number variants in TBXT have been associated with chordoma susceptibility in familial and sporadic chordoma. However, the genetic susceptibility of chordoma remains largely unknown. In this study, we investigated rare germline genetic variants in genes involved in TBXT/chordoma-related signaling pathways and other biological processes in chordoma patients from North America and China. Methods: We identified variants that were very rare in general population and internal control datasets and showed evidence for pathogenicity in 265 genes in a whole exome sequencing (WES) dataset of 138 chordoma patients of European ancestry and in a whole genome sequencing (WGS) dataset of 80 Chinese patients with skull base chordoma. Results: Rare and likely pathogenic variants were identified in 32 of 138 European ancestry patients (23%), including genes that are part of notochord development, PI3K/AKT/mTOR, Sonic Hedgehog, SWI/SNF complex and mesoderm development pathways. Rare pathogenic variants in COL2A1, EXT1, PDK1, LRP2, TBXT and TSC2, among others, were also observed in Chinese patients. Conclusion: We identified several rare loss-of-function and predicted deleterious missense variants in germline DNA from patients with chordoma, which may influence chordoma predisposition and reflect a complex susceptibility, warranting further investigation in large studies.


Introduction
Chordoma is a rare, malignant bone tumor that occurs in the axial skeleton at cranial, spinal and sacral sites. Chordoma tumors are considered slow growing; however, local recurrences are frequent and treatment options are limited, particularly for those with advanced disease, highlighting the need for improving our knowledge of the disease biology and the discovery of novel druggable targets [1][2][3]. The clinical progression of skull base chordoma is highly variable [4] and there are no validated clinical or molecular prognostic markers.
Chordoma is hypothesized to originate from remnants of the notochord, which is a mesodermal structure in the embryo and signal tissues for organization and differentiation. Genes implicated in chordoma formation include the T-brachyury gene (TBXT), which is required for differentiation of the notochord and formation of mesoderm during posterior development [5].
TBXT is the most relevant susceptibility gene identified in chordoma, with germline TBXT duplication conferring susceptibility to familial chordoma [6] (Yang XR et al. 2009) and single-nucleotide polymorphisms (SNPs) in TBXT associated with chordoma risk in both sporadic and familial chordoma [7,8]. The role of TBXT in disease pathogenesis has also been shown to be critical: TBXT expression is considered as a diagnostic marker for chordoma [9,10] and copy number gains of TBXT have been identified in chordoma tumors [11,12]. Further, chordoma growth in cell models could be inhibited by TBXT silencing [12][13][14].
Despite some progress, etiologic factors and genetic susceptibility of chordoma are not well understood and have not been extensively studied. The goal of this study was to identify and characterize rare and pathogenic germline variants in genes of interest for chordoma pathogenesis through next generation sequencing of 138 chordoma cases of European ancestry. Genes of interest from the initial evaluation were also examined in a set of 80 skull-base chordoma cases from an Asian population for replication. Due to the critical relevance of TBXT in the biological mechanisms of chordoma, we investigated germline variants in genes related to TBXT, as well as other genes involved in mesoderm and notochord development. We also evaluated genes that have been associated with chordoma tumor development, including several members of the SWI/SNF complex, PI3K/AKT/mTOR and Sonic Hedgehog pathways. Using this approach, we identified several potential chordoma susceptibility genes that warrant investigation in future studies.

Study Populations
The current study included 138 patients of European ancestry from the United States and Canada, processed using whole exome sequencing (WES). All diagnoses of chordoma were confirmed by reviewing pathologic slides or reports, medical records or death certificates.
We used WES data from 598 healthy Caucasian cancer-free unrelated individuals, from the Prostate, Lung, Colorectal and Ovarian Cancer Screening Trial (PLCO) [15] and Cancer Prevention Study (CPS) as reference/controls for comparison with the European ancestry patient samples and for conducting rare variant burden tests. These controls were sequenced and analyzed using the same sequencing platform and ensemble variant calling pipeline used for the European ancestry chordoma cases.
We also analyzed germline whole genome sequencing (WGS) data from 80 Chinese skull-base chordoma patients to evaluate candidate genes identified from our main analysis of the European ancestry chordoma patients. In the Chinese patient cohort, all patients were diagnosed with skull base chordoma and underwent endoscopic endonasal surgeries at the Neurosurgery Department of Beijing Tiantan Hospital, Capital Medical University, between October 2010 and November 2017. All methods were performed in accordance with the relevant guidelines and regulations. This research has been approved by the National Cancer Institute (NCI) Institutional Review Board (IRB) (Protocols: 10CN188, 04/08/2020; 78C-0039, 01/08/2021) and the ethics committee of the Beijing Tiantan Hospital (IRB code: 2009-47, 20 December 2009).

Library Construction, Sequencing and Bioinformatics Analysis
For the European ancestry and Chinese cases, germline genomic DNA was extracted from saliva and peripheral blood. WES was performed at the Cancer Genomics Research Laboratory, National Cancer Institute (CGR, NCI). Briefly, SeqCAP EZ Human Exome Library v3.0 (Roche NimbleGen, Madison, WI, USA) was utilized for exome sequence capture. Exome sequencing was performed to a sufficient depth to achieve a minimum coverage of 15 reads in at least 80% of the coding sequence from the UCSC hg19 transcripts database.
After the exclusion of reads containing adapter contamination and low-quality nucleotides, the data were mapped to the reference human genome (UCSC hg19) using the Burrows-Wheeler Aligner software [16]. SAMtools [17], Picard and GATK [18] were used to sort aligned files and perform base quality recalibration, duplicate reads removal and local realignment to generate final BAM files for mutation calling.
Three variant callers, GATK HaplotyperCaller, UnifiedGenotyper and FreeBayes [19], were used to call germline variants. We included all target regions, as well as a 250 bp flanking region on each side. An ensemble variant calling pipeline was then implemented to integrate the analysis results from the above mentioned three callers. Subsequently, the ensemble variant calling pipeline that applies a support-vector machine (SVM) learning algorithm was used to identify an optimal decision boundary based on the variant calling results out of the multiple variant callers to produce a more balanced decision between false positives and true positives.
For the Chinese cases, WGS was carried out by the Novogene Corporation (Beijing, China) on the Illumina HiSeq X platform with an average depth of 41X. Variant calling, QC and filtering steps were conducted by CGR, NCI, using similar approaches as those used for WES of the European ancestry chordoma samples.
Gene prioritization: We assembled a set of 265 genes chosen from the published literature for their role in chordoma physiopathology. The following functional categories were included: TBXT super-enhancer associated genes and chordoma essentiality genes, which are essential for the proliferation and survival of cancer cells in CRISPR-CAS9 screens, notochord related genes, key genes in mesoderm commitment pathway, SWI/SNF complex, PI3K/Akt/mTOR pathway and Sonic Hedgehog pathway. Variants were filtered based on quality control measures, frequency of minor allele in populations and pathogenicity criteria.
Variants quality control: Filtering of WES variants was based on the following criteria: Variants flagged with our pipeline quality control metric (CScorefilter), read depth <10, ABHet <0.2 or >0.8, or called by only one of the three variant callers used were excluded. Only non-synonymous variants, including frameshift, stop/gain, inframe deletion or insertion, missense and splicing site variants, were studied.
Frequency and specificity: Variants were excluded if both of the following criteria were met: (1) they had a minor allele frequency (MAF) of >0.001 in any of the consulted population databases (the 1000 Genomes Project, Exome Sequencing Project (ESP6500) and Exome Aggregation Consortium (ExAC) in European population); (2) they were observed in >2 families from an in-house database (CGR, NCI) of~2000 exomes in~1000 cancer-prone families, excluding chordoma cancer families.
Pathogenicity evidence and in silico analysis: Variants passing quality control and frequency filters in both datasets were evaluated for pathogenicity. Variants were studied further if classified as one of the following: (1) High impact variants (frameshift indels, stop gain/loss, or known splice sites). The first two in silico algorithms are ensemble prediction scores that incorporate results from nine algorithms (SIFT, PolyPhen-2, GERP ++, Mutation Taster, Mutation Assessor, FATHMM, LRT, SiPhy and PhyloP) and allele frequency [20]. (3) Variants classified as pathogenic (P) or likely pathogenic (LP) by ClinVar [21]. (4) Disease-causing mutation (DM) by HGMD [22]. Details of the criteria for classification of pathogenicity are shown in Supplementary Table S1 [23].
Similar filtering as described for the European ancestry dataset was performed in the Chinese dataset processed with WGS. For this latter dataset, only protein-coding genes, for which potentially pathogenic variants were observed in the European ancestry dataset, were considered. Variants were excluded based on MAF of >0.001 in any of the databases in any population, including East Asian. Pathogenicity evidence and in silico analysis were performed as described above for the European ancestry dataset.

Rare Variant Burden Test
We conducted rare variant burden tests for genes with rare and potentially pathogenic variants identified from the European ancestry cohort in 138 chordoma patients and 598 non-cancer control samples from the PLCO and CPS-II studies with European ancestry and processed at CGR, NCI. We used SKAT-O statistics [24], which is a linear combination of the burden test (aimed to test effect size of variants with the same direction in cases and controls) and variance component test (designed to test effect size of variants with different directions in cases and controls).

Results
The main analysis included 138 European ancestry patients; the average age at diagnosis was 46.9 years (range 7-78) and the ratio of male to female patients was 0.7. The vast majority of patients (97%) had classic chordoma histology. The chordoma site distribution was 55.7% skull base, 23% spinal and 20.3% sacral. The Chinese cases were all skull base chordoma, with a mean age of 44.7 (7-79) years, the majority being male (62.5%) and having classic chordoma histology (80%).
We assembled a set of 265 genes chosen from the published literature for their role in chordoma physiopathology, including TBXT super-enhancer associated genes and chordoma essentiality genes in CRISPR-CAS9 screens (26), notochord related genes (21), mesoderm commitment pathway (166), SWI/SNF complex (18), PI3K/Akt/mTOR pathway (33) and Sonic Hedgehog pathway (28). Some of these genes are present in more than one category (Supplementary Table S2). Figure 1 shows the stepwise pipeline used for selecting, prioritizing and filtering the genes and variants for the evaluation.
We identified 34 potentially pathogenic variants (7 loss of function and 27 missense variants) in 2 notochord development genes, 4 Sonic Hedgehog pathway genes, 16 mesoderm commitment pathway genes, 4 PI3K/AKT/mTOR pathway genes, 2 SWI/SNF complex pathway genes, 2 TBXT super-enhancer genes and 1 driver gene (Table 1). These variants were identified in 32 of 138 European ancestry patients (23%). Tables 1 and 2 show patients' characteristics, the type, location and frequencies of these variants, evidence for pathogenicity and the involved pathways/biologic processes. Supplementary Tables  S3 and S4 show details of the variants identified in the European ancestry and Chinese datasets, respectively. We identified 34 potentially pathogenic variants (7 loss of function and 27 missense variants) in 2 notochord development genes, 4 Sonic Hedgehog pathway genes, 16 mesoderm commitment pathway genes, 4 PI3K/AKT/mTOR pathway genes, 2 SWI/SNF complex pathway genes, 2 TBXT super-enhancer genes and 1 driver gene (Table 1). These variants were identified in 32 of 138 European ancestry patients (23%). Tables 1 and 2 show patients' characteristics, the type, location and frequencies of these variants, evidence for pathogenicity and the involved pathways/biologic processes. Supplementary  Tables S3 and S4 show details of the variants identified in the European ancestry and Chinese datasets, respectively.   Several variants were classified as DM/P by HGMD/ClinVar, such as GDF3 (rs146973734), AKT1 (rs397514644) and EXT2 (rs138495222). Many of the variants were either extremely rare in the general population or absent from the internal control samples and population databases such as variants identified in TBXT, ATP6V1B2 and COL2A1.
Burden test: To identify genes with a higher genetic burden of rare variants, we conducted a rare variant burden test of the 265 genes by comparing cases to controls of the same ancestry and sequenced using similar analytics and pipelines. Burden tests were performed in several different ways: (1) analyzing all rare variants in a targeted gene; (2) examining only rare pathogenic variants in a targeted gene; (3) examining all genes in a process or pathway. The small number of cases carrying variants in the same genes limited the power to evaluate statistical significance for individual genes. As a result, none of the examined genes showed statistical significance after correction for multiple testing (Supplementary Table S5). However, fourteen rare pathogenic variants were found only in chordoma cases and were absent from all 598 controls (variants in these genes: AKT1, ATP6V1B2, DEPTOR, FOXA2, HHIP, NFE2L2, PAX6, PDK1, PRKACA, PTCH1, SMARCB1, SOX21, SOX9 and TBXT). Evaluation of the biologic processes or pathways showed suggestive evidence for association only for the notochord-related genes (SKATO P = 0.06; Burden P = 0.03).
We investigated the 31 genes (see Table 1) identified in the main analysis in an independent dataset of 80 Chinese chordoma cases. After applying the same filtering criteria as for the main analysis (see methods), we identified rare pathogenic variants in 11 genes in 18 of 80 (22.5%) Chinese patients. Most variants were very rare, with MAFs ranging between 1.74 E-04 and zero in all control databases (Table 2). Although most genes/variants were observed in only a single Chinese chordoma case, multiple potentially pathogenic variants were observed in COL2A1, LRP2, TCF7L1 and TSC2. In particular, the two variants in TSC2 were observed in two (g. 2134572) or three (g. 2121610) different patients, respectively. Further, the TSC2 variant observed in three Chinese chordoma cases was classified as DM by HGMD (Table 2).

Discussion
In this study, we investigated the role of rare germline variants in genes involved in chordoma related processes/pathways in sporadic chordoma patients from a European ancestry and a Chinese population. We identified a number of rare loss-of-function and predicted deleterious missense variants that were enriched in chordoma cases, suggesting some of these genes may contribute to chordoma susceptibility.
Genes with rare germline variants enriched in chordoma cases were involved in signaling pathways and physiopathological processes that are affected in chordoma at the somatic level (e.g., notochord development, PI3K/AKT/mTOR and SWI/SNF pathways), which underscores the importance of pathway/process-level analysis of germline alterations to identify potentially novel genes. Increasing evidence suggests that specific germline variants might determine which somatic events and mutations are generated and selected in cancer cells during tumorigenesis, such as mutational signatures, allele-specific copy number changes, or altered signaling pathways [25].
Overall, we report that~23% of chordoma cases from two independent populations carried rare germline variants that are potentially pathogenic in selected genes. Eleven genes were shared by the two datasets, including COL2A1 (notochord), EXT1 (mesoderm development), PDK1 (PI3K/Akt/mTOR) and LRP2 (Sonic Hedgehog) and thus worthy of additional studies.
Some of the genes identified in this study have previously demonstrated important biological functions relevant to chordoma development. For example, a recent study found that TBXT is associated with a 1.5 Mb region containing "super-enhancers" and is the most highly expressed super-enhancer-associated transcription factor [26]. ATP6V1B2 was among the genes that were considered as both super-enhancer-associated and essential for chordoma cell viability. sgRNA-mediated TBXT repression experiments have been shown to reduce the expression of this gene and the direct binding of TBXT at the ATP6V1B2 locus has been demonstrated [26]. COL2A1 was also shown to be down-regulated by TBXT expression in a clival chordoma cell line (UM-Chor1) [26]. Interestingly, Col2a1-null mice lack intervertebral discs and cannot remove the notochord [27], suggesting the potential importance of this gene in chordoma development. We also identified rare variants in SHH, a member of the Sonic Hedgehog pathway, which is secreted by fetal notochord and participates in vertebrate patterning of the neural tube during embryonic development via binding to its ligand PTCH1. Interestingly, Shh-expressing notochord cells remain in the vertebral column after embryonic development ends, resembling notochord remnants [28] and suggesting that it may cause transformation of this remnant tissue.
The SMARCB1 gene is a critical component of the SWI/SNF chromatin-remodeling complex which antagonizes the histone methyltransferase EZH2, a molecule that is being targeted by several inhibitors in clinical trials [29]. Inactivation of SMARCB1 has been described as a critical event in various tumors, including malignant rhabdoid tumors and epithelioid sarcoma [30]. Loss of SMARCB1 expression in chordoma has also been identified in poorly differentiated chordomas [31]. Recently, other components of SWI/SNF complex have also been implicated in chordoma pathogenesis, such as PBRM1 as a potential driver gene for chordoma [11].
PI3K/Akt/mTOR signaling in the context of chordoma was first suggested after numerous reports of chordoma tumors in individuals with tuberous sclerosis complex (TSC). Chordoma also occurs in association with TSC, an autosomal dominant neurocutaneous syndrome characterized by abnormal tissue growth in multiple organ systems caused by inactivating germline mutations in either TSC1 or TSC2 [32]. None of the patients in our study with TSC variants have clinical manifestations of TSC. Nevertheless, mutations in PI3K/AKT/mTOR genes have been reported in chordoma tumors with potential therapeutic relevance [11].
The strengths of our study include the careful and comprehensive literature review to select chordoma candidate genes, the evaluation of germline rare variants in two independent chordoma sequencing datasets and the inclusion of controls without cancer from the same population ancestry as our European ancestry chordoma cases that were processed using the same platforms and bioinformatics pipeline. However, we cannot rule out the possibility that other genes, including cancer-predisposing genes, may also play a role in susceptibility.
The main limitation of our study is the relatively small number of chordoma cases in each dataset to statistically test case-control differences at an individual gene level, with most variants found only in single chordoma cases. In addition, we were unable to assess family history for most cases. There was also limited representation of histological subtypes for statistical evaluation of association with the variants identified.
In this study, approximately 23% of patients with chordoma from two different population had rare pathogenic/likely pathogenic variants in various biological processes, suggesting a complex model of pathways potentially important for susceptibility and development, including notochord development, PI3K/AKT/mTOR, Sonic Hedgehog and SWI/SNF complex. Future large studies, particularly at the consortium level, are needed to follow up on our findings for a better understanding of genetic susceptibility of this extremely rare cancer.

Conclusions
Our study searched for germline variants in signaling pathways and other biological processes previously identified in the disease's pathogenesis in patients from two independent populations. We identified rare loss-of-function and predicted deleterious missense variants enriched in chordoma cases, suggesting a complex model of pathways potentially involved in chordoma development and susceptibility. Further evaluation of the identified candidate genes is needed to determine their importance in chordoma risk.   Data Availability Statement: According to US NIH policy, the exome sequencing data for the European ancestry dataset will be released to the database of Genotypes and Phenotypes (dbGAP).