Potential Involvement of NSD1, KRT24 and ACACA in the Genetic Predisposition to Colorectal Cancer

Simple Summary
Methods used for the identification of hereditary cancer genes have evolved in parallel to technological progress; however, much of the genetic predisposition to cancer remains unexplained. A new in silico method based on Knudson’s two-hit hypothesis recently identified ~50 putative cancer predisposing genes, but their actual association with cancer has not yet been validated. In our study, we aimed to assess the involvement of these genes in familial/early-onset colorectal cancer (CRC) using different lines of evidence. Our results indicated that most of those genes were not associated with a genetic predisposition to CRC, but suggested a possible association for NSD1, KRT24 and ACACA.

Abstract
The ALFRED (Allelic Loss Featuring Rare Damaging) in silico method was developed to identify cancer predisposition genes through the identification of somatic second hits. By applying ALFRED to ~10,000 tumor exomes, 49 candidate genes were identified. We aimed to assess the causal association of the identified genes with colorectal cancer (CRC) predisposition. Of the 49 genes, NSD1, HDAC10, KRT24, ACACA and TP63 were selected based on specific criteria relevant for hereditary CRC genes. Gene sequencing was performed in 736 patients with familial/early-onset CRC or polyposis without germline pathogenic variants in known genes. Twelve (predicted) damaging variants in 18 patients were identified. A gene-based burden test in 1596 familial/early-onset CRC patients, 271 polyposis patients, 543 TCGA CRC patients and >134,000 controls (gnomAD, non-cancer) revealed no clear association with CRC for any of the studied genes. Nevertheless, (non-significant) over-representation of disruptive variants in NSD1, KRT24 and ACACA in CRC patients compared to controls was observed. A somatic second hit was identified in one of 20 tumors tested, corresponding to an NSD1 carrier. In conclusion, most genes identified through the ALFRED in silico method were not relevant for CRC predisposition, although a possible association was detected for NSD1, KRT24 and ACACA.


Introduction
Estimates indicate that ~4% to 15% of all tumors, depending on tumor type, are considered hereditary [1], with genetic alterations being the key determinants of cancer development. Methods used for the identification of hereditary cancer genes have evolved in parallel to technological progress. Classical linkage analysis of large pedigrees followed by positional cloning, and the more recent use of high-throughput sequence capture methods and next generation sequencing technologies, have allowed for the discovery of hereditary cancer genes. Uncovering cancer-predisposing genes improves the molecular diagnosis and personalized surveillance of mutation carriers based on the risks associated with the corresponding gene [2,3].
In 2018, Park et al. published a new in silico method (ALFRED, for Allelic Loss Featuring Rare Damaging) that applies Knudson's two-hit hypothesis to identify putative cancer-predisposing genes, and applied it to approximately 10,000 tumor exomes [5]. Specifically, they performed a pan-cancer analysis in which they measured the enrichment of rare (MAF < 0.1% according to ExAC) damaging (stop-gain, frameshift, canonical splice-site, or missense predicted pathogenic) germline variants in samples with putative somatic loss of heterozygosity (LOH) for a total of 2983 genes carrying at least five rare (predicted) damaging germline variants. The authors identified 13 genes individually enriched for rare (predicted) damaging variants in tumors. Of those, five genes (BRCA1, ATM, BRCA2, NSD1 and TPCN2) were enriched for germline variants in cases compared to controls. They estimated that germline damaging variants in the 13 proposed genes might explain ~2.3% of the tumors included in The Cancer Genome Atlas (TCGA), which includes 17 individual cancer types. In addition to the 13 genes identified at a false discovery rate of 20%, 12 more, including MLH1, were identified in the range of 20-50% false discovery rate, and 24 more genes in the range of 50-60%, making a total of 49 candidate genes for cancer predisposition.
Here we aimed to evaluate the actual involvement in CRC predisposition of the genes identified through the ALFRED in silico method.

Materials and Methods
We selected five of the 49 most enriched genes proposed by Park et al. based on specific criteria considered relevant for CRC predisposition, with the aim of identifying the best candidates for CRC. We next performed mutational screening of the selected genes in 736 unrelated patients with familial/early onset MMR-proficient CRC or polyposis, followed by co-segregation analyses in the relatives of variant carriers. We evaluated the mutational status of the selected genes in additional series of CRC patients with publicly available sequencing data to assess the enrichment of rare damaging germline variants in cases compared to controls (gene burden test). The workflow of the study is summarized in Figure 1.

Patients and Samples
The study included 736 patients (not related, and >99% of European origin): 465 familial/early onset MMR-proficient nonpolyposis CRC patients (Table S1), 177 patients with classic or attenuated adenomatous polyposis (Table S2), and 94 patients with serrated/hyperplastic polyposis (Table S3). The included familial/early-onset nonpolyposis CRC patients had been consecutively recruited through the clinical Hereditary Cancer Program of the Catalan Institute of Oncology (Spain), selected based on the absence of MMR deficiency, assessed by immunohistochemistry and/or microsatellite analysis, and on the absence of germline pathogenic variants in MUTYH (biallelic), NTHL1 (biallelic) or the exonuclease domains of POLE and POLD1.

Likewise, polyposis patients were consecutively recruited through the same hereditary cancer clinical program, and they were selected for the current study based on the absence of germline pathogenic variants in APC, MUTYH, POLE, POLD1, NTHL1 or MSH3 in the case of adenomatous polyposis patients, and on the absence of germline pathogenic variants in RNF43, NTHL1 or MSH3 in the case of serrated polyposis patients [6][7][8][9].
Patients provided written informed consent and the study received the approval of the IDIBELL Ethics Committee (PR073/12).
Genomic DNA from peripheral blood was extracted using the FlexiGene DNA kit (Qiagen, Valencia, CA, USA).

Germline Mutation Identification in Pooled Samples
The abovementioned 736 patients were screened for mutations in NSD1, HDAC10, KRT24, ACACA and TP63 using a combination of PCR amplification in pooled DNAs and targeted next generation sequencing, as previously described [10,11]. Eight DNA pools were generated by adding equimolecular quantities of each sample (48-96 samples per pool). Amplification of the genes' coding exons (+/− 20 bp flanking regions) was performed in each pool, using Phusion High-Fidelity DNA Polymerase (New England Biolabs, Ipswich, MA, USA) (Primers used are listed in Table S4). Each PCR product was processed as previously described [11,12]. DNA libraries were generated and sequencing at high coverage was performed on a HiSeq-4000 (Illumina, San Diego, CA, USA) at the Centro Nacional de Análisis Genómico (CNAG, Barcelona, Spain). Sequencing data analysis was performed as previously described [11]. The median number of reads per base obtained for all coding regions (+/− 5 bp flanking regions) analyzed was 96,441 (range: 188-420,733 reads/base).
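To give a sense of why such deep coverage matters in pooled sequencing, the following minimal sketch estimates how many reads would be expected to carry the variant allele of a single heterozygous carrier in a pool. It assumes diploid samples, perfectly equimolar pooling and unbiased amplification, so that the carrier contributes 1 of 2N alleles; the function name is hypothetical and is not part of the previously described analysis pipeline [11].

```python
def expected_variant_reads(samples_in_pool: int, reads_at_position: int) -> float:
    """Expected number of reads carrying the variant allele when exactly one
    sample in the pool is a heterozygous carrier.

    Assumption (not from the paper): ideal equimolar pooling and unbiased PCR,
    so the variant allele fraction is 1 / (2 * samples_in_pool).
    """
    if samples_in_pool <= 0 or reads_at_position < 0:
        raise ValueError("pool size must be positive and read depth non-negative")
    allele_fraction = 1 / (2 * samples_in_pool)
    return allele_fraction * reads_at_position


# Pool sizes and read depths taken from the text: 48-96 samples per pool,
# minimum 188 and median 96,441 reads per base.
for pool_size in (48, 96):
    for depth in (188, 96_441):
        reads = expected_variant_reads(pool_size, depth)
        print(f"pool of {pool_size}, depth {depth}: ~{reads:.1f} variant reads")
```

Under these idealized assumptions, a single carrier in a 96-sample pool would be represented by roughly 500 reads at the median depth, but by only about one read at the minimum depth of 188, which is one reason why independent validation of each call in the individual samples is useful.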

Validation of the Obtained Results and Carrier Identification
Variant-specific KASP genotyping assays (LGC Genomics, Hoddesdon, UK) and direct automated (Sanger) sequencing were used for validation of the targeted next generation sequencing results in the pooled samples, and for identification of the carrier(s) of the corresponding variant (primers in Table S4). Sequencing was performed at STAB VIDA (Caparica, Portugal), and sequencing data were analyzed with SeqMan Pro (Lasergene, DNASTAR, Madison, WI, USA).
Evolutionary conservation was assessed using PhyloP and PhastCons (obtained from Mutation Taster), based on alignments of genome sequences from 46 different species.

Co-Segregation and Second Hit Analyses
Families carrying disruptive, splice-site, and missense variants predicted deleterious by >40% of the 12 in silico predictors mentioned above were further studied. Sanger sequencing was used to check for the presence of the variant in available samples from relatives. Second-hit analysis in tumors, considering the presence of somatic mutations or loss of heterozygosity, was performed using direct automated (Sanger) sequencing. Sanger sequencing, for either co-segregation or second-hit analysis, was performed at STABVIDA (Caparica, Portugal), and sequencing data analysis was carried out with SeqMan Pro (Lasergene, DNASTAR, Madison, WI, USA).

Gene Burden Test
Results obtained in our study were analyzed in combination with the data obtained from the Cancer Variation Resource (CanVar; https://canvar.icr.ac.uk/) (accessed on 1 September 2021), which include exome sequencing data from 1006 early-onset CRC patients, 863 of whom do not carry germline pathogenic variants in known CRC predisposing genes [27,28].
In addition, blood DNA (germline) exome sequencing data from 543 CRC patients whose tumors are included in the TCGA repository were analyzed. TCGA sequencing data were obtained from NCBI dbGaP (the Database of Genotypes and Phenotypes) after receiving authorization (access request #92142-3). TCGA exomes were analyzed according to the following workflow: FASTQ files were mapped to the reference genome GRCh37/hg19 using the Burrows-Wheeler Aligner (BWA-MEM). Variant calling was performed using the Haplotype Caller (GATK4), results were normalized, and single nucleotide variants and indels were filtered based on the following criteria: read depth < 8, Fisher strand > 25.0, quality by depth < 6.0, and RMS mapping quality < 50.0.
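The hard-filtering thresholds listed above can be sketched as a simple predicate. This is an illustrative reading only: it assumes that a call meeting any one of the four criteria was discarded, which the text does not state explicitly, and the class and function names are hypothetical rather than taken from the actual workflow.

```python
from dataclasses import dataclass


@dataclass
class VariantCall:
    depth: int                  # read depth (DP)
    fisher_strand: float        # Fisher strand bias (FS)
    quality_by_depth: float     # quality by depth (QD)
    rms_mapping_quality: float  # RMS mapping quality (MQ)


def failed_filters(call: VariantCall) -> list[str]:
    """Return the names of the thresholds a call violates (empty list = pass).

    Thresholds are those quoted in the text; treating any single violation
    as grounds for exclusion is an assumption of this sketch.
    """
    reasons = []
    if call.depth < 8:
        reasons.append("low_depth")
    if call.fisher_strand > 25.0:
        reasons.append("strand_bias")
    if call.quality_by_depth < 6.0:
        reasons.append("low_quality_by_depth")
    if call.rms_mapping_quality < 50.0:
        reasons.append("low_mapping_quality")
    return reasons


# Hypothetical calls, for illustration only.
good = VariantCall(depth=30, fisher_strand=2.1, quality_by_depth=14.0, rms_mapping_quality=60.0)
poor = VariantCall(depth=5, fisher_strand=30.0, quality_by_depth=14.0, rms_mapping_quality=60.0)
print(failed_filters(good))
print(failed_filters(poor))
```

Returning the list of violated thresholds, rather than a single boolean, makes it easier to see which criterion removes most calls in a given data set.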
Whenever available, additional gene-specific published results were included in the burden analysis, such being the case for NSD1 and ACACA. TCGA tumor somatic data from the patients with a germline (predicted) damaging variant in the selected genes were obtained via the NCI's Genomic Data Commons (GDC) platform [29].
For comparison purposes, we used the gnomAD v.2.1.1 non-cancer individuals as the control population (n = 134,187 individuals; source: http://gnomad.broadinstitute.org/, accessed on 1 September 2021). Based on the ethnicities of the patients (familial/early-onset CRC and polyposis patients were mostly of non-Finnish European origin; TCGA CRC patients were 51% white, 12% black or African American, 2% Asian, and the other 35% had no information on ethnicity), we decided to repeat the burden tests using the data obtained from the gnomAD v.2.1.1 non-cancer, non-Finnish European subpopulation as controls (n = 59,095 individuals).

Statistical Analysis
Gene-based burden tests, i.e., comparisons of the frequencies of (predicted) damaging variants in patients and controls, were performed using Fisher's exact test (two-sided). Results were considered statistically significant when p < 0.01 because five genes were analyzed. Statistical tests and odds ratio (OR) calculations were performed with R version 3.5.1 (RStudio Cloud; RStudio, Boston, MA, USA).
Figure 1 shows the workflow of the study and a summary of the results obtained.
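As an illustration of the gene-based burden test described in the Statistical Analysis section, the sketch below implements a two-sided Fisher's exact test on a 2x2 table of carriers and non-carriers in cases and controls, using only the Python standard library. It is not the R code used in the study. The carrier counts are hypothetical, the odds ratio shown is the simple sample cross-product (R's `fisher.test` reports a conditional maximum-likelihood estimate instead), and reading the p < 0.01 threshold as 0.05 divided by five genes (a Bonferroni-style correction) is an assumption consistent with, but not stated in, the text.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact p-value for the table [[a, b], [c, d]].

    Rows: cases / controls. Columns: carriers / non-carriers.
    Sums the probabilities of all tables (with fixed margins) that are no
    more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total_tables = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x carriers among the cases.
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    lowest, highest = max(0, col1 - row2), min(row1, col1)
    p_observed = prob(a)
    tolerance = 1 + 1e-7  # guards against floating-point ties
    p_value = sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * tolerance
    )
    return min(1.0, p_value)


def sample_odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Cross-product odds ratio (a*d)/(b*c); undefined if b or c is zero."""
    return (a * d) / (b * c)


# Hypothetical counts for illustration only (NOT data from the study):
# 4 carriers among 2000 cases vs. 90 carriers among 134,187 controls.
cases_carriers, cases_total = 4, 2000
controls_carriers, controls_total = 90, 134_187
table = (
    cases_carriers, cases_total - cases_carriers,
    controls_carriers, controls_total - controls_carriers,
)
alpha = 0.05 / 5  # assumed Bonferroni-style threshold for five genes
p = fisher_exact_two_sided(*table)
print(f"OR = {sample_odds_ratio(*table):.2f}, p = {p:.4f}, significant: {p < alpha}")
```

With carrier counts this small, even a several-fold difference in frequency may fail to reach the corrected threshold, in line with the sample-size limitation acknowledged in the Discussion.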

Gene Selection
With the aim of assessing the actual involvement of the proposed genes in CRC predisposition, we first carried out a pre-selection of the putative cancer predisposing genes identified by Park et al. [5]. To do so, we evaluated the characteristics of the 49 most frequently enriched genes in the original publication based on the following parameters: (i) relevance of the encoded protein in colorectal carcinogenesis; (ii) gene function, focused on relevant hereditary CRC pathways such as DNA repair, Wnt, BMP/TGF-β or mTOR pathways; (iii) expression in normal colon mucosa; (iv) status as a cancer driver gene (https://www.intogen.org, accessed on 1 February 2020); (v) resistance to mutation, measured by a low observed vs. expected ratio of loss-of-function (LoF) variants in the control population (source: gnomAD v.2.1.1); and (vi) a frequency of loss-of-function variants in controls (n = 1609) that did not exceed their frequency in familial/early-onset CRC patients (n = 1006) (case-control data obtained from Chubb et al. [28]). Considering the mentioned characteristics, five of the 49 genes were selected: NSD1, HDAC10, KRT24, ACACA and TP63. Table 1 shows the characteristics of the selected genes, and Table S5 highlights the main reasons for exclusion of the remaining 44 genes.

Gene Mutational Screening of Familial/Early-Onset CRC and Polyposis Patients
Mutational screening of the five selected genes was carried out in 736 unrelated patients, including 465 familial/early-onset MMR-proficient nonpolyposis CRC patients, 177 patients with classic or attenuated adenomatous polyposis, and 94 patients with serrated polyposis. We identified a total of 12 rare (MAF < 1% according to gnomAD v.2.1) variants (predicted deleterious by >40% of 12 in silico prediction tools) in 18 unrelated probands (Table 2). No carriers of ACACA rare predicted damaging variants were detected. Of note, the variant classification guidelines of the American College of Medical Genetics and Genomics and the Association for Molecular Pathology (ACMG/AMP) were not applied to the identified variants, because the recommendations indicate that they should not be used for the classification of variants in genes without a clear association with the disease [39]. Based on this, the variants listed in Table 2 should all be considered variants of unknown significance regarding their association with cancer.

NSD1
Three rare, predicted damaging, germline missense variants were identified in three unrelated probands (3/736 patients). NSD1 c.3056G>A (p.R1019H) was found in a woman diagnosed with breast cancer and 20 colon adenomas at age 50, and with no family history of cancer. Variant c.3089T>C (p.L1030S) was identified in a patient diagnosed with two CRCs at ages 52 and 59, and with no first-degree relatives affected with cancer. Lastly, NSD1 c.3151G>A (p.E1051K) was found in a woman diagnosed with CRC at age 55 and with >70 hyperplastic/serrated polyps; in her sister, who had 16 colorectal polyps at age 61; and in one of her sons, who had two colorectal polyps at age 36. A polyp- and cancer-free son was a noncarrier. The proband's father had been diagnosed with bladder and liver tumors at ages 72 and 76, respectively, her mother with CRC at age 79, and her maternal grandfather with stomach cancer at 60 years of age. Unfortunately, due to sample unavailability, no co-segregation studies could be performed in those generations. The pedigrees of the carrier families are shown in Figure S1.

HDAC10
Two heterozygous carriers, a priori not related, of the predicted damaging c.308C>T (p.A103V) variant were identified (Pedigrees in Figure S2). One had been diagnosed with breast cancer at age 26, and with CRC and polyps at 35. The other carrier was diagnosed with attenuated polyposis with multiple polyp types at age 63. Neither carrier had relevant cancer family history. HDAC10 c.827G>A (p.R276G) was present in a man diagnosed with two metachronous CRCs (age at diagnosis: 37 and 43), and with 26 hyperplastic polyps and one adenoma at age 37. His brother and mother were diagnosed with colorectal polyps at 43 and 62 years old, respectively. His cancer family history included one CRC, a prostate cancer, and a pancreatic cancer in two relatives. No co-segregation analysis could be performed.  Figure S3.

TP63
We identified three rare, predicted damaging, germline variants in TP63. Variant c.84T>G (p.H28Q) was identified in two a priori unrelated patients: one diagnosed with CRC at age 50, and another diagnosed with endometrial cancer and CRC at ages 45 and 49, respectively. Both probands had a family history of other tumor types. TP63 c.1127G>A (p.R376H) was identified in a female patient diagnosed with CRC and breast cancer at 56 and 59 years of age, respectively. Her cancer family history included four other CRC cases, four breast cancer cases, and one head and neck cancer identified in her father. TP63 c.1459C>T (p.R487C) was found in a man diagnosed with five CRCs and 11-20 adenomatous polyps at age 39, with no familial cancer history. Family pedigrees are shown in Figure S4.

Gene Burden Analysis: Assessment of the Association of the Selected Genes with CRC Predisposition
With the aim of elucidating the actual association of germline variants in the selected genes with a predisposition to develop CRC, we compared the frequency of germline damaging and predicted damaging variants in the selected genes in controls (134,187 gnomAD (v.2.1.1) non-cancer individuals) versus the frequency in patients, categorized as: (i) familial and/or early-onset CRC patients (465 from our study, 1006 from Chubb et al. [28] (source: https://canvar.icr.ac.uk/, accessed on 1 September 2021), and other reported studies for specific genes); (ii) polyposis patients (271 from our study); and (iii) (mostly) sporadic CRC patients (543 patients from TCGA) (Table 3). For the selection of the variants, we applied a filter that considered variants with a gnomAD non-cancer population MAF below 0.1%, and we used a REVEL cutoff of 0.4, a different, possibly more stringent, value than the criteria used in the discovery phase (>40% of 12 in silico prediction tools). With this criterion, some of the variants listed in Table 2 were not accounted for in the burden test. We applied this cutoff to minimize the inclusion of misclassified, non-damaging missense variants. Since most familial/early-onset CRC and polyposis patients were of non-Finnish European origin, we also performed the analysis considering the gnomAD non-Finnish European subpopulation as controls (Table S7).
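The variant selection applied before the burden test can be sketched as a small filter. This is a hedged illustration rather than the authors' implementation: the consequence labels, class name and function name are hypothetical, and counting stop-gain, frameshift and canonical splice-site variants as damaging without requiring a REVEL score is an assumption based on the definitions given earlier in the text.

```python
from dataclasses import dataclass
from typing import Optional

MAX_MAF = 0.001      # gnomAD non-cancer MAF must be below 0.1%
REVEL_CUTOFF = 0.4   # missense variants need REVEL > 0.4

# Assumed labels for variants treated as damaging regardless of REVEL.
DAMAGING_WITHOUT_SCORE = {"stop_gain", "frameshift", "canonical_splice_site"}


@dataclass
class Variant:
    consequence: str               # hypothetical consequence label
    gnomad_maf: float              # allele frequency in gnomAD non-cancer
    revel: Optional[float] = None  # REVEL score, if available (missense only)


def counts_for_burden_test(variant: Variant) -> bool:
    """True if the variant would be counted as (predicted) damaging."""
    if variant.gnomad_maf >= MAX_MAF:
        return False  # too common in controls
    if variant.consequence in DAMAGING_WITHOUT_SCORE:
        return True
    if variant.consequence == "missense":
        return variant.revel is not None and variant.revel > REVEL_CUTOFF
    return False  # synonymous, intronic, etc.


# Hypothetical variants, for illustration only.
examples = [
    Variant("missense", 0.0002, revel=0.55),
    Variant("missense", 0.0002, revel=0.31),
    Variant("stop_gain", 0.00001),
    Variant("frameshift", 0.005),
]
for v in examples:
    print(v.consequence, v.gnomad_maf, v.revel, counts_for_burden_test(v))
```

A missense variant that passed the discovery-phase rule (>40% of 12 predictors) but has a REVEL score of 0.4 or lower would be excluded here, which is how some Table 2 variants can drop out of the burden test.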
Despite the lack of statistically significant differences, NSD1, KRT24 and ACACA showed higher frequency of disruptive variants in cases than in controls. This tendency was predominantly observed in familial/early-onset CRC patients for NSD1. No association with polyposis was detected for any of the five genes (Table 3).
When comparing the results to non-Finnish European, non-cancer gnomAD individuals as controls, the tendency for NSD1, KRT24 and ACACA disruptive alleles remained, and significant association was detected for KRT24 damaging and predicted damaging variants when comparing TCGA CRC patients to controls (OR = 2.57; 95% CI: 1.35-4.45; p = 0.002) (Table S7). Based on the lack of association when comparing to gnomAD non-cancer individuals (OR = 0.97; 95% CI: 0.26-2.51; p = 1), it is possible that some of the variants identified in TCGA CRC patients are over-represented in non-European populations, which constitute at least 13% of the TCGA CRC patients analyzed.

Somatic Second Hits
Due to sample availability, we were able to study the presence of acquired somatic mutations or LOH in the selected genes in eight CRCs belonging to NSD1 and KRT24 variant carriers (Table 2). No somatic second hits were identified in the CRCs developed by the carriers of NSD1 p.L1030S and p.E1051K, or in the tumors developed by six of the eight KRT24 variant carriers, including the CRC of the patient with KRT24 c.130C>T (p.R44*) (Table S6). Of the 14 carriers of damaging and predicted damaging germline variants in the selected genes identified among the 543 TCGA CRC patients (Table 3), only one, an African American woman diagnosed at age 71 and carrier of the germline variant NSD1 c.4892A>G (p.K1631R), had a CRC with a somatic mutation in the same gene: c.6143T>A (p.I2048N).

Discussion
Park et al. devised a statistical method, termed ALFRED, that tests Knudson's two-hit hypothesis genome-wide to systematically identify cancer predisposition genes from cancer genome data [5]. By applying ALFRED to >10,000 tumor exomes from 30 cancer types, they identified up to 49 putative cancer predisposition genes. This study caught our interest and we decided to test their hypothesis by assessing the role of the identified ALFRED genes in the predisposition to CRC. First, we performed a pre-selection of genes based on different criteria, which led to a shortened list of five genes as the best candidates to be involved in CRC predisposition: NSD1, HDAC10, KRT24, ACACA and TP63. We identified a total of 12 damaging and predicted damaging variants in 18 probands of a series of 465 MMR-proficient CRC patients and 271 polyposis patients without germline pathogenic variants in known polyposis genes. To demonstrate the association of pathogenic variants in those genes with an increased risk of CRC, we then compared the frequency of damaging and predicted damaging variants in CRC patients and controls, including data from our study as well as others publicly available (publications and databases). Despite the lack of statistical differences between cases and controls, perhaps due to the small number of positive cases, overrepresentation of disruptive (stop-gain and frameshift) variants in cases was observed for NSD1, KRT24 and ACACA. No clear association was observed for HDAC10 and TP63.
NSD1 (histone H3 lysine 36 methyltransferase) is involved in chromatin organization and is considered an epigenetic regulator [40]. Somatic loss-of-function NSD1 mutations are among the most prevalent lesions in human head and neck and lung squamous cell carcinomas, neuroblastomas and glioblastomas, and NSD1 gene silencing has been detected in clear cell renal cell carcinoma and urogenital cancers (reviewed by Tauchmann and Schwaller [40]). Little is known regarding the role of NSD1 in colorectal cancer; however, publicly available data indicate that NSD1 somatic alterations occur in 4% of colon cancers (source: cBioPortal; accessed January 2022).
Heterozygous pathogenic variants in NSD1 are detected in 70% to 93% of typical Sotos syndrome patients [41]. Disruptive mutations causing Sotos syndrome are spread throughout NSD1; however, pathogenic missense mutations related to the syndrome are clustered in highly conserved functional domains between exons 13 and 23 [42]. Among the clinically validated variants in ClinVar, a total of 246 coding changes have been reported as pathogenic in Sotos syndrome patients: 196 disruptive and 18 canonical splice-site variants are distributed throughout the gene (exons 5 to 23), whereas all missense pathogenic variants (n = 32) are located between exons 13 and 23 (Figure 2A), in agreement with the observation by Douglas et al. in 2003 [42]. Despite putatively sharing the same autosomal dominant inheritance, variants identified in CRC patients show different characteristics. While most variants identified in Sotos patients are loss-of-function, only one disruptive variant has been identified in CRC patients (NSD1 c.7874G>A; p.W2625*); the others were nine missense variants predicted pathogenic (REVEL > 0.4). Five of the nine missense variants identified in CRC patients affected exons 5 and 6, and the other four affected exons 13, 18 and 23 (Figure 2B). In contrast to the exon 13-23 location usually observed in Sotos syndrome, CRC patients showed a more homogeneous distribution of missense pathogenic variants, similar to that observed in controls (Figure 2C). Park et al. had already noticed differences between the missense variants identified in cancer patients compared to those observed in Sotos syndrome patients [5]. While NSD1 was the second most significantly enriched gene in the study performed by Park et al., our results did not show a clear association with CRC predisposition. Zhunussova et al. analyzed 125 early-onset CRC patients from Kazakhstan with a gene panel that included NSD1. Despite their statement that NSD1 was one of the most mutated genes (399 variants in the 125 patients), only one variant, NSD1 c.1865G>C (p.C622S), had a REVEL score > 0.4 [31]. In fact, in addition to the pan-cancer association, this gene was found to be particularly associated with ovarian, stomach, bladder, lung and liver cancers, but not specifically with CRC in the original ALFRED publication [5].
KRT24 encodes a keratin essential for the cytoskeleton of epithelial cells and it also influences cellular response to pro-apoptotic signals. Our results suggested that disruptive variants in KRT24 were enriched in CRC patients compared to controls (0.11% vs. 0.07%), although the differences did not reach statistical significance. Aside from the pan-cancer association, Park et al. did not identify a particular association for CRC in the case of KRT24 [5]. Overexpression of KRT24 has been found in the normal mucosa of early-onset MMR-proficient CRC patients compared to the normal mucosa of healthy controls, supporting its role in CRC predisposition [35], although its association with the presence of germline variants in the gene was not evaluated. Apart from our analysis, no other published studies include the study of germline variants in KRT24 in CRC patients. Somatic mutations in KRT24 occur in 1.6% of CRCs (source: cBioPortal; accessed January 2022).
ACACA (acetyl-CoA carboxylase or ACC1) is a key catalyzer in the biogenesis of long-chain fatty acids, which are essential for cancer cell survival during hypoxia [44]. Inhibition of ACACA leads to decreased cell proliferation, decreased apoptosis, and increased risk of metastasis or recurrence [45][46][47][48]. While Park et al. identified a pan-cancer association, no specific association with CRC was detected [5]. Our results showed an over-representation of disruptive variants in CRC patients compared to controls, although the difference did not reach statistical significance. Thutkawkorapin et al., by performing exome sequencing in 51 early-onset CRC patients without family history of cancer, identified a missense predicted pathogenic variant in ACACA, p.R2208Q, in one of the patients, and proposed it as a candidate for CRC predisposition [36]. To our knowledge, no additional studies of ACACA in CRC patients have been published. Somatic ACACA mutations occur in 4% of CRCs (source: cBioPortal; accessed January 2022).
Regarding the other two genes, HDAC10 and TP63, for which no over-representation of (predicted) damaging variants was identified in CRC patients compared to controls, no previous studies have identified an association with CRC predisposition. TP63 alleles have been associated with susceptibility to different cancer types, but not to CRC (some examples: [49][50][51][52][53]). Somatic mutations in HDAC10 and TP63 occur in 1.1% and 2.4% of CRCs, respectively (source: cBioPortal; accessed January 2022).
Considering the rationale behind the ALFRED in silico method, we assessed the presence of somatic second hits (somatic variant or LOH) in available tumor samples from the carriers identified in our series (two tumors from NSD1 variant carriers and six tumors from KRT24 variant carriers), and from the 14 TCGA CRC patients with a damaging or predicted damaging variant in any of the five selected genes. Only one of the 20 CRCs had an acquired somatic mutation in the corresponding gene, which corresponded to a tumor from an NSD1 variant carrier. Somatic methylation was not evaluated.
The major methodological limitations of our study include: (i) Sample sizes for the burden tests were insufficient, which may have prevented the identification of significant associations. Due to the extremely low prevalence of disruptive variants, larger series of patients need to be analyzed, adding complete co-segregation and second hit analyses. (ii) The gene pre-selection step might have excluded one or several relevant genes for CRC predisposition. In this regard, future studies should not systematically discard the remaining 44 ALFRED genes as potentially involved in CRC predisposition. (iii) Moreover, based on the lack of studies that functionally link alterations in those genes with colorectal carcinogenesis, functional studies that prove their role in the initiation of (colorectal) cancer will also be key for their confirmation as CRC predisposition genes. (iv) While our study covers CRC, most ALFRED genes might be involved in the predisposition to other tumor types, such as ovarian, breast and endometrial cancers, as had been proposed in the original publication [5].

Conclusions
Aiming to assess the involvement in CRC predisposition of previously identified potential cancer predisposition genes (ALFRED genes), we performed a mutational screening of five selected genes in 736 familial/early-onset CRC and polyposis patients, followed by gene burden analyses that compared the frequency of damaging and predicted damaging variants in CRC patients and controls. Our study showed that all, or at least most, ALFRED genes did not seem to be relevant for CRC predisposition, at least not as a monogenic cause of the disease and/or following a classic tumor suppressor model (Knudson's two-hit hypothesis). Nevertheless, the results obtained in our study, although non-significant, probably due to insufficient sample size, suggest a possible association of NSD1, KRT24 and ACACA disruptive (loss-of-function) variants with CRC, requiring validation in larger series of patients.

Supplementary Materials:
The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/cancers14030699/s1, Table S1: Characteristics of the 465 MMR-proficient non-polyposis unrelated CRC patients included in the study, Table S2: Characteristics of the 177 unrelated adenomatous polyposis patients included in the study, Table S3: Characteristics of the 94 serrated polyposis patients included in the study, Table S4: Amplification and sequencing primers used for Sanger and targeted next generation sequencing, Table S5: Reasons for exclusion for the other 44 genes identified by Park et al., Table S6: Clinical characteristics, co-segregation data and second hit results of the carriers of the rare, predicted damaging variants identified among the 736 patients included in the study, Table S7: Burden analysis of the germline variants identified in the five selected candidate genes, considering the gnomAD non-Finnish European (NFE), non-cancer subpopulation as controls, Figure S1: Pedigrees of the families carrying NSD1 rare germline (predicted) damaging variants, Figure S2: Pedigrees of the families carrying HDAC10 rare germline variants, Figure S3: Pedigrees of the families carrying KRT24 rare germline variants, Figure S4: Pedigrees of the families carrying TP63 rare germline variants.

Institutional Review Board Statement:
The study was conducted according to the guidelines of the Declaration of Helsinki, and approved by the Ethics Committee of IDIBELL (protocol code: PR073/12).

Informed Consent Statement:
Informed consent was obtained from all subjects involved in the study.

Data Availability Statement:
All data supporting the reported results may be found in the article and supplementary material.