Variant Characterization of a Representative Large Pedigree Suggests “Variant Risk Clusters” Convey Varying Predisposition of Risk to Lynch Syndrome

Simple Summary
Approximately 20% of colorectal cancer (CRC) cases are diagnosed in individuals under 40, with a severe prognosis due to germline variant accumulation. Many of these variants have been classified as hereditary cancer causative, while others remain poorly researched. The identification of germline variants across different populations is critical for accurate prognosis, treatment, and follow-up. We aimed to identify and predict the functional implications of germline variants using whole-genome sequencing of a Tunisian pedigree with Lynch syndrome CRC risk. Two SNPs/indels were identified in genes with CRC association (MLH1 and PRH1-TAS2R14) and four in genes with non-CRC cancer association (PPP1R13B, LAMA5, FTO, and NLRP14). Three structural variants overlapped genes associated with non-CRC digestive cancer (RELN, IRS2, and FOXP1) and one overlapped RRAS2 with non-CRC cancer associations. This study provides further evidence of genetic predisposition according to the risk clustering of variants based on their functional implications and risk mechanisms within the same pedigree.

Abstract
Recently, worldwide incidences of young adult aggressive colorectal cancer (CRC) have rapidly increased. Of these incidences diagnosed as familial Lynch syndrome (LS) CRC, outcomes are extremely poor. In this study, we seek novel familial germline variants from a large pedigree Tunisian family with 12 LS-affected individuals to identify putative germline variants associated with varying risk of LS. Whole-genome sequencing analysis was performed to identify known and novel germline variants shared between affected and non-affected pedigree members. SNPs, indels, and structural variants (SVs) were computationally identified, and their oncological influence was predicted using the Genetic Association of Complex Diseases and Disorders, OncoKB, and My Cancer Genome databases.
Of 94 germline familial variants identified with predicted functional impact, 37 SNPs/indels were detected in 28 genes, 2 of which (MLH1 and PRH1-TAS2R14) have known association with CRC and 4 others (PPP1R13B, LAMA5, FTO, and NLRP14) have known association with non-CRC cancers. In addition, 48 of 57 identified SVs overlap with 43 genes. Three of these genes (RELN, IRS2, and FOXP1) have a known association with non-CRC digestive cancers and one (RRAS2) has a known association with non-CRC cancer. Our study identified 83 novel, predicted functionally impactful germline variants grouped in three “variant risk clusters” shared in three familially associated LS groups (high, intermediate, and low risk). This variant characterization study demonstrates that large pedigree investigations provide important evidence supporting the hypothesis that different “variant risk clusters” can convey different mechanisms of risk and oncogenesis of LS-CRC even within the same pedigree.


Introduction
Colorectal cancer (CRC) is one of the most frequent neoplasms worldwide, accounting for 8% of cancer-related deaths [1,2]. Pathogenic variants in known high-penetrance cancer-risk-associated genes have been implicated in up to 8% of all CRC cases, and one in five (20%) of these cases was diagnosed in patients under 40 years old [3-5], compared with the average worldwide age at diagnosis of 65 [6]. Familial association (defined as families with at least two affected members) appears in an estimated 35% of all CRC cases [7]. The most common familial CRC is Lynch syndrome (LS) [8,9]. The familial forms of CRC genetic predisposition have been correlated with germline mutations or epimutations in mismatch repair (MMR) genes such as MLH1, MSH2, MSH6, and PMS2 for nonpolyposis cases and in APC and MUTYH for adenomatous colonic polyposis with recessive inheritance [4,5]. Despite the current knowledge of genetic predisposition in these hereditary forms of CRC (such as LS, Gardner syndrome, juvenile polyposis coli, and others), many of the familial relationships and mechanisms of risk remain unexplained [10].
Genome-wide association via high-throughput sequencing (HTS) technologies, together with the wide collection of variant functional predictive analytics, cloud-based applications, and databases, is critical to the success of identifying new, likely causative germline variants implicated in cancer predisposition. These powerful emerging tools help detect deleterious genomic changes that are in part responsible for hereditary CRC development, and thereby inform diagnosis, predicted optimal treatment, and long-term prognosis. Approximately 40% of patients with an inherited tumor syndrome exhibit a variant of uncertain significance, as revealed through sequencing analyses that examine germline variants involved in the production of truncated proteins and associated with alterations caused by hereditary pathologies at the germinal level [11]. These variants typically involve a single amino acid substitution, which cannot a priori be definitively classified as pathogenic or benign [12]. Conversely, synonymous nucleotide substitutions, which generally do not cause alterations in protein structure, have been found to be pathogenic in some instances, depending on their genomic location [13]. Additionally, variants appearing together in the same gene or in different genes may coexist and co-segregate with the disease phenotype within a single family, potentially explaining the correlated predisposition risk of the family. In some cases, these variants contribute more significantly to cancer risk than classic pathogenic Mendelian variants, and when implicated in tumor predisposition, can cooperatively contribute to an increased risk of cancer development as low-risk alleles [14,15]. However, HTS studies have not covered all such cancers; it is therefore highly likely that functionally pertinent variants, mutations, and genes conveying predisposition to CRC and LS are yet to be discovered. This gap in the current knowledge of familial forms of CRC such as LS requires further clinical evaluation of hereditary CRCs supported by germline studies of familial cases [16].
In this study, we aim to identify, annotate, and computationally predict the functional implication of previously known and novel germline SNPs, indels, and structural variants using a whole-genome sequencing approach in a large Tunisian pedigree with three familially defined groups of members affected by or at risk of LS-CRC.

Sample/Data Collection
An LS-affected, large-pedigree Tunisian family with 37 total and 12 known affected members was recruited for this study (Figure 1). Peripheral blood from 11 members (oval circled) was collected. Individual subjects' clinical, environmental, and behavioral data were collected from medical records and personal interviews based on an interrogatory form conducted by the study personnel. Data summarized in Table 1 were recorded in a study database and maintained in a secure, private manner consistent with the Declaration of Helsinki and the permission of the Salah Azaiz Institute Ethics Committee (registration number: ISA/2016/02). All subjects were informed about the purposes of the study and consented in writing to participate in the study. The 11 subjects were stratified into three groups based on familial cancer status. The high risk to LS group (HRLS, Figure 1, red ovals) comprises the affected subjects of the pedigree; the intermediate risk to LS group (IRLS, Figure 1, green ovals) includes CRC-free subjects that have at least one affected parent; and the low risk to LS group (LRLS, Figure 1, blue ovals) includes those with no relatives affected in the subject's immediate triplet (subject and both parents).
Lynch syndrome status within this family was tested and confirmed during routine clinical work. The three patients sampled in the HRLS group were classified as meeting both the Amsterdam criteria II, established by the International Collaborative Group on HNPCC for assistance in identifying Lynch syndrome [17,18], and the original Bethesda guidelines [19]. Subsequent blood germline testing using PCR amplification and direct sequencing of the entire coding region and the exon-intron boundaries of MMR genes (MLH1, MSH2, and MSH6) [20,21] revealed a single deleterious germline alteration affecting the MLH1 gene (mutation c.-168_c.116+713del). This corresponds to a 997 bp deletion that encompasses the entirety of exon 1, a portion of intron 1, and a section of the MLH1 promoter, and was observed in all subjects within the HRLS group who underwent germline testing.
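The three-group stratification rule described above can be written down compactly. The following is a minimal illustrative sketch, not the code used in the study; the `Subject` record and its field names are hypothetical and simply encode the subject's own status and that of both parents.

```python
from dataclasses import dataclass


@dataclass
class Subject:
    """Hypothetical record for one pedigree member (field names are assumptions)."""
    subject_id: str
    affected: bool          # subject diagnosed with LS-CRC
    father_affected: bool
    mother_affected: bool


def risk_group(s: Subject) -> str:
    """Assign HRLS, IRLS, or LRLS following the familial definition in the text."""
    if s.affected:
        return "HRLS"   # affected subjects of the pedigree
    if s.father_affected or s.mother_affected:
        return "IRLS"   # CRC-free subject with at least one affected parent
    return "LRLS"       # nobody affected in the subject/parents triplet


# Example with made-up subjects
for subj in (
    Subject("S1", True, True, False),
    Subject("S2", False, False, True),
    Subject("S3", False, False, False),
):
    print(subj.subject_id, risk_group(subj))
```

Note that the rule considers only the immediate triplet, so a CRC-free subject with an affected grandparent but unaffected parents would be placed in LRLS under this sketch.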

DNA Extraction and Quality Assessment
Genomic DNA was extracted from the 11 blood samples according to the manufacturer's recommendation using a Flexigene DNA Whole Blood Kit (Qiagen, Hilden, Germany). DNA quality and quantity were assessed using a Qubit fluorometer (Invitrogen, Carlsbad, CA, USA) and electrophoretic migration in a 1% agarose gel. Genomic DNA of good quality was subjected to library preparation prior to sequencing.

Whole Genome Sequencing (WGS)
Libraries were prepared using the Nextera XT kit (Illumina, San Diego, CA, USA) for paired-end sequencing (2 × 300 base pairs) with the MiSeq Reagent V3 kit (Illumina) following the manufacturer's instructions. The Nextera enzyme mix was used to simultaneously fragment input DNA and tag it with universal adapters in a single-tube reaction. Library purification was performed with Agencourt AMPure XP beads (Beckman Coulter, IN, USA), and a Bioanalyzer (Agilent, Wilmington, DE, USA) was used for quantification and quality checking [22]. Libraries were sequenced using the Illumina NextSeq500 platform (Illumina Inc., San Diego, CA, USA). A total of 1564.4 GB of raw data (142.22 GB on average per sample) was generated on the sequencer, resulting in a mean sequence coverage depth of 45.88-fold (range of 37.71- to 59.17-fold).

Bioinformatic Variant Analysis (BVA)
The full bioinformatics variant analysis (BVA) is described in Figure 2 and the Supplemental Material. We used a novel functionally implicated variant pipeline created in our previous work on breast cancer [23], modified to account for the pedigree relationships between subjects and thus the familial, risk-related nature of the detected variants. Briefly, for each subject's high-throughput sequencing (HTS) data, reads were aligned and poor-quality reads were filtered; single nucleotide polymorphisms (SNPs), insertions/deletions (indels), and structural variants (SVs) were called; and variant quality score recalibration and filtering, annotation, removal of common and likely non-functional variants, and assessment of cancer-associated genes were performed.

Clinical Characterization of the Pedigree Members
The large Tunisian pedigree (Figure 1) includes 39 members across four generations; 12 of them were diagnosed with LS CRC. In this study, 11 members' (3 HRLS, 4 IRLS, and 6 LRLS) germline genomes were fully sequenced and their medical, environmental, and behavioral data were carefully analyzed. Individual and familial characterizations are described in Table 1. The HRLSs presented an average age of 48 ± 12.16 and BMI of 26.08 ± 3.11, IRLSs had an average age and BMI of 38.25 ± 17.63 and 26.77 ± 3.47, respectively, and the LRLSs had an average age and BMI of 35.50 ± 24.11 and 27.52 ± 2.64, respectively. Male sex distribution was 33.34% (1/3) HRLS, 0% (0/4) IRLS, and 50% (3/6) LRLS. Concerning lifestyle, 66.66% of HRLS recorded high brine and meat consumption, 100% of IRLS had a high fat consumption, and 75% of LRLS presented high meat and fat consumption. A total of 66.66% of HRLS were tobacco and alcohol users, 25% of IRLS were tobacco users, and 50% of LRLS were tobacco and alcohol users. A total of 66% of HRLS had hyperglycemia and hypertension, 75% of IRLS had hypertension and 50% had hyperglycemia, and 25% of LRLS had hypertension. An analysis of the clinical characteristics of each group found no statistically significant associations.

SNPs and Indels
Before filtering, 7,836,438 SNPs and 2,039,903 indels were detected across the 11 genomes. A total of 23.63% (2,334,312 of 9,876,341) of all variants satisfied the low-frequency threshold (allele frequency < 0.01 in both 1000 Genomes ALL and non-TCGA ExAC ALL), and 3.46% (80,905 of 2,334,312) of these variants demonstrated a probable deleterious function (CADD scaled score > 10). Subsequent annotation stratified the likely functionally implicated variants into coding (2824) and non-coding (78,081) variants. Lastly, filtering for variants predicted deleterious by at least three of MutationTaster, PolyPhen V2, Provean, and SIFT resulted in 1961 coding and 9826 non-coding predicted deleterious variants. From the 1961 low-frequency, predicted functionally deleterious coding variants, 79 were found across the 11 sequenced individuals: 37 in all HRLS members, 27 in the IRLS group, and 15 in the LRLS group. Of the 9826 non-coding filtered variants, all 9826 were found in members of at least one group; 4171 of the 9826 were found in each HRLS member, 2820 were found in every IRLS member, and 2835 in every LRLS member. In addition, 19 non-coding variants were detected exclusively in a particular group of the pedigree: 2 variants exclusive to members of HRLS, 3 exclusive to IRLS, and 14 exclusive to LRLS; 4 of these had a RegulomeDB score < 4. Most of the detected non-coding variants were intergenic (37,412) and found in all samples, whereas 30,344 intronic variants were found in all samples, and only 7 were found in samples in the same groups (Table 2).
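The filtering cascade described above (low population frequency, CADD scaled score > 10, and a deleterious call from at least three of the four predictors) can be sketched as a single predicate. This is an illustrative sketch only, not the study pipeline; the `Variant` record, its field names, and the representation of predictor calls as booleans are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict

# Predictor names as listed in the text; the dictionary keys are an assumption.
TOOLS = ("MutationTaster", "PolyPhen2", "PROVEAN", "SIFT")


@dataclass
class Variant:
    """Hypothetical annotated variant record."""
    variant_id: str
    af_1000g: float              # allele frequency, 1000 Genomes ALL
    af_exac: float               # allele frequency, non-TCGA ExAC ALL
    cadd_scaled: float           # CADD scaled score
    predictors: Dict[str, bool]  # tool name -> True if predicted deleterious


def passes_filters(v: Variant, max_af: float = 0.01,
                   min_cadd: float = 10.0, min_tools: int = 3) -> bool:
    """Apply the three successive filters described in the text."""
    rare = v.af_1000g < max_af and v.af_exac < max_af
    probably_deleterious = v.cadd_scaled > min_cadd
    votes = sum(1 for tool in TOOLS if v.predictors.get(tool, False))
    return rare and probably_deleterious and votes >= min_tools


# Example with a made-up variant: rare, CADD 15.2, three of four tools agree
example = Variant("var_1", 0.001, 0.0, 15.2,
                  {"MutationTaster": True, "PolyPhen2": True,
                   "PROVEAN": False, "SIFT": True})
print(example.variant_id, passes_filters(example))
```

How a variant absent from a population database is treated (for example, as frequency 0) is a design choice that this sketch leaves to the caller.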

SNP and Indels with Evidence of Familial and Risk-Implicated Genes
Of the 83 variants identified in the cohort, 39 genes were identified containing at least one variant. PABPC3 had four variants (rs79397892, rs78826513, rs78552667, and rs80261016) found in all samples and one variant (rs201411821) found in all high and intermediate risk samples (HRLS and IRLS). Three of these five variants (rs78826513, rs201411821, and rs80261016) were classified as oncogenic driver variants according to SNPnexus ("Driver" as defined in the oncogenic classification by Cancer Genome Interpreter). KRT18 had four variants found in all samples, though none satisfied the oncogenic threshold (i.e., none was predicted as a tumor "Driver" according to Cancer Genome Interpreter). One additional variant (rs201602708) located in MACF1 did not satisfy the oncogenic threshold. Five of the detected variants were in different genes that were previously described as commonly mutated in a collection of cancers: rs63750539 in the MLH1 gene has been described in several types of cancer including CRC and LS; rs373141354 in the PPP1R13B gene is associated with melanoma according to the Genetic Association of Complex Diseases and Disorders database; rs551763507 in the LAMA5 gene is correlated with neuroblastoma; rs76670455 in the NLRP14 gene is correlated with leukemia; and rs763119571 in TAS2R19 is also described in CRC according to the Cancer Genome Interpreter database. Concerning non-coding variants, four variants belonging to different genes (rs544153916 in PODN, rs116197074 in SCP2, and rs116526711 in MAML3) were noted only in samples from the LRLS group, and one of these variants, rs115378978 in the FTO gene, was previously associated with prostate cancer according to the Genetic Association of Complex Diseases and Disorders and SNPnexus databases (Table 3).

Structural Variants (SVs)
A total of 11,171 SVs were found via smoove: 4120 deletions, 194 duplications, 6560 breakends, and 297 inversions. Filtering for low-frequency novel variants resulted in 1122 deletions, 105 duplications, 5492 breakends, and 137 inversions. Filtering for highly likely functional SVs (AnnotSV ranking > 3) resulted in 164 deletions, 17 duplications, 274 breakends, and 18 inversions. Finally, filtering by total length (≥ 50 bp) retained 140 of the 164 deletions. A familial analysis was performed to test sharing across study groups, finding 154 breakends, 133 deletions, 4 duplications, and 13 inversions in each of the 11 subjects. A total of 35 breakends were shared within groups (23 within HRLS, 4 in IRLS, and 8 in LRLS), as were 18 deletions (7 in HRLS, 4 in IRLS, and 7 in LRLS). Two duplications were shared by all members of LRLS, and one inversion each was shared in IRLS and in LRLS. AnnotSV analysis of the SVs' genomic locations showed that 26 breakends and 17 deletions were intronic variants, whereas the 2 noted inversions were Transcript Start-Transcript End variants and the only detected duplication was an intronic variant (Table 4).
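The familial sharing analysis described above amounts to set operations over per-subject call sets. The sketch below is illustrative only and is not taken from the study pipeline; it assumes each subject's filtered calls are available as a set of variant identifiers, and the function names, subject labels, and identifiers are hypothetical.

```python
from typing import Dict, List, Set


def shared_by_all(calls: Dict[str, Set[str]], members: List[str]) -> Set[str]:
    """Variants present in every listed member."""
    member_sets = [calls[m] for m in members]
    return set.intersection(*member_sets) if member_sets else set()


def exclusive_to_group(calls: Dict[str, Set[str]],
                       groups: Dict[str, List[str]], name: str) -> Set[str]:
    """Variants shared by all members of one group and absent from every other subject."""
    shared = shared_by_all(calls, groups[name])
    others: Set[str] = set().union(
        *(calls[m] for g, ms in groups.items() if g != name for m in ms)
    )
    return shared - others


# Made-up example: two HRLS subjects and one LRLS subject
calls = {"h1": {"sv_a", "sv_b", "sv_c"}, "h2": {"sv_a", "sv_b"}, "l1": {"sv_b", "sv_d"}}
groups = {"HRLS": ["h1", "h2"], "LRLS": ["l1"]}
print(sorted(shared_by_all(calls, groups["HRLS"])))
print(sorted(exclusive_to_group(calls, groups, "HRLS")))
```

Comparing breakends by exact identifier is a simplification; in practice, nearby breakpoints called in different subjects may need to be merged within a tolerance before sets are compared.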

SVs with Risk-Implicated Genes
The 48 identified SVs were evaluated for functional prediction; the results showed that 43 SVs overlapped with genes with a probable functional impact, and 9 SVs overlapped with genes with an unlikely functional impact. Of these 43 SVs of probable impact, 5 SVs overlapped with 4 genes with likely impact: 4 in the HRLS group (2 SVs in RELN and 1 each in the FOXP1 and RRAS2 genes) and 1 in the IRLS group in the IRS2 gene, previously associated with multiple cancers, including CRC, according to the OncoKB Cancer Genes list database. Two of the forty-three SVs, both breakends found in all members of the HRLS group, had a potential impact on RELN, a gene correlated with several cancers including gastric cancer according to the OncoKB Cancer Genes list database. One duplication SV found in all members of the IRLS group contains IRS2, a gene associated with esophageal, intestinal, stomach, and colorectal cancers according to the My Cancer Genome database. One deletion SV found in all members of HRLS impacts FOXP1, a gene correlated with several cancers including esophagogastric, gastrointestinal, and colorectal cancers in the My Cancer Genome database. Finally, one breakend SV found in all members of HRLS affects RRAS2, a gene associated with both breast and ovarian cancers according to the My Cancer Genome database (Table 5).

Discussion
Several studies have examined the complex molecular heterogeneity of LS CRC in large families. The literature suggests that LS is caused by genetic and epigenetic variants sporadically found in genes, such as MMR genes (MLH1, MSH2, MSH6, PMS2, and EPCAM), associated with flat intra-mucosal neoplastic lesions [24,25]. However, a recent study by Binder et al. defined a third pathway for LS and showed the existence of two distinct genetic subtypes of LS CRC [26]. These recent findings suggest that the risk and genesis of LS CRC may be caused by various multiply expressed, functionally important "variant risk clusters" of germline mutations, each cluster independently associated with various pathways to carcinogenesis, and in a similar manner, each cluster may define both the type and degree of risk to LS CRC even within one large pedigree. The evidence of our results coupled with the broader collection of results in the literature supports such a working hypothesis.
These observations lead to the hypothesis that, in addition to the relatively well-characterized MMR deficiency in LS, other germline mutations or groups of mutations may contribute to the disruption of previously unassociated pathways, thus being associated with varying risks and oncogenesis of LS CRC. In this study, we investigated the germline mutational profiles of a large-pedigree Tunisian family with Lynch syndrome-associated colorectal cancer using high-throughput whole-genome sequencing. Subjects were grouped by familial cancer status into high, intermediate, and low risk to LS. Overall, we identified 94 germline variants, including 11 rare pathogenic variants previously described in cancer. Then, we clustered the identified germline variants according to the pedigree risk status into LRLS, IRLS, and HRLS groups.
The high-risk LS patients shared a missense mutation (rs63750539, p.Ala111Val) in MLH1 which is unlikely to be the cause of LS-CRC predisposition.
The MLH1 gene has been established as a causative gene for LS and presents the highest risk of CRC among individuals over 75 (46.6% of women and 51.4% of men) who are affected by the MLH1 variation. Rates range from 0% (at age 30) to 48.3% (at age 75) in females, and from 4.5% (at age 30) to 57.1% (at age 75) in males [22,23]. In another study, MLH1 variants were correlated with the highest risk of developing CRC in both hereditary and sporadic cases [27]. Furthermore, two MLH1 5′ UTR variants (c.-28A > G and c.-7C > T) were associated with early-onset CRC [28]. In addition, a cohort study reported that MLH1 is the most frequently mutated gene in early-onset sporadic CRC patients, exhibiting four pathogenic variations: c.C793T (p.R265C), c.C1029A (p.Y343X), c.C793T (p.R265C), and c.C1029A (p.Y343X) [29]. On the other hand, recent studies have found that the MLH1 gene is prone to silencing by promoter methylation in CRC carcinogenesis pathways, and around 50% of MLH1-deficient tumors exhibit MLH1 promoter methylation [30]. Moreover, MLH1 promoter methylation in CRC cases was highly correlated with a BRAF V600E somatic mutation [31]. This variant was clinically classified as a hereditary sequence variant identified in disease-related genes directly affecting the clinical management of patients with LS-CRC [32]. To better understand the high-risk effect in the pedigree, we investigated known driver mutations that are likely related to MMR deficiency. First, we investigated mutations in genes that play a key role in the adenoma-carcinoma model of CRC, such as APC, KRAS, TP53, and binding/transactivated genes. We found the rs373141354 variant (p.Gly866Arg) in the PPP1R13B gene, shared by all subjects of the HRLS group; PPP1R13B assists TP53 activation during apoptosis, and the variant may lower this ability [33]. Although TP53 is known to be rarely mutated in LS [34], our findings can be attributed to the high efficiency of TP53 in maintaining genomic integrity by arresting cells with mutated or damaged DNA in the G1 phase of the cell cycle to enable the repair mechanism or induce the apoptosis pathway [35]. The balance between cell cycle arrest and induced apoptosis depends on TP53 efficiency, which is related to the PPP1R13B activity identified in our research. Hence, we suggest that the variant rs373141354 (p.Gly866Arg) is associated with an increased risk of LS-CRC due to its low efficiency in cell cycle arrest.
Concerning the detected SVs that overlapped with genes previously correlated with different types of cancers, including digestive cancers, four shared SVs were found among HRLS subjects. Two SVs (7_103463079_103463080_BND_1 and 7_103463462_103463463_BND_1) were found in the intronic region of the RELN gene, which has been correlated with several cancer risks including gastric cancer [36]. Large CpG islands are located at RELN promoter sites, and their transcriptional silencing has been shown to be strongly controlled by promoter hypermethylation [37]. Consequently, a relationship between SVs and DNA methylation in cancer is speculated. Recent studies suggest that somatic copy number alterations in cancer are associated with DNA methylation [38], and numerous studies demonstrate that SVs may have a causal role in regulating CpG methylation [39,40]. Conversely, it is also possible that methylation could lead to SV imbalance by increasing DNA breakage [41,42]. Our observation adds to the growing understanding of the relationship between genetic (SVs) and epigenetic variation in cellular phenotype, the mechanism of gene regulation, and the traits underlying the evolution of cancer in the presence of SVs at specific genome sites. The 71242366_71242638_DEL_1 deletion was noted in the FOXP1 gene. Substantial evidence has demonstrated that the tumor microenvironment is closely linked to the initiation, promotion, and progression of CRC through various mechanisms, such as immune suppression and angiogenesis [43]. Variation in FOXP3, an intracellular key molecule for Treg development and function, has been associated with a dysregulation in subverting antitumor immune responses and promoting tumor progression. The last SV detected, 11_14348706_14348707_BND_1, was found in the intronic region of the RRAS2 gene, which has previously been described in breast and ovarian cancers [44]. The role of LS in ovarian cancer is established and widely accepted, but the long-standing question of whether breast cancer should also be included under the umbrella of LS is still debated [45,46]. A recent study, consistent with other studies, shows that carriers of LS mutations tend to have earlier manifestations of breast cancer [47].
Regarding the IRLS group, a unique missense variant, rs551763507 (p.Gly3688Glu), in the LAMA5 laminin gene was identified in all subjects of the group. Based on laminin's function, this variant is not the most probable candidate to play a role in CRC susceptibility [48]. However, recent studies have identified LAMA5 in orthotopic metastases. The expression of Laminin 511 was associated with the upregulation of a set of genes regulating angiogenesis in TCGA data [48,49]. The Gordon group has demonstrated that the laminin chains are localized within the vascular basement membrane on the basolateral surface of cancer cells, while colonic epithelial cells normally do not express vascular basement membrane laminins. Consequently, the profound effect on vascular morphology and function upon LAMA5 mutation and inhibition of LAMA5 expression specifically by colon cancer cells indicates that cancer cell Laminin 511 deposition is important for colonic cells, promoting angiogenesis. Our results may suggest that inhibiting the production of vascular basement laminins by tumor cells may serve as an efficient approach to prevent growth and the ability of tumor cells to regulate angiogenesis. Another novel variant identified in IRLS members was rs76670455 in the NLRP14 gene. NLRP14 has been described as a negative regulator of IFN responses. Interestingly, as inflammatory signaling pathways contribute to B cell lymphoma transformation, it is tempting to speculate that NLRP14 might contribute to cancer [50]. Another interesting aspect of NLR proteins is their expression in a panel of immune cells, notably myeloid cells and B cells, and their function as negative regulators of inflammatory responses [51]. This differential expression of NLRP14 might be involved in the malignant transformation process. Among the SVs noted in this IRLS pedigree group, only one variant was detected, the 13_110418067_110419056_DEL_1 in the intronic region of the IRS2 gene, which was shared by all IRLS subjects and has been described as related to several digestive cancers. Over-expression of IRS2 increases CRC cell adhesion to a similar extent as IGF-1 stimulation. Changes in adhesion, both increasing and decreasing, are important properties of metastasizing cancer cells and are involved in invasion, migration, and distant seeding of tumors [52]. In addition, it has been shown that the PI3K pathway is frequently dysregulated during CRC progression [53]. The TCGA Network demonstrated that high levels of IRS2 expression are mutually exclusive with IGF2 over-expression and with other mutations in the PI3K pathway in CRC. This suggests that the over-expression of IRS2 may be one mechanism by which the PI3K pathway could be dysregulated in CRC [54]. In summary, IRS2 appears to be a potential candidate oncogenic driver, and the IGF1R-IRS2-PI3K axis could be an important therapeutic target in CRC.
In connection with variants specific to LRLS, we noted rs763119571 in the TAS2R19 gene with a deleterious function. Previous research has shown the association of this variant with CRC risk [24]. Genetic variants in type 2 bitter taste receptors (TAS2R) may influence health-related outcomes, and these receptors are expressed within the oral cavity [55,56], the gastrointestinal mucosa [57], and the lungs [58]. TAS2R variants are hypothesized to play roles in an individual's food preferences [59] and the neutralization and expulsion of toxins from the colon/rectum [60], thereby influencing cancer risk. The last noted variant with a deleterious function was rs115378978 in the FTO gene. Different polymorphisms of the FTO gene have been consistently associated with obesity. However, recent genome studies reveal that genetic variants in this gene are associated not only with human adiposity and metabolic disorders but also with several cancers, including colorectal cancer, since they can activate several signaling and hormonal pathways to increase cancer incidence. The hormones involved in this carcinogenesis process could be ghrelin, oxytocin, and leptin. Hence, FTO polymorphisms could exert an influence on the hormonal balance and physiologic factors and might increase cancer risk [60]. The variants identified in the FTO, TAS2R, and NLRP14 genes were correlated with lifestyle-related factors for cancer onset, which are expected to be found in cases belonging to the LRLS group, who are CRC-free. No oncogenic impactful SVs were detected in the LRLS group, which may explain the low risk of these subjects developing LS CRC and the crucial role of these variants in hereditary cancer development.
Our current findings illustrate that subtle, familially grouped genetic factors underlying risk to LS CRC extend beyond the well-documented familial CRC syndrome genes. Our data suggest that comprehensive germline testing in all LS CRC patients will identify substantially more opportunities for robust and accurate genetically driven cancer characterization and subsequent prevention than established in current practice. Moreover, the magnitude of the risk of developing secondary cancer (breast or ovarian) is still unknown. Hence, our study sheds light on the importance of risk-grouped germline screening to identify secondary risks and even predict the site of metastases during CRC progression. This study's limitations include the lack of testing on all family members. In addition, given the limited studies on SV identification, the impact of the SVs found in our study was predicted based on our working hypothesis and recent functional studies. Thus, further studies in many LS-CRC families are needed to confirm the effect of the 57 SVs identified in our study. Furthermore, despite the extensive advantages of whole-genome sequencing and data processing, there remain several gaps in the technology and analysis. For example, resolution is restricted by limitations in the read depth related to the quality of aligned sequences; a higher depth provides a greater power to call and identify new variants [61]. However, the mixed pipeline generated for our WGS-specific study allows us to detect new and known coding and non-coding variants across the whole genome of the 11 sequenced individuals, greatly facilitating the germline profiling of LS-CRC pedigrees and opening new opportunities for cancer pedigree studies. The introduction of several commercially available multigene panels has tremendous promise for clinical use but simultaneously raises challenges, such as the lack of clear criteria for selecting above-average-risk patients to undergo such clinical panel testing [62] and the optimal choice of the prescribed panel [63]. Several multigene panels are available for hereditary CRC, and the National Comprehensive Cancer Network (NCCN) has provided a useful protocol for predicting the appropriate panel according to the family history of patients to better optimize patient care [64]. A limitation of our study is that not all members of the family were tested for the MLH1 gene variant (c.-168_c.116+713del) detected in the family. This may lead to the study being considered a population-based WGS investigation in the context of MLH1 carriers and non-carriers, potentially weakening our conclusions about genetic modifiers in LS.

Conclusions
In this study, we analyzed the germline landscape of a Tunisian family with a predisposition to LS CRC and identified a total of 94 germline variants affecting 39 genes, 6 of which have been previously described in cancer, and 57 SVs, 48 of which were related to 43 genes, with 4 of these genes categorized as oncogenes. According to the familial definition of LS risk in the pedigree members, we identified three "variant risk clusters" associated with the high, intermediate, and low LS CRC risk groups in the pedigree. The results showed that variants related to high-risk LS members may be causative of the disease, while intermediate-risk members carrying other variants may develop LS through very different mechanistic disruptions, and low-risk members with these variants may not develop LS at all. The application of HTS technology with such variant clustering for germline screening will efficiently provide further insights into the etiology of hereditary cancer and a substantial opportunity to improve the clinical suspicion of LS. The significance of germline variants in cancer predisposition is still poorly explored, and this study contributes to filling this knowledge gap.
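As an illustration of the clustering idea summarized above, the following minimal Python sketch derives, for each risk group, the variants shared by all of its sequenced members and absent from every member of the other groups. This is only one plausible reading of "group-specific shared variants" and is not the pipeline used in the study; the function name, subject labels, and variant identifiers are hypothetical placeholders.

```python
from typing import Dict, List, Set


def group_specific_variants(
    calls: Dict[str, Set[str]],
    groups: Dict[str, List[str]],
) -> Dict[str, Set[str]]:
    """Return, per group, variants shared by all members and seen in no other group.

    calls  : subject ID -> set of variant identifiers called in that subject
    groups : group name -> list of subject IDs belonging to that group
    Assumption (not from the paper): a variant is "group-specific" when it is
    present in every member of the group and in no member of any other group.
    """
    specific: Dict[str, Set[str]] = {}
    for group, members in groups.items():
        # Variants carried by every member of this group.
        if members:
            shared = set.intersection(*(calls[m] for m in members))
        else:
            shared = set()
        # Variants carried by at least one member of any other group.
        elsewhere: Set[str] = set()
        for other_group, other_members in groups.items():
            if other_group != group:
                for m in other_members:
                    elsewhere |= calls[m]
        specific[group] = shared - elsewhere
    return specific


# Toy input with placeholder identifiers (not real data from the pedigree).
toy_calls = {
    "s1": {"var_a", "var_b", "var_c"},
    "s2": {"var_a", "var_b"},
    "s3": {"var_b", "var_d"},
    "s4": {"var_b", "var_d", "var_e"},
}
toy_groups = {"HRLS": ["s1", "s2"], "LRLS": ["s3", "s4"]}
result = group_specific_variants(toy_calls, toy_groups)
```

In this toy input, `var_b` is carried by every subject and is therefore excluded from both clusters, which mirrors the intent of separating group-specific variants from those common to the whole pedigree.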
medical records and personal interviews based on an interrogatory form conducted by the study personnel.

Figure 1. A pedigree of predisposition to Lynch syndrome colorectal cancer. I, II, III, IV: generation numbers. * Sequenced subjects; Group 1 (high risk to LS group): subject and parent affected by LS-CRC, red circles; Group 2 (intermediate risk to LS): subject LS-CRC free and one of the parents is affected, green circles; Group 3 (low risk to LS): subject and both parents are LS-CRC free, blue circles.
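The group definitions in the legend can be expressed as a small decision rule. The Python sketch below is a hedged illustration only: the function name and boolean inputs are hypothetical, "one of the parents is affected" is interpreted here as "at least one parent affected", and the case of an affected subject with two unaffected parents is left unassigned because the legend does not cover it.

```python
from typing import Optional


def assign_risk_group(
    subject_affected: bool,
    mother_affected: bool,
    father_affected: bool,
) -> Optional[str]:
    """Assign a pedigree risk group following the Figure 1 legend.

    Returns "HRLS", "IRLS", or "LRLS", or None when the legend gives no rule.
    """
    any_parent_affected = mother_affected or father_affected
    if subject_affected and any_parent_affected:
        return "HRLS"  # Group 1: subject and a parent affected by LS-CRC
    if not subject_affected and any_parent_affected:
        return "IRLS"  # Group 2: subject LS-CRC free, a parent affected
    if not subject_affected and not any_parent_affected:
        return "LRLS"  # Group 3: subject and both parents LS-CRC free
    return None  # affected subject, unaffected parents: not defined in the legend
```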


Table 1. Demographic and clinical characteristics of the included family members.

The HRLSs presented an average age of 48 ± 12.16 and BMI of 26.08 ± 3.11, IRLSs had an average age and BMI of 38.25 ± 17.63 and 26.77 ± 3.47, respectively, and the LRLSs had an average age and BMI of 35.50 ± 24.11 and 27.52 ± 2.64, respectively. Male sex distribution was

Table 2. Variant counts for all pedigree subjects and variants shared by subjects within the same pedigree group, with the functional annotation of noncoding variants according to ANNOVAR.

Table 3. Classification of detected variants by gene and cancer impact according to pedigree groups.
AF: 1000 Genomes Phase 3 all-population allele frequency; row in bold: variant previously described as associated with cancer; CRC: colorectal cancer; SNP: single nucleotide polymorphism; ID: identification; rs: reference SNP; INT: intronic; EX: exonic; NA: not applicable; HRLS: high risk to LS group; IRLS: intermediate risk to LS; LRLS: low risk to LS.

Table 4. Variant counts for all pedigree subjects and variants shared by subjects within the same pedigree group, with variant locations according to AnnotSV.

Table 5. Classification of structural variants by gene and cancer impact according to pedigree groups.