Whole Genome Sequencing of Familial Non-Medullary Thyroid Cancer Identifies Germline Alterations in MAPK/ERK and PI3K/AKT Signaling Pathways

Evidence of familial inheritance in non-medullary thyroid cancer (NMTC) has accumulated over the last few decades. However, known variants account for a very small percentage of the genetic burden. Here, we focused on the identification of common pathways and networks enriched in NMTC families to better understand its pathogenesis with the final aim of identifying one novel high/moderate-penetrance germline predisposition variant segregating with the disease in each studied family. We performed whole genome sequencing on 23 affected and 3 unaffected family members from five NMTC-prone families and prioritized the identified variants using our Familial Cancer Variant Prioritization Pipeline (FCVPPv2). In total, 31 coding variants and 39 variants located in upstream, downstream, 5′ or 3′ untranslated regions passed FCVPPv2 filtering. Altogether, 210 genes affected by variants that passed the first three steps of the FCVPPv2 were analyzed using Ingenuity Pathway Analysis software. These genes were enriched in tumorigenic signaling pathways mediated by receptor tyrosine kinases and G-protein coupled receptors, implicating a central role of PI3K/AKT and MAPK/ERK signaling in familial NMTC. Our approach can facilitate the identification and functional validation of causal variants in each family as well as the screening and genetic counseling of other individuals at risk of developing NMTC.


Introduction
Thyroid cancer is the most common endocrine malignancy with an age adjusted incidence of 0.5-20/100,000 persons per year [1]. Significant regional differences exist with Italy being among the countries with the highest incidence rates in the world [1]. An increasing incidence has been observed worldwide during the past decades, which can to a certain extent be related to changes in the availability of medical services and in standard clinical practice. On the other hand, regional diagnosed with PTC or micro-PTC (II-2, II-3, II-6, III-1) and one child with benign nodules (II-1). Her unaffected son was deemed a reliable control (II-4). WGS (*) was performed on five family members. In family 2, there were six cases (III-1, III-3, III-4, IV-3, IV-4, IV-5), one probable case (IV-1) and one control (IV-2) out of which six underwent WGS. Family 3 consisted of two related cases (IV-4, IV-5) and one unrelated case (III-1) of which all three underwent WGS. Family 4 is characterized by bilateral PTCs concurrent with other subtypes of NMTCs (Hürthle cell cancer, follicular cancer). Four family members were diagnosed with thyroid cancer of which all underwent WGS (II-2, III-1, III-2, III-3). WGS was performed on eight family members of family 5. Five members were affected by PTC, Hürthle cell cancer, micro-PTC or a combination of two of the subtypes (II-2, II-3, II-5, II-8, II-9). Four members were possible carriers either affected by benign nodules or deceased (I-1, II-4, II-6) and two were unaffected (II-1, II-7).

Whole Genome Sequencing and Variant Evaluation
WGS for 23 cases and 3 controls was performed using Illumina-based small read sequencing after DNA was isolated from peripheral blood using the QIAamp ® DNA Mini Kit (Qiagen, Cat No. 51104) according to the manufacturer's instructions.

Variant Calling Annotation and Filtering
Sequencing data was mapped to a reference human genome (assembly version Hs37d5) using BWA mem (version 0.7.8) and duplicates were removed using biobambam (version 0.0.148). Single nucleotide variants (SNVs) and indels were called from all the samples in a family together using Platypus (version 0.8.1). ANNOVAR, 1000 Genomes, dbSNP and ExAC (Exome Aggregation Consortium) were used in the annotation of variants as explained in detail in our previous paper [14]. Variants to be evaluated further were selected using the following criteria: i) A quality score greater than 20, and a coverage greater than 5x; ii) All Platypus filters were met. Variants with a minor allele frequency (MAF) less than 0.1 % in 1000 genome and ExAC-nonTCGA data were selected for further analysis. A pairwise comparison of shared rare variants among the cohort was performed to check for sample swaps and family relatedness.

Whole Genome Sequencing and Variant Evaluation
WGS for 23 cases and 3 controls was performed using Illumina-based small read sequencing after DNA was isolated from peripheral blood using the QIAamp ® DNA Mini Kit (Qiagen, Cat No. 51104) according to the manufacturer's instructions.

Variant Calling Annotation and Filtering
Sequencing data was mapped to a reference human genome (assembly version Hs37d5) using BWA mem (version 0.7.8) and duplicates were removed using biobambam (version 0.0.148). Single nucleotide variants (SNVs) and indels were called from all the samples in a family together using Platypus (version 0.8.1). ANNOVAR, 1000 Genomes, dbSNP and ExAC (Exome Aggregation Consortium) were used in the annotation of variants as explained in detail in our previous paper [14]. Variants to be evaluated further were selected using the following criteria: (i) A quality score greater than 20, and a coverage greater than 5x; (ii) All Platypus filters were met. Variants with a minor allele frequency (MAF) less than 0.1 % in 1000 genome and ExAC-nonTCGA data were selected for further analysis. A pairwise comparison of shared rare variants among the cohort was performed to check for sample swaps and family relatedness.

Variant Filtering following the FCVPPv2
Variant evaluation was performed using the criteria of our in-house developed Familial Cancer Variant Prioritization Pipeline v2 (FCVPPv2) [14]. This process is summarized in Figure 2 and explained in the following text.

Variant Filtering following the FCVPPv2
Variant evaluation was performed using the criteria of our in-house developed Familial Cancer Variant Prioritization Pipeline v2 (FCVPPv2) [14]. This process is summarized in Figure 2 and explained in the following text.

Segregation in Pedigrees
The variants were filtered based on pedigree data considering family members diagnosed with NMTC or micro-PTC as cases, benign nodules or goiter as potential variant carriers and unaffected members as controls. The probability of an individual being a Mendelian case or true control was considered. The general rule was that variants had to be present in all cases and absent from all controls.

Variant Ranking Using In Silico Tools
After filtering variants based on pedigree segregation, the CADD tool v1.3 [15] was applied. Variants with a scaled PHRED-like CADD score greater than 10, which accounts for the top 10% of probable deleterious variants in the human genome, were prioritized. Variants were then selected according to their conservation scores. High evolutionary conservation suggests functional importance of a position. Genomic Evolutionary Rate Profiling (GERP), PhastCons and PhyloP were used to assess conservation of the variant position, whereby GERP scores >2.0, PhastCons scores >0.3 and PhyloP scores >3.0 indicate a high level of conservation and are therefore used as thresholds in the selection of potentially causative variants. After that, all missense variants were assessed for deleteriousness using the following tools: SIFT, PolyPhen V2-HDIV, PolyPhen V2-HVAR, LRT, MutationTaster, Mutation Assessor, FATHMM, MetaSVM, MetLR, PROVEAN, VEST3 and RI using dbNSFP [16]. Variants predicted to be deleterious by at least 60% of these tools were shortlisted for further analysis. Lastly, intolerance scores were considered. These were merely used to rank the variants and not as cutoffs for selection. The ranking of variants according to the intolerance scores of the corresponding genes relies

Segregation in Pedigrees
The variants were filtered based on pedigree data considering family members diagnosed with NMTC or micro-PTC as cases, benign nodules or goiter as potential variant carriers and unaffected members as controls. The probability of an individual being a Mendelian case or true control was considered. The general rule was that variants had to be present in all cases and absent from all controls.

Variant Ranking Using In Silico Tools
After filtering variants based on pedigree segregation, the CADD tool v1.3 [15] was applied. Variants with a scaled PHRED-like CADD score greater than 10, which accounts for the top 10% of probable deleterious variants in the human genome, were prioritized. Variants were then selected according to their conservation scores. High evolutionary conservation suggests functional importance of a position. Genomic Evolutionary Rate Profiling (GERP), PhastCons and PhyloP were used to assess conservation of the variant position, whereby GERP scores >2.0, PhastCons scores >0.3 and PhyloP scores >3.0 indicate a high level of conservation and are therefore used as thresholds in the selection of potentially causative variants. After that, all missense variants were assessed for deleteriousness using the following tools: SIFT, PolyPhen V2-HDIV, PolyPhen V2-HVAR, LRT, MutationTaster, Mutation Assessor, FATHMM, MetaSVM, MetLR, PROVEAN, VEST3 and RI using dbNSFP [16]. Variants predicted to be deleterious by at least 60% of these tools were shortlisted for further analysis. Lastly, intolerance scores were considered. These were merely used to rank the variants and not as cutoffs for selection. The ranking of variants according to the intolerance scores of the corresponding genes relies on the assumption that a variant in a gene intolerant to functional genetic variation is more likely to be deleterious than one that is tolerant to functional variation. We used three intolerance scores based on NHLBI-ESP6500, ExAC datasets and a local dataset, all of which were developed with allele frequency data. The ExAC consortium has developed two additional scoring systems using large-scale exome sequencing data including intolerance scores (pLI) for loss-of-function variants and Z-scores for missense and synonymous variants. These were used for nonsense and missense variants respectively. In our final list, we also included missense variants in known tumor suppressor genes and oncogenes independent of their deleteriousness and intolerance scores. However, all variants had to meet previous cut-offs, i.e., MAF >0.1, pedigree segregation, CADD-PHRED >10 and positive conservation scores.

Analysis of Non-Coding Variants
Non-coding regions make up over 98% of the genome and possess millions of potentially regulatory elements and noncoding RNA genes. Hence it is crucial to analyze the potential pathogenic impact of such variants in a Mendelian disease. Putative miRNA targets at variant positions within 3 untranslated regions (UTRs) and 1 kb downstream of transcription end sites were detected by scanning the entire dataset of the human miRNA target atlas from TargetScan 7.0 [17] with the help of the intersect function of bedtools. We scanned the 5UTRs and 1 kb regions upstream of transcription start sites for transcription factor binding sites using SNPnexus (version 3; Dec 2017) [18]. For regulatory variants, we merged enhancer [19] and promoter [20,21] data from the FANTOM5 consortium and super-enhancer data from the super-enhancer archive (SEA) [22] and dbSUPER [23] using the intersect function of bedtools to identify putative enhancers, promoters and super-enhancers in our dataset. We accessed epigenomic data and marks from 127 cell lines from the NIH Roadmap Epigenomics Mapping Consortium via CADD v.1.3 [15], which gave us information on chromatin states from ChromHmm [24] and Segway [25]. The CADD analysis of 3 UTRs also gave us mirSVR scores for putative miRNA targets; a score lower than -0.1 is indicative of a "good" miRNA target [26]. Furthermore, we used SNPnexus to obtain non-coding scores for each variant and to identify regulatory variants located in CpG islands. Top 3 'UTR and downstream variants that had CADD scores >10 and miRNA target site matches with mirSVR scores <−0.1 were short-listed. Similarly, upstream and 5 UTR variants in enhancers, promoters, super-enhancers or transcription factor binding sites with CADD scores >10 were selected.

Variant Validation
In order to increase the confidence in variant calls and reduce the risk of false positives, we visually inspected the sequencing data of all short-listed variants for correctness using the Integrative Genomics Viewer (IGV; version 2.4.10) [27].

Ingenuity Pathway Analysis (IPA)
IPA (Qiagen; http://www.qiagen.com/ingenuity; analysis date 08/04/2019) was used to perform a core analysis to identify relationships, mechanisms, functions, networks, and pathways relevant to the genes affected by variants that passed the mean allele frequency cut-off, fulfilled family-based segregation criteria, had CADD scores >10 and were not intergenic or intronic variants. Data were analyzed for all five families together. Top canonical pathways were identified from the IPA pathway library and ranked according to their significance to our input data. This significance was determined by p-values calculated using the right tailed Fisher's exact test. These values indicated the probability of association of genes from the input dataset with the canonical pathway by random chance alone. Ratios were also calculated for each pathway by dividing the number of genes from the input dataset that map to the pathway by the total number of genes in that pathway. The ratios did not influence the ranking of the canonical pathways.
IPA was also used to generate gene networks in which upstream regulators were connected to the input dataset genes while taking advantage of paths that involved more than one link (i.e., through intermediate regulators). These connections represent experimentally observed cause-effect relationships that relate to expression, transcription, activation, molecular modification and transport as well as binding events.

Whole Genome Sequencing
In this study, five families with reported recurrence of NMTC were analyzed. WGS identified a total of 112254, 207873, 120323, 91427 and 101081 variants which were reduced by pedigree-based filtering to 6368, 9373, 3123, 7060 and 2708 in families 1-5, respectively. Non-synonymous SNVs were the most common exonic variants ( Figure S1).

Final Prioritization of Candidates according to the FCVPPv2
After applying the FCVPPv2, the number of potential pathogenic protein coding variants was reduced to 31. These variants are listed in Table 1. A number of genes are of high significance to our study as they are either related to cancer or play a role in thyroid metabolism. CHEK2 is a known tumor suppressor gene involved in DNA damage response [28]. EWSR1 generates a powerful oncogenic protein causing Ewing sarcoma [29], RET is a proto-oncogene well-known in hereditary medullary thyroid carcinoma NRP1 is known to be positively associated with the progression of breast cancer [30], POT1 is a known predisposing gene in malignant melanoma [31] and TG encodes the precursor of iodinated thyroid hormones and is associated with susceptibility to autoimmune thyroid diseases (AITD) [32].
FCVPPv2 also identified 14 upstream and 5 UTR variants, which are shown in Table 2. Among them, three variants are of particular interest in thyroid cancer. The PCM1 variant is a 5 UTR variant that our data showed to affect three transcription factor binding sites (Egr-3, AP-2alphaA and AP-2 gamma). Chromosomal aberrations involving this gene have been associated with PTC and a variety of hematological malignancies [33]. The other 5 UTR variant is located in the P4HB gene which is known to be involved in the structural modification of the thyroglobulin precursor in hormone biogenesis [34]. Both variants are present in CpG islands and have been predicted to be localized at an active transcription start site by ChromHmm and Segway. The third variant is an upstream variant in the DAPL1 gene, shown to affect the binding sites of MAZR and Sp1, a potential tumor suppressor in thyroid cancer, by SNPnexus and Segway.
Furthermore, 25 variants located downstream and in 3 UTRs were shortlisted (Table 3). Among them, two genes of importance can be highlighted, namely ACVR1B and NLK. Mutations in the ACVR1B gene are associated with pancreatic cancer [35]. The variant in the 3 UTR of ACVR1B is localized at a target site for miR-6871-5p with a context ++ percentile score of 53, indicating a relatively good context for repression of the mRNA due to this miRNA. Altered expression of NLK is associated with cancer development and has been shown to be an independent prognostic factor in colorectal cancer [36]. The corresponding variant to this gene has two predicted miRNA target sites for miR-6818-5p and miR-6867-5p with high context ++ percentile scores (88 and 79, respectively).
Variants prioritized by the FCVPPv2 were also present in pathways, networks, and disease categories shown to be significantly enriched in FNMTC by IPA. Table 1. Top exonic variants prioritized following the FCVPPv2. Chromosomal positions, classifications, PHRED-like CADD scores and the percentage of positive intolerance (Int) and deleteriousness (Del) scores are included for each variant. Additional information regarding protein-protein interactions (STRING), localization in protein domains (InterPro [37]) and the biological function of the respective protein (GeneCards [38]) is included.  Table 2. Top upstream and 5 UTR variants prioritized according to the FCVPPv2. Variant annotation, chromosomal position, and regulatory consequences according to FANTOM5, SEA, CADD and SNPnexus are listed. The FANTOM5 database gives information on known promoters. CADD gives an overall deleteriousness score together with chromatin state information based on ChromHmm and Segway scores and information on transcription factor binding sites (TFBSs). Location of the variants within a specific TFBS and CpG island were obtained from SNPnexus. A cumulative non-coding score is shown as a percentage of positive scores from all scores listed in the footnote. Cut-offs for these scores are also indicated in the footnote.

Ingenuity Pathway Analysis (IPA) Shows Enrichment of GPCR and RTK Mediated Pathways
In order to identify key biological functions and signaling pathways affected in FNMTC, we filtered the variants according to pedigree segregation, CADD scores and location, excluding intronic and intergenic variants. The variants were in 339 genes, with 92, 122, 14, 72 and 39 genes coming from families 1-5 respectively. Of these genes, 210 gene IDs could be mapped by IPA and were part of the subsequent analysis (Table S1). The remaining 129 genes were uncharacterized genes with RP11 IDs, and thus could not be mapped.
Of the top 150 diseases and bio functions, 123 were cancer-related with thyroid cancer at position 99 (p = 3.17 × 10 −5 ), NMTC at position 120 (p = 6.39 × 10 −5 ), differentiated thyroid cancer (DTC) at position 125 (p = 7.88 × 10 −5 ) and PTC at position 148 (p = 2.16 × 10 −5 ) (Table S1B). There was a high overlap of molecules among the four thyroid cancer related categories. This overlap of eight genes included two genes prioritized using our pipeline (RET and TG), that are of particular interest in thyroid cancer.
With the aim of evaluating the canonical pathway results to determine the most significant pathways in our dataset, we created a network of the top 18 overlapping canonical pathways (Table S1C, Figure 3). The threshold of common genes between the pathways was set at 2. G-protein coupled receptor (GPCR) and receptor tyrosine kinase (RTK) mediated pathways, as major mediators of thyroid cancer development, were represented by 12 pathways (Figure 3). The genes involved in the top 18 pathways along with their corresponding variants are listed in Table S2.

Ingenuity Pathway Analysis (IPA) Shows Enrichment of GPCR and RTK Mediated Pathways
In order to identify key biological functions and signaling pathways affected in FNMTC, we filtered the variants according to pedigree segregation, CADD scores and location, excluding intronic and intergenic variants. The variants were in 339 genes, with 92, 122, 14, 72 and 39 genes coming from families 1-5 respectively. Of these genes, 210 gene IDs could be mapped by IPA and were part of the subsequent analysis (Table S1). The remaining 129 genes were uncharacterized genes with RP11 IDs, and thus could not be mapped.
Of the top 150 diseases and bio functions, 123 were cancer-related with thyroid cancer at position 99 (p= 3.17 × 10 −5 ), NMTC at position 120 (p=6.39 × 10 −5 ), differentiated thyroid cancer (DTC) at position 125 (p=7.88 × 10 −5 ) and PTC at position 148 (p= 2.16 × 10 −5 ) (Table S1B). There was a high overlap of molecules among the four thyroid cancer related categories. This overlap of eight genes included two genes prioritized using our pipeline (RET and TG), that are of particular interest in thyroid cancer.
With the aim of evaluating the canonical pathway results to determine the most significant pathways in our dataset, we created a network of the top 18 overlapping canonical pathways (Table S1C, Figure 3). The threshold of common genes between the pathways was set at 2. G-protein coupled receptor (GPCR) and receptor tyrosine kinase (RTK) mediated pathways, as major mediators of thyroid cancer development, were represented by 12 pathways (Figure 3). The genes involved in the top 18 pathways along with their corresponding variants are listed in Table S2. . Top 18 overlapping canonical pathways visualized as a network, which shows each pathway as a single "node" colored proportionally to the Fisher's Exact Test p-value, where brighter red indicates higher significance. Nodes marked with asterisk (*) belong to the group of GPCR and RTK mediated pathways.

Network Analysis Reinforces the Central Role of PI3K/AKT and MAPK/ERK Signaling in FNMTC
We conducted a network analysis using the IPA software to predict interacting molecular networks significant to our input-data and to evaluate genes with a central role in FNMTC (Figure 4, Table S1D). Since the IPA network analysis includes paths with intermediate regulators that involve more than one link, a comprehensive picture of the possible gene interactions was generated. The networks were ranked according to scores that were generated by considering the number of focus . Top 18 overlapping canonical pathways visualized as a network, which shows each pathway as a single "node" colored proportionally to the Fisher's Exact Test p-value, where brighter red indicates higher significance. Nodes marked with asterisk (*) belong to the group of GPCR and RTK mediated pathways.

Network Analysis Reinforces the Central Role of PI3K/AKT and MAPK/ERK Signaling in FNMTC
We conducted a network analysis using the IPA software to predict interacting molecular networks significant to our input-data and to evaluate genes with a central role in FNMTC (Figure 4, Table S1D). Since the IPA network analysis includes paths with intermediate regulators that involve more than one link, a comprehensive picture of the possible gene interactions was generated. The networks were ranked according to scores that were generated by considering the number of focus genes (input data) and the size of the network to approximate the relevance of the network to the original list of focus genes. We focused on the three highest scoring networks, which had scores ranging from 33 to 51 (Table S1D).
In coherence with the pathway analysis, the network analysis reinforces the importance of central perpetrators of GPCR and RTK mediated signaling (AKT, ERK1/2: Networks 1 & 3) and their downstream effectors (NFκB, CREB: Network 2). Furthermore, Network 3 encompasses a number of genes related to thyroid metabolism including TG from our prioritized shortlist. genes (input data) and the size of the network to approximate the relevance of the network to the original list of focus genes. We focused on the three highest scoring networks, which had scores ranging from 33 to 51 (Table S1D).
In coherence with the pathway analysis, the network analysis reinforces the importance of central perpetrators of GPCR and RTK mediated signaling (AKT, ERK1/2: Networks 1 & 3) and their downstream effectors (NFκB, CREB: Network 2). Furthermore, Network 3 encompasses a number of genes related to thyroid metabolism including TG from our prioritized shortlist.

Overlapping Pathways in Familial Non-Medullary Thyroid Cancer
Since GPCR and RTK mediated signaling were highlighted in both pathway and network analyses, we propose a pathway to facilitate a general understanding of FNMTC at a molecular level ( Figure 5).
Since GPCR and RTK mediated signaling were highlighted in both pathway and network analyses, we propose a pathway to facilitate a general understanding of FNMTC at a molecular level ( Figure 5). Activation of GPCR receptors can activate MAPK/ERK signaling as well as PI3K/AKT signaling via one of the four subclasses of G-proteins (Gαs, Gαi/o, Gαq/11, and Gα12/13). Dimerization of receptortyrosine kinase (RTK) receptors can be induced by growth factors such as EGFR and GDNF, which results in the phosphorylation and subsequent activation of the receptor monomers. Receptor activation is linked to downstream signal transduction pathways like the MAPK signaling cascade and the PI3K/AKT system via adaptor proteins. Genes from our dataset that were present in these pathways as activators or regulators are highlighted in Figure 5.

Discussion
The high heritability of thyroid cancer can be attributed to both rare, high-penetrance mutations and common, low-penetrance variants [4,13]. The former is best identified by studying families with a Mendelian pattern of inheritance of the disease in question. We used this principle in our study and identified 31 exonic and 39 non-coding rare potentially pathogenic variants segregating with the disease in five PTC-prone families.
Scientific and technological advancements in genomics have allowed WGS to become the state-ofthe-art tool not only for the identification of driver mutations in tumors but also for the identification of novel cancer predisposing genes in Mendelian diseases. The former has led to improvements in personalized medicine, wherein therapeutic approaches are based on targeting dysregulated pathways Activation of GPCR receptors can activate MAPK/ERK signaling as well as PI3K/AKT signaling via one of the four subclasses of G-proteins (G αs , G αi/o , G αq/11 , and G α12/13 ). Dimerization of receptor-tyrosine kinase (RTK) receptors can be induced by growth factors such as EGFR and GDNF, which results in the phosphorylation and subsequent activation of the receptor monomers. Receptor activation is linked to downstream signal transduction pathways like the MAPK signaling cascade and the PI3K/AKT system via adaptor proteins. Genes from our dataset that were present in these pathways as activators or regulators are highlighted in Figure 5.

Discussion
The high heritability of thyroid cancer can be attributed to both rare, high-penetrance mutations and common, low-penetrance variants [4,13]. The former is best identified by studying families with a Mendelian pattern of inheritance of the disease in question. We used this principle in our study and identified 31 exonic and 39 non-coding rare potentially pathogenic variants segregating with the disease in five PTC-prone families.
Scientific and technological advancements in genomics have allowed WGS to become the state-of-the-art tool not only for the identification of driver mutations in tumors but also for the identification of novel cancer predisposing genes in Mendelian diseases. The former has led to improvements in personalized medicine, wherein therapeutic approaches are based on targeting dysregulated pathways specific to the affected individual. There are also some reports of WGS being successfully used to implicate rare, high-penetrance germline variants in cancer, for example POT1 mutations in familial melanoma [39] and POLE and POLD1 mutations in colorectal adenomas and carcinomas [40]. Identification of cancer-predisposing mutations is a critical step in cancer risk assessment and can help in cancer screening and prevention strategies. Furthermore, the implication of predisposition genes and their respective pathways may facilitate development of targeted therapy. However, one has to be critical in reporting novel variants before appropriate functional validation and evaluation of their penetrance in a large cohort of families. The importance of this step is exemplified by controversial findings regarding the implication of HABP2 G534E in familial NMTC [41].
Some of the genes shortlisted based on FCVPPv2 have already been identified in other cancers. These include CHEK2 mutations in breast cancers and also in a variety of other cancers including thyroid cancer [28], EWSR1 in Ewing sarcoma [29], RET in hereditary medullary thyroid carcinoma, NRP1 in breast cancer [30] and germline POT1 variants in malignant melanoma [31]. Moreover, it is interesting to note that the expression of NRP2, an important paralog of the NRP1 gene, has been correlated to lymph node metastasis of human PTC and is required in the VEGF-C/NRP2 mediated invasion and migration of thyroid cancer cells [42]. The upstream variant in the DAPL1 gene is shown to affect the binding sites of MAZR and Sp1 by SNPnexus and Segway. MAZR1, also known as PATZ1, has been shown to be downregulated and delocalized in thyroid cancer cell lines derived from papillary, follicular and anaplastic thyroid carcinomas [43]. Another study has demonstrated the role of PATZ1 as a tumor suppressor in thyroid follicular epithelial cells and its involvement in the dedifferentiation of thyroid cancer [44].
Other genes of interest shortlisted based on the pipeline (PNPLA8, PTGIR, RET, GNB2 and POT1) were involved in the enrichment of MAPK/ERK and PI3K/AKT pathways. The MAPK pathway is the most frequently mutated signaling pathway in human cancer and is thus considered one of the most promising targets for cancer therapy. This pathway plays a central role in the induction of biological responses such as cell proliferation, differentiation, growth, migration and apoptosis [45]. Initiated by an extracellular mitogenic stimulus that leads to the activation of RTK or GPCR, the MAPK/ERK pathway leads to the phosphorylation and subsequent translocation of ERK into the nucleus. ERK activation plays a central role in the induction of cell cycle entry and the suppression of negative regulators of the cell cycle [46]. Although MEK1 and MEK2 can be activated by multiple MAP kinase kinase kinases (MAP3Ks) as well as by RAF, they serve as sole activators of ERK1/2 and thus as gatekeepers of the MAPK cascade [47]. Overexpression or aberrant activation of RTKs or their immediate downstream targets (PI3K, RAS and SRC) can result in the upregulation of the MAPK/ERK signaling pathway [48]. A common somatic mutation in this pathway is BRAFV600E, which has been implicated in melanoma [49], thyroid and colorectal cancer [50] and hairy cell leukemia [51].
The importance of the PI3K/AKT pathway in thyroid cancer was first recognized when patients suffering from Cowden's syndrome caused by a germline mutation in the PTEN gene were found to have FTC [52]. PI3K activation phosphorylates and activates AKT which can have numerous downstream effects via activation or inhibition of multiple proteins that are involved in cell growth, proliferation, motility, adhesion, angiogenesis, metabolism and apoptosis.
Furthermore, our findings are in line with recent studies on PTC tissues and PTC cell lines have implicated activation of MAPK/ERK and PI3K/AKT pathways in thyroid carcinogenesis [53][54][55]. Interestingly, somatic alterations that lead to the activation of the MAPK pathway as well as of the PI3K/AKT pathway are common in aggressive thyroid cancers, such as metastatic or recurrent PTC/FTC and ATC [56]. The targeting of downstream RAS effectors has already been shown to be a promising approach, however patients treated with RAF or MEK inhibitors frequently develop drug resistance [47]. Targeting the downstream ERK kinase, which is also known as the gatekeeper of the MAPK cascade, can overcome the acquired drug resistance induced by upstream kinase inhibitors [57]. In this context, it is also important to note the similarity between our proposed model for the molecular mechanisms in FNMTC and the reported molecular mechanisms in non-familial NMTC. It is known that patients with familial NMTC may have a more aggressive form of the disease, with larger tumors in younger patients and increased rates of extra-thyroid extension and lymph node metastasis. This suggests that FNMTC should be explored further to gain a better understanding of the cause of increased aggressiveness. However, none of the variants were identified in more than one family. As the phenotypes of our families differed (as described in Figure 1), it is likely that also the mutations causing the disease in the families also are different. We analyzed only 5 families and no other WGS data on FNMTC are available, thus restricting the possibility to confirm the variants in larger data sets. Functional analysis of promising candidates highlighted in this study may shed some light to the mechanisms underlying this phenomenon.
Interpreting WGS data and selecting one out of millions of genetic variants as the cause of hereditary cancer is a daunting task and highlights the importance of the use of a standardized protocol like the FCVPPv2. We were able to prioritize 31 exonic and 39 non-coding potential cancer-predisposing variants using our family-based pipeline from which we hope to pinpoint one candidate gene for each family. The final selection and implication of one candidate gene predisposing to cancer in each family is beyond the scope of this paper as it will involve further steps including population screening and functional studies. In the present study, we decided to focus on the analysis of pathways that are enriched in familial NMTC to see how the variants prioritized using our pipeline fit into the general pathway analysis results. The IPA analysis of all genes already presented us with valuable data and there was a high involvement of genes prioritized using our pipeline in the top diseases and bio functions, canonical pathways and networks generated by IPA. Although IPA could give us a general idea of molecular pathways affected in the studied families, it is important to keep in mind that the analysis was conducted at a gene level and not at a variant level. The evaluation at a variant level is largely dependent on the pipeline and its subsequent steps as mentioned above. We have already successfully implemented this pipeline to identify DICER1 as a candidate predisposing gene in familial Hodgkin lymphoma [58] and are confident that our pipeline can be applied to the NMTC families in a similar manner.

Conclusions
In conclusion, WGS data analysis of five NMTC-prone families allowed us to prioritize 31 exonic and 39 non-coding variants from which we subsequently hope to identify one candidate gene per family. Furthermore, we were able to identify pathways and networks significant to our dataset, including important tumorigenic pathways such as MAPK/ERK and PI3K/AKT signaling pathways. The implication of previously reported tumorigenic signaling pathways and the presence of known tumor suppressor or oncogenes in these affected pathways show that the pathogenesis of FNMTC is in concordance with characteristic molecular mechanisms of cancer. The next steps will include selecting one candidate gene per family and validating it with the help of population screening and functional studies. We hope that our results can facilitate personalized therapy in the studied families and contribute to the screening of other individuals at risk of developing NMTC.