Heritability and De Novo Mutations in Oesophageal Atresia and Tracheoesophageal Fistula Aetiology

Tracheoesophageal Fistula (TOF) is a congenital anomaly for which the cause is unknown in the majority of patients. OA/TOF is a variable feature in many (often mono-) genetic syndromes. Research using animal models targeting genes involved in candidate pathways often result in tracheoesophageal phenotypes. However, there is limited overlap in the genes implicated by animal models and those found in OA/TOF-related syndromic anomalies. Knowledge on affected pathways in animal models is accumulating, but our understanding on these pathways in patients lags behind. If an affected pathway is associated with both animals and patients, the mechanisms linking the genetic mutation, affected cell types or cellular defect, and the phenotype are often not well understood. The locus heterogeneity and the uncertainty of the exact heritability of OA/TOF results in a relative low diagnostic yield. OA/TOF is a sporadic finding with a low familial recurrence rate. As parents are usually unaffected, de novo dominant mutations seems to be a plausible explanation. The survival rates of patients born with OA/TOF have increased substantially and these patients start families; thus, the detection and a proper interpretation of these dominant inherited pathogenic variants are of great importance for these patients and for our understanding of OA/TOF aetiology.


Introduction
Oesophageal Atresia (OA) with or without Tracheoesophageal Fistula (TOF) (MIM 189960) is a developmental defect of the foregut that has a prevalence ranging between 1 and 4.5 per 10,000 live births [1,2]. There are five morphological subtypes of which proximal atresia with a distal TOF is most prevalent [3]. The atresia and fistula are surgically treated in the first days after birth. Even after surgical repair, OA/TOF can result in substantial health problems later in life [4]. Males are more likely to be born with this condition than girls; this 3:2 gender disparity is hypothesised to be confounded by genetic and environmental factors [5,6]. OA/TOF can be the sole anatomical malformation, although in approximately half of patients, this anomaly is associated with other congenital defects. Frequently associated malformations are those of the VACTERL spectrum of anomalies: vertebral, anorectal, cardiac, tracheoesophageal, renal or urinary tract, and limb malformations [7,8]. Other anomalies are also common, e.g., microcephaly, duodenal atresia, single umbilical artery, micrognathia, pyloric stenosis, and genitourinary malformations [7,9].  Fistula (TOF). From left to right, advancing stages of tracheal (yellow) and oesophagus (blue) development are shown. (a) In the normal situation, the oesophagus and trachea fully separate. Tracheoesophageal separation starts before lung bud formation: ventral primordial tracheal (yellow) and dorsal oesophagus (blue) fields develop. Subsequently, a saddle shaped structure expands rostrally, separating the future oesophagus from the future trachea. Unidentified defects result in disturbances of the rostral expanding tracheoesophageal septum (arrow), resulting in narrowing and subsequent rupture of the future oesophagus. (b) Type A, isolated OA. The expansion of the first septum is blocked. The septum expands dorsally, resulting in the formation of oesophageal atresia without a fistula. (c) Type B, OA with proximal TOF. The expansion of the first septum is blocked. A second septum forms and expands rostrally. The first septum expands dorsally, resulting in the formation of a proximal fistula and oesophageal atresia. (d) Type C, OA with distal TOF. The expansion of the first septum is blocked. A second septum forms and expands rostrally as well as dorsally, resulting in the formation of a distal fistula and oesophageal atresia. (e) OA with dual TOF. There are multiple blockage points. The middle septum expands dorsally, creating both a proximal and distal fistula as well as oesophageal atresia. (f) Type E, H-type fistula. The expansion of the first septum is blocked. A second septum forms and expands rostrally, resulting in the formation of a fistula.

Genetic Contribution to OA/TOF Aetiology
The association between similar recurrent anomalies could be caused by genetic defects in one specific gene. Indeed, OA/TOF is a variable feature in more than 70 (often mono-) genetic syndromes (see Table 1). These syndromes have autosomal recessive, Xlinked recessive, and autosomal dominant inheritance. However, they have a much lower prevalence compared with that of OA/TOF, and in most of these syndromes, OA/TOF is a variable characteristic that is not frequently present [24,25]. Mutations in DNA repair genes (FANC genes), genes involved in endocytic vesicular trafficking [26,27], the splicing machinery [28], and several transcription factor genes (e.g., SRY (sex determining region Y)-box 2 (SOX2), MYCN Proto-Oncogene, and BHLH Transcription Factor (MYCN)) could explain some of the aetiology of OA/TOF in patients. OA/TOF is very heterogeneous, and neither de novo mutations [26,27,29] nor de novo Copy Number Variations (CNVs) [30,31] impact a shared locus or gene frequently. However, the total contribution of these changes is substantial. Table 1. Human disease and animal candidate genes modified after [24,25]. * evaluated with Molecular Inversion Probe Screening [32], NA = Not available. Missense and synonymous variation z-scores (Mis_z and Syn_z), probability of intolerance to heterozygous loss of function variation (pLI), or recessive variation (pRec) are derived from the GnomAD database version 2.   [54,55] This, in combination with limited knowledge of, for instance, affected types of cells, large phenotypic and genetic heterogeneity, and the exact heritability results in the absence of a clear genetic diagnosis in the vast majority of OA/TOF patients. Usually, OA/TOF is a sporadic finding: familial recurrence rate is low (1-3%) [99,100] and parents are usual unaffected. Since 1988 onwards, patients with tracheoesophageal anomalies have been included in our Erasmus MC-Sophia TE cohort. In this cohort, there are only a few "familial" TE patients (1.9%) in five families. Parents in these families are unaffected, and recessive or x-linked modes of inheritance are likely genetic mechanisms.
Both a lack of overlap between genes implicated by animal models and a lack of phenotypes of human syndromic genes in animal models hamper the interpretation of genetic findings. In diagnostic genetic evaluations, the interpretation of sequence variants is structured according to the effect on protein, in vivo and in vitro evidence of deleteriousness, segregation of the variation in affected and unaffected individuals (including population frequency), gene characteristics, and the technical sequence quality of the detected variant [107]. Following these criteria, de novo nonsense, missense, or in-frame deletions and insertions in a gene with a low rate of variation and supporting in vivo functional evidence are classified as pathogenic. Therefore, a de novo protein alteration in a conserved gene from which in vivo evidence is present (e.g., an animal knockout model) will almost certainly be classified as pathogenic. This evidence is strengthened if the population frequency and in silico prediction of a deleterious effect fit the frequency of the disease and predict a variant to be deleterious.
In Table 1, established genes from animal models and human OA/TOF syndromes are depicted alongside their probability of loss of function, probability of intolerance to bi-allelic variation, missense z-score, and synonymous z-scores. This table was modified after [24,25], and a substantial portion of these genes was evaluated in a large cohort of patients with OA-and VACTERL-associated anomalies using a Molecular Inversion Probe Candidate gene screening [32]. Interestingly, Screening VACTERL patients including those with OA/TOF as well as exome and genome sequencing of patients did not result in high de novo rates in these genes [26,27,29,32].
One of the reasons for this could be intrinsic to the genes itself. Using public databases, it is possible to predict how well genetic variation in a specific gene is tolerated. Next, we can divide the OA/TOF-associated genes on the in vivo evidence and overlap them with animal models: genes that could be associated in both patients and animals (group 1), patients with mutations in the genes who have OA/TOF and animals with foregut anomalies (group 2), patients who have OA/TOF but animal models targeting these genes as being lethal (group 3), no patients described but animal models with OA/TOF (group 4), no patients described but animal models with foregut anomalies (group 5), and patients who have OA/TOF but animal models with no phenotype (group 6). After ranking these genes on in vivo evidence and overlap with human phenotypes (groups 1-6) and evaluating the genetic variation in these genes in a large control cohort (https://gnomad. broadinstitute.org, accessed on 2 July 2021), the differences in gene characteristics are present (see Table 1, Figure 2).
Next, we can divide the OA/TOF-associated genes on the in vivo evidence and overlap them with animal models: genes that could be associated in both patients and animals (group 1), patients with mutations in the genes who have OA/TOF and animals with foregut anomalies (group 2), patients who have OA/TOF but animal models targeting these genes as being lethal (group 3), no patients described but animal models with OA/TOF (group 4), no patients described but animal models with foregut anomalies (group 5), and patients who have OA/TOF but animal models with no phenotype (group 6). After ranking these genes on in vivo evidence and overlap with human phenotypes (groups 1-6) and evaluating the genetic variation in these genes in a large control cohort (https://gnomad.broadinstitute.org, accessed on 2 July 2021), the differences in gene characteristics are present (see Table 1, Figure 2).  Depicted are the average value and variance of the pRec score (orange) and PLI scores (blue, a) derived from https://gnomad.broadinstitute.org/, accessed on 2 July 2021. pRec; gene likely intolerant for recessive variation, pLI; gene likely intolerant for heterozygous loss of function variation. The right matrix depicts the six different groups and corresponding statistical evaluation of the differences in pLI and pRec scores of in vivo animal model and patient phenotype genes: 1, human patients and animal models with EA and or TOF; 2, human patients with OA/TOF and animal models with foregut anomalies; 3, human patients with OA/TOF and animal model with lethal mutations; 4, no human patients but animal models with OA/TOF; 5, animal models with foregut anomalies but no human patients described; and 6, human patients described but animal models with no phenotype. (b) The characteristics (see Table 1) are compared using an univariate test and T-statistics. For readability, E depicts the scientific E notation for × 10 −a .
For example, the genes in groups 3 and 4 might be more prone to recessive variation compared with the genes in groups 1, 2, 5, and 6 (vice versa genes in groups 1 and 2 have a higher intolerance to heterozygous loss of function variation.) Differences in these gene characteristics could be part of the reason why there is limited overlap between animal models (mostly knockout) and human OA/TOF phenotypes (often de novo and autosomal dominant). This suggests that, based on the gene characteristics of the gene, animal knockout studies overestimate the importance of some genes on human disease Genes 2021, 12, 1595 7 of 18 development and/or embryos with pathogenic mutations in these genes die in utero prematurely, and we underestimate their importance. Therefore, the targeted gene itself could add to the lack of variability in the genetic background of inbred animals [108].

De Novo Mutation in Isolated Phenotypes
The first successful surgical repairs of oesophageal atresia in patients were accomplished halfway the previous century [109]. Mortality remains high (up to 80%) in developing countries [110]. In wealthier countries mortality has decreased substantially (2-5%), and most patients survive and reach adulthood [111,112]. This is especially true for isolated OA/TOF and patients with non-life-threatening associated anomalies. De novo mutations affecting genes involved in tracheoesophageal separation could be a plausible explanation given the absence of phenotypes in parents and the (mostly) sporadic nature of this birth defect. However, the pathways, biological processes, and signalling molecules implicated from animal models as well as monogenetic syndromes are often important during the development of many organs. A high impact mutation in such genes might not be compatible with life (see Figure 2 and Table 1) as the implicated genes are conserved coding regions and there is a clear absence of carriers (as deducible from their high z-scores for Single Nucleotide Variants (SNVs) and PLI for nonsense variants) in control populations.
There is limited evidence for the relationship between detected de novo genetic anomalies and the presence of isolated OA/TOF [26,27]. The de novo rate in the patient groups is similar to that in unaffected controls. The de novo rate of single nucleotide changes in control populations varies between 1 and 2 mutations on average in the exome and between 44 and 82 in the genome. These de novo mutations in control populations are compatible with life (at least into adulthood) and in general do not result in structural anomalies [113]. However, due to, e.g., differences in penetrance, methylation patterns, and gene expression, mutations that do not result in an obvious phenotype in one individual can have drastic consequences in the other. It could be that we detected the responsible de novo mutations but failed to recognise these detected changes as deleterious or pathogenic. If so, large cohorts of patients need to be screened to distinguish enriched genes (or regions) with de novo variation from control populations. Using this approach, we can discriminate contributing de novo changes from changes not related to OA/TOF. Detected de novo (or dominant inherited) changes could be classified as a change in which not enough evidence is present to be classified as pathogenic or benign (Variant of unknown significance) when the particular mutation is seen at low frequency in control populations. Previously, we postulated the slippery slope model in which several stochastic, mechanical, environmental, and genetic factors can be present in an unaffected individual because of compensatory mechanisms and protective factors present during development. However, when this balance is disturbed (e.g., by a de novo mutation), the balance quickly shifts (slips) from one-to-many affected organ systems or intrauterine death. In line with this postulation, it is tempting to speculate that the patients with isolated OA/TOF are those patients with a strong protective background and a (seemingly) unrelated or low impact de novo mutation. We simply miss most patients with a more pronounced pathogenic de novo mutation due to intrauterine death. If we detect a pathogenic de novo mutation, it is in a surviving patient with multiple affected organ systems.

De Novo Mutation in Complex Phenotypes
Interestingly, especially complex OA/TOF seems to be affected by de novo mutation [26,27,29]. Three germ layers-endoderm, mesoderm, and ectoderm-each give rise to different organ structures. Often, more than one organ system is affected in patients with oesophageal atresia. These include vertebral, anorectal, cardiac, renal, limb, and urogenital malformations. A somatic mutation impacting multiple organ systems (1) could occur in a cell before the two-layered blastula matures into the three-layered gastrula; (2) could affect cells that mature in cell types that impact multiple organ systems (e.g., the ectoderm derived neural crest cells), impacting cells and genes that signal from one cell type to the other (e.g., BMP signalling from mesoderm to endoderm) early in development; or (3) could affect a protein (or an interacting protein) with specific spatiotemporal expression patterns. For instance, SOX2 and CHD7 are two OA/TOF syndrome associated genes with specific patterns of associated malformations. They form a complex and regulate common target genes. Chd7 haplo-insufficient mice have reduced Jag1 expression in the ear which results in defects in the vestibule as seen in patients with CDH7 mutations (CHARGE syndrome) [114]. In line with this, many de novo changes that Wang and colleagues found were in genes that code for interaction partners of SOX2 and another OA/TOF associated syndromic gene (EFTUD2) [26].
In most complex patients' cells, two or three germ layers are affected, for instance, the urogenital system (intermediate mesoderm), cardiac system (lateral mesoderm and neural crest), vertebrae (paraxial mesoderm), and gastrointestinal system (endoderm). However, most often, more germ layers are involved in organ development, e.g., the anal canal forms from the endoderm and the ectoderm, and separation of the urogenital cavity is separated from the anorectal canal by the mesoderm-derived anorectal septum. Similarly, the epithelium of the gastrointestinal system is endoderm derived, the smooth muscle cells and connective tissue layers are mesoderm derived, and the enteric nervous system is neural crest derived. During embryogenesis, timing and distribution of somatic mutations can have consequences for the somatic mosaicism in the foetus (see Figure 3). Postzygotic somatic mosaicism could be one of the reasons discordant monozygotic twins develop.

Discordant Monozygous Twins
About one in forty pregnancies is a twin pregnancy, and one in three of these twin

Discordant Monozygous Twins
About one in forty pregnancies is a twin pregnancy, and one in three of these twin pairs are monozygous (MZ), siblings originating from one oocyte [116]. The latter implies that MZ twins are genetically identical. Usually, monozygotic twins are also phenotypically very similar. However, MZ twins with concordant chromosomal anomalies, pathogenic CNV, or mutations can have a discordant disease phenotype [117,118]. This phenotypical discordance at birth could be the result of, for instance, differences in epigenetic modifications or, surprisingly, environmental exposure differences. However, recently, it has been shown that not all MZ twins have exactly the same genome [119]. For instance, Bruder and co-workers identified three somatic intra-twin Copy Number Variation (CNV) differences in a cohort of nine Parkinson-like discordant and ten healthy monozygotic twins [120]. It has been suggested that DNA changes could cause twinning as the blastocyst recognises these mutated cells as foreign, resulting in splitting of the blastocyst [116]. Recently, CNV differences were also reported in monozygotic twins discordant for certain congenital anomalies [121][122][123] and both Voigt et al. and Kaplan et al. described postzygotic somatic mosaicism in monozygotic twins discordant for neurofibromatosis type 1 [124,125]. However, Baranzini and co-workers could not find evidence for sequence differences in MZ twins discordant for Multiple Sclerosis [126], and neither could Solomon in a twin pair discordant for VACTERL association [127]. The twinning incidence is 2.6 times higher in OA pregnancies compared with the general background [8]. The Erasmus MC-Sophia cohort of congenital anomalies includes eight OA pairs (labelled OA1 to and OA 8), which are described previously [128]. A description of their phenotype can be found in Table 2.

Absence of Discordant Somatic Mutations in Blood
The concordance rate in monozygotic twins with isolated OA is elevated compared with that of dizygotic twins [129], indicating a genetic component. The twinning incidence is 2.6 times higher in OA pregnancies compared with the general background [8]. Twin concordance rates are relatively low for OA, about 10% [128]. Monozygotic twinning occurs at a very early stage during development from the two-cell stage up to the 16-cell morula stage. Depending on the cell stage at which a mutation occurs, which cells end up in which foetus and the impact that the mutation has on cell survival and proliferation a mutation can be found at various levels of heterozygosity throughout the body (somatic mosaicism, see Figure 3).
The six twin pairs investigated have additional anomalies (see Table 2). We hypothesised that, if a pathogenic somatic mutation was present in the affected twin, it should have arisen very early in development and, as a consequence, be present throughout the body in a frequency detectable by exome sequencing at moderate sequencing depth (see Table 3). Furthermore, this mutation should be undetectable in the unaffected sibling. If detected, the somatic mutation should have a deleterious effect (e.g., a deleterious protein altering change). The blood cells (including the lymphocytes from which DNA is extracted for analysis) are derived from the (splanchnic) mesoderm. Using exome sequencing and different variant detection algorithms (see Appendices A.1 and A.2), we determined if we could find protein altering variant differences between discordant twins. Although differences between discordant siblings were detected using exome sequencing, these appear to be technical artefacts since none of high-quality PSD changes could be validated with Sanger sequencing (see Table 3). Table 3. Determination of the exonic sequence differences in discordant monozygotic twins. All DNA was extracted from blood. CNVdiff; differences in Copy Number Variation size or presence between twin pairs, TAR; total of aligning reads, TARot; total aligning reads on target, ACot; average coverage on target, ACot20; percentage of target covered at least 20X, PPA; predicted protein altering including variants affecting splicing, PPArare,; rare (MAF < 0.001) protein altering, PSD; putative sequence differences, sequence differences depicted using (1) GATK unified genotyper, (2) negative binomial statistics, and (3) Fisher exact test and repeat filter. DAV; differences after validation with Sanger sequencing. Counts and percentages depicted as twin sibling affected and twin sibling not affected. The threshold to detect these differences (a few high-quality alternative alleles are sufficient) is much lower compared with the ability to validate these differences with Sanger sequencing (at least 5-10% of alternative allele has to be present to detect known changes). We were unable to confirm near heterozygous high-quality changes present in both forward and reverse strand differences between the discordant siblings (PSD variants, see Table 3). We did not validate the remaining thousands of other putative differences not predicted to alter the protein. Furthermore, potential disruptive differences could be present outside the exome, in regions not covered enough by the capture kit or that have a low alternative allele count. Mutations in regulatory elements may have a more selective impact on tissue-specific expression, thereby preventing lethality (caused by coding mutation, which has an effect in all tissues where it is expressed). Regardless, high-frequency somatic mosaicism of putative deleterious de novo mutations were not detected in exomes of DNA derived from the blood of discordant monozygous twins.

The Impact of De Novo Mutations during Adulthood
Survival rates of OA/TOF have improved substantially, and the growing adult patient population impose new challenges in long-term morbidity and genetic counselling of (future) parents with a corrected OA/TOF. Patients growing up often have respiratory and gastrointestinal symptoms that require long-term follow-up [4], and adults have an increased risk of developing Barrett's oesophagus (BE) [130].

Barrett's Oesophagus
The aetiology of BE is multifactorial. A genetic predisposition, chronic gastroesophageal reflux, and several other risk factors result in Barrett's oesophagus. This intestinal metaplasia can progress into oesophageal adenocarcinoma (OAC) [131]. BE is a metaplastic mosaic of crypts with unique genetic profiles [132][133][134]. There is a large overlap of pathways important for foregut development, disease genes, and the pathways implicated in BE development and the genetic risk loci. This is suggestive of a shared aetiology and de novo genetic mutation in the germline (Figure 3b,c), and second driver mutations in the affected tissue itself (Figure 3d) could be (part of) the aetiology of BE and its progression to cancer seen in these patients. This intriguing hypothesis needs further investigation as the first hit can be detected before BE occurs.

Heritability of De Novo Mutations
Although progress has been made, it remains difficult to distinguish the background de novo rate from causal de novo mutations in patients with OA/TOF. This is due to several reasons. De novo mutations in OA/TOF affect many genes-often singletons-and the de novo rate in complex patients is only marginally elevated compared with the population background and isolated OA/TOF de novo rate. Additionally, in every pregnancy, there is a 3 to 5 percent chance of a congenital anomaly-independent of congenital anomalies in the family. As the current family recurrence rate of OA/TOF is low, the segregation of pathogenic alterations is rare and new disease genes are not detected too often. However, de novo mutation in the germline can be transmitted to future generations. As the previous patient population with (often undetected) de novo mutations start having families, it is of the utmost importance to offer these patients genetic counselling and access to sequencing of their (and their parental) genomes.

Recommendations
Somatic mutations could be the result of parental germline mosaicism or the result of patient somatic mutations. As a consequence, detection of these anomalies can be successful in saliva or blood for the first condition but unsuccessful for the latter situation if the affected tissue or cell types are not sampled. As most patients born with OA/TOF in the last few decades survived and started families, detection of these de novo mutations (and discrimination between causal and benign changes) are of great importance for the next generation. If a suspected clinical syndrome cannot be confirmed by single gene analysis, screening patients and their parental DNA for the de novo mutations and segregation analysis of putative dominant inherited pathogenic variants is highly recommended. Additionally, there is increasing evidence that at least subpopulations of patients born with OA/TOF are predisposed to develop Barrett oesophagus and are at risk of developing oesophageal cancer. In addition to gastroesophageal reflux, de novo mutations could add to this increased risk. If present in the oesophagus, de novo mutations could be responsible for the increased susceptibility seen in these patients.  Genomic DNA was fragmented (Covaris, Inc. Woburn, MA, USA), and yield and fragment size were determined with the Bioanalyzer 2100 (Agilent Technologies, Santa Clara, CA, USA) using the Agilent DNA 1000 Kit (Agilent Technologies, Santa Clara, CA, USA). We used the SureSelect Human All Exon 50 Mb Targeted exome enrichment kit v2 (Agilent Technologies, Inc., Santa Clara, CA, USA) and Illumina TruSeq version 4 (Illumina, San Diego, CA, USA) paired end 2× 101 bp sequence procedure on the Hiseq2000 sequencer (Illumina, Inc., San Diego, CA, USA) for all twin pairs. Subsequent de-multiplexing, alignment to the hg19 reference genome with Burrows-Wheeler Aligner version 0.6.2 (BioWeb, Dublin, Ireland) [135], generation of chromosome sorted BAMfiles with SAM tools version 0.1.12a (SAMtools, Kamatero, Greece) [136], and quality control were automated in the NARWHAL pipeline [137]. The quality control parameters include total aligned reads on target, mean coverage of the target region, and target with at least 20× coverage. These are listed in Table 3. We did not obtain enough DNA material for whole exome sequencing of twin pair OA4 and OA7. However, we were able to include an additional discordant OA pair, OA 8. Genotyping and InDel calling were performed with the Bayesian genotyper incorporated in the genome analysis toolkit version 1.2.9 [138], SAM tools mpileup, or in-house developed callers. Variants were annotated with ANNOVAR [139]. Moreover, raw sequencing read alignment, variant calling, and quality control was also performed using CLC-bio (Qiagen Inc., Venlo, the Netherlands). We used both variant calling methods of this software tool: a quality based method and a probabilistic method. Variant confirmation was performed using Sanger sequencing.

Appendix A.2. Whole Exome Variant Analysis and Twin Comparison
We pair-wise compared the variants detected in the Illumina's Exomev1.1 genotyping chip, GATK pipeline, CLC-bio quality-based and probabilistic variant calling pipeline and screened for variants present in one and absent in the other sibling. Moreover, we compared the variants of the Illumina's Exomev1.1 (Illumina, San Diego, CA, USA) genotyping with those called with SAM tools Mpileup (SAMtools, Kamatero, Greece) and the "probabilistic differentiator" developed in-house. First, non-covered base-pairs are labelled with "N". Next, using first binomial and second Fisher exact statistics, we screened the exome variants detected in the SAMtools Mpileup (SAMtools, Kamatero, Greece) pipeline for statistically significant (p ≤ 1.0 × 10 −8 ) differences, taking into account coverage and variant frequency. The top-ranking significant differences were validated with Sanger sequencing in twins and their parents.