A Complex Intrachromosomal Rearrangement Disrupting IRF6 in a Family with Popliteal Pterygium and Van der Woude Syndromes

Clefts of the lip and/or palate (CL/P) are considered the most common form of congenital anomalies, occurring either in isolation or in association with other clinical features. Van der Woude syndrome (VWS) is associated with about 2% of all CL/P cases and is further characterized by lower lip pits. Popliteal pterygium syndrome (PPS) is a more severe form of VWS, normally characterized by orofacial clefts, lower lip pits, skin webbing, skeletal anomalies and syndactyly of toes and fingers. Both syndromes are inherited in an autosomal dominant manner and are usually caused by heterozygous mutations in the Interferon Regulatory Factor 6 (IRF6) gene. Here we report the case of a two-generation family in which the index patient presented with popliteal pterygium syndrome while both the father and sister had clinical features of Van der Woude syndrome, but in whom no point mutations were detected by re-sequencing of known gene panels or by microarray testing. Using whole genome sequencing (WGS) followed by local de novo assembly, we discovered and validated a copy-neutral, 429 kb complex intra-chromosomal rearrangement in the long arm of chromosome 1, disrupting the IRF6 gene. This variant is novel against publicly available databases and segregates in the family in an autosomal dominant pattern. This finding suggests that missing heritability in rare diseases may be due to complex genomic rearrangements that can be resolved by WGS and de novo assembly, helping deliver answers to patients where no genetic etiology was identified by other means.


Introduction
Orofacial clefts (OFC), specifically clefts of the lip and/or the palate (CL/P), are among the most common forms of congenital craniofacial anomalies, affecting about 1 in 500 to 1 in 2500 births depending on the population [1,2]. The majority of cleft lip and palate cases occur as a non-syndromic isolated phenotype with complex disease etiology, while about 30% of cases are syndromic, occurring in association with other Mendelian phenotypes [1,2]. Previous studies have shown that both genetic and environmental factors contribute to the cause of orofacial clefts, making it difficult to identify the main etiology in many cases; however, it was also shown that many of the syndromic cases of CL/P were due to chromosomal abnormalities and/or monogenic causes [3]. Currently, more than 500 syndromes have been found to be associated with syndromic CL/P, with Van der Woude syndrome (VWS, OMIM 119300) being the most common, accounting for about 2% of all CL/P cases [4,5]. VWS is dominantly inherited and characterized by orofacial clefts with additional phenotypes including lower lip pits and, in some cases, hypodontia [5,6]. Like VWS, popliteal pterygium syndrome (PPS, OMIM 119500) is an autosomal dominant condition characterized by orofacial clefts and lower lip pits, with additional features including genital and skeletal anomalies, skin webbing, and syndactyly of toes or fingers [5,6]. PPS is considered a more serious form of VWS, with popliteal webbing, CL/P, and syndactyly used as key clinical differentiators, each occurring in more than half of all PPS cases [4,7]. Both VWS and PPS are very rare, occurring in about 1 in 35,000 and 1 in 300,000 births respectively, and both are reported to be caused by mutations in the Interferon Regulatory Factor 6 gene (IRF6, OMIM 607199) [4,5].
Structural variants (SVs) are a class of genomic variation larger than 50 base pairs in size and account for about 1% of differences in terms of base content across human populations [8]. SVs may be simple deletions and duplications (together referred to as copy number variants (CNVs)) or complex events, including inversions, insertions, and translocations, all of which may be balanced or unbalanced and/or co-occur with CNVs, in some cases involving three or more breakpoints [9]. Like other classes of genomic variation, SVs can be either benign or disease-causing, where they lead to gene disruption or gene dosage alterations [8]. Previous studies have reported complex SVs in patients with different Mendelian disorders; for instance, an inverted triplicated segment within a duplication was found at the MECP2 and PLP1 gene loci in patients with Lubs syndrome and Pelizaeus-Merzbacher disease [9,10]. Additionally, recent studies reported the association of complex structural variants with neuropsychiatric conditions and ASD [9]. Despite these discoveries, the true prevalence of complex SVs in Mendelian disorders remains poorly understood due to the analytical and technical challenges of both discovering and interpreting such variants with short-read data analysis pipelines, which are typically used in both whole exome (WES) and whole genome sequencing (WGS) today [9].
Recent improvements in variant identification tools, and in particular the development of more sensitive and specific algorithms for SV discovery, have aided the effort to identify and understand complex structural variations from short or long DNA fragments [11,12]. Once such variants are identified, de novo assembly of genomic regions of interest may help resolve difficult regions and clarify the role of complex SVs in Mendelian disease [13,14].
Here, we describe the use of WGS on a family with both VWS and PPS who were initially negative by clinical WES, but in whom we used WGS with combined SV tools to identify and validate a novel complex intrachromosomal rearrangement on chromosome 1 disrupting the IRF6 gene. We believe such approaches could be generalized to other pediatric syndromic disorders that are exome-negative but for which WGS data can be generated and where orthogonal SV detection approaches may help identify genetic etiology.

Detailed Case Description
The index patient is a female, the 2nd child of non-consanguineous parents. She presented to the Plastics and Craniofacial clinic at 17 months of age and was diagnosed with popliteal pterygium syndrome (PPS). She had bilateral cleft lip (complete on the left, incomplete on the right) and cleft palate (both were repaired elsewhere) with a large residual anterior fistula (Figure 1B). She also had lower lip pits and cysts on both sides, right popliteal pterygium, and bilateral complete simple syndactyly of the second and third webspaces of the hands (the only fused nail plates were on the 3rd and 4th digits of the right hand) (Figure 1C-E). She was otherwise developmentally normal, with normal intelligence and normal cardiovascular and respiratory systems, and no other anomalies or medical conditions were noted. The older sister presented to our institution at 38 months of age with bilateral cleft lip and cleft palate (both repaired elsewhere) with a large residual anterior fistula, and had lower lip pits and cysts on both sides, consistent with the diagnosis of Van der Woude syndrome (VWS) (Figure 1F). Similarly, the father (39 years old) also showed clinical features of VWS, having bilateral cleft lip (repaired elsewhere), an unrepaired cleft of the entire secondary palate, and lower lip pits and cysts (Figure 1G). Both the father and the older sister were otherwise normal, with no other anomalies or medical conditions noted. Sanger sequencing of the IRF6 gene revealed no candidate pathogenic variants, prompting enrollment into the Qatari Mendelian Disease Program, where whole genome sequencing (WGS) was performed for the entire family.

Results
The family was first referred with a suspicion of PPS and VWS; however, routine pathology investigation, including re-sequencing of known gene panels and microarray testing, found no pathogenic variants in IRF6 (a known candidate gene causing both conditions) segregating with disease in this family. The family was thus enrolled for WGS as part of the Qatar Mendelian Disease Program [15] to identify a genetic etiology for their clinical phenotype. All family members were enrolled by informed consent, their genomes were sequenced to a minimum depth of 30×, and the data were processed to identify rare, putatively pathogenic variants segregating with disease (for details see Appendix A). The WGS analysis revealed no immediate candidate pathogenic variants in the sub-50 bp range (single nucleotide variants or insertions/deletions (indels)). We therefore employed an in-house structural variant discovery pipeline (see Appendix A) to look for possible chromosomal abnormalities that could be causing the phenotype.
Among 136 deletion, duplication, and inversion events shared by affected family members (Figure 2), we detected a copy-neutral rearrangement segregating in affected family members on the long arm of chromosome 1 (1q32.2) that appeared to overlap the IRF6 gene. This event was captured by our analysis pipeline as two consecutive deletions, of 177 kb and 251 kb, that were both encompassed by a larger 429 kb duplication event (Figure 3). To further resolve the possible rearrangement event, we selected all reads within a 500 kb window upstream and downstream of the putative breakpoints and proceeded with de novo assembly of short-read data at this locus. This approach revealed a rearranged allele in which the two candidate 'deleted' segments originally detected were in fact reversed in order along the same chromosome, i.e., the upstream segment was deleted and re-inserted downstream of the second segment. Notably, the breakpoints identified from the de novo assembly suggested that exon 6 of the IRF6 gene was disrupted (Figure 4A). By comparison to publicly available as well as internal Qatari structural variation data [16], this variant appeared to be novel. It was shared among the 3 affected family members and absent from the unaffected mother and sibling, and was thus consistent with an autosomal dominant disease etiology.
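The relationship between the three calls and the assembled allele can be illustrated with a small sketch. It is not the pipeline used in this study: the segment labels (L, S1, S2, R), the coordinates in kb relative to the first breakpoint, and the "deletion-like" and "duplication-like" labels are assumptions made for illustration, and the segment sizes are the rounded values quoted above. Under these assumptions, swapping the order of two adjacent segments creates three novel junctions, two of which skip forward over a segment (as a deletion would) and one of which jumps backward across both segments (as a tandem duplication would), while the total sequence content is unchanged.

```python
from typing import Dict, List, Tuple

# Hypothetical layout, in kb relative to the first breakpoint.
# L and R are flanking sequence; S1 and S2 are the two segments that were
# called as deletions (rounded sizes taken from the text).
COORDS: Dict[str, Tuple[int, int]] = {
    "L": (-500, 0),
    "S1": (0, 177),
    "S2": (177, 428),
    "R": (428, 928),
}
REFERENCE: List[str] = ["L", "S1", "S2", "R"]
REARRANGED: List[str] = ["L", "S2", "S1", "R"]


def junction_signatures(
    order: List[str], coords: Dict[str, Tuple[int, int]]
) -> List[Tuple[str, int]]:
    """Describe each non-reference junction in a segment order.

    A junction that skips forward on the reference resembles a deletion;
    a junction that jumps backward resembles a tandem duplication.
    """
    signatures = []
    for left, right in zip(order, order[1:]):
        left_end = coords[left][1]
        right_start = coords[right][0]
        if right_start == left_end:
            continue  # unchanged reference adjacency
        if right_start > left_end:
            signatures.append(("deletion-like", right_start - left_end))
        else:
            signatures.append(("duplication-like", left_end - right_start))
    return signatures


def is_copy_neutral(reference: List[str], rearranged: List[str]) -> bool:
    """True when both alleles contain each segment exactly the same number of times."""
    return sorted(reference) == sorted(rearranged)


if __name__ == "__main__":
    for kind, size_kb in junction_signatures(REARRANGED, COORDS):
        print(f"{kind}: {size_kb} kb")
    print("copy-neutral:", is_copy_neutral(REFERENCE, REARRANGED))
```

With the rounded segment sizes the backward junction spans 428 kb, which agrees with the reported 429 kb duplication call to within rounding.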
We then proceeded to validate this rearrangement in the lab. To do this, we designed 3 sets of primers around the rearranged breakpoints (Figure A1). We amplified these genomic segments using PCR, and found the reference allele in all family members. While unaffected members were homozygous for the reference allele, the 3 affected members were heterozygous for it, alongside the rearranged allele as predicted from the in silico analysis (Figure 4A,B).
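The genotype interpretation described above can be written out as a short sketch. The function names and the boolean encoding of band presence are assumptions made for illustration; the family table simply restates the outcome reported in the text (reference product in everyone, rearranged product only in the three affected members) and is not raw gel data.

```python
from typing import Dict, Tuple


def call_genotype(reference_band: bool, rearranged_band: bool) -> str:
    """Infer a genotype from the presence of breakpoint-specific PCR products."""
    if reference_band and rearranged_band:
        return "heterozygous"
    if reference_band:
        return "homozygous reference"
    if rearranged_band:
        return "homozygous rearranged"
    return "no call"


# (affected, reference band seen, rearranged band seen), restating the text.
FAMILY: Dict[str, Tuple[bool, bool, bool]] = {
    "index": (True, True, True),
    "sister": (True, True, True),
    "father": (True, True, True),
    "mother": (False, True, False),
    "unaffected_sibling": (False, True, False),
}


def consistent_with_dominant(family: Dict[str, Tuple[bool, bool, bool]]) -> bool:
    """Check that every affected member is heterozygous and every
    unaffected member is homozygous for the reference allele."""
    for affected, ref_band, alt_band in family.values():
        expected = "heterozygous" if affected else "homozygous reference"
        if call_genotype(ref_band, alt_band) != expected:
            return False
    return True


if __name__ == "__main__":
    for member, (_, ref_band, alt_band) in FAMILY.items():
        print(member, "->", call_genotype(ref_band, alt_band))
    print("dominant segregation:", consistent_with_dominant(FAMILY))
```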

Discussion and Conclusions
In this paper, we present a two-generation family with intrafamilial phenotypic variability who were found to have a complex rearrangement event on chromosome 1 disrupting the continuity of the IRF6 gene. The index patient was diagnosed with popliteal pterygium syndrome, while the father and the sister showed clinical features of Van der Woude syndrome, both of which are caused by mutations in the IRF6 gene.
Both popliteal pterygium and Van der Woude syndromes are dominantly inherited and mainly characterized by orofacial clefts. In popliteal pterygium syndrome, the most common feature, present in more than 90% of cases, is cleft palate with or without cleft lip. The syndrome includes additional cutaneous, genital, and musculoskeletal phenotypes, such as popliteal skin webbing in 58% of cases, genital anomalies in 37%, syndactyly in 50%, and nail anomalies in 33%, and any three of these phenotypes must be present for the diagnosis of PPS [17]. On the other hand, in addition to the cleft lip and/or palate, lower lip pits must be present for a diagnosis of Van der Woude syndrome [4].
Previously, Tan et al. reported a de novo 2.3 Mb microdeletion of the 1q32.2 region involving the IRF6 gene in a patient diagnosed with Van der Woude syndrome [6]. Upon searching the literature, we found no cases with complex structural variants involving the IRF6 gene locus. The variant in this family was a highly complex rearrangement that did not affect exonic sequence, and would therefore be missed using routine targeted and/or exome sequencing approaches. While it is known that families with IRF6 mutations can have significant inter- and intra-familial phenotypic variability, such cases are normally seen in families with point mutations in IRF6; no previous cases were reported with such a complex intrachromosomal rearrangement event [18]. Further, the wide phenotypic diversity seen in the previously described and current cases supports the idea that the two syndromes, VWS and PPS, represent two ends of a phenotypic spectrum of a single condition rather than two separate disorders [18].
The use of WGS offers the ability to detect a multitude of genomic variants ranging from simple base substitutions to complex rearrangements, including copy number changes as well as copy-neutral inversions and translocations [19]. However, accurately analyzing, filtering, and interpreting WGS data to identify such variants remains a major challenge [19]. In this study, we identified variants on chromosome 1 that, by traditional tools, suggested 2 successive deletions overlapping a duplication. This prompted further analysis to resolve the variant. We therefore leveraged de novo local genome assembly as a powerful approach to reconstruct the allele from the ground up. This approach revealed a complex intra-chromosomal translocation, in which one genomic segment was removed and re-inserted downstream of its neighboring segment in the 1q32.2 region. This rearrangement affects exon 6, disrupting the continuity of the IRF6 gene and causing the dominantly inherited phenotypes observed in this family.
In the human genome, structural variants (SVs) are considered a significant source of variation, and it was shown that complex structural variants, representing 2% of SVs, are more abundant than previously thought and play an important role in Mendelian diseases [9]. However, due to the challenge of identifying and interpreting such complex SVs, they are typically missed or under-reported during genomic analyses. This familial case study demonstrates the potential of combining de novo assembly of short-read WGS data with several structural variant detection tools to identify such rare and complex intrachromosomal rearrangements, which could help resolve the missing heritability in a subset of patients with rare diseases in clinical settings.

Appendix A
Blood samples were collected from all available family members. Genomic DNA was extracted and WGS data were generated and processed as described previously [20]. Variant identification was based on the following criteria: (1) being in the coding region, including exonic and splice-site regions, (2) being rare (<1%) in all mutation databases (i.e., 1000 Genomes [21], gnomAD [22], ExAC [23], and ESP6500 [24]), and (3) [...], and applied the best practices recommended for each tool. Family-level consensus VCF files were generated using SURVIVOR [25]. We retained only SVs reported by at least two tools, with sizes ranging from 50 bp to 10 Mb. The annotation of structural variants was carried out using AnnotSV version 2.2 [26]. We primarily retained variants that were predicted to disrupt genes, shared by affected individuals, absent in unaffected family members, and either novel or very rare in global databases. For the visualization of candidate structural variants, we used Samplot version 1.0.7 [PMID: 34034781], and confirmation was performed as described below. To estimate the exact region around the breakpoints affected by the rearrangement event, we debugged the local assembly and produced the assembly graph at the breakpoints of the event using SvABA [PMID: 29535149].
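The retention rules stated above (support from at least two tools, size between 50 bp and 10 Mb, present in all affected and no unaffected family members) can be expressed as a minimal sketch. This is not the in-house pipeline: the `SVCall` record, the caller names `toolA` to `toolC`, the sample labels, and the coordinates in the usage example are all hypothetical, and the gene-disruption and population-frequency filters are not modelled.

```python
from dataclasses import dataclass
from typing import FrozenSet, List

MIN_SIZE_BP = 50
MAX_SIZE_BP = 10_000_000
MIN_SUPPORTING_TOOLS = 2


@dataclass(frozen=True)
class SVCall:
    """Hypothetical merged SV record for one family."""
    chrom: str
    start: int
    end: int
    svtype: str
    callers: FrozenSet[str]   # tools that reported the event
    carriers: FrozenSet[str]  # family members carrying the event

    @property
    def size(self) -> int:
        return self.end - self.start


def passes_consensus(sv: SVCall) -> bool:
    """Keep events seen by at least two tools and within the size range."""
    return (
        len(sv.callers) >= MIN_SUPPORTING_TOOLS
        and MIN_SIZE_BP <= sv.size <= MAX_SIZE_BP
    )


def segregates(sv: SVCall, affected: FrozenSet[str], unaffected: FrozenSet[str]) -> bool:
    """Present in every affected member and in no unaffected member."""
    return affected <= sv.carriers and not (unaffected & sv.carriers)


def filter_candidates(
    calls: List[SVCall], affected: FrozenSet[str], unaffected: FrozenSet[str]
) -> List[SVCall]:
    return [
        sv for sv in calls
        if passes_consensus(sv) and segregates(sv, affected, unaffected)
    ]


if __name__ == "__main__":
    affected = frozenset({"index", "sister", "father"})
    unaffected = frozenset({"mother", "sibling"})
    calls = [
        # Made-up coordinates; only the filtering logic is of interest.
        SVCall("chr1", 1_000_000, 1_177_000, "DEL",
               frozenset({"toolA", "toolB"}), affected),
        SVCall("chr2", 5_000_000, 5_000_400, "DUP",
               frozenset({"toolA"}), affected),
        SVCall("chr3", 9_000_000, 9_020_000, "INV",
               frozenset({"toolB", "toolC"}), affected | {"mother"}),
    ]
    for sv in filter_candidates(calls, affected, unaffected):
        print(sv.chrom, sv.start, sv.end, sv.svtype)
```

In this toy input only the first call is retained: the second has single-tool support and the third is also carried by an unaffected member.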

Appendix A.2. Primer Design
To confirm the 3 breakpoints identified algorithmically, primers were designed across the breakpoints to differentiate the normal from the rearranged allele. For the reference allele, primer set 1 (P1) was around the 1st breakpoint, primer set 2 (P2) around the 2nd breakpoint, and primer set 3 (P3) around the 3rd breakpoint (Figure A1). Similarly, for the rearranged allele, a mixture of the same primer sets was used around the breakpoints to confirm the occurrence of a rearrangement event in affected family members (Figure A1). The sequences of the primers used are given in Table A1. The rearranged allele sequence generated by SvABA de novo assembly is visualized in Figure A2.
Figure A1. Primer design. Forward and reverse primers were designed around the three breakpoints in both the reference and the rearranged alleles.
Figure A2. Visualization of rearranged allele sequence generated using SvABA de novo assembly.

Appendix A.3. Polymerase Chain Reaction (PCR)
A ProFlex PCR System thermocycler (Applied Biosystems, Foster City, CA, USA) was used. The PCR primer pairs were designed to be specific to the region around the breakpoints in both the reference and rearranged alleles. PCR reactions included a MasterMix (Invitrogen, Carlsbad, CA, USA), forward primer (10 µM), reverse primer (10 µM), and purified DNA. The reaction mix was amplified for 35 cycles as follows: a denaturation phase at 95 °C for 30 s, an annealing phase of 45 s at a temperature suited to the primers, and an extension phase at 72 °C for 1 min. Amplified products (110 bp) were analyzed using a ChemiDoc MP Imaging System (Bio-Rad, Hercules, CA, USA) following electrophoresis on a 1.5% agarose gel (Invitrogen, Waltham, MA, USA) containing SYBR Safe (Invitrogen, Waltham, MA, USA).