Identification of SOX18 as a New Gene Predisposing to Congenital Heart Disease

Congenital heart disease (CHD) is the most common birth defect in humans and the leading cause of neonatal mortality worldwide. Although genetic causes, including aneuploidy, copy number variations, and mutations in over 100 genes, have been implicated in the pathogenesis of CHD, the genetic components predisposing to CHD remain unclear in most cases. In the present investigation, we recruited a family with CHD from the Chinese Han population. Through whole-exome sequencing analysis of selected family members, a new SOX18 variation, namely NM_018419.3:c.349A>T; p.(Lys117*), was identified and confirmed by Sanger sequencing to co-segregate with the CHD phenotype in the entire family. The heterozygous variant was absent from the 384 healthy volunteers enlisted as control individuals. Functional exploration via luciferase reporter analysis in cultured HeLa cells revealed that Lys117*-mutant SOX18 failed to transactivate its target genes NR2F2 and GATA4, two genes responsible for CHD. Moreover, the variation abolished the synergistic activation between SOX18 and NKX2.5, another gene responsible for CHD. The findings strongly indicate SOX18 as a novel gene contributing to CHD, which helps address challenges in the clinical genetic diagnosis and prenatal prophylaxis of CHD.


Introduction
Congenital heart disease (CHD) represents the most common human birth malformation, occurring in ~1% of all live newborns and up to 10% of stillbirths globally [1,2]. Notably, if mild cardiovascular structural anomalies are included, such as bicuspid aortic valve, the most frequent cardiovascular developmental aberration with an estimated incidence of 1 to 2 per 100 in the general population, the total prevalence of CHD in live-born infants may be as high as ~5% [3]. As a collective diagnosis for cardiovascular developmental deformities, CHD is anatomically categorized into >20 different clinical subtypes, encompassing pulmonary stenosis (PS), patent ductus arteriosus (PDA), atrial septal defect, and hypoplastic left heart [1,4-6]. Although some mild CHD can resolve spontaneously, severe CHD often leads to worse health-related quality of life [7-9], reduced exercise tolerance [10-12], brain injury and neurodevelopmental anomaly [13-16], thromboembolism [17,18], infective endocarditis [19,20], pulmonary arterial hypertension [21-23], and chronic kidney disease and acute kidney injury [24-26], among other complications.
The present research project was conducted in accordance with the tenets of the Declaration of Helsinki. The local institutional ethics committee approved the protocols applied to this research. Before the start of the investigation, written informed consent forms were signed by the patients aged ≥18 years or by the guardians of children aged <18 years. For the current study, a four-generation pedigree with autosomal-dominant CHD was enlisted from the Han Chinese population. A cohort of 384 unrelated volunteers without CHD was enlisted as control subjects from the same population in the same geographical area. All study subjects underwent clinical evaluation, including a review of personal, familial, and medical histories, physical examination, and transthoracic echocardiography.
Patients ranging from infants to adults were diagnosed with CHD by echocardiography, and some diagnoses were further confirmed at surgery. Approximately 2 mL of blood was collected from each study individual, and genomic DNA was extracted from blood leukocytes using routine procedures.

Genetic Assay
For each selected family member, 2 µg of genomic DNA was used to construct a whole-exome library. The constructed whole-exome library was captured using the SureSelect Human All Exon V6 kit (Agilent Technologies, Santa Clara, CA, USA) and sequenced on the HiSeq 2000 platform (Illumina, San Diego, CA, USA), as per the manufacturer's protocol. Bioinformatic analysis of the datasets produced by whole-exome sequencing (WES) was conducted as described elsewhere [81-84]. Sanger sequencing was conducted to validate the candidate variations discovered by WES and bioinformatic analysis. All coding exons, along with the splicing boundaries, of the gene harboring a confirmed deleterious variation were sequenced in all available family members and in the 384 healthy subjects. Co-segregation analysis was carried out across the whole family. Additionally, the Single Nucleotide Polymorphism database (dbSNP; https://www.ncbi.nlm.nih.gov/, accessed on 16 May 2020), the 1000 Genomes Project (1000G; http://www.1000genomes.org/, accessed on 16 May 2020), and the Genome Aggregation Database (gnomAD; http://gnomad-sg.org/, accessed on 16 May 2020) were queried to verify the novelty of the variation.
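The co-segregation criterion used here (a candidate variant must be carried by every affected relative and by no unaffected relative) can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline used in the study: the member IDs are taken from the pedigree described in this article, but the function names and the carrier sets are assumptions made for the example, and the healthy carrier assigned to the ZBTB1 variant is arbitrary.

```python
# Illustrative subset of the pedigree. IDs follow the article; the carrier
# sets further down are simplified assumptions, not the study's genotype calls.
AFFECTED = {"II-1", "II-3", "II-8", "III-3", "III-8", "III-13", "IV-2", "IV-8"}
UNAFFECTED = {"II-2", "II-7", "III-4", "III-14"}


def cosegregates(carriers, affected=AFFECTED, unaffected=UNAFFECTED):
    """True if all affected members carry the variant and no unaffected member does."""
    carriers = set(carriers)
    return affected <= carriers and not (carriers & unaffected)


def segregation_report(carriers, affected=AFFECTED, unaffected=UNAFFECTED):
    """List the members that break co-segregation, grouped by the reason."""
    carriers = set(carriers)
    return {
        "missing_in_affected": sorted(affected - carriers),
        "present_in_unaffected": sorted(carriers & unaffected),
    }


# Hypothetical carrier sets for three of the candidate variants.
candidates = {
    "SOX18 c.349A>T": AFFECTED,                      # carried by every affected member
    "KCNT2 c.566A>C": AFFECTED - {"II-3", "III-8"},  # absent in two affected members
    "ZBTB1 c.1848A>T": AFFECTED | {"II-2"},          # also in a healthy member (arbitrary choice)
}

for name, carriers in candidates.items():
    print(name, cosegregates(carriers), segregation_report(carriers))
```

A variant that fails either test is excluded as a candidate, which is the logic applied to the Sanger-validated variants in this family.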

Statistics
Promoter activity was expressed as the ratio of firefly to Renilla luciferase activity and is shown as mean ± standard deviation, as previously described [73]. Student's unpaired t-test was used for comparisons between two groups. For comparisons among multiple groups, one-way ANOVA with a Tukey-Kramer HSD post hoc test was applied. A two-tailed p-value of <0.05 was considered to indicate a significant difference.
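As a minimal sketch of the calculations described above, the following Python code computes the firefly-to-Renilla ratio per well, the mean ± standard deviation, and the unpaired Student's t statistic with pooled variance. The readings are hypothetical values invented for illustration, and the function names are not from the original study. Only the standard library is used, so the p-value is not computed here; in practice a statistics package (for example `scipy.stats.ttest_ind`) would be used for that step.

```python
from math import sqrt
from statistics import mean, stdev


def relative_activity(firefly, renilla):
    """Per-well ratio of firefly to Renilla luciferase signal."""
    return [f / r for f, r in zip(firefly, renilla)]


def mean_sd(values):
    """Mean and sample standard deviation of a group."""
    return mean(values), stdev(values)


def student_t(a, b):
    """Unpaired Student's t statistic (pooled variance) and degrees of freedom."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2)
    t = (mean(a) - mean(b)) / sqrt(pooled_var * (1 / na + 1 / nb))
    return t, na + nb - 2


# Hypothetical triplicate readings (arbitrary units), for illustration only.
wild_type = relative_activity([1200.0, 1350.0, 1280.0], [100.0, 110.0, 105.0])
mutant = relative_activity([130.0, 120.0, 125.0], [102.0, 98.0, 101.0])

t_stat, dof = student_t(wild_type, mutant)
print("wild type (mean, SD):", mean_sd(wild_type))
print("mutant (mean, SD):", mean_sd(mutant))
print("t =", t_stat, "with", dof, "degrees of freedom")
```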

Phenotypic Data of the Studied Family with CHD
As shown in Figure 1, a 32-member, four-generation family affected by CHD was identified from the Han Chinese population. In this large family, 30 living family members were available, comprising 15 females and 15 males, with ages ranging from 5 to 62 years. All nine affected family members were diagnosed with PDA by echocardiography and underwent catheter-based interventional closure of the PDA. A representative echocardiographic image of the proband's PDA is given in Figure 2.
Additionally, four members (I-1, II-1, III-3, and IV-2 in Family 1) also had PS. Pedigree analysis of the family (Figure 1) showed that PDA was transmitted in an apparently autosomal-dominant pattern with complete penetrance. The index patient was a five-year-old boy, and his great-grandfather (his grandmother's father; I-1 in Family 1) had been diagnosed with PDA and PS as well as lymphedema and died of chronic congestive cardiac failure at the age of 64 years. Furthermore, all family members with CHD also had telangiectasia and hypotrichosis. No environmental risk factors for CHD were recognized in any family member. The clinical profile of the family members affected by CHD is presented in Table 1.

Discovery of a New CHD-Causative Variation in SOX18
WES was completed in six PDA-affected pedigree members (II-1, II-8, III-3, III-13, IV-2, and IV-8 in Family 1) as well as four healthy pedigree members (II-2, II-7, III-4, and III-14 in Family 1), which yielded ~21 gigabases of sequence data per pedigree member, showing 99% coverage of the human genome (GRCh37/hg19) with ~77% mapped to the target region. A mean of 17,028 nonsynonymous variations (ranging from 15,410 to 18,136) per member passed filtering according to the likely transmission models, of which 12 heterozygous nonsense and missense variations passed filtering by ANNOVAR with a minor allele frequency of <0.1% and were carried by all six PDA-affected members who underwent WES, as given in Table 2. Sanger sequencing was performed for each variant, and only the SOX18 variant c.349A>T (p.Lys117*) co-segregated with the disease in the whole family. Of the other 11 genetic variations, 7 (c. and c.1848A>T in ZBTB1) were also present in the healthy members, whilst the remaining 4 (c.566A>C in KCNT2, c.503G>A in ZMAT3, c.2812A>T in ANK2 and c.1129A>T in RBM20) were absent in two affected members (II-3 and III-8) in the family. Hence, these 11 genetic variations were unlikely to be responsible for PDA in this family. The primers used to amplify the entire coding regions and splicing boundaries of the SOX18 gene are shown in Table 3.
Table 3. Intronic primers to amplify the coding exons and splicing donors/acceptors of the SOX18 gene.
Columns: Coding Exons; Forward Primers (5′→3′); Backward Primers (5′→3′); Amplicons (bp).
The chromatograms exhibiting the heterozygous SOX18 variation (A/T) together with its wild-type homozygous bases (A/A) are given in Figure 3a. The schematic representations displaying the crucial structural motifs of both wild-type SOX18 and Lys117*-mutant SOX18 are drawn in Figure 3b. The detected SOX18 variation was absent from the 768 reference chromosomes of the 384 control individuals and was not reported in dbSNP, 1000G, or gnomAD, indicating a new SOX18 variation. This variant in SOX18,
Furthermore, multiple SOX18-binding sites matching the consensus motif 5′-WWCAAWG-3′ (5′-[A/T][A/T]CAA[A/T]G-3′) in the promoter of NR2F2 (accession no. NC_000015.10; transcript variant 1) were mapped and are shown in Figure 6a, while multiple SOX18-binding site variants in the promoter of GATA4 (accession no. NC_000008.11; transcript variant 1) were mapped and are shown in Figure 6b. In addition to the SOX18-binding sites (highlighted in red), the primer sequences (bold and underlined) and the first exons encoding the mRNAs (highlighted in green) are also shown, to make them easier to locate in the genomic DNA sequences.
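Mapping of the 5′-WWCAAWG-3′ motif can be illustrated with a short regular-expression scan. The sketch below is an assumption-laden example rather than the procedure used for Figure 6: the function names are hypothetical, the test sequence is invented, and scanning the reverse strand as well as the forward strand is a choice made here, not something stated in the text.

```python
import re

# W = A or T, so WWCAAWG becomes [AT][AT]CAA[AT]G.
# The lookahead lets overlapping sites be reported as separate hits.
SOX_MOTIF = re.compile(r"(?=([AT][AT]CAA[AT]G))")
MOTIF_LENGTH = 7
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an upper- or lower-case DNA string."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_sox_sites(seq):
    """Return (0-based start on the forward strand, strand, site) for each match."""
    seq = seq.upper()
    hits = [(m.start(), "+", m.group(1)) for m in SOX_MOTIF.finditer(seq)]
    for m in SOX_MOTIF.finditer(reverse_complement(seq)):
        # Convert the position on the reverse strand back to forward coordinates.
        start = len(seq) - (m.start() + MOTIF_LENGTH)
        hits.append((start, "-", m.group(1)))
    return sorted(hits)


# Invented sequence for illustration; not taken from the NR2F2 or GATA4 promoter.
print(find_sox_sites("GGAACAATGCC"))
```

Applied to a promoter sequence retrieved under the accession numbers given above, such a scan would list candidate sites that could then be compared with those highlighted in Figure 6.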

Discussion
For the current investigation, a four-generation Han Chinese family affected by autosomal-dominant CHD was recruited. By WES and bioinformatic analysis of the pedigree members, a heterozygous SOX18 variation, namely NM_018419.3:c.349A>T; p.(Lys117*), was discovered and confirmed by Sanger sequencing to co-segregate with the CHD phenotype in the entire family. This variation in SOX18 was neither found in 768 reference chromosomes nor reported in dbSNP, 1000G, or gnomAD. Quantitative biological measurements showed that Lys117*-mutant SOX18 was unable to transcriptionally activate the promoters of NR2F2 and GATA4, two CHD-causing genes [85]. Furthermore, the variation abolished the synergistic activation between SOX18 and NKX2.5, another gene responsible for CHD [85]. The findings strongly indicate that genetically compromised SOX18 contributes to the molecular pathogenesis of CHD. However, although the SOX18 mutation is novel and segregates with the disease, there is a view that occurrence in more than one family is necessary before a gene can be called disease-causing in CHD. Hence, it is necessary to sequence SOX18 in other cohorts of PDA patients to identify more PDA-causative SOX18 mutations in unrelated families.
SOX18 maps to human chromosome 20q13.33 and encodes a transcription factor of 384 amino acids, which is a member of the SRY (sex-determining region Y)-related box (SOX) family of transcription factors [86,87]. Human SOX18 protein has two critical structural domains: the transcription activation domain (TAD) and the high-mobility group (HMG) domain. The N-terminal HMG box is essential for the sequence-specific binding of SOX18 to the consensus SOX DNA-binding motif of (A/T)(A/T)CAA(A/T)G (with the core DNA consensus sequence being AACAAT) in the promoters of its target genes, whilst the C-terminal TAD functions to transactivate target genes and also serves as a binding region for other transcription factors that act as cooperative partners of SOX18 [86,88]. SOX18 is abundantly expressed in the heart and vessels during embryogenesis, playing a critical role in embryonic cardiovascular development and postnatal neovascularization, possibly by regulating the expression of target genes key to cardiovascular development, such as NR2F2 and GATA4, alone or synergistically with its partners including NKX2.5 and MEF2C [73,78,86-90]. Moreover, loss-of-function variations in NR2F2, GATA4, and NKX2.5 as well as MEF2C have been implicated in the occurrence of CHD [85,91,92]. In this investigation, the identified Lys117* variation was predicted to create a truncated SOX18 protein lacking the TAD and a part of the HMG domain, and thus to exert a loss-of-function effect. Functional assays demonstrated that Lys117*-mutant SOX18 had no transactivation on its two representative downstream genes NR2F2 and GATA4, alone or synergistically with its partner NKX2.5. Additionally, the present investigation showed that Lys117*-mutant SOX18 had no dominant-negative effect on wild-type SOX18. The findings suggest SOX18 haploinsufficiency as an alternative genetic mechanism underpinning the CHD that occurred in this family.
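The relationship between the coding-level and protein-level names of the variant, c.349A>T and p.(Lys117*), can be checked with simple arithmetic. The sketch below is illustrative only: the helper names are hypothetical, and because the actual codon used for Lys117 in NM_018419.3 is not quoted in this article, both lysine codons are tested rather than one being assumed.

```python
LYS_CODONS = {"AAA", "AAG"}
STOP_CODONS = {"TAA", "TAG", "TGA"}
SOX18_LENGTH = 384  # amino acids in full-length SOX18, as stated in the text


def codon_of(cds_position):
    """Map a 1-based coding position to (codon number, 1-based position within the codon)."""
    index = cds_position - 1
    return index // 3 + 1, index % 3 + 1


def substitute(codon, position_in_codon, alt_base):
    """Return the codon with the base at the given 1-based position replaced."""
    i = position_in_codon - 1
    return codon[:i] + alt_base + codon[i + 1:]


def truncated_length(stop_codon_number):
    """Residues translated before a stop codon at the given codon number."""
    return stop_codon_number - 1


codon_number, position = codon_of(349)
print("c.349 lies in codon", codon_number, "at position", position)

# Either lysine codon becomes a stop codon when its first base changes from A to T.
for codon in sorted(LYS_CODONS):
    mutant = substitute(codon, position, "T")
    print(codon, "->", mutant, "stop" if mutant in STOP_CODONS else "not a stop")

print("Predicted product:", truncated_length(codon_number), "of", SOX18_LENGTH, "residues")
```

Position 349 is the first base of codon 117, so an A>T change there converts a lysine codon into a stop codon, and the predicted product retains only the residues N-terminal to position 117.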
In vertebrates, at least 20 SOX genes have been cloned and subdivided into 10 groups (from group A to I), of which SOX18, SOX7, and SOX17 belong to group F of the SOX family (SoxF) [93]. The three SoxF members are all co-expressed in the cardiovascular system and function to regulate cellular specification and tissue differentiation during cardiovascular development [93]. SOX18 variation may therefore predispose to CHD by disrupting cardiovascular morphogenesis. In many species, encompassing mice, zebrafish, Xenopus, and humans, SOX18 is expressed predominantly in the embryonic cardiovascular system, playing a key role in cardiogenesis and vascular development, mainly via regulating the specification and differentiation of endothelial cells and cardiogenic mesoderm [86,87,93,94]. In Xenopus, injection of morpholinos against either Sox18 or Sox7 mRNAs led to partial inhibition of cardiogenesis, whereas co-injection of Sox18 and Sox7 morpholinos caused strong inhibition of cardiogenesis [94]. Furthermore, Sox18 mRNAs rescued the effects of the Sox7 morpholinos and vice versa, indicating that the two SOX proteins have functionally redundant roles [94]. In zebrafish, Sox18 and Sox7 morphants individually manifested minor vascular aberrations, whilst Sox18/Sox7 double morphants displayed severe arterial-venous abnormalities as well as branching abnormalities of intersomitic vessels and loss of circulation in the trunk [86]. Additionally, only a proportion of Sox7−/− zebrafish exhibited a lack of trunk circulation and a short circulatory loop, while these phenotypes were observed with complete penetrance in double Sox18−/−/Sox7−/− zebrafish, suggesting that Sox18 and Sox7 play redundant roles during cardiovascular morphogenesis [86].
In mice, aberrant heart looping, enlargement of the cardinal vein, and deformation of the anterior dorsal aorta occurred in Sox17-deficient embryos, while in Sox17/Sox18 double-knockout embryos, more severe deformities occurred in the anterior dorsal aorta as well as the head/cervical microvasculature, and an aberrant fusion of the endocardial tube as well as abnormal differentiation of endocardial cells was observed in some cases [93]. In mice overexpressing SOX18 with a dominant-negative mutation, hemorrhages (rupture or distention of peripheral embryonic vessels), edema, and embryonic demise occurred due to cardiovascular defects [93]. By contrast, Sox18−/− mice were viable and fertile, without apparent abnormality in their hearts and vasculature, suggestive of functional compensation by Sox7 and Sox17, the two other SoxF genes [93]. In humans, loss-of-function alterations in both SOX7 and SOX17 have been related to CHD [73,78]. Moreover, mutations in TFAP2B have been reported to cause syndromic PDA by interfering with the inhibitory effect of TFAP2B on the canonical Wnt/β-catenin signaling pathway [95]. Given that all of the Xenopus, murine, and human SOXF factors have a conserved β-catenin binding domain at the C-terminus and interact with β-catenin to repress the activity of β-catenin/TCF transcriptional complexes, thereby suppressing Wnt/β-catenin signaling [86], the SOX18 mutation identified in our study probably contributed to PDA by a similar mechanism. Taken together, these research results indicate that genetically compromised SOX18, one of three SOXF subgroup members that function redundantly, contributes to CHD in humans.
Previously, deleterious mutations in multiple genes have been associated with syndromic PDA in humans, including TBX1-associated DiGeorge syndrome, TBX5-associated Holt-Oram syndrome, PTPN11-associated Noonan syndrome, SMADIP1-associated Mowat-Wilson syndrome, CREBBP-associated Rubinstein-Taybi syndrome, TGFBR1/2-associated Loeys-Dietz syndrome, ABCC9/KCNJ8-associated Cantu syndrome, and TFAP2B-associated Char syndrome [96]. Moreover, pathogenic mutations in several genes have been related to non-syndromic CHD with PDA being a prominent phenotype, including FLNA-related PDA and periventricular heterotopia and MYH11/ACTA2-related PDA and aortic aneurysm [96]. Additionally, common single nucleotide polymorphisms in several genes have been associated with enhanced susceptibility to non-syndromic PDA, including the rs987237 polymorphism in TFAP2B, the rs1056567 polymorphism in TRAF1, and the rs5186 polymorphism in AGTR1 [96]. In this investigation, SOX18 was identified as a new gene predisposing to non-syndromic PDA. However, whether SOX18 regulates the expression of these known PDA-related genes remains to be elucidated.
Previous investigations have shown that a premature translation termination codon may result in the degradation of mRNA in different types of organisms and cell lines through a mechanism named nonsense-mediated mRNA decay (NMD), a translation-dependent, multi-step process which monitors and degrades faulty or irregular mRNA [97]. In the present research, the nonsense mutation in SOX18 created a premature translation termination codon; hence, the mutant SOX18 mRNA was likely to undergo NMD, though not all nonsense mutations trigger it [98]. At present, we cannot rule out NMD in the SOX18 mutation carriers because of the unavailability of their cardiac tissue samples, where the mutant SOX18 protein might be expressed. Even if the mutant SOX18 mRNA underwent NMD, the overall quantity of SOX18 mRNA would be reduced by half, leading to haploinsufficiency, which is consistent with our functional results. Of note, a downstream intron and pre-mRNA splicing, which brings about the deposition of a multi-protein complex, termed the exon-junction complex, approximately 20-24 nucleotides upstream of each exon-exon junction, are necessary for the degradation by NMD of mRNA harboring a premature translation termination codon. Therefore, NMD could not occur in the context of cDNA constructs [97].
Notably, variations in the SOX18 gene have previously been implicated in hypotrichosis-lymphedema-telangiectasia syndrome as well as aortic dilation, pulmonary hypertension, dysmorphic face, renal failure, hydrocele, chylothorax, dysplastic nails, and cutis marmorata in humans [99]. In the present study, in addition to CHD, all the affected family members also manifested telangiectasia and hypotrichosis. Furthermore, the proband's great-grandfather (his grandmother's father) also had lymphedema. Hence, these observations expand the phenotypic spectrum ascribed to SOX18 variations.

Conclusions
The current research suggests SOX18 as a new gene contributing to CHD, which is conducive to the clinical prognostic risk evaluation and timely prenatal prophylaxis of CHD in a subset of cases.

Informed Consent Statement:
Informed consent was obtained from all subjects involved in the study or their legal guardians.

Data Availability Statement:
The data supporting the findings of the current research are available upon reasonable request.