Functional Characterization of NIPBL Physiological Splice Variants and Eight Splicing Mutations in Patients with Cornelia de Lange Syndrome

Cornelia de Lange syndrome (CdLS) is a congenital developmental disorder characterized by distinctive craniofacial features, growth retardation, cognitive impairment, limb defects, hirsutism, and multisystem involvement. Mutations in five genes encoding structural components (SMC1A, SMC3, RAD21) or functionally associated factors (NIPBL, HDAC8) of the cohesin complex have been found in patients with CdLS. Mutations in NIPBL are identified in about 60% of patients. Interestingly, 17% of these mutations are predicted to alter normal splicing; however, detailed molecular investigations are often missing. Here, we report the first systematic study of the physiological splicing of the NIPBL gene, which led to the identification of four new splicing isoforms: ΔE10, ΔE12, ΔE33,34, and B'. Furthermore, we have investigated nine mutations affecting splice sites in the NIPBL gene, identified in twelve CdLS patients. All mutations have been examined at the DNA and RNA levels, as well as by in silico analyses. Although patients with mutations affecting NIPBL splicing show broad clinical variability, the more severe phenotypes seem to be associated with aberrant transcripts resulting in a shift of the reading frame.


Introduction
Cornelia de Lange syndrome (CdLS; OMIM 122470, 300590, 610759, 614701, 300882) is a congenital developmental disorder characterized by distinctive craniofacial features, growth retardation, cognitive impairment, limb defects, hirsutism, and abnormalities of other systems with variable expressivity [1]. Mutations in five genes, encoding structural components of the cohesin complex (SMC1A, SMC3, and RAD21) and its regulators (NIPBL and HDAC8), have been found in patients with CdLS [2][3][4][5][6][7][8][9]. Cohesin was originally described for its function in regulating sister chromatid cohesion during mitosis and meiosis, but has also been demonstrated to play a critical role in DNA-damage repair and the regulation of gene expression [3].
Approximately 60% of patients with CdLS carry an identifiable mutation in NIPBL [2,4,5]. This gene is located on chromosome 5p13.2 and contains 47 exons. Thus far, only two splicing isoforms have been detected in embryonic tissues, although there may be more variants given the large size of NIPBL [10,11]. In the main isoform (A), exons 2-47 encode 2804 amino acids. The alternative isoform (B) does not include exon 47 and ends in an expanded variant of exon 46, encoding 2697 amino acids. Both isoforms are conserved in vertebrates and are identical from amino acid 1 to 2683, while their C-terminal ends differ [11][12][13].
Currently, nearly 300 mutations in the NIPBL gene have been reported, 49 of which affect splice sites, representing 17% of all mutations [14,15]. However, systematic investigations of splicing have not been carried out in most cases. While truncating mutations are usually associated with a severe phenotype, and non-truncating mutations with a mild phenotype, the clinical manifestations observed in CdLS patients with splicing mutations are highly variable [4,16].
In this article, we have performed the first systematic analysis of the physiological splicing of NIPBL, which has yielded four new splicing variants. In addition, we have characterized the pathological splicing in twelve CdLS patients, in whom we have identified seven new splicing mutations. This study was carried out using various molecular approaches on DNA and RNA, together with in silico analyses.

Clinical Findings
Clinical evaluation of all patients included in this study by at least two expert clinicians confirmed that they met the criteria for CdLS according to Kline et al. [1]. Molecular and clinical data relating to each patient are summarized in Table 1.
The relevance of all nine splice-site mutations identified was investigated by specific RT-PCR using RNA isolated from fresh blood samples in order to identify aberrant splicing variants. Among the aberrant transcripts found, five cause a frameshift (patients 1, 2, 4, 5, 6, and 8), whereas three preserve the reading frame (patients 3A, 3B, 7A, 7B, and 10) (Figure 1, Table 1).
The effect of all mutations was evaluated in silico with the programs Splice Site Prediction by Neural Network, HSF Matrices and MaxEnt [19,20].
For mutations located in the canonical position +1, which resulted in exon skipping, donor site disruption was confirmed (Table 1) with the three different programs used.
For c.5575-1G>A, which causes a loss of the first bp of exon 30, the three programs confirmed the disruption of the acceptor sequence, and predicted the generation of a new acceptor site one nucleotide downstream (Table 1).
Among the remaining acceptor-site mutations, c.869-2A>G, c.3856-5delT, and c.5329-6T>G resulted in exon skipping, and disruption of the acceptor site was predicted for all three (Table 2). However, the change c.6109-3T>C, for which only the normal transcript was detected, barely modified the acceptor site score (Table 2).
Among the donor-site mutations, c.4320+4A>G leads to the insertion of four bp after exon 19. This mutation disrupted the original donor site of exon 19, but HSF Matrices and MaxEnt predicted the creation of a new one, four positions downstream (Table 1). In exon 45, the mutation c.7860+5G>A, which caused the deletion of 33 bp at the 3' end, disrupted the original donor site, but a cryptic donor sequence was found 33 nucleotides upstream (Table 1).
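Whether an aberrant transcript preserves the reading frame follows from the net change in transcript length: a change that is a multiple of three keeps the frame, and any other change shifts it. The following is a minimal Python sketch of that check, applied to the length changes reported above. The function name and the dictionary are illustrative only and are not part of the analysis pipeline used in this study.

```python
def frame_effect(net_change_nt: int) -> str:
    """Classify a net insertion (+) or deletion (-) in the mature transcript."""
    if net_change_nt == 0:
        return "unchanged"
    # A length change divisible by 3 adds or removes whole codons only.
    return "in-frame" if net_change_nt % 3 == 0 else "frameshift"


# Net length changes taken from the text above (in nucleotides).
observed = {
    "c.5575-1G>A": -1,    # loss of the first bp of exon 30
    "c.4320+4A>G": +4,    # insertion of four bp after exon 19
    "c.7860+5G>A": -33,   # deletion of 33 bp at the 3' end of exon 45
}

effects = {mutation: frame_effect(delta) for mutation, delta in observed.items()}

for mutation, effect in effects.items():
    print(f"{mutation}: {effect}")
```

The sketch covers only the arithmetic of the reading frame. It does not model premature stop codons or nonsense-mediated decay, which also influence the fate of a frameshifted transcript.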

New Physiological Splicing Variants of NIPBL
Using various combinations of oligonucleotides annealing to different exons of NIPBL, five splice variants could be amplified and confirmed at the cDNA level (Figure 2b, Table 2). Interestingly, only one of them, isoform B, has been described previously, while the remaining four have not been reported. All four new variants result from whole exon skipping, affecting exons 10, 12, 33 + 34, or 45. They have been submitted to GenBank with the following accession numbers: KJ807789, KJ807790, KJ807791, and KJ807792 (Table 2).

Discussion
In this work, we describe the first coordinated analysis of nine splice-site mutations in the NIPBL gene, identified in twelve CdLS patients, at the DNA and RNA levels as well as by in silico analyses. In order to properly assess aberrant splicing, we initially investigated physiological NIPBL splicing using RNA isolated from human leukocytes of normal controls.
Currently, two NIPBL splicing isoforms have been found in embryonic human tissues. Isoform A encodes a 2804 amino acid protein, while isoform B differs in the 3' part and codes for a 2697 amino acid NIPBL protein [12]. In our analyses, we identified and confirmed the presence of four new isoforms (splice variants) in addition to isoforms A and B in adult human leukocytes. One of these new isoforms, named isoform B', is a transcript similar to isoform B but lacking exon 45, which results in a shift of the reading frame. Systematic amplification of overlapping fragments revealed three further new variants, carrying a deletion of exon 10 (ΔE10), exon 12 (ΔE12), and exons 33 + 34 (ΔE33,34), respectively (Figure 2b, Table 2). These findings were further supported by in silico analyses indicating very weak splice acceptor sites for exons 10, 12, 33, 45, and 47 (Table 2), with the exception of the acceptor site of exon 34, which was predicted to be a strong splice site [21]. We observed combined skipping of exon 34 together with the weak exon 33, which may drag the strong exon 34 along during the splicing process, as previously reported for other genes [22,23].
Variants with deletion of exon 10, exon 12, and exons 33 + 34 maintain the reading frame and could lead to functional proteins. The deletions of exons 10 and 12 affect the amino-terminal half of the gene, which is highly conserved in evolution [11]. In contrast, the deletion of exons 33 + 34 affects the ancient carboxy-terminal half, which is conserved down to lower eukaryotes. Bioinformatic analyses suggest that variant ΔE10 could affect the undecapeptide repeat, which has been associated with transcriptional regulation, while variant ΔE12 would eliminate the predicted nuclear localization signal (NLS) of this protein [24]. On the other hand, variant ΔE33,34 would cause the loss of the H3 domain in the HEAT repeats, which plays an important role in the interactions between NIPBL and the histone deacetylases (HDACs) 1 and 3 (Figure 2b) [25].
Among the twelve patients studied, we have detected nine splice-site mutations, seven of them new, which represent 14% of the splicing mutations reported to date (Figure 1, Table 1). Mutations of this kind show a random distribution across NIPBL, unlike nonsense and missense mutations, which tend to accumulate in the first and in the last half of the gene, respectively [4,11].
Sometimes, splicing mutations can cause the partial loss of an exon by activating cryptic splice sequences [28]. An example is the mutation c.7860+5G>A, which disrupts the donor splice sequence of exon 45. In this case, we would expect deletion of the whole exon, since exon 45 contains a weak acceptor sequence and is skipped physiologically, yielding the variant B' (Figure 2b). However, we found an aberrant transcript with a deletion of 33 nucleotides at the 3' end of exon 45 (Figure 1). Bioinformatic analyses predicted a cryptic donor sequence at this position that is activated by the disruption of the original donor sequence (Table 1) [29]. Moreover, the mutation c.7860+5G>A could affect the ratio of physiological transcripts [23].
Occasionally, splicing mutations can disrupt the original splice sites and generate new ones [30,31], as in the mutations c.4320+4A>G (patient 5) and c.5575-1G>A (patient 8) (Figure 1, Table 1). In patient 5, an expansion of exon 19 by four nucleotides was observed, which disrupted the reading frame. This is in contrast to similar mutations previously reported (c.4320+2T>A and c.4320+5G>C), which result in an in-frame skipping of exon 19 [4,32]. In patient 8, the mutation c.5575-1G>A was expected to cause the skipping of exon 30. Instead, it generated an aberrant transcript lacking the first nucleotide of exon 30. In silico tools predict that both mutations create new functional splice sites, which was confirmed by sequencing of the aberrant transcripts (Table 1).
Among the mutations analyzed here, c.6109-3T>C has not been considered a functional splicing mutation, since it did not generate aberrant transcripts (Figure 1) and was inherited from the healthy mother. Interestingly, this sequence variation has been previously reported in four CdLS patients [16][17][18] but was suggested not to be relevant for splicing in the Leiden Open Variant Database (LOVD) [33].

Patients and Controls
This study includes twelve patients diagnosed following the criteria of Kline et al. [1]. There are eight patients from Germany and two familial cases, one from Poland (patients 3A and 3B, son and mother) and the other from Spain (patients 7A and 7B, son and father). In accordance with the Declaration of Helsinki, the study was approved by the ethics committee of the University of Lübeck in November 2007 (reference number: 07-158). The patients' parents gave written individual informed consent to participate in the study. To perform the experiments, a pool of four control cDNAs from normal subjects was used.

DNA Extraction and Sequence Analysis
Genomic DNA was extracted from peripheral blood leukocytes using standard procedures. The primers used to amplify the exons of the NIPBL gene and their splice junctions are available on request. The PCR products obtained were purified with USB ExoSAP-IT PCR Product Cleanup (Affymetrix, Santa Clara, CA, USA) according to the manufacturer's instructions, and sequenced on an ABI 3130 Genetic Analyzer (Applied Biosystems, Foster City, CA, USA).
The nucleotides of NIPBL cDNA were numbered according to NIPBL isoform 1 (GenBank accession No. NM_133433). The mutation nomenclature follows the recommendations of the Human Genome Variation Society [34] and was confirmed with Mutalyzer [35]. In order to make the data publicly available, mutations and associated phenotypic information were submitted to the Leiden Open Variant Database, Leiden, The Netherlands [14].

RNA Extraction and cDNA Synthesis
RNA was extracted from blood leukocytes using the PAXgene Blood RNA Kit (PreAnalytiX GmbH, Hombrechtikon, Switzerland) according to the manufacturer's instructions. Single-stranded cDNAs were synthesized from 500 ng of RNA from each patient using the First Strand Synthesis Kit (Thermo Fisher Scientific Inc, Waltham, MA, USA) with random hexamers, following the manufacturer's protocol.

Identification of Splicing Variants
Physiological splicing variants were obtained using cDNA from a pool of four control individuals. Total NIPBL cDNA was amplified by PCR in overlapping fragments using three different approaches: dividing NIPBL into 4 fragments (A, B, C, and D), 8 fragments (A1, A2, B1, B2, C1, C2, D1, and D2), and 23 fragments (F1-F23). In this last strategy, an additional PCR (F22B) was performed to specifically amplify the isoform B (Figure 2a). Primers are provided on request.
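The tiling strategy described above can be illustrated with a short Python sketch that splits a cDNA into a given number of equally sized, overlapping fragments. The coding length of 8415 nt (2804 codons of isoform A plus a stop codon) and the 100-nt overlap are assumptions chosen for illustration. The actual amplicons were defined by the primer positions, which are not given here, and the function `tile` is hypothetical.

```python
def tile(length: int, n_fragments: int, overlap: int) -> list[tuple[int, int]]:
    """Split positions 1..length into n overlapping fragments (1-based, inclusive)."""
    if n_fragments < 1 or overlap < 0:
        raise ValueError("need at least one fragment and a non-negative overlap")
    # Smallest fragment size whose tiles, minus the shared overlaps, cover the length.
    size = -(-(length + (n_fragments - 1) * overlap) // n_fragments)  # ceiling division
    step = size - overlap
    if step <= 0:
        raise ValueError("overlap is too large for this fragment size")
    fragments = []
    for i in range(n_fragments):
        start = i * step + 1
        end = min(start + size - 1, length)  # clip the last fragment to the cDNA end
        fragments.append((start, end))
    return fragments


CDS_LENGTH = 8415  # assumption: (2804 codons + stop codon) * 3 for isoform A
OVERLAP = 100      # assumption: illustrative overlap between neighboring fragments

for n in (4, 8, 23):
    frags = tile(CDS_LENGTH, n, OVERLAP)
    print(n, "fragments; first:", frags[0], "last:", frags[-1])
```

Using progressively shorter fragments, as in the three approaches above, makes small length differences between splice variants easier to resolve on a gel, while the overlaps ensure that every exon junction lies inside at least one amplicon.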
To evaluate the aberrant splicing caused by mutations found in the NIPBL gene, specific PCRs that amplified the exons surrounding the mutations were performed on cDNA of each patient. Primers used are provided on request. The same reactions were carried out on cDNA from peripheral blood from a control individual.
For each PCR, 2 μL of cDNA was used as template in a total reaction volume of 20 μL. Amplifications were carried out using 10 pmol of each PCR primer, 1× reaction buffer, 1.5 mM MgSO4, 200 μM dNTPs and 0.5 U Taq DNA polymerase. PCRs were performed in a thermocycler (Applied Biosystems) for 35 cycles at an annealing temperature of 56 °C.
The PCR products obtained were analyzed by electrophoresis in 2% agarose gels. All bands were excised and purified with the QIAEX Gel Extraction Kit (QIAGEN, Hilden, Germany), or purified with USB ExoSAP-IT PCR Product Cleanup (Affymetrix) when there was only a single band. The identity of each band was confirmed by sequencing on an ABI 3130 Genetic Analyzer (Applied Biosystems).
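As a worked example of the reaction setup, the following Python sketch converts the final amounts stated above into pipetting volumes for one 20-μL reaction using C1·V1 = C2·V2. All stock concentrations (10× buffer, 25 mM magnesium salt, 10 mM dNTP mix, 10 pmol/μL primers, 5 U/μL Taq) are assumptions for illustration and are not stated in the protocol.

```python
TOTAL_UL = 20.0  # total reaction volume from the protocol


def volume_for(final_conc: float, stock_conc: float, total_ul: float = TOTAL_UL) -> float:
    """Solve C1 * V1 = C2 * V2 for V1 (both concentrations in the same unit)."""
    return final_conc * total_ul / stock_conc


mix = {
    "reaction buffer (assumed 10x stock)": volume_for(1.0, 10.0),    # 1x final
    "magnesium salt (assumed 25 mM stock)": volume_for(1.5, 25.0),   # 1.5 mM final
    "dNTPs (assumed 10 mM stock)": volume_for(0.2, 10.0),            # 200 uM = 0.2 mM final
    "forward primer (assumed 10 pmol/uL stock)": 10.0 / 10.0,        # 10 pmol per reaction
    "reverse primer (assumed 10 pmol/uL stock)": 10.0 / 10.0,        # 10 pmol per reaction
    "Taq polymerase (assumed 5 U/uL stock)": 0.5 / 5.0,              # 0.5 U per reaction
    "cDNA template": 2.0,                                            # from the protocol
}
# Water fills the reaction up to the total volume.
mix["water"] = TOTAL_UL - sum(mix.values())

for component, volume in mix.items():
    print(f"{component}: {volume:.1f} uL")
```

With these assumed stocks, water makes up the remaining 12.3 μL. Different stock concentrations change the individual volumes but not the final composition.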

In Silico Splicing Analysis
First, we analyzed all NIPBL wild-type and mutated exons with Splice Site Prediction by Neural Network [19,36]. This bioinformatic tool assigns a strength score from 0.00 to 1.00 to acceptor (3' ss) and donor (5' ss) splice sites of each exon. It was used to evaluate the strength of affected exons, to predict disruption or creation of splice sites and to identify potential cryptic splice sites.
Subsequently, we used several tools integrated in the Human Splicing Finder [20] to perform a more exhaustive analysis of the exons affected by physiological or aberrant splicing. HSF [37] and MaxEnt [38] predict splice-site strength and complement the data obtained from Splice Site Prediction by Neural Network.
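Splice-site predictors score a short sequence window around each exon boundary. The following Python sketch shows how such windows can be extracted and how a substitution at the canonical +1 position alters the donor window. The window sizes follow the MaxEntScan convention (9-mer for donor sites: 3 exonic plus 6 intronic bases; 23-mer for acceptor sites: 20 intronic plus 3 exonic bases). The sequence is a toy example, not NIPBL sequence, the function names are hypothetical, and no scoring is performed here.

```python
def donor_window(seq: str, exon_end: int) -> str:
    """9-mer for 5' splice-site scoring: last 3 exon bases + first 6 intron bases.

    exon_end is the 0-based index of the last exonic base.
    """
    start, stop = exon_end - 2, exon_end + 7
    if start < 0 or stop > len(seq):
        raise ValueError("not enough flanking sequence for a donor window")
    return seq[start:stop]


def acceptor_window(seq: str, exon_start: int) -> str:
    """23-mer for 3' splice-site scoring: last 20 intron bases + first 3 exon bases.

    exon_start is the 0-based index of the first exonic base.
    """
    start, stop = exon_start - 20, exon_start + 3
    if start < 0 or stop > len(seq):
        raise ValueError("not enough flanking sequence for an acceptor window")
    return seq[start:stop]


# Toy sequence (assumption): intron in lowercase, exon in uppercase.
upstream_intron = "tttctttcttttcctttcag"   # 20 nt, ends with the canonical ag
exon = "GATTACAGG"                         # 9 nt
downstream_intron = "gtaagtatcc"           # starts with the canonical gt
seq = upstream_intron + exon + downstream_intron

exon_start = len(upstream_intron)          # index of the first exonic base
exon_end = exon_start + len(exon) - 1      # index of the last exonic base

wild_type_donor = donor_window(seq, exon_end)
wild_type_acceptor = acceptor_window(seq, exon_start)

# Simulate a +1 G>A substitution (first intronic base after the exon).
plus_one = exon_end + 1
mutant = seq[:plus_one] + "a" + seq[plus_one + 1:]
mutant_donor = donor_window(mutant, exon_end)

print(wild_type_acceptor, wild_type_donor, mutant_donor)
```

The wild-type and mutant windows extracted this way would then be submitted to the prediction tools, and a drop in the score of the original site or a gain at a nearby position would indicate disruption or creation of a splice site.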

Conclusions
In this study, we have performed the first systematic study of the physiological splicing of the NIPBL gene, which has allowed us to identify four new variants, ΔE10, ΔE12, ΔE33,34, and B', that should be kept in mind when assessing pathological splicing (Figure 2b). In addition, we have characterized eight splicing mutations, seven of them new, which represent 14% of the reported splicing mutations. The analysis of the RNA has ruled out c.6109-3T>C as a splicing mutation, so its mechanism of pathogenicity, if any, remains unclear (Figure 1). We have also confirmed that, within the broad clinical variability shown by splicing mutations, the more severe phenotypes seem to be associated with mutations generating frameshift transcripts.