Functional Analysis of the PCCA and PCCB Gene Variants Predicted to Affect Splicing

It is estimated that up to one-third of all variants causing inherited diseases affect splicing; however, their deleterious effects and roles in disease pathogenesis are often not fully characterized. Given their prevalence and the development of various antisense-based splice-modulating approaches, pathogenic splicing variants have become an important object of genomic medicine. To improve the accuracy of variant interpretation in public mutation repositories, we applied the minigene splicing assay to study the effects of 24 variants that were predicted to affect normal splicing in the genes associated with propionic acidemia (PA), PCCA and PCCB. As a result, 13 variants (including one missense and two synonymous variants) demonstrated a significant alteration of splicing with the predicted deleterious effect at the protein level and were characterized as spliceogenic loss-of-function variants. The analysis of the available data for the studied variants and application of the American College of Medical Genetics and the Association for Molecular Pathology (ACMG/AMP) guidelines allowed us to precisely classify five of the variants and change the pathogenic status of nine. Using the example of the PA genes, we demonstrated the utility of the minigene splicing assay in the fast and effective assessment of the spliceogenic effect for identified variants and highlighted the necessity of their standardized classification.


Introduction
Pathogenic variants in the PCCA and PCCB genes are responsible for the rare autosomal recessive metabolic disease called propionic acidemia (PA) (OMIM#606054). The products of these genes form the heterododecameric enzyme propionyl CoA carboxylase (PCC), which converts propionyl CoA to methylmalonyl CoA in the mitochondrial matrix [1]. The deficient activity of PCC leads to accumulation of toxic propionic acid metabolites and dysfunction of the respiratory chain and the urea cycle pathway. PA is clinically heterogeneous and has a variable age of onset ranging from severe neonatal forms to mild adult-onset forms [2]. In most cases, the symptoms of PA manifest in the early neonatal period and without treatment quickly become life-threatening. The symptoms include seizures, poor feeding, vomiting, hypotonia, metabolic acidosis, ketonuria, hypoglycemia, hyperammonemia and cytopenia [3]. Timely medical and dietary interventions could significantly improve the patient's condition and stabilize the metabolic state. Therefore, fast and proper genetic diagnosis can play an important role in patient management [3][4][5].
The ClinVar (https://www.ncbi.nlm.nih.gov/clinvar) (accessed on 15 April 2021) and Human Gene Mutation Database (HGMD) (http://www.hgmd.cf.ac.uk/) (accessed on 15 April 2021) are the main repositories of PA variants and currently comprise 720 and 322 unique variants, respectively. Missense variants are the predominant cause of PA, followed by small insertions and deletions, splicing variants and large genomic deletions [6]. Due to the increasing availability of molecular genetic diagnostics, the rapidly growing number of novel variants challenges researchers to establish their molecular consequences. In the case of missense variants, enzymatic assays and Western blots are widely used to reveal their deleterious effect on the PCC structure and activity [7,8].
Another class of variants that disrupts normal splicing is of particular interest due to development of the specific splice-modulating approaches for PA [9,10]. Splicing variants account for up to 50% of pathogenic variants in some genes and are suggested to be underrepresented in others due to the lack of their functional analysis at the mRNA level [11][12][13]. Splicing variants could be located anywhere within genes and be occasionally classified as synonymous, missense or nonsense variants. The standard for functional analysis of splicing variants is the analysis of mRNA from patient cells. This approach does, however, have some limitations, e.g., the aberrant mRNA isoform could be degraded by the nonsense-mediated mRNA decay mechanism or the gene of interest may be expressed only in tissue that is difficult to assess. To overcome these issues, the in-vitro minigene splicing assay presents a fast and robust tool for analysis of potentially spliceogenic variants [14][15][16].
The need for precise classification of studied variants led to the development of ACMG/AMP guidelines and their gene- and disease-specific refinements curated by ClinGen Workgroups (https://clinicalgenome.org/) (accessed on 15 April 2021) [17,18]. During the analysis of public repositories of PA variants, we faced the problem of incorrect classification of splicing variants, which is inconsistent with ACMG/AMP recommendations. In addition, we revealed a number of variants classified as missense or synonymous, but which were highly spliceogenic according to bioinformatic analysis, and variants classified as splicing but lacking any functional characterization at the mRNA level.
Therefore, the aim of this work was to improve the PA variant classification, expand the knowledge of splicing variants in the studied genes and evaluate the utility of bioinformatic analysis and the minigene assay for the characterization of PA variants.

Variant Selection and Bioinformatic Analysis
The subjects of this study were the PCCA and PCCB gene variants available in the ClinVar and HGMD databases and classified as pathogenic, likely pathogenic and of uncertain significance. In addition, we included four splicing variants previously identified in our lab [19].
For the variants which create new splicing sites (SS) or disrupt wild type (WT) SSs, the results of HSF3.1, SpliceAI, MMSplice and SpiP were considered. Exonic variants which do not create or disrupt SSs were analyzed using Ex-Skip, Hexplorer and HExoSplice, which are the recommended tools for analysis of probable alteration of exonic regulatory splicing motifs [27].
The standard cutoffs and values for the variant to be significant are as follows: HSF3.1 matrices, >15% difference in SS strength; MaxEntScan module of HSF3.1, >30% difference in SS strength; and SpliceAI, delta score > 0.5. MMSplice and SpiP provide the discrete values and corresponding probabilities. For the analysis of exonic regulatory splicing motif alteration, the variants with the highest Ex-Skip, Hexplorer and HExoSplice scores were selected, because this type of prediction has low specificity and depends largely on whether the splicing of the studied exon relies on the recognition of regulatory motifs and their specific positioning [27].
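The cutoffs listed above can be sketched as a small filter. This is a minimal illustration, not the authors' pipeline: the `SplicePrediction` record, its field names and the example scores are hypothetical, and the "difference in SS strength" is assumed here to mean an absolute percentage change. MMSplice and SpiP are omitted because the text gives no numeric cutoff for them.

```python
from dataclasses import dataclass
from typing import List, Optional

# Cutoffs quoted in the text; "difference" is treated as an absolute change (assumption).
HSF_MATRIX_CUTOFF_PCT = 15.0
MAXENT_CUTOFF_PCT = 30.0
SPLICEAI_DELTA_CUTOFF = 0.5


@dataclass
class SplicePrediction:
    """Hypothetical record of per-tool scores; None means the tool was not run."""
    hsf_matrix_pct: Optional[float] = None
    maxent_pct: Optional[float] = None
    spliceai_delta: Optional[float] = None


def significant_tools(pred: SplicePrediction) -> List[str]:
    """Return the names of the tools whose score exceeds its cutoff."""
    hits = []
    if pred.hsf_matrix_pct is not None and abs(pred.hsf_matrix_pct) > HSF_MATRIX_CUTOFF_PCT:
        hits.append("HSF3.1 matrices")
    if pred.maxent_pct is not None and abs(pred.maxent_pct) > MAXENT_CUTOFF_PCT:
        hits.append("MaxEntScan")
    if pred.spliceai_delta is not None and pred.spliceai_delta > SPLICEAI_DELTA_CUTOFF:
        hits.append("SpliceAI")
    return hits


# Illustrative scores only, not taken from the study.
example = SplicePrediction(hsf_matrix_pct=-20.0, maxent_pct=-10.0, spliceai_delta=0.8)
print(significant_tools(example))
```

Returning the list of tools rather than a single yes/no keeps the number of concordant predictions available for later evidence weighting.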

Minigene Assay
The PCCA and PCCB exons with a minimum of 100 bp of adjacent intronic sequence were amplified by PCR with a high-fidelity polymerase and cloned into the multiple cloning site between two constitutively spliced exons of the expression vector pSpl3_Flu2. The pSpl3_Flu2 vector is a modification of the pSpl3_Flu vector (with the deletion of the strong cryptic donor SS downstream of the multiple cloning site), which was used for analysis of splicing variants in the PAX6 gene [29]. The studied variants were introduced into the minigenes by overlap-extension PCR. Transfection was performed in 24-well plates at ~80% cell confluency with 0.5 µg of plasmid DNA via the calcium phosphate method [30]. After 48 h, cells were harvested, and RNA was extracted and reverse transcribed. Plasmid-specific cDNA was amplified and visualized by polyacrylamide gel electrophoresis (PAGE) with subsequent gel purification (if needed) and Sanger sequencing.
The results of the functional analysis were combined with published data for the studied variants to precisely classify them according to ACMG/AMP guidelines. The following specific conditions were applied: PS3 (functional studies), if splicing is altered in the vast majority of mRNA molecules and leads to a frameshift and nonsense-mediated decay (NMD) or to a predicted deleterious effect at the protein level; PP3 (computational data), if more than one of the bioinformatic algorithms used predicts a splicing alteration; BS3 (functional studies), if a synonymous variant does not cause any significant splicing alterations; and PM3_supportive, if a variant is in trans with another variant of uncertain significance or in a homozygous state in a patient with a phenotype of PA.
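The conditions above can be read as a simple rule set. The sketch below is only one way to encode them, under the assumption that each condition has already been reduced to a boolean or a count by the analyst; the function name and all parameter names are hypothetical, and the sketch does not combine the criteria into a final pathogenicity class.

```python
from typing import List


def assign_criteria(
    *,
    wt_isoform_mostly_lost: bool,
    deleterious_effect_predicted: bool,
    significant_splicing_alteration: bool,
    tools_predicting_alteration: int,
    is_synonymous: bool,
    in_trans_with_vus_or_homozygous: bool,
) -> List[str]:
    """Map the study-specific conditions to ACMG/AMP criterion codes."""
    criteria = []
    # PS3: splicing altered in the vast majority of transcripts with a
    # frameshift/NMD or a predicted deleterious effect at the protein level.
    if wt_isoform_mostly_lost and deleterious_effect_predicted:
        criteria.append("PS3")
    # PP3: more than one bioinformatic algorithm predicts a splicing alteration.
    if tools_predicting_alteration > 1:
        criteria.append("PP3")
    # BS3: a synonymous variant without any significant splicing alteration.
    if is_synonymous and not significant_splicing_alteration:
        criteria.append("BS3")
    # PM3_supportive: in trans with another VUS, or homozygous, in a PA patient.
    if in_trans_with_vus_or_homozygous:
        criteria.append("PM3_supportive")
    return criteria


# Illustrative input: a synonymous variant flagged by two tools but neutral in the assay.
print(assign_criteria(
    wt_isoform_mostly_lost=False,
    deleterious_effect_predicted=False,
    significant_splicing_alteration=False,
    tools_predicting_alteration=2,
    is_synonymous=True,
    in_trans_with_vus_or_homozygous=False,
))
```

Keyword-only arguments are used so that the six flags cannot be silently swapped at the call site.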

The PCCA and PCCB Variant Selection
Using freely available splicing prediction tools (described in Section 2), we analyzed all pathogenic, likely pathogenic and of-uncertain-significance variants of the PCCA and PCCB genes available in the HGMD and ClinVar databases, and those previously identified in our lab [19]. From the list of variants for which at least one bioinformatic tool predicted the significant splicing alteration, we excluded those that were already characterized at the mRNA level. Among the variants that were predicted to alter exonic regulatory splicing motifs, we selected eight with the highest predictive scores.
As a result, 24 variants were selected (the variants that create or alter SSs are presented in Table 1 and the variants that alter regulatory splicing motifs are presented in Table S1).

Minigene Assay
For the analysis of probable splicing alterations caused by the studied variants, we applied the minigene splicing assay, a fast and effective method for evaluating the spliceogenic effect of genetic variants.
For each studied variant, the corresponding exon with at least 100 b.p. of flanking intronic sequence was cloned into expression vector pSpl3_Flu2 between two constitutively spliced exons (V1 and V2). Given the large length of flanking introns, 15 exons (the PCCA exons 2, 6, 8, 10, 13, 14, 15, 16, 18, 21 and 22 and PCCB exons 5, 7, 8 and 11) were cloned separately and only the PCCA exon 24 was combined with exon 23 to better reproduce the native genomic milieu, because we detected splicing artifacts during the PCCA: c.2119-9A>G variant analysis. All of the resulting WT minigene constructs demonstrated adequate exon recognition and the absence of cryptic SS activation and, therefore, were used to test the studied variants (Figure 1A). The WT and mutant minigenes were transfected into HEK293T cells. After 48 h, the cells were harvested, and RNA was extracted and reverse transcribed. Minigene-specific cDNA was amplified with primers located in exons V1 and V2 of the vector and visualized by PAGE (Figure 1A). PCR products were Sanger sequenced with preparatory gel extraction if more than one product was visualized.
The results of the minigene assay are presented in Figure 1A,B and Table 2. Overall, of the 14 variants that demonstrated the alteration of splicing, only three (PCCA: c.1187T>G, c.1643+1_1643+2dup; PCCB: c.655-2A>G) led to the complete absence of the WT mRNA isoform and the presence of frameshifted isoforms, which are the substrates of NMD. The remaining variants led to the synthesis of mRNA isoforms which escape NMD. Therefore, to better characterize their deleterious effect, we further analyzed them using PCC enzyme homology modeling.

Analysis of the Affected Protein Structures
To characterize the deleterious effect of the studied variants at the protein level, we performed homology modeling of the PCC heterodimer and located the functional domains and catalytic residues based on previously published data [33][34][35] (Figure 2). PCCA: c.2119-9A>G leads to an 8 b.p. frameshifting insertion, p.Val707Asnfs*4, with premature stop codon formation in the last exon of PCCA. The corresponding PCCA mRNA escapes NMD, but the synthesized protein lacks 22 C-terminal amino acids which are involved in the formation of the highly conserved biotinyl-binding domain (Figure 2G).
The remaining in-frame deletions caused by the studied variants involve the highly conserved and, in some cases, catalytic residues of well-characterized domains. As a result, their deleterious effect on protein function is beyond doubt (Figure 2 and Table 2).
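The distinction drawn here between frameshifting and in-frame changes follows from whether the net change in mRNA length is a multiple of three. A minimal sketch, assuming the net length change of the aberrant isoform is already known from sequencing (the function name is hypothetical):

```python
def reading_frame_effect(length_change_bp: int) -> str:
    """Classify a net mRNA length change by its effect on the reading frame.

    Positive values are insertions, negative values are deletions.
    """
    if length_change_bp == 0:
        return "no length change"
    if length_change_bp % 3 == 0:
        return "in-frame"
    return "frameshift"


# The 8 b.p. insertion caused by PCCA: c.2119-9A>G shifts the frame.
print(reading_frame_effect(8))
# A 24 b.p. deletion (eight codons) preserves the frame.
print(reading_frame_effect(-24))
```

This check says nothing about whether an in-frame change is tolerated; as in the analysis above, that still depends on which residues and domains are affected.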

Classification of the Studied Variants
The ACMG/AMP guidelines were used to classify the studied variants [17,18]. The PS3 criterion (well-established in vitro or in vivo functional studies supportive of a damaging effect on the gene or gene product) was applied to the variants which did not demonstrate any significant residual amount of the WT mRNA isoform in the minigene assay and whose deleterious effect was established at the mRNA or protein level. Among these, the variants located in the canonical dinucleotides acquire the PVS1 criterion (null variant) instead. Additionally, we analyzed the available published data for these variants and collected additional criteria to perform their precise classification. As a result, we classified five variants and changed the pathogenic status of nine (Table 2).
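The choice between PS3 and PVS1 described above depends on whether a variant lies in the canonical splice site dinucleotides (intronic positions +1, +2, -1 and -2). The sketch below shows one way to read this from an HGVS coding description. It is a simplification under stated assumptions: the helper names are hypothetical, only the positions written explicitly in the description are inspected (a range that spans the dinucleotide without naming it would be missed), and the function presumes the variant has already met the functional condition for PS3.

```python
import re
from typing import List

# Intronic offsets of the canonical donor (+1, +2) and acceptor (-2, -1) dinucleotides.
CANONICAL_OFFSETS = {1, 2, -1, -2}


def intronic_offsets(hgvs_c: str) -> List[int]:
    """Extract intronic offsets such as +1 or -9 from an HGVS c. description."""
    # An offset is a signed number that directly follows a coding position digit.
    return [int(m) for m in re.findall(r"\d([+-]\d+)", hgvs_c)]


def functional_criterion(hgvs_c: str) -> str:
    """Return "PVS1" for canonical-dinucleotide variants, otherwise "PS3"."""
    in_canonical = any(o in CANONICAL_OFFSETS for o in intronic_offsets(hgvs_c))
    return "PVS1" if in_canonical else "PS3"


for variant in ["c.468+1G>A", "c.655-2A>G", "c.1643+1_1643+2dup", "c.2119-9A>G", "c.1187T>G"]:
    print(variant, functional_criterion(variant))
```

Exonic variants such as c.1187T>G carry no intronic offset and therefore fall through to PS3.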

Discussion
The imprecise classification of identified variants in public databases could confuse researchers and even lead to wrong diagnoses. Recently, the ACMG/AMP guidelines and Clinical Genome Sequence Variant Interpretation Working Groups (ClinGen WGs) provided the most detailed recommendations for standardized variant classification. ClinGen WGs were expanded by gene- and condition-specific expert panels, but recommendations for the precise classification of splicing variants based on predictive and functional evidence are still being developed.
The ClinVar and HGMD databases are the main repositories of PA variants. During the detailed analysis of the splicing variants reported there, we faced the problem of unstandardized classification that is neither based on any specific assertion criteria nor supported by proper functional studies. For example, the PCCA: c.1353+5_1353+9del and c.2040G>A variants are reported as "pathogenic", but the main criteria were the bioinformatic predictions [36]. The PCCB: c.1091-8_1091-3del variant is reported as pathogenic, but no functional or any other evidence is provided. Therefore, according to ACMG/AMP guidelines, the proper status of these variants is uncertain significance. The results of our functional analysis clearly established their deleterious effect and, considering the other criteria, allowed us to classify them as pathogenic or likely pathogenic. The PCCA: c.2119-9A>G variant was identified in three PA patients without two causative PCCA or PCCB mutations, but the authors classified it as a polymorphism because the cDNA analysis showed normal-sized PCR products (Table 4 of [7]). Our minigene assay results suggest that PCCA: c.2119-9A>G causes an 8 b.p. insertion in PCCA exon 24 and is a spliceogenic loss-of-function variant, which leads to synthesis of a truncated protein with an altered biotinyl-binding domain. Because an 8 b.p. insertion cannot always be distinguished by gel electrophoresis, and the authors did not report whether Sanger sequencing was performed, their classification is questionable.
Another widely underestimated nuance is the need to establish the deleterious effect of variants located in canonical dinucleotides before the PVS1 criterion is applied. The ACMG/AMP recommendations for interpreting the PVS1 criterion state that the researcher should pay attention to potential cryptic SSs, the activation of which could preserve the reading frame. The predictive power of widely used algorithms is still too low to effectively predict cryptic splice site activation, because additional splicing elements, such as enhancers and silencers, are involved in specific splice site usage. The analysis should also be performed at the protein level to establish the effect of potential in-frame indels. For example, the PCCA: c.468+1G>A and PCCA: c.717-2A>G variants were predicted to cause in-frame deletions, which was confirmed by our minigene assay. Subsequent analysis at the protein level demonstrated that the deletion p.Val139_Leu156del caused by PCCA: c.468+1G>A involves the Tyr143 catalytic residue, which clearly established its deleterious effect. PCCA: c.717-2A>G leads to the 24 b.p. deletion p.Asp240_Gln247del that alters the beta-sheet in the biotin carboxylase domain; its deleterious effect is not as obvious, although this is a highly flexible and conserved element that serves as the lid for the protein's active site.
Missense variants are the most frequent mutations that cause human genetic disease, but their functional characterization at the protein level is a difficult task. When novel rare missense or synonymous variants are found in the studied gene, a researcher should perform bioinformatic analysis with splicing prediction tools, because these variants could disrupt SSs when located at the first or last three positions of the exon, activate cryptic SSs or alter motifs of regulatory splicing proteins. If an alteration of splicing is suspected, the patient's mRNA analysis or, in its absence, the minigene assay can readily characterize the deleterious effect. The missense variant PCCA: c.1187T>G (p.Val396Gly) clearly demonstrated splicing alteration due to cryptic SS activation with complete absence of a full-length mRNA isoform. The synonymous PCCA: c.2040G>A (p.Ala680=) variant located in the last exonic nucleotide causes exon skipping in the vast majority of transcripts. By comparison, the PCCB: c.543G>C (p.Leu181=) variant leads to exon skipping but leaves a significant residual number of full-length transcripts; thus, it could cause a mild phenotype or represent a rare benign variant. Unfortunately, the patient's data were not described in ClinVar for this variant, so its status remains uncertain due to the lack of sufficient criteria. The PCCB: c.882C>T (p.Pro294=) variant demonstrated no significant splicing alterations. This is a rare synonymous variant with no effect either on minigene mRNA splicing or on protein structure, although it could play some more complex roles in molecular pathogenesis, such as modification of mRNA secondary structure or RNA interference.
The PCCA: c.819+9A>G variant was predicted to activate the strong cryptic donor SS, but the minigene assay did not demonstrate any differences from the WT minigene. This could be explained by additional factors involved in splice site recognition that favor WT SS, such as mRNA secondary structure and the specific location of splicing regulatory proteins' motifs.
The PCCA: c.183+4_183+7del, c.468+1G>A and c.1284+2dup, and PCCB: c.655-2A>G variants were identified in our lab earlier in patients with the clinical and biochemical phenotype of PA [19]. The performed functional analysis together with other criteria allowed us to classify these as pathogenic variants and establish the patients' diagnosis.
Variants that were predicted to disrupt regulatory splicing motifs did not show any difference from the WT, although in some genes they account for up to 60% of pathogenic variants [37]. Our results could be explained by the characteristic gene structure, for which the absence of alternative exons and isoforms is described, which suggests the straightforward regulation of splicing, mediated by strong SSs.
Overall, the minigene assay proved to be a fast and effective approach to the analysis of the PCCA and PCCB variants. The majority of the presented minigene constructs comprise a single exon and can be easily cloned. Although this approach cannot detect all splicing outcomes, e.g., skipping of two exons or utilization of more distant cryptic SSs, it can be effectively used to estimate whether an identified variant has spliceogenic potential.
Regarding the splicing prediction tools, SpliceAI showed the best performance, correctly predicting the effect for 15 of 16 variants. For the variants that disrupt the canonical dinucleotides and are located in the intronic part of SSs, all of the tools correctly predicted the splicing alteration. The most complex variants for analysis showing a significant discrepancy between tools are those located in the exonic part of donor SS (PCCB: c.543G>C, c.763G>A, c.882C>T and PCCA: c.2040G>A) and those that activate the cryptic SS (PCCA: c.819+9A>G, c.1187T>G).

Conclusions
The minigene assay, applied to the analysis of the 24 PCCA and PCCB variants, proved to be a fast and effective approach to establishing their spliceogenic effects. The protein homology modeling could be further used to better characterize the deleterious effect of splicing variants. The results of this work led to the precise ACMG/AMP-based classification of 16 PA variants, for 12 of which we characterized the deleterious effect at the mRNA or protein levels. We suggest that this approach should be expanded to other disease-associated genes to improve the variant classification in mutation databases.