In Silico Analysis of Missense Mutations as a First Step in Functional Studies: Examples from Two Sphingolipidoses

In order to delineate a better approach to functional studies, we selected 23 missense mutations distributed across different domains of two lysosomal enzymes for in silico analysis. In silico analysis of mutations relies on computational modeling to predict their effects. Various computational platforms are currently available to check the probable causality of mutations encountered in patients at the protein and RNA levels. In this work we used four different platforms freely available online (Protein Variation Effect Analyzer, PROVEAN; PolyPhen-2; Swiss-model on the Expert Protein Analysis System, ExPASy; and SNAP2) to check amino acid substitutions and their effect at the protein level. The protein mutants were selected because functional studies of the corresponding amino acid substitutions already existed. These functional data were used as the reference against which the results of the different bioinformatics tools were compared. With the advent of next-generation sequencing, it is not feasible to carry out functional tests on all the variants detected. In silico analysis seems to be useful for deciding which mutants are worth studying through functional studies. Therefore, computational prediction of the impact of a mutation at the protein level offers a rapid means of attaching prognostic value to genotyping results, making it potentially valuable for patient care as well as for research purposes. The present work points to the need to carry out functional studies on mutations that might look neutral. Moreover, it should be noted that single nucleotide polymorphisms (SNPs), occurring in coding and non-coding regions, may lead to RNA alterations and should be systematically verified. Functional studies can gain from a preliminary multi-step approach, such as the one proposed here.


Introduction
Lysosomal storage diseases (LSDs) are a large group of inherited disorders leading to various clinical symptoms, caused by defects in lysosomal enzymes, transporter proteins, activator proteins, or other proteins involved in lysosomal function or biogenesis. Such defects lead to total or partial loss of enzyme activity and consequent accumulation of substrate, which results in impaired organelle function, leading to subsequent multi-organ dysfunction. The enzymes involved in two of the less rare LSDs are lysosomal glucocerebrosidase (GlcCerase, glucosylceramidase or acid-β-glucosidase, EC 3.2.1.45), and lysosomal acid-α-galactosidase (α-GAL or α-Gal A, EC 3.2.1.22). Most commonly, complete and were used in the present study [9][10][11]. Missense mutations were selected in different domains of the GlcCerase and of the α-GAL proteins with different types of functional evidence of causality. In order to broaden the scope of the study, mutations in three other genes related to neurodegenerative diseases were also added to the present study.

Results
The aim of this work was to investigate the prediction value of different bioinformatics tools, applying them to single amino acid substitutions in the GBA1 and GLA genes.

GlcCerase and α-GAL structures were obtained from the Protein Data Bank (PDB). GBA1 mutations (p.F109V, p.P182L, p.D140H, p.K157Q, p.W184R, p.N188S, p.E326K, p.R359Q, p.G377S, p.R395P, p.N396T, p.P415R, and p.L444P) and GLA mutations (p.D33G, p.M42V, p.R112C, p.F113L, p.R118C, p.C142W, p.D231G, p.D266N, p.S297F and p.D313Y) were mapped onto the 3D structures of GlcCerase and α-GAL: the first human GlcCerase X-ray structure to be solved (PDB code 1OGS) [4] and the first 3D α-GAL structure (PDB code 1R46) [5] (Figures 1 and 2). Three-dimensional representations were rendered using PyMOL (http://www.pymol.org) in order to visualize how these alterations could affect enzyme structure. The missense mutations depicted in Figures 1 and 2 were analyzed computationally, and the results were compared with the in vitro results and functional data (Table 1) in order to assess the validity of the platforms used and their prediction accuracy with respect to genotype/phenotype correlation. Differences in the results obtained reflect the different types of algorithms used in the computational platforms. In order to broaden the scope of the present study, we analyzed 14 additional mutations in other genes involved in neurodegenerative lysosomal-related disorders (Table 2). In all cases, functional studies were available. These additional mutations provided a broader comparison between in vitro and in silico results.
Legend: wt, wild-type; NA, results not available with that computational tool.
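The mapping step described above can be illustrated with a minimal Python sketch. It is not the procedure used in this study; it only shows how a mutation name such as p.F109V could be parsed and turned into a PyMOL selection string. The function names, the object name passed to `pymol_selection`, and the `offset` parameter are hypothetical, and the sketch assumes that residue numbering in the loaded structure matches the mutation nomenclature unless an offset is supplied.

```python
import re
from typing import NamedTuple

# Standard one-letter to three-letter amino acid codes.
THREE_LETTER = {
    "A": "ALA", "R": "ARG", "N": "ASN", "D": "ASP", "C": "CYS",
    "Q": "GLN", "E": "GLU", "G": "GLY", "H": "HIS", "I": "ILE",
    "L": "LEU", "K": "LYS", "M": "MET", "F": "PHE", "P": "PRO",
    "S": "SER", "T": "THR", "W": "TRP", "Y": "TYR", "V": "VAL",
}

class Missense(NamedTuple):
    wild_type: str  # one-letter code of the original residue
    position: int   # residue number as written in the mutation name
    mutant: str     # one-letter code of the substituted residue

# Matches names written as "p." + residue + number + residue, e.g. "p.F109V".
_PATTERN = re.compile(r"^p\.([A-Z])(\d+)([A-Z])$")

def parse_missense(name: str) -> Missense:
    """Split a one-letter missense name into its three components."""
    match = _PATTERN.match(name.strip())
    if match is None:
        raise ValueError(f"not a one-letter missense name: {name!r}")
    wild_type, position, mutant = match.groups()
    if wild_type not in THREE_LETTER or mutant not in THREE_LETTER:
        raise ValueError(f"unknown amino acid code in {name!r}")
    return Missense(wild_type, int(position), mutant)

def pymol_selection(obj: str, mutation: Missense, offset: int = 0) -> str:
    """Build a PyMOL selection string for the wild-type residue.

    `offset` is a hypothetical correction for structures whose residue
    numbering differs from the numbering used in the mutation name.
    """
    resi = mutation.position + offset
    resn = THREE_LETTER[mutation.wild_type]
    return f"{obj} and resi {resi} and resn {resn}"

if __name__ == "__main__":
    m = parse_missense("p.F109V")
    print(pymol_selection("glccerase", m))
```

Including the residue name in the selection acts as a sanity check: if the numbering is shifted, the selection is empty rather than silently highlighting the wrong residue.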

Discussion
In vitro mutagenesis and subsequent expression of mutant proteins, or functional studies and characterization, is a cumbersome task in terms of time, workload, and cost. For these reasons, in silico analysis is a desirable, fast, inexpensive, and reliable way to boost our understanding of how an amino acid substitution could affect the protein structure and function. Availability of 3D protein structures enables the mapping of amino acid substitutions and, therefore, helps complement the information acquired from different computational platforms. These aspects facilitate preliminary research in the biomedical field. As observed with the tools used here, the incorporation of more data increases the accuracy of the results, and thus makes predictions more reliable.
When a novel missense mutation is detected in a disease context, and its polymorphic nature has been excluded by population studies, it is possible to predict its outcome through in silico analyses, by first performing computational SNP evaluation and then modeling the amino acid substitution into the 3D protein structure. In silico analysis is necessary to predict the impact of novel mutations in diseases such as the lysosomal disorders analyzed here. However, general limitations exist; for instance, structure-based prediction tools may be unable to accurately predict mutation effects when homologous structures are lacking in the databases. In such cases, functional studies should be performed to elucidate how the missense mutation affects protein function and contributes to the patient phenotype.
Overall, the results retrieved from the different computational platforms were rather similar, although the platforms use different data sources and algorithms. The biggest difference observed was between PROVEAN and the other platforms, since PROVEAN takes fewer variables into account. On the other hand, SNAP2 relies on protein and DNA data, as well as evolutionary and conservation information, and is therefore able to check more aspects of the impact of amino acid substitutions. In the case of the α-GAL mutation p.F113L and the GlcCerase mutation p.W184R, the location on the periphery of the proteins could suggest that they would not have a significant effect on enzyme activity and stability. However, the computational studies classify them as damaging missense mutations, and in vitro studies confirm that the respective proteins are unstable with reduced activity (p.F113L) or even inactive (p.W184R) [12,19]. These types of mutations usually occur in specific protein binding sites. The affected amino acids can be located at sites that are vital for dimerization in α-GAL or tetramer formation in GlcCerase [38], or at sites where the activator proteins (Saposin B for α-GAL and Saposin C for GlcCerase) bind. Disruption of binding will lead to partial or total loss of protein function.
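How similar the platforms are to one another can be expressed as a pairwise agreement rate. The Python sketch below is an illustration only and was not part of this study; it assumes that each platform's output has already been reduced to a damaging/not-damaging call per variant, and the tool and variant names in the example are placeholders, not results.

```python
from itertools import combinations
from typing import Dict, Tuple

def pairwise_agreement(
    calls: Dict[str, Dict[str, bool]]
) -> Dict[Tuple[str, str], float]:
    """Fraction of shared variants on which each pair of tools agrees.

    calls[tool][variant] is True when the tool predicts a damaging effect.
    Variants scored by only one tool of a pair are ignored for that pair.
    """
    agreement = {}
    for tool_a, tool_b in combinations(sorted(calls), 2):
        shared = set(calls[tool_a]) & set(calls[tool_b])
        if not shared:
            continue  # no common variants, so agreement is undefined
        same = sum(calls[tool_a][v] == calls[tool_b][v] for v in shared)
        agreement[(tool_a, tool_b)] = same / len(shared)
    return agreement

if __name__ == "__main__":
    # Placeholder calls for illustration; these are not study results.
    example = {
        "tool_A": {"v1": True, "v2": False, "v3": True},
        "tool_B": {"v1": True, "v2": True, "v3": True},
        "tool_C": {"v1": True, "v2": False},
    }
    for pair, rate in pairwise_agreement(example).items():
        print(pair, round(rate, 2))
```

A simple agreement rate does not correct for agreement expected by chance, which matters when most variants in a set are damaging, as is the case here.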
A major limitation of this study is that there are few neutral, or low-score, variants to be analyzed. This problem arises because studies are not exhaustive enough and mutations that may look neutral are often not sufficiently investigated. Particular attention should be given to mutations in the "milder" spectrum. In addition to amino acid substitutions, SNPs may alter RNA processing by interfering with consensus sequences. A silent mutation or a "neutral" amino acid substitution may alter consensus sequences involved in splicing and lead to abnormal transcripts. Such mutations risk being overlooked and labeled as non-causal. An example to take into account is that of an apparently neutral/silent mutation on the CSTB gene (p.Q22Q in Table 2), which affected RNA processing and was proven to be causal only by functional studies [39,40].
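One simple way to avoid overlooking such variants is to flag any coding change that lies close to an exon-exon junction for follow-up with splicing tools. The Python sketch below illustrates the idea under stated assumptions: the function name, the three-nucleotide window, and the exon lengths in the example are hypothetical, and proximity to a junction is only a crude proxy, since splicing regulatory sequences can also lie deeper within exons.

```python
from typing import List

def near_exon_junction(cds_pos: int, exon_lengths: List[int], window: int = 3) -> bool:
    """Return True if a coding position lies within `window` nucleotides
    of an internal exon-exon junction.

    cds_pos      -- 1-based position in the coding sequence
    exon_lengths -- length of the coding portion of each exon, in order
    window       -- hypothetical cutoff; choose it to suit the analysis
    """
    if cds_pos < 1:
        raise ValueError("cds_pos must be 1-based and positive")
    boundary = 0
    # The end of the last exon is not a junction, so it is skipped.
    for length in exon_lengths[:-1]:
        boundary += length  # last coding nucleotide of this exon
        # Covers the last `window` bases of this exon and the
        # first `window` bases of the next one.
        if boundary - window < cds_pos <= boundary + window:
            return True
    return False

if __name__ == "__main__":
    # Hypothetical gene with two coding exons of 100 and 50 nucleotides.
    for pos in (50, 99, 101, 104):
        print(pos, near_exon_junction(pos, [100, 50]))
```

Flagged variants would then be candidates for the RNA-level assessment tools, whatever the protein-level prediction says.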
Limitations of in silico analysis also arise since mutations (in the patient) may have additive or compensatory effects and the tools used only predict single protein changes. Besides the wide range of mutant variants and clinical phenotypes, in some cases, mutations in the same gene may be associated with more than one disease. Certain GBA1 mutations are known to be associated with Gaucher Disease (GD) and with Parkinson's disease (PD) [41,42]. A good example of this association is mutation p.E326K, which has been repeatedly investigated [14,38,43]. The association of a single protein with different diseases is an additional limitation for in vivo and in vitro assays. Recently, a complex integration of in silico computational analysis has been used for the understanding of the association of GBA1 mutations in GD and PD [44]. This latter approach, integrating multiple parameters, namely molecular dynamics, seems to pave the way for the development of more dependable in silico computational modeling approaches.
In general, it is possible to conclude that in silico methods remain an accurate way to make a rapid analysis regarding the expected effect of mutations. Nonetheless, the more factors that are taken into account, the more accurate the prediction will be. In order to take the best advantage of in silico analysis, different computational platforms should be used, trying to cover the major factors influencing protein structure and function. RNA processing alterations should also be routinely investigated by in silico analysis. The SNP impact at the RNA level can be investigated by using some of the various RNA assessment tools, such as Human Splicing Finder [45], GeneSplicer [46], NetGene2 [47], or Berkeley Drosophila Genome Project (BDGP) Splice Prediction by Neural Network [48].

In Silico Methods
Twenty-three missense mutations were analyzed using four different computational platforms freely available online (Table 1). PROVEAN (Protein Variation Effect Analyzer) (http://provean.jcvi.org/) is a computational tool that predicts whether an amino acid substitution or indel will have an impact on the biological function of a protein. PROVEAN is useful for filtering sequence variants to identify nonsynonymous or indel variants that are predicted to be functionally important. Results are given as "deleterious" or "neutral", according to scores [49,50]. The PolyPhen-2 (Polymorphism Phenotyping v2) program (http://genetics.bwh.harvard.edu/pph2/) uses sequence homology and knowledge of 3D structures; it predicts the possible impact of an amino acid substitution on the structure and function of a human protein using straightforward physical and comparative considerations. The results are classified as "benign", "possibly damaging", "probably damaging", or "unknown" [7,51]. The ExPASy Swiss-model [52] is a fully automated protein structure modeling server, accessible via the ExPASy web page (https://swissmodel.expasy.org/), and was also used in this study [53]. The SNAP2 (screening for non-acceptable polymorphisms) program (www.rostlab.org/services/SNAP/) incorporates evolutionary information, predicted aspects of protein structure, and other relevant information in order to make predictions regarding the functionality of mutated proteins [11]. The results are retrieved as "having an effect" or "being neutral", and a score, correlated with the severity of the change, is given for each substitution along with the percentage of expected accuracy [10].
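Because each platform reports its result in its own vocabulary, the outputs need to be placed on a common scale before they can be compared. The following Python sketch shows one possible way to do so; it is not the procedure used in this study. Grouping "possibly damaging" with "probably damaging", the exact label spellings used as dictionary keys, and the function names are all assumptions made for illustration. Swiss-model is omitted because it is a structure modeling server and does not return a class label.

```python
from typing import Dict, Optional

# Assumed mapping from each tool's label to a damaging (True) or
# not-damaging (False) call; None marks an uninformative result.
LABEL_TO_CALL: Dict[str, Dict[str, Optional[bool]]] = {
    "PROVEAN": {"deleterious": True, "neutral": False},
    "PolyPhen-2": {
        "benign": False,
        "possibly damaging": True,   # assumption: counted as damaging
        "probably damaging": True,
        "unknown": None,
    },
    "SNAP2": {"effect": True, "neutral": False},
}

def to_binary(tool: str, label: str) -> Optional[bool]:
    """Translate a tool-specific label into a common damaging call."""
    try:
        return LABEL_TO_CALL[tool][label.strip().lower()]
    except KeyError:
        raise ValueError(f"unrecognized tool or label: {tool!r}, {label!r}")

def consensus(labels: Dict[str, str]) -> str:
    """Summarize the calls of several tools for one variant.

    Returns "damaging" or "neutral" only when every informative tool
    agrees, "discordant" otherwise, and "no call" when no tool was
    informative.
    """
    calls = [to_binary(tool, label) for tool, label in labels.items()]
    informative = [call for call in calls if call is not None]
    if not informative:
        return "no call"
    if all(informative):
        return "damaging"
    if not any(informative):
        return "neutral"
    return "discordant"

if __name__ == "__main__":
    # Illustrative labels only; these are not results from this study.
    print(consensus({"PROVEAN": "neutral",
                     "PolyPhen-2": "possibly damaging",
                     "SNAP2": "neutral"}))
```

Requiring unanimity rather than a majority is a deliberately conservative choice: discordant variants are exactly the ones for which functional studies are most informative.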

Conclusions
In the present work, we show that a comparison of the results between various platforms is crucial and, in the case of the most deleterious mutants, the results are generally clear. In the case of the more neutral mutations, functional studies and more refined in silico approaches are fundamental for the understanding of the mutation's impact on the RNA processing, protein function, and pathophysiology of the disease.