Consistency of the Tools That Predict the Impact of Single Nucleotide Variants (SNVs) on Gene Functionality: The BRCA1 Gene

Single nucleotide variants (SNVs) occurring in a protein coding gene may disrupt its function in multiple ways. Predicting this disruption has been recognized as an important problem in bioinformatics research. Many tools, hereafter p-tools, have been designed to perform these predictions and many of them are now in common use in scientific research, even in clinical applications. This highlights the importance of understanding the semantics of their outputs. To shed light on this issue, two questions are formulated: (i) do p-tools provide similar predictions? (inner consistency), and (ii) are these predictions consistent with the literature? (outer consistency). To answer these, six p-tools are evaluated with exhaustive SNV datasets from the BRCA1 gene. Two indices, called K_all and K_strong, are proposed to quantify the inner consistency of pairs of p-tools, while the outer consistency is quantified by standard information retrieval metrics. The inner consistency analysis reveals that most of the p-tools are not consistent with each other, while the outer consistency analysis reveals that they are characterized by a low prediction performance. Although this result highlights the need to improve the prediction performance of individual p-tools, the inner consistency results pave the way to the systematic design of truly diverse ensembles of p-tools that can overcome the limitations of individual members.


Introduction
To fulfill its biological function under specific environmental conditions, such as the cellular milieu, each protein must be folded into a defined three-dimensional structure, known as its native structure. Structural modifications of proteins may result in partial or total loss of function, as in the case of cystic fibrosis disease [1,2]. These modifications can also be harmful to the cell for reasons not directly related to protein function, as in the case of Alzheimer's, Parkinson's, and Huntington's diseases [3,4], where misfolded proteins bind together into aggregates that accumulate and are toxic for the cell. One of the main factors underlying the conformation of a protein is the amino acid sequence. A change in an individual nucleotide (also known as a single nucleotide variant or SNV) in a protein coding gene may lead to an amino acid change. In this case, the SNV involves a non-synonymous substitution.
In this work, the inner and outer consistency of the selected p-tools were evaluated against two particular datasets involving the breast cancer type one susceptibility protein encoded by the BRCA1 gene. Both datasets comprise in vitro experiments allowing the exhaustive screening of BRCA1 mutation effects [14]. The first dataset comprises roughly 4000 SNVs on 1792 nucleotide positions generated by means of the saturation genome editing (SGE) technique [15], which relies on CRISPR-Cas9 technology.
The second dataset comprises 1056 amino acid mutations in the first 191 residues of the BRCA1 protein generated by means of site-saturation mutagenesis, where the authors [16] performed a multiplex homology-directed DNA repair assay designed to test whether homology-directed repair (HDR) [17] of double-strand DNA breaks occurs in BRCA1 mutant cells. Due to its CRISPR-Cas9 foundation, the SGE technique may induce multiple genetic mutations beyond the desired one. These undesired mutations may compromise the viability of cells beyond the effect of the SNVs under study. As a result, conclusions concerning the pathogenicity of BRCA1 SNVs drawn from SGE could, in principle, be biased. Fortunately, this does not appear to be the case, and the results reported in [15] are in good agreement with those reported in [16], confirming the value of the SGE technique for performing high-throughput studies of the effects of SNVs.
From a computational point of view, the SGE technique provides exhaustive and unbiased SNV datasets, as every gene position can be tested for all possible mutations. In addition, site-saturation mutagenesis allows the generation of exhaustive and unbiased single-amino-acid mutagenesis datasets for the BRCA1 protein. Although only a fraction of these mutations are accessible through SNVs relevant to human disease, the information content of the whole dataset is decidedly higher and thus better suited for evaluation studies of p-tools. On the whole, the availability of exhaustive and unbiased datasets of SNVs or mutated amino acids remarkably simplifies and normalizes the evaluation of p-tools. To the best of our knowledge, the public availability of SGE datasets is currently limited to the BRCA1 gene. This gene belongs to the 'first wave' of susceptibility genes for common types of cancer [18]. Therefore, the identification of carriers of pathogenic mutations in this gene is expected to be especially impactful for cancer control.

P-Tools
The effect of SNVs on the functionality of the BRCA1 gene was assessed by means of the PolyPhen2 [10], Provean [19], Align GVGD [20], Strum [9], Cupsat [21], and Panther [8] prediction tools. In all cases, the online version of each tool configured with default parameters was used, except for PolyPhen2, for which the HumVar classification model (advanced options) was selected as better suited for this study. For Cupsat predictions, the Protein Data Bank (PDB) file of BRCA1 was provided. Further details about the selected p-tools can be found in Appendix A.

Datasets
BRCA1-SGE dataset. The authors of [15] studied the ability of haploid human cells to grow in cell cultures. Cells were edited by means of CRISPR-Cas9 technology, targeting every nucleotide (saturation genome editing) of the BRCA1 gene in a region spanning 13 different exons known to encode critical functional domains. The original study comprises nearly 4000 mutations belonging to exons 2-5 and 15-23, including some adjacent intron sequence. Cultured cells that managed to survive gene editing were considered to hold a functional BRCA1 protein. The original dataset was filtered to remove misleading SNVs classified as "Likely Benign" missense mutations. As a result, the final dataset comprises 387 "pathogenic" missense SNVs (positive examples) and 1405 "benign" missense ones (negative examples).
BRCA1-HDR dataset. The authors of [16] performed a multiplex homology-directed repair assay with the aim of quantifying the effect of 1056 amino acid substitutions in the BRCA1 N terminus, comprising residues 2-192 and known to include the RING domain in residues 7-98. As proper folding of the RING domain is required for the stability and function of the full-length protein, the authors analyzed whether the mutated BRCA1 protein is able to maintain its DNA repair function in the homology-directed repair (HDR) pathway using, in tissue culture, a green fluorescent protein (GFP)-based reporter assay [17] in which the functionality of BRCA1 can be detected by identifying green-fluorescent cells. The information about the impact of amino acid mutations on the HDR pathway was depicted graphically using a color scale.
An in-house R [22] script was used to convert the graphical information to a plain text format. Based on the depletion scores (fluorescence drops with respect to a subset of cells having a functional GFP allele encoding an active protein) observed across four replicates of the multiplex HDR reporter assay, mutations showing a depletion in none or just one replicate were considered "benign" (negative examples). On the other hand, mutations showing a depletion state in at least three replicates were considered "pathogenic"; mutations showing depletion states in exactly two replicates were discarded. As a result, the final dataset comprises 59 "pathogenic" variants (positive examples) and 977 "benign" ones (negative examples).
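As an illustration, this replicate-based labeling rule can be sketched in a short Python fragment (a minimal sketch; the function name is ours and not part of the original R pipeline):

```python
def label_mutation(depleted_replicates):
    """Assign a class label from the number of HDR-assay replicates
    (out of four) in which a mutation shows a depletion state.

    Rule described in the text: 0-1 depleted replicates -> "benign",
    3 or more -> "pathogenic", exactly 2 -> discarded (None).
    """
    if depleted_replicates <= 1:
        return "benign"
    if depleted_replicates >= 3:
        return "pathogenic"
    return None  # ambiguous: depleted in exactly 2 of 4 replicates

# Labels for mutations depleted in 0, 1, 2, 3, and 4 replicates
labels = [label_mutation(k) for k in range(5)]
```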
As expected, both datasets turned out to be highly imbalanced, with most of the mutations being of the "benign" type. To quantify the degree of data imbalance, the relative gap G = (#pathogenic − #benign) / #mutations between positive and negative examples was computed for each dataset. G values of −0.56 and −0.88 were observed for the BRCA1-SGE and BRCA1-HDR datasets, respectively.
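The reported gaps can be checked directly from the class counts given above (a minimal sketch; the function name is ours):

```python
def relative_gap(n_pathogenic, n_benign):
    """Relative gap G = (#pathogenic - #benign) / #mutations."""
    return (n_pathogenic - n_benign) / (n_pathogenic + n_benign)

g_sge = relative_gap(387, 1405)  # BRCA1-SGE: ~ -0.57
g_hdr = relative_gap(59, 977)    # BRCA1-HDR: ~ -0.89
```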

Inner Consistency Analysis
The task of assessing the inner consistency of the p-tools faces the problem of the heterogeneity of their outputs. It is not simply a problem of outputs involving different scales but of their semantic meaning. Usually, p-tools provide categories to classify the impact of mutations on the functionality of a gene. However, these categories are not equally distributed through their original numerical scales, thus conversions made by the tools are not linear. Furthermore, different numerical scales are used, from probabilities and free-energy values, to ad-hoc scores. Hence, normalization approaches do not make sense. We note, however, that once categories are defined for a p-tool, they naturally induce an internal ranking for numerical predictions. Given a pair of p-tools and a dataset of mutations, the agreement between their internal rankings can be used to assess their inner consistency.
Against this baseline, we first considered the Kendall rank correlation coefficient (τ) [23], which measures the ordinal association between two measured quantities. Briefly, given a pair of p-tools and a set of target mutations, high values of τ are expected whenever target mutations receive similar ranks in both tools. Formally, let M = {m_1, m_2, ..., m_i, ..., m_j, ..., m_n} be a set of mutations, with n being the number of mutation sites multiplied by the number of allowed mutations per site. Also, let t_S(m) : M → X_S denote the effect of mutation m predicted by a given p-tool S, with X_S being the most informative scale provided by S. In addition, let ≺_S ⊆ M × M be the less-damaging-than relation induced by S on mutations m_i and m_j, so that m_i ≺_S m_j if t_S(m_i) < t_S(m_j), i < j ≤ n. Finally, to simplify the notation, the subscript S is dropped when clear from context; three orderings are possible for any pair of mutations m_i and m_j, namely, m_i ≺ m_j, m_i ≻ m_j, and m_i ∼ m_j, i < j ≤ n.
A concordant pair of predictions for p-tools S and P is counted whenever m_i ≻ m_j or m_i ≺ m_j holds for both S and P, i < j ≤ n. Conversely, a discordant pair of predictions is counted whenever m_i ≻ m_j holds for one of the tools and m_i ≺ m_j holds for the other, i < j ≤ n. Alternatively, if m_i ∼ m_j holds for either S or P, a neither concordant nor discordant pair of predictions is counted, i < j ≤ n. Based on these considerations, the Kendall τ coefficient can be defined as τ = (n_c − n_d) / (n(n − 1)/2), where n_c and n_d denote the numbers of concordant and discordant pairs, respectively. P-tools with native numerical outputs provide convenient categorical outputs by the adoption of sharp thresholds. This common practice may induce false concordant/discordant pairs in the Kendall τ computation, which misleads the comparison of p-tools. For example, let us consider [0, 0.4] being the support of the category label "Benign" with predictions in the [0, 1] range. Intuitively, prediction values of 0.39 and 0.41 are so close that we may not use them to differentiate categories of mutation effects. Hence, although the Kendall τ coefficient can be used with p-tool numerical outputs, its value for measuring the inner consistency of p-tools raises some concerns.
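To make the pair counting concrete, the following minimal sketch implements the plain (tau-a) coefficient over two lists of numerical predictions; it also illustrates how near-threshold scores such as 0.39 and 0.41 still produce full concordant/discordant counts (function name ours):

```python
def kendall_tau(xs, ys):
    """Kendall tau-a over paired prediction lists xs, ys:
    (concordant - discordant) / (n*(n-1)/2). Pairs tied in either
    list count as neither concordant nor discordant."""
    n = len(xs)
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            sx = (xs[i] > xs[j]) - (xs[i] < xs[j])  # ordering sign in xs
            sy = (ys[i] > ys[j]) - (ys[i] < ys[j])  # ordering sign in ys
            if sx * sy > 0:
                concordant += 1
            elif sx * sy < 0:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)

# Two tools agreeing on every ordering yield tau = 1.
# Near-threshold scores (0.39 vs 0.41) are still counted as a full
# discordant pair, even though both values may mean "benign":
tau_close = kendall_tau([0.39, 0.41], [0.41, 0.39])  # -> -1.0
```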
Furthermore, the numerical outputs of p-tools may differ due to computational precision issues, additionally inducing false concordant/discordant pairs in the Kendall τ computation and further misleading the quantification of the inner consistency of p-tools. In brief, the Kendall τ coefficient appears too "sensitive" to assess the inner consistency of p-tools with numerical outputs. To overcome this problem, let us first define a convenient function r_S(m_i, m_j) characterizing the specific ordering assigned to mutations m_i and m_j, i < j ≤ n, by any p-tool S: r_S(m_i, m_j) = 1 if m_i ≺_S m_j, r_S(m_i, m_j) = −1 if m_j ≺_S m_i, and r_S(m_i, m_j) = 0 if m_i ∼_S m_j. We now introduce a novel index, called K_all, able to properly account for all different prediction pairs issued by p-tools S and P: K_all is the proportion of mutation pairs for which r_S(m_i, m_j) = r_P(m_i, m_j), i < j ≤ n. For p-tools involving native categorical outputs, category labels are ordered based on their impact on gene functionality, e.g., for category labels {benign, possibly, probably}, the preference relation benign ≺ possibly ≺ probably is assumed. On the other hand, for p-tools involving numerical outputs, equality thresholds δ > 0 are required to avoid the false counting of either concordant or discordant pairs. Let S be a p-tool with a numerical output and an equality threshold δ_S. Hence, the preference of S on mutations m_i and m_j, i < j ≤ n, is defined as follows: m_i ∼_S m_j if |t_S(m_i) − t_S(m_j)| ≤ δ_S; otherwise, m_i ≺_S m_j if t_S(m_i) < t_S(m_j), and m_j ≺_S m_i if t_S(m_j) < t_S(m_i). Since p-tools generally involve different prediction ranges, their thresholds must be set accordingly. In the absence of prior information, setting these thresholds to some predefined percentage of their prediction ranges appears to be a fair approach. The problem becomes how to set that percentage. At first glance, the thresholds must be large enough to prevent small prediction differences and numerical errors from inducing discordant counts, but also small enough to avoid the false counting of concordant pairs.
To shed light on the percentage equality threshold trade-off problem, let us consider the mutations m_i and m_j, i < j ≤ n, and the predictions issued by the tools S and P. Let us consider first the case where m_i ≺ m_j holds for both tools, and let us define the prediction differences ∆_S = |t_S(m_i) − t_S(m_j)| and ∆_P = |t_P(m_i) − t_P(m_j)|, assuming, without loss of generality, ∆_S ≤ ∆_P. If δ < ∆_S, then m_i ≺_S m_j and m_i ≺_P m_j, so that an agreement is counted for K_all. However, if ∆_S ≤ δ < ∆_P, then m_i ∼_S m_j and m_i ≺_P m_j, so that a disagreement is counted for K_all. Finally, if δ ≥ ∆_P, then m_i ∼_S m_j and m_i ∼_P m_j, so that an agreement is counted for K_all again.
Similar counting arguments can be used to analyze the case where m_j ≺_S m_i and m_i ≺_P m_j. In all cases, as the percentage equality threshold is increased from 0%, K_all first decreases and then increases monotonically until the percentage equality threshold reaches 100%. At that point, all mutations become indistinguishable and K_all reaches its maximum value (1). To summarize, K_all does not show a monotonic behavior with respect to the percentage equality threshold. Supplementary studies were performed to assess the critical percentage equality threshold at which K_all reaches its minimum.
Two independent datasets of mutations, namely, the DM-V dataset comprising reported mutations of the Drosophila melanogaster vermilion (V) gene and the CHKV-E2 dataset comprising reported mutations of the Chikungunya virus E2 gene, were used to evaluate the K_all index with respect to increasing values of the percentage equality threshold. All p-tools were analyzed except Panther, which only provides a categorical output. As a result (see Figure 1), the percentage equality threshold was set to 5%, an intermediate value between 0% (no threshold) and the value (∼10%) at which K_all falls to its minimum. Users of p-tools might additionally be interested in the identification of pairs of p-tools showing not only a considerable proportion of disagreements but a particular form of them, namely opposite predictions, i.e., m_i ≺_S m_j and m_j ≺_P m_i. In this case, the K_strong index can be used: K_strong is one minus the proportion of mutation pairs for which r_S(m_i, m_j) · r_P(m_i, m_j) = −1, i < j ≤ n. While the K_all index accounts for all pairs of predictions for which conflicting orderings are observed, the K_strong index focuses only on extreme conflicting orderings. In practice, users might use the K_all index for the identification of similar p-tools by looking for K_all values close to one. Conversely, users might use the K_strong index for the identification of different p-tools by looking for K_strong values close to zero. Beyond these considerations, the ranges and the directions of K_strong and K_all are similar, so that values close to 1 indicate that a pair of p-tools is likely to order all pairs of mutations in a similar way, while values close to 0 indicate they are likely to order them differently. Counting arguments similar to those used with the K_all index can be used to assess the effect of percentage equality thresholds on the K_strong index. Differently from K_all, a monotonically decreasing behavior is observed for K_strong for increasing values of the percentage equality threshold.
However, since we expect K_strong only to dissect the inner consistency information already provided by its more general K_all counterpart, practical K_strong evaluations were performed with the percentage equality threshold derived from the independent K_all studies (5%).
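The two indices can be sketched in Python as follows. This is a minimal sketch under the behavior described in the text, i.e., K_all counts the fraction of mutation pairs ordered consistently under per-tool equality thresholds, and K_strong penalizes only strictly opposite orderings; function and variable names are ours:

```python
def preference(t_i, t_j, delta):
    """Ordering r_S(m_i, m_j) under an equality threshold delta:
    0 if |t_i - t_j| <= delta (predictions considered equal),
    +1 if m_i is predicted less damaging than m_j, -1 otherwise."""
    if abs(t_i - t_j) <= delta:
        return 0
    return 1 if t_i < t_j else -1

def k_all(ts, tp, delta_s, delta_p):
    """Fraction of mutation pairs on which tools S and P (prediction
    lists ts and tp) induce the same ordering."""
    n = len(ts)
    pairs = agree = 0
    for i in range(n):
        for j in range(i + 1, n):
            pairs += 1
            if preference(ts[i], ts[j], delta_s) == preference(tp[i], tp[j], delta_p):
                agree += 1
    return agree / pairs

def k_strong(ts, tp, delta_s, delta_p):
    """One minus the fraction of pairs with strictly opposite orderings."""
    n = len(ts)
    pairs = opposite = 0
    for i in range(n):
        for j in range(i + 1, n):
            pairs += 1
            if preference(ts[i], ts[j], delta_s) * preference(tp[i], tp[j], delta_p) == -1:
                opposite += 1
    return 1 - opposite / pairs
```

In practice, delta_s and delta_p would each be set to 5% of the corresponding tool's prediction range, as selected in the text.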
Users of K_all and K_strong are generally interested in the evaluation of inner consistency aspects of p-tool predictions. In this regard, both K_all and K_strong rely on the consistency of preferences exhibited by pairs of p-tools across pairs of mutations. However, consistent preferences might hide quite different mutation effects. Without loss of generality, let us assume a common output scale for the p-tools S and P, and let us consider the mutations m_i and m_j, i < j ≤ n. In addition, let us assume the pairs of predictions t_S(m_i) = 0.11 and t_S(m_j) = 0.12 issued by S, and t_P(m_i) = 0.91 and t_P(m_j) = 0.92 issued by P, so that r_S(m_i, m_j) = r_P(m_i, m_j) = 1 holds. Although both S and P predict that m_i is less damaging than m_j, the pairs of predictions lie in opposite ranges of the scale and involve quite different effects: While m_i and m_j might be benign according to S, they are both pathogenic according to P. This toy example points out that inner consistency measurements between pairs of p-tools may require the evaluation of multiple aspects, from the consistency of pairwise preferences to the consistency of the semantics behind individual predictions.
Aiming to shed light on the semantic aspect of p-tool inner consistency measurements, Spearman's rank correlation coefficient was considered. Briefly, Spearman's correlation [24] between two variables equals Pearson's correlation between the rank values of the two variables. However, while Pearson's correlation assesses only linear relationships, Spearman's correlation assesses general monotonic relationships, whether linear or not. For n distinct mutations, Spearman's rank correlation coefficient (ρ_s) associated with the predictions issued by p-tools S and P can be computed using the following popular formula: ρ_s = 1 − (6 Σ d_i²) / (n(n² − 1)), where d_i is the difference between the ranks assigned to the i-th mutation by S and P, i ≤ n. In the case of identical predictions, the average value of their ascending ranking positions is used. Although correlation coefficients are intended to measure the "strength of pairwise relationships", they might be confused by unclear rankings like those induced by p-tools with numerical outputs. On the other hand, although neither the K_all nor the K_strong index considers the absolute position of p-tool predictions, i.e., their semantic aspect, they are not confused by small differences in numerical prediction values thanks to the introduction of the equality threshold for preference relations. As a result, both K_all and K_strong are good candidates for productive evaluations of p-tool inner consistency aspects.
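The popular formula can be sketched as follows, assuming distinct prediction values so that no tie-averaging is needed (helper names are ours):

```python
def spearman_rho(xs, ys):
    """Spearman's rho via the popular formula
    rho = 1 - 6 * sum(d_i^2) / (n * (n^2 - 1)),
    valid when all values within each list are distinct."""
    def ranks(vals):
        order = sorted(range(len(vals)), key=vals.__getitem__)
        r = [0] * len(vals)
        for rank, idx in enumerate(order, start=1):
            r[idx] = rank  # ascending rank positions, 1-based
        return r
    n = len(xs)
    rx, ry = ranks(xs), ranks(ys)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))
```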

Outer Consistency Analysis
Standard information retrieval metrics, including the accuracy, the precision, the recall, the F1-score, and the Matthews correlation coefficient (MCC), were considered to evaluate the outer consistency of p-tools: accuracy = (TP + TN) / (TP + TN + FP + FN), precision = TP / (TP + FP), recall = TP / (TP + FN), F1 = 2 · precision · recall / (precision + recall), and MCC = (TP · TN − FP · FN) / √((TP + FP)(TP + FN)(TN + FP)(TN + FN)), where TP, TN, FP, and FN stand for the number of true positive, true negative, false positive, and false negative predictions, respectively. It is worth noting that special care should be taken with the above metrics when analyzing highly imbalanced datasets like those induced in experiments involving the high-throughput screening of genetic mutations. Fortunately, the human organism is a highly robust system, so we expect most SNVs to be negative examples (benign mutations). Therefore, the accuracy is not a good metric for measuring the outer consistency of p-tools, as a naive predictor set to predict only negatives would achieve a very high accuracy. On the other hand, the precision metric measures the proportion of mutations predicted as positive examples that were indeed TP predictions (pathogenic mutations). Similarly, the recall metric measures the proportion of positive examples in the ground truth that were indeed recovered as TP predictions. Both the precision and recall metrics disregard TN predictions. There is also often an inverse relationship between precision and recall, so that it is possible to increase one of them at the expense of reducing the other; the F1-score, originally defined for document classification problems where TN predictions also do not matter, is defined as the harmonic mean of the precision and recall metrics. Finally, the MCC is a statistic robust to differences in the proportion of negative and positive examples, which can be more appropriate than the F1-score when negative examples matter in some way.
The MCC is called a correlation coefficient because it is −1 when predictions are completely wrong, 1 when they are completely correct, and 0 when they are no better than random predictions.
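For concreteness, the metrics can be computed from a confusion matrix as follows (a minimal sketch; names are ours). Note how the naive all-negative predictor discussed above achieves high accuracy on an imbalanced dataset while its F1-score and MCC collapse to zero:

```python
import math

def metrics(tp, tn, fp, fn):
    """Standard confusion-matrix metrics used for the outer
    consistency analysis; undefined ratios are reported as 0."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1, "mcc": mcc}

# Naive predictor on an imbalanced set (10 positives, 90 negatives),
# always answering "benign": accuracy 0.9, but F1 = MCC = 0.
naive = metrics(tp=0, tn=90, fp=0, fn=10)
```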
In order to analyze the outer consistency of p-tools, their outputs were binarized. Align GVGD predictions in the "C0" and "C15" classes were considered negative examples (benign) and predictions in the "C45", "C55", and "C65" classes were considered positive ones (pathogenic). Similarly, Provean predictions in the "Neutral" class were considered negative examples and predictions in the "Deleterious" class were considered positive ones. On the other hand, Panther predictions in the "Benign" class were considered negative examples and predictions in the "Damaging" class were considered positive ones. For Strum and Cupsat, predictions with ∆∆G ≥ 0 were considered negative examples, while predictions with ∆∆G < 0 were considered positive ones. Finally, Polyphen2 predictions in the "Benign" class were considered negative examples and predictions in the "Probably" class were considered positive ones. In all cases, p-tool predictions involving intermediate categories were disregarded for the outer consistency analysis.
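These binarization rules can be sketched as simple lookup tables. In the sketch below, treating the unlisted Align GVGD classes "C25" and "C35" as the disregarded intermediate categories is our assumption (consistent with the seven-class output described in Appendix A); all names are ours:

```python
# Hypothetical lookup table for Align GVGD: 0 = benign, 1 = pathogenic;
# classes C25 and C35 are assumed to be the disregarded intermediates.
ALIGN_GVGD = {"C0": 0, "C15": 0, "C45": 1, "C55": 1, "C65": 1}

def binarize(prediction, table):
    """Map a categorical p-tool output to 0 (benign) or 1 (pathogenic).
    Categories absent from the table are discarded (None)."""
    return table.get(prediction)

def binarize_ddg(ddg):
    """Strum/Cupsat rule from the text: ddG >= 0 -> benign (0),
    ddG < 0 -> pathogenic (1)."""
    return 0 if ddg >= 0 else 1
```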

Inner Consistency Results
Inner consistency measurements obtained by means of the K_all and K_strong indices are shown in Tables 1 and 2, respectively. The most "similar" and the most "different" p-tools identified by the K_all and the K_strong indices, respectively, are highlighted in bold. Based on K_all, Provean and Align GVGD are the most similar p-tools. Based on K_strong, Polyphen2 and Align GVGD are the most different p-tools. As expected, K_strong achieves larger values than K_all; this is reasonable as K_strong only considers opposite preference relationships. In both tables, which are set to work with a 5% percentage equality threshold, the elements above the diagonal correspond to the BRCA1-SGE dataset while the elements below it correspond to the BRCA1-HDR dataset. The p-tool abbreviations are: Polyphen2 (Pph2), Provean (Prov), Align GVGD (Gvgd), Cupsatd (Cupd), Cupsatt (Cupt), Panther (Pthr), and Strum (Strm). In addition, Table 3 shows the inner consistency measurements obtained with Spearman's correlation coefficient. These results show that many of the p-tools are poorly correlated. In principle, this may be attributed to differences in the semantics of predictions on each p-tool scale and/or the sensitivity of Spearman's correlation coefficient to p-tools with numerical outputs. For both the BRCA1-SGE and BRCA1-HDR datasets, the most correlated p-tools are Provean and Align GVGD, whose correlation coefficients are highlighted in bold. This is reasonable as both p-tools use sequence alignments to predict the effect of mutations. To shed light on the type of inner consistency information that K_all and K_strong are able to provide, we analyzed them against Spearman's correlation coefficient.
In Figure 2, K_all and Spearman appear related to each other to some degree. We note, however, that while both Cupsatt and Cupsatd are poorly correlated with almost all the other p-tools according to Spearman, they are close to many other p-tools according to K_all. On the other hand, both K_all and Spearman show that Provean and Align GVGD are highly correlated. Finally, Figure 3 shows that K_strong and Spearman are clearly uncorrelated. Remarkably, while K_strong identifies Polyphen2 and Align GVGD as the most different p-tools, Spearman identifies Cupsatd and Cupsatt as the most negatively correlated ones. Although the K_strong result makes sense, since Polyphen2 and Align GVGD use different learning strategies and information sources, the Spearman result does not, since Cupsatd and Cupsatt are variations of the same algorithm (Cupsat) applied to the same information source.

Outer Consistency
The measurement of p-tool outer consistency is shown in Table 4. Only the information about TP and TN predictions is shown, together with the MCC and F1-score statistics. The accuracy, precision, and recall metrics are shown in Appendix B. The BRCA1-SGE dataset has a rather imbalanced distribution of positive and negative samples (G = −0.56). Three of the p-tools, Align GVGD, Cupsatt, and Panther, correctly predict more than 78% of the positive examples (TP). However, only Panther reasonably predicts negative ones (57%). We note, however, that the three p-tools also introduce many false positive predictions (see Appendix B). Based on the MCC and the F1-score, we can say that the best compromise in prediction performance is achieved by Panther. The BRCA1-HDR dataset is highly imbalanced (G = −0.88). For this dataset, four of the p-tools, Align GVGD, Cupsatt, Panther, and Strum, correctly predict most of the positive examples (TP). However, only Panther reasonably predicts negative ones (74%). Based on the MCC and the F1-score, none of the p-tools achieved an acceptable prediction performance. This may be due to the many false positive predictions (300 on average) against only 59 TP (see Appendix B). Provean does not predict any mutations as positive, making its F1-score and MCC equal to 0. On the whole, Panther achieves the best compromise in prediction performance among the considered p-tools on average. However, its prediction performance remains poor. Finally, our results show that although most of the mutations reported for the BRCA1 gene are of the benign type, p-tools tend to classify them as pathogenic, as evidenced by the observed high rates of false positive predictions.

Conclusions
A number of bioinformatics tools have been developed to predict the impact of SNVs on the functionality of protein coding genes. The stronger the agreement between tools that use different prediction approaches and independent sources of information, the greater the confidence we can have in their predictions. Evaluating the level of confidence is particularly important when predictions are used to guide experimental research studies or clinical decisions. In this paper, a computational framework for evaluating the confidence of six tools that predict the impact of SNVs on protein coding genes has been presented. With this aim, two indices called K all and K strong have been introduced. The proposed indices can evaluate the consistency of predictions issued by different tools (inner consistency) without requiring the specific understanding of their outputs. Using these indices, the most similar and most different prediction tools can be identified. As a result, these indices can help to accelerate the understanding of new prediction tools. Last, these indices can help to design truly diverse ensembles of prediction tools, a fundamental requirement for improving the confidence of individual members of the ensemble.
Inner consistency studies were complemented with outer consistency studies focusing on the extent to which predictions matched the experimental results reported in the literature. Without loss of generality, experimental data involving the high-throughput screening of genetic mutations on the BRCA1 gene were considered. The outer consistency studies confirmed the importance of selecting suitable information retrieval metrics, since reference datasets are expected to be highly imbalanced. In general, the prediction performance of the tools was rather low, with a clear trend towards the introduction of false positive predictions. On the whole, our results highlight the importance of understanding the intrinsic limitations of tools dealing with the prediction of SNV effects on protein coding genes.

Conflicts of Interest:
The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

Appendix A. P-Tools Main Features
PolyPhen-2 http://genetics.bwh.harvard.edu/pph2/-(Polymorphism Phenotyping v2) is a software tool for predicting the possible impact of an amino acid substitution on the structure and function of a human protein. It is based on a number of sequence, phylogenetic, and structural features characterizing the substitution. Predictions are performed by a naïve Bayesian classifier. The sequence-based features include position-specific independent count (PSIC) scores, multiple sequence alignment (MSA) properties, and the position of mutations with respect to domain boundaries as defined by Pfam [25]. The structure-based features include solvent accessibility, changes in solvent accessibility for buried residues, and crystallographic B-factor. Two datasets, namely HumDiv and HumVar, can be used to generate the corresponding classification models. The default classification model uses HumDiv data and is preferred for evaluating rare alleles, dense mapping of regions identified by genome-wide association studies, and analysis of natural selection. The HumVar classification model is better suited for the diagnostics of Mendelian diseases, which requires distinguishing mutations with drastic effects from all the remaining human variations, including abundant mildly deleterious alleles.
The PolyPhen-2 output is a table with a classifier label of the type benign/possibly damaging/probably damaging, a classifier probability of the mutation being damaging, a classifier model false positive rate (1 − specificity) at the above probability, and a classifier model true positive rate (sensitivity) at the above probability. In this work, the probabilities of the mutations being damaging were considered for the inner consistency analysis and a binarization of categorical outputs was used for the outer consistency analysis. In both studies, the HumVar classification model was selected.
Provean http://sift.jcvi.org/-(Protein Variation Effect Analyzer v1.1.3) is a software tool for predicting whether an amino acid substitution has an impact on the biological function of a human or mouse protein. It is based on the change, caused by a given variation, in the similarity of the query sequence to a set of its related protein sequences. For this prediction, the algorithm is required to compute a semi-global pairwise sequence alignment score between the query sequence and each of the related sequences. This alignment-based score measures the change in sequence similarity of a query sequence to a protein sequence homolog before and after the introduction of an amino acid variation to the query sequence. The output prediction information of this tool is a table with a prediction label column and a score column. If the score is equal to or below a predefined threshold (e.g., −2.5), the protein variant is predicted to have a "deleterious" effect. If the score is above the threshold, the variant is predicted to have a "neutral" effect.
Align GVGD http://agvgd.hci.utah.edu/-(Grantham Variation and Grantham Deviation) is a software tool that combines the biophysical characteristics of amino acids and protein multiple sequence alignments to predict where missense substitutions in genes of interest fall in a spectrum from enriched deleterious to enriched neutral. The output prediction information of this tool is a table with a score column, representing an extension of the Grantham difference used to score missense substitutions against the range of variation present at their position in a multiple sequence alignment, and a categorical column with seven classes ordered from most likely to least likely to interfere with function.
Strum https://zhanglab.ccmb.med.umich.edu/STRUM/-(Structure based Prediction of Protein Stability Changes Upon Single-point Mutation) is a software tool for predicting the fold stability change (∆∆G) of protein molecules upon single-point mutations. Strum adopts a gradient boosting regression approach to train the Gibbs free-energy changes on a variety of features at different levels of sequence and structure properties. The unique characteristic of Strum is the combination of sequence profiles with low-resolution structure models from protein structure prediction, which helps to enhance the robustness and accuracy of the method and make it applicable to various protein sequences, including those without experimental structures. The output prediction information of this tool is a column with the ∆∆G value of each mutation.
Cupsat http://cupsat.tu-bs.de-(Cologne University Protein Stability Analysis Tool) is a software tool for predicting changes in protein stability upon point mutations. It uses structural environment specific atom potentials and torsion angle potentials to predict ∆∆G, the difference in free energy of unfolding between wild-type and mutant proteins. To improve accuracy and specificity of predictions, the mutations and mean-force potentials were classified according to different structural regions. Initially, the secondary structure specificity of mutations and mean-force potentials was implemented and the amino acids were classified into helices, sheets, and others. Later, the amino acids belonging to each of these secondary structure elements were further subdivided according to their solvent accessibility.
This method requires the primary and secondary structure information (PDB file) and can be run with two different experimental methods, thermal and denaturants, referred to as Cupsatt and Cupsatd within the manuscript, respectively. The output prediction information is a table with a categorical column indicating the overall stability of the mutation (Stabilising or Destabilising), a categorical column with the torsion information of the mutation (Favourable or Unfavourable), and a numerical column with the predicted ∆∆G (kcal/mol) value. The numerical information was used in our analysis. Some amino acids of the BRCA1 structure were not present in the PDB files used (IDs: 1jm7 and 4y2g) and therefore were not considered in the comparisons.
Panther http://www.pantherdb.org/-(Protein Analysis Through Evolutionary Relationships v15.0) is a software tool that calculates substitution position-specific evolutionary conservation (subPSEC) scores, based on alignments of evolutionarily related proteins, to predict pathogenicity. The alignments are obtained from the PANTHER library of protein families based on Hidden Markov Models (HMMs). The subPSEC score describes the probabilities of amino acids at particular positions among evolutionarily related sequences. The output prediction information of this tool is a categorical column indicating whether the mutation may or may not affect the functionality of the protein.