The main objective of this study was to use a combination of computational approaches including MD and DRN analysis to characterise CA-VIII, and to investigate the effects of phenotype associated SNVs on protein structure and function.
2.1. Data Retrieval Identifies SNVs Pathogenic to CA-VIII
The Ensembl [34] and Human Mutation Analysis (HUMA) [35] databases identified three pathogenic nsSNVs and two benign nsSNVs (see Table 1). An additional variant, G162R, was identified from the literature [32]. Although G162R has been associated with CAMRQ3 [32], ClinVar and OMIM have not reported any phenotype associations for it. Table 1 shows that multiple SNVs can occur at the same position within CA-VIII and can have either the same or different rs IDs. For example, rs267606695 covers two variations at residue 100, S100A and S100P: at this position Ser can be mutated to either Ala or Pro under the same rs ID. For all six identified SNVs, VAPOR (Variant Analysis Portal) [35] (Table 1) shows that I-Mutant [36] and MUpro [37] predicted a reduction in stability. With respect to the clinical significance of the variants, S100L and E109D are regarded as benign. The ΔΔG values obtained from the two programs, MUpro and I-Mutant, differ somewhat, reflecting their previously reported accuracy limits [36,37]. I-Mutant uses support vector machines and has been trained to predict ΔΔG values. The server offers higher accuracy when 3D structures are available in the PDB (80%); in our case, since the variants have no crystal structures in the PDB and prediction was sequence based, the expected accuracy is lower (77%). Correlations between experimental and predicted stability changes of 0.71 (structure based) and 0.62 (sequence based) have also been reported for the server [36]. Thus the predictions of decreased stability presented in Table 1 should be regarded as tentative. Variants S100P, G162R and R237Q are associated with CAMRQ3 [16,28,32,38,39]. The minor allele frequency (MAF) is also presented in Table 1 and shows that all variants except E109D occur at a frequency of less than 1% in the population.
As no variant CA-VIII crystal structures exist, WT and variant proteins were modelled using MODELLER [40]. Table 1 shows the z-DOPE (normalised discrete optimised protein energy) scores calculated for the variant models. All calculated models have a z-DOPE score of less than −1.00, indicating that the variant homology models are of high quality.
To further understand potential SNV effects on CA-VIII structure and function, the relationship between SNV location and protein secondary structure was investigated. Figure 1 shows the 3-dimensional (3D) location of the SNVs on CA-VIII. The SNVs S100A, S100L and S100P are located at the end of a beta (β) sheet, while E109D, G162R and R237Q are located within loop regions. Structurally, the substitution of Ser at position 100 with Pro results in the complete destruction of the respective β-sheet and the adjacent shorter β-sheet (residues 71–73). This destruction would result in the loss of hydrogen bonds between the β-sheets, thereby affecting protein function and stability. Previous research [8] suggested that S100P could affect the loop residues 147–162; however, no variant-associated structural changes were noted for these residues.
2.2. Functional Analysis Reveals Key Protein–Protein Interaction Residues
As previous research had identified CA-VIII residues 44–290 as the minimum binding site amino acids, SiteMap [41,42] and CPORT (Consensus Prediction Of interface Residues in Transient complexes) [43] were used to identify the potential residues participating in the association of CA-VIII with ITPR1.
SiteMap utilises an algorithm similar to Goodford’s GRID algorithm [44], whereby energetic and geometric properties are used to select site points, followed by the preparation of contour maps based on the computation of hydrophobic and hydrophilic properties at each grid point [42]. The CPORT server integrates PIER (Protein IntErface Recognition) [45], cons-PPISP (consensus Protein-Protein Interaction Site Predictor) [46], ProMate [47], SPPIDER (Solvent accessibility based Protein-Protein Interface iDEntification and Recognition) [48] and PINUP (Protein Interface residUe Prediction) [49] to predict protein-protein interaction residues.
SiteMap results revealed that only four of the top five binding sites discovered had SiteScores >0.80 (Table S1). The SiteScore represents a weighted average of the number of sites and the hydrophobic and enclosure scores, and sums up to 1.0. A SiteScore threshold of 0.80 accurately distinguishes binding from non-binding sites. The protein-protein interaction residues identified by SiteMap were located on the exterior surface of the protein across all binding sites. Each binding site discovered contained more than 30 residues, with the exception of binding site 5, which contained fewer than 20 residues. Binding site 1 contained the greatest number of residues. The identified binding site amino acids are presented in Table S1. Analysis of SNV positions and SiteMap data indicates that the variants are located in binding sites 2 (R237Q) and 4 (S100A, S100L, S100P and E109D). Variant G162R is the only SNV not located within a binding site. The large number of binding site residues, together with the limited research on CA-VIII, makes it difficult to select the residues most important for protein-protein interactions using SiteMap alone. CPORT predictions were therefore also performed to identify potential binding site residues, and results are presented in Table S1. As observed with SiteMap, the predicted binding site contains more than 30 amino acids. It was also noted that R237Q was the only variant located within the binding site.
To enhance residue identification, SiteMap and CPORT results were merged to obtain a consensus of the binding site residues. The residues identified by both SiteMap and CPORT were regarded as the main binding site residues for CA-VIII, and results are presented in Table S1 and Figure 1. The data show that a consensus of 38 binding site residues was identified, with R237Q being the only SNV occurring within these residues. Results also indicate that the majority of binding site residues are located between residues 44–290, in agreement with the minimum binding site residues previously discovered [6]. The data not only expand on this previous research by identifying the potential binding site residues within the range, but also identify residues 26–40 as important. This observation is supported by previous literature, in which cleavage of the first 43 N-terminal residues resulted in a 16-fold decrease in CA-VIII activity [6]. Results also demonstrate that the N-terminal (green) and C-terminal (red) residues are located in close proximity to each other. In 2013, Aspatwar et al. [8] found that the CA-VIII region containing residues 150–157 could interact with ITPR1. Results in Table S1 and Figure 1 agree with this finding, as Gly151 and Ile153 were identified as potential binding site residues. It is currently not known whether CA-VIII interacts with other cellular proteins; the identified binding site residues could therefore interact with proteins other than ITPR1. For this study all identified residues have been assumed to interact with ITPR1 only.
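The consensus step described above amounts to a set intersection between the two predictors' residue lists. The following is a minimal Python sketch of that idea, not the pipeline used in this study: the function name `consensus_residues` and all residue numbers in the example are hypothetical placeholders rather than the Table S1 data.

```python
def consensus_residues(sitemap_sites, cport_residues):
    """Return residues predicted by both tools, sorted by residue number.

    sitemap_sites: dict mapping a binding-site index to a set of residue numbers.
    cport_residues: iterable of residue numbers predicted by CPORT.
    """
    # Pool the residues from every SiteMap site before intersecting.
    sitemap_all = set().union(*sitemap_sites.values())
    return sorted(sitemap_all & set(cport_residues))


# Hypothetical residue numbers for illustration only (not the Table S1 data).
sitemap_sites = {1: {26, 27, 29, 151}, 2: {153, 237, 240}, 4: {100, 109}}
cport_residues = {27, 29, 151, 153, 237, 250}

consensus = consensus_residues(sitemap_sites, cport_residues)

# Check which SNV positions fall inside the consensus set.
snv_positions = {"S100P": 100, "E109D": 109, "G162R": 162, "R237Q": 237}
snvs_in_consensus = [name for name, pos in snv_positions.items() if pos in consensus]
```

With these toy inputs only R237Q falls inside the consensus, mirroring the kind of check reported above; the real residue lists would need to be taken from Table S1.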
2.3. Sequence Analysis Identifies Residues Essential to CA-VIII Structure
Previous analysis of the acatalytic CA isoforms by Aspatwar et al. [33] demonstrated phylogenetic relationships between CA-VIII, CA-X and CA-XI. Their research, however, did not identify conserved residue regions important for stability and/or function within the protein. Noting this research gap, protein sequence analysis was performed to identify structurally and functionally important residue regions. Essential residues are expected to be highly conserved across different species.
Expanding on the previously identified CA-VIII binding site residues, the sequences and structures of CA-II and CA-VIII were compared. Although the CA-II and CA-VIII protein sequences share only 40.81% identity (alignment presented in Figure S1), their 3D structural alignment shows a root mean square deviation (RMSD) of 1.302 Å (CA-II WT homology model from [50], and modelled CA-VIII WT structure), indicating structural similarity. Given the similar structures, the sequence alignment was used to map important CA-II residues onto CA-VIII to assist in essential residue identification. Table S2 presents the mapped CA-II and CA-VIII protein residues and their potential function based on the sequence alignment.
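To illustrate how a pairwise alignment yields both a percentage identity and a residue-to-residue mapping, a short Python sketch is given here. It is an illustration under stated assumptions: identity is computed over columns where neither sequence has a gap (other conventions exist and the one used for the 40.81% figure is not stated), and the two aligned strings are toy sequences, not CA-II or CA-VIII.

```python
def percent_identity(aln_a, aln_b):
    """Percent identity over alignment columns where neither sequence has a gap."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(a, b) for a, b in zip(aln_a, aln_b) if a != "-" and b != "-"]
    matches = sum(a == b for a, b in pairs)
    return 100.0 * matches / len(pairs)


def map_residues(aln_a, aln_b, start_a=1, start_b=1):
    """Map residue numbers of sequence A onto sequence B via the alignment."""
    mapping = {}
    i, j = start_a - 1, start_b - 1
    for a, b in zip(aln_a, aln_b):
        if a != "-":
            i += 1  # advance the residue counter of A on non-gap columns
        if b != "-":
            j += 1  # advance the residue counter of B on non-gap columns
        if a != "-" and b != "-":
            mapping[i] = j
    return mapping


# Toy aligned sequences (hypothetical, for illustration only).
aln_a = "MK-LWHE"
aln_b = "MRTL-HE"

identity = percent_identity(aln_a, aln_b)
mapping = map_residues(aln_a, aln_b)
```

Here residue 3 of the first sequence maps to residue 4 of the second because of the gap, which is the same kind of numbering offset seen when CA-II residues are mapped onto CA-VIII.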
From Table S2 it is noted that the residues are divided into two groups: those that are important to catalytic function (for example, CO₂ binding site residues, active site water network residues and Zn²⁺ coordinating residues), and those responsible for maintaining protein stability [51,52]. As CA-VIII is acatalytic, the aromatic (ringed) amino acids were regarded as being important for the maintenance of protein stability. These residues are Trp29, Tyr31, Trp37, Phe41, Phe117, Trp119, Phe201 and Phe250. In addition, Ser50, Leu93, Val115, Ile198 and Arg275, though not aromatic, could assist with protein stability. For the remaining residues, the amino acid substitutions could have other physiological functions not evident from the alignment. Previous studies have shown that the replacement of Arg116 with His restores CO₂ hydration activity in CA-VIII [53], and these residues could therefore be of importance. Currently it is unknown whether the non-aromatic amino acids in Table S2 have an adaptive role in the function of acatalytic CAs, and further research is required. Interestingly, Arg116, the substitution that prevents CA-VIII from coordinating Zn²⁺, was also predicted as a potential binding site residue. This could indicate a possible acatalytic adaptation for Arg116. It remains unclear, however, whether the catalytic CA-VIII His116 mutant in the previous study [53] was also capable of associating with ITPR1; knowing this would assist in the discovery of potential adaptive roles of Arg116.
2.4. Variant Presence Causes Conformational Changes to CA-VIII
With VAPOR results indicating stability decreases, the RMSD for each MD simulation frame was calculated for the WT and variant proteins, and results are presented in Figure S2A. Results demonstrate that G162R shows greater structural changes during MD simulation compared to the other proteins, suggesting potential variant instability. Figure 2A presents RMSD distributions obtained by kernel density estimation (KDE), showing the conformational sampling of CA-VIII during MD. KDE is a non-parametric statistical procedure used to estimate the probability density function (PDF) of a variable. KDEs are closely related to histograms, with the advantage that no information is lost through binning. KDEs also smooth the data, making the shape of the distribution easier to determine. Peaks indicate the RMSD of the most sampled protein conformation, whereas width is indicative of the number of conformations sampled.
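As a rough illustration of how a KDE turns per-frame RMSD values into a smooth distribution, the sketch below implements a Gaussian KDE in plain Python. It is not the plotting code used for Figure 2A: the Gaussian kernel, the normal-reference bandwidth rule and the RMSD values are all assumptions made for the example.

```python
import math


def gaussian_kde(samples, bandwidth=None):
    """Return a function estimating the PDF of `samples` with Gaussian kernels."""
    n = len(samples)
    if bandwidth is None:
        # Assumption: normal-reference rule of thumb, h = 1.06 * sd * n^(-1/5).
        mean = sum(samples) / n
        sd = math.sqrt(sum((x - mean) ** 2 for x in samples) / (n - 1))
        bandwidth = 1.06 * sd * n ** (-1 / 5)
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        # Sum one Gaussian bump per sample, so no binning is involved.
        return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)

    return density


# Hypothetical per-frame RMSD values in Å (illustrative, not simulation output).
rmsd = [1.0, 1.1, 1.2, 1.1, 1.9, 2.0, 1.9, 1.9, 2.0, 1.8]
density = gaussian_kde(rmsd)

# Locate the most sampled RMSD (the tallest peak) on a 0.01 Å grid.
grid = [i * 0.01 for i in range(0, 401)]
peak_rmsd = max(grid, key=density)
```

In this toy data set the two clusters of values produce two peaks, with the taller one near the more frequently sampled RMSD, which is how the bimodal distributions discussed for Figure 2A would be read.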
S100A maintains conformations most similar to those of the WT during MD, whereas S100L exhibits the largest RMSD difference from the WT protein. As S100L is benign, this could suggest that the pathogenic effects of S100A, S100P, G162R and R237Q may not be due to global conformational changes but could result from localised changes to protein residues. Though S100A, S100L and S100P all occur at the same position, Figure 2A demonstrates that S100A and S100P have the greatest structural overlap with the WT protein. It is also observed that S100L and R237Q form two distinct conformational clusters. S100L structures cluster at approximately 1.1 Å and 1.9 Å, while R237Q forms clusters at 1.5 Å and 2.5 Å. The S100L and R237Q conformations at 1.1 Å and 1.5 Å, respectively, are however sampled to a lesser extent. Evidence of variant-associated instability of G162R is observed in Figure 2A through the presence of three potential peaks at 2.3 Å, 2.8 Å and 3.5 Å, suggesting three potential major conformational clusters. The first two clusters share some structural overlap with the WT protein. The conformations at 2.3 Å and 3.5 Å are however sampled at a lower frequency during MD simulation. These G162R observations are in agreement with the RMSD findings in Figure S2A.
Data in Figure 2A also show that S100P samples fewer conformations than the WT and other variants (distribution width). Previous research by Turkmen et al. in 2009 [16] into the effects of S100P on CA-VIII structure and function suggested that S100P was associated with a reduction in protein stability. Further research in 2010 by Aspatwar et al. [33] suggested that substitution of Ser by Pro at the β-sheet end (Figure 1) would result in shorter and more constrained adjacent loops as an effect of poor protein folding. The poor protein folding is supported by the β-sheet destruction. More constrained loops could also explain the smaller conformational sampling and the increases in protein rigidity observed. Increased variant rigidity could have an impact on the allosteric effect of CA-VIII on ITPR1, as the proteins could be too constrained to cause significant conformational changes within the receptor.
To observe the effects of SNV-associated conformational sampling on protein compactness, radius of gyration (Rg) analysis was conducted, and the results are presented in Figure 2B and Figure S2B. The variant Rg values are smaller than that of the WT by up to 1.41% (Table 2). Although a smaller Rg value may indicate increased stability, this change is small and does not contradict the stability prediction results presented in Table 1. Comparisons of the contributions of the RMSD and Rg metrics to structural differences are presented in Table 2, and the data illustrate that the structural differences between the WT and variant CA proteins are better indicated by the RMSD.
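For reference, the radius of gyration is the mass-weighted root mean square distance of the atoms from their centre of mass, and the comparison above is a percentage change relative to the WT. The Python sketch below shows both calculations under simple assumptions: the coordinates, unit masses and Rg values in the example are hypothetical, and trajectory analysis tools would normally compute Rg per frame.

```python
import math


def radius_of_gyration(coords, masses):
    """Mass-weighted RMS distance of atoms from their centre of mass."""
    total = sum(masses)
    com = [sum(m * c[k] for m, c in zip(masses, coords)) / total for k in range(3)]
    weighted_sq = sum(
        m * sum((c[k] - com[k]) ** 2 for k in range(3))
        for m, c in zip(masses, coords)
    )
    return math.sqrt(weighted_sq / total)


def percent_change(wt_value, variant_value):
    """Signed change of the variant relative to the WT, in percent."""
    return 100.0 * (variant_value - wt_value) / wt_value


# Hypothetical toy system: four unit-mass atoms on the corners of a square.
coords = [(1.0, 1.0, 0.0), (1.0, -1.0, 0.0), (-1.0, 1.0, 0.0), (-1.0, -1.0, 0.0)]
masses = [1.0, 1.0, 1.0, 1.0]
rg = radius_of_gyration(coords, masses)  # equals sqrt(2) for this geometry

# Hypothetical average Rg values in Å; a negative change means a more compact variant.
change = percent_change(20.0, 19.8)
```

A change of about −1% in this toy comparison is of the same order as the differences reported in Table 2.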
2.6. Variant Presence Is Associated with Changes to Residue Accessibility and Communication
In the previous sections RMSF and DCC highlighted variant-associated effects on the motion and flexibility of protein residues. In this section DRN analysis was used to investigate whether SNV presence has an effect on residue accessibility and communication. Figure S4A presents the ΔL (change in residue accessibility) between WT and variant proteins (WT − variant). A negative ΔL suggests that the variant protein residues are moving away from one another and are less accessible, whereas a positive ΔL indicates that the residues in variant proteins are moving closer to each other and are more accessible. A ΔL value of 0 indicates no change in residue accessibility between WT and variant proteins.
Comparing the variant proteins to each other, the data show that most protein residues maintain a ΔL close to 0. This result indicates that for the majority of the protein there are subtle to no changes in residue accessibility. To identify residues showing the most significant changes in accessibility, amino acids with a ΔL beyond two standard deviations (in either direction) were identified, and results are presented in Table 3. From the data in Table 3 it is observable that, for all variants, the majority of these amino acids belong to the Glu-rich N-terminal region (residues 21–36).
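The ΔL calculation can be pictured on a residue contact graph: for each residue, L is the mean shortest-path length to every other residue, ΔL is the WT value minus the variant value, and residues far from the typical ΔL are flagged. The sketch below is a simplified illustration only. It assumes unweighted, connected graphs from a single frame each, takes the two-standard-deviation cut-off about the mean ΔL (an interpretation, not a stated detail), and uses hypothetical toy graphs.

```python
from collections import deque
import statistics


def average_path_length(adjacency, source):
    """Mean breadth-first-search distance from `source` to all other reachable nodes."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    others = [d for n, d in dist.items() if n != source]
    return sum(others) / len(others)


def delta_l(wt_adj, variant_adj):
    """Per-residue change in L, following the WT − variant convention."""
    return {r: average_path_length(wt_adj, r) - average_path_length(variant_adj, r)
            for r in wt_adj}


def beyond_two_sd(values, n_sd=2.0):
    """Residues whose value lies more than n_sd population SDs from the mean (assumed cut-off)."""
    mean = statistics.mean(values.values())
    sd = statistics.pstdev(values.values())
    return {r: v for r, v in values.items() if abs(v - mean) > n_sd * sd}


# Hypothetical toy graphs: a 5-residue chain (WT) and the same chain closed into a ring (variant).
wt = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
variant = {1: [2, 5], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4, 1]}
dl = delta_l(wt, variant)

# Hypothetical ΔL profile with a single strongly shifted residue.
profile = {r: 0.0 for r in range(1, 20)}
profile[20] = 1.0
flagged = beyond_two_sd(profile)
```

In the toy pair, the extra contact in the variant shortens paths from the terminal residues, giving them a positive ΔL (more accessible), which matches the sign convention described above.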
The variant S100L shows an unexpected result whereby N-terminal residues show both an increase (residues 26–29 and 35) and a decrease (residues 32–34) in accessibility. In addition to residues 32–34 becoming less accessible, residues 263–265 also show a reduction in accessibility. E109D ΔL results show a similar trend to that observed for S100L. Accessibility increases are present in binding site residues 26–29, and decreases are observed in residues 31–33. With increases and decreases in accessibility occurring in residues in close proximity, this could indicate the existence of a possible compensatory mechanism, whereby as one group of residues moves closer together, another group moves further apart to maintain binding site integrity. This compensatory mechanism could explain the benign clinical significance (Table 1). As the green and red regions are next to each other (see Figure 1), the changes in residue accessibility could also assist with the maintenance of binding site integrity within CA-VIII.
Comparison with the pathogenic variants suggests that, to maintain binding site integrity, the increases and decreases in residue accessibility may have to occur across multiple adjacent binding site residues. Pathogenic SNVs show accessibility increases in binding site residues (residues 26–29); however, this effect is not compensated for by accessibility decreases in other multiple adjacent binding site residues (residues 31–35). The ΔL decreases only occur in isolated residues and do not span multiple residues. Accessibility increases to Trp29 could also assist with stability maintenance in the protein.
To fully understand the effect that changes in residue accessibility could have on residue communication, average BC was calculated, and results are presented in Figure 5. The higher the average BC, the more important the residue is for communication within the protein. Data in Figure 5 demonstrate that the residues Glu139, Ile165, Ala167, Val231, Trp233 and Asn273 are the most important residues for communication within CA-VIII. Using the sequence alignment (Figure S1), these amino acids map onto the CA-II residues Glu117, Val142, Gly144, Val206, Trp208 and Asn243, which are of functional importance to CA-II (Table S2). These CA-II residues have also previously been identified as important for communication [50]. Additionally, residues Tyr113, His118, Glu128 and His129 are associated with high average BC. Comparison of these residues with Table S2 demonstrates that His118 (His96 in CA-II) and Glu128 (Glu106 in CA-II) are of catalytic importance to CA-II. As CA-VIII is non-catalytic, this could highlight acatalytic adaptations of these residues that could assist with the function and/or stability of CA-VIII. Tyr113 has also been identified as a potential binding site residue (Figure 1).
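Betweenness centrality counts, for each residue, the shortest paths between other residue pairs that pass through it, and the average BC is that quantity averaged over MD frames. The following Python sketch uses Brandes' algorithm on unweighted, undirected toy graphs. It illustrates the metric rather than reproducing the DRN implementation used here: the absence of normalisation and the two hypothetical "frames" are assumptions.

```python
from collections import deque


def betweenness_centrality(adjacency):
    """Unnormalised betweenness centrality for an undirected, unweighted graph."""
    bc = dict.fromkeys(adjacency, 0.0)
    for s in adjacency:
        # Breadth-first search from s, counting shortest paths (sigma).
        stack = []
        preds = {v: [] for v in adjacency}
        sigma = dict.fromkeys(adjacency, 0)
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adjacency[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate pair dependencies from the farthest nodes back to s.
        delta = dict.fromkeys(adjacency, 0.0)
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each undirected pair was counted from both endpoints.
    return {v: c / 2 for v, c in bc.items()}


def average_bc(frames):
    """Average per-residue BC over a list of per-frame contact graphs."""
    per_frame = [betweenness_centrality(adj) for adj in frames]
    return {r: sum(f[r] for f in per_frame) / len(per_frame) for r in per_frame[0]}


# Hypothetical frames: a 5-residue chain and the same chain closed into a ring.
chain = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
ring = {1: [2, 5], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4, 1]}

bc_chain = betweenness_centrality(chain)
avg = average_bc([chain, ring])
```

In the chain, the central residue carries the most shortest paths and therefore has the highest BC; a ΔBC between WT and variant would then be the per-residue difference of two such averages.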
Data in Figure 5 show an interesting finding whereby Asn273 in S100L is associated with the highest average BC of all residues in all the proteins. In addition, Trp29 and Gly30 in E109D show high average BC compared to the WT and other variants. Changes to the usage of these residues could indicate a compensatory measure to maintain structural stability through Trp29 and binding site integrity through Gly30.
Figure S4B presents the average ΔBC between the WT and variant proteins (WT − variant). An average ΔBC of 0 indicates that there is no change in communication of the residue within the variant protein. Positive and negative average ΔBC values indicate a decrease and an increase, respectively, in the communication of variant protein residues. Results in Figure S4B show that, unlike the results observed with ΔL, numerous protein residues have ΔBC values greater or less than 0, suggesting that SNV presence has some effect on residue communication.
Table 3 lists the residues from Figure S4B showing changes in average ΔBC beyond two standard deviations (in either direction). The data show no accessibility or communication changes at the SNV positions themselves, apart from G162R. This suggests that substitutions at positions 100 (S100A, S100P and S100L), 109 (E109D) and 237 (R237Q) are not associated with direct changes to residue communication, indicating allosteric effects on the structure and function of CA-VIII. It is observed that the variant R237Q has the most residues showing decreases in residue communication, while G162R shows reduced communication for the largest number of binding site residues.
Analysis of the residues demonstrating ΔBC increases in Table 3 highlights the possible variant mechanisms of action. Results illustrate a reduction in the usage of at least two aromatic cluster residues in all variants, with a reduction in Trp37 communication common to all variants. Reductions in the usage of any of the N-terminal aromatic residues Trp29, Trp37 and Phe41 could also explain the poor stability observed in previous research [16], the development of CAMRQ3, and the stability reductions (negative ΔΔG) observed in Table 1. These amino acids are important to CA-VIII structure (Table S2). This could also explain the lack of correlation in residue movement observed in the variant DCC results (Figure 3). Although the importance of Lys96 to CA-VIII is yet to be determined, it is associated with a reduction in residue usage in all variants with the exception of G162R. In addition, there is also a reduction in the usage of the binding site residues, which could affect CA-VIII interactions with ITPR1 and result in the dysregulation of Ca²⁺ homeostasis. Interestingly, with the exception of S100P and G162R, the variants show an increase in the use of Trp29, which is responsible for stability. The increase in residue usage could signify a compensatory measure within these variants in order to maintain protein stability.
Assessment of the benign variants in Table 3 indicates that S100L and E109D are associated with usage reductions in the fewest stability maintenance residues (green and orange colours) compared to the pathogenic variants. This suggests that the benign variants do not destabilize CA-VIII to as great an extent as the other variants. This is further supported by the negative ΔΔG data in Table 1.