Gamma Carbonic Anhydrases from Hydrothermal Vent Bacteria: Cases of Alternating Active Site Due to a Long Loop with Proton Shuttle Residue

Accelerated CO2 sequestration uses carbonic anhydrases (CAs) as catalysts; thus, there is much research on these enzymes. The γ-CA from Escherichia coli (EcoCA-γ) was the first γ-CA to display an active site that switches between "open" and "closed" states through Zn2+ coordination by the proton-shuttling His residue. Here, we explored this occurrence in γ-CAs from hydrothermal vent bacteria and in the γ-CA from Methanosarcina thermophila (Cam) using molecular dynamics. Ten sequences were analyzed through multiple sequence alignment and motif analysis, along with three others from a previous study. Conservation of residues and motifs was high, and phylogeny indicated a close relationship amongst the sequences. All structures, like EcoCA-γ, had a long loop harboring the proton-shuttling residue. Trimeric structures were modeled and simulated for 100 ns at 423 K, with all the structures displaying thermostability. By monitoring the behavior of the proton-shuttling His residue, a shift between "open" and "closed" active sites was observed in the 10 simulated models. Cam, which has two Glu proton-shuttling residues on long loops (Glu62 and Glu84), also showed an active site switch, mediated by the first Glu proton shuttle, Glu62. This switch was thus concluded to be common amongst γ-CAs and not an isolated occurrence.


Introduction
Carbonic anhydrases (CAs) are fast catalysts for the interconversion of carbon dioxide (CO2) and bicarbonate ions (HCO3−) [1] and, for this reason, have been considered ideal CO2 sequestration agents. CAs were first discovered in the early 1930s, when the first CA was identified in bovine blood. Since then, eight different CA classes have been discovered: the α-, β-, γ-, δ-, η-, ζ-, θ-, and ι-CAs [2][3][4][5][6][7]. They contain a catalytic metal ion in the active site, which is usually zinc (Zn2+) but has also been observed to be iron (Fe2+) or cadmium (Cd2+) in some γ-CAs [8][9][10][11][12]. γ-CAs exhibit a unique trimeric biological assembly, unlike the α- and β-CAs, which are dimeric and tetrameric, respectively. The γ-CAs have three active sites, each located between two monomers, as seen in the structure of the γ-CA from the ubiquitous bacterium Escherichia coli (EcoCA-γ) (Figure 1A). Each monomer is composed of a β-helical prism and a long C-terminal helix [1,8]. As in the α-CAs, the Zn2+ ion is tetrahedrally coordinated by three His residues and a water molecule [8,13,14]. Unlike in the α-CAs, however, two of the His residues come from one chain and the third from the neighboring chain. The Zn-H2O is converted to Zn-OH after a proton is shuttled away by a proton-shuttling residue, normally a His or a Glu in γ-CAs [15,16]. The resulting nucleophile attacks the CO2 molecule in the CO2-binding pocket close to the active site, forming a bicarbonate ion [17].
The γ-CA from the ubiquitous bacterium Escherichia coli (EcoCA-γ), which has two crystal structures with Protein Data Bank (PDB) IDs 3TIO and 3TIS, emerged as the first γ-CA to possess an active site capable of alternating between a "closed" state and an "open" state [18] (Figure 1), an occurrence common in β-CAs [19][20][21][22]. This switch in EcoCA-γ is made possible by a His residue responsible for shuttling a proton from the Zn-bound water molecule during catalysis. The residue is positioned close to the catalytic site on a loop that is 11 residues long.
The His residue occasionally coordinates the Zn2+, obstructing the active site opening, as shown in Figure 1C.
EcoCA-γ has been reported to maintain high catalytic activity, with a kcat of 5.7 × 10⁵ s⁻¹ [23]. In our previous study, we investigated in silico whether some γ-CAs belonging to the family Aquificae from hydrothermal vents switch from the "open" to the "closed" state [24]. Their inability to occupy the closed state was attributed to the short loop on which the proton-shuttling His residue resided. These results motivated the present study, whose main objective was to use MD simulations to monitor γ-CAs with a longer loop, a feature previously suggested to be necessary for this active site switch. The required features are a residue capable of Zn2+ coordination (His, Glu, Cys, or Asp) located on a relatively long loop that is flexible enough for the residue to move in and out of the active site region.
In this study, we mainly focused on γ-CAs from the class Campylobacteria (previously known as Epsilonproteobacteria), which are found in hydrothermal vent regions [25,26], and followed approaches similar to those in [24]. Hydrothermal vents are high-temperature regions inhabited by organisms adapted to these temperatures. Their γ-CAs are therefore expected to be thermostable, a characteristic that is desirable for the harsh conditions of the CO2 sequestration process, which motivated their investigation in this study [2]. The Campylobacteria γ-CAs were selected because, compared with those from the family Aquificae, they have a longer loop carrying the proton-shuttling His residue. The protein sequences were retrieved and aligned along with those from our previous study [24]. Their 3D structures were built using homology modeling, and residues that contribute to interface formation were probed using various web servers. Most of these residues were later identified as high-communication residues using average betweenness centrality (BC) analysis [27] following molecular dynamics (MD) simulations. The multiple sequence alignment (MSA) and motif analysis revealed high residue and motif conservation, respectively, across the sequences. The modeled γ-CAs were then subjected to MD simulations, which were also performed for EcoCA-γ as a control. Additional simulations were performed for the previously characterized γ-CA from Methanosarcina thermophila (Cam) to determine whether the active site would also switch when the proton-shuttling residue is a Glu [28,29]. Simulations were performed at 423 K, considering both the hydrothermal vent origin of the retrieved γ-CAs and the elevated temperatures applied during CO2 sequestration. Interestingly, monitoring of the active site revealed the alternation of states in all the γ-CAs, including the control EcoCA-γ, supporting the hypothesis above.
Further, the stability of most of the γ-CA structures at the high temperature used, particularly EcoCA-γ and the γ-CAs from Lebetimonas natsushimae (γ-LnCA) and Nitratiruptor tergarcus (γ-NtCA), was observed via radius of gyration (Rg), root mean square deviation (RMSD), root mean square fluctuation (RMSF), and dynamic cross correlation (DCC).
This study thus reveals the first group of γ-CAs, other than EcoCA-γ, that exhibit the two active site states in MD simulations owing to a nearby proton-shuttling His residue. It also demonstrates the same phenomenon in the well-studied Cam, whose proton-shuttling residue is a Glu rather than the His found in the other γ-CAs in this study. Our analysis also suggests new potentially viable CO2 sequestration agents.

Results and Discussion
A total of 10 γ-CA sequences from hydrothermal vent bacteria belonging to the class Campylobacteria were retrieved from NCBI (Table 1). Sequence lengths ranged from 174 to 179 residues, and the query sequence from Caminibacter mediatlanticus (γ-CmCA) shared high sequence identities (above 54%) with the retrieved sequences. Query coverage was greater than 96%, and all E-values were significant (below 1 × 10⁻³) (Table 1). Residues are numbered according to γ-CmCA unless explicitly stated otherwise.
Table 1. List of γ-CA sequences retrieved from NCBI and their attributes. The query coverage, sequence identity, and E-value were obtained from NCBI BLAST, and the taxonomic families were obtained from NCBI Taxonomy. The query sequence is shown in bold.
This section is divided into three subsections: (i) sequence analyses, (ii) structural analysis, and (iii) molecular dynamics analyses. Previously investigated hydrothermal vent γ-CA sequences from Persephonella hydrogeniphila (γ-PhCA), Persephonella marina (γ-PmCA), and Thermosulfidibacter takaii (γ-TtkCA) were included in the sequence analyses but not in the other two, as those have already been published [24]. The structure and MD analyses included the crystal structures of Cam and EcoCA-γ, with the latter used as a control in the active site analysis following the simulations.
Related sequences normally have conserved functional residues, regions, and motifs, and these were probed in this section using sequence alignments and motif analysis. The γ-CAs from the hydrothermal vents showed high residue and motif conservation across the sequences (Figure 2A,B, respectively). Of the nine motifs found, seven were completely conserved (Figure 2B).
Their positions in each γ-CA and their E-values are listed in Table S1 and shown on the alignment in Figure 2A, following γ-SlCA numbering because γ-SlCA was the structure used to map the motifs. γ-TtkCA differed notably from the other sequences in lacking Motifs 8 and 9; Motif 9 was also absent in γ-PhCA and γ-PmCA. The Zn2+-coordinating residues His64 (on Motif 4) and His90 and His95 (both on Motif 1) were completely conserved in the MSA, as expected. The same was true for the residues that contribute to the catalytic site, including Arg45 and Asp47 on Motif 3 and Gln60 and Asp61 on Motif 4 [1,10,16]. The hydrophobic CO2 pocket previously profiled [24] was also observed as a well-conserved region in Motif 1 across all sequences. The most relevant observation for this study was the presence, in the Campylobacteria sequences, of His67 on an elongated loop that was either 14 or 15 residues long (black box in Figure 2A). This contrasts with the members of Aquificae (γ-PhCA, γ-PmCA, and γ-TtkCA), each of which had a shorter loop of seven residues. The similarity to EcoCA-γ (Figure S1), which is known to shift between "open" and "closed" active sites because of the His on a long loop, suggests that this phenomenon might also occur in the Campylobacteria γ-CAs. His67 was found on Motif 4, which forms part of the long loop. The end of this loop was found on Motif 9, which was absent in γ-PhCA, γ-PmCA, and γ-TtkCA because of their shorter loop. Overall, the MSA and motif analysis showed high levels of conservation across the hydrothermal vent γ-CA sequences.

Evolutionary Relationships amongst the γ-CAs Are Investigated Using Phylogeny
The phylogenetic tree in Figure 3 was constructed to infer the evolutionary relationships amongst the sequences. Of the nine trees calculated, the one that best matched its consensus tree was generated using the Le and Gascuel (LG) protein model [30] with a discrete gamma distribution (LGG) and 100% gap deletion. Given the high residue conservation observed in the alignment, a close relationship was deduced amongst these γ-CAs. This was confirmed by the pairwise sequence identities calculated from the MSA used for the phylogenetic tree calculations; the lowest identity observed was 52%, which is considerably high. Two main branches were observed, separating the γ-CAs of the class Campylobacteria from those of Aquificae. γ-TtkCA appeared as an outgroup, as expected from its relatively lower sequence identities to all the other γ-CAs. γ-CmCA, γ-LnCA, and γ-NpCA shared sequence identities above 88% and formed a subgroup in the tree, as anticipated given that all three come from the same family, Nautiliaceae. γ-EpCA was 91.3% similar to both γ-SlCA and γ-SNbcCA, and the bootstrap value indicated that this branch was recovered in 940 of 1000 replicates. From this, we inferred that the unclassified Epsilonproteobacteria bacterium 4484_65 belongs to the family Sulfurovaceae and possibly the genus Sulfurovum.
In a second tree constructed with 17 other sequences from organisms found in other environments (Figure S2), most sequences from hydrothermal vents clustered together, reflecting the close relationship amongst them. However, γ-TtkCA, despite its high sequence identities to the hydrothermal vent sequences, clustered with γ-CAs from bacteria of the genus Clostridium, which are mesophilic. The crystallized γ-CA from Pyrococcus horikoshii (Cap; PDB ID: 1V3W) [15], also from a hydrothermal vent, clustered with that from Brucella abortus, which has also been crystallized (PDB ID: 4N27) [31]; the two share a considerably high sequence identity of 72%. This tree revealed a distant evolutionary relationship between the hydrothermal vent γ-CAs and Cam, consistent with their low sequence identities.
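The pairwise identities quoted above can be illustrated with a minimal Python sketch. It assumes one common convention, counting identical residues over alignment columns in which neither sequence has a gap; the software used for the reported values may treat gaps differently. The function names and the toy sequences are hypothetical and serve only to show the calculation.

```python
from itertools import combinations


def percent_identity(a: str, b: str) -> float:
    """Percent identity between two aligned sequences of equal length.

    Assumed convention: columns where either sequence has a gap ('-')
    are skipped, and the remaining columns form the denominator.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = [(x, y) for x, y in zip(a.upper(), b.upper()) if x != "-" and y != "-"]
    if not compared:
        return 0.0
    identical = sum(1 for x, y in compared if x == y)
    return 100.0 * identical / len(compared)


def identity_matrix(msa: dict) -> dict:
    """Pairwise identities for a {name: aligned_sequence} mapping."""
    return {(p, q): percent_identity(msa[p], msa[q]) for p, q in combinations(msa, 2)}


# Toy alignment (hypothetical sequences, not real γ-CA fragments).
toy_msa = {"seqA": "MKH-LVE", "seqB": "MKHQLIE", "seqC": "MRH-LVD"}
identities = identity_matrix(toy_msa)
lowest_pair = min(identities, key=identities.get)
print(lowest_pair, round(identities[lowest_pair], 1))  # prints ('seqB', 'seqC') 50.0
```

Reporting the minimum over all pairs, as in the last lines, corresponds to the "lowest percentage identity" reported for the MSA.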

Interface and Hotspot Residues Are Probed in the Trimeric Structures
All modeled structures had good z-DOPE scores (below −1.32) and passed the 80% threshold in Verify3D [32], as shown in the validation results in Table S2. These results, along with those from ProSA [33] and PROCHECK [34], indicated native-like structures. The γ-CAs were modeled as trimers, whose monomers fold into a prism of short β-sheets with loops between them and a long C-terminal helix spanning the length of the prism. Structures were visualized in PyMOL [35] and superimposed on EcoCA-γ, confirming the elongated loop similarity observed in the sequence alignment (Figure S3). Cam's structure was also superimposed on these structures as well as on γ-PhCA and γ-PmCA (Figure S3); the two main differences observed were the longer loop containing the proton-shuttling residue (Glu84, Cam numbering) and a long N-terminal loop that was absent from all the other structures. An additional α-helix, also absent from the other structures, was seen just before the start of the long C-terminal helix. For all subsequent experiments, the crystal structures of Cam (PDB ID: 1QRG) and EcoCA-γ (PDB ID: 3TIO) were used.
The γ-CA structure is folded such that two of the three sides of each monomeric prism carry residues involved in forming the trimer interface. This gives rise to a considerable interface area: PDBePISA [36] showed that between 18% and 26% of the total surface area lies in the interface (Table S3). The residues that form interactions in the interface were identified using various programs and are listed in Table 2, along with hotspot residues. Hotspot residues contribute disproportionately to interface stability, so that their mutation to Ala destabilizes the interface [37]. Hotspot residues with known functions include the previously mentioned Arg43, His90, and Met107, as well as Tyr174 [1,10,16,24]. The proton shuttle His67 and the other two Zn2+-coordinating His residues (His64 and His95) were also interface residues in all the γ-CAs. Glu62 and Glu84 (Cam numbering), the proton-shuttling residues in Cam, were also identified as interface residues. Met62, which lies close to the first Zn2+-coordinating His, was a hotspot residue in most of the γ-CAs. In γ-CmCA and γ-NpCA, the Protein Interaction Calculator (PIC) web server [38] identified inter-subunit hydrophobic interactions between this residue and the hotspot residue Met60 in the neighboring chain. In the remaining Campylobacteria γ-CAs, Met62 formed hydrophobic interactions with the Leu residue at the position of Met60, and this Leu was also identified as a hotspot residue in these trimers. EcoCA-γ was an exception, with Gly at this position; its Met62 equivalent, Met65, was therefore only observed forming intra-subunit hydrophobic interactions with Met94. In Cam's structure, the interface residue Phe8 was observed for each chain in all three interfaces.
Further query of the interface interactions revealed that each Phe8 forms hydrophobic interactions with both neighboring Phe8 residues, i.e., Phe8(A)-Phe8(B), Phe8(A)-Phe8(C), and Phe8(B)-Phe8(C), where the letter in parentheses indicates the chain. Interface residues identified in this section were further probed using average BC analysis following the MD simulations.
Table 2. Residues identified by three out of five programs as participating in interface formation. Hotspot residues are in bold.

Residue Fluctuation and Functionality of High Communication Residues Is Investigated
This section aimed to analyze residue fluctuation patterns during the simulations and to use BC to identify residues important in protein communication. Because BC and RMSF are inversely related [39], the two metrics were plotted together as heatmaps for all the γ-CAs in Figure 4. Overall, the trimeric structures were highly rigid, as expected given the abundance of β-sheets in the prisms. Highly flexible regions (darker regions) were observed mostly in the termini of γ-HtCA, γ-NpCA, and γ-NsbCA. They were also observed in the N-termini of Cam and γ-EpCA, and in both termini of all chains of EcoCA-γ, γ-SlCA, and γ-SNbcCA. The loop containing the proton-shuttling His residue (darker regions between residues 50 and 100) had distinctly high RMSF values in all the γ-CAs. All these flexible regions predictably coincided with regions of low average BC. Residues with a high average BC (darker regions in the average BC heatmaps) are central in the residue network and carry a considerable amount of communication. Highly fluctuating residues cannot act as a continuous central point in the network and therefore normally have low average BC values. However, flexibility is a prerequisite for the operation of some functional residues, such as the proton-shuttling His in CAs. The top 5% of residues by average BC in each protein were plotted as heatmaps (Figure 5). As expected, His67 was absent from this list in every protein. Several functionally important residues were, however, identified. Most of these had been identified in Section 2.2.1 as interface residues (highlighted in green), and a considerable number were hotspot residues (highlighted in red).
One or more Zn2+-coordinating histidines (yellow dots, Figure 5) were high-communication residues in all the γ-CAs, with the highest number observed in EcoCA-γ. The CO2-binding pocket residue Met107, a hotspot residue in some γ-CAs, also had high average BC values. Overall, most high-BC residues could be assigned known functions, and they corresponded to residues with low fluctuations.
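The average BC analysis can be sketched as follows under stated assumptions. Each frame is turned into a residue network in which two residues are connected when their representative atoms (for example, Cβ, or Cα for Gly) lie within a distance cutoff; 6.7 Å is used here as an assumed value, not necessarily the setting of the tool cited [27]. Betweenness is computed with Brandes' algorithm for unweighted graphs, averaged over frames, and the top 5% of residues are selected. The function names are hypothetical, and reading coordinates from a trajectory file is left out.

```python
from collections import deque

import numpy as np


def contact_network(coords: np.ndarray, cutoff: float = 6.7) -> list:
    """Adjacency lists for one frame; coords has shape (n_residues, 3) in Å.

    Residues are linked when their representative atoms are within `cutoff`
    (6.7 Å is an assumed value).
    """
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    adjacent = (dist <= cutoff) & ~np.eye(len(coords), dtype=bool)
    return [np.flatnonzero(row).tolist() for row in adjacent]


def betweenness(neighbors: list) -> np.ndarray:
    """Brandes' algorithm for an unweighted, undirected graph (unnormalized)."""
    n = len(neighbors)
    bc = np.zeros(n)
    for s in range(n):
        order, preds = [], [[] for _ in range(n)]
        sigma = np.zeros(n)
        sigma[s] = 1.0
        depth = np.full(n, -1)
        depth[s] = 0
        queue = deque([s])
        while queue:  # breadth-first search counting shortest paths from s
            v = queue.popleft()
            order.append(v)
            for w in neighbors[v]:
                if depth[w] < 0:
                    depth[w] = depth[v] + 1
                    queue.append(w)
                if depth[w] == depth[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = np.zeros(n)
        for w in reversed(order):  # accumulate dependencies, farthest first
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return bc / 2.0  # each unordered pair was counted from both ends


def average_bc(trajectory: np.ndarray, cutoff: float = 6.7) -> np.ndarray:
    """Mean BC over frames; trajectory has shape (n_frames, n_residues, 3)."""
    return np.mean([betweenness(contact_network(f, cutoff)) for f in trajectory], axis=0)


def top_fraction(values: np.ndarray, fraction: float = 0.05) -> np.ndarray:
    """Indices of the highest-scoring residues (top 5% by default)."""
    k = max(1, int(round(fraction * len(values))))
    return np.argsort(values)[::-1][:k]


# Toy example: a straight chain of five residues 3.8 Å apart (two identical frames).
chain = np.array([[3.8 * i, 0.0, 0.0] for i in range(5)])
avg = average_bc(np.stack([chain, chain]))
print(avg, top_fraction(avg, fraction=0.2))
```

In the toy chain, the central residue lies on the most shortest paths and has the highest BC, while the terminal residues have none. This mirrors why flexible, poorly connected termini and loops tend to score low.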

Conformational Changes Are Assessed Using RMSD and Rg Analysis

To monitor the conformational changes that occur in the γ-CAs during the MD simulations, radius of gyration (Rg) and root mean square deviation (RMSD) analyses were performed, and the results are represented using three types of plots in Figures 6 and 7. Line graphs tracked the evolution of these metrics to determine when the structures reached equilibrium and to detect any distinct changes in Rg and RMSD. Kernel density estimate (KDE) and violin plots show the distribution of Rg and RMSD values sampled across the simulations. All γ-CAs from hydrothermal vents showed a stable RMSD that remained equilibrated for most of the simulation. This was anticipated at this temperature given their origin in high-temperature hydrothermal vent environments. Interestingly, the RMSD of Cam increased markedly at 75 ns and then equilibrated up to 100 ns; Cam is moderately thermostable, showing stability up to 55 °C [28]. Although E. coli, unlike these organisms, is a mesophilic bacterium, EcoCA-γ exhibited the most stable RMSD of all the structures. Compactness was well maintained for the γ-CAs throughout the simulations. The observations for both Rg and RMSD were expected given the rigidity of the trimeric structures.
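For reference, a minimal NumPy sketch of the per-trajectory metrics used in these analyses (RMSD, Rg, and per-atom RMSF) follows. It assumes a coordinate array of shape (n_frames, n_atoms, 3) in Å, superposes each frame on a reference structure with the Kabsch algorithm before computing RMSD and RMSF, and computes an unweighted Rg unless masses are supplied. The function names are illustrative; MD packages may differ in atom selection, mass weighting, and the structure used for fitting.

```python
import numpy as np


def radius_of_gyration(coords: np.ndarray, masses=None) -> float:
    """Rg of one frame (n_atoms, 3); unweighted when masses is None."""
    w = np.ones(len(coords)) if masses is None else np.asarray(masses, dtype=float)
    center = np.average(coords, axis=0, weights=w)
    return float(np.sqrt(np.average(np.sum((coords - center) ** 2, axis=1), weights=w)))


def kabsch_superpose(mobile: np.ndarray, reference: np.ndarray) -> np.ndarray:
    """Return `mobile` optimally rotated and translated onto `reference`."""
    mob_c = mobile - mobile.mean(axis=0)
    ref_c = reference - reference.mean(axis=0)
    u, _, vt = np.linalg.svd(mob_c.T @ ref_c)
    d = np.sign(np.linalg.det(vt.T @ u.T))  # guard against an improper rotation
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    return mob_c @ rotation.T + reference.mean(axis=0)


def rmsd(a: np.ndarray, b: np.ndarray) -> float:
    """Root mean square deviation between two (n_atoms, 3) coordinate sets."""
    return float(np.sqrt(np.mean(np.sum((a - b) ** 2, axis=1))))


def trajectory_metrics(traj: np.ndarray, reference: np.ndarray):
    """Per-frame RMSD and Rg, plus per-atom RMSF about the mean aligned position."""
    aligned = np.array([kabsch_superpose(frame, reference) for frame in traj])
    rmsd_t = np.array([rmsd(frame, reference) for frame in aligned])
    rg_t = np.array([radius_of_gyration(frame) for frame in traj])
    mean_pos = aligned.mean(axis=0)
    rmsf = np.sqrt(np.mean(np.sum((aligned - mean_pos) ** 2, axis=2), axis=0))
    return rmsd_t, rg_t, rmsf


# Toy check: a rigidly rotated and translated copy of a reference gives RMSD close to 0.
rng = np.random.default_rng(0)
reference = rng.normal(size=(20, 3))
theta = np.pi / 3
rot_z = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                  [np.sin(theta), np.cos(theta), 0.0],
                  [0.0, 0.0, 1.0]])
traj = np.stack([reference, reference @ rot_z.T + np.array([5.0, -2.0, 1.0])])
rmsd_t, rg_t, rmsf = trajectory_metrics(traj, reference)
```

Rg is computed on the raw frames because it does not depend on rotation or translation, whereas RMSD and RMSF require superposition so that overall tumbling of the trimer is not counted as deviation.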

A Conformational Switch in Proton Shuttling His Close to the Active Site Is Observed

The proton-shuttling residue in CAs has long been presumed to be mobile, rotating towards and away from the active site during proton shuttling. However, its coordination of the metal ion, and the resulting occlusion of the active site in γ-CAs, were not considered until EcoCA-γ was crystallized. This switch was investigated in this section for all the simulated γ-CAs. Figure 8 shows the distances between the His residue and Zn2+ throughout the simulations. Interestingly, in the simulated Campylobacteria γ-CAs, the proton-shuttling His came within coordination distance (2.2 Å) of the metal ion in one or more active sites in a number of frames [40]; Zn2+ coordination distances in crystal structures are approximately 2.1 ± 0.3 Å [40]. EcoCA-γ, which was used as a control for these observations, also showed Zn2+ coordination by its His69, as seen in its crystal structures. For Cam, one of the Glu residues responsible for proton shuttling, Glu62, was likewise within coordination distance of the Zn2+ during the simulation. This agrees with previous studies indicating that Glu62 relays protons to Glu84 during catalysis [8]. Cam's Glu84, which occupies the position equivalent to His67 in the Campylobacteria γ-CAs and in EcoCA-γ, did not coordinate the Zn2+ during the simulation; instead, it remained at least 4 Å away from the metal ion (Figure S4), as indicated previously [8]. Overall, these results reinforce the findings from the crystal structures of EcoCA-γ. They also support the hypothesis that the presence of a proton-shuttling residue on a long loop near the catalytic site of γ-CAs indicates an ability to switch between "open" and "closed" states.
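The distance monitoring behind Figure 8 can be summarized with a short sketch. It assumes per-frame coordinates of the Zn2+ ion and of the candidate coordinating side-chain atoms (for example, ND1 and NE2 for His, or OE1 and OE2 for Glu), takes the minimum Zn2+-ligand distance per frame, and labels a frame "closed" when that distance is within the 2.2 Å coordination cutoff used above. The atom choice, the inclusive cutoff, the switch count, and the function names are illustrative assumptions, not a description of the analysis scripts used in this study.

```python
import numpy as np


def min_ligand_distance(zn_xyz: np.ndarray, ligand_xyz: np.ndarray) -> np.ndarray:
    """Per-frame minimum Zn2+-ligand distance (Å).

    zn_xyz: (n_frames, 3) ion positions; ligand_xyz: (n_frames, n_atoms, 3)
    positions of candidate coordinating atoms (assumed, e.g. His ND1/NE2).
    """
    dist = np.linalg.norm(ligand_xyz - zn_xyz[:, None, :], axis=-1)
    return dist.min(axis=1)


def closed_state_summary(distances: np.ndarray, cutoff: float = 2.2):
    """Label frames 'closed' (ligand within cutoff); return labels, fraction, switch count."""
    closed = distances <= cutoff
    fraction_closed = float(closed.mean())
    # A switch is any change of state between consecutive frames.
    switches = int(np.count_nonzero(closed[1:] != closed[:-1]))
    return closed, fraction_closed, switches


# Toy trajectory of four frames with the ion fixed at the origin.
zn = np.zeros((4, 3))
ligand = np.array([[[4.0, 0, 0], [6.0, 0, 0]],
                   [[2.1, 0, 0], [3.0, 0, 0]],
                   [[0, 2.0, 0], [0, 5.0, 0]],
                   [[5.0, 0, 0], [0, 0, 7.0]]])
closed, fraction_closed, switches = closed_state_summary(min_ligand_distance(zn, ligand))
print(closed, fraction_closed, switches)
```

With real trajectories, a minimum dwell time may be worth adding so that brief fluctuations around the cutoff are not counted as separate open/closed switches.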

High Residue Correlations Are Perceived Using Dynamic Cross Correlation Analysis

High Residue Correlations Are Perceived Using Dynamic Cross Correlation Analysis
Dynamic cross correlation (DCC) analysis determines whether the motions of pairs of residues in a protein are correlated (moving in the same direction) or anti-correlated (moving in opposite directions). Correlated residue motions were investigated using DCC analysis, and the resulting matrices were plotted as heatmaps in Figure 9. Given the trimer rigidity observed in the RMSF analysis (Section 2.3.1) and the large number of residues identified at the interface, highly correlated residue motions were expected across the structures. For all structures, reduced correlations were observed between residues in the loop containing the proton-shuttling His residue and those in the rest of the structure (lighter coloring between residues 50 and 100). This observation is consistent with the low communication through these residues in the average BC analysis and with their high fluctuations. Less concerted movements were also observed in the termini, which showed high RMSF values in Section 2.3.1. For EcoCA-γ, the DCC analysis agreed with the RMSF analysis, showing low residue correlations across the protein. γ-HtCA also showed relatively weaker positive correlated motions than the other proteins, possibly implying lower thermostability, similar to EcoCA-γ. The strongest correlations were observed for Cam, γ-LnCA, and γ-NtCA, suggesting that γ-LnCA and γ-NtCA may be at least as thermostable as Cam. DCC analysis therefore served as an additional means of assessing γ-CA thermostability.
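For reference, the standard normalized form of the dynamic cross-correlation between residues i and j is given below, where r_i(t) is the position of the representative atom of residue i in frame t and angle brackets denote averages over the trajectory. It is assumed here, not confirmed by the text, that the MD-TASK implementation follows this form.

```latex
\Delta\mathbf{r}_i(t) = \mathbf{r}_i(t) - \langle \mathbf{r}_i \rangle,
\qquad
C_{ij} = \frac{\langle \Delta\mathbf{r}_i \cdot \Delta\mathbf{r}_j \rangle}
              {\sqrt{\langle \Delta\mathbf{r}_i^{2} \rangle \, \langle \Delta\mathbf{r}_j^{2} \rangle}}
```

C_ij lies between −1 and +1: values near +1 indicate residues moving in the same direction, values near −1 indicate motion in opposite directions, and values near 0 indicate uncorrelated motion. Trajectories are normally superposed onto a reference structure first so that overall rotation and translation do not dominate the correlations.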

Sequence Retrieval and Alignment
The γ-CA from Caminibacter mediatlanticus was retrieved from NCBI and used as the query, chosen on the basis of the thermostability and efficiency of the α-CA from the same organism, which has been studied previously [7,41]. One iteration of the Position-Specific Iterative Basic Local Alignment Search Tool (PSI-BLAST) at NCBI was used to obtain a list of homologous γ-CAs. The source organisms of the CAs on this list were confirmed from the literature to have been isolated from hydrothermal vents. The NCBI Taxonomy database was then used to check the taxonomic class of each organism, narrowing the list down to γ-CAs from the bacterial class Campylobacteria.
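The two filtering steps above (hydrothermal vent origin, then taxonomic class) can be sketched as a simple record filter. This is a minimal illustration, assuming the PSI-BLAST hits have already been annotated by hand with their NCBI Taxonomy class and a literature-derived vent flag; the `Hit` record, its fields, and the placeholder accessions are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Hit:
    accession: str        # placeholder identifier from the PSI-BLAST hit list
    organism: str
    taxonomic_class: str  # class as recorded in NCBI Taxonomy
    vent_isolate: bool    # True if the literature reports isolation from a hydrothermal vent

def select_vent_hits(hits, target_class="Campylobacteria"):
    """Keep hits from vent-isolated organisms of the target class, dropping repeated accessions."""
    seen = set()
    kept = []
    for hit in hits:
        if hit.accession in seen:
            continue
        if hit.vent_isolate and hit.taxonomic_class == target_class:
            seen.add(hit.accession)
            kept.append(hit)
    return kept
```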

Sequence Alignments and Motif Analysis
The γ-CA sequences of the hydrothermal vent organisms Persephonella marina, Persephonella hydrogeniphila, and Thermosulfidibacter takaii, which were studied previously [24], were included in the sequence alignments and motif analysis. The sequences were aligned with Clustal Omega [53], multiple alignment using fast Fourier transform (MAFFT) FFT-NS-i [54], and tree-based consistency objective function for alignment evaluation (T-Coffee) [55,56], and the alignment that best aligned the conserved residues was chosen. Short conserved patterns recurring across these γ-CA sequences, termed motifs, were identified using MEME [57] and filtered with an E-value threshold of 0.05; motifs with E-values above the threshold were discarded.
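The text does not define how "best alignment of conserved residues" was judged. As one possible proxy, the sketch below counts gap-free, fully conserved columns in each alignment and picks the alignment with the most; this criterion, the function names, and the toy alignments are assumptions for illustration. The motif filter applies the stated E-value threshold of 0.05.

```python
def conserved_columns(alignment):
    """Count columns in which every sequence carries the same residue (gaps excluded)."""
    return sum(1 for col in zip(*alignment) if col[0] != "-" and len(set(col)) == 1)

def best_alignment(alignments):
    """Name of the alignment with the most fully conserved columns (hypothetical criterion)."""
    return max(alignments, key=lambda name: conserved_columns(alignments[name]))

def filter_motifs(motif_evalues, e_threshold=0.05):
    """Discard motifs whose MEME E-value is higher than the threshold."""
    return {motif: e for motif, e in motif_evalues.items() if e <= e_threshold}
```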

Phylogenetic Analysis
To evaluate the evolutionary relationships amongst the retrieved sequences and the Persephonella marina, Persephonella hydrogeniphila, and Thermosulfidibacter takaii γ-CAs, phylogenetic trees were constructed using MEGA v7 [58]. Using the unaligned sequences as input, candidate protein substitution models for tree calculations were evaluated. Tree calculations proceeded with the three models showing the lowest Bayesian Information Criterion (BIC) values, calculated at gap-deletion thresholds of 90%, 95%, and 100%. For the 100% and 95% gap deletions, these were the Whelan and Goldman model with gamma distribution (WAGG), the Le and Gascuel model with gamma distribution (LGG), and the Whelan and Goldman model with gamma distribution and invariant sites (WAGGI), with BIC values of 4040.1, 4043.3, and 4047.7, respectively. For the 90% deletion, BIC values of 4069.0, 4073.8, and 4076.8 were recorded for WAGG, LGGI, and WAGGI, respectively [30,59]. For each protein model, three trees were generated with gap deletions of 90%, 95%, and 100%. Each tree was assessed with 1000 bootstrap replicates. The initial tree was constructed using the neighbor-joining (NJ/BioNJ) method and then refined by maximum likelihood using nearest-neighbor interchange (NNI). Trees were viewed in Dendroscope v3.5.9 [60], and the tree that matched its respective consensus tree was chosen as the best.
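Model selection by BIC trades goodness of fit against model complexity: BIC = k ln(n) − 2 ln(L), where k is the number of free parameters, n the sample size (alignment sites), and L the maximized likelihood, and lower values are preferred. MEGA computes these values internally; the sketch below only illustrates the definition and the ranking step, using the BIC values reported above for the 100% and 95% gap deletions. Function names are hypothetical.

```python
import math

def bic(log_likelihood, n_params, n_sites):
    """Bayesian Information Criterion; lower values indicate a better fit/complexity trade-off."""
    return n_params * math.log(n_sites) - 2.0 * log_likelihood

def rank_models(bic_values, top=3):
    """Model names ordered from lowest to highest BIC, truncated to `top` entries."""
    return sorted(bic_values, key=bic_values.get)[:top]

# BIC values reported in the text for the 100% and 95% gap deletions.
reported = {"WAGG": 4040.1, "LGG": 4043.3, "WAGGI": 4047.7}
print(rank_models(reported))  # ['WAGG', 'LGG', 'WAGGI']
```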

Homology Modeling, Interface, and Hotspot Residue Identification
Templates for homology modeling were identified using PRIMO [65] for all retrieved sequences lacking experimental structures. All sequences were at least 40% identical to the template, the γ-CA from Pyrococcus horikoshii (PDB ID: 1V3W, Cap) [15], which covered over 97% of the sequences. The γ-CAs were modeled in their trimeric biological assemblies using MODELLER v9.20 [66], and the resulting structures were validated with several programs. Models were first scored with the normalized Discrete Optimized Protein Energy (z-DOPE) score; models with scores below −0.5 were considered native-like, and, for each sequence, the five models with the lowest z-DOPE scores were further validated with the PROCHECK [34], ProSA [33], and Verify3D [32] web servers. For Cam and EcoCA-γ, the crystal structures were retrieved from the PDB and used for subsequent calculations.
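The z-DOPE screening step (threshold of −0.5, then the five lowest-scoring models per sequence) can be written as a small selection function. This is a minimal sketch, assuming the z-DOPE scores have already been obtained from MODELLER and stored in a dictionary; the model names and scores in the usage test are hypothetical.

```python
def select_models(zdope_scores, threshold=-0.5, top_n=5):
    """Return up to top_n model names scoring below threshold, most negative (most native-like) first."""
    passing = [(score, name) for name, score in zdope_scores.items() if score < threshold]
    return [name for score, name in sorted(passing)[:top_n]]
```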
The interfaces of the proteins were probed for residues that contribute to trimer stability using five web servers. Each γ-CA was individually queried in the HotRegion [67]; Knowledge-based FADE and Contacts (KFC) [68]; PPCheck [69]; Proteins, Interfaces, Structures and Assemblies (PDBePISA) [36]; and Robetta [37] web servers. A residue was assigned to the interface only if it was identified by at least three of the five servers. HotRegion, KFC, PPCheck, and Robetta were further queried for hotspot residues, and a residue was classed as a hotspot if it was predicted by at least three of these four servers.
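The consensus rule above (at least three of five servers for interface residues, at least three of four for hotspots) is a simple vote count. The sketch below assumes each server's output has been parsed into a set of (chain, residue number) pairs; the residue sets are placeholders, not results from the study.

```python
from collections import Counter

def consensus(predictions, min_votes):
    """Residues reported by at least min_votes servers.

    predictions maps a server name to an iterable of (chain, residue_number) pairs;
    duplicates within one server's output are counted once.
    """
    votes = Counter(res for residues in predictions.values() for res in set(residues))
    return sorted(res for res, n in votes.items() if n >= min_votes)
```

For hotspots, the same function is applied to the four hotspot-predicting servers only, for example by dropping the PDBePISA entry before calling it.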

Molecular Dynamics Simulations
Protein preparation for MD simulations involved protonation of the structures at pH 8 using the H++ web server [70] and application of previously generated and validated Zn2+ parameters to keep the ion in the active site [71]. An alkaline pH is required for metal carbonate precipitation during CO2 sequestration, which justifies the pH used in this study. Topology and coordinate files were converted from AMBER- [72] to GROMACS- [73] compatible formats with tleap [74] from AmberTools20 [72]. The systems were solvated in a cubic TIP3P water box with a 10 Å boundary, and the AMBER ff14SB force field [75] was applied. Energy minimization was performed using the steepest descent method until the maximum force fell below 1000 kJ mol−1 nm−1. Canonical (NVT) and isothermal/isobaric (NPT) equilibration were then performed sequentially at 423 K, each for 100 ps with a 2 fs time step. Production simulations were run on CHPC clusters using GROMACS v2016.1 [73] for 100 ns at 423 K for each protein, consuming a total of 27,565 CPU hours. GROMACS was also used to calculate Rg, RMSD, and RMSF for all trajectories. Visual Molecular Dynamics (VMD) [76] was used to monitor the distances between Zn2+ and the proton-shuttling His residue (Glu residues in Cam) in all active sites for all trajectories, and Gnuplot v5.2 [77] was used to plot the distances as a function of time.
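The run lengths above translate directly into integration step counts: with a 2 fs time step, each 100 ps equilibration phase is 50,000 steps and each 100 ns production run is 50,000,000 steps. The sketch below performs this arithmetic and emits a hypothetical minimal set of GROMACS .mdp lines (integrator, dt, nsteps, ref_t); thermostat, barostat, cut-off, and constraint settings are not given in the text and are deliberately omitted, so this is not the study's actual input file.

```python
FS_PER_PS = 1_000
FS_PER_NS = 1_000_000

def n_steps(length_fs, dt_fs=2):
    """Number of integration steps for a run of length_fs femtoseconds with time step dt_fs."""
    if length_fs % dt_fs:
        raise ValueError("run length must be a whole number of time steps")
    return length_fs // dt_fs

def mdp_fragment(length_fs, temperature_k=423, dt_fs=2):
    """Hypothetical minimal .mdp lines; all other required settings are omitted."""
    return "\n".join([
        "integrator = md",
        f"dt         = {dt_fs / 1000:.3f}",  # GROMACS expects dt in ps
        f"nsteps     = {n_steps(length_fs, dt_fs)}",
        f"ref_t      = {temperature_k}",     # one value per temperature-coupling group
    ])
```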

Average Betweenness Centrality Analysis
In a dynamic residue network (DRN), residues are represented as nodes, and edges connect nodes that lie within a given distance of each other. For each protein, DRNs were generated in MD-TASK [27] from frames sampled every 10 ps over the last 15 ns of the 100 ns trajectories. The Cβ atoms (Cα for Gly) were used as nodes, and two nodes were connected by an edge if they were within 6.7 Å of each other. The shortest path between every residue pair was calculated using the calc_network.py script in MD-TASK and, in each frame, the number of shortest paths passing through each node was summed to give its betweenness centrality (BC). The higher the BC, the more shortest paths pass through the node and thus the more central it is in the DRN. BC values for each node were averaged across frames using the avg_network.py script to give the average BC, which was normalized between 0 and 1 for each γ-CA.
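The network construction and BC averaging described above can be sketched from first principles. This is an illustrative implementation, not MD-TASK's code: it assumes one representative atom per residue per frame, builds a contact graph with the 6.7 Å cutoff, and computes BC with Brandes' algorithm, which weights each shortest path between a pair by 1/(number of such paths); this equals the raw path count described above whenever shortest paths are unique. The min-max normalization to [0, 1] is also an assumed choice.

```python
from collections import deque

import numpy as np

def contact_graph(coords, cutoff=6.7):
    """Adjacency lists linking residues whose representative atoms lie within cutoff (Å)."""
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    n = len(coords)
    return [[j for j in range(n) if j != i and dist[i, j] <= cutoff] for i in range(n)]

def betweenness(adj):
    """Betweenness centrality of an unweighted, undirected graph (Brandes' algorithm)."""
    n = len(adj)
    bc = np.zeros(n)
    for s in range(n):
        order = []                       # nodes in order of discovery by BFS
        preds = [[] for _ in range(n)]   # predecessors on shortest paths from s
        sigma = np.zeros(n)              # number of shortest paths from s
        sigma[s] = 1.0
        depth = [-1] * n
        depth[s] = 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if depth[w] < 0:
                    depth[w] = depth[v] + 1
                    queue.append(w)
                if depth[w] == depth[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = np.zeros(n)              # dependency of source s on each node
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return bc / 2.0  # each unordered pair was counted from both endpoints

def average_bc(frames, cutoff=6.7):
    """Average BC over frames, then min-max normalize to the range [0, 1]."""
    avg = np.mean([betweenness(contact_graph(f, cutoff)) for f in frames], axis=0)
    span = avg.max() - avg.min()
    return (avg - avg.min()) / span if span > 0 else np.zeros_like(avg)
```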

Dynamic Cross Correlation Analysis
Correlated residue motions in the γ-CAs were calculated with the calc_correlation.py script in MD-TASK [27]. Calculations used the Cβ atoms (Cα for Gly) of each residue over the 100 ns trajectories, generating pairwise correlation matrices, which were plotted as heatmaps using Python scripts.
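The pairwise correlation matrix can be computed in a few lines of NumPy. This sketch is not the calc_correlation.py implementation; it assumes a trajectory array of shape (n_frames, n_residues, 3) holding one representative atom per residue, already superposed onto a common reference, and computes the normalized covariance of the atomic displacements about their mean positions. The function name is hypothetical.

```python
import numpy as np

def dcc_matrix(traj):
    """Dynamic cross-correlation matrix from an (n_frames, n_residues, 3) coordinate array.

    Residues with zero fluctuation would cause division by zero; this sketch does not handle them.
    """
    disp = traj - traj.mean(axis=0)                          # displacement from mean position
    cov = np.einsum("fik,fjk->ij", disp, disp) / len(traj)   # mean dot product of displacements
    norm = np.sqrt(np.diag(cov))                             # RMS fluctuation of each residue
    return cov / np.outer(norm, norm)
```

Plotting the returned matrix as a heatmap (for example with matplotlib's `imshow` and a diverging colormap centered on zero) reproduces the kind of figure described above.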

Conclusions
The intriguing discovery of an alternative active-site conformation in the γ-CA from E. coli motivated the present study, whose main aim was to detect a similar occurrence in γ-CAs from hydrothermal vent systems. The investigation was built on the postulate that two features must be present in a γ-CA: (i) a proton-shuttling residue close to the active site and (ii) a long, flexible loop on which the proton-shuttling residue resides. As in the previously investigated γ-PhCA, γ-PmCA, and γ-TtkCA, a proton-shuttling residue was found in the Campylobacteria γ-CAs but, unlike those three γ-CAs, the retrieved γ-CAs had a longer loop of 14 to 15 residues. The MSA showed high conservation across the sequences, which were closely related, as seen in the evolutionary tree and the sequence identity heatmap. Cam, a well-characterized γ-CA with a solved crystal structure, also had both features. Following structure calculations, the retrieved γ-CAs, Cam, and EcoCA-γ were simulated at 423 K and, as anticipated, a transition from "open" to "closed" was observed in one or more chains in all the proteins. Zn2+ coordination by Cam's first proton-shuttling residue, Glu62, was unexpected, given that Glu84 is the proton-shuttling residue structurally aligned with EcoCA-γ's His70. The behavior of the γ-CAs at the elevated temperature of 423 K was assessed post-simulation using Rg, RMSD, RMSF, and DCC analyses. EcoCA-γ displayed remarkable potential thermostability in all the analyses, closely followed by γ-NtCA and γ-LnCA. The remaining γ-CAs were relatively stable, although γ-HtCA and γ-NpCA showed lower thermostability, particularly through reduced residue correlations in the DCC analysis and slightly higher fluctuations in the RMSF analysis, respectively. Motif analysis revealed nine highly conserved motifs, eight of which harbored hotspot and interface residues.
The importance of these residues was further substantiated by average BC analysis of the trajectories, in which most of the top 5% of residues were found at the interface. Zn2+-coordinating residues were also identified as high-communication residues. Overall, this work contributes to the knowledge of CAs by identifying more γ-CAs that exhibit an alternating active site like that of EcoCA-γ. In vitro testing of the thermostability of EcoCA-γ and the simulated hydrothermal vent γ-CAs as prospective CO2 sequestration agents is a crucial next step.