Exploring the Dynamics of Holo-Shikimate Kinase through Molecular Mechanics

: Understanding the connection between local and global dynamics can provide valuable insights into enzymatic function and may contribute to the development of novel strategies for enzyme modulation. In this work, we investigated the dynamics at both the global and local (active site) levels of Shikimate Kinase (SK) through microsecond time-scale molecular dynamics (MD) simulations of the holoenzyme in the product state. Our focus was on the wild-type (WT) enzyme and two mutants (R116A and R116K) which are known for their reduced catalytic activity. Through exploring the dynamics of these variants, we gained insights into the role of residue R116 and its contribution to overall SK dynamics. We argue that the connection between local and global dynamics can be a tt ributed to local frustration near the mutated residue which perturbs the global protein dynamics.


Introduction
Shikimate-3-phosphate (S3P) is a vital compound involved in the metabolism of pathogenic microorganisms which cause diseases such as tuberculosis and gastric ulcers [1][2][3][4]. Its production relies on the following chemical reaction, which is driven by the Shikimate Kinase (SK) enzyme:

⎯⎯
(1) structural data and molecular dynamics (MD) simulations in the ns time-scale [1,20,21]. The connectors of these two domains are the three regions of the stiffer CORE domain, namely the CORE I (residues 1-31), CORE II (residues 61-108), and CORE III (residue 124-162) regions. These domains together with the substrates, the Mg 2+ ion, and the flexible tail of the enzyme can be seen in Figure 1. In our previous work where we studied the dynamics of SK in the absence of substrates, we observed that mutations at the level of residue 116 (Arg) perturbed the overall dynamics of the enzyme [19]. Understanding how these perturbations are produced and their effects in catalysis is important not only for SK but for any other enzyme as protein dynamics is intimately linked to function [22]. For instance, as it is mentioned in this reference, it is known that currently one cannot predict enzyme functions based purely on static data of experimentally resolved structures.
Enzyme dynamics, both local (at the active site level) and global, play a critical role in facilitating enzymatic reactions and in determining the overall catalytic performance. Mutational studies provide a powerful tool to investigate the functional significance of specific residues within an enzyme. In this study, we focus on the Arg116 residue in SK due to its reported impact on catalytic efficiency upon mutation to either Ala or Lys [23]. In addition to this, it has been suggested that this residue is involved in expelling products from the active site [1]. Through comparing the dynamical behaviour of the WT enzyme with the Arg116 mutants, we aim to shed light on the underlying molecular mechanisms that contribute to the observed decrease in catalytic performance.
Here, we aim to expand the current understanding of Shikimate Kinase (SK) through conducting microsecond time-scale molecular dynamics (MD) simulations of the holoenzyme in the product state. We focus on three variants of the enzyme: the wild-type (WT) enzyme and two mutants (R116A and R116K) with significantly reduced catalytic activity.
In this work, the connection between local and global dynamics of the SK enzyme, is facilitated by using the concept of frustration [24]. Frustration arises from the observation that certain interactions in complex systems are not readily satisfied. It reflects the inherent challenge of achieving simultaneous optimization of multiple competing factors or constraints. In the context of enzymes' functioning, it has been found that catalytic sites are usually highly frustrated [25]. In addition to this, frustration of the substrate-product interactions plays a crucial role in modulating the overall behaviour of the system and influencing functional outcomes such as the facilitation of the rate-determining step [14,16].

Simulation Setup
The crystallographic structure of Helicobacter pylori SK was used in our simulations (PDB ID: 3MUF, at 2.3 Å resolution) [23], which was also used in previous studies [17][18][19]26]. An initial set of coordinates for the protein chain, substrates (with phosphoryl groups fully deprotonated at the product state), ions, and water molecules was obtained from the CHARMM-GUI [27] online server. The Mg 2+ ion was added manually. Force field parameters for atoms were obtained from the CHARMM-36 force field [28][29][30]. In the present simulations, the missing tail of the protein was included. All molecular components were assembled with the CHARMM (v. 45b1) built-in tools [31]. As in our previous study for the apoenzyme, we considered three variants of the SK, i.e., the wild type (WT) and the mutants R116A and R116K. The protein chains and substrates were padded with a 20 Å buffer, which included TIP3P waters [32] as well as Na + and Cl − ions (at 150 mM concentration), producing a cubic-shaped box.
An initial minimization procedure consisting of 50,000 steps was performed with the NAMD package (v. 2.14) [33] using harmonic constraints with a force constant of 1 kcal/(mol Å 2 ) applied on the heavy atoms of the protein, the substrates and the Mg 2+ ion. This procedure was followed by an equilibration phase in the NVT ensemble, keeping the harmonic constraints active, with the Langevin method at 303.15 K and using a time step of 2 fs. Then, the constraints were removed and a 600 ns simulation in the NPT ensemble was conducted using the Langevin piston thermostat [34] with a piston period of 50 fs, a piston decay of 25 fs, and a target pressure of 1 bar.
The final set of coordinates together with the NAMD force field parameters and topology files were converted to the equivalent GROMACS input files with the assistance of the ParmEd tool [35]. Similar values for the parameters of the particle mesh Ewald (PME) method [36,37], the van der Waals and LINCS algorithm [38,39] were used as in our previous study [19] with the GROMACS simulation package (v. 2021) [40][41][42][43][44][45][46]. The production runs were carried out in the NPT ensemble during ~4 s for each variant, using a 2 fs time step. The Parrinello-Rahman method [47] regulated the pressure with a coupling constant and compressibility of 5 ps and 4.51 × 10 −5 bar −1 , respectively, at atmospheric pressure. As for the temperature, it was regulated with the Nosé-Hoover thermostat.

Analysis
For the root mean square deviation (RMSD), root mean square fluctuation (RMSF), and the principal component analysis [48] (PCA), C, O, N, and C  atoms were considered. To obtain better statistics, three independent MD trajectories were employed for the calculation of the average RMSD values and standard deviation bars. The RMSD and RMSF values were computed with the VMD [49] built-in tools. The PCA was performed with the built-in tools of GROMACS. Trajectory frames were saved at intervals of 5000 steps. The first five residues of the protein chain, which displayed high fluctuations, were not considered in the analysis.
Regarding the Neural Relational Inference analysis [50,51], the model was trained during 600 epochs, achieving a mean square error (MSE) in the testing set of 0.068, 0.062, and 0.057 for the WT, R116A, and R116K cases, respectively. A default number of time steps per sample (50) was used.
The frustration analysis was performed with the standalone version of the Frus-tratometeR package [52,53] where the configurational model was selected with an electrostatic constant of 4.15. Also, a 5 Å cut-off was used for computations of interactions around the selected residue number 116. Here, frames saved every 0.002 s were employed.

Results and Discussion
The global dynamics of SK was monitored through the RMSD, RMSF, PC, and NRI analysis while the local dynamics at the active site was investigated with the assistance of the frustration analysis. Our findings are summarised in the following Sections 3.1-3.4.

RMSD and RMSF Analysis
The average RMSD curve of the WT variant shows higher stability than that of the mutants R116A and R116K during the first 1 s of the simulation (see Figure 2a). After that, the protein structures became more open, which can be associated with the product release step, but even in this case, the WT structure was more stable than those of the mutants, as revealed by the average RMSD values and standard deviation bars. The distribution of the RMSD values, displayed in Figure 2b, shows a bimodal behaviour for the WT variant. This behaviour, together with the standard deviation bars, tell us that the protein can explore different conformations between both open and closed states while it is bound to the substrates. In contrast to this, the WT apoenzyme exhibited a unimodal distribution skewed towards open states [19]. In the case of R116A, the distribution is also bimodal (see Figure 2c) but it is shifted towards higher values of the RMSD, suggesting a highly perturbed structure. Regarding R116K, the explored RMSD range is similar to that of the WT variant, but the distribution is unimodal; this can be seen in Figure 2d. The behaviour of the distributions in the mutant variants suggests that the dynamics of the WT holoenzyme is modified upon mutations, especially in the R116A variant.
We monitored the dynamics of the different regions for the protein structure of the three variants through the RMSF values in Figure 3. The largest RMSF values for the WT variant, in the present holo form, were observed in similar regions to a previous study for the apo structure, i.e., the LID, SBD, and the 310 feature in the CORE II domains [19], which acts as an energetic counterweight [8].
The R116A mutation destabilises the initial holo structure drastically. Regarding R116K variant, the fluctuations were of the same order of magnitude than those of the WT case. The main differences were in the fluctuations of the SB domain, which were quenched, but those in the LID domain increased for this mutant.

Principal Component Analysis (PCA)
A PCA was performed on the WT enzyme, revealing the presence of two distinct clusters along the PC1 axis in a relatively compact space region, representing the open and closed conformations which are separated by the dashed (orange) line in Figure 4a. In contrast, the mutant R116A displayed a broader range of conformational states, indicating destabilisation compared to the apoenzyme [19] (see Figure 4b). Conversely, the mutant R116K showed a reversed stabilising effect with respect to the apoenzyme structure (Figure 4c). . The R116K structure is stabilised with respect to the apoenzyme structure (c). In (d), the residues' contributions to the first eigenvector are plotted for the three variants. Here, we see that the regions with the largest fluctuations are the LID region (in red), the SB region (in orange), and the region around the 310 helix, which is part of the CORE domain. Figure 4d illustrates the contribution of each residue to the eigenvector with the highest variance. Notably, similar regions, including the SB, 310, and LID regions, made significant contributions in all three variants. However, in the mutants, the LID emerged as the major contributor, while in the WT, it was the region around the 310 helix. This region also encompasses the P-loop, which is responsible for maintaining the proper positioning of the -phosphoryl group [21].
The projection of the WT case trajectory onto the first eigenvector in Figure 5 reveals the involvement of the SB and LID domains, which were the major players in the first principal component in the apoenzyme state, in the opening and closing motion. However, in the present case, additional key contributors to this principal component included the P-loop (GFMGSGKSS), the Walker B motif (VISTGG) [20], and the 310 helix (GIVMH). This suggests that these elements play significant roles in shaping the observed conformational changes and dynamics of the enzyme.

Neural Relational Inference Analysis (NRI)
Higher-order interaction networks were investigated using the Neural Relational Inference (NRI) analysis [50,51], which employs a graph neural network trained with an encoder-decoder architecture that learns the latent space of molecular dynamics (MD) trajectory data and captures long-range interactions. By using this analysis we obtained the distribution of learned edges between residues in the main regions which can be visualised in Figure 6a-c for the WT and the mutants R116A and R116K, respectively. In Figure  6d-f, the contributions from individual residues are aggregated into per-domain probabilities.
One can notice that in the case of the WT variant, in Figure 6a, the SB and LID domains exhibit the highest level of activity receiving contributions from all other domains. The CORE III domain also displays interactions from other domains to a lesser extent. This observation is further supported by the domain interaction map shown in Figure 6d for the WT case. A more pictorial representation of these results is achieved through directed graphs in Figure S1, where the connection patterns among the protein regions are shown.
In the R116A mutant, the interactions with the SB domain are weakened, whereas the interactions with the LID domain are strengthened, as depicted in Figure 6b,e. The corresponding directed graph for this mutant can be seen in Figure S1b. Regarding the R116K mutant, the interdomain interactions of both the SB and LID domains are mostly weakened, and the CORE III domain emerges as the most interactive domain. Notably, the interdomain interaction between CORE III and SB domains, observed in the WT case, is also present in the R116K mutant, as illustrated in Figure 6c,f and also Figure S1c.

Frustration Analysis
As we have already seen, both the apo and holoenzymes undergo alterations in their dynamics compared to the WT enzyme upon mutations. This phenomenon can be attributed to significant changes occurring in the local environment surrounding the mutated residue. To investigate the local environment of R116, we performed local frustration analysis. Frustration is a concept that arises when certain interactions in the protein interfere with one another, leading to conflicts [54]. In this way, a local minimally frustrated structure indicates a state where most of the local interactions are well-satisfied, while a highly frustrated structure indicates a state where most of the local interactions are in conflict. Between these extremes lies a neutrally frustrated structure, which represents an intermediate state. In more physical terms, minimally frustrated regions exhibit well-defined structures while highly frustrated regions tend to be more flexible [55].
Our analysis revealed a notable degree of minimal frustration within the local environment of R116 for the WT case, this can be seen in Figure 7a. This feature is more relevant in the time range below 1 s (indicated by the vertical dashed line), during which the substrates maintain a well-positioned orientation in the binding pocket. However, beyond this timeframe, the local environment exhibits increased flexibility. Conversely, R116A exhibits a substantial degree of frustration within its local environment as shown in Figure  7b. Remarkably, even though the structure of R116K fluctuates in a manner similar to the WT case, as indicated by the RMSF results in Figure 3, its local environment exhibits a higher degree of frustration than in the WT case (see Figure 7c). Thus, we argue that a locally frustrated region, in the mutants' case, results in different local dynamical motions that disrupt the coordination of the global dynamical motions of the WT protein structure ( Figure S1). These findings emphasise the influential role played by the local environment in governing the global dynamics of the enzyme and suggest that the evolutionary process has finely tuned the local environment of SK to optimise its functionality. Also, because disruption of interaction networks upon mutations has been observed in other enzymes, such as AdK [15], we argue that local frustration can be a general phenomenon in enzymes.
The different local and global dynamics of the mutants can induce different "chewing" motions on the substrates (including reactant, transition, and product states) than in the WT case, which can in turn contribute to their reduced catalytic activity. For instance, in the present study, we found that SK opens faster in the mutants than in the WT, which affects the product release step. Remarkably, a faster opening event upon mutations was recently observed in AdK [15]. In addition to this, the "chewing" motions in the mutants can create non-optimal orientations for residues during catalysis.
As in the apoenzyme case, we found that the choreography of the motions in the WT holoenzyme is destroyed for the studied mutant cases [5,19].

Conclusions
Our analysis of the RMSD curves and RMSF profiles revealed distinct dynamics and structural differences between the wild-type (WT) Shikimate Kinase (SK) variant and the R116A and R116K mutants. The WT variant exhibited higher stability throughout the whole simulation compared to the mutants. The distribution of RMSD values indicated a bimodal behaviour in the WT variant, suggesting transitions between open and closed states. In the case of the R116A mutant, the bimodal distribution was shifted towards higher RMSD values. As for the R116K mutant, the distribution was unimodal and skewed towards open states. These findings indicate that the dynamics of the WT holoenzyme are modified when site mutations are introduced, particularly in the R116A variant.
Furthermore, our principal component analysis (PCA) revealed clusters corresponding to open and closed conformations in the WT enzyme in a narrow conformational space, while the mutants displayed broader conformational spaces. The analysis of the per-residue contributions to the largest variance eigenvector showed that similar regions are involved in the dynamics of all variants, including the SB, 310, and LID regions. However, the major contributions differed, with the mutants primarily influenced by the LID region and the WT variant influenced by the region around the 310 helix, including the Ploop.
Additionally, our study employed NRI analysis to investigate higher-order interaction networks. The WT variant demonstrated extensive interactions, with the SB and LID domains being the most active and receiving contributions from other domains. In contrast, the R116A mutation weakened SB interactions while strengthening LID interactions. The R116K mutation disrupted interdomain interactions, with the CORE III domain becoming the most interactive region.
Lastly, our analysis of the local environment around residue R116 using frustration analysis revealed that the WT variant exhibited a high degree of minimal frustration, indicating well-satisfied local interactions optimised via evolution. In contrast, the R116A and R116K variants displayed high levels of frustration, suggesting conflicts among local interactions. We argue that these local conflicts, in the mutants' case, disrupt crucial contacts around the mutational site (located at the LID domain), resulting in different local dynamics, and that this, in turn, causes the difference in the global enzyme dynamics with respect to the WT case. This highlights the significant role of the local environment in modulating enzyme dynamics and further emphasises the importance of considering both local and global factors in understanding enzymatic behaviour.
In conclusion, our analysis of the dynamics and interactions in the WT and mutant variants of SK provides insights into the structural and functional consequences of specific mutations. These findings contribute to a deeper understanding of the interplay between residue dynamics, conformational transitions, and higher-order interactions in SK. We found that the choreography of these interactions is disrupted for the mutants considered in this study. The knowledge gained from it can facilitate the design of novel strategies for modulating enzyme activity and guide future investigations aimed at optimising enzymatic performance for various applications. For instance, our study suggests that one needs to consider the dynamics of the enzyme as a major player in enzyme tuning and design.
Supplementary Materials: The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/biophysica3030030/s1, Figure S1: Directed graphs obtained from a 0.6 probability threshold with the Neural Relational Inference (NRI) method for the WT, R116A, and R116K variants in (a-c), respectively.
Funding: This research received no external funding.
Acknowledgments: This research was conducted using the resources of High Performance Computing Center North (HPC2N). NAMD was developed by the Theoretical and Computational Biophysics Group in the Beckman Institute for Advanced Science and Technology at the University of Illinois at Urbana-Champaign (http://www.ks.uiuc.edu/Research/namd/, accessed date 1 January 2021).

Conflicts of Interest:
The authors declare no conflict of interest.