Effects of Colored Noise in the Dynamic Motions and Conformational Exploration of Enzymes

: The intracellular environment displays complex dynamics influenced by factors such as molecular crowding and the low Reynolds number of the cytoplasm. Enzymes exhibiting active matter properties further heighten this complexity which can lead to memory effects. Molecular simulations often neglect these factors, treating the environment as a “thermal bath” using the Langevin equation (LE) with white noise. One way to consider these factors is by using colored noise instead within the generalized Langevin equation (GLE) framework, which allows for the incorporation of memory effects that have been observed in experimental data. We investigated the structural and dynamic differences in Shikimate kinase (SK) using LE and GLE simulations. Our results suggest that GLE simulations, which reveal significant changes, could be utilized for assessing conformational motions’ impact on catalytic reactions.


Introduction
The intracellular physical environment exhibits complex dynamics due to various factors, including crowding components and the low Reynolds number of the cytoplasm, among others [1,2].Furthermore, this complexity is heightened by certain components, such as enzymes [3,4], which display active matter properties, meaning they possess internal driving forces [5].Within this environment, a single molecule, for instance, a protein chain, is subject to continuous collisions with other components.Collectively, these factors can induce correlations between the motions of any molecule and its surroundings, resulting in memory effects [3,6].
This physical environment is generally not fully captured by molecular simulations in which some of the aforementioned factors are often neglected, such as the effects of the surrounding molecules, except for possibly water and ions.A standard approach to simulate the effects of the surrounding environment, viewed as a "thermal bath", is through the Langevin equation (LE) with an external random force [7][8][9][10].This random force is typically modeled as "white noise" where it is assumed that the interactions of the system and thermal bath are uncorrelated over time [10,11].However, experimental observations indicate that the true physical environment plays a more complex role, for instance, in proteins, in which correlations exist among certain degrees of freedom, which are reflected in the behavior of time-correlation functions [12][13][14].
The investigation of protein dynamics, particularly in enzymes, is critically important due to its potential role in catalysis-a topic that remains elusive [26,27].The influence of the fluctuating environment on dynamics is typically incorporated into simulation studies via white noise [28,29], wherein, as mentioned above, the correlations are neglected a priori.Correlations can be crucial for enzymatic reactions as time-dependent observables, such as rate constants, display a wide range of variation [12].For instance, these correlations can influence the hierarchy of functional motions [27] relevant during catalysis.Previous work reported the uncoupling of dynamic motions and catalysis [28]; however, the results were based on a specific type of noise satisfying the fluctuation-dissipation theorem (FDT).
In this work, we investigated the structural and dynamic differences in the Shikimate kinase (SK) enzyme when LE and GLE simulations are employed.SK is an enzyme involved in the production of chorismate, an essential compound for the functioning of pathogenic bacteria [30].The simulation lengths considered in this work align with the timescales captured in current quantum mechanical and molecular mechanical (QM/MM) simulations [31].Given that the LE and GLE simulations displayed structural and dynamic changes in SK, we propose that GLE simulations could be utilized to assess the influence of conformational motions on catalytic reactions in QM studies.

Materials and Methods
In this section, we outline the methodology employed in this work, specifically the Langevin equation (Section 2.1) and the generalized Langevin equation (Section 2.2), as well as the protocol followed for conducting the simulations (Section 2.3).

Langevin Equation (LE)
The LE is used to simulate the dynamics of a particle i, with a moment p i , in an environment modeled by a Stokes term with a friction coefficient γ. f i and ζ i are the deterministic and random forces acting on the particle, respectively, and expressed as follows: .
In the LE, it is assumed that fluctuations from the heat bath, which is kept at a temperature T, occur on a shorter time scale than those of the particle itself [10,11,25].Because of this, the former fluctuations can be modeled as white noise.This noise satisfies the following expression, which agrees with the FDT [9,32,33]: where k B is the Boltzmann constant, m i is the particle's mass, and δ ij is the Kronecker delta.The Dirac delta function δ(t − t ′ ) reflects the fact that the impacts from the environment are almost instantaneous, and therefore they are not correlated in time.

Generalized Langevin Equation (GLE)
When the motions of the molecules in the system are correlated in time with the thermal fluctuations, the dynamics is better described using the GLE [25,34] as follows: .
A key distinction between this expression and that of the LE (Equation (1)) lies in the friction term, as the former incorporates the history of the momentum weighted by a kernel function, as detailed in Ref. [35].
A similar kernel was used previously by Ceriotti et al. in the context of thermostats for MD [11].Here, t L is the local average memory time that is used to smooth out high frequency while increasing low frequency fluctuations in the system.λ denotes the intensity of the guiding frictional force.Notably, this kernel contains the Ornstein-Uhlenbeck term (second term in Equation ( 4)), which vanishes when one considers long-time averages ( t L → ∞ ) or minimal guiding force contributions ( λ → 0 ).The colored noise, η i , also satisfies the FDT, akin to Equation (2) for LE, but it is related to the kernel in Equation (4) instead.
The fact that noise types described by Equations ( 2) and ( 5) satisfy the FDT guarantees that the systems are in equilibrium.Previous studies have employed various noise models to elucidate single-molecule experimental results [12][13][14].However, it is important to note that colored noise is not limited to the type described here (Equations ( 4) and ( 5)); other types can also be explored that may lead to systems out of equilibrium.

Setting up the Simulations
SK from Helicobacter pylory (PDB code 3MUF, structure resolved at 2.3 Å) was used as a test case.It was studied previously through extensive MD simulations by one of the authors [36,37].The protein structure was solvated with TIP3P [38] water molecules and salt, consisting of Na + and Cl − ions, at a concentration of 150 mM, resulting in a 66 Å 3 cubic box.The setup of the simulation box and the input files' generation for MD were carried out on the web-based graphical user interface for CHARMM [39] using the CHARMM-27 force field set of parameters [40][41][42].Short-range interactions for vdW and electrostatic terms were computed explicitly up to a 12 Å cutoff distance.The former term was truncated by using a force-switching approach [43] starting at a 10 Å distance, while in the latter, the long-range interactions were solved with the fourth-order interpolation particle mesh Ewald (PME) method [44,45] with an FFT grid size of 72 in all directions.Simulations were conducted with the AMBER package (version 2022, San Francisco, CA, USA) [46].
The system was initially minimized during 10 × 10 4 steps, with the first 2500 cycles utilizing the steepest-descent method and the remaining steps employing the conjugate gradient method.After minimization, an equilibration procedure was performed in the NVT ensemble at 303.15 K using the Langevin dynamics thermostat with a friction coefficient of 1 ps −1 during 1.25 × 10 6 steps (1 fs time step).Hydrogen bonds were constraint with the SHAKE algorithm [47] for the protein structure and the SETTLE algorithm [48] for the water molecules.Harmonic positional restraints of 1 kcal mol −1 Å −2 on protein atoms were applied during the minimization and equilibration steps.
The final structure of the equilibration step was used as the initial structure for data production for both LE and GLE simulations.In the former, Langevin dynamics was assessed in the NVT ensemble during 600 ns with similar MD parameters as in the equilibration step but with a larger time step of 2fs.Regarding GLE simulations, a local averaging time (t L ) of 0.2 ps and a momentum guiding factor (λ) of 1.0 were used.These simulations were also conducted in the NVT ensemble with similar parameters for the Langevin dynamics as in the LE case.Frames were saved every 0.02 ns for analysis with the first 100 ns being skipped and considered part of the equilibration step.In total, 3× repetitions for each case, LE and GLE, were run to obtain standard deviation bars.

Analysis of Simulations
Principal component analysis (PCA) [49] was performed by selecting the H, C, O, N, and C α atoms with the AmberTools 2022 [50].The first two PC values were used to generate 2D histograms of counts (heatmaps) that provide information on the explored conformational space using a bin size of 1 Å 2 .The ratios of areas were calculated by taking the number of bins explored divided by the total displayed area of 125 × 97 Å 2 .The root mean square fluctuation (RMSF) was computed with in-house Tcl scripts for VMD software (version 1.9.4,Urbana-Champaign, IL, USA) [51] based on the C α atoms.
An interaction network was obtained, for both LE and GLE simulations, using an in-house Julia script (Code S1 in Supplementary Material), where C α atoms formed a link if they lay within a cutoff distance of 7 Å, as suggested in previous works [52][53][54].The weights for the interacting links were computed by taking the ratio of the frequency of link formation during the trajectory and the total number of frames.Based on these two computed networks an alluvial diagram was generated to compare the structural differences between them with Infomap (version 2.6.1,Umeå University, Umeå, Sweden), for community detection and then applying the mapping change algorithm [55][56][57][58].In total, 100 trials with the two-level algorithm were used to obtain network communities [59].Alluvial diagrams facilitate the visual comparison of the community structures across different networks.Infomap communities and alluvial diagrams were generated by using the online alluvial generator [60].
The normalized autocorrelation function (NACF) was computed according to the following equation (Code S2 in Supplementary Material): Here, we consider the distance between the center of masses (COMs) of the LID (residues numbers 109-123), r LID COM , and SB (residues numbers 32-60), r SBD COM , regions, where d(t) = r LID com (t) − r SBD com (t) , δd(t) = d(t) − ⟨d⟩, and ⟨• • • ⟩ denote time averages.The LID and SB domains, together with the link d connecting their COMs, are shown in the inset of Figure 4.

Results and Discussion
In the following subsections, we describe the structural and dynamic differences that were observed in the simulations for the LE and GLE simulations of SK.

Structural Modifications of SK in the Simulations with White and Colored Noise
Previous studies have indicated that the SB and LID domains of Shikimate kinase exhibit a high degree of fluctuations with and without substrates [36,37].Changes in the fluctuations in these domains are important, as they may be linked to catalysis not only in Shikimate kinase [61] but also in other protein kinases such as the topologically similar [62] adenylate kinase enzyme [63].
Our observations indicate an increase in RMSFs in the SB region of SK for the GLE simulations compared to the LE simulations (see Figure 1).Conversely, the LID domain exhibited a slight decrease in fluctuations.Because both domains are involved in the catalytic step, through the binding of the substrates, we argue that the changes in their motion can be used to assess the influence of dynamics on the catalysis through QM-free energy simulations.This will be discussed in more detail in Section 3.3.
LEU118 from the LID domain was split into two community blocks in the GLE case.Thi was also the situation for the LE community of residues SER36, ASP30, and ILE35, which were in the SB domain.This reveals that the interaction network of residues in the LID and SB domains is modified in the GLE simulation compared to the LE case.Taken to gether, the alluvial diagram allowed us to find out where new interactions appeared in the protein networks and to visually present the differences in both networks.

Behavior of Time-Dependent Observables in White and Colored Noise Simulations
Besides the structural analysis, dynamic properties (in terms of NACFs) were com puted.Because the LID and SB domains showed the largest RMSFs (see Figure 1) in the LE and GLE simulations, we computed the NACF of the distance between the COMs o these domains; the results are plotted in Figure 4.The plot shows that up to 10 ns, both

Differences in the Explored of the Conformational Space with the Two Types of Noise
The exploration of the conformational space was monitored through heatmaps, which were built upon the first two PCs; the results are shown in Figure 2. It was noticed that, in the LE case, the sampled conformational space was smaller (39.6% area ratio) than in the GLE case (48.9% area ratio).This was expected as the original purpose of the GLE method was to enhance the exploration of the conformational space.
In the LE simulations (see Figure 2a), four conformational basins can be distinguished, where the protein stays trapped most of the time.However, in GLE simulations (see Figure 2b), only one basin, around the (25,15) bin, is observed, which corresponds to the initial conformation.In this case, after the GLE gathers enough information on the low-frequency fluctuations, it escapes this basin and explores the conformational space more efficiently than the LE simulation.For systems with a few degrees of freedom, the sampled distributions for both LE and GLE simulations should converge to the canonical distribution, but in more complex systems, such as proteins, the relaxation process could extend over long time scales, and the distribution could differ in practical MD [64] and even more in QM simulations.
The difference in the conformational ensemble was also monitored through the analysis of residue network interactions by using alluvial diagrams.They allow for the detection of structural changes between networks [56,60].In the alluvial diagram, the two interaction networks from LE and GLE simulations were compared, and the results can be seen in Figure 3.
ing that a higher degree of correlation was maintained in the latter.We propose that the difference in these correlation rates between LE and GLE simulations may be used to assess the coupling between enzyme dynamics and the catalytic reaction in QM simulations.This may be achieved, for instance, by running both simulations and noticing the differences between energetics (through free energy differences) and dynamics (through the computation of correlation functions).In this diagram, communities are depicted as blocks (in different colors).When a community block from one network maps to another, it indicates that the two communities share the same network nodes.Otherwise, when a community block in the first network splits into two (or even more) community blocks in the second network, it means that some nodes from the original community interact more in the newly formed community from the second network.Thus, splitting reveals changes in the structure of the networks.
We noticed that the LE community formed by the residues ALA124, ARG112, and LEU118 from the LID domain was split into two community blocks in the GLE case.This was also the situation for the LE community of residues SER36, ASP30, and ILE35, which were in the SB domain.This reveals that the interaction network of residues in the LID and SB domains is modified in the GLE simulation compared to the LE case.Taken together, the alluvial diagram allowed us to find out where new interactions appeared in the protein networks and to visually present the differences in both networks.

Behavior of Time-Dependent Observables in White and Colored Noise Simulations
Besides the structural analysis, dynamic properties (in terms of NACFs) were computed.Because the LID and SB domains showed the largest RMSFs (see Figure 1) in the LE and GLE simulations, we computed the NACF of the distance between the COMs of these domains; the results are plotted in Figure 4.The plot shows that up to 10 ns, both systems behaved in a similar manner, with a slightly slower decaying behavior in GLE simulations.However, for longer times, they started diverging.It can also be noticed that beyond 100 ns, the NACF of the LE case decayed faster than that of the GLE case, indicating that a higher degree of correlation was maintained in the latter.We propose that the difference in these correlation rates between LE and GLE simulations may be used to assess the coupling between enzyme dynamics and the catalytic reaction in QM simulations.This may be achieved, for instance, by running both simulations and noticing the differences between energetics (through free energy differences) and dynamics (through the computation of correlation functions).In these simulations, we observed differences in the NACFs between the LE and GLE cases, where the noise changed from white to colored.Now the question is whether there is any physical basis for colored noise.In experiments, colored noise was found to explain time-dependent correlation functions in enzymatic turnovers [12-14].Also, computational studies suggest that the time-dependent transmission coefficient of the unfolded- In these simulations, we observed differences in the NACFs between the LE and GLE cases, where the noise changed from white to colored.Now the question is whether there is any physical basis for colored noise.In experiments, colored noise was found to explain time-dependent correlation functions in enzymatic turnovers [12][13][14].Also, computational studies suggest that the time-dependent transmission coefficient of the unfolded-misfolded amyloid-β [23] and the viscosity dependence of folding-unfolding rates in proteins [21] are better described by using a colored noise in the GLE.Thus, there is already evidence supporting the idea that colored noise is present in physical systems such as enzymes.

Conclusions
In this study, we investigated the structural and dynamic differences in the Shikimate kinase (SK) enzyme when simulated using both the traditional Langevin equation (LE) with white noise and the generalized Langevin equation (GLE) with colored noise.The primary objective was to understand how incorporating memory effects, as opposed to treating the environment as a thermal bath with uncorrelated interactions, influences the structural and dynamic behavior of enzyme systems.By employing colored noise, which accounts for time-correlated interactions, the GLE can provide a more realistic model of the intracellular environment.
The simulations revealed notable structural and dynamic differences in SK between the LE and GLE approaches.Specifically, compared to the LE case, GLE simulations exhibited enhanced RMSFs in the SB domain, which is responsible for substrate binding during catalysis.The difference in the community networks of LE and GLE cases, monitored through alluvial diagrams, showed that mainly the residues in the SB and LID domains exhibited different interactions.Also, the NACF for the distance between the SB and LID domains displayed noticeable differences beyond 10 ns, which is a time scale achievable in QM simulations.
The ability of GLE simulations to capture these differences in the simulations suggests their potential utility in quantum mechanical and molecular mechanical (QM/MM) studies, where understanding the influence of conformational motions on catalytic events is still elusive.As opposed to using restraints on predefined collective variables in a QM/MM simulation to modify the behavior protein motions, one could run GLE simulations where those variables are not necessary.
The structural changes observed in the SK enzyme under GLE conditions indicate that traditional LE simulations may overlook critical dynamic aspects that are essential for enzymatic function.

Supplementary Materials:
The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/foundations4030021/s1.Code S1: Julia script for computing an interaction network based on the C α atoms of a protein by using a cutoff distance of 7 Å.Code S2: Python script for the calculation of the NACFs based on the COM distances between two protein domains.

Figure 1 .
Figure 1.RMSF analysis for the LE and GLE simulations shows that the main differences in th fluctuations come from the LID and SB domains.Fluctuations are increased in the LID domain and slightly decreased in the GLE case compared to the LE simulations.

Figure 1 .
Figure 1.RMSF analysis for the LE and GLE simulations shows that the main differences in the fluctuations come from the LID and SB domains.Fluctuations are increased in the LID domain and slightly decreased in the GLE case compared to the LE simulations.

Figure 2 .
Figure 2. Heatmap for LE (a) and GLE (b) simulations built upon the first two PCs.The sampled ensemble in the GLE case is broader than the LE case, but it is also more localized around the bin(25,15).The boundary of the explored conformational space is indicated by the red line.

Figure 2 .
Figure 2. Heatmap for LE (a) and GLE (b) simulations built upon the first two PCs.The sampled ensemble in the GLE case is broader than the LE case, but it is also more localized around the bin(25,15).The boundary of the explored conformational space is indicated by the red line.

Figure 3 .
Figure 3. Alluvial diagram displaying the difference in the communities for the LE and GLE cases.LE and GLE networks and their respective communities were compared using the mapping change algorithm, which compares the structural changes between communities in both networks.Vertical blocks in the alluvial diagram represent identified communities or modules detected by Infomap.In each column, each distinct community is assigned a unique color to visually differentiate it from

Figure 3 .
Figure 3. Alluvial diagram displaying the difference in the communities for the LE and GLE cases.LE and GLE networks and their respective communities were compared using the mapping change algorithm, which compares the structural changes between in both networks.Vertical blocks in the alluvial diagram represent identified communities or modules detected by Infomap.In each column, each distinct community is assigned a unique color to visually differentiate it from others.Where flow streams merge or split, blending of colors indicates nodes moving between communities.

Foundations 2024, 4 ,
FOR PEERREVIEW  8    others.Where flow streams merge or split, blending of colors indicates nodes moving between communities.

Figure 4 .
Figure 4. NACFs of distance between the com of the LID and SB domains (inset) for LE and GLE cases.NACFs displayed strong divergence after 10 ns for both cases (dashed green line).

Figure 4 .
Figure 4. NACFs of distance between the com of the LID and SB domains (inset) for LE and GLE cases.NACFs displayed strong divergence after 10 ns for both cases (dashed green line).