Cross-Correlated Motions in Azidolysozyme

The changes in the local and global dynamics of azide-labelled lysozyme compared with that of the wild type protein are quantitatively assessed for all alanine residues along the polypeptide chain. Although attaching -N3 to alanine residues has been considered to be a minimally invasive change in the protein it is found that depending on the location of the alanine residue, the local and global changes in the dynamics differ. For Ala92, the change in the cross-correlated motions are minimal, whereas attaching -N3 to Ala90 leads to pronounced differences in the local and global correlations as quantified by the cross-correlation coefficients of the Cα atoms. We also demonstrate that the spectral region of the asymmetric azide stretch distinguishes between alanine attachment sites, whereas changes in the low frequency, far-infrared region are less characteristic.


Introduction
To characterize cellular processes at a molecular level, the structure and dynamics of proteins needs to be understood. Such knowledge is also valuable to the direct development and improvement of pharmaceutically active ligands in drug design efforts [1][2][3]. Optical spectroscopy is one possibility to capture the structural and functional dynamics of proteins in the condensed phase. One of the great challenges is to probe site-selective dynamics. This is required in order to specifically target protein sites that are responsible for function.
During the past 10 years, a range of non-natural small molecule modifications to proteins has been proposed. They include-but are not limited to-attaching nitrile to amino acids [4], using the sulfhydryl band of cysteines [5], cyano [6] groups, nitrile labels [7], and complexation with SCN [8], cyanophenylalanine [9], or cyanamide [10]. For lysozme, ruthenium carbonyl complexes have also been shown to provide an understanding of the water dynamics from 2D-infrared experiments [11][12][13]. In this case, it has been explicitly demonstrated that dynamic hydration extends over distances 20 Å away from the protein surface, which is consistent with recent simulations on hydrated Hb [14,15].
Based on recent studies [16] the vibrational dynamics of N − 3 in the gas phase and in solution can be captured quantitatively [17]. Moreover, azidoalanine (AlaN 3 ) is one of the ideal labels that has been shown to be positionally sensitive to probe for local dynamics [16]. AlaN 3 has a comparatively large extinction coefficient, and it absorbs around ∼2100 cm −1 . The incorporation of N − 3 to alanine (Ala) is technically feasible and leads to small perturbations [18]. Thus, AlaN 3 , as also shown previously [16], is an ideal modification to study local protein dynamics.
The present work assesses the structural dynamics within a conformational substate and on time scales (2 ns) for which the 1D-and 2D-IR spectroscopy was found to differ for the spectroscopic probe attached to all alanine residues of human lysozyme [16]. In addition, the structural dynamics of the WT and two selected mutants are characterized on considerably longer (100 ns) time scales. First, the methods are discussed. Next, results on the root mean squared fluctuations and the dynamical cross-correlation maps are presented.
Finally, the spectroscopy in the low-frequency and around the asymmetric azide stretch vibration is considered, and our conclusions are drawn.

Molecular Dynamics Simulation
Molecular dynamics (MD) simulations of WT and all AlaN 3 -labelled proteins were conducted using the CHARMM [19] force field. For the simulations with the multi-dimensional RKHS PES for the spectroscopic probe, a suitably interfaced with the CHARMM program [20] was employed [17]. MD simulations were performed with the TIP3P water [21] model in a cubic box of size (62.1) 3 Å 3 . Figure 1 represents the lysozyme structure used in the current study with all modified Ala residues. The initial x-ray structure corresponds to WT human lysozyme (3FE0 [22]). First, the system is minimized followed by heating and equilibration for 100 ps. Then, production runs 2 ns in length were conducted for all 14 AlaN 3 labels in the NVE ensemble. The total energy is stable, and the probability distribution is Gaussian with a width of ∼1 kcal/mol, whereas the temperature of the system is T = 300 ± 2K (see Figure S1). Snapshots for analysis were recorded every 5 fs. Bond lengths involving Hatoms were constrained using the SHAKE [23] algorithm, and all non-bonded interactions were evaluated using shifted interactions with a cutoff of 14 Å switched at 10 Å [24].
To assess structural changes on longer time scales, simulations with and without the RKHS PES for the azide label were conducted for WT, Ala47N 3 , and Ala92N 3 . With the RKHS PES for the azide label, independent runs 5 to 10 ns in length were performed. For the 100 ns simulations, a simplified, computationally more efficient force field for the N 3 label was used.
The N-N bond was a harmonic oscillator with r e = 1.14 Å and k e = 877.413 kcal/mol/Å 2 and the N-N-N angular potential parameters were θ e = 180 • and k e = 46.706 kcal/mol/radian 2 . Otherwise, the energy function remained unchanged. These simulations employed the OpenMM implementation of CHARMM [25]. Electrostatic interactions were treated with the particle mesh Ewald method [26] with grid size spacing of 1 Å, characteristic reciprocal length κ = 0.34 Å −1 , and interpolation order 6. The simulations were run for 100 ns for each system.

Dynamical Cross Correlation Maps
To quantitatively determine the effect of ligand binding on the protein dynamics, dynamical cross-correlation maps [27,28] (DCCM) and difference dynamical cross-correlation maps (∆DCCM) were calculated using the Bio3D package [29]. Dynamic cross-correlation matrices with coefficients C ij = ∆r i .∆r j /( ∆r 2 i ∆r 2 j ) 1/2 (1) were determined from the positions of the main chain C α atoms in amino acids i and j with positions r i and r j . ∆r i and ∆r j determine the displacement of the ith C α from its mean position over the entire trajectory. For DCCM calculations, the Bio3D package [29] was used. Note that DCCM describes the correlated and anti-correlated motions in a protein, whereas the differences ∆DCCM reports on pronounced differences between unmodified and modified proteins.

Infrared Spectroscopy
The infrared spectrum is calculated by the Fourier transform over the dipole moment autocorrelation function. To that end, the dipole moment µ is obtained for the entire protein and -N 3 label separately from the simulation trajectories of 2 ns production run. For the IR spectra, the correlation function C(t) = µ(0) µ(t) was determined from snapshots saved every 5 fs. Then, the fast Fourier transform of C(t) was determined using a Blackman filter, and the result was multiplied using a quantum correction factor βhω/ (1

Results
For assessing the structural dynamics on the time scale of the infrared spectroscopy, first, the local fluctuations and correlated motions from the 2 ns trajectories were analysed. It has been shown previously that, within a given conformational substate, trajectories of that length are sufficient for converging IR spectra and associated frequency fluctuation correlation functions [31]. As introducing an azide label at different locations of the protein may lead to larger structural changes on longer time scales, additional longer simulations with and without the RKHS PES were run for Ala47N 3 , Ala92N 3 , and for WT (for comparison). The reason for choosing these two residues is that Ala47 has been previously shown to be solvent exposed, while Ala92 is not [16].

Global Dynamics (RMSF and DCCM)
To probe the flexibility or rigidity of the unmodified and modified proteins on the time scale of the IR spectroscopy (2 ns) and within one conformational substate for WT and all modified proteins, the root mean squared fluctuation (RMSF) of lysozyme in solution before and after modification was compared with original X-ray structure as the reference. The comparison was based on all C α atoms in the protein and the results are shown in Figure 2 for all Ala residues.
The RMSF changes of the modified proteins compared with unmodified lysozyme ranged from minor (Ala26, Ala32, Ala42, Ala73, Ala76, Ala92, Ala96, Ala108, and Ala111) to major (Ala9, Ala47, Ala83, Ala90, and Ala94), see Figure 2. Moreover, attaching an -N 3 label to one site may also lead to larger fluctuations at other neighbour or non-neighbour residues. As an example, modification of Ala47 to Ala47N 3 leads to larger RMSFs for residues 12 to 18, 45 to 49, 87 to 93, and 122 to 130. Another example for a more pronounced change in the local flexibility concerns modification of Ala32. In this case, the region for residues 80 to 100 becomes considerably more flexible, which can also be explained by the close contact of the two helices Ala32 on the one hand and residues 90 to 100 are part of, see Figure 1. Finally, attaching N 3 to Ala94 leads to a range of changes in the flexibility of nearby and more distant residues, including residues (10-20), (60-70), (85-100), and (110-130).
In general, attaching -N 3 leads to increased flexibility of the protein, and often the Cterminus around residue 130 becomes more flexible. One example for which the flexibility is decreased concerns modification at Ala108 for which the region (65-75) becomes more rigid. Hence, even on the 2 ns time scale, differences in the IR spectroscopy are accompanied by alterations in the local fluctuations.
Dynamical Cross Correlation Maps: DCCMs describe the correlated (C ij ∼ 1) and anti-correlated (C ij ∼ −1) motions within a protein. As an example, the DCCMs for the WT lysozyme and Ala108N 3 are shown in Figure 3. Bulges along the diagonal correspond to helices, whereas features extending away from the diagonal orthogonal to it are β−sheet structures. For the WT, there is a hinge motion at the intersection between residues 10 and 50 with 0.25 ≤ C ij ≤ 0.5, which disappears upon modification at position Ala108. The intensity of such hinge motions should be compared with insulin monomer for which disulfide bonds are characterized by 0.55 ≤ C ij ≤ 0.70, i.e., about a factor of two-times more intense than in the present case. Similarly, there are several anticorrelated motions in the WT protein, e.g., involving residues 10 and 80 or 50 and 110, which largely disappear upon modification at Ala108.
It is also instructive to compare the DCCM with the RMSF in Figure 2. As was already discussed, the RMSF for Ala108N 3 is reduced for residues in the range (65, 75) compared with the WT protein. This can also be seen in the DCCM for which the positive correlation involving residues (70-80, 60) is considerably stronger in the WT compared with the modified protein.

Local Dynamics (∆DCCM)
To further analyse the modification sites, difference maps ∆DCCM are calculated for all the AlaN 3 modifications that report on residues with pronounced changes in the dynamics after modification. The range of differences considered includes changes (>|0.25|) and smaller than (>|0.75|). For differences |∆C ij | < 0.25, the changes are too insignificant, whereas values |∆C ij | > 0.75 were not observed. Figure 4 shows the corresponding ∆DCCM maps for residues Ala92N 3 , Ala26N 3 , and Ala90N 3 as examples for minor, medium, and major effects on the protein dynamics after modification. Incorporation of -N 3 into the protein at position Ala92 leads to insignificant changes in the correlation between the residues (Figure 4a). This is consistent with the findings from analysis of the RMSF, see Figure 2, which shows only minor variation in the fluctuations without and with N 3 attached to Ala92. On the contrary, for Ala26N 3 , ligand-induced effects between residues (12, 18) and (42, 52) (feature A) and residues (23,35) and (120, 127) (feature B) are found, see Figure 4b. Again, these changes can also be found in the RMSF analysis in Figure 2 although there, the effects specifically for residues (12,18) and (23,35) are smaller, albeit still visible. Both, local (feature B) and global (feature A) changes in the protein dynamics are found.

Spectroscopy
It is also of interest to further consider the vibrational spectroscopy for the different modified proteins. In particular, it is of interest whether attaching the azide label at different positions along the polypeptide chain leads to discernible changes only in the asymmetric N 3 stretch vibration or whether the spectroscopy in the low-frequency (far infrared, THz) range is also expected to be affected by the modification. Given the changes in the couplings between different parts of the protein as found from the DCCM maps, it is conceivable that also the low frequency modes are affected by attaching the label at different locations along the polypeptide chain. Figure 5A reports the IR spectrum in the low frequency range (0-300 cm −1 ) with modifications at positions 9, 47, 73, and 92 together with the high frequency range (2130-2230 cm −1 , panel B), which captures the asymmetric stretch of the -N 3 label. The IR spectrum for all AlaN 3 modifications is shown in Figure S12. Using the same energy function for all AlaN 3 moieties, the results demonstrate that the IR spectra differ in terms of the position of the frequency maxima and their full widths at half maximum (FWHM). Compared with earlier works that determined the 1D IR lineshape from the frequency fluctuation correlation function (FFCF) [16], the present analysis confirms that the frequency maxima of the most red (Ala9) and most blue (Ala73) shifted azide vibrations differ by ∼15 cm −1 . In addition, the spectrum in the low-frequency region (0 to 300 cm −1 ) is reported in Figure 5A. The FFCFs determined previously reported pronounced oscillations for -N 3modification at Ala9, Ala32, Ala42, and Ala92 but their origin remains unclear. Similarly, earlier work in the region of the antisymmetric stretch of the N − 3 anion in ternary complexes with formic acid dehydrogenase and NAD + and NADH, respectively, also reported oscillations on the picosecond time scale of the FFCF [32]. Importantly, in this earlier work the azide anion is not covalently bound to either the protein or the ligand but rather replaces the formate reactant as a transition state analogue in the active site of the protein.
The reported oscillations occur on the sub-picosecond to picosecond time scale. For the covalently bound azide label the recurrences are rather on the sub-ps time scale throughout. It was hypothesized that these recurrences potentially originate from coupling of the azide label to low frequency modes of the environment. However, the spectra in Figure 5A do not support such an interpretation as irrespective of the position of the azide label, the spectral signatures in the range between 0 and 300 cm −1 are largely identical.

The Structural Dynamics of Wild Type and Azide-Labelled Lysozyme on Longer Time Scales
Up to this point, the structural dynamics was analysed on the time scale required for reliably describing the infrared spectroscopy within a given conformational substate. However, it is known that lysozyme samples open and closed conformations on considerably longer time scales [33]. In order to assess the influence of attaching -N 3 to buried and solvent-exposed alanine residues, additional and considerably longer simulations were conducted for Ala47N 3 (solvent exposed), Ala92N 3 (buried), and for WT lysozyme for comparison.
Two sets of simulations were run. Firstly, using the RKHS-based representation of the PES for the azide label and secondly with a conventional (harmonic) force field for the label to access longer time scales. For the first set, five individual runs with simulation times between 5 and 10 ns were performed; whereas, for the second set, one continuous 100 ns simulation was run for each of the systems.
Simulations with the conventional force field for the azide label on the 100 ns confirm some of these findings, see Figure S14 (bottom panel), although the magnitude of some of the fluctuations changes compared to the simulations on the 10 ns time scale. The RMSD ( Figure S14 (top panel)) for the WT protein stabilizes at ∼2 Å whereas that for Ala47N 3 and Ala92N 3 stabilize at close to 3 Å and 2.5 Å, respectively. Still, all three proteins are stable on the 100 ns time scale. It is of interest to note that the RMSD between a closed (I3P 0 ) and an open (I3P A or I3P B ) structure of an I3P mutant of lysozyme is ∼4 Å for the C α atoms [34], which is also consistent with results from extensive MD simulations for the open/close transition in the M6I mutant [33].
Furthermore, the DCCM maps for all three systems for the two sets of simulations are shown in Figures S15-S17. For the WT protein, the DCCM maps from 10 ns and 100 ns simulations are strikingly similar, see Figure S15. Here, the same force field was used for the two simulations. For Ala47N 3 the DCCM from the 10 ns simulation using the RKHS PES for the label is overall similar to that from the 100 ns simulation using the conventional FF. There is one additional correlation between residues (111-120) and (66-70), which appears for the 100 ns simulations together with a few additional, smaller features. Similarly, for Ala92N 3 a pronounced coupling between (82-100) and (52-60) emerges, which was not found from simulations on the 10 ns time scale. In addition, the anticorrelation for residues (45-68) and (105-120) is more pronounced on the longer time scale.
Overall, the positional sensitivity of the spectroscopic response found from simulations on the 2 ns time scale [16], i.e., within one conformational substate, is reflected by specific structural dynamics, see Figures 2-4 and S2-S11. Simulations on the 10 ns time scale find limited differences between the RMSF and RMSD for WT and the two modified lysozymes considered (Ala47N 3 and Ala92N 3 ). On the 100 ns time scale and using an empirical force field for the entire system the differences in the overall structural changes increase to between 2 and 3 Å for the RMSD with respect to the WT reference structure and additional couplings emerge as is found from the DCCM maps. The RMSFs of the three proteins considered are high for the same regions except for residues around Tyr20 for which Ala92N 3 is significantly more flexible.

Conclusions
The present work confirms that azide attached to alanine in lysozyme leads to sitespecific information in the spectroscopy and dynamics on the 2 ns time scale and within the same conformational substate. This is consistent with earlier findings for PDZ2 [35]. Changes in difference between dynamical cross-correlation maps range from insignificant (e.g., for Ala90N 3 ) to major (e.g., for Ala92N 3 ); see Figure 4. Changes in the spectroscopy are accompanied by differing local fluctuations and couplings for structures within the same conformational substate.
On the 10 ns to 100 ns time scales, the WT and modified Ala47N 3 and Ala92N 3 proteins remain structurally intact with increased fluctuations. Thus, it is conceivable that different structural dynamics sampled on longer time scales will also lead to site-specific IR responses. Finally, we found that the spectral region of the asymmetric azide stretch was more informative about the site-specific dynamics compared with the far infrared region, which did not appear to exhibit specific features depending on the location of the modification site. The present work provides a molecularly resolved view of the internal protein dynamics upon introducing small spectroscopic probes at strategic positions of a protein.

Data Availability Statement:
The data that support the findings of this study are available from the corresponding author upon reasonable request.

Conflicts of Interest:
The authors declare no conflict of interest.