Protein Loop Dynamics Are Complex and Depend on the Motions of the Whole Protein

We investigate the relationship between the motions of a peptide loop segment incorporated within a protein structure and the motions of the same segment as a free or end-constrained peptide. As a reference point, we also compare against alanine chains of the same length as the loop. Both the analysis of atomic molecular dynamics trajectories and structure-based elastic network models reveal no general dependence on loop length or on the number of solvent exposed residues. Rather, the motions depend in complex ways, strongly and specifically, on the tertiary structure of the whole protein. Both the elastic network models and molecular dynamics confirm the differences in loop dynamics between the free and structured contexts, and the behaviors observed with the two approaches agree strongly. There is no apparent simple relationship between loop mobility and its size, exposure, or position within a loop. Free peptides do not behave the same as the loops in proteins. Surface loops do not behave as random coils, and the tertiary structure has a critical influence on the apparent motions. This strongly implies that evaluating the entropy of protein loops requires knowledge of the motions of the entire protein structure.


Introduction
A longstanding point of view has been that the dynamics of protein loops might be modeled as if they were polymers capable of randomly sampling their various degrees of freedom. This view has its roots in theoretical polymer physics. Contrary evidence has been presented in studies using Elastic Network Models (ENMs), where the loops are observed to move in strong correlation with the large domains of the structures. From these studies we have even suggested that functional loops move with the slow domain motions, and not with any significant independence. However, this remains unproven.
The polymer physics viewpoint would treat loops as random Gaussian chains. This approach was used by Jacobson and Stockmayer [1] to treat the statistical distribution of covalently linked rings in condensation polymers. The occurrence of rings diminishes for longer chains because of the conformational entropy, which grows rapidly as the chains become longer. Flory [2] gave an explanation for the formation of small rings or coils based on various statistical parameters. In the present work, we test whether this random point of view has any validity for protein loops by comparison with atomic Molecular Dynamics (MD), and we use the dynamical freedom observed in MD, which reflects the entropies of the loops, to judge which extreme viewpoint is more likely: the random viewpoint or the controlled behavior seen in the elastic models. We consider a small set of diverse protein structures, investigate the behavior of their loops, and show that their motions are far from random, are highly complex, and depend strongly on the tertiary structure.
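For reference, the random-coil picture can be made concrete. For an ideal (Gaussian) chain of n bonds of length b, the end-to-end vector r has density P(r) = (3/(2π n b²))^{3/2} exp(−3r²/(2 n b²)), so the density of the two ends meeting (r = 0), the quantity behind Jacobson–Stockmayer ring statistics, falls off as n^{−3/2}. The sketch below evaluates this; the 3.8 Å bond length (a typical Cα–Cα virtual bond) and the function names are illustrative assumptions, not values or code from this study.

```python
import math


def gaussian_end_to_end_density(r, n, b=3.8):
    """Probability density (per cubic Angstrom) of an end-to-end vector of
    length r for an ideal Gaussian chain of n bonds with bond length b."""
    mean_sq = n * b * b  # <R^2> = n b^2 for an ideal chain
    prefactor = (3.0 / (2.0 * math.pi * mean_sq)) ** 1.5
    return prefactor * math.exp(-3.0 * r * r / (2.0 * mean_sq))


def ring_closure_density(n, b=3.8):
    """Density of the two chain ends coinciding (r = 0); scales as n**-1.5."""
    return gaussian_end_to_end_density(0.0, n, b)


if __name__ == "__main__":
    # Longer chains are progressively less likely to close into a ring.
    for n in (4, 7, 9, 15):
        print(n, ring_closure_density(n))
```

Under this model, a loop's chance of closing on itself depends only on its length; the results below test whether protein loops show any such length dependence.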
Loop regions are often thought of as conformationally less regular fragments of the chain that connect two secondary structure elements (alpha helices and beta strands) and that are more generally exposed at the surface. Their lengths vary considerably from one occurrence to another. Loops exposed on the surface often play a vital role in protein function, primarily because they have a greater chance of interacting with the solvent and other molecules. Multiple experimentally determined structures often show the apparently restricted motion of protein loops [3], such as pairs of structures corresponding to the trajectory between an 'open' and a 'closed' state. However, such pairs of structures are extremely limited, and in some cases other important intermediate states may exist. The general results for loops from elastic network studies support the view that loops are not simply random coils moving randomly but instead often possess well-defined characteristics, showing limited motions coupled with the large domain motions.
Two recent reviews [4,5] discussed the importance of modeling loops and their entropic contributions for investigating protein folding pathways. The relative disorder in the folded and unfolded ensembles was quantified as an entropic difference that plays an important role in the folding process. Others have recognized the importance of loop entropies in the ligand binding process [6]. Successes in RNA structure prediction based on secondary structure considerations have led to many papers that consider the entropies of nucleic acid stem loops [7]. Many recent papers have devised new methods for sampling conformations to improve entropy evaluations [8][9][10]. In one of our recent studies [11], we showed that loops, and the coordination of their motions with the entire structure, are critical for function: functional loops tend to move in coordination with the dominant slow modes of motion of the protein structure, whereas functionally unimportant loops move more independently.
In this paper, we investigate whether there is any plausible relationship between loop motions and loop characteristics such as length and surface exposure. We present a detailed analysis of dynamic trajectories from atomic Molecular Dynamics and also show that these motions closely resemble those computed with ENMs, specifically the Anisotropic Network Model (ANM) [12]. We find that there is no simple relationship between the length of a loop, its flexibility, and its function. Loops do not behave as random coils, but instead in ways that relate inherently to the topology of the specific protein and its tertiary structure.

Results and Discussion
The relative mobilities of residues are important and frequently used for studying protein motions; they can serve as an approximate measure of entropy. We compute Mean Square Fluctuations (MSFs) from the molecular dynamics trajectories and compare them to the X-ray crystallographic temperature factors, as well as to MSFs calculated from the ANM using a cutoff of 12 Å. Further details of the structures and molecular dynamics trajectories used are given in Table 1. Figure 1 illustrates these mobilities for 1flh; those of the other three proteins are shown in Supplemental Figure S2. All three metrics identify the same regions of the structure as the most mobile, but the relative magnitudes of the motion differ somewhat among the X-ray B factors, the atomic MD, and the ANM. While both the ANM and MD results correlate well with the X-ray temperature factors, they typically correlate even better with one another. Crystallographic B factors can be affected by intermolecular interactions within the crystal or by other contributions from the crystal environment, which may account for some of the differences.
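The MSF comparison can be sketched as follows, assuming the trajectory is a NumPy array of Cα coordinates already superposed onto a common reference frame (the superposition step is not shown). Isotropic B factors relate to MSFs through B_i = (8π²/3)⟨ΔR_i²⟩, and agreement between two mobility profiles is summarized here with a Pearson correlation. The function names are illustrative, not taken from the original analysis.

```python
import numpy as np


def mean_square_fluctuations(traj):
    """traj: (n_frames, n_atoms, 3) array of superposed C-alpha coordinates.
    Returns <|r_i - <r_i>|^2> for each atom (in squared coordinate units)."""
    traj = np.asarray(traj, dtype=float)
    mean_pos = traj.mean(axis=0)                 # time-averaged position of each atom
    disp = traj - mean_pos                       # fluctuation about the mean
    return (disp ** 2).sum(axis=2).mean(axis=0)  # mean squared displacement per atom


def msf_to_bfactor(msf):
    """Isotropic B factor from MSF (both in A^2): B = 8 pi^2 <dr^2> / 3."""
    return 8.0 * np.pi ** 2 * np.asarray(msf, dtype=float) / 3.0


def profile_correlation(a, b):
    """Pearson correlation between two per-residue mobility profiles."""
    return float(np.corrcoef(a, b)[0, 1])
```

Because a correlation coefficient ignores overall scale, the MD, ANM, and B-factor profiles can be compared directly even though ANM fluctuations are only known up to the factor k_BT/γ.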
Table 1. Protein structures used in the analyses. MD trajectories for the homologs of the four proteins used in this study were downloaded from the MoDEL [13] database and are listed here. For two structures, a full match was not found; instead, simulations of only one domain were available.

To confirm these similarities in motions, Figure 2 shows a direct comparison of the motions of several loops in four protein structures, calculated with two independent approaches. The mean square fluctuations of the loops computed from MD and ANM are colored spectrally (from blue to red), with blue indicating the parts having the smallest fluctuations and red the parts with the highest mobilities. There are strong similarities between the relative mobilities of the various parts of

Table 2 (continued). (3) 'ALA' for poly-alanine of the corresponding length; (4) 'Free-EC', where the free peptide is simulated but its ends are constrained; and (5) 'ALA-EC' for end-constrained poly-alanine. Four loop fragments of 1flh are extracted, MD is performed on each fragment, and the trajectory is analyzed with PCA using the Cα coordinates. Poly-alanine chains of the same lengths are also simulated to test for any effects of the specific side chain interactions. The percent of variance captured by the first 5 PCs is shown for each trajectory. The three smaller loops are surface exposed, while the 15-residue fragment is a buried strand that connects two surface exposed loops, with each of the three SSEs containing 5 residues. See Supplemental Information for the sequence and location of each fragment. Overall, certain trends are evident. The free and alanine behaviors are similar: the shorter segments show greater cohesion in their motions, while the longer fragments have only a small fraction of their motions captured by the first PCs. Within the context of the protein, the longer segments also consistently show less cohesion, but nonetheless generally greater cohesion than the excised and alanine segments. One result is readily understood: these segments have significantly less freedom when they are attached to the remainder of the protein, as can be seen from the numbers in the second-to-last column. The last column shows the WRMSIP [Equation (3)] between each loop trajectory and the trajectory extracted from the full structure.

To capture the relationship between the directions (the PCs) and the weights (the percentage of variance captured), we compute a weighted root mean squared inner product according to Equation (3) and present the results in Table 2. Supplementary Table S1 compares the trajectories of free loops with the corresponding end-constrained trajectories. We find that the motions of free loops and of end-constrained loops are considerably different from the motions realized within the protein structures.

Length
Given the high agreement between the mobilities calculated from ENM and MD, it appears feasible to compute conformational entropy using the ENM of a representative structure. The extent of motion is related to entropy because it indicates the number of accessible microstates that the system can occupy. These fluctuation-based entropies could be used in many contexts; we have begun to use them in combination with knowledge-based energy potentials for refining protein tertiary structure predictions and, similarly, as the basis for selecting native-like docking poses. Our proof-of-concept paper for this approach is Zimmermann et al. [15]. An extended study with positive results across three commonly used docking benchmark datasets is to appear in The Journal of Physical Chemistry [16].
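One common way to turn ENM fluctuations into an entropy-like quantity, offered here only as an illustration and not as the procedure of [15], treats each nonzero normal mode as a classical harmonic oscillator. A mode with stiffness λ_k contributes (k_B/2)[1 + ln(2π k_BT/λ_k)] to the configurational entropy, so stiffer modes, with fewer accessible microstates, contribute less. Because ENM spring constants are in arbitrary units, only entropy differences between structures evaluated with the same parameters are meaningful. The function name and the choice to drop six rigid-body modes are assumptions of this sketch.

```python
import numpy as np


def harmonic_entropy(eigenvalues, kT=1.0, n_zero_modes=6):
    """Classical configurational entropy, in units of k_B, of a harmonic model.

    eigenvalues: Hessian eigenvalues (e.g. from an ANM), in any order.
    The n_zero_modes smallest values (rigid-body motions) are discarded.
    """
    lam = np.sort(np.asarray(eigenvalues, dtype=float))[n_zero_modes:]
    if np.any(lam <= 0):
        raise ValueError("non-positive internal eigenvalue; check the network")
    # Each mode: S_k / k_B = 0.5 * (1 + ln(2 pi kT / lambda_k))
    return float(np.sum(0.5 * (1.0 + np.log(2.0 * np.pi * kT / lam))))
```

For a fixed network, stiffening every mode lowers this entropy, which is the sense in which smaller fluctuations indicate fewer accessible microstates.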

Anisotropic Network Model
The Anisotropic Network Model (ANM), which has been extensively developed and summarized in depth elsewhere [12,17] and is based on the original concept of Tirion [18], is used here to compute the coarse-grained dynamics of a structure. Such structures can be taken from many sources, including X-ray crystallography, NMR, Electron Microscopy, and pre-equilibrated Molecular Dynamics (MD) conformations. The ANM assumes that the structure represents a minimum-energy conformation and that all deviations from this conformation carry an energetic cost. The motions with the least energetic cost are favored and dominate the computed motions. Such motions are also collective, involving internal motion throughout the bulk of the structure. ANMs have been applied extensively and are often found to represent large-scale domain motions better than atomic MD simulations, because they do not require the long trajectories that MD needs to sample these slow motions. They have been shown to sample the motions of protein structures efficiently and, as a result, should also be useful for evaluating entropies.
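The ANM construction can be sketched in a few lines of NumPy, assuming Cα coordinates in Å, a uniform spring constant γ, and the 12 Å cutoff used in this study; the function names are illustrative. Each pair within the cutoff contributes an off-diagonal 3×3 super-element −γ d dᵀ/|d|², the diagonal super-elements make each block row sum to zero, and MSFs come from the pseudo-inverse built from the nonzero modes (in units of k_BT/γ).

```python
import numpy as np


def anm_hessian(coords, cutoff=12.0, gamma=1.0):
    """3N x 3N ANM Hessian from (N, 3) C-alpha coordinates (Angstrom).
    Pairs closer than `cutoff` are joined by springs of constant `gamma`."""
    coords = np.asarray(coords, dtype=float)
    n = len(coords)
    hess = np.zeros((3 * n, 3 * n))
    for i in range(n):
        for j in range(i + 1, n):
            d = coords[j] - coords[i]
            dist2 = d @ d
            if dist2 > cutoff ** 2:
                continue                              # no spring beyond the cutoff
            block = -gamma * np.outer(d, d) / dist2   # off-diagonal super-element
            hess[3*i:3*i+3, 3*j:3*j+3] = block
            hess[3*j:3*j+3, 3*i:3*i+3] = block
            hess[3*i:3*i+3, 3*i:3*i+3] -= block       # diagonal = -(sum of off-diagonals)
            hess[3*j:3*j+3, 3*j:3*j+3] -= block
    return hess


def anm_msf(coords, cutoff=12.0, gamma=1.0, kT=1.0):
    """Per-residue MSF from the pseudo-inverse of the Hessian, skipping the
    six zero-frequency rigid-body modes (assumes a connected, rigid network)."""
    vals, vecs = np.linalg.eigh(anm_hessian(coords, cutoff, gamma))
    vals, vecs = vals[6:], vecs[:, 6:]               # eigh sorts eigenvalues ascending
    cov = kT * (vecs / vals) @ vecs.T                # sum_k v_k v_k^T / lambda_k
    return np.diag(cov).reshape(-1, 3).sum(axis=1)   # trace of each 3x3 block
```

The double loop is O(N²), which is adequate for single proteins; the slowest modes (smallest nonzero eigenvalues) dominate the MSFs, which is why ANM fluctuations emphasize collective motions.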

Analysis of Molecular Dynamics Trajectories
Numerous molecular dynamics trajectories are available for download from the MoDEL database [13]. They are distributed in a compressed form that retains the motion described by the first Principal Components (PCs) [19] of the simulation, such that at least 90% of the variance is captured. In this way, much of the random noise is filtered out, as in Essential Dynamics [20]. Four proteins were chosen based on their diversity in size and function: aspartic protease, myoglobin, triosephosphate isomerase, and reverse transcriptase. Trajectories of these structures or their close homologues were downloaded; they are listed in Table 1. For two structures, simulations of only one domain from the full structure were available. Triosephosphate isomerase functions as a homodimer, but the simulation was performed on the monomer. For reverse transcriptase, the RNase H domain was simulated. All trajectories used were simulated for 10 ns using the Amber 8.0 software [21], the Amber99 force field, and TIP3P explicit solvent. To ensure that we use properly equilibrated data, the first 5 ns of each trajectory were discarded. The final 5 ns were used in the analysis, and the first frame of this window was used to build the ANM.
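The two preprocessing ideas described above, discarding the equilibration half of a trajectory and keeping the PCs that together reach 90% of the variance, can be sketched as follows. This is an illustration on coordinate arrays assumed to be already superposed; it is not the MoDEL compression code, and the function names are hypothetical.

```python
import numpy as np


def production_window(traj, discard_fraction=0.5):
    """Drop the first part of a trajectory as equilibration (default: half)."""
    start = int(len(traj) * discard_fraction)
    return traj[start:]


def essential_subspace(traj, threshold=0.90):
    """PCA of a (n_frames, n_atoms, 3) trajectory of superposed coordinates.
    Returns (variances, eigenvectors, k); the first k PCs capture at least
    `threshold` of the total variance."""
    x = np.asarray(traj, dtype=float).reshape(len(traj), -1)
    x = x - x.mean(axis=0)                          # remove the average structure
    cov = x.T @ x / (len(x) - 1)                    # 3N x 3N covariance matrix
    vals, vecs = np.linalg.eigh(cov)
    order = np.argsort(vals)[::-1]                  # largest variance first
    vals = np.clip(vals[order], 0.0, None)          # clear tiny negative round-off
    vecs = vecs[:, order]
    frac = np.cumsum(vals) / vals.sum()
    k = int(np.argmax(frac >= threshold)) + 1       # first PC count reaching threshold
    return vals, vecs, k
```

For the 10 ns trajectories used here, `production_window` keeps the final 5 ns; projecting the frames onto the first k eigenvectors and back gives the filtered trajectory.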
Because we seek to analyze the results at the residue level, the atomic trajectories are first reduced to the Cα atom positions prior to analysis. The covariance between each atom pair is then quantified with the normalized time-averaged dot product of the changes in position [22]:

$$C_{ij} = \frac{\langle \Delta \mathbf{R}_i \cdot \Delta \mathbf{R}_j \rangle}{\left( \langle \Delta \mathbf{R}_i \cdot \Delta \mathbf{R}_i \rangle \, \langle \Delta \mathbf{R}_j \cdot \Delta \mathbf{R}_j \rangle \right)^{1/2}} \tag{1}$$

where ΔR_i is the displacement vector of atom i between consecutive time steps and ⟨ ⟩ denotes time (ensemble) averaging. In this study, we focus on the dynamics of loops and their relationship to the remainder of the structure. It is therefore useful to generalize Equation (1) to an even coarser level. Secondary Structure Elements (SSEs) are identified with DSSP [14], each defined as a segment of sequence with the same secondary structure. To investigate the correlation between the motions of pairs of SSEs, including individual loops, the time-averaged dot product between two SSEs is defined in Equation (2) as the average of the pairwise correlations of the individual atoms within the two SSEs:

$$C(\mathrm{SSE}_a, \mathrm{SSE}_b) = \frac{1}{n(\mathrm{SSE}_a)\, n(\mathrm{SSE}_b)} \sum_{i \in \mathrm{SSE}_a} \sum_{j \in \mathrm{SSE}_b} C_{ij} \tag{2}$$

where n(SSE_a) is the number of residues in SSE_a.
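The residue-level and SSE-level correlations translate directly into array operations: C_ij = ⟨ΔR_i·ΔR_j⟩/(⟨ΔR_i·ΔR_i⟩⟨ΔR_j·ΔR_j⟩)^{1/2}, followed by an average of C_ij over residue pairs drawn from two SSEs. The sketch below assumes a NumPy array of Cα coordinates and takes ΔR_i between consecutive frames, as stated in the text. The SSE index lists would come from DSSP assignments, which are not computed here, and the function names are hypothetical.

```python
import numpy as np


def displacement_correlation(traj):
    """Normalized correlation C_ij of frame-to-frame displacements.
    traj: (n_frames, n_atoms, 3) C-alpha coordinates."""
    dR = np.diff(np.asarray(traj, dtype=float), axis=0)   # Delta R_i between consecutive frames
    dots = np.einsum("tia,tja->ij", dR, dR) / len(dR)     # <Delta R_i . Delta R_j>
    norm = np.sqrt(np.diag(dots))                         # <Delta R_i . Delta R_i>^(1/2)
    return dots / np.outer(norm, norm)


def sse_correlation(corr, sse_a, sse_b):
    """Average of C_ij over residue indices i in SSE a and j in SSE b."""
    return float(corr[np.ix_(sse_a, sse_b)].mean())
```

A loop whose SSE-level correlation with a flanking helix or strand is close to 1 moves with that element rather than independently of it.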
To capture the difference in sampling between two trajectories, we compute the weighted root mean squared inner product (WRMSIP) between the first I PCs from the first trajectory and the first J PCs from the second. First, we compute the relative weight of the i-th PC, w_i. For each pair of PCs between the trajectories, we compute the ratio of their weights, w_i^1 and w_j^2, where the superscript denotes the trajectory from which each weight is taken. We then use these ratios to weight the pairwise inner products PC_i^1 · PC_j^2 in Equation (3), where PC_i^1 is the i-th principal component from the first trajectory and PC_j^2 is the j-th from the second. The dot product accounts for the agreement in direction, while the weight accounts for the extent of sampling in that direction. RMSIP has been used in many studies and is explained well by Leo-Macias et al. [23]; however, it does not account for the PC weights as our modification does. Values approaching 1 indicate that the ensembles are identical, while smaller values indicate reduced coverage. All calculations presented here use I = J = 10, so that we consider the bulk of the important motions.
It should be noted that our WRMSIP counts differences in a nonlinear way. For example, consider the first three PCs from two trajectories where the directions of the PCs are identical and the weights (fractions of variance) are w^1 = [0.5, 0.3, 0.2] and w^2 = [0.5, 0.4, 0.1]. The WRMSIP would be 0.75. Although only ten percent of the variance has shifted from PC3 in the first trajectory to PC2 in the second, the WRMSIP decreases by more than 0.1.
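The sketch below pairs the standard RMSIP, sqrt((1/I) Σ_i Σ_j (PC_i^1 · PC_j^2)²), with a hypothetical weighted variant. The weighting used here is an assumption chosen because it reproduces the worked example above (0.75 for identical directions with w^1 = [0.5, 0.3, 0.2] and w^2 = [0.5, 0.4, 0.1]). Each squared inner product is scaled by r_ij = min(w_i^1, w_j^2)/max(w_i^1, w_j^2), and the sum is divided by I with no final square root. It should not be read as the authors' Equation (3). PCs are stored as unit column vectors.

```python
import numpy as np


def rmsip(pcs1, pcs2):
    """Standard RMSIP between two sets of unit PC vectors stored as columns."""
    ip = pcs1.T @ pcs2                          # inner products PC_i^1 . PC_j^2
    return float(np.sqrt((ip ** 2).sum() / pcs1.shape[1]))


def weighted_overlap(pcs1, w1, pcs2, w2):
    """Hypothetical weighted overlap: each squared inner product is scaled by
    r_ij = min(w_i, w_j) / max(w_i, w_j), summed, and divided by I.
    Illustrative only; not necessarily Equation (3) of the text."""
    w1 = np.asarray(w1, dtype=float)
    w2 = np.asarray(w2, dtype=float)
    r = np.minimum.outer(w1, w2) / np.maximum.outer(w1, w2)   # weight ratios in (0, 1]
    ip = pcs1.T @ pcs2
    return float((r * ip ** 2).sum() / pcs1.shape[1])
```

With identical directions (identity matrices as PCs), `rmsip` returns 1 regardless of the weights, while `weighted_overlap` returns 0.75 for the example weights. This illustrates why a weighted measure is needed to detect shifts in variance between PCs that point the same way.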

Comparison of Structural Loops and Free Peptides
The complexity of loop motions within the MD simulations is quantified by Principal Component Analysis (PCA). PCA transforms the input trajectory data into a new coordinate system in which the PCs form the basis. The first PC captures the largest fraction of the variance, the second captures the largest part of the remaining variance, and so on. If loop motions are highly random, then each individual PC can be expected to capture only a small fraction of the total variance, whereas more correlated motions will be captured concisely by a smaller set of PCs. This allows us to distinguish loops with highly diverse motions from those whose directions of motion are more internally correlated. We perform PCA for each loop in each structure individually in order to determine the cohesiveness of its motions.
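The cohesion measure reported in Table 2, the percent of variance captured by the first five PCs, can be computed from the singular values of the mean-centered coordinate matrix. The sketch assumes a superposed Cα trajectory as a NumPy array, and the function name is illustrative. The synthetic trajectories in the accompanying checks only demonstrate the contrast described above: uncorrelated noise spreads its variance across many PCs, while a single collective motion concentrates it in the first one.

```python
import numpy as np


def variance_captured(traj, n_pcs=5):
    """Percent of total positional variance captured by the first n_pcs PCs
    of a (n_frames, n_atoms, 3) trajectory of superposed coordinates."""
    x = np.asarray(traj, dtype=float).reshape(len(traj), -1)
    x = x - x.mean(axis=0)                    # center on the average structure
    s = np.linalg.svd(x, compute_uv=False)    # singular values, largest first
    var = s ** 2                              # proportional to PC variances
    return float(100.0 * var[:n_pcs].sum() / var.sum())
```

Using the SVD avoids forming the 3N × 3N covariance matrix explicitly, which is convenient for short loop fragments with few frames.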
As control cases, we also perform 10 ns atomic MD simulations of representative loops extracted from the protease 1flh, as free peptides of 4, 7, 9, and 15 residues, using the CHARMM27+CMAP [24,25] force field. While small differences in long-timescale dynamics (hundreds of ns) have been observed between different force fields, their overall agreement is quite high [26,27]. It is possible that the specific side chains present in these loops affect the types or extent of the motions sampled. Thus, a second set of control simulations is performed with the same parameters for poly-alanine chains of the same lengths.
It is possible that the differences in motion between the free peptides and the peptides within their structural context are due to end constraints. That is, when the peptide is within a protein structure, its ends are not free to move, because they are held by the flanking structure. To test this effect, we perform a second type of control simulation in which the N- and C-terminal Cα atoms are harmonically restrained with a force constant of 5 kcal/(mol·Å²). These simulations are also presented in Table 2 and labeled "-EC" for End-Constrained.
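An end constraint of this kind can be sketched as a harmonic positional restraint on the first and last Cα atoms. The sketch interprets the stated value of 5 as a force constant in kcal/(mol·Å²) and uses the E = k|r − r₀|² convention without a factor of ½. Both are assumptions, since packages differ on the factor of ½ and the original simulation inputs are not given. The function name is hypothetical.

```python
import numpy as np


def end_restraint(coords, ref_coords, k=5.0, atoms=(0, -1)):
    """Harmonic positional restraint E = k * |r - r0|^2 on the selected atoms
    (default: first and last C-alpha). Returns (energy, forces), with energy in
    the units of k times squared length and forces as an array like coords."""
    coords = np.asarray(coords, dtype=float)
    ref = np.asarray(ref_coords, dtype=float)
    forces = np.zeros_like(coords)
    energy = 0.0
    for a in atoms:
        d = coords[a] - ref[a]                 # displacement from reference position
        energy += k * float(d @ d)
        forces[a] = -2.0 * k * d               # F = -dE/dr
    return energy, forces
```

Only the terminal atoms feel a force, so the interior of the peptide remains free, which mimics a loop anchored at both ends by its flanking secondary structure.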
As the percent of variance captured is only an approximate measure of the conciseness of a set of motions and does not compare the directions of motion between two datasets, we also compute the dot products between the essential dynamics directions of the loops within the protease 1flh and those of the corresponding excised free peptides. To further capture the agreement between the trajectories, including the percent of variance that each PC captures, the WRMSIP is calculated according to Equation (3).

Conclusions
In this work we investigate the relationships between loop motions and the motions of whole protein structures. There is no apparent dependence on loop length or on the number of solvent exposed residues within a loop. Rather, the tertiary structure is a dominant influence on these motions and prevents the development of any general rules. Many loops agree closely in their direction of motion with the secondary structures they connect: helices or strands. Because of the cohesive nature of the ANM, it could be argued that the cohesive motions derived for protein loops are an artifact of the model rather than the genuine behavior of protein structures. However, in the present study we have also analyzed atomic MD trajectories, which might be expected to allow a greater extent of randomness in the molecular interactions, and these analyses also confirm the large difference in loop dynamics between the free and structured contexts, as well as the lack of any general relationship between loop size, exposure, and mobility. We have shown that free peptides behave markedly differently from loops within proteins. Therefore, surface loops do not behave as if they were random coils; instead, the tertiary structure has a critical impact on the realized motions.

Table 2. Analysis of simulations for isolated loop peptides. The loop dynamics are simulated in one of five contexts: (1) '1flh', indicating that the fragment's motions were extracted from the full structure; (2) 'Free', indicating that the loop was simulated in isolation;