Article Quantitative Comparison of Conformational Ensembles

A number of measures have been used in the structural biology literature to compare the shapes or conformations of biological macromolecules. However, the issue of how to compare two ensembles of conformations has received far less attention. Herein, the problem of how to quantitatively compare two such ensembles is addressed in several different ways using concepts from probability and information theory. Ultimately, such metrics could be used in the evaluation of structure-prediction algorithms and the analysis of how conformational mobility is inhibited by bound ligands.


Introduction
The comparison of conformations of biological macromolecules has traditionally been purely geometric and primarily concerned with measuring the difference of two fixed structures [1][2][3][4][5][6].In contrast, the issue of comparing ensembles of conformations has received relatively little attention.This issue can be particularly important if the macromolecules in question have significant flexibility.For example, consider a protein that is comprised of a large relatively rigid body and a flexible loop.The shape of the loop may vary significantly over time while the shape of the rigid portion remains largely fixed.Given two conformations of such a protein, traditional geometric methods of comparison may weight the importance of the shape of the loop and the rigid body equally, whereas comparing two conformational ensembles of the same protein may reduce the importance of the shape of the loop based on the uncertainty of its location with respect to the rigid body.This difference is the primary focus of this paper.
Several groups have looked at distributions and ensembles of protein conformations from a variety of perspectives [7][8][9][10].Additionally, the flexibility and associated entropy of proteins has been studied [11][12][13][14][15][16][17][18].This includes the work by Damm and Carlson presented in [19].They proposed a geometric evaluation of protein prediction based on Gaussian-weighted root mean square deviation (RMSD).However, their method differs significantly from what is presented here as their weights are based on the distance between atoms in two different conformations as opposed to the flexibility of the proteins individually.
The remainder of this section establishes terminology and definitions that are used throughout the paper, and reviews methods for quantitatively comparing two conformations or shapes of the same macromolecule.Such metrics are used both to compare structures in biological contexts and in evaluating the performance of algorithms in competitions such as CASP [20,21] and CAPRI [22,23].The rest of the paper is then structured as follows.Section 2 describes how conformational ensembles can be described as Gaussian distributions on high-dimensional configuration spaces.Section 3 explores similarity measures between conformational ensembles.This includes information-theoretic measures based on the Maxwell-Boltzmann probability distribution and its marginals.Section 4 provides some numerical examples of the metrics explored relative to more traditional approaches.An appendix is included that provides some information for reducing the degrees of freedom for a coarse-grained elastic network.

Terminology and Definitions
In the literature, the words conformation and configuration are sometimes used interchangeably, and at other times are given slightly different meanings.Here configuration is used to mean a collection of particles that are arranged in space in a particular way.When, in addition, the constituents of a configuration are connected together in a specified way, the result is a conformation.Connectivity information is topological in nature, and so conformation = configuration + topology.For example, a conformation can be described by a set of coordinates {x i } for i = 1, ..., n and a connectivity matrix, Γ.Here x i = [x i , y i , z i ] T ∈ R 3 is a three-dimensional position vector, and Γ = Γ T ∈ R n×n is a symmetric matrix consisting of 1's and 0's where 1 means that two points are connected and 0 means that they are not.The set of points by itself, without the connectivity information contained in Γ, would describe a configuration.
When the connections allow for some degree of mobility, the specific vectors in the set {x i } can change, while Γ remains constant.The set of all possible conformations consistent with the constraints is described as an ensemble.In the case of a macromolecule, the set of allowable conformations is defined by constraints from the covalent structure and non-covalent interactions such as hydrogen bonds, steric repulsion, and hydrophobicity.Even when all of these constraints are imposed, a significant degree of conformational mobility exists.
In contrast to the concepts of configuration and conformation, the concept of shape is a coarser description.Shape is concerned with the external appearance rather than internal connectivity.Shape can be described in terms of the boundary of a conformation and, since it disregards Γ, contains even less information than a configuration.A description that lies somewhere between configuration and shape is that of density.For example, if ρ i (x) is the electron density of the i th atomic nucleus in a structure containing n atoms, then the density of a whole macromolecule can be described as For example, each ρ i (x) might be described as a concentrated Gaussian distribution.Estimates of shape can be obtained from density by evaluating level curves of the form ρ(x) = c where c is some threshold value.

Established Measures of Conformational and Shape Similarity
Two conformations ({x i }, Γ) and ({x i }, Γ ) are said to be (geometrically) congruent if Γ = Γ and it is possible to find a rigid-body transformation g = (R, t) where R ∈ SO(3) (the set of 3 × 3 rotation matrices) and t ∈ R 3 (the set of three-dimensional translation/position vectors) such that The set of all g is denoted as G, the group of rigid-body motions in three-dimensional space, and g • x i defines an "action" of G on R n .
Two configurations are said to be congruent if {x i } = {g • x i }.This is a weaker condition than the corresponding concept of congruence for conformation for two reasons: (1) there is no constraint on Γ, since there is no Γ as part of the definition of a configuration; (2) equality of sets does not require a correspondence between individual members of the set, and hence is a loser condition than when specific elements are equated.Two densities ρ(x) and ρ (x) are said to be congruent if it is possible to find a g ∈ G such that ρ(x) = ρ (g • x).
When two conformations, configurations, or shapes are not congruent, it is useful to assess how close they are to being congruent.These measures of "similarity" are based on some measure of distance [24].For example, can be used to measure the distance between points where p = 1 (the Manhattan metric) and p = 2 (Euclidean distance) are the most common values.Likewise, distance between two densities can be defined as A common similarity measure between two conformations is When p = 2, this is the commonly used root-mean-square (RMSD) comparison.It is also possible to define differences in conformations without minimizing over G by establishing internal coordinates such as dihedral angles and bond lengths, and comparing these internal coordinates for each conformation, but this approach will not be pursued here.
In the case of two conformations (or configurations) consisting of identical particles without correspondence, the optimal-assignment metric [25,26] can be used where Π n is the set of permutations that acts on the n indices that label the particle positions.
In cases where subsets of particles are of the same type rather than all of them being the same, the minimization over Π n in Equation ( 4) can be broken down into minimizations within each subset of similar particle types.
It is also possible to define similarity metrics of the form but these are less commonly used because they require more effort to compute.Shape similarity can be measured by evaluating where [27] g It can be shown that all of these measures of similarity satisfy the formal properties of a metric.If the densities are normalized so as to be probability density functions, then the Kullback-Leibler divergence (or relative entropy) is defined as A similarity measure can be defined as Unlike the similarity metrics discussed above, this one is not symmetric.Whereas the similarity measures reviewed in this section are for individual conformations, configurations, or shapes, it can be useful to compare two sets (or ensembles) of such objects rather than two individual objects.The next section addresses continuous ensembles that can be described as probability densities in high-dimensional spaces.

Generating Conformational Ensembles
Consider a macromolecule with atomic nuclei in a single conformation given by {x i }, and let the mass of the i th of these be m i .Then the kinetic energy will be where ẋ denotes the derivative of x with respect to time and • denotes the dot product of vectors, , and [M ] is the 3n × 3n diagonal matrix with blocks on the diagonal of the form m i I 3 for i = 1, ..., n.
The i th conjugate momentum vector is Concatenating these gives P = [p T 1 , ..., p T n ] T , and so the kinetic energy can be written as The potential energy for the macromolecule will be of the form V = V (X ; X 0 ) where X 0 is the set of Cartesian coordinates where energy is minimized.For an ensemble that is generated by small conformational changes induced by Brownian motion bombardment of a molecule that is free to move in the absence of an external potential, V (X ; T is the Hessian (or stiffness) matrix evaluated at X 0 , the referential (lowest energy) conformation, In this case, the Maxwell-Boltzmann distribution will be where is the partition function that normalizes f (X, P) to be a probability density, and β = 1/k B T where k B is the Boltzmann constant and T is temperature in degrees Kelvin.The distribution in Equation ( 8) is a 6n-dimensional Gaussian distribution with covariance where ⊕ denotes the direct sum of two matrices.

Invariance and Degeneracy
In some scenarios there is an external elastic potential corresponding to the macromolecule being tethered to a substrate, in which case Equation ( 9) would be valid.However, an important technical detail that needs to be addressed when there is no external potential is that [K(X 0 )] will be singular.This is related to the fact that when there is no external potential, then for any static g ∈ G, where From the above invariance of V (• ; •) under rigid-body shifts it follows that Moreover, This means that f (g • X, P ; g where R • P = [(Rp 1 ) T , ..., (Rp n ) T ] T , but this fact is not relevant to the current discussion.
In classical statistical mechanics, this invariance under translation and rotation is broken by limiting the motion of molecules to a finite box, which is tantamount to defining an external potential.But in the present context it will be convenient instead to reduce degrees of freedom while maintaining the Gaussian nature of this distribution (which is convenient for the calculations that will follow).The mass matrix has full rank, and so R is not a concern.But the stiffness matrix has rank 3n − 6 due to the 6 degrees of freedom in G. Therefore, either reducing [K] by six degrees of freedom, or by adding an artificial tethering potential, will make it have full rank.If [K] is decomposed as where O 6 is the 6×6 matrix of zeros and Λ is the 3n−6-dimensional diagonal matrix with the remaining eigenvalues, then choosing will define a new (full-rank) stiffness matrix, which is Λ, where 0 6 is the six-dimensional zero vector.Alternatively, constraints can be imposed to ensure that the body does not undergo any overall rotation or translation from its initial state as it deforms.These result from conservation of linear and angular momentum.They can be imposed as hard constraints, by for example, solving for x n−1 and x n in terms of all other x i 's and back substituting to produce X that is (3n − 6)-dimensional, or as soft constraints imposed with elastic potentials.We take the approach of hard constraints.Another way to impose hard constraints would be to fix a specific point, for example, by translating every point by −x n so that x n becomes 0, and keep it fixed there as the macromolecule undergoes fluctuations.This removes three degrees of freedom.Then, if the vector from x n to x n−1 is forced to have a specific orientation (e.g., pointing along the direction e 1 = [1, 0, 0] T ), this removes two more degrees of freedom.The final rigid-body degree of freedom can be enforced by defining allowable motions of x n−2 to be in the plane spanned by e 1 and e 2 .Additional information on applying these constraints can be found in Appendix A.

Conformational Boltzmann Distribution
We shall be concerned primarily with the conformational Boltzmann distribution, which results from removing the superfluous rigid-body degrees of freedom that leave the potential invariant as in Equations ( 10) and (12).Removing these degrees of freedom and marginalizing over the conjugate momenta results in where Here X represents a reduced set of coordinates that could have been generated in either of the two ways discussed above.Now that [ K( X0 )] is full rank, the conformational partition function can be computed in closed form as

Elastic Network Models
Elastic network models have been used to model the motion of macromolecules [28][29][30][31][32][33][34][35].In the elastic network model the stiffness matrix is computed from a set of coordinates {x 0 i } of a baseline structure in terms of 3 × 3 blocks as where and k ij takes a uniform nonzero value when x 0 i − x 0 j is within a cutoff radius, r 0 , and takes a value of zero otherwise.This leads to a stiffness matrix that is sparse, which leads to efficient computations of normal modes.For our purposes, it will be more convenient to define k ij as follows: where φ(0) = 1 and φ(r) ≈ 1 when 0 < r < r 0 , and decays rapidly to zero on the range r 0 to infinity.Many such functions exist that are differentiable, including sigmoid functions and certain multi-Gaussian difference functions.Then the full stiffness matrix constructed from these blocks explicitly depends on the vector of initial coordinates as [K] = [K(X 0 )].

Similarity Measures for Conformational Ensembles
In this section, measures of similarity that build on the concepts in the previous section are articulated.Let the two ensembles in question be described by probability density functions The difference between them is that either X0 = X 0 , or β = β , or both.Since the conformations were effectively anchored in space in a specific way prior to computing these probability densities, in order to make the stiffness matrix nonsingular, it follows that the probability densities should be optimally aligned before they are compared.This is analogous to what was done in Equation ( 5) in comparing shapes using their mass densities.But now, the conformational probability densities are functions on (3n−6)-dimensional Euclidean space.Herein several measures are compared.Throughout this discussion, N .= 3n − 6 Let P(R N ) denote the set of all smooth probability density functions on R N .Given a measure of distance, divergence, or discrepancy between two probability density functions, and given an action of G on R N , g • R N , a corresponding action on functions can be defined in analogy with Equation (6).A measure of similarity between probability densities describing an ensemble can then be constructed as This similarity of conformational probability densities then serves to quantify how similar two ensembles are.Below, it is shown how various D(f 1 ; f 2 ) can be computed in closed form when f 1 and f 2 are Gaussian distributions.

L 2 Difference
It is known from the literature that the following measure can be computed in closed form for Gaussian distributions on R N , f (µ,Σ) (x) = f (x; µ, Σ): An advantage of Gaussian functions is that the integration of quadratic terms over R N has a closed-form expression.We derived it as follows, Since Equation ( 17) can be rewritten in a closed-form as Similarly, we can write For the numerical examples presented in Section 4 a normalized version of the L 2 difference is presented.This normalized version is given by

Kullback-Leibler Divergence
The Kullback-Leibler divergence (or relative entropy) between two probability density functions is defined as This can be computed in closed form for Gaussians using the identity in Equation ( 18) and as follows:

Numerical Examples and Comparison with RMSD
The examples presented below use structural data from the protein database (PDB) [36] and the 2010 Critical Assessment of protein Structure Prediction (CASP9) [37] to highlight how the metrics in Section 3 differ with respect to RMSD.

Example 1: Perturbed Protein Structure
In order to contrast the proposed metrics in Section 3 with those that are purely geometric based, we perturbed a conformation given in the PDB [36] in two ways.The protein chosen is Streptomyces griseus protease B (PDB ID 3SGB) [36,38] shown in Figure 1.This protein's loops and size make it relatively flexible and well suited to illustrate the difference between these metrics.[39] of PDB ID 3SGB [36,38].
The perturbations are performed using a unit length vector, V ∈ R N .Using V and the alpha carbon locations given in the PDB file, X 0 0 , a family of similar mean conformations were generated as for scalar values of γ.This parameterization of mean conformations then provides a way to compare how the difference measures given in Section 3 vary over a range of RMSD values.Conformational ensembles were then generated using the conformational Boltzmann distribution described in Section 2.2 together with the hard constraints discussed in Sections 2.1 and Appendix A.
The first set of comparisons that was performed was between two different perturbation vectors, V.For the first vector we consider perturbations in the direction of low frequency normal modes.These mode shapes are associated with the dominant direction of motion for large conformational changes.We start by forming a coarse grain elastic network as described in Section 2.3 using the alpha carbons.Based on this elastic network, normal mode analysis was used to obtain the minimum nonzero eigenvalue and its associated unit-length eigenvector.This eigenvector is then used as the perturbation vector V.The second type of perturbation applied was obtained by taking a random vector whose elements are drawn from a zero mean normal distribution with constant variance.This vector was then normalized to unit length.For these examples, all stiffness matrices were constructed using r 0 = 8 Å, k = 1, and β = 1.
Figure 2 highlights the difference of these two perturbation methods.The conformations corresponding to the random perturbation exhibit lower RMSD values at all values of γ than perturbation in the likely direction of motion; for the random perturbations, the RMSD values are roughly half of those associated with the mode shape.This is particularly interesting when contrasted with the Kullback-Leibler divergences and L 2 differences.For these measures, the values associated with the normal mode perturbations are lower than those associated with the random perturbations.This can be attributed to the fact these deviations occur in a direction of high flexibility and both of these measures are functions of the conformation's elastic structure.This effect is particularly pronounced for the Kullback-Leibler divergence.This result highlights the fact that RMSD alone does not capture how similar two conformations are when uncertainty is considered.

Figure 2.
Here we demonstrate how RMSD (top left), Kullback-Leibler divergence (top right), and normalized L 2 difference (bottom) vary with γ (β and r 0 are fixed at 1 and 8 respectively).Two perturbation vectors are given, one corresponding to the lowest frequency mode and one random unit vector.
The second set of comparisons presented utilizes only the normal mode perturbation.However, a variety of values for β and r 0 were used as these parameters effectively alter the "stiffness" of the conformational Boltzmann distribution for a fixed value of k (k = 1 for these examples).Figures 3  and 4 demonstrate the effect of varying β and the cutoff radius, r 0 .It should be noted that the wide range of β values being shown are only for the purpose of illustrating the relative sensitivity of our results to β.The values used may not represent a physiologically realizable range.These figures illustrate several interesting things about how these metrics differ from RMSD.The first thing to note is that while RMSD is constant with respect to β and r 0 , both the Kullback-Leibler divergence and L 2 difference vary with respect to these flexibility measures.It is also apparent that ∆ KL increases with γ; although, the relationship is not linear.For "stiffer" structures (i.e., higher β or higher r 0 ), the slope of ∆ KL with respect to γ is lower and begins to flatten sooner.The relationship between ∆ 2 and γ is more complex.Close to γ = 0, ∆ 2 increases with γ until it reaches a maximum and then begins to decline until it levels out at a value of one.

Example 2: CASP Predictions
One of the motivations for looking at metrics for conformational ensembles is for comparison of protein predictions to experimentally solved proteins.Here we present a target protein structure, T0575 from CASP9 [37] (PDB ID 3NRG [40]), and several groups' predictions.Figure 5 provides a ribbon diagram for the target protein.We examined the 44 predictions of the target that met the following conditions: (1) the prediction contains all of the residues; (2) the prediction is a group's "first" model (i.e., it is the model with which a group is most confident); and (3) the RMSD between the alpha carbons of the prediction and target is less than 6 Å.For each of the predictions and the target, ensembles were generated using the conformational Boltzmann distribution for β = 1 and r 0 = 8 Å.Using these ensembles, the Kullback-Leibler divergence ∆ KL was determined.Figure 6 presents these values of Kullback-Leibler divergence versus their associated RMSD.[39] of PDB ID 3NRG [36,40].
It is believed that the Kullback-Leibler divergence in this context provides an indication of how similar two conformations are in terms of topology and flexibility.To demonstrate this, we used normal mode analysis to perturb the predicted models; we added a weighted sum of the first few nonzero normal modes to the predicted models to minimize the RMSD between the prediction and the target.These first few nonzero normal modes represent the primary directions of motion for elastic networks.Let the vectors associated with the normal modes be ordered such that V i represents the i th nonzero normal mode.Then we can represent this minimization over m normal modes with the following perturbed model vector similar to what was done in Equation ( 25) where X 0 0 is a vector of the original coordinates of the predicted model and γ i 's are scalar weights.If X T arget 0 is the coordinate vector associated with the target structure, we can then use a cyclic gradient descent to minimize RMSD, min The results for this analysis can found in Figures 6 and 7 for the first three normal modes (m = 3).
Figure 6 shows the original Kullback-Leibler divergence versus both the original RMSD and the RMSD optimized using the first three normal modes.Figure 7 provides the percent reduction in RMSD versus the original Kullback-Leibler divergence.In this analysis, we would expect to see the amount of minimization decrease as the Kullback-Leibler divergence increases.This trend is apparent, although there are a few outliers.We note that similar results were observed for other CASP9 targets and different values of m.

Conclusions
Measures to quantify how similar two conformational ensembles are have been articulated here.In contrast to well-known measures of shape similarity between two conformations, which are purely geometric, the comparison of ensembles here requires concepts from probability and information theory.We have illustrated how using the stiffness information in conformational ensembles for two macromolecules can be used to develop difference measures that take into account the positional uncertainty of flexible regions.These new measures are based on the Kullback-Leibler divergence and Numerical examples of how these measures compare with the traditional RMSD were presented.The conformational ensembles for these examples were based on coarse-grained elastic networks.It has been demonstrated that for two different conformations, the RMSD of each of them with respect to a third may not provide enough information to determine which of the conformations is more similar to the third in terms of their positional uncertainty.
The measures presented provide additional information with respect to positional uncertainty.They have the potential to be extended or combined with each other, additional information theoretic measures, and/or traditional geometric measures to make them more robust to a variety of parameters.Additionally, methods for comparing the similarity of an individual structure and an ensemble can be explored.
Recall from Section 2.1 that we can remove 6 degrees of freedom by forcing x 0 n = 0, rotating the conformation so that the vector from x 0 n to x 0 n−1 is aligned with e 1 , and confining x 0 n−2 to the plane spanned by e 1 and e 2 .The resulting coordinate vector X ∈ R 3n can then be written as Let us define B as B = I (3n−7) 0 0 0 0 0 0 0 0 T 0 1 0 0 0 0 0 Here B ∈ R (3n−6)×3n .Let us also define The generalized coordinate system X ∈ R 3n−6 is then x 1 x 2 . . .
Here, and henceforth, a 1 will represent the signed distance of x n−1 from the plane spanned by the global e 2 and e 3 axes as measured along the vector from x 0 n to x 0 n−1 .This ensures that the confined degrees of freedom are respected.Also, a 2 (a 3 ) will represent the signed distance of x n−2 from the plane spanned by the fixed global e 2 and e 3 (e 1 and e 3 ) axes as measured along the transformed e 1 (e 2 ) axis.
If we apply an arbitrary rigid body motion g 0 to the conformation where then We can then write and where R0 ∈ R 3n×3n and t0 ∈ R 3n .(Note: We will let Rn−3 0 ∈ R 3(n−3)×3(n−3) represent a direct sum of n − 3 copies of R 0 .)Then the new coordinate vector can be written as If K X 0 is the stiffness matrix associated with X, the new stiffness matrix K Y 0 can be written as Also, the stiffness matrix corresponding to X can then be written as However, in the new coordinate system that has been transformed via g 0 , a similar equality does not hold in general, Yet there is something that can be said if we consider the rotation and translation separately.First we will consider pure translation (i.e., R 0 = I 3 ); Y = X + t0 , Ỹ = B Y = X + B t0 , and Also, Next, consider pure rotation (i.e., t 0 = 0); Y = R0 X and However We can recover Y using Then one can write Therefore, for pure rotation Using these two cases, we can generate any rigid body motion of one conformation relative to another.For example, if we want the coordinate frames attached to two conformations (X 0 and X 0 ) to be separated by g 0 = (R 0 , t 0 ), we could apply g R 0 = (R 0 , 0) to X 0 and g t 0 = (I, −t 0 ) to X 0 .c 2012 by the authors; licensee MDPI, Basel, Switzerland.This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/.)

Figure 6 .
Figure 6.The Kullback-Leibler divergences between predicted models and the CASP9 target are shown with respect to RMSD.RMSDs are given for the original predictions and predictions perturbed along the first three normal modes.

Figure 7 .
Figure 7.The reduction in RMSD realized by minimizing along the first three normal modes is given with respect to the Kullback-Leibler divergence between the predicted model and the target (T0575).