# Bayesian-Maximum-Entropy Reweighting of IDP Ensembles Based on NMR Chemical Shifts


## Abstract


## 1. Introduction

## 2. Methods

$w_{i}$ are the reweighted weights, $N$ is the number of structures in the ensemble and $m$ the number of experimental observables. ${\chi}_{red}^{2}$ measures the degree of fitting (Equation (2)).

$S_{REL}$ is closely related to the effective sample size, $N_{eff}$, a useful measure of how much reweighting has taken place (see Equation (3)).
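As a minimal sketch of how these two diagnostics can be computed, the Python functions below evaluate a reduced chi-square and an effective sample size from a set of weights. Equations (2) and (3) are not reproduced in this text, so the forms used here are assumptions based on the usual BME convention: ${\chi}_{red}^{2}$ is taken as the mean over observables of the squared, error-scaled deviation between the weighted ensemble average and its target, and the effective sample size is taken as the exponential of the relative entropy between the reweighted and initial weights. All function and argument names are illustrative.

```python
import math


def chi2_red(weights, calc, target, sigma):
    """Reduced chi-square of weighted ensemble averages against target values.

    weights: N normalized weights (one per structure).
    calc:    N rows, each holding the m observables computed for one structure.
    target:  m target (experimental) values.
    sigma:   m uncertainties.
    Assumed form: (1/m) * sum_k ((<calc_k>_w - target_k) / sigma_k)**2.
    """
    m = len(target)
    total = 0.0
    for k in range(m):
        average = sum(w * row[k] for w, row in zip(weights, calc))
        total += ((average - target[k]) / sigma[k]) ** 2
    return total / m


def relative_entropy(weights, weights0):
    """Assumed form: S_REL = -sum_i w_i * ln(w_i / w0_i); always <= 0."""
    return -sum(
        w * math.log(w / w0) for w, w0 in zip(weights, weights0) if w > 0.0
    )


def n_eff(weights, weights0):
    """Assumed form: N_eff = exp(S_REL), equal to 1 when nothing is reweighted."""
    return math.exp(relative_entropy(weights, weights0))


# Toy example with four structures and uniform initial weights.
w0 = [0.25, 0.25, 0.25, 0.25]
w = [0.7, 0.1, 0.1, 0.1]
print(n_eff(w0, w0))  # no reweighting
print(n_eff(w, w0))   # weight concentrated on one structure
```

With this definition the effective sample size is a fraction of the original ensemble: it equals 1 when the weights are unchanged and approaches 0 as the weight concentrates on a few structures.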

## 3. Results

$N_{eff}$, ranging from 1 to 0. As the amount of reweighting increases to fit the data more accurately, $N_{eff}$ decreases. In some systems, the shape of the ${\chi}_{red}^{2}$ vs. $N_{eff}$ curve allows for the determination of a critical value beyond which no further reweighting is necessary [26,63]. However, in fitting the CS of ACTR we find a relatively homogeneous decrease of ${\chi}_{red}^{2}$, which makes it difficult to decide when to stop fitting (see Figure 2). A second common procedure is the Wald-Wolfowitz run test [64] for the residuals, where one checks that they are mutually independent. Long stretches of residues whose residuals have the same sign indicate that more reweighting is possible. However, when using BME, decreasing θ decreases the size of the residuals, but not their signs (see Figure S2), i.e., each reweighted CS approaches its target CS from the same side as θ is decreased. This renders the Wald-Wolfowitz run test procedure inadequate.
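The limitation of the run test can be illustrated with a short Python sketch. It counts runs of equal sign in a residual vector and returns the usual normal-approximation z-score of the Wald-Wolfowitz run test. The residual values are invented for illustration, and the function name is hypothetical. For short sequences the normal approximation is crude.

```python
import math


def runs_test_z(residuals):
    """z-score of the Wald-Wolfowitz run test applied to residual signs.

    Strongly negative values mean fewer runs than expected for independent
    residuals, i.e., long stretches with the same sign.
    """
    signs = [r > 0 for r in residuals if r != 0]
    n_pos = sum(signs)
    n_neg = len(signs) - n_pos
    n = n_pos + n_neg
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need residuals of both signs")
    # A new run starts whenever the sign changes between neighbours.
    runs = 1 + sum(a != b for a, b in zip(signs, signs[1:]))
    mean = 2.0 * n_pos * n_neg / n + 1.0
    var = 2.0 * n_pos * n_neg * (2.0 * n_pos * n_neg - n) / (n ** 2 * (n - 1.0))
    if var <= 0.0:
        raise ValueError("too few residuals for the normal approximation")
    return (runs - mean) / math.sqrt(var)


# Invented residuals with three long same-sign stretches.
residuals = [0.8, 0.6, 0.7, 0.5, -0.4, -0.6, -0.5, -0.3, 0.2, 0.4, 0.3, 0.5]
# Shrinking every residual without changing its sign mimics what is
# described above for decreasing theta.
shrunk = [0.1 * r for r in residuals]

print(runs_test_z(residuals))
print(runs_test_z(shrunk))
```

Because the statistic depends only on the signs, shrinking all residuals towards zero leaves it unchanged, so it cannot signal when enough reweighting has been applied.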

1. Split the frames of the trajectory into a training set (t) and a validation set (v) of the same size. We used odd and even frames for the training and validation sets, respectively. We use sets of the same size because we aim to compare average values and distributions, not individual conformations of the validation set. A validation set as large as the training set is therefore needed to minimize the standard error of the mean and the discretization errors of the distribution. We chose interleaved training and validation sets after confirming that the CSs of the two sets are essentially uncorrelated, as frames are sampled only once every 1 ns.
2. Fit the BME for a range of θ values to the training set and apply the optimized Lagrange parameters λ to reweight the validation set.
3. For each of the θ values, evaluate the following properties:
   - ${\chi}_{red}^{2}$ (Equation (2)) of the training (${\chi}_{red}^{2}\left(t\right)$) and the validation set (${\chi}_{red}^{2}\left(v\right)$).
   - Average distance between the training and the validation sets (D(t,v)).
   - Average distance between the training set and the target (goal) distribution (D(t,g)), and average distance between the validation set and the target (goal) distribution (D(v,g)).
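The steps above can be sketched in Python with NumPy on synthetic data. This is an illustration under stated assumptions, not the authors' implementation: the fit minimizes the dual function Γ(λ) = ln Z(λ) + Σ λ_k F_k^target + (θ/2) Σ λ_k² σ_k² that is commonly used for BME with Gaussian errors and uniform initial weights, the damped Newton solver is a choice made here for compactness, and the "chemical shifts" are random numbers. The distances D(t,v), D(t,g) and D(v,g) are omitted because the distance measure is not specified in this text. All names are hypothetical.

```python
import numpy as np


def weights_from_lambdas(calc, lam):
    """Normalized weights proportional to exp(-lam . calc_j) (uniform prior)."""
    log_w = -calc @ lam
    log_w -= log_w.max()  # guard against overflow in exp
    w = np.exp(log_w)
    return w / w.sum()


def gamma(calc, target, sigma, theta, lam):
    """Assumed BME dual function for uniform initial weights."""
    a = -calc @ lam
    log_z = a.max() + np.log(np.mean(np.exp(a - a.max())))
    return log_z + lam @ target + 0.5 * theta * np.sum((sigma * lam) ** 2)


def fit_bme(calc, target, sigma, theta, max_iter=100, tol=1e-6):
    """Minimize gamma with a damped Newton iteration; returns the lambdas."""
    lam = np.zeros(calc.shape[1])
    for _ in range(max_iter):
        w = weights_from_lambdas(calc, lam)
        avg = w @ calc
        grad = target - avg + theta * sigma ** 2 * lam
        if np.linalg.norm(grad) < tol:
            break
        centered = calc - avg
        hess = (centered * w[:, None]).T @ centered + theta * np.diag(sigma ** 2)
        step = np.linalg.solve(hess, grad)
        t, g0 = 1.0, gamma(calc, target, sigma, theta, lam)
        # Halve the step until the dual function does not increase.
        while gamma(calc, target, sigma, theta, lam - t * step) > g0 and t > 1e-8:
            t *= 0.5
        lam = lam - t * step
    return lam


def chi2_red(calc, w, target, sigma):
    return float(np.mean(((w @ calc - target) / sigma) ** 2))


def n_eff(w):
    """exp(S_REL) relative to uniform initial weights."""
    nz = w > 0
    return float(np.exp(-np.sum(w[nz] * np.log(w[nz] * w.size))))


def scan_theta(calc, target, sigma, thetas):
    # Step 1: odd frames (1-based) for training, even frames for validation.
    train, valid = calc[0::2], calc[1::2]
    rows = []
    for theta in thetas:
        lam = fit_bme(train, target, sigma, theta)  # step 2: fit on training
        w_t = weights_from_lambdas(train, lam)
        w_v = weights_from_lambdas(valid, lam)      # step 2: reuse lambdas
        rows.append((theta,                         # step 3: evaluate
                     chi2_red(train, w_t, target, sigma),
                     chi2_red(valid, w_v, target, sigma),
                     n_eff(w_t)))
    return rows


# Synthetic ensemble: 2000 frames, 3 observables (placeholder values).
rng = np.random.default_rng(0)
calc = rng.normal(size=(2000, 3))
target = np.array([0.3, -0.2, 0.1])
sigma = np.full(3, 0.1)

results = scan_theta(calc, target, sigma, thetas=[100.0, 10.0, 1.0, 0.1])
for theta, chi2_t, chi2_v, neff in results:
    print(f"theta={theta:6.1f}  chi2(t)={chi2_t:.4f}  chi2(v)={chi2_v:.4f}  N_eff={neff:.3f}")
```

Reusing the training-set λ values on the validation frames, rather than refitting, is what makes the comparison between ${\chi}_{red}^{2}\left(t\right)$ and ${\chi}_{red}^{2}\left(v\right)$ informative about overfitting.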

$R_{g}$), are sensitive to the global mass distribution of the conformations. For expanded ensembles, an increase of helicity leads to more collapsed ensembles, as helical structures are compact [66,67]. For the a03ws, reweighting leads to an overall decrease of helicity (Figure S10), and therefore to a larger $R_{g}$. For the C36m, the slight increase in helicity does not correlate with the $R_{g}$, presumably because the initial ensemble is already rather compact. As also emphasized by Best and co-workers, both local (such as secondary structure) and global (such as $R_{g}$) properties should be used to characterize IDP ensembles, and one should not expect reweighting based on one set of these properties to improve the other [1]. Using these other properties for cross-validation, as is sometimes done, may give an incorrect impression of the amount of reweighting needed. As an example, the C36m fitting shows a minimum that could suggest that the evolution of $R_{g}$ could be used as a cross-validating property, but the a03ws shows a monotonic increase, for the reasons previously mentioned. If several types of experimental data are available, they should be included in the reweighting to obtain more realistic ensembles with improved local and global properties.

^{−3}. A value of $N_{eff}$ far from 0 shows that the method is not overfitting. We remind the reader that the amount of reweighting is not determined by θ itself but by the Lagrange parameters λ that result from the optimization procedure. The result of this procedure tells us that, whatever our relative confidence in the computed ensemble and the target CS, the reweighting will be essentially zero.

## 4. Discussion

## 5. Conclusions

## Supplementary Materials

**Figure S1**: Distribution of the difference between the ensemble CS and the average target CS for the 2 ensembles discussed in this work. The x-values correspond to a concatenation of C, CA and CB CS. To be able to reweight, the values should be distributed on both sides of y = 0, as is the case.

**Figure S2**: Evolution of the CS difference (residuals) for the reweighting procedure for a range of θ values from 100 (purple values) to 0.1 (yellow values). As the residuals have the same sign for most of the reweighting procedure, one cannot use the Wald-Wolfowitz run test to define the amount of reweighting.

**Figure S3**: Difference in the chemical shifts predicted with Sparta+ and PPM for the a99SBdisp ensemble. The left plots show the value for each residue and the right plots show their distribution for a better visualization of their spread and shift. Atom types are CA (top), C (middle) and CB (bottom).

**Figure S4**: Evolution of ${\chi}_{red}^{2}$ with the effective sample size ($N_{eff}$), showing the lack of an L-shaped curve. The blue and orange lines are the same as in Figure 1. The NE suffix stands for “no-error”. It corresponds to the values where the target CS are calculated with PPM. For the sake of clarity ${\chi}_{red}^{2}$ uses the same error as in the case with errors, even though ${\chi}_{red}^{2}$ is ill-defined in this case and should be regarded as a scaled root-mean-square error.

**Figure S5**: The CS difference between the target CS and the computed CS for the C36m ensemble compared to the difference in helicity for these two ensembles.

**Figure S6**: Behaviour of different quantities during the reweighting procedure of the C36m ensemble.

**Figure S7**: Root mean square of the Lagrange parameters λ during the reweighting procedure for all three ensembles. See Equation (4).

**Figure S8**: Behaviour of different quantities during the reweighting procedure of the a03ws ensemble with no predictive errors (NE suffix). As was done in Figure S3, ${\chi}_{red}^{2}$ uses the same error as in the case with errors, even though ${\chi}_{red}^{2}$ is ill-defined in this case and should be regarded as a scaled root-mean-square error.

**Figure S9**: α-Helical content for the reweighted and target ensembles for the C36m force field. For the reweighted ensembles the train (C36m-t) and validation (C36m-v) sets are shown. The original ensemble before the reweighting is also shown.

**Figure S10**: Evolution of the radius of gyration for different reweightings.

**Figure S11**: Error in the predictor (PPM) with respect to the target chemical shift (Sparta+) for different secondary structure elements of the a99SBdisp ensemble. For atoms CA and C there is a systematic underestimation of the chemical shift, whereas for CB there is a systematic overestimation. The errors also depend on the type of secondary structure. The codes are the following: ‘C’: Loops and irregular elements, ‘B’: Residue in isolated beta-bridge, ‘E’: Extended strand, participates in beta ladder, ‘G’: 3-helix (3/10 helix), ‘H’: Alpha helix, ‘I’: 5-helix (pi helix), ‘S’: Bend, and ‘T’: Hydrogen-bonded turn, as determined by the DSSP algorithm implemented in MDTraj.

**Figure S12**: Behaviour of different quantities during the reweighting procedure of the a99SBdisp ensemble. The quantities defined as thin lines would not be measurable in a real case scenario, but their behaviour can be inferred from the quantities in thick lines.

**Figure S13**: Behaviour of different quantities during the reweighting procedure of the a99SBdisp ensemble using secondary chemical shifts. The quantities defined as thin lines would not be measurable in a real case scenario, but their behaviour can be inferred from the quantities in thick lines. Note that the ${\chi}_{red}^{2}$ values are very small for all θ values. D(t,g) and D(t,v) have been scaled by 1/20 so that the shape of the curves can be seen.

**Figure S14**: α-Helical content for the reweighted and target ensembles using secondary chemical shifts. For the reweighted ensembles the train (a99SBdisp-t) and validation (a99SBdisp-v) sets are shown. The original ensemble before the reweighting is also shown and, as expected, it corresponds exactly to the target ensemble as they are the same.

**Figure S15**: Evolution of the effective sample size ($N_{eff}$) with θ for the secondary chemical shift fitting of a99SBdisp ensemble to the a99SBdisp target ensemble.

## Author Contributions

## Funding

## Acknowledgments

## Conflicts of Interest

## References

1. Zerze, G.H.; Zheng, W.; Best, R.B.; Mittal, J. Evolution of All-atom Protein Force Fields to Improve Local and Global Properties. J. Phys. Chem. Lett. **2019**, 10, 2227–2234.
2. Vitalis, A.; Pappu, R.V. ABSINTH: A new continuum solvation model for simulations of polypeptides in aqueous solutions. J. Comput. Chem. **2009**, 30, 673–699.
3. Krzeminski, M.; Marsh, J.A.; Neale, C.; Choy, W.-Y.; Forman-Kay, J.D. Characterization of disordered proteins with ENSEMBLE. Bioinform. Oxf. Engl. **2013**, 29, 398–399.
4. Ozenne, V.; Bauer, F.; Salmon, L.; Huang, J.-R.; Jensen, M.R.; Segard, S.; Bernadó, P.; Charavay, C.; Blackledge, M. Flexible-meccano: A tool for the generation of explicit ensemble descriptions of intrinsically disordered proteins and their associated experimental observables. Bioinformatics **2012**, 28, 1463–1470.
5. Estaña, A.; Sibille, N.; Delaforge, E.; Vaisset, M.; Cortés, J.; Bernadó, P. Realistic Ensemble Models of Intrinsically Disordered Proteins Using a Structure-Encoding Coil Database. Structure **2019**, 27, 381–391.e2.
6. Best, R.B.; Zheng, W.; Mittal, J. Balanced Protein-Water Interactions Improve Properties of Disordered Proteins and Non-Specific Protein Association. J. Chem. Theory Comput. **2014**, 10, 5113–5124.
7. Best, R.B. Computational and theoretical advances in studies of intrinsically disordered proteins. Curr. Opin. Struct. Biol. **2017**, 42, 147–154.
8. Anandakrishnan, R.; Izadi, S.; Onufriev, A.V. Why Computed Protein Folding Landscapes Are Sensitive to the Water Model. J. Chem. Theory Comput. **2019**, 15, 625–636.
9. Piana, S.; Donchev, A.G.; Robustelli, P.; Shaw, D.E. Water dispersion interactions strongly influence simulated structural properties of disordered protein States. J. Phys. Chem. B **2015**, 119, 5113–5123.
10. Shabane, P.S.; Izadi, S.; Onufriev, A.V. General Purpose Water Model Can Improve Atomistic Simulations of Intrinsically Disordered Proteins. J. Chem. Theory Comput. **2019**, 15, 2620–2634.
11. Bonomi, M.; Heller, G.T.; Camilloni, C.; Vendruscolo, M. Principles of protein structural ensemble determination. Curr. Opin. Struct. Biol. **2017**, 42, 106–116.
12. Köfinger, J.; Różycki, B.; Hummer, G. Inferring Structural Ensembles of Flexible and Dynamic Macromolecules Using Bayesian, Maximum Entropy, and Minimal-Ensemble Refinement Methods. In Biomolecular Simulations: Methods and Protocols; Bonomi, M., Camilloni, C., Eds.; Methods in Molecular Biology; Springer: New York, NY, USA, 2019; pp. 341–352. ISBN 978-1-4939-9608-7.
13. Ravera, E.; Sgheri, L.; Parigi, G.; Luchinat, C. A critical assessment of methods to recover information from averaged data. Phys. Chem. Chem. Phys. **2016**, 18, 5686–5701.
14. Schneidman-Duhovny, D.; Pellarin, R.; Sali, A. Uncertainty in integrative structural modeling. Curr. Opin. Struct. Biol. **2014**, 28, 96–104.
15. Fenwick, R.B.; Esteban-Martín, S.; Salvatella, X. Influence of Experimental Uncertainties on the Properties of Ensembles Derived from NMR Residual Dipolar Couplings. J. Phys. Chem. Lett. **2010**, 1, 3438–3441.
16. Schröder, G.F. Hybrid methods for macromolecular structure determination: Experiment with expectations. Curr. Opin. Struct. Biol. **2015**, 31, 20–27.
17. Bottaro, S.; Lindorff-Larsen, K. Biophysical experiments and biomolecular simulations: A perfect match? Science **2018**, 361, 355–360.
18. Jensen, M.R.; Zweckstetter, M.; Huang, J.-R.; Blackledge, M. Exploring Free-Energy Landscapes of Intrinsically Disordered Proteins at Atomic Resolution Using NMR Spectroscopy. Chem. Rev. **2014**.
19. Kashtanov, S.; Borcherds, W.; Wu, H.; Daughdrill, G.W.; Ytreberg, F.M. Using Chemical Shifts to Assess Transient Secondary Structure and Generate Ensemble Structures of Intrinsically Disordered Proteins. In Intrinsically Disordered Protein Analysis: Volume 1, Methods and Experimental Tools; Uversky, V.N., Dunker, A.K., Eds.; Methods in Molecular Biology; Humana Press: Totowa, NJ, USA, 2012; pp. 139–152. ISBN 978-1-61779-927-3.
20. Kjaergaard, M.; Poulsen, F.M. Disordered proteins studied by chemical shifts. Prog. Nucl. Magn. Reson. Spectrosc. **2012**, 60, 42–51.
21. Kragelj, J.; Ozenne, V.; Blackledge, M.; Jensen, M.R. Conformational Propensities of Intrinsically Disordered Proteins from NMR Chemical Shifts. ChemPhysChem **2013**, 14, 3034–3045.
22. Jensen, M.R.; Salmon, L.; Nodet, G.; Blackledge, M. Defining Conformational Ensembles of Intrinsically Disordered and Partially Folded Proteins Directly from Chemical Shifts. J. Am. Chem. Soc. **2010**, 132, 1270–1272.
23. Mantsyzov, A.B.; Shen, Y.; Lee, J.H.; Hummer, G.; Bax, A. MERA: A webserver for evaluating backbone torsion angle distributions in dynamic and disordered proteins from NMR data. J. Biomol. NMR **2015**, 63, 85–95.
24. Cesari, A.; Reißer, S.; Bussi, G. Using the Maximum Entropy Principle to Combine Simulations and Solution Experiments. Computation **2018**, 6, 15.
25. Hummer, G.; Köfinger, J. Bayesian ensemble refinement by replica simulations and reweighting. J. Chem. Phys. **2015**, 143, 243150.
26. Bottaro, S.; Bengtsen, T.; Lindorff-Larsen, K. Integrating Molecular Simulation and Experimental Data: A Bayesian/Maximum Entropy reweighting approach. bioRxiv **2018**, 457952.
27. Escobedo, A.; Topal, B.; Kunze, M.B.A.; Aranda, J.; Chiesa, G.; Mungianu, D.; Bernardo-Seisdedos, G.; Eftekharzadeh, B.; Gairí, M.; Pierattelli, R.; et al. Side chain to main chain hydrogen bonds stabilize a polyglutamine helix in a transcription factor. Nat. Commun. **2019**, 10, 2034.
28. Fisher, C.K.; Ullman, O.; Stultz, C.M. Efficient construction of disordered protein ensembles in a bayesian framework with optimal selection of conformations. Pac. Symp. Biocomput. **2012**, 82–93.
29. Fisher, C.K.; Stultz, C.M. Constructing ensembles for intrinsically disordered proteins. Curr. Opin. Struct. Biol. **2011**, 21, 426–431.
30. Fisher, C.K.; Huang, A.; Stultz, C.M. Modeling intrinsically disordered proteins with bayesian statistics. J. Am. Chem. Soc. **2010**, 132, 14919–14927.
31. Bratholm, L.A.; Christensen, A.S.; Hamelryck, T.; Jensen, J.H. Bayesian inference of protein structure from chemical shift data. PeerJ **2015**, 3, e861.
32. Potrzebowski, W.; Trewhella, J.; Andre, I. Bayesian inference of protein conformational ensembles from limited structural data. PLOS Comput. Biol. **2018**, 14, e1006641.
33. Bonomi, M.; Camilloni, C.; Cavalli, A.; Vendruscolo, M. Metainference: A Bayesian inference method for heterogeneous systems. Sci. Adv. **2016**, 2, e1501177.
34. Iešmantavičius, V.; Jensen, M.R.; Ozenne, V.; Blackledge, M.; Poulsen, F.M.; Kjaergaard, M. Modulation of the Intrinsic Helix Propensity of an Intrinsically Disordered Protein Reveals Long-Range Helix–Helix Interactions. J. Am. Chem. Soc. **2013**, 135, 10155–10163.
35. Robustelli, P.; Piana, S.; Shaw, D.E. Developing a molecular dynamics force field for both folded and disordered protein states. Proc. Natl. Acad. Sci. USA **2018**, 115, E4758–E4766.
36. Robustelli, P.; Cavalli, A.; Dobson, C.M.; Vendruscolo, M.; Salvatella, X. Folding of Small Proteins by Monte Carlo Simulations with Chemical Shift Restraints without the Use of Molecular Fragment Replacement or Structural Homology. J. Phys. Chem. B **2009**, 113, 7890–7896.
37. Esteban-Martín, S.; Fenwick, R.B.; Ådén, J.; Cossins, B.; Bertoncini, C.W.; Guallar, V.; Wolf-Watz, M.; Salvatella, X. Correlated inter-domain motions in adenylate kinase. PLoS Comput. Biol. **2014**, 10, e1003721.
38. De Simone, A.; Richter, B.; Salvatella, X.; Vendruscolo, M. Toward an Accurate Determination of Free Energy Landscapes in Solution States of Proteins. J. Am. Chem. Soc. **2009**, 131, 3810–3811.
39. Schneider, T.R.; Brünger, A.T.; Nilges, M. Influence of internal dynamics on accuracy of protein NMR structures: Derivation of realistic model distance data from a long molecular dynamics trajectory. J. Mol. Biol. **1999**, 285, 727–740.
40. Lindorff-Larsen, K.; Ferkinghoff-Borg, J. Similarity measures for protein ensembles. PLoS ONE **2009**, 4, e4203.
41. Camilloni, C.; Robustelli, P.; Simone, A.D.; Cavalli, A.; Vendruscolo, M. Characterization of the Conformational Equilibrium between the Two Major Substates of RNase A Using NMR Chemical Shifts. J. Am. Chem. Soc. **2012**, 134, 3968–3971.
42. Lou, H.; Cukier, R.I. Reweighting ensemble probabilities with experimental histogram data constraints using a maximum entropy principle. J. Chem. Phys. **2018**, 149, 234106.
43. White, A.D.; Dama, J.F.; Voth, G.A. Designing Free Energy Surfaces That Match Experimental Data with Metadynamics. J. Chem. Theory Comput. **2015**, 11, 2451–2460.
44. Marinelli, F.; Faraldo-Gómez, J.D. Ensemble-Biased Metadynamics: A Molecular Simulation Method to Sample Experimental Distributions. Biophys. J. **2015**, 108, 2779–2782.
45. Huang, J.; Rauscher, S.; Nawrocki, G.; Ran, T.; Feig, M.; de Groot, B.L.; Grubmüller, H.; MacKerell, A.D. CHARMM36m: An improved force field for folded and intrinsically disordered proteins. Nat. Methods **2016**, 14, 71–73.
46. Marsh, J.A.; Singh, V.K.; Jia, Z.; Forman-Kay, J.D. Sensitivity of secondary structure propensities to sequence differences between α- and γ-synuclein: Implications for fibrillation. Protein Sci. **2006**, 15, 2795–2804.
47. Camilloni, C.; De Simone, A.; Vranken, W.F.; Vendruscolo, M. Determination of Secondary Structure Populations in Disordered States of Proteins Using Nuclear Magnetic Resonance Chemical Shifts. Biochemistry **2012**, 51, 2224–2231.
48. Shen, Y.; Bax, A. SPARTA+: A modest improvement in empirical NMR chemical shift prediction by means of an artificial neural network. J. Biomol. NMR **2010**, 48, 13–22.
49. Li, D.-W.; Brüschweiler, R. PPM: A side-chain and backbone chemical shift predictor for the assessment of protein conformational ensembles. J. Biomol. NMR **2012**, 54, 257–265.
50. Kish, L. Survey Sampling; John Wiley & Sons, Inc.: New York, NY, USA, 1965.
51. Pitera, J.W.; Chodera, J.D. On the Use of Experimental Observations to Bias Simulated Ensembles. J. Chem. Theory Comput. **2012**, 8, 3445–3451.
52. Weare, J. On the statistical equivalence of restrained-ensemble simulations with the maximum entropy method. J. Chem. Phys. **2013**, 138, 084107.
53. Beauchamp, K.A.; Pande, V.S.; Das, R. Bayesian energy landscape tilting: Towards concordant models of molecular ensembles. Biophys. J. **2014**, 106, 1381–1390.
54. Sanchez-Martinez, M.; Crehuet, R. Application of the maximum entropy principle to determine ensembles of intrinsically disordered proteins from residual dipolar couplings. Phys. Chem. Chem. Phys. PCCP **2014**, 16, 26030–26039.
55. Jones, E.; Oliphant, T.; Peterson, P. SciPy: Open Source Scientific Tools for Python. 2001. Available online: https://www.scipy.org/citing.html#scipy-the-library (accessed on 16 September 2019).
56. Kabsch, W.; Sander, C. Dictionary of protein secondary structure: Pattern recognition of hydrogen-bonded and geometrical features. Biopolymers **1983**, 22, 2577–2637.
57. McGibbon, R.T.; Beauchamp, K.A.; Harrigan, M.P.; Klein, C.; Swails, J.M.; Hernández, C.X.; Schwantes, C.R.; Wang, L.P.; Lane, T.J.; Pande, V.S. MDTraj: A Modern Open Library for the Analysis of Molecular Dynamics Trajectories. Biophys. J. **2015**, 109, 1528–1532.
58. Bonomi, M.; Camilloni, C. Integrative structural and dynamical biology with PLUMED-ISDB. Bioinformatics **2017**, 33, 3999–4000.
59. Marsh, J.A.; Forman-Kay, J.D. Ensemble modeling of protein disordered states: Experimental restraint contributions and validation. Proteins **2012**, 80, 556–572.
60. Richter, B.; Gsponer, J.; Várnai, P.; Salvatella, X.; Vendruscolo, M. The MUMO (minimal under-restraining minimal over-restraining) method for the determination of native state ensembles of proteins. J. Biomol. NMR **2007**, 37, 117–135.
61. Rangan, R.; Bonomi, M.; Heller, G.T.; Cesari, A.; Bussi, G.; Vendruscolo, M. Determination of Structural Ensembles of Proteins: Restraining vs. Reweighting. J. Chem. Theory Comput. **2018**, 14, 6632–6641.
62. Olsson, S.; Vögeli, B.R.; Cavalli, A.; Boomsma, W.; Ferkinghoff-Borg, J.; Lindorff-Larsen, K.; Hamelryck, T. Probabilistic Determination of Native State Ensembles of Proteins. J. Chem. Theory Comput. **2014**, 10, 3484–3491.
63. Hansen, P.; O’Leary, D. The Use of the L-Curve in the Regularization of Discrete Ill-Posed Problems. SIAM J. Sci. Comput. **1993**, 14, 1487–1503.
64. Wald, A.; Wolfowitz, J. On a test whether two samples are from the same population. Ann. Math. Stat. **1940**, 11, 147–162.
65. Cesari, A.; Bottaro, S.; Lindorff-Larsen, K.; Banáš, P.; Šponer, J.; Bussi, G. Fitting Corrections to an RNA Force Field Using Experimental Data. J. Chem. Theory Comput. **2019**, 15, 3425–3431.
66. Piana, S.; Lindorff-Larsen, K.; Dirks, R.M.; Salmon, J.K.; Dror, R.O.; Shaw, D.E. Evaluating the Effects of Cutoffs and Treatment of Long-range Electrostatics in Protein Folding Simulations. PLoS ONE **2012**, 7, e39918.
67. Tian, C.; Kasavajhala, K.; Belfon, K.; Raguette, L.; Huang, H.; Migues, A.; Bickel, J.; Wang, Y.; Pincay, J.; Wu, Q.; et al. ff19SB: Amino-Acid Specific Protein Backbone Parameters Trained Against Quantum Mechanics Energy Surfaces in Solution. ChemRxiv **2019**.
68. Boomsma, W.; Ferkinghoff-Borg, J.; Lindorff-Larsen, K. Combining Experiments and Simulations Using the Maximum Entropy Principle. PLoS Comput. Biol. **2014**, 10, e1003406.
69. Tamiola, K.; Acar, B.; Mulder, F. Sequence-specific random coil chemical shifts of intrinsically disordered proteins. J. Am. Chem. Soc. **2010**, 132, 18000–18003.
70. De Simone, A.; Cavalli, A.; Hsu, S.-T.D.; Vranken, W.; Vendruscolo, M. Accurate random coil chemical shifts from an analysis of loop regions in native states of proteins. J. Am. Chem. Soc. **2009**, 131, 16332–16333.
71. Nielsen, J.T.; Mulder, F.A.A. POTENCI: Prediction of temperature, neighbor and pH-corrected chemical shifts for intrinsically disordered proteins. J. Biomol. NMR **2018**, 70, 141–165.
72. Kjaergaard, M.; Brander, S.; Poulsen, F.M. Random coil chemical shift for intrinsically disordered proteins: Effects of temperature and pH. J. Biomol. NMR **2011**, 49, 139–149.
73. Kjaergaard, M.; Poulsen, F.M. Sequence correction of random coil chemical shifts: Correlation between neighbor correction factors and changes in the Ramachandran distribution. J. Biomol. NMR **2011**, 50, 157–165.
74. Modig, K.; Jürgensen, V.W.; Lindorff-Larsen, K.; Fieber, W.; Bohr, H.G.; Poulsen, F.M. Detection of initiation sites in protein folding of the four helix bundle ACBP by chemical shift analysis. FEBS Lett. **2007**, 581, 4965–4971.
75. Haxholm, G.W.; Nikolajsen, L.F.; Olsen, J.G.; Fredsted, J.; Larsen, F.H.; Goffin, V.; Pedersen, S.F.; Brooks, A.J.; Waters, M.J.; Kragelund, B.B. Intrinsically disordered cytoplasmic domains of two cytokine receptors mediate conserved interactions with membranes. Biochem. J. **2015**, 468, 495–506.

**Figure 1.** Amount of helical content in the ACTR trajectories simulated using different force fields. The a99SBdisp is taken as the target ensemble and the C36m and a03ws as the simulated trajectories to reweight.

**Figure 2.** Evolution of ${\chi}_{red}^{2}$ with the effective sample size ($N_{eff}$), showing the lack of an L-shaped curve. The a99SBdisp is taken as the target ensemble and the C36m and a03ws as the simulated trajectories to reweight.

**Figure 3.** CS difference between the target CS and the a03ws CS for the three atoms used (C, CA, and CB) compared to the difference in helicity for these two ensembles (in red). Although for many residues the difference lies below the error of the predictor, for some regions it does not. Besides, the helicity is correlated with the CA and the CB CS difference.

**Figure 4.** Behaviour of different quantities during the reweighting procedure of the a03ws ensemble. The quantities defined as thin lines would not be measurable in a real case scenario, but their behaviour can be inferred from the quantities in thick lines.

**Figure 5.** α-Helical content for the reweighted and target ensembles. For the reweighted ensembles the train (a03ws-t) and validation (a03ws-v) sets are shown. The original ensemble before the reweighting is also shown.

**Figure 6.** α-Helical content for the reweighted and target ensembles. For the reweighted ensembles the train (a99SBdisp-t) and validation (a99SBdisp-v) sets are shown. The original ensemble before the reweighting is also shown and, as expected, it corresponds exactly to the target ensemble as they are the same.

© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Crehuet, R.; Buigues, P.J.; Salvatella, X.; Lindorff-Larsen, K.
Bayesian-Maximum-Entropy Reweighting of IDP Ensembles Based on NMR Chemical Shifts. *Entropy* **2019**, *21*, 898.
https://doi.org/10.3390/e21090898
