Bayesian Regression Quantifies Uncertainty of Binding Parameters from Isothermal Titration Calorimetry More Accurately Than Error Propagation

La, Van N. T.; Minh, David D. L.

doi:10.3390/ijms242015074


by Van N. T. La ¹ and David D. L. Minh ²,*

¹ Department of Biology, Illinois Institute of Technology, Chicago, IL 60616, USA

² Department of Chemistry, Illinois Institute of Technology, Chicago, IL 60616, USA

* Author to whom correspondence should be addressed.

Int. J. Mol. Sci. 2023, 24(20), 15074; https://doi.org/10.3390/ijms242015074

Submission received: 25 September 2023 / Revised: 4 October 2023 / Accepted: 7 October 2023 / Published: 11 October 2023

(This article belongs to the Section Molecular Biophysics)


Abstract

We compare several methods for quantifying the uncertainty of binding parameters estimated from isothermal titration calorimetry data: the asymptotic standard error from maximum likelihood estimation, error propagation based on a first-order Taylor series expansion, and the Bayesian credible interval. When the methods are applied to simulated experiments and to measurements of Mg(II) binding to EDTA, the asymptotic standard error underestimates the uncertainty in the free energy and enthalpy of binding. Error propagation overestimates the uncertainty for both quantities, except in the simulations, where it underestimates the uncertainty of the enthalpy for confidence levels below 70%. In both datasets, Bayesian credible intervals are much closer to observed confidence intervals.

Keywords: Isothermal Titration Calorimetry (ITC); Bayesian Credible Interval (BCI); Confidence Interval (CI); Asymptotic Standard Error (ASE); Maximum Likelihood Estimation (MLE); Error Propagation (EP)

1. Introduction

Isothermal titration calorimetry (ITC) is widely used to characterize binding processes involving biomolecules, including proteins [1], small organic molecules [2], DNA/RNA [3,4], and lipids [5]. ITC data are routinely analyzed to estimate thermodynamic parameters, namely the Gibbs free energy $\Delta G$ and the enthalpy $\Delta H$, for simple binding processes. Based on the relation $\Delta G = \Delta H - T \Delta S$, which includes the temperature $T$, the entropy $\Delta S$ may also be obtained. These parameters have often been estimated using what we will refer to as the standard procedure: a nonlinear least squares regression method implemented in the Origin software package that is distributed with the MicroCal VP-ITC instrument and its successors. The software yields a maximum likelihood estimate (MLE) of the parameters and an asymptotic standard error (ASE). Unfortunately, the ASE underestimates the uncertainty by as much as an order of magnitude [6].
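As a small worked illustration of the relation $\Delta G = \Delta H - T \Delta S$, the sketch below solves for the entropy of binding. The values of $\Delta G$ and $\Delta H$ are the ones used for the simulations in this article; the temperature of 298.15 K and the function name `binding_entropy` are assumptions made only for this example.

```python
# Sketch: entropy of binding from Delta G = Delta H - T * Delta S.
# The temperature (298.15 K) is an assumed value, not taken from the article.

def binding_entropy(delta_g, delta_h, temperature):
    """Return Delta S in cal/(mol K), given Delta G and Delta H in kcal/mol."""
    t_delta_s = delta_h - delta_g            # T * Delta S in kcal/mol
    return 1000.0 * t_delta_s / temperature  # convert kcal to cal

# Delta G = -10 kcal/mol and Delta H = -5 kcal/mol (the simulation settings)
delta_s = binding_entropy(delta_g=-10.0, delta_h=-5.0, temperature=298.15)
```

With these inputs, $T \Delta S$ is 5 kcal/mol, so the binding in this example is favored by both enthalpy and entropy.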

The severe underestimation of uncertainty is mainly a consequence of ignoring the error in the titrant concentration. In the standard procedure, the error in the titrand concentration (in the sample cell) is handled by assigning the stoichiometry n as a free parameter. On the other hand, the titrant concentration (in the syringe) is treated as a constant. This assumption is made because ITC data can only be used to estimate the ratio of the titrant:titrand concentrations as opposed to individual values [7]. However, it is a poor assumption, because large errors (10–20%) in titrant concentration between laboratories have been observed [6].

In 2015, Boyce et al. suggested that the uncertainty of the standard procedure could be adjusted based on error propagation [8]. Specifically, based on a first-order Taylor expansion, errors in $K_a$, $\Delta H$, and the site parameter $n$ may be corrected by the relative error of the titrant concentration. However, they did not show that the resulting uncertainty estimate leads to accurate confidence intervals. Confidence intervals are accurate when the X% confidence interval includes the true value X% of the time, where X is a confidence level. In 2018, Nguyen et al. described the analysis of ITC data with Bayesian regression. They found that Bayesian credible intervals (BCIs), regions that contain a specified percentage of the Bayesian posterior, are more accurate confidence intervals than those based on the ASE [9]. However, they did not compare BCIs to confidence intervals based on the ASE augmented with error propagation. The purpose of this short manuscript is to address this oversight.

2. Results and Discussion

2.1. Bayesian Posteriors Are Converged

As in Nguyen et al. [9], the Markov chain Monte Carlo protocol leads to converged BCIs. In a representative run for one of the 1000 simulated integrated heat curves (Figure S1), fewer than 10% of samples are required before estimated percentiles of the posterior density are stable (Figure S2). Comparable convergence behavior is observed when sampling the Bayesian posterior for the ITC experiments (Figure S3).

2.2. Error Propagation Expands Confidence Intervals to Be Larger Than Bayesian Credible Intervals

For each of the 14 experiments, 95% CIs of $\Delta G$ and $\Delta H$ are shown in Figure 1. Because the true values of the parameters are unknown, the median estimate is shown as a proxy. Panels (a) and (b) reproduce Figure 6 from Nguyen et al. [9]. The 95% CIs of $\Delta G$ encompass the median value in nearly every experiment. For CIs based on the ASE, 95% CIs of $\Delta H$ are too small. Panel (c) shows that error propagation increases CIs to encompass the median, but the CIs appear to be larger than necessary.

2.3. Even with Error Propagation, BCIs Provide More Accurate CIs Than the ASE

The accuracy of CIs was more carefully assessed by coverage plots, in which predicted confidence intervals are plotted against the percentage of BCIs and CIs that contain the true values of $\Delta G$ and $\Delta H$. For accurate CIs, points should lie along the diagonal. If points are below the diagonal, CIs are underestimated. If points are above the diagonal, CIs are overestimated.
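The coverage calculation behind such a plot can be sketched as follows. This is a minimal illustration, not the analysis code of this study: the parameter value, its scatter, the number of replicates, and the helper names `coverage` and `normal_interval` are all hypothetical. It shows how a standard error that is too small leads to observed coverage far below the nominal level.

```python
import random
from statistics import NormalDist

def coverage(intervals, true_value):
    """Percentage of (lower, upper) intervals that contain true_value."""
    hits = sum(1 for lo, hi in intervals if lo <= true_value <= hi)
    return 100.0 * hits / len(intervals)

def normal_interval(estimate, std_err, level):
    """Central level-% interval of a normal distribution around the estimate."""
    z = NormalDist().inv_cdf((100.0 + level) / 200.0)
    return estimate - z * std_err, estimate + z * std_err

rng = random.Random(0)
true_value, true_sd = -5.0, 0.5   # hypothetical parameter and scatter
estimates = [rng.gauss(true_value, true_sd) for _ in range(5000)]

# Honest standard error: observed coverage should be close to the nominal 95%.
honest = coverage([normal_interval(e, true_sd, 95) for e in estimates],
                  true_value)
# Standard error three times too small: coverage falls well below 95%.
too_small = coverage([normal_interval(e, true_sd / 3, 95) for e in estimates],
                     true_value)
```

Repeating the calculation over a grid of confidence levels and plotting nominal against observed coverage would give one curve of a coverage plot.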

Coverage plots were generated for 1000 simulations with high error (Figure 2) and low error (Figure S4) and for 14 experiments (Figure 3). In all of the coverage plots, Bayesian credible intervals are closest to the diagonal. As expected, the ASE consistently underestimates confidence intervals. For $\Delta G$, the ASE with error propagation overestimates confidence intervals for nearly all confidence levels. For $\Delta H$, the story is more subtle. In the simulations at both high and low error, confidence intervals from the ASE with error propagation are underestimated for confidence levels less than 70% but somewhat overestimated for higher confidence levels. In the experiments, they are overestimated for confidence levels greater than 30%.

3. Materials and Methods

3.1. Integrated Heat Data

Our data are integrated heat curves, $D \equiv \{q_1, q_2, \dots, q_N\}$, where $q_n$ is the integrated heat of injection $n$. We analyzed simulations as well as ITC experiments that were previously described [9].

Simulations are useful because it is inexpensive to collect large amounts of data and because thermodynamic parameters are known exactly. Simulations of simple 1:1 binding were performed in a similar way as in Nguyen et al. [9]. A total of 1000 integrated heat curves with 24 injections each were modeled based on the free energy of binding $\Delta G$, the enthalpy of binding $\Delta H$, the enthalpy of dilution and stirring per injection $\Delta H_0$, the concentration of receptor (titrand) $[R]_0$, the concentration of ligand (titrant) $[L]_s$, and the standard deviation of the measurement error $\sigma$. The thermodynamic parameters and the enthalpy of dilution and stirring were fixed at $\Delta G$ = −10 kcal/mol, $\Delta H$ = −5 kcal/mol, and $\Delta H_0$ = 0.5 μcal; $[R]_0$ and $[L]_s$ were sampled from lognormal distributions with mean values of 0.1 and 1.0 mM, respectively. Based on the uncertainty of 10% observed by Myszka et al. [6], the concentration uncertainty was set at either small (5% of the mean) or large (10% of the mean). Measurement error was modeled as normally distributed with zero mean and a standard deviation of $\sigma$ = 1 μcal.
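The simulation procedure can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the code used in this study: the cell volume (1.4 mL), injection volume (12 μL), temperature (298.15 K), the specific form of the 1:1 binding model, the moment-matched lognormal parametrization, and all function names are hypothetical choices; the model actually used is the one described in [9].

```python
import math
import random

R_GAS = 1.987204e-3  # gas constant in kcal/(mol K)

def lognormal_draw(rng, mean, rel_sd):
    """Draw from a lognormal with the given mean and relative std. deviation."""
    var_log = math.log(1.0 + rel_sd ** 2)
    mu_log = math.log(mean) - 0.5 * var_log
    return rng.lognormvariate(mu_log, math.sqrt(var_log))

def heats_1to1(dG, dH, dH0, R0, Ls, V0=1.4e-3, dV=12e-6, n_inj=24, T=298.15):
    """Noise-free injection heats (microcal) for an assumed 1:1 binding model.

    dG, dH in kcal/mol; dH0 in microcal; R0, Ls in mol/L; V0, dV in L.
    """
    Kd = math.exp(dG / (R_GAS * T))   # dissociation constant in mol/L
    dilution = 1.0 - dV / V0          # fraction of cell contents kept per injection
    kept, complex_prev, heats = 1.0, 0.0, []
    for _ in range(n_inj):
        kept *= dilution
        R_tot = R0 * kept             # total receptor concentration in the cell
        L_tot = Ls * (1.0 - kept)     # total ligand concentration in the cell
        b = R_tot + L_tot + Kd
        complex_now = 0.5 * (b - math.sqrt(b * b - 4.0 * R_tot * L_tot))
        # heat of newly formed complex (kcal -> microcal) plus dilution heat
        q = dH * V0 * (complex_now - dilution * complex_prev) * 1e9 + dH0
        heats.append(q)
        complex_prev = complex_now
    return heats

def simulate_curve(rng, rel_sd=0.10, sigma=1.0):
    """One noisy integrated heat curve with randomly perturbed concentrations."""
    R0 = lognormal_draw(rng, 0.1e-3, rel_sd)   # receptor, mean 0.1 mM
    Ls = lognormal_draw(rng, 1.0e-3, rel_sd)   # ligand, mean 1.0 mM
    clean = heats_1to1(dG=-10.0, dH=-5.0, dH0=0.5, R0=R0, Ls=Ls)
    return [q + rng.gauss(0.0, sigma) for q in clean]

rng = random.Random(1)
curves = [simulate_curve(rng) for _ in range(1000)]
```

Setting `rel_sd=0.05` would correspond to the low-error condition. Because the true $\Delta G$ and $\Delta H$ are fixed, every interval estimated from these curves can be checked against known values.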

We also analyzed 14 integrated heat curves from previously performed experiments in which MgCl$_2$ was titrated into a sample cell containing EDTA in a MicroCal VP-ITC calorimeter [9].

3.2. Regression

Data were analyzed via Bayesian regression and maximum likelihood estimation. In both procedures, integrated heat curves for simple 1:1 binding were modeled as previously described [9]. They are functions of the aforementioned parameters,

$$\theta \equiv (\Delta G, \Delta H, \Delta H_0, [R]_0, [L]_s, \sigma). \tag{1}$$

The observed injection heat $q_n$ was treated as normally distributed about the true heat $q_n^*(\theta)$,

$$q_n \sim \mathcal{N}(q_n^*(\theta), \sigma^2). \tag{2}$$

Thus, the likelihood function of an integrated heat curve $D \equiv \{q_1, q_2, \dots, q_N\}$ is

$$p(D \mid \theta) = \frac{1}{(2\pi)^{N/2} \sigma^N} \exp\left[-\frac{1}{2\sigma^2} \sum_{n=1}^{N} \left(q_n - q_n^*(\theta)\right)^2\right]. \tag{3}$$
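In practice the logarithm of the likelihood is evaluated, because the product of many small densities underflows easily. The sketch below computes the log of the Gaussian likelihood for given observed and model heats; the function name `log_likelihood` and the three-injection data are hypothetical.

```python
import math

def log_likelihood(q_obs, q_model, sigma):
    """Log of the Gaussian likelihood of observed heats given model heats."""
    n = len(q_obs)
    sum_sq = sum((obs - mod) ** 2 for obs, mod in zip(q_obs, q_model))
    return (-0.5 * n * math.log(2.0 * math.pi)
            - n * math.log(sigma)
            - sum_sq / (2.0 * sigma ** 2))

# Hypothetical three-injection example (heats in microcal)
q_obs = [-59.0, -40.5, -3.2]
q_model = [-60.0, -40.0, -3.0]
value = log_likelihood(q_obs, q_model, sigma=1.0)
```

Here `q_model` stands for the true heats $q_n^*(\theta)$ predicted by the binding model for a given parameter vector.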

3.2.1. Bayesian Regression

For the Bayesian regression, the priors of the parameters were independent, such that $p(\theta) = \prod_i p(\theta_i)$. As in Nguyen et al. [9], uniform priors were used for $\Delta G$, $\Delta H$, and $\Delta H_0$. Lognormal priors were used for the concentrations of the ligand and the receptor,

$$\ln [X]_0 \sim LN\left([X]_0, (\delta [X]_0)^2\right), \tag{4}$$

where $[X]_0 \in \{[R]_0, [L]_s\}$ is the stated value of each quantity; $\delta$ was assumed to be either 5% or 10%. The uninformative Jeffreys prior was used for $\sigma$ [10]:

$$p(\sigma) \propto \frac{\sigma_0}{\sigma}, \tag{5}$$

where $\sigma_0 = 1$ cal. For this model, the posterior probability density is

$$p(\theta \mid D) \propto p(D \mid \theta)\, p(\theta). \tag{6}$$
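The way the independent priors combine with the likelihood can be sketched as an unnormalized log posterior. Several choices here are assumptions for illustration only: the uniform bounds, the reading of the lognormal prior as having a mean equal to the stated concentration and a standard deviation of $\delta$ times that value, and all function names. The likelihood is passed in as a callable, so the sketch does not depend on any particular binding model.

```python
import math

def log_uniform(x, low, high):
    """Log density of a uniform prior on [low, high]."""
    return -math.log(high - low) if low <= x <= high else -math.inf

def log_lognormal(x, stated, delta):
    """Log density of a lognormal prior with mean `stated` and standard
    deviation `delta * stated` (an assumed parametrization)."""
    if x <= 0.0:
        return -math.inf
    var_log = math.log(1.0 + delta ** 2)
    mu_log = math.log(stated) - 0.5 * var_log
    z2 = (math.log(x) - mu_log) ** 2 / var_log
    return -math.log(x) - 0.5 * math.log(2.0 * math.pi * var_log) - 0.5 * z2

def log_jeffreys(sigma):
    """Log of the Jeffreys prior, proportional to 1/sigma (unnormalized)."""
    return -math.log(sigma) if sigma > 0.0 else -math.inf

def log_prior(theta, stated_R0, stated_Ls, delta=0.10, bounds=(-40.0, 40.0)):
    """Sum of independent log priors; `bounds` is a hypothetical uniform range."""
    dG, dH, dH0, R0, Ls, sigma = theta
    return (log_uniform(dG, *bounds)
            + log_uniform(dH, *bounds)
            + log_uniform(dH0, *bounds)
            + log_lognormal(R0, stated_R0, delta)
            + log_lognormal(Ls, stated_Ls, delta)
            + log_jeffreys(sigma))

def log_posterior(theta, log_like, **prior_kwargs):
    """Unnormalized log posterior: log prior plus log likelihood."""
    lp = log_prior(theta, **prior_kwargs)
    return lp if lp == -math.inf else lp + log_like(theta)
```

A Markov chain Monte Carlo sampler only needs this quantity up to an additive constant, which is why the normalization of the Jeffreys prior can be dropped.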

Sampling from the Bayesian posterior was performed using a Markov chain Monte Carlo method, as in Nguyen et al. [9], but with a few small adjustments. Instead of using pymc3, the regression was implemented in numpyro [11,12]. After 2000 warm-up moves, 10,000 (as opposed to 5000 [9]) samples from four chains were stored. The X% BCI of each parameter was calculated as the smallest interval that contains X% of the posterior samples. Additionally, the uncertainty $\delta$ was set at either 5% or 10%, as opposed to only 10%.
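The smallest interval containing X% of the samples can be found by sorting the samples and sliding a fixed-size window over them. The following is a minimal sketch of that idea; the function name `smallest_interval` and the toy sample are hypothetical, and the implementation used in the study may differ in detail.

```python
import math

def smallest_interval(samples, level):
    """Narrowest interval that contains at least level % of the samples."""
    xs = sorted(samples)
    n = len(xs)
    k = max(1, math.ceil(level * n / 100.0))   # samples that must fall inside
    # slide a window of k consecutive sorted samples and keep the narrowest
    start = min(range(n - k + 1), key=lambda i: xs[i + k - 1] - xs[i])
    return xs[start], xs[start + k - 1]

# Hypothetical skewed "posterior" sample
samples = [0.0, 1.0, 2.0, 2.1, 2.2, 2.3, 10.0]
interval = smallest_interval(samples, 50)
```

For a skewed posterior this interval is shifted toward the region of high density, unlike an interval defined by symmetric percentiles.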

3.2.2. Maximum Likelihood Estimation

For the MLE, parameter estimates were obtained as

$$\hat{\theta} = \arg\max_{\theta} \log p(D \mid \theta). \tag{7}$$

The covariance matrix of the parameter estimates, from which the asymptotic standard error (ASE) is obtained, was estimated based on the inverse Fisher information matrix,

$$\mathrm{cov}(\hat{\theta}) \approx -\frac{1}{N} \left[\left.\frac{\partial^2 \log L_N}{\partial\theta\, \partial\theta^{\top}}\right|_{\theta=\hat{\theta}}\right]^{-1}. \tag{8}$$

We used the scipy.optimize.minimize function from the Python package SciPy [13] to implement this MLE model, estimate the parameters, and automatically calculate the covariance matrix for parameter uncertainty. The X% CI of each parameter was defined as the interval whose lower bound is the (100 − X)/2 percentile and whose upper bound is the (100 + X)/2 percentile of the normal distribution with the estimated value as its mean and the ASE as its standard deviation.
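A central X% interval of a normal distribution centered on the estimate, with the ASE as its standard deviation, can be computed with the Python standard library alone. The sketch below is illustrative; the function name `normal_ci` and the numerical values of the estimate and ASE are hypothetical.

```python
from statistics import NormalDist

def normal_ci(estimate, ase, level):
    """Central level-% interval of N(estimate, ase^2)."""
    dist = NormalDist(mu=estimate, sigma=ase)
    lower = dist.inv_cdf((100.0 - level) / 200.0)   # e.g. 0.025 for 95%
    upper = dist.inv_cdf((100.0 + level) / 200.0)   # e.g. 0.975 for 95%
    return lower, upper

# Hypothetical enthalpy estimate of -5 kcal/mol with an ASE of 0.1 kcal/mol
lower, upper = normal_ci(estimate=-5.0, ase=0.1, level=95)
```

For a 95% level the bounds sit about 1.96 standard errors on either side of the estimate.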

3.2.3. Maximum Likelihood Estimation with Error Propagation

We performed error propagation to augment the ASE of MLE parameters based on the formula provided by Boyce et al. [8],

$$\left(\frac{s_\theta}{\theta}\right)^2 = \left(\frac{s_{\theta,\mathrm{ASE}}}{\theta}\right)^2 + \left(\frac{s_{[L]_s}}{[L]_s}\right)^2. \tag{9}$$

In this equation, $s$ denotes a standard error and $\theta \in \{\Delta G, \Delta H\}$ are the parameters affected by the uncertainty of the ligand concentration $[L]_s$; $s_{\theta,\mathrm{ASE}}$ is the ASE, $s_{[L]_s}$ is the standard error in the ligand concentration, and $s_\theta$ is the error estimate of the parameter $\theta$ that incorporates ligand concentration error. The uncertainty of the ligand concentration $s_{[L]_s}$ can be estimated by another experiment or based on previous estimates. Considering uncertainty in both protein and ligand concentrations, we used either 5% or 10% for both $s_{[R]_0} / [R]_0$ and $s_{[L]_s} / [L]_s$. CIs of this procedure were estimated similarly to the MLE procedure.
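The error propagation formula adds the relative ASE and the relative concentration error in quadrature. A short numerical sketch follows; the function name `propagate_error` and the example values are hypothetical.

```python
import math

def propagate_error(theta, ase, rel_conc_err):
    """Standard error of theta after adding the relative ligand
    concentration error to the relative ASE in quadrature."""
    rel_total = math.sqrt((ase / theta) ** 2 + rel_conc_err ** 2)
    return abs(theta) * rel_total

# Hypothetical case: Delta H = -5 kcal/mol, ASE = 0.05 kcal/mol,
# and a 10% relative error in the ligand concentration
s_dh = propagate_error(theta=-5.0, ase=0.05, rel_conc_err=0.10)
```

In this example the concentration term dominates: the propagated standard error is about 0.50 kcal/mol, ten times the ASE.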

4. Conclusions

In both ITC simulations and experiments, BCIs provide more accurate uncertainty estimates for thermodynamic binding parameters than the ASE, with or without error propagation. The ASE underestimates the uncertainties in all datasets. Error propagation overestimates the uncertainties in the experimental dataset, but in simulations it underestimates the uncertainty of the enthalpy for confidence levels less than 70%.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/ijms242015074/s1.

Author Contributions

Conceptualization, D.D.L.M.; methodology, D.D.L.M.; formal analysis, V.N.T.L.; investigation, V.N.T.L.; resources, V.N.T.L. and D.D.L.M.; data curation, V.N.T.L.; writing—original draft preparation, V.N.T.L.; writing—review and editing, D.D.L.M.; visualization, V.N.T.L.; supervision, D.D.L.M.; project administration, D.D.L.M.; funding acquisition, D.D.L.M. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by an award entitled “Collaborative Research: CDS&E: Elucidating Binding using Bayesian Inference to Integrate Multiple Data Sources” (#1905324) from the Chemical Measurement and Imaging Program in the Division of Chemistry of the National Science Foundation, for which DDLM is the principal investigator.

Data Availability Statement

All code is freely available at https://github.com/vanngocthuyla/bitc/tree/main/bitc_nls_ep, accessed on 30 July 2023.

Acknowledgments

We thank Joel Tellinghuisen for suggesting this comparison.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Leavitt, S.; Freire, E. Direct measurement of protein binding energetics by isothermal titration calorimetry. Curr. Opin. Struct. Biol. 2001, 11, 560–566.
2. Duff, M.R.; Grubbs, J.; Howell, E.E. Isothermal Titration Calorimetry for Measuring Macromolecule-Ligand Affinity. J. Vis. Exp. 2011, 55, e2796.
3. Feig, A.L. Chapter 19—Studying RNA–RNA and RNA–Protein Interactions by Isothermal Titration Calorimetry. In Methods in Enzymology; Biophysical, Chemical, and Functional Probes of RNA Structure, Interactions and Folding: Part A; Academic Press: Cambridge, MA, USA, 2009; Volume 468, pp. 409–422.
4. Malecek, K.; Ruthenburg, A. Chapter Nine—Validation of Histone-Binding Partners by Peptide Pull-Downs and Isothermal Titration Calorimetry. In Methods in Enzymology; Nucleosomes, Histones & Chromatin Part A; Academic Press: Cambridge, MA, USA, 2012; Volume 512, pp. 187–220.
5. Swamy, M.J.; Sankhala, R.S.; Singh, B.P. Thermodynamic Analysis of Protein-Lipid Interactions by Isothermal Titration Calorimetry. In Methods in Molecular Biology; Humana: New York, NY, USA, 2019; Volume 2003, pp. 71–89.
6. Myszka, D.G.; Abdiche, Y.N.; Arisaka, F.; Byron, O.; Eisenstein, E.; Hensley, P.; Thomson, J.A.; Lombardo, C.R.; Schwarz, F.; Stafford, W.; et al. The ABRF-MIRG’02 Study: Assembly State, Thermodynamic, and Kinetic Analysis of an Enzyme/Inhibitor Interaction. J. Biomol. Tech. 2003, 14, 247–269.
7. Estelle, A.B.; George, A.; Barbar, E.J.; Zuckerman, D.M. Quantifying cooperative multisite binding in the hub protein LC8 through Bayesian inference. PLoS Comput. Biol. 2023, 19, e1011059.
8. Boyce, S.E.; Tellinghuisen, J.; Chodera, J.D. Avoiding accuracy-limiting pitfalls in the study of protein-ligand interactions with isothermal titration calorimetry. bioRxiv 2015.
9. Nguyen, T.H.; Rustenburg, A.S.; Krimmer, S.G.; Zhang, H.; Clark, J.D.; Novick, P.A.; Branson, K.; Pande, V.S.; Chodera, J.D.; Minh, D.D.L. Bayesian analysis of isothermal titration calorimetry for binding thermodynamics. PLoS ONE 2018, 13, e0203224.
10. Jeffreys, H. An invariant form for the prior probability in estimation problems. Proc. R. Soc. Lond. Ser. A Math. Phys. Sci. 1946, 186, 453–461.
11. Bingham, E.; Chen, J.P.; Jankowiak, M.; Obermeyer, F.; Pradhan, N.; Karaletsos, T.; Singh, R.; Szerlip, P.; Horsfall, P.; Goodman, N.D. Pyro: Deep universal probabilistic programming. J. Mach. Learn. Res. 2019, 20, 973–978.
12. Phan, D.; Pradhan, N.; Jankowiak, M. Composable Effects for Flexible and Accelerated Probabilistic Programming in NumPyro. arXiv 2019.
13. Virtanen, P.; Gommers, R.; Oliphant, T.E.; Haberland, M.; Reddy, T.; Cournapeau, D.; Burovski, E.; Peterson, P.; Weckesser, W.; Bright, J.; et al. SciPy 1.0: Fundamental algorithms for scientific computing in Python. Nat. Methods 2020, 17, 261–272.

Figure 1. Uncertainty estimates of the Mg(II):EDTA dataset. 95% credible intervals estimated from the Bayesian posterior (a), confidence intervals calculated by ASE from nonlinear least squares (b), and confidence intervals calculated by ASE with EP (c) for parameters specifying magnesium binding to EDTA. The medians of the MCMC samples are shown as vertical green lines. The standard deviations of the lower and upper bounds are denoted as red bars and were estimated by bootstrapping.

Figure 2. Uncertainty validation of the simulation dataset at a high error of 10%. The predicted rate (%) of CIs containing the true values was plotted against the observed rate (%) for Bayesian credible intervals (blue leftward triangles), nonlinear least squares confidence intervals (red circles), and nonlinear least squares confidence intervals with error propagation (cyan downward triangles). Error bars of the Bayesian procedure, which were standard deviations based on 100 bootstrapping samples, were too small to be visible.

Figure 3. Uncertainty validation of the Mg(II):EDTA dataset. The predicted rate (%) of CIs containing the true values was plotted against the observed rate (%) for Bayesian credible intervals (blue leftward triangles), nonlinear least squares confidence intervals (red circles), and nonlinear least squares confidence intervals with error propagation (cyan downward triangles). Error bars of the Bayesian procedure were standard deviations based on 100 bootstrapping samples.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
