# Principled Decision-Making Workflow with Hierarchical Bayesian Models of High-Throughput Dose-Response Measurements

## Abstract

## 1. Introduction

- An overview of the model and data to orient the reader (Section 2).
- Steps taken to validate correctness of the hierarchical Bayesian model (Section 3).
- An outline of how Bayesian posteriors can be used for principled decisions (Section 4).
- Further discussion of the advantages of hierarchical models, as well as limitations of this specific implementation (Section 5).
- Concluding thoughts on the promise of hierarchical Bayesian estimation in high-throughput assays (Section 6).

## 2. Model and Data Overview

#### 2.1. Base Model Definition

#### 2.2. Data Characteristics and Summary Statistics

- The species from which a protein was isolated and the extraction method used;
- The protein ID (a unique identifier encompassing its gene name);
- The gene from which the protein was expressed;
- The temperature at which an observation was taken;
- The fold change of the detected stable protein at that temperature, relative to the level at the lowest measured temperature.
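As a minimal sketch of the record structure listed above, the following shows one way a single observation could be represented and how a fold change relative to the lowest measured temperature could be derived. The field names, the `abundance` field, the protein identifier, and the temperature unit are illustrative assumptions, not the dataset's actual schema.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    run_name: str       # species plus extraction method, e.g. "E. coli lysate"
    protein_id: str     # unique identifier that includes the gene name
    gene_name: str      # gene from which the protein was expressed
    temperature: float  # temperature of the observation (degrees Celsius assumed)
    abundance: float    # hypothetical raw signal of stable protein


def fold_changes(observations):
    """Return {(run_name, protein_id): [(temperature, fold_change), ...]}.

    Each fold change is relative to the abundance at the lowest
    measured temperature for that run and protein.
    """
    grouped = defaultdict(list)
    for obs in observations:
        grouped[(obs.run_name, obs.protein_id)].append(obs)

    result = {}
    for key, group in grouped.items():
        ordered = sorted(group, key=lambda o: o.temperature)
        baseline = ordered[0].abundance
        result[key] = [(o.temperature, o.abundance / baseline) for o in ordered]
    return result


# Made-up values for a single hypothetical protein.
observations = [
    Observation("E. coli lysate", "protein1_geneA", "geneA", 50.0, 100.0),
    Observation("E. coli lysate", "protein1_geneA", "geneA", 37.0, 200.0),
    Observation("E. coli lysate", "protein1_geneA", "geneA", 65.0, 20.0),
]
curves = fold_changes(observations)
```

By construction, the fold change at the lowest temperature is 1.0, so each protein's curve starts from a common reference level.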

## 3. Validation of Model Implementation

#### 3.1. Melting Curves

#### 3.2. Posterior Variance

## 4. Principled Bayesian Decision-Making in High-Throughput Settings

- Which samples need higher-quality confirmatory measurements?
- Which samples should we take forward for further investigation in other measurement modes?

#### 4.1. Acquiring Informative Measurements

#### 4.2. Confirming Optimal Measurements

#### 4.3. Prioritizing Samples for Further Modification

## 5. Discussion

#### 5.1. Hierarchical Bayesian Methods Enable Reasonable Estimates Where Separate Curve-Fitting Fails to Provide One

#### 5.2. Limitations of Our Model and Inferential Procedure

## 6. Conclusions: The Promise of Hierarchical Bayesian Models in High-Throughput Biological Measurements

## 7. Materials and Methods

#### 7.1. Hierarchical Bayesian Estimation Model

#### 7.2. High-Throughput Measurement Data

#### 7.3. Separate Curve-Fitting

#### 7.4. Posterior Curves

## Author Contributions

## Funding

## Institutional Review Board Statement

## Informed Consent Statement

## Data Availability Statement

## Acknowledgments

## Conflicts of Interest

## Abbreviations

Abbreviation | Definition |
---|---|
ADVI | Automatic Differentiation Variational Inference |

## References

- Zhang, J.; Chung, T.; Oldenburg, K. A Simple Statistical Parameter for Use in Evaluation and Validation of High-Throughput Screening Assays. J. Biomol. Screen. **1999**, 4, 67–73.
- Sui, Y.; Wu, Z. Alternative statistical parameter for high-throughput screening assay quality assessment. J. Biomol. Screen. **2007**, 12, 229–234.
- Malo, N.; Hanley, J.; Cerquozzi, S.; Pelletier, J.; Nadon, R. Statistical practice in high-throughput screening data analysis. Nat. Biotechnol. **2006**, 24, 167–175.
- Wilson, A.; Reif, D.M.; Reich, B.J. Hierarchical dose–response modeling for high-throughput toxicity screening of environmental chemicals. Biometrics **2014**, 70, 237–246.
- Shterev, I.D.; Dunson, D.B.; Chan, C.; Sempowski, G.D. Bayesian multi-plate high-throughput screening of compounds. Sci. Rep. **2018**, 8, 9551.
- Jensen, S.T.; Shirley, K.E.; Wyner, A.J. Bayesball: A Bayesian hierarchical model for evaluating fielding in major league baseball. Ann. Appl. Stat. **2009**, 3, 491–520.
- Ahn, W.Y.; Krawitz, A.; Kim, W.; Busemeyer, J.R.; Brown, J.W. A model-based fMRI analysis with hierarchical Bayesian parameter estimation. Decision **2013**, 1, 8–23.
- Gustafson, P. Large hierarchical Bayesian analysis of multivariate survival data. Biometrics **1997**, 53, 230–242.
- Tonkin-Hill, G.; Lees, J.A.; Bentley, S.D.; Frost, S.D.; Corander, J. Fast hierarchical Bayesian analysis of population structure. Nucleic Acids Res. **2019**, 47, 5539–5549.
- Messner, M.J.; Chappell, C.L.; Okhuysen, P.C. Risk assessment for Cryptosporidium: A hierarchical Bayesian analysis of human dose response data. Water Res. **2001**, 35, 3934–3940.
- Kruschke, J. Bayesian estimation supersedes the t test. J. Exp. Psychol. Gen. **2013**, 142, 573–603.
- Jarzab, A.; Kurzawa, N.; Hopf, T.; Moerch, M.; Zecha, J.; Leijten, N.; Bian, Y.; Musiol, E.; Maschberger, M.; Stoehr, G.; et al. Meltome atlas-thermal proteome stability across the tree of life. Nat. Methods **2020**, 17, 495–503.
- Savitski, M.M.; Reinhard, F.B.; Franken, H.; Werner, T.; Savitski, M.F.; Eberhard, D.; Molina, D.M.; Jafari, R.; Dovega, R.B.; Klaeger, S.; et al. Tracking cancer drugs in living cells by thermal profiling of the proteome. Science **2014**, 346, 1255784.
- Schafer, J.L. Multiple imputation: A primer. Stat. Methods Med. Res. **1999**, 8, 3–15.
- Kucukelbir, A.; Tran, D.; Ranganath, R.; Gelman, A.; Blei, D.M. Automatic Differentiation Variational Inference. J. Mach. Learn. Res. **2017**, 18, 1–45.
- Blei, D.M.; Kucukelbir, A.; McAuliffe, J.D. Variational Inference: A Review for Statisticians. J. Am. Stat. Assoc. **2017**, 112, 859–877.
- Brookes, D.; Park, H.; Listgarten, J. Conditioning by adaptive sampling for robust design. In Proceedings of Machine Learning Research; PMLR: Long Beach, CA, USA, 2019; Volume 97, pp. 773–782.
- Salvatier, J.; Wiecki, T.V.; Fonnesbeck, C. Probabilistic programming in Python using PyMC3. PeerJ Comput. Sci. **2016**, 2, e55.
- Virtanen, P.; Gommers, R.; Oliphant, T.E.; Haberland, M.; Reddy, T.; Cournapeau, D.; Burovski, E.; Peterson, P.; Weckesser, W.; Bright, J.; et al. SciPy 1.0: Fundamental algorithms for scientific computing in Python. Nat. Methods **2020**, 17, 261–272.

**Figure 1.** Example estimated melting curves against original measurement data for three species (by row). Blue figures are curves from proteins that had the lowest variance in estimated melting temperature for each species. Yellow figures are curves from proteins for which melting points are not obvious from the data and that did not have an assigned melting temperature, but could nonetheless plausibly be assigned one. Red figures are curves from proteins that exhibited the highest variance in estimated melting temperature for each species. Dotted lines indicate separate curve fits obtained with SciPy’s curve-fitting facilities (see Materials and Methods); cases where the curve fit raised an error are omitted.
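The separate curve-fitting baseline mentioned in the Figure 1 caption can be illustrated with a minimal sketch using `scipy.optimize.curve_fit`. The four-parameter logistic form, the parameter names, and the initial guesses below are assumptions for illustration; they are not necessarily the functional form or settings used in this work. Fits that raise an error return `None`, mirroring the omission of failed fits from the figure.

```python
import numpy as np
from scipy.optimize import curve_fit


def melting_curve(temperature, lower, upper, tm, slope):
    """Generic logistic curve (assumed form): fold change falls from
    `upper` to `lower`, crossing the midpoint at temperature `tm`."""
    return lower + (upper - lower) / (1.0 + np.exp((temperature - tm) / slope))


def fit_melting_point(temperatures, fold_changes):
    """Fit one protein on its own; return the fitted melting point,
    or None if the optimizer raises an error."""
    # Illustrative starting values: plateaus at 0 and 1, midpoint at the
    # median measured temperature, and a moderate slope.
    initial_guess = [0.0, 1.0, float(np.median(temperatures)), 2.0]
    try:
        params, _ = curve_fit(
            melting_curve, temperatures, fold_changes,
            p0=initial_guess, maxfev=5000,
        )
    except (RuntimeError, ValueError):
        return None
    return float(params[2])


# Synthetic, noise-free data generated from the same assumed curve.
temps = np.linspace(37.0, 67.0, 10)
clean = melting_curve(temps, 0.05, 1.0, 52.0, 2.5)
tm_hat = fit_melting_point(temps, clean)

# A protein whose measurements are all missing cannot be fit separately.
tm_missing = fit_melting_point(temps, np.full_like(temps, np.nan))
```

Because each protein is fit in isolation, a failed fit yields no estimate at all, which is the situation the hierarchical model is meant to handle more gracefully.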

**Figure 2.** Proteins with greater discrepancies between the two methods in their estimated melting points also had greater uncertainties in their Bayesian estimated melting points. Residuals were calculated as the Bayesian estimated melting point minus the melting point estimated by separate curve-fitting.

**Figure 3.** Imputed melting temperatures have higher uncertainty than non-imputed melting temperatures.

**Figure 4.** Probabilistic decision-making framework leveraging posterior distributions. (**a**) In choosing the next most informative re-measurement, we would suggest taking the blue sample because it has the highest uncertainty. (**b**) In choosing samples for confirmatory measurements that are above a threshold value defined a priori, we would suggest taking the red sample because it has the highest probability of being greater than a threshold value. (**c**) To decide which samples to use as a base for further modification towards extreme values, we would calculate the probability of superiority between all pairs of samples and identify the one that has the highest probability.
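The three decision rules described in the Figure 4 caption can be sketched directly on posterior draws. The example below is a minimal illustration with NumPy. The sample names, the synthetic normal draws, the threshold, and the rule of ranking candidates by their average probability of superiority over the others are all assumptions; the last is one possible reading of panel (c), not a confirmed procedure.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical posterior draws of melting temperature for three samples.
# In practice these would come from the fitted hierarchical model.
posterior = {
    "blue": rng.normal(50.0, 6.0, size=4000),
    "red": rng.normal(58.0, 1.0, size=4000),
    "yellow": rng.normal(52.0, 2.0, size=4000),
}


def most_uncertain(samples):
    """(a) Sample with the widest posterior, a candidate for re-measurement."""
    return max(samples, key=lambda name: np.std(samples[name]))


def prob_above(samples, threshold):
    """(b) Posterior probability that each sample exceeds the threshold."""
    return {name: float(np.mean(draws > threshold)) for name, draws in samples.items()}


def best_base_for_modification(samples):
    """(c) Rank samples by mean probability of superiority over all others.

    Draws are compared index by index, which treats them as joint draws.
    """
    names = list(samples)
    scores = {
        a: float(np.mean([np.mean(samples[a] > samples[b]) for b in names if b != a]))
        for a in names
    }
    return max(scores, key=scores.get), scores


remeasure = most_uncertain(posterior)
above = prob_above(posterior, threshold=55.0)
confirm = max(above, key=above.get)
base, superiority = best_base_for_modification(posterior)
```

Each rule reduces a full posterior to a single number tied to a specific decision, so the same set of draws can answer different questions without refitting the model.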

Run Name | Mean | StDev | Min | 25% | 50% | 75% | Max |
---|---|---|---|---|---|---|---|
A. thaliana seedling lysate | 43.8 | 1.5 | 34.6 | 43.1 | 43.9 | 44.5 | 49.6 |
B. subtilis lysate | 43.7 | 3.0 | 36.8 | 41.9 | 43.8 | 44.8 | 58.4 |
C. elegans lysate | 44.0 | 3.5 | 34.2 | 42.2 | 44.5 | 45.4 | 57.6 |
D. melanogaster lysate | 43.3 | 2.6 | 39.2 | 41.9 | 42.5 | 43.7 | 54.6 |
E. coli cells | 54.1 | 3.5 | 45.4 | 51.8 | 53.8 | 55.7 | 67.1 |
E. coli lysate | 55.2 | 4.6 | 45.9 | 51.8 | 54.1 | 57.9 | 67.3 |
G. stearothermophilus lysate | 81.9 | 5.7 | 59.7 | 77.7 | 81.3 | 85.8 | 97.3 |
M. musculus BMDC lysate | 49.4 | 2.0 | 44.0 | 48.2 | 49.4 | 50.7 | 60.0 |
M. musculus liver lysate | 51.0 | 2.2 | 44.4 | 49.7 | 51.0 | 51.8 | 64.1 |
O. antarctica lysate | 48.8 | 4.5 | 36.5 | 46.0 | 47.5 | 51.3 | 63.7 |
P. torridus lysate | 72.9 | 3.6 | 65.2 | 70.5 | 72.2 | 74.5 | 83.6 |
S. cerevisiae lysate | 47.1 | 2.3 | 40.9 | 45.7 | 46.8 | 48.4 | 55.4 |
T. thermophilus cells | 108.5 | 8.5 | 80.7 | 104.8 | 110.3 | 114.5 | 125.0 |
T. thermophilus lysate | 107.3 | 8.4 | 79.2 | 101.8 | 109.0 | 113.4 | 125.0 |

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Ma, E.J.; Kummer, A.
Principled Decision-Making Workflow with Hierarchical Bayesian Models of High-Throughput Dose-Response Measurements. *Entropy* **2021**, *23*, 727.
https://doi.org/10.3390/e23060727
