# Simulation-Based Inference of Bayesian Hierarchical Models While Checking for Model Misspecification

## Abstract


## 1. Introduction

## 2. Method

#### 2.1. Bayesian Hierarchical Models with a Latent Function

#### 2.2. Latent Function Inference with SELFI

#### 2.3. Check for Model Misspecification

#### 2.4. Score Compression and Simulation-Based Inference

## 3. Lotka–Volterra BHM

#### 3.1. Lotka–Volterra Solver

#### 3.2. Lotka–Volterra Observer

#### 3.2.1. Full Data Model

**Signal.** The (unobserved) signal $s_z$ is a delayed and non-linearly perturbed observation of the true population function for species $z \in \{x, y\}$, modulated by some seasonal efficiency $e_z(t)$. Formally, $s_x(0) = x_0$, $s_y(0) = y_0$, and for $i \in [\![0, S/2-1]\!]$,

**Noise.** The signal $s_z$ is subject to additive noise, giving a noisy signal $u_z(t) = s_z(t) + n_z^{\mathrm{D}}(t) + n_z^{\mathrm{O}}(t)$, where the noise has two components:

- Demographic Gaussian noise with zero mean and variance proportional to the true underlying population, i.e., $n_x^{\mathrm{D}}(t) \curvearrowleft \mathcal{G}[0, r\,x(t)]$ and $n_y^{\mathrm{D}}(t) \curvearrowleft \mathcal{G}[0, r\,y(t)]$. The parameter $r$ gives the strength of demographic noise.
- Observational Gaussian noise that accounts for observer efficiency, coupling prey and predators such that
  $$\begin{pmatrix} n_x^{\mathrm{O}}(t) \\ n_y^{\mathrm{O}}(t) \end{pmatrix} \curvearrowleft \mathcal{G}\left[\begin{pmatrix} 0 \\ 0 \end{pmatrix}, s \begin{pmatrix} y(t) & t\sqrt{x(t)\,y(t)} \\ t\sqrt{x(t)\,y(t)} & x(t) \end{pmatrix}\right]. \tag{18}$$
  The parameter $s$ gives the overall amplitude of observational noise, and the parameter $t$ controls the strength of the off-diagonal component. It should be chosen such that the covariance matrix in Equation (18) is positive semi-definite: for $s > 0$ and positive populations, the determinant of that matrix is $s^2(1-t^2)\,x(t)\,y(t)$, so this holds if and only if $|t| \le 1$.
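The two noise components can be simulated as in the following minimal Python/NumPy sketch. It assumes that the signals $s_z$ and the true populations $x(t)$, $y(t)$ are already available as non-negative arrays on the same timesteps; the function name `add_noise` and the argument names `s_amp` (for $s$) and `t_corr` (for $t$) are hypothetical. Rather than calling a generic multivariate normal sampler at each timestep, it uses the closed-form Cholesky factor of the $2\times 2$ covariance in Equation (18), which gives $n_x^{\mathrm{O}} = \sqrt{s\,y}\,z_1$ and $n_y^{\mathrm{O}} = \sqrt{s\,x}\,(t\,z_1 + \sqrt{1-t^2}\,z_2)$ with $z_1, z_2$ independent standard normal draws.

```python
import numpy as np


def add_noise(s_x, s_y, x, y, r, s_amp, t_corr, rng):
    """Return (u_x, u_y) with u_z = s_z + n_z^D + n_z^O (illustrative sketch).

    s_x, s_y : signals at the observation timesteps
    x, y     : true (non-negative) populations at the same timesteps
    r        : demographic noise strength
    s_amp    : observational noise amplitude (the parameter s)
    t_corr   : off-diagonal strength (the parameter t), |t_corr| <= 1
    rng      : a numpy.random.Generator
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if r < 0 or s_amp < 0 or abs(t_corr) > 1:
        raise ValueError("need r >= 0, s >= 0 and |t| <= 1")

    # Demographic noise: independent per species, variance r * x(t) and r * y(t).
    n_x_D = rng.normal(0.0, np.sqrt(r * x))
    n_y_D = rng.normal(0.0, np.sqrt(r * y))

    # Observational noise: closed-form Cholesky factor of
    # s * [[y, t*sqrt(x*y)], [t*sqrt(x*y), x]] applied to two standard normals.
    z1 = rng.standard_normal(x.shape)
    z2 = rng.standard_normal(x.shape)
    n_x_O = np.sqrt(s_amp * y) * z1
    n_y_O = np.sqrt(s_amp * x) * (t_corr * z1 + np.sqrt(1.0 - t_corr**2) * z2)

    u_x = np.asarray(s_x, dtype=float) + n_x_D + n_x_O
    u_y = np.asarray(s_y, dtype=float) + n_y_D + n_y_O
    return u_x, u_y
```

For example, with $r = 0$, $s = 2$, $t = 0.5$, $x = 4$ and $y = 9$, the sample covariance of $(u_x, u_y)$ should approach $\begin{pmatrix} 18 & 6 \\ 6 & 8 \end{pmatrix}$ as the number of draws grows.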

**Censoring.** Finally, the observed data are a censored and thresholded version of the noisy signal: for each timestep $t_i$, $\mathsf{\Phi}_z(t_i) = m_z(t_i) \times \min\left[u_z(t_i), M_z\right]$, where $M_z$ is the maximum number of prey or predators that can be detected by the observer, and $m_z$ is a mask (taking either the value 0 or 1). Masked data points (those with $m_z(t_i) = 0$) are discarded. The data vector is $\mathsf{\Phi} = \left\{\{\mathsf{\Phi}_x(t_i)\}, \{\mathsf{\Phi}_y(t_i)\}\right\}$, restricted to unmasked points. It contains $P \le S$ elements, depending on the number of masked timesteps for each species $z$; formally, $P = \sum_{i=0}^{S/2-1} \left(\delta_{\mathrm{K}}^{m_x(t_i),1} + \delta_{\mathrm{K}}^{m_y(t_i),1}\right)$, where $\delta_{\mathrm{K}}$ is the Kronecker delta.
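The censoring step and the count $P$ can be sketched in Python/NumPy as follows. This is an illustration under assumptions: the function name `censor` is hypothetical, masks are given as 0/1 arrays of length $S/2$, and the data vector is ordered with all unmasked prey entries before all unmasked predator entries (the ordering is an assumption, not specified above).

```python
import numpy as np


def censor(u_x, u_y, M_x, M_y, m_x, m_y):
    """Apply Phi_z(t_i) = m_z(t_i) * min[u_z(t_i), M_z] and drop masked points.

    Returns the data vector (unmasked prey entries, then unmasked predator
    entries) and its length P.
    """
    m_x = np.asarray(m_x, dtype=int)
    m_y = np.asarray(m_y, dtype=int)
    # Threshold at the detection limit M_z, then apply the 0/1 mask.
    phi_x = m_x * np.minimum(u_x, M_x)
    phi_y = m_y * np.minimum(u_y, M_y)
    # Masked points (m_z = 0) are discarded rather than kept as zeros.
    data = np.concatenate([phi_x[m_x == 1], phi_y[m_y == 1]])
    # P = sum_i (delta_K^{m_x(t_i),1} + delta_K^{m_y(t_i),1})
    P = int(np.sum(m_x == 1) + np.sum(m_y == 1))
    return data, P
```

For instance, with $u_x = (3, 12, 5, 7)$, $u_y = (1, 2, 9, 4)$, $M_x = 10$, $M_y = 8$, $m_x = (1, 1, 0, 1)$ and $m_y = (1, 0, 1, 1)$, the function returns the data vector $(3, 10, 7, 1, 8, 4)$ and $P = 6 \le S = 8$.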

#### 3.2.2. Simplified Data Model

## 4. Results

#### 4.1. Inference of Population Functions with SELFI

#### 4.2. Check for Model Misspecification

#### 4.3. Score Compression

#### 4.4. Inference of Parameters Using Likelihood-Free Rejection Sampling

## 5. Conclusions

## Funding

## Institutional Review Board Statement

## Informed Consent Statement

## Data Availability Statement

## Conflicts of Interest


**Figure 1.** SELFI inference of the population function $\mathsf{\theta}$ given the observed data $\mathsf{\Phi}_{\mathrm{O}}$, used as a check for model misspecification.

**Left panels**: the prior mean and expansion point $\mathsf{\theta}_0$ and the effective posterior mean $\mathsf{\gamma}$ are represented as yellow and green/red lines, respectively, with their $2\sigma$ credible intervals. For comparison, simulations $T(\mathsf{\omega})$ with $\mathsf{\omega} \curvearrowleft P(\mathsf{\omega})$ and the ground truth $\mathsf{\theta}_{\mathrm{gt}}$ are shown in grey and blue, respectively.

**Middle and right panels**: the prior covariance matrix $\mathbf{S}$ and the posterior covariance matrix $\mathsf{\Gamma}$, respectively. The first row corresponds to model A (see Section 3.2.1) and the second row to model B (see Section 3.2.2).
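The quantities shown in Figure 1 ($\mathsf{\theta}_0$, $\mathbf{S}$, $\mathsf{\gamma}$, $\mathsf{\Gamma}$) can be related by the linearized Gaussian model on which SELFI is built. The sketch below assumes a data model $\mathsf{\Phi} \approx \mathbf{f}_0 + \nabla\mathbf{f}\,(\mathsf{\theta} - \mathsf{\theta}_0)$ with a fixed data covariance $\mathbf{C}_0$, and a Gaussian prior with mean $\mathsf{\theta}_0$ and covariance $\mathbf{S}$; the symbols $\mathbf{f}_0$, $\nabla\mathbf{f}$, $\mathbf{C}_0$ and the function names are assumptions not defined in this section. Under these assumptions, $\mathsf{\Gamma} = (\nabla\mathbf{f}^\top \mathbf{C}_0^{-1} \nabla\mathbf{f} + \mathbf{S}^{-1})^{-1}$ and $\mathsf{\gamma} = \mathsf{\theta}_0 + \mathsf{\Gamma}\,\nabla\mathbf{f}^\top \mathbf{C}_0^{-1} (\mathsf{\Phi}_{\mathrm{O}} - \mathbf{f}_0)$. The Mahalanobis distance of $\mathsf{\gamma}$ from the prior mean is included only as one plausible misspecification diagnostic; it is not necessarily the exact check used in this work.

```python
import numpy as np


def selfi_posterior(theta_0, S, f_0, grad_f, C_0, phi_obs):
    """Gaussian posterior of a linearized data model (sketch).

    Assumed model: phi ~ N(f_0 + grad_f @ (theta - theta_0), C_0),
    prior: theta ~ N(theta_0, S). Returns (gamma, Gamma).
    """
    Cinv_grad = np.linalg.solve(C_0, grad_f)  # C_0^{-1} grad_f
    Gamma = np.linalg.inv(grad_f.T @ Cinv_grad + np.linalg.inv(S))
    gamma = theta_0 + Gamma @ (Cinv_grad.T @ (phi_obs - f_0))
    return gamma, Gamma


def mahalanobis(gamma, theta_0, S):
    """Distance of gamma from the prior mean, in units of the prior covariance S."""
    d = gamma - theta_0
    return float(np.sqrt(d @ np.linalg.solve(S, d)))
```

A large distance indicates that the data pull the effective posterior mean far outside the region favoured by the prior, which is the kind of tension a misspecification check is meant to flag.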

**Figure 2.** Simulation-based inference of the Lotka–Volterra parameters $\mathsf{\omega} = (\alpha, \beta, \gamma, \delta)$ given the compressed observed data $\tilde{\mathsf{\omega}}_{\mathrm{O}}$. Plots in the lower corner show two-dimensional marginals of the prior $P(\mathsf{\omega})$ (yellow contours) and of the SBI posterior $P(\mathsf{\omega} \mid \tilde{\mathsf{\omega}}_{\mathrm{O}})$ (green contours), using a threshold $\epsilon = 2$ on the Fisher–Rao distance $d_{\mathrm{FR}}(\tilde{\mathsf{\omega}}, \tilde{\mathsf{\omega}}_{\mathrm{O}})$ between simulated $\tilde{\mathsf{\omega}}$ and observed $\tilde{\mathsf{\omega}}_{\mathrm{O}}$. Contours show 1, 2, and $3\sigma$ credible regions. Plots on the diagonal show one-dimensional marginal distributions of the parameters, using the same color scheme. Dotted and dashed lines denote the position of the fiducial point for score compression $\mathsf{\omega}_0$ and of the ground truth parameters $\mathsf{\omega}_{\mathrm{gt}}$, respectively. The scatter plots in the upper corner illustrate score compression for pairs of parameters: red dots represent some simulated samples, and larger dots show some accepted samples (i.e., those for which $d_{\mathrm{FR}}(\tilde{\mathsf{\omega}}, \tilde{\mathsf{\omega}}_{\mathrm{O}}) < \epsilon$), with a color map corresponding to the value of one component of $\tilde{\mathsf{\omega}}$. In the color bars, pink lines denote the mean and $1\sigma$ scatter of that component among accepted samples, and the orange line denotes its value in $\tilde{\mathsf{\omega}}_{\mathrm{O}}$.
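The compression and rejection steps described in the Figure 2 caption can be sketched as follows, under several assumptions not taken from this section. First, the likelihood is taken to be Gaussian with a parameter-independent covariance $\mathbf{C}$, so that score compression around $\mathsf{\omega}_0$ reduces to the quasi-maximum-likelihood form $\tilde{\mathsf{\omega}} = \mathsf{\omega}_0 + \mathbf{F}^{-1}\mathbf{J}^\top\mathbf{C}^{-1}(\mathsf{\Phi} - \boldsymbol{\mu}_0)$, with $\mathbf{J} = \partial\boldsymbol{\mu}/\partial\mathsf{\omega}$ and $\mathbf{F} = \mathbf{J}^\top\mathbf{C}^{-1}\mathbf{J}$ evaluated at $\mathsf{\omega}_0$. Second, $d_{\mathrm{FR}}$ is approximated by the quadratic form $\sqrt{\Delta^\top \mathbf{F}\,\Delta}$ with $\Delta = \tilde{\mathsf{\omega}} - \tilde{\mathsf{\omega}}_{\mathrm{O}}$. Third, a toy two-parameter linear simulator stands in for the Lotka–Volterra model. All function names are hypothetical.

```python
import numpy as np


def make_score_compressor(omega_0, mu_0, J, C):
    """Quasi-ML score compression, assuming a Gaussian likelihood with fixed covariance C."""
    Cinv_J = np.linalg.solve(C, J)          # C^{-1} J, with J = d mu / d omega at omega_0
    F = J.T @ Cinv_J                        # Fisher matrix at omega_0

    def compress(phi):
        score = Cinv_J.T @ (phi - mu_0)     # gradient of ln L at omega_0
        return omega_0 + np.linalg.solve(F, score)

    return compress, F


def fisher_distance(a, b, F):
    """Quadratic (local) approximation of the Fisher-Rao distance between summaries."""
    d = a - b
    return float(np.sqrt(d @ F @ d))


def rejection_abc(simulate, sample_prior, compress, omega_obs_tilde, F, eps, n_sims, rng):
    """Keep prior draws whose compressed simulation lies within eps of the observation."""
    accepted = []
    for _ in range(n_sims):
        omega = sample_prior(rng)
        omega_tilde = compress(simulate(omega, rng))
        if fisher_distance(omega_tilde, omega_obs_tilde, F) < eps:
            accepted.append(omega)
    return np.array(accepted).reshape(-1, len(omega_obs_tilde))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_data, sigma2 = 50, 0.05
    A = rng.normal(size=(n_data, 2))        # hypothetical linear stand-in for the simulator

    def simulate(w, g):
        return A @ w + np.sqrt(sigma2) * g.standard_normal(n_data)

    def sample_prior(g):
        return g.uniform([0.5, 0.0], [1.5, 1.0])  # assumed uniform prior box

    omega_0 = np.array([1.0, 1.0])          # fiducial point
    compress, F = make_score_compressor(omega_0, A @ omega_0, A, sigma2 * np.eye(n_data))
    obs_tilde = compress(simulate(np.array([1.0, 0.5]), rng))
    accepted = rejection_abc(simulate, sample_prior, compress, obs_tilde, F, 2.0, 5000, rng)
    print(len(accepted), accepted.mean(axis=0))
```

Because the toy simulator is linear, the compression is exact in the noise-free limit ($\tilde{\mathsf{\omega}} = \mathsf{\omega}$); with a non-linear simulator it is only locally optimal around $\mathsf{\omega}_0$, which is why the choice of fiducial point matters.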

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Leclercq, F.
Simulation-Based Inference of Bayesian Hierarchical Models While Checking for Model Misspecification. *Phys. Sci. Forum* **2022**, *5*, 4.
https://doi.org/10.3390/psf2022005004
