# Fast Compression of MCMC Output


## Abstract


## 1. Introduction

- burn-in, which discards the first $b$ states;
- thinning, which retains only one out of every $t$ (post burn-in) states.
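The two operations above can be sketched in a few lines. This is a minimal illustration, assuming the chain is stored as an array with one state per row; the function name `burn_in_and_thin` is hypothetical, and keeping the first post burn-in state (rather than the $t$-th) is one convention among several.

```python
import numpy as np

def burn_in_and_thin(chain, b, t):
    """Drop the first b states, then keep one state out of every t."""
    chain = np.asarray(chain)
    # Slicing from index b with step t applies burn-in and thinning at once.
    return chain[b::t]

chain = np.arange(10)  # stand-in for 10 successive MCMC states
kept = burn_in_and_thin(chain, b=2, t=3)
```

With $N$ states in total, this keeps roughly $(N-b)/t$ states and ignores everything else about the chain.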

## 2. Control Variates

#### 2.1. Definition

#### 2.2. Control Variates as a Weighting Scheme

#### 2.3. Gradient-Based Control Variates

- The probability density $p\in {C}^{1}(\mathsf{\Omega},\mathbb{R})$ where $\mathsf{\Omega}\subseteq {\mathbb{R}}^{d}$ is an open set;
- Function $\varphi \in {C}^{1}(\mathsf{\Omega},{\mathbb{R}}^{d})$ is such that ${\oint}_{\partial \mathsf{\Omega}}p\left(x\right)\varphi \left(x\right)\cdot n\left(x\right)S\left(dx\right)=0$ where ${\oint}_{\partial \mathsf{\Omega}}$ denotes the integral over the boundary of $\mathsf{\Omega}$, and $S\left(dx\right)$ is the surface element at $x\in \partial \mathsf{\Omega}$.
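To see what these two conditions buy, the following worked derivation may help. It is a sketch under additional assumptions that are made here and are not stated in the list above: $p$ is strictly positive on $\mathsf{\Omega}$ (so that $\nabla \log p$ is defined), the integrals are finite, and the divergence theorem applies on $\mathsf{\Omega}$.

```latex
\begin{align*}
\mathbb{E}_{p}\left[\nabla\cdot\varphi(X) + \varphi(X)\cdot\nabla\log p(X)\right]
  &= \int_{\mathsf{\Omega}} \left\{ p(x)\,\nabla\cdot\varphi(x) + \varphi(x)\cdot\nabla p(x) \right\} dx \\
  &= \int_{\mathsf{\Omega}} \nabla\cdot\left\{ p(x)\,\varphi(x) \right\} dx \\
  &= \oint_{\partial\mathsf{\Omega}} p(x)\,\varphi(x)\cdot n(x)\, S(dx) \;=\; 0 .
\end{align*}
```

Under these assumptions, the function $x\mapsto \nabla\cdot\varphi(x)+\varphi(x)\cdot\nabla\log p(x)$ has zero expectation under $p$, and evaluating it requires only the gradient of $\log p$, not the normalising constant.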

#### 2.4. MCMC-Based Control Variates

## 3. The Cube Method

#### 3.1. Definitions

#### 3.2. Subsamples as Vertices

#### 3.3. Existence of a Solution

#### 3.4. Flight Phase

**Algorithm 1:** Flight phase iteration

**Input:** $\pi(t-1)$

**Output:** $\pi(t)$

1. Sample $u(t)$ in $\ker A$ with $u_{k}(t)=0$ if the $k$-th component of $\pi(t-1)$ is an integer.
2. Compute $\lambda_{1}^{\star}$ and $\lambda_{2}^{\star}$, the largest values of $\lambda_{1}>0$ and $\lambda_{2}>0$ such that $0\le \pi(t-1)+\lambda_{1}u(t)\le 1$ and $0\le \pi(t-1)-\lambda_{2}u(t)\le 1$.
3. With probability $\lambda_{2}^{\star}/(\lambda_{1}^{\star}+\lambda_{2}^{\star})$, set $\pi(t)\leftarrow \pi(t-1)+\lambda_{1}^{\star}u(t)$; otherwise, set $\pi(t)\leftarrow \pi(t-1)-\lambda_{2}^{\star}u(t)$.
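One iteration of Algorithm 1 can be sketched in Python as follows. This is an illustrative sketch rather than the implementation used in the experiments: the function name `flight_step`, the tolerance `tol`, and the way $u(t)$ is drawn (Gaussian coefficients on an SVD basis of the kernel of $A$ restricted to the non-integer components) are all assumptions made here, since the algorithm leaves the sampling of $u(t)$ open.

```python
import numpy as np

def flight_step(pi, A, rng, tol=1e-9):
    """One flight-phase iteration: returns pi(t), or None if no direction exists."""
    pi = np.asarray(pi, dtype=float)
    A = np.asarray(A, dtype=float)
    # Components that are not yet integer (0 or 1) are still free to move.
    free = np.flatnonzero((pi > tol) & (pi < 1 - tol))
    if free.size == 0:
        return None
    # Step 1: basis of ker(A) restricted to the free components, via SVD.
    _, s, vt = np.linalg.svd(A[:, free])
    rank = int(np.sum(s > tol))
    basis = vt[rank:]  # rows span the kernel of A[:, free]
    if basis.shape[0] == 0:
        return None
    u = np.zeros_like(pi)
    u[free] = rng.standard_normal(basis.shape[0]) @ basis
    # Step 2: largest steps keeping pi + lam1*u and pi - lam2*u inside [0, 1].
    pos, neg = u > 0, u < 0
    lam1 = min(np.min((1 - pi[pos]) / u[pos], initial=np.inf),
               np.min(-pi[neg] / u[neg], initial=np.inf))
    lam2 = min(np.min(pi[pos] / u[pos], initial=np.inf),
               np.min((pi[neg] - 1) / u[neg], initial=np.inf))
    # Step 3: random choice that leaves the expectation of pi unchanged.
    if rng.random() < lam2 / (lam1 + lam2):
        new_pi = pi + lam1 * u
    else:
        new_pi = pi - lam2 * u
    # Clip to remove floating-point overshoot at the boundary.
    return np.clip(new_pi, 0.0, 1.0)

rng = np.random.default_rng(0)
A = np.ones((1, 4))    # a single balancing constraint: the sum of pi is fixed
pi0 = np.full(4, 0.5)
pi1 = flight_step(pi0, A, rng)
```

Each call preserves $A\pi$ and drives at least one more component of $\pi$ to 0 or 1. The choice of probability in step 3 gives $\mathbb{E}[\pi(t)\mid \pi(t-1)]=\pi(t-1)$, because $\lambda_{1}^{\star}\lambda_{2}^{\star}-\lambda_{2}^{\star}\lambda_{1}^{\star}=0$.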

#### 3.5. Landing Phase

## 4. Cube Thinning

#### 4.1. First Step: Computing the Weights

#### 4.2. Second Step: Cube Resampling

#### 4.3. Dealing with Weights Outside of $[0,1]$

## 5. Experiments

`BalancedSampling`.

#### 5.1. Evaluation Criteria

#### 5.2. Lotka–Volterra Model

#### 5.3. Truncated Normal

## Author Contributions

## Funding

## Data Availability Statement

## Acknowledgments

## Conflicts of Interest

## Appendix A. Details on the Landing Phase

## Appendix B. Estimation of the Energy Distance


**Figure 1.** Lotka–Volterra example: first 5000 weights of the cube methods, based on the full (**top**) or diagonal (**bottom**) set of covariates.

**Figure 2.** Lotka–Volterra example: box-plots of the kernel Stein discrepancy for all the cube method variations, compared with the KSD method for three kernels and the usual thinning method (horizontal lines). **Top**: $M=100$. **Bottom**: $M=1000$. (In the top plot, standard thinning is omitted to improve clarity, as the corresponding value is too high.)

**Figure 3.** Lotka–Volterra example: box-plots of the star discrepancy for all the cube method variations, compared with the KSD method for three kernels and the usual thinning method (horizontal lines). **Top**: $M=100$. **Bottom**: $M=1000$.

**Figure 4.** Lotka–Volterra example: box-plots of the energy distance for all the cube method variations, compared with the KSD method for three kernels and the usual thinning method (horizontal lines). **Top**: $M=100$. **Bottom**: $M=1000$.

**Figure 5.** Truncated normal example: box-plots over 100 independent replicates of each estimator; see text for more details.


© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Chopin, N.; Ducrocq, G. Fast Compression of MCMC Output. *Entropy* **2021**, *23*, 1017.
https://doi.org/10.3390/e23081017
