# Approximate Bayesian Inference

## Abstract


## 1. Introduction

## 2. Approximation in the Modelization
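A recurring theme under this heading is changing the model itself, for instance replacing the likelihood with a loss function to obtain a Gibbs (generalized) posterior. A minimal grid-based sketch, assuming squared loss for a location parameter; the data, prior, and learning rate λ are arbitrary illustrative choices, not taken from the editorial:

```python
import numpy as np

# Gibbs/generalized posterior on a 1-D grid: the likelihood is replaced by
# exp(-lambda * empirical risk), here squared loss for a location parameter.
rng = np.random.default_rng(0)
data = rng.normal(1.0, 1.0, size=50)              # synthetic observations
theta = np.linspace(-2.0, 4.0, 601)               # parameter grid
dx = theta[1] - theta[0]
risk = ((data[:, None] - theta[None, :]) ** 2).mean(axis=0)  # empirical risk r(theta)
lam = 5.0                                         # learning rate / inverse temperature
prior = np.exp(-0.5 * theta**2)                   # N(0, 1) prior, up to a constant
post = prior * np.exp(-lam * risk)                # unnormalized Gibbs posterior
post /= post.sum() * dx                           # normalize on the grid
post_mean = (theta * post).sum() * dx             # posterior mean of theta
```

Grid normalization only works in low dimension; choosing λ (e.g., via PAC-Bayes bounds or cross-validation) is a separate question.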

## 3. Approximation in the Computations

### 3.1. Non-Exact Monte Carlo Methods
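As a minimal illustration of the sampling methods grouped under this heading, a random-walk Metropolis–Hastings sketch; the standard-normal target and step size are arbitrary choices for the example:

```python
import numpy as np

def metropolis_hastings(log_target, x0, n_samples, step=1.0, seed=0):
    """Random-walk Metropolis-Hastings with Gaussian proposals: accept a
    proposal with probability min(1, target ratio)."""
    rng = np.random.default_rng(seed)
    x = x0
    samples = np.empty(n_samples)
    for i in range(n_samples):
        prop = x + step * rng.standard_normal()
        # Accept/reject via the log-density ratio (target known up to a constant)
        if np.log(rng.uniform()) < log_target(prop) - log_target(x):
            x = prop
        samples[i] = x
    return samples

# Example: sample from N(0, 1), whose log-density is -x^2/2 up to a constant
samples = metropolis_hastings(lambda x: -0.5 * x**2, x0=0.0, n_samples=20000)
```

In practice the step size is tuned, often adaptively, to balance acceptance rate against mixing; discarding an initial burn-in portion of the chain is standard.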

### 3.2. Asymptotic Approximations
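The prototypical asymptotic approximation is Laplace's method: replace the posterior by a Gaussian centred at the mode, with variance given by the inverse curvature there. A 1-D sketch using finite-difference derivatives; the Bernoulli example and step sizes are illustrative choices:

```python
import numpy as np

def laplace_approximation(log_post, x0, h=1e-5, iters=50):
    """Laplace approximation N(mode, -1/l''(mode)) to a 1-D log-posterior l:
    find the mode by Newton's method, then take the curvature at the mode."""
    x = x0
    for _ in range(iters):
        g = (log_post(x + h) - log_post(x - h)) / (2 * h)                  # l'(x)
        H = (log_post(x + h) - 2 * log_post(x) + log_post(x - h)) / h**2   # l''(x)
        x -= g / H   # Newton step toward the mode
    return x, -1.0 / H   # mean and variance of the Gaussian approximation

# Example: Bernoulli likelihood, 7 successes out of 10, flat prior.
# The exact mode is s/n = 0.7 and the curvature gives variance
# mode*(1-mode)/n = 0.021, which the sketch recovers numerically.
s, n = 7, 10
mean, var = laplace_approximation(
    lambda t: s * np.log(t) + (n - s) * np.log(1 - t), x0=0.5)
```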

### 3.3. Approximations via Optimization
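Optimization-based approximations such as variational inference recast posterior approximation as maximizing the evidence lower bound (ELBO) over a tractable family. A black-box sketch, assuming a 1-D Gaussian family and the reparameterization trick; the target, learning rate, and Monte Carlo batch size are arbitrary illustrative choices:

```python
import numpy as np

def fit_gaussian_vi(log_p, m=0.0, log_s=0.0, lr=0.05, iters=2000, n_mc=64, seed=0):
    """Black-box VI: fit q = N(m, s^2) to an unnormalized log-density by
    stochastic gradient ascent on the ELBO, using the reparameterization
    x = m + s*eps with eps ~ N(0, 1)."""
    rng = np.random.default_rng(seed)
    for _ in range(iters):
        s = np.exp(log_s)
        eps = rng.standard_normal(n_mc)
        x = m + s * eps
        # Derivative of log p at the samples, via central finite differences
        h = 1e-5
        dlogp = (log_p(x + h) - log_p(x - h)) / (2 * h)
        grad_m = dlogp.mean()                          # d ELBO / d m
        grad_log_s = (dlogp * eps).mean() * s + 1.0    # + d entropy / d log s
        m += lr * grad_m
        log_s += lr * grad_log_s
    return m, np.exp(log_s)

# Example: target N(2, 1) up to a constant; VI should recover mean 2, sd 1
m, s = fit_gaussian_vi(lambda x: -0.5 * (x - 2.0)**2)
```

The Gaussian family contains this particular target, so VI recovers it; in general, minimizing the reverse Kullback–Leibler divergence tends to underestimate the posterior spread.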

## 4. Scope of This Special Issue

## Funding

## Acknowledgments

## Conflicts of Interest

## Abbreviations

| Abbreviation | Expansion |
| --- | --- |
| ABC | Approximate Bayesian Computation |
| EP | Expectation Propagation |
| MALA | Metropolis-Adjusted Langevin Algorithm |
| MCMC | Markov Chain Monte Carlo |
| MLE | Maximum Likelihood Estimator |
| PAC | Probably Approximately Correct |
| VaR | Value at Risk |


Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2020 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Alquier, P.
Approximate Bayesian Inference. *Entropy* **2020**, *22*, 1272.
https://doi.org/10.3390/e22111272
