Extended Variational Message Passing for Automated Approximate Bayesian Inference
Abstract
1. Introduction
2. Problem Statement
Variational Message Passing on Forney-Style Factor Graphs
 Choose a variable ${z}_{k}$ from the set ${z}_{1:K}$.
 Compute the incoming messages:
$$\overrightarrow{m}_{z_k}(z_k) \propto \exp\left(\langle \log f_a(z_{1:k})\rangle_{q(z_{1:k-1})}\right),$$
$$\overleftarrow{m}_{z_k}(z_k) \propto \exp\left(\langle \log f_b(z_{k:K})\rangle_{q(z_{k+1:K})}\right).$$
 Update the posterior:
$$q(z_k)=\frac{\overrightarrow{m}_{z_k}(z_k)\,\overleftarrow{m}_{z_k}(z_k)}{\int \overrightarrow{m}_{z_k}(z_k)\,\overleftarrow{m}_{z_k}(z_k)\,\mathrm{d}z_k}.$$
 Update the local free energy (for performance tracking), i.e., update all terms in $\mathcal{F}$ that are affected by the update (4):
$$\mathcal{F}_k=\left\langle \log\frac{q(z_k)}{f_a(z_{1:k})\,f_b(z_{k:K})}\right\rangle_{q(z_{1:K})}.$$
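The loop above can be sketched on a toy conjugate pair where every step is available in closed form. This is our own minimal illustration (the function names and the Gaussian toy model are ours, not the paper's): $f_a(z)=\mathcal{N}(z;m_0,v_0)$ is a prior, $f_b(y,z)=\mathcal{N}(y;z,v_l)$ is a likelihood with $y$ observed, and Gaussians are handled in natural parameters for $\varphi(z)=(z,z^2)$.

```python
import math

# One coordinate update of the VMP loop above, on a toy conjugate pair
# f_a(z) = N(z; m0, v0) and f_b(y, z) = N(y; z, vl) with y observed.
# Natural parameters for phi(z) = (z, z^2) are eta = (mu/v, -1/(2v)).

def to_natural(mu, v):
    return mu / v, -0.5 / v

def to_moments(e1, e2):
    v = -0.5 / e2
    return e1 * v, v

def vmp_update(m0, v0, y, vl):
    fwd = to_natural(m0, v0)   # exp(<log f_a(z)>) is the prior itself here
    bwd = to_natural(y, vl)    # exp(<log f_b(y, z)>) ~ N(z; y, vl) in z
    # posterior (4): normalized product of messages = sum of natural parameters
    return to_moments(fwd[0] + bwd[0], fwd[1] + bwd[1])

def free_energy(m0, v0, y, vl, mu, v):
    # F = <log q(z) - log f_a(z) - log f_b(y, z)> under q(z) = N(mu, v)
    H = 0.5 * math.log(2 * math.pi * math.e * v)              # entropy of q
    E_log_fa = -0.5 * math.log(2 * math.pi * v0) - 0.5 * ((mu - m0) ** 2 + v) / v0
    E_log_fb = -0.5 * math.log(2 * math.pi * vl) - 0.5 * ((y - mu) ** 2 + v) / vl
    return -H - E_log_fa - E_log_fb

mu, v = vmp_update(m0=0.0, v0=1.0, y=2.0, vl=1.0)   # exact posterior: N(1.0, 0.5)
F = free_energy(0.0, 1.0, 2.0, 1.0, mu, v)
# since q is exact here, F equals -log evidence = -log N(y; m0, v0 + vl)
```

Because the pair is conjugate, a single update reaches the exact posterior and the free energy coincides with the negative log evidence; in the non-conjugate cases treated below these quantities only approximate each other.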
3. Specification of EVMP Algorithm
3.1. Distribution Types
 (1)
 The standard Exponential Family (EF) of distributions, i.e., the following:
$$p(z)=h(z)\exp\left(\varphi(z)^{\top}\eta - A_{\eta}(\eta)\right),$$
 (2)
 Distributions that are of the following exponential form:
$$p(z)\propto \exp\left(\varphi(g(z))^{\top}\eta\right),$$
 (3)
 A List of Weighted Samples (LWS), i.e., the following:
$$p(z):=\left\{\left(w^{(1)},z^{(1)}\right),\dots,\left(w^{(N)},z^{(N)}\right)\right\}.$$
 (4)
 Deterministic relations are represented by delta distributions, i.e., the following:
$$p(x\mid z)=\delta\left(x-g(z)\right).$$
Technically, the equality factor $f(x,y,z)=\delta(z-x)\,\delta(z-y)$ also specifies a deterministic relation between variables.
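As a minimal sketch, the representation types above can be captured as plain data structures. All names here are ours (they are not ForneyLab's types); the deterministic relation is simply a function $g$, so it needs no container of its own.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Tuple

@dataclass
class EF:
    """Standard exponential family: p(z) = h(z) exp(phi(z)' eta - A(eta))."""
    h: Callable[[float], float]
    phi: Callable[[float], Tuple[float, ...]]
    eta: Tuple[float, ...]
    A: Callable[[Tuple[float, ...]], float]

@dataclass
class NEF:
    """Non-standard EF form: p(z) proportional to exp(phi(g(z))' eta)."""
    g: Callable[[float], float]
    phi: Callable[[float], Tuple[float, ...]]
    eta: Tuple[float, ...]

# A List of Weighted Samples is just [(w1, z1), ..., (wN, zN)]
LWS = List[Tuple[float, float]]

# Example: N(0, 1) as an EF instance with phi(z) = (z, z^2)
std_normal = EF(
    h=lambda z: 1.0 / math.sqrt(2 * math.pi),
    phi=lambda z: (z, z * z),
    eta=(0.0, -0.5),
    A=lambda eta: -eta[0] ** 2 / (4 * eta[1]) - 0.5 * math.log(-2 * eta[1]),
)

def density(p: EF, z: float) -> float:
    dot = sum(s * e for s, e in zip(p.phi(z), p.eta))
    return p.h(z) * math.exp(dot - p.A(p.eta))
```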
3.2. Factor Types
3.3. Message Types
3.4. Posterior Types
3.5. Computation of Posteriors
 (1)
 In the case that the colliding forward and backward messages both carry EF distributions with the same sufficient statistics $\varphi(z)$, computing the posterior simplifies to a summation of natural parameters:
$$\begin{aligned}
\overrightarrow{m}_{z}(z) &\propto \exp\left(\varphi(z)^{\top}\eta_1\right),\\
\overleftarrow{m}_{z}(z) &\propto \exp\left(\varphi(z)^{\top}\eta_2\right),\\
q(z) &\propto \overrightarrow{m}_{z}(z)\cdot\overleftarrow{m}_{z}(z)\propto \exp\left(\varphi(z)^{\top}(\eta_1+\eta_2)\right).
\end{aligned}$$
In this case, the posterior $q(z)$ is also represented by the EF distribution type. This case corresponds to classical VMP with conjugate factor pairs.
 (2)
 The forward message again carries a standard EF distribution, while the backward message carries either an NEF distribution or a non-conjugate EF distribution.
 (a)
 If the forward message is Gaussian, i.e., $\overrightarrow{m}_{z}(z)=\mathcal{N}(z;\mu_1,V_1)$, we use a Laplace approximation to compute the posterior:
$$\begin{aligned}
\mu &= \arg\max_{z}\left(\log \overrightarrow{m}_{z}(z)+\log \overleftarrow{m}_{z}(z)\right),\\
V &= -\left(\nabla\nabla_{z}\left(\log \overrightarrow{m}_{z}(z)+\log \overleftarrow{m}_{z}(z)\right)\Big|_{z=\mu}\right)^{-1},\\
q(z) &\propto \overrightarrow{m}_{z}(z)\cdot\overleftarrow{m}_{z}(z)\approx \mathcal{N}(z;\mu,V).
\end{aligned}$$
 (b)
 Otherwise ($\overrightarrow{m}_{z}(z)$ is not Gaussian), we use Importance Sampling (IS) to compute the posterior:
$$\begin{aligned}
z^{(1)},\dots,z^{(N)} &\sim \overrightarrow{m}_{z}(z),\\
\tilde{w}^{(i)} &= \overleftarrow{m}_{z}\left(z^{(i)}\right)\quad\text{for } i=1,\dots,N,\\
w^{(i)} &= \tilde{w}^{(i)}\Big/\sum_{j=1}^{N}\tilde{w}^{(j)}\quad\text{for } i=1,\dots,N,\\
q(z) &\propto \overrightarrow{m}_{z}(z)\cdot\overleftarrow{m}_{z}(z)=\left\{\left(w^{(1)},z^{(1)}\right),\dots,\left(w^{(N)},z^{(N)}\right)\right\}.
\end{aligned}$$
 (3)
 The forward message carries an LWS distribution, i.e., the following:
$$\overrightarrow{m}_{z}(z):=\left\{\left(w_1^{(1)},z_1^{(1)}\right),\dots,\left(w_1^{(N)},z_1^{(N)}\right)\right\},$$
in which case the samples are kept and reweighted by the backward message:
$$\begin{aligned}
\tilde{w}^{(i)} &= w_1^{(i)}\,\overleftarrow{m}_{z}\left(z_1^{(i)}\right)\quad\text{for } i=1,\dots,N,\\
w^{(i)} &= \tilde{w}^{(i)}\Big/\sum_{j=1}^{N}\tilde{w}^{(j)}\quad\text{for } i=1,\dots,N,\\
z^{(1)},\dots,z^{(N)} &= z_1^{(1)},\dots,z_1^{(N)},\\
q(z) &\propto \overrightarrow{m}_{z}(z)\cdot\overleftarrow{m}_{z}(z)=\left\{\left(w^{(1)},z^{(1)}\right),\dots,\left(w^{(N)},z^{(N)}\right)\right\}.
\end{aligned}$$
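Cases (2a) and (2b) above can be sketched in one dimension. This is our own illustration, not the paper's implementation: numerical derivatives stand in for automatic differentiation, and we check both approximations on a conjugate pair whose exact posterior is $\mathcal{N}(2/3,\,2/3)$.

```python
import math, random
random.seed(0)

def laplace_posterior(log_fwd, log_bwd, z0=0.0, iters=50, h=1e-5):
    """Case (2a): Gaussian forward message -> fit N(mu, V) at the mode."""
    f = lambda z: log_fwd(z) + log_bwd(z)
    z = z0
    for _ in range(iters):                                 # Newton ascent
        g = (f(z + h) - f(z - h)) / (2 * h)                # numerical gradient
        H = (f(z + h) - 2 * f(z) + f(z - h)) / h ** 2      # numerical curvature
        z -= g / H
    H = (f(z + h) - 2 * f(z) + f(z - h)) / h ** 2
    return z, -1.0 / H                                     # mu, V = -(Hessian)^{-1}

def is_posterior(sample_fwd, bwd, N=20000):
    """Case (2b): sample the forward message, weight by the backward message."""
    zs = [sample_fwd() for _ in range(N)]
    wt = [bwd(z) for z in zs]
    s = sum(wt)
    return [(w / s, z) for w, z in zip(wt, zs)]            # LWS representation

# forward N(0, 1) and backward N(z; 2, 2), both up to additive constants
log_fwd = lambda z: -0.5 * z ** 2
log_bwd = lambda z: -0.25 * (z - 2.0) ** 2
mu, V = laplace_posterior(log_fwd, log_bwd, z0=1.0)

lws = is_posterior(lambda: random.gauss(0.0, 1.0),
                   lambda z: math.exp(log_bwd(z)))
is_mean = sum(w * z for w, z in lws)                       # also close to 2/3
```

On this conjugate test case the Laplace fit is exact (the log posterior is quadratic) and the IS mean agrees up to Monte Carlo error, which is how the two rules can be sanity-checked before applying them to genuinely non-conjugate messages.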
3.6. Computation of Messages
 (1)
 If factor $f_a(z_1,z_2,\dots,z_k)$ is a soft factor of the form (see Figure 3a)
$$f_a(z_{1:k})=p(z_k\mid z_{1:k-1})=h_a(z_k)\exp\left(\varphi_a(z_k)^{\top}\eta_a(z_{1:k-1})-A_a(z_{1:k-1})\right),$$
then the forward message is
$$\overrightarrow{m}_{z_k}(z_k)\propto h_a(z_k)\exp\left(\varphi_a(z_k)^{\top}\,\langle\eta_a(z_{1:k-1})\rangle_{q(z_{1:k-1})}\right).$$
If rather $z_1$ (or $z_2,\dots,z_{k-1}$) than $z_k$ is the output variable of $f_a$, i.e., if the following is true:
$$f_a(z_{1:k})=p(z_1\mid z_{2:k})=h_a(z_1)\exp\left(\varphi_a(z_1)^{\top}\eta_a(z_{2:k})-A_a(z_{2:k})\right),$$
then the message toward $z_k$ is
$$\overleftarrow{m}_{z_k}(z_k)\propto \exp\left(\langle\varphi_a(z_1)\rangle_{q(z_1)}^{\top}\,\langle\eta_a(z_{2:k})\rangle_{q(z_{2:k-1})}-\langle A_a(z_{2:k})\rangle_{q(z_{2:k-1})}\right).$$
In this last expression, we chose to assign a backward arrow to $\overleftarrow{m}_{z_k}(z_k)$ since it is customary to align the message direction with the direction of the factor, which in this case points to $z_1$. Note that the message calculation rule for $\overrightarrow{m}_{z_k}(z_k)$ requires the computation of the expectation $\langle\eta_a(z_{1:k-1})\rangle_{q(z_{1:k-1})}$, and for $\overleftarrow{m}_{z_k}(z_k)$ we need to compute the expectations $\langle\varphi_a(z_1)\rangle_{q(z_1)}$ and $\langle\eta_a(z_{2:k})\rangle_{q(z_{2:k-1})}$. In the update rules shown below, we will see these expectations of statistics of $z$ appear over and again. In Section 3.8 we detail how we calculate these expectations, and in Appendix A we further discuss their origins.
 (2)
 In the case that $f_\delta$ is a deterministic factor (see Figure 3b):
$$f_\delta(x,z_{1:k})=p(x\mid z_{1:k})=\delta\left(x-g(z_{1:k})\right),$$
the forward message is a list of equally weighted samples pushed through $g$:
$$\overrightarrow{m}_{x}(x)=\left\{\left(\tfrac{1}{N},g\left(z_{1:k}^{(1)}\right)\right),\dots,\left(\tfrac{1}{N},g\left(z_{1:k}^{(N)}\right)\right)\right\},\quad\text{where } z_j^{(i)}\sim \overrightarrow{m}_{z_j}(z_j)\ \text{for } j=1:k.$$
For the computation of the backward message toward $z_k$, we distinguish two cases:
 (a)
 If all forward incoming messages from the variables $z_{1:k}$ are Gaussian, we first use a Laplace approximation to obtain a Gaussian joint posterior $q(z_{1:k})=\mathcal{N}(z_{1:k};\mu_{1:k},V_{1:k})$; see Appendix B.1.2 and Appendix B.2.2 for details. Then, we evaluate the posteriors for individual random variables, e.g., $q(z_k)=\int q(z_{1:k})\,\mathrm{d}z_{1:k-1}=\mathcal{N}(z_k;\mu_k,V_k)$. Finally, we send the following Gaussian backward message:
$$\overleftarrow{m}_{z_k}(z_k)\propto q(z_k)\,/\,\overrightarrow{m}_{z_k}(z_k).$$
 (b)
 Otherwise (the incoming messages from the variables $z_{1:k}$ are not all Gaussian), we use Monte Carlo sampling and send a message to $z_k$ as an NEF distribution:
$$\overleftarrow{m}_{z_k}(z_k)\approx \frac{1}{N}\sum_{i=1}^{N}\overleftarrow{m}_{x}\left(g\left(z_{1:k-1}^{(i)},z_k\right)\right),\quad\text{where } z_j^{(i)}\sim \overrightarrow{m}_{z_j}(z_j).$$
Note that if $f_\delta$ is a single-input deterministic node, i.e., $f_\delta(x,z_k)=p(x\mid z_k)=\delta\left(x-g(z_k)\right)$, then the backward message simplifies to $\overleftarrow{m}_{z_k}(z_k)=\overleftarrow{m}_{x}\left(g(z_k)\right)$ (Appendix B.1.1).
 (3)
 The third factor type that leads to a special message computation rule is the equality node; see Figure 3c. The outgoing message from an equality node
$$f_{=}(z,z^{\prime},z^{\prime\prime})=\delta(z-z^{\prime})\,\delta(z-z^{\prime\prime})$$
is given by
$$\begin{aligned}
\overrightarrow{m}_{z_k}(z_k) &= \int \underbrace{\delta(z_k-z_k^{\prime})\,\delta(z_k-z_k^{\prime\prime})}_{\text{node function}}\ \underbrace{\overrightarrow{m}_{z_k^{\prime}}(z_k^{\prime})\,\overrightarrow{m}_{z_k^{\prime\prime}}(z_k^{\prime\prime})}_{\text{incoming messages}}\ \mathrm{d}z_k^{\prime}\,\mathrm{d}z_k^{\prime\prime}\\
&= \overrightarrow{m}_{z_k^{\prime}}(z_k)\,\overrightarrow{m}_{z_k^{\prime\prime}}(z_k).
\end{aligned}$$
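The two deterministic-node rules above can be sketched as follows. This is our own illustration (function names and the example mappings are ours): the forward rule pushes samples through $g$, and the Monte Carlo backward rule averages the backward message over samples of the remaining inputs.

```python
import math, random
random.seed(1)

def forward_deterministic(g, samplers, N=20000):
    """Forward message through x = g(z_1, ..., z_k): an equally weighted LWS."""
    out = []
    for _ in range(N):
        zs = [s() for s in samplers]           # z_j ~ forward message of z_j
        out.append((1.0 / N, g(*zs)))
    return out

def backward_mc(bwd_x, g, samplers_rest, N=2000):
    """Monte Carlo backward message toward z_k, averaging over the other inputs."""
    draws = [[s() for s in samplers_rest] for _ in range(N)]
    return lambda zk: sum(bwd_x(g(*zs, zk)) for zs in draws) / N

# forward: x = exp(z) with z ~ N(0, 0.25); E[x] is the log-normal mean exp(0.125)
msg_x = forward_deterministic(math.exp, [lambda: random.gauss(0.0, 0.5)])
mean_x = sum(w * x for w, x in msg_x)

# backward: x = z1 + z2 with z1 concentrated near 1.0 and a N(x; 3, 1)
# backward message on x, so the message toward z2 peaks near z2 = 2
bwd_x = lambda x: math.exp(-0.5 * (x - 3.0) ** 2)
msg_z2 = backward_mc(bwd_x, lambda z1, z2: z1 + z2,
                     [lambda: random.gauss(1.0, 1e-4)])
```

The backward message comes out as an unnormalized function of $z_k$ (the NEF type of Section 3.1), which is exactly why the posterior computations of Section 3.5 must accept such messages.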
3.7. Computation of Free Energy
 If $q(z)$ is represented by a standard EF distribution, i.e.,
$$q(z)=h(z)\exp\left(\varphi(z)^{\top}\eta - A_{\eta}(\eta)\right),$$
then the entropy is available as
$$\mathcal{H}_z=-\langle\log h(z)\rangle_{q(z)}-\langle\varphi(z)\rangle_{q(z)}^{\top}\,\eta + A_{\eta}(\eta).$$
 Otherwise, if $q(z)$ is represented by an LWS, i.e.,
$$q(z):=\left\{\left(w^{(1)},z^{(1)}\right),\dots,\left(w^{(N)},z^{(N)}\right)\right\},$$
then the entropy is approximated as
$$\mathcal{H}_z=\widehat{\mathcal{H}}_z^{1}+\widehat{\mathcal{H}}_z^{2},$$
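The EF entropy formula above can be checked on a Gaussian, for which the closed form $\tfrac{1}{2}\log(2\pi e v)$ is known. The instantiation below is ours: $\varphi(z)=(z,z^2)$, $\eta=(\mu/v,\,-1/(2v))$, $h(z)=(2\pi)^{-1/2}$, and $A_\eta(\eta)=\mu^2/(2v)+\tfrac{1}{2}\log v$.

```python
import math

# Entropy of q(z) = N(mu, v) via the generic EF formula
# H = -<log h(z)> - <phi(z)>' eta + A(eta).
def gaussian_entropy_ef(mu, v):
    eta = (mu / v, -0.5 / v)
    E_phi = (mu, mu * mu + v)                  # <z>, <z^2> under q
    E_log_h = -0.5 * math.log(2 * math.pi)     # <log h(z)> is a constant here
    A = mu * mu / (2 * v) + 0.5 * math.log(v)  # log-partition for this h
    return -E_log_h - (E_phi[0] * eta[0] + E_phi[1] * eta[1]) + A

H = gaussian_entropy_ef(1.3, 2.0)
# agrees with the closed form 0.5 * log(2 * pi * e * v)
```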
3.8. Expectations of Statistics
 (1)
 We have two cases when $q(z)$ is coded as an EF distribution, i.e.,
$$q(z)=h(z)\exp\left(\varphi(z)^{\top}\eta - A_{\eta}(\eta)\right):$$
 (a)
 If $\Phi(z)\in\varphi(z)$, i.e., the statistic $\Phi(z)$ matches with elements of the sufficient statistics vector $\varphi(z)$, then $\langle\Phi(z)\rangle_{q(z)}$ is available in closed form as the gradient of the log-partition function (this is worked out in Appendix A.1.1; see (A14) and (A15)):
$$\langle\Phi(z)\rangle_{q(z)}\in\nabla_{\eta}A_{\eta}(\eta).$$
 (b)
 Otherwise ($\Phi(z)\notin\varphi(z)$), then we evaluate
$$\langle\Phi(z)\rangle_{q(z)}\approx\frac{1}{N}\sum_{i=1}^{N}\Phi\left(z^{(i)}\right),$$
 (2)
 In case $q(z)$ is represented by an LWS, i.e., the following:
$$q(z)=\left\{\left(w^{(1)},z^{(1)}\right),\dots,\left(w^{(N)},z^{(N)}\right)\right\},$$
the expectation is approximated by the weighted average
$$\langle\Phi(z)\rangle_{q(z)}\approx\sum_{i=1}^{N}w^{(i)}\,\Phi\left(z^{(i)}\right).$$
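Both rules above admit quick numerical checks; the sketch below is ours. For a Gaussian $q$ with $\varphi(z)=(z,z^2)$, the log-partition in natural parameters is $A_\eta(\eta)=-\eta_1^2/(4\eta_2)-\tfrac{1}{2}\log(-2\eta_2)$, and its gradient recovers $\langle z\rangle$ and $\langle z^2\rangle$; for an LWS, the expectation is a plain weighted sum.

```python
import math

def A(eta1, eta2):
    # Gaussian log-partition as a function of natural parameters
    return -eta1 ** 2 / (4 * eta2) - 0.5 * math.log(-2 * eta2)

def grad_A(eta1, eta2, h=1e-6):
    # central finite differences stand in for autodiff here
    return ((A(eta1 + h, eta2) - A(eta1 - h, eta2)) / (2 * h),
            (A(eta1, eta2 + h) - A(eta1, eta2 - h)) / (2 * h))

def expect_lws(phi, lws):
    # <Phi(z)> under a List of Weighted Samples
    return sum(w * phi(z) for w, z in lws)

mu, v = 1.5, 0.7
m1, m2 = grad_A(mu / v, -0.5 / v)          # ~ <z> = 1.5 and <z^2> = 1.5^2 + 0.7

lws = [(0.2, -1.0), (0.5, 0.0), (0.3, 2.0)]
m_lws = expect_lws(lambda z: z, lws)       # 0.2*(-1) + 0.5*0 + 0.3*2 = 0.4
```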
3.9. PseudoCode for the EVMP Algorithm
Algorithm 1 Extended VMP (Mean-field assumption) 

4. Experiments
4.1. Filtering with the Hierarchical Gaussian Filter
4.2. Parameter Estimation for a Linear Dynamical System
4.3. EVMP for a Switching State Space Model
5. Related Work
6. Discussion
7. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
VMP  Variational Message Passing 
EVMP  Extended Variational Message Passing 
BP  Belief propagation 
EP  Expectation propagation 
FFG  Forney-style Factor Graph 
EF  Exponential family 
NEF  Nonstandard exponential family 
LWS  List of Weighted Samples 
IS  Importance sampling 
MCMC  Markov Chain Monte Carlo 
HMC  Hamiltonian Monte Carlo 
ADVI  Automatic Differentiation Variational Inference 
PG  Particle Gibbs 
Appendix A. On the Applicability of VMP
 $f_a(z_{1:k})$ is an element of the exponential family (EF) of distributions, i.e.,
$$\begin{aligned}
f_a(z_{1:k}) &= p(z_k\mid z_{1:k-1})\\
&= h_a(z_k)\exp\left(\varphi_a(z_k)^{\top}\eta_a(z_{1:k-1})-A_a(z_{1:k-1})\right).
\end{aligned}$$
In this equation, $h_a(z_k)$ is a base measure, $\eta_a(z_{1:k-1})$ is a vector of natural (or canonical) parameters, $\varphi_a(z_k)$ are the sufficient statistics, and $A_a(z_{1:k-1})$ is the log-partition function, i.e., $A_a(z_{1:k-1})=\log\left(\int h_a(z_k)\exp\left(\varphi_a(z_k)^{\top}\eta_a(z_{1:k-1})\right)\mathrm{d}z_k\right)$. It is always possible to write the log-partition function as a function of natural parameters, $A_{\eta_a}(\eta_a)$, such that $A_{\eta_a}(\eta_a)=A_a(z_{1:k-1})$. Throughout the paper, we sometimes prefer the natural-parameter parameterization of the log-partition function.
 We differentiate a few cases for ${f}_{b}$:
 $f_b$ is also an element of the EF, given by the following:
$$\begin{aligned}
f_b(z_{k:K}) &= p(z_K\mid z_{k:K-1})\\
&= h_b(z_K)\exp\left(\varphi_b(z_K)^{\top}\eta_b(z_{k:K-1})-A_b(z_{k:K-1})\right),
\end{aligned}$$
which can be rewritten as
$$f_b(z_k,z_{k+1:K})=h_b(z_K)\exp\left(\varphi_a(z_k)^{\top}\eta_{ba}(z_{k+1:K})+c_{ba}(z_{k+1:K})\right).$$
The crucial element of this rewrite is that both $f_a(z_{1:k})$ and $f_b(z_{k:K})$ are written as exponential functions of the same sufficient statistics function $\varphi_a(z_k)$. This case leads to the regular VMP update equations; see Appendix A.1. Our Extended VMP does not need this assumption and derives approximate VMP update rules for the following extensions.
 ${f}_{b}$ is an element of the EF, but not amenable to the modification given in (A3), i.e., it cannot be written as an exponential function of sufficient statistics ${\varphi}_{a}\left({z}_{k}\right)$. Therefore, ${f}_{b}$ is not a conjugate pair with ${f}_{a}$ for ${z}_{k}$.
 $f_b(z_{k:K})$ is a composition of a deterministic node with an EF node; see Figure A1. In particular, in this case $f_b(z_{k:K})$ can be decomposed as follows:
$$\begin{aligned}
f_b(z_{k:K}) &= \int \delta\left(x-g(z_k)\right)f_c(x,z_{k+1:K})\,\mathrm{d}x\\
&= f_c\left(g(z_k),z_{k+1:K}\right),
\end{aligned}$$
where
$$\begin{aligned}
f_c(x,z_{k+1:K}) &= p(z_K\mid x,z_{k+1:K-1})\\
&= h_c(z_K)\exp\left(\varphi_c(z_K)^{\top}\eta_c(x,z_{k+1:K-1})-A_c(x,z_{k+1:K-1})\right).
\end{aligned}$$
We assume that the conjugate prior to $f_c$ for the random variable $x$ has sufficient statistics vector $\widehat{\varphi}_c(x)$, and hence (A5) can be modified as follows:
$$f_c(x,z_{k+1:K})=h_c(z_K)\exp\left(\widehat{\varphi}_c(x)^{\top}\widehat{\eta}_c(z_{k+1:K})+\widehat{c}_c(z_{k+1:K})\right),$$
Appendix A.1. VMP with Conjugate Soft Factor Pairs
Appendix A.1.1. Messages and Posteriors
Appendix A.1.2. Free Energy
Appendix A.2. VMP with Non-Conjugate Soft Factor Pairs
Appendix A.3. VMP with Composite Nodes
Appendix B. Derivation of Extended VMP
Appendix B.1. Deterministic Mappings with Single Inputs
Appendix B.1.1. Non-Gaussian Case
Appendix B.1.2. Gaussian Case
Appendix B.2. Deterministic Mappings with Multiple Inputs
Appendix B.2.1. Monte Carlo Approximation to the Backward Message
Appendix B.2.2. Gaussian Approximation to the Backward Message
Appendix B.3. Non-Conjugate Soft Factor Pairs
 If ${\overrightarrow{m}}_{{z}_{k}}\left({z}_{k}\right)$ is a Gaussian message, apply Laplace to approximate $q\left({z}_{k}\right)$ with a Gaussian distribution as in (A31a,b).
 Otherwise, use IS as in (A30a,b).
Appendix C. Free Energy Approximation
Appendix D. Implementation Details in ForneyLab
Appendix E. Bonus: Bootstrap Particle Filtering
Appendix F. Illustrative Example
 Initialize $q(x)$ and $q(z)$ with Normal distributions and $q(w)$ with an LWS.
 Repeat the following three steps (updates for $w$, $z$, and $x$) until convergence:
 
 Choose w for updating.
 
 Calculate the VMP message $\overleftarrow{m}_{w}(w)$ by (14). In this case,
$$\begin{aligned}
\overleftarrow{m}_{w}(w) &\propto \exp\left(\begin{bmatrix}\log w\\ w\end{bmatrix}^{\top}\begin{bmatrix}0.5\\ \langle x\rangle y-0.5\left(\langle x^2\rangle+y^2\right)\end{bmatrix}\right)\\
&\propto \mathcal{G}a\left(w;\,1.5,\;0.5\left(\langle x^2\rangle+y^2\right)-\langle x\rangle y\right),
\end{aligned}$$
 
 Calculate $\overrightarrow{m}_{w}(w)$ by (16):
$$\overrightarrow{m}_{w}(w)=\left\{\left(\tfrac{1}{N},\exp\left(z^{(1)}\right)\right),\dots,\left(\tfrac{1}{N},\exp\left(z^{(N)}\right)\right)\right\},\quad\text{where } z^{(i)}\sim \overrightarrow{m}_{z}(z)=\mathcal{N}(z;\mu_z,v_z).$$
 
 Update $q\left(w\right)$ by Section 3.5 rule (3).
 
 Choose z for updating.
 
 Calculate $\overleftarrow{m}_{z}(z)$ by (18), which is an NEF distribution:
$$\overleftarrow{m}_{z}(z)=\overleftarrow{m}_{w}\left(\exp(z)\right)\propto \exp\left(\begin{bmatrix}z\\ \exp(z)\end{bmatrix}^{\top}\begin{bmatrix}0.5\\ \langle x\rangle y-0.5\left(\langle x^2\rangle+y^2\right)\end{bmatrix}\right).$$
 
 The forward message is simply the prior: $\overrightarrow{m}_{z}(z)=\mathcal{N}(z;\mu_z,v_z)$.
 
 Update $q\left(z\right)$ by Section 3.5 rule (2)(a).
 
 Choose x for updating.
 
 Calculate the VMP message $\overleftarrow{m}_{x}(x)$ by (14). In this case,
$$\begin{aligned}
\overleftarrow{m}_{x}(x) &\propto \exp\left(\begin{bmatrix}x\\ x^2\end{bmatrix}^{\top}\begin{bmatrix}\langle w\rangle y\\ -0.5\langle w\rangle\end{bmatrix}\right)\\
&\propto \mathcal{N}\left(x;\,y,\,1/\langle w\rangle\right).
\end{aligned}$$
 
 The forward message is the prior:
$$\overrightarrow{m}_{x}(x)=\mathcal{N}(x;\mu_x,v_x)\propto \exp\left(\begin{bmatrix}x\\ x^2\end{bmatrix}^{\top}\begin{bmatrix}\mu_x/v_x\\ -0.5/v_x\end{bmatrix}\right).$$
 
 Update $q(x)$ by Section 3.5 rule (1), i.e., the following:
$$\begin{aligned}
q(x) &\propto \exp\left(\begin{bmatrix}x\\ x^2\end{bmatrix}^{\top}\begin{bmatrix}\mu_x/v_x+\langle w\rangle y\\ -0.5\left(1/v_x+\langle w\rangle\right)\end{bmatrix}\right)\\
&= \mathcal{N}\left(x;\,\frac{\mu_x+\langle w\rangle v_x y}{1+\langle w\rangle v_x},\,\frac{v_x}{1+\langle w\rangle v_x}\right).
\end{aligned}$$
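The final update in the example can be checked numerically; the sketch below is ours, with illustrative values for $\mu_x$, $v_x$, $\langle w\rangle$, and $y$.

```python
# Combine the prior natural parameters with the VMP likelihood message and
# convert back to mean/variance; this reproduces the closed form for q(x).
def q_x(mu_x, v_x, w_mean, y):
    eta1 = mu_x / v_x + w_mean * y
    eta2 = -0.5 * (1.0 / v_x + w_mean)
    v = -0.5 / eta2
    return eta1 * v, v

mu, v = q_x(mu_x=0.0, v_x=1.0, w_mean=1.0, y=2.0)
# closed form: mean (mu_x + <w> v_x y)/(1 + <w> v_x) = 1.0,
#              variance v_x/(1 + <w> v_x) = 0.5
```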
Algorithm           Run Time (s)
EVMP (ForneyLab)    $6.366\pm 0.081$
ADVI (Turing)       $91.468\pm 3.694$
Algorithm           Free Energy   Total Time (s)
EVMP (ForneyLab)    135.837       $58.674\pm 0.467$
ADVI (Turing)       90.285        $47.405\pm 1.772$
NUTS (Turing)       —             $78.407\pm 4.206$
Algorithm           Free Energy   Total Time (s)
EVMP (Mean-field)   283.991       $42.722\pm 0.197$
EVMP (Structured)   273.596       $51.684\pm 0.311$
HMC-PG (Turing)     —             $116.291\pm 0.886$
NUTS-PG (Turing)    —             $51.715\pm 0.441$
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Akbayrak, S.; Bocharov, I.; de Vries, B. Extended Variational Message Passing for Automated Approximate Bayesian Inference. Entropy 2021, 23, 815. https://doi.org/10.3390/e23070815