# Automating Model Comparison in Factor Graphs


## Abstract


## 1. Introduction

- We show that Bayesian model comparison can be performed through message passing on a graph, where the performance of each individual model is captured in a single factor node, as described in Section 4.1.
- We specify a universal mixture node and derive a set of custom message-passing update rules in Section 4.2. Performing probabilistic inference with this node in conjunction with scale factors yields different Bayesian model comparison methods.
- Bayesian model averaging, selection, and combination are recovered and consequently automated in Section 5.1, Section 5.2 and Section 5.3 by imposing a specific structure or local constraints on the model selection variable m.
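The three comparison methods above differ only in how the posterior over the model selection variable m is treated. As a minimal numerical illustration (not the paper's message-passing implementation; all variable names are ours), the following sketch contrasts averaging and selection given per-model log evidences:

```python
import numpy as np

def model_posterior(log_evidences, log_prior=None):
    """Posterior p(m | data) over K models from their log evidences."""
    log_evidences = np.asarray(log_evidences, dtype=float)
    if log_prior is None:  # uniform prior over the K models
        log_prior = -np.log(len(log_evidences)) * np.ones_like(log_evidences)
    log_post = log_evidences + log_prior
    log_post -= np.max(log_post)          # stabilize before exponentiation
    post = np.exp(log_post)
    return post / post.sum()

# Hypothetical per-model predictive densities p(y* | m_k) at a new point y*.
predictives = np.array([0.40, 0.10, 0.05])
posterior = model_posterior([-3.2, -4.1, -6.0])

# Bayesian model averaging: mix the predictives with the posterior weights.
bma = posterior @ predictives

# Bayesian model selection: keep only the MAP model's predictive.
selected = predictives[np.argmax(posterior)]
```

Bayesian model combination would instead place a distribution (e.g., a Dirichlet) over the mixture weights themselves and infer it from data, rather than fixing the weights at the model posterior.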

## 2. Related Work

## 3. Background Material

#### 3.1. Forney-Style Factor Graphs

#### 3.2. Sum-Product Message Passing

#### 3.3. Scale Factors

**Theorem 1.**

**Proof.**

#### 3.4. Variational Free Energy

## 4. Universal Mixture Modeling

#### 4.1. A Variational Free Energy Decomposition for Mixture Models

#### 4.2. A Factor Graph Approach to Universal Mixture Modeling: A General Recipe

**Theorem 2.**

**Proof.**

#### 4.3. A Factor Graph Approach to Universal Mixture Modeling: An Illustrative Example

## 5. Model Comparison Methods

#### 5.1. Bayesian Model Averaging

#### 5.2. Bayesian Model Selection

#### 5.3. Bayesian Model Combination

#### Probabilistic Inference for Bayesian Model Combination

## 6. Experiments

#### 6.1. Verification Experiments

#### 6.2. Validation Experiments

#### 6.2.1. Mixed Models

#### 6.2.2. Voice Activity Detection

## 7. Discussion

## 8. Conclusions

## Supplementary Materials

## Author Contributions

## Funding

## Institutional Review Board Statement

## Informed Consent Statement

## Data Availability Statement

## Acknowledgments

## Conflicts of Interest

## Appendix A. Proofs

#### Appendix A.1. Proof of Theorem 1

#### Appendix A.2. Proof of Theorem 2

## Appendix B. Derivations

#### Appendix B.1. Derivation of Variational Free Energy Decomposition for Mixture Models

#### Appendix B.2. Derivation of Message $\overleftarrow{\mu}_{m}\left(m\right)$

#### Appendix B.3. Derivation of Message $\overrightarrow{\mu}_{s_j}\left(s_j\right)$


**Figure 1.** A Forney-style factor graph representation of the factorized function in (3).

**Figure 2.** Subgraph containing the model selection variable m. The node ${f}_{m}$ terminates the subgraph and is defined in (14).

**Figure 3.** (left) Overview of the traditional process of model comparison. Here, inference is performed in a set of K models, after which the models are compared. These models may partially overlap in both variables and structure. Specifically, in this example, the variables ${s}_{j}$ connect the overlapping factors ${f}_{o}$ to the non-overlapping factors. The notation ${s}_{j}\phantom{\rule{0.166667em}{0ex}}|\phantom{\rule{0.166667em}{0ex}}{m}_{k}=1$ denotes the variable ${s}_{j}$ in the k-th model. (right) Our approach to model comparison based on mixture modeling. The different models are combined into a single graph representing a mixture model, where the model selection variable m specifies the component assignment. A variable ${s}_{j}$ without conditioning implies that it has been marginalized over the different models m.

**Figure 5.** Schematic overview of (a) Bayesian model averaging, (b) selection and (c) combination as specified in Section 5.1, Section 5.2 and Section 5.3. This overview explicitly visualizes the structural differences between the prior distributions and form constraints imposed on the model selection variable m. The edges crossing the plates are implicitly connected through equality nodes.

**Figure 6.** Visualization of the verification experiments as specified in Section 6.1. The individual plots show the (predictive) posterior distributions for the assignment variable in (29) for $N=\{1,5,10,100,1000\}$ observations as computed using the different methods outlined in Section 5.1, Section 5.2 and Section 5.3.

**Figure 7.** Inference results of the mixed model as described in Section 6.2.1. The inference procedure is performed by (left) Bayesian model averaging and (right) Bayesian model combination under a variational mean-field factorization. (top) The posterior estimate for the shift c. (bottom) The predictive posterior distribution for new observations in blue with underlying components in red.

**Figure 8.** Results of the voice activity detection experiment as specified in Section 6.2.2. The figure shows (top) the clean signal, (middle) the clean signal corrupted by additive white Gaussian noise and (bottom) the inferred speech probability.

**Table 1.** Table containing (top) the Forney-style factor graph representation of the mixture node and (bottom) the derived outgoing messages for the mixture node. Note that the backward message towards m resembles a scaled categorical distribution and that the forward message towards ${s}_{j}$ represents a mixture distribution. Derivations of the messages $\overleftarrow{\mu}_{m}\left(m\right)$ and $\overrightarrow{\mu}_{s_j}\left(s_j\right)$ are presented in Appendix B.2 and Appendix B.3, respectively.

**Factor node:** a Forney-style mixture node with component edges ${s}_{j}\,|\,{m}_{k}=1$ for $k=1,\dots,K$, a selection edge $m$, and an outgoing edge ${s}_{j}$.

| Messages | Functional form |
|---|---|
| $\overleftarrow{\mu}_{m}\left(m\right)$ | $\prod_{k=1}^{K}{\left(\int \overrightarrow{\mu}_{s_j \mid m_k=1}\left(s_j\right)\,\overleftarrow{\mu}_{s_j}\left(s_j\right)\,\mathrm{d}s_j\right)}^{m_k}$ |
| $\overrightarrow{\mu}_{s_j}\left(s_j\right)$ | $\sum_{k=1}^{K}\overrightarrow{\mu}_{m}\left(m_k=1\right)\,\overrightarrow{\mu}_{s_j \mid m_k=1}\left(s_j\right)$ |
| $\overleftarrow{\mu}_{s_j \mid m_k=1}\left(s_j\right)$ | $\overleftarrow{\mu}_{s_j}\left(s_j\right)$ |
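The mixture-node updates in Table 1 can be illustrated numerically. The sketch below discretizes the messages on a grid (this is schematic, not the paper's continuous message passing; all names are ours): the message towards m collects one scale factor per component, the message towards ${s}_{j}$ is the weighted mixture of component messages, and each component receives a copy of the message arriving from ${s}_{j}$.

```python
import numpy as np

# Discretize messages over s_j on a common grid.
grid = np.linspace(-10.0, 10.0, 2001)
ds = grid[1] - grid[0]

def gaussian(x, mean, var):
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2 * np.pi * var)

# Forward messages from K = 2 component models, and the backward
# message arriving on edge s_j from the rest of the graph.
fwd_components = [gaussian(grid, -1.0, 1.0), gaussian(grid, 2.0, 0.5)]
bwd_s = gaussian(grid, 0.0, 4.0)
fwd_m = np.array([0.5, 0.5])        # forward categorical message on m

# Message towards m: one scale factor per component,
# integral of the k-th component message times the message from s_j
# (an unnormalized, i.e., scaled, categorical distribution).
bwd_m = np.array([np.sum(f * bwd_s) * ds for f in fwd_components])

# Message towards s_j: mixture of the component messages,
# weighted by the incoming categorical message on m.
fwd_s = sum(w * f for w, f in zip(fwd_m, fwd_components))

# Message towards each component edge: a copy of the message from s_j.
bwd_to_components = [bwd_s.copy() for _ in fwd_components]
```

For Gaussian messages these scale factors have a closed form (the integral of two Gaussian densities is itself a Gaussian density evaluated at the mean difference), which is what makes the grid result checkable here.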


© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

van Erp, B.; Nuijten, W.W.L.; van de Laar, T.; de Vries, B.
Automating Model Comparison in Factor Graphs. *Entropy* **2023**, *25*, 1138.
https://doi.org/10.3390/e25081138

**AMA Style**

van Erp B, Nuijten WWL, van de Laar T, de Vries B.
Automating Model Comparison in Factor Graphs. *Entropy*. 2023; 25(8):1138.
https://doi.org/10.3390/e25081138

**Chicago/Turabian Style**

van Erp, Bart, Wouter W. L. Nuijten, Thijs van de Laar, and Bert de Vries.
2023. "Automating Model Comparison in Factor Graphs" *Entropy* 25, no. 8: 1138.
https://doi.org/10.3390/e25081138