# A Robust Solution to Variational Importance Sampling of Minimum Variance

## Abstract


## 1. Introduction

## 2. Background

## 3. An Approximation to the Rényi Projection

## 4. Empirical Study of VIS Performance

#### 4.1. Experimental Comparison of Different VIS Approaches

#### 4.2. Experimental Comparison versus a Deterministic Approach

#### 4.3. Discussion

## 5. Mixture IS Approach

#### Deterministic Mixture IS with Control Variates: Combining the Strengths of I and R Projections

## 6. Empirical Study of Mixture VIS Performance

#### 6.1. Empirical Study on the Importance of the Component Weight, $\alpha $

#### 6.2. Experimental Comparison of Mixture Importance Sampling

#### 6.3. Discussion

## 7. Related Work

## 8. Conclusions

## Author Contributions

## Funding

## Conflicts of Interest

## Appendix A. Full Derivation of the Minka-Based Procedure

## Appendix B. Mixture Importance Sampling Estimator with Control Variates as a Multiple Regression


**Figure 1.** Results on four different synthetic problems (see Section 4), in terms of mean relative error and mean empirical variance (left and right plots of each subfigure, respectively), of the different VIS approaches: the I-projection (VIS-I), our approximated R projection (VIS-Rh), Minka's approximated R projection (VIS-Rm) and the exact R projection (VIS-R). Each point in the lines is a mean over 1000 estimators with a specific number of samples in $\{64\cdot k \mid 1\le k\le 128\}$. This is considered to cover the reasonable setups (up to $2^{13}$, 12.5% of the whole sample space) for sampling-based estimators.

**Figure 2.** Experimental results in terms of empirical variance of VIS-Rh with different $v_{max}$, as well as the time required for our approximated projection. Each figure shows the results for problem instances generated with two different dependence strengths ($\tau \in \{3.5, 10\}$). Every point is an average over 100 estimators, with $2^{13}$ samples each, on 20 different problem instances.

**Figure 3.** Results on four different synthetic problems (see Section 4), in terms of mean relative error and mean empirical variance (left and right plots of each subfigure, respectively), of the different VIS approaches: the I-projection (VIS-I), our approximated R projection (VIS-Rh), Minka's approximated R projection (VIS-Rm) and the exact R projection (VIS-R). Moreover, simple Monte Carlo (MC) and an exhaustive procedure (DEst) are also included. Each point in the lines is a mean over 1000 estimators with a specific number of samples in $\{512\cdot k \mid 1\le k\le 128\}$. This covers up to the whole sample space ($2^{16}$).

**Figure 4.** Distributions considered in the simple variance analysis: a normal distribution with variance 4, and a mixture with larger variance (6) formed by a large component (${\alpha}_{a}=1-{\alpha}_{b}$, centered at −0.07, with variance $0.2$) and a small component with a huge contribution to the variance (${\alpha}_{b}={2}^{-10}$, centered at 77.02, with variance 1).
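The mixture's variance of roughly 6 quoted in the caption can be checked directly from the stated parameters. The derivation below assumes both components are Gaussian (the caption gives only means, variances and weights, so this is an inferred form); $\mathcal{N}(x;\mu,\sigma^2)$ denotes a normal density.

```latex
% Mixture density implied by the parameters in Figure 4
p_b(x) = (1-\alpha_b)\,\mathcal{N}\!\left(x;\,-0.07,\,0.2\right)
       + \alpha_b\,\mathcal{N}\!\left(x;\,77.02,\,1\right),
\qquad \alpha_b = 2^{-10}.

% Mean and second moment of the mixture
\mu = (1-\alpha_b)(-0.07) + \alpha_b \cdot 77.02 \approx 0.005,
\qquad
\mathbb{E}[X^2] = (1-\alpha_b)\bigl(0.2 + 0.07^2\bigr)
               + \alpha_b\bigl(1 + 77.02^2\bigr) \approx 5.999.

% Hence the variance matches the value quoted in the caption
\operatorname{Var}[X] = \mathbb{E}[X^2] - \mu^2 \approx 6.0.
```

The small component thus contributes almost all of the variance ($\alpha_b(1 + 77.02^2) \approx 5.79$ of the total $\approx 6$) while being drawn with probability only $2^{-10}$.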

**Figure 5.** Proportion of cases (over 1000 repetitions) where the mean squared error of the estimator using the lower-variance distribution (normal distribution in Figure 4) is larger than that of the estimator using the larger-variance distribution (mixture in Figure 4). On the left, the proportion for different numbers of estimators (a single data point sampled per estimator). On the right, the same proportion for different numbers of estimators and samples per estimator.
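The counter-intuitive effect in Figure 5 can be reproduced with a short simulation. The sketch below assumes the estimated quantity is the distribution mean, that the normal distribution is centered at 0, and that both mixture components are Gaussian; the distribution parameters come from the caption of Figure 4, while the sample sizes and function names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

ALPHA_B = 2.0 ** -10  # weight of the rare, far-away mixture component
N_REPS = 1000         # repetitions, as in Figure 5


def sample_normal(n):
    # Low-variance alternative: normal with variance 4 (mean assumed 0)
    return rng.normal(0.0, 2.0, size=n)


def sample_mixture(n):
    # High-variance alternative: large component N(-0.07, 0.2),
    # rare component N(77.02, 1) drawn with probability 2**-10
    rare = rng.random(n) < ALPHA_B
    x = rng.normal(-0.07, np.sqrt(0.2), size=n)
    x[rare] = rng.normal(77.02, 1.0, size=rare.sum())
    return x


def mixture_mean():
    # Exact mean of the mixture, used as the ground truth
    return (1 - ALPHA_B) * -0.07 + ALPHA_B * 77.02


def prop_low_var_worse(n_samples):
    """Proportion of repetitions in which the low-variance (normal)
    sample-mean estimator has larger squared error than the
    high-variance (mixture) one."""
    worse = 0
    for _ in range(N_REPS):
        err_normal = sample_normal(n_samples).mean() ** 2  # true mean is 0
        err_mix = (sample_mixture(n_samples).mean() - mixture_mean()) ** 2
        worse += err_normal > err_mix
    return worse / N_REPS


if __name__ == "__main__":
    for n in (1, 16, 256):
        print(n, prop_low_var_worse(n))
```

With few samples the mixture rarely draws its far-away component, so its empirical error usually looks smaller than the normal's despite its larger true variance; as the sample size grows, the rare component is hit more often and the effect fades.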

**Figure 6.** Results on four different synthetic problems (see Section 4), in terms of mean relative error and mean empirical variance (left and right plots of each subfigure, respectively), of VIS-Rx with five different $\alpha$ selections: constant $\alpha \in \{0.5,0.25,0.1,0.05\}$ and a decreasing $\alpha$ value (VIS-Rx-rel) relative to the number of samples. Each point in the lines is a mean over 1000 estimators with a specific number of samples in $\{64\cdot k \mid 1\le k\le 128\}$. This is considered to cover the reasonable setups (up to $2^{13}$, 12.5% of the whole sample space) for sampling-based estimators.
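The VIS-Rx variants above sample from a two-component proposal with weight $\alpha$. For orientation only, the following is a generic defensive-mixture importance sampling estimator in the style of Hesterberg (1995) with a deterministic sample split, not the authors' exact VIS-Rx construction; the target, component densities and function names are illustrative stand-ins.

```python
import numpy as np

rng = np.random.default_rng(1)


def normal_pdf(x, mu, sigma):
    # Univariate normal density, vectorized over x
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))


def mixture_is_estimate(f, p, comp_a, comp_b, alpha, n):
    """Estimate E_p[f(X)] with samples from the mixture proposal
    q(x) = alpha * q_a(x) + (1 - alpha) * q_b(x).
    comp_a and comp_b are (sampler, pdf) pairs for the two components."""
    n_a = int(round(alpha * n))  # deterministic split between components
    x = np.concatenate([comp_a[0](n_a), comp_b[0](n - n_a)])
    q = alpha * comp_a[1](x) + (1 - alpha) * comp_b[1](x)
    w = p(x) / q                 # importance weights against the mixture
    return np.mean(w * f(x))


# Illustrative target: E[x^2] under a standard normal (true value 1),
# with two wider normals as proposal components
p = lambda x: normal_pdf(x, 0.0, 1.0)
comp_a = (lambda n: rng.normal(0.0, 1.5, n), lambda x: normal_pdf(x, 0.0, 1.5))
comp_b = (lambda n: rng.normal(0.0, 3.0, n), lambda x: normal_pdf(x, 0.0, 3.0))

est = mixture_is_estimate(lambda x: x ** 2, p, comp_a, comp_b, alpha=0.1, n=4096)
```

Evaluating the weights against the full mixture density $q$ (rather than the density of the component each sample came from) keeps the weights bounded whenever one component covers the target's tails, which is the defensive property the mixture is meant to provide.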

**Figure 7.** Results on four different synthetic problems (see Section 4), in terms of mean relative error and mean empirical variance (left and right plots of each subfigure, respectively), of five different VIS approaches: our mixture-based proposal (VIS-Rx-rel), those using its components (VIS-I and our VIS-Rh), Minka's approximated R projection (VIS-Rm) and the exact R projection (VIS-R). Each point in the lines is a mean over 1000 estimators with a specific number of samples in $\{64\cdot k \mid 1\le k\le 128\}$. This is considered to cover the reasonable setups (up to $2^{13}$, 12.5% of the whole sample space) for sampling-based estimators.

**Figure 8.** Results on four different synthetic problems (see Section 4), in terms of mean relative error and mean empirical variance (left and right plots of each subfigure, respectively), of five different VIS approaches: our mixture-based proposal (VIS-Rx-rel), those using its components (VIS-I and our VIS-Rh), Minka's approximated R projection (VIS-Rm) and the exact R projection (VIS-R). Moreover, simple MC and an exhaustive procedure (DEst) are also included. Each point in the lines is a mean over 1000 estimators with a specific number of samples in $\{512\cdot k \mid 1\le k\le 128\}$. This covers up to the whole sample space ($2^{16}$).

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Hernández-González, J.; Cerquides, J. A Robust Solution to Variational Importance Sampling of Minimum Variance. *Entropy* **2020**, *22*, 1405.
https://doi.org/10.3390/e22121405
