# Data-Dependent Conditional Priors for Unsupervised Learning of Multimodal Data


## Abstract


## 1. Introduction

## 2. From Variational Inference (VI) Objective to VAE Objective

#### 2.1. Variational Inference

#### 2.2. Variational Autoencoders

#### 2.3. Posterior Collapse and Mismatch between the True and the Approximate Posterior

#### 2.4. Optimal Prior

## 3. Related Work

## 4. VAE with Data-Dependent Conditional Priors

#### 4.1. Two-Level Generative Process

#### Data-Dependent Conditional Priors

#### 4.2. Inference Model

#### 4.3. Optimization Objective

#### 4.3.1. Analysis of the Objective

## 5. VAE with Continuous and Discrete Components

#### 5.1. Comparing the Alternative Models

#### 5.2. Assuming Uniform Approximate Categorical Posterior

## 6. Empirical Evaluation

#### 6.1. Synthetic Data Experiments

- **known** number of components: discrete latent variable $\mathbf{c}$ with two categories (corresponding to the two ground-truth mixture components)
- **unknown** number of components: discrete latent variable $\mathbf{c}$ with 150 categories

#### 6.2. Real Data Experiments

## 7. Conclusions

## Author Contributions

## Funding

## Conflicts of Interest

## Appendix A. Proofs

#### Appendix A.1. Proofs of Section 4

**Proof of Equation (12).**

**Proof of Equation (13).**

#### Appendix A.2. Proofs of Section 5

**Proof of Equation (17).**

**Proof of Equation (19).**

**Proof of the CP-VAE objective:**

**Proof of the INDq objective:**

**Proof of the INDp objective:**

**Proof of the INDqp objective:**

#### Appendix A.3. Proofs of Section 5.2

**Proof.**

#### Appendix A.4. Maximizing the Negative RE is Equivalent to Maximizing a Lower Bound on MI between the Latent Variables (**z**,**c**) and **x**

**Proof.**
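A sketch of the standard argument, written in the notation of the main text:

```latex
\mathbb{I}_q\big(\mathbf{x};(\mathbf{z},\mathbf{c})\big)
  = H(\mathbf{x}) - H(\mathbf{x}\mid \mathbf{z},\mathbf{c})
  \geq H(\mathbf{x})
    + \mathbb{E}_{q(\mathbf{x})\,q_{\varphi}(\mathbf{z},\mathbf{c}\mid\mathbf{x})}
      \big[\log p_{\theta}(\mathbf{x}\mid \mathbf{z},\mathbf{c})\big],
```

where the inequality follows from the non-negativity of $KL\big(q(\mathbf{x}\mid\mathbf{z},\mathbf{c}) \parallel p_{\theta}(\mathbf{x}\mid\mathbf{z},\mathbf{c})\big)$ (the cross-entropy upper-bounds the conditional entropy). Since $H(\mathbf{x})$ does not depend on the model parameters, maximizing the expected log-likelihood, i.e., the negative RE, maximizes a lower bound on the mutual information.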

## Appendix B. MNIST

**Table A1.** MNIST: first five categories with the highest probability for each label, based on the marginal categorical posterior conditioned on each label, for CP-VAE and INDq with a discrete latent variable with 150 categories. With bold we mark the categories that appear for more than one label.

Label | CP-VAE (top five categories) | INDq (top five categories)
---|---|---
0 | 22, 102, 17, 59, 134 | **67**, 145, **56**, **18**, 32
1 | 32, 48, 21, 9, 20 | 94, 97, 85, 124, **67**
2 | 29, 55, 45, 95, 88 | 31, 5, 48, 140, 65
3 | 129, 5, 13, 146, 125 | 122, **135**, **125**, **149**, **99**
4 | 53, 40, 47, 122, 74 | 107, 87, 120, 132, 80
5 | 66, 60, 117, 63, 10 | 43, **141**, **56**, 102, **149**
6 | 41, 35, 127, 130, 149 | 109, 103, 111, **141**, 98
7 | 52, **0**, 148, 108, 106 | **135**, **67**, **125**, 2, **99**
8 | 132, 103, 27, 100, 85 | 1, 23, 15, 63, 40
9 | 75, 42, 81, 144, **0** | **67**, **99**, **18**, 34, 79

**Figure A1.** MNIST: Marginal categorical posterior of CP-VAE (**a**), INDq (**b**), INDp (**c**), and INDqp (**d**) with a discrete latent variable with 150 categories.

**Figure A2.** MNIST: Marginal categorical posterior conditioned on each label of CP-VAE with a discrete latent variable with 150 categories.

**Figure A3.** New variations of individual MNIST digits generated by our CP-VAE model with a latent discrete variable with 150 categories. Examples in the same subplot were generated from the same discrete category. To generate the samples, we randomly select 20 categories with probability higher than $1/150$.

**Figure A4.** New variations of individual MNIST digits generated by the INDq model with a latent discrete variable with 150 categories. Examples in the same subplot were generated from the same discrete category. To generate the samples, we randomly select 20 categories with probability higher than $1/150$.

**Figure A5.** MNIST: Marginal categorical posterior conditioned on each label of INDq with a discrete latent variable with 150 categories.

**Figure A6.** New variations of individual MNIST digits generated by the INDp model (1st–2nd row) and the INDqp model (3rd–4th row) with a latent discrete variable with 150 categories. Samples in the same subplot were generated from the same discrete category. For both models, we randomly pick 10 categories with probability higher than $1/150$ to generate the samples.

## Appendix C. Omniglot

**Figure A7.** Omniglot: Marginal categorical posterior of CP-VAE (**a**), INDq (**b**), INDp (**c**), and INDqp (**d**) with a discrete latent variable with 500 categories.

**Figure A8.** New variations of individual Omniglot symbols generated by our CP-VAE model with a latent discrete variable with 500 categories. Examples in the same subplot were generated from the same discrete category. To generate the samples, we randomly select 20 categories with probability higher than $1/500$.

**Figure A9.** New variations of individual Omniglot symbols generated by the INDq model with a latent discrete variable with 500 categories. Examples in the same subplot were generated from the same discrete category. To generate the samples, we randomly select 20 categories with probability higher than $1/500$.

**Figure A10.** New variations of individual Omniglot symbols generated by the INDp model (1st–2nd row) and the INDqp model (3rd–4th row) with a latent discrete variable with 500 categories. Examples in the same subplot were generated from the same discrete category. To generate the samples, we randomly select 10 categories with probability higher than $1/500$.

## References

- Kingma, D.P.; Welling, M. Auto-encoding variational bayes. In Proceedings of the 2nd International Conference on Learning Representations (ICLR), Banff, AB, Canada, 14–16 April 2014. [Google Scholar]
- Rezende, D.J.; Mohamed, S.; Wierstra, D. Stochastic backpropagation and approximate inference in deep generative models. In Proceedings of the International Conference on Machine Learning, Beijing, China, 21–26 June 2014; pp. 1278–1286. [Google Scholar]
- Rezende, D.J.; Mohamed, S. Variational inference with normalizing flows. In Proceedings of the 32nd International Conference on Machine Learning, Lille, France, 6–11 July 2015. [Google Scholar]
- Burda, Y.; Grosse, R.B.; Salakhutdinov, R. Importance weighted autoencoders. In Proceedings of the 4th International Conference on Learning Representations (ICLR), San Juan, Puerto Rico, 2–4 May 2016. [Google Scholar]
- Kingma, D.P.; Salimans, T.; Jozefowicz, R.; Chen, X.; Sutskever, I.; Welling, M. Improved variational inference with inverse autoregressive flow. In Proceedings of the Advances in Neural Information Processing Systems 2016, Barcelona, Spain, 5–10 December 2016. [Google Scholar]
- Chen, X.; Kingma, D.P.; Salimans, T.; Duan, Y.; Dhariwal, P.; Schulman, J.; Sutskever, I.; Abbeel, P. Variational lossy autoencoders. In Proceedings of the International Conference on Learning Representations, Toulon, France, 24–26 April 2017; p. 17. [Google Scholar]
- Jiang, Z.; Zheng, Y.; Tan, H.; Tang, B.; Zhou, H. Variational deep embedding: An unsupervised and generative approach to clustering. In Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, Melbourne, Australia, 19–25 August 2017. [Google Scholar]
- Nalisnick, E.; Smyth, P. Stick-breaking variational autoencoders. In Proceedings of the International Conference on Learning Representations, Toulon, France, 24–26 April 2017. [Google Scholar]
- Zhao, S.; Song, J.; Ermon, S. InfoVAE: Information maximizing variational autoencoders. In Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence (AAAI) 2017, San Francisco, CA, USA, 4–9 February 2017. [Google Scholar]
- Alemi, A.A.; Poole, B.; Fischer, I.; Dillon, J.V.; Saurous, R.A.; Murphy, K. Fixing a broken ELBO. In Proceedings of the 35th International Conference on Machine Learning (ICML), Stockholm, Sweden, 10–15 July 2018. [Google Scholar]
- Davidson, T.R.; Falorsi, L.; De Cao, N.; Kipf, T.; Tomczak, J.M. Hyperspherical variational auto-encoders. arXiv **2018**, arXiv:1804.00891. [Google Scholar]
- Dai, B.; Wipf, D. Diagnosing and enhancing VAE models. In Proceedings of the International Conference on Learning Representations, New Orleans, LA, USA, 6–9 May 2019. [Google Scholar]
- Ramapuram, J.; Gregorova, M.; Kalousis, A. Lifelong generative modeling. arXiv **2019**, arXiv:1705.09847. [Google Scholar] [CrossRef]
- Lavda, F.; Ramapuram, J.; Gregorova, M.; Kalousis, A. Continual classification learning using generative models. In Continual Learning Workshop, Advances in Neural Information Processing Systems 2018, Montreal, QC, Canada, 3–8 December 2018. arXiv **2018**, arXiv:1810.10612. [Google Scholar]
- Locatello, F.; Bauer, S.; Lucic, M.; Rätsch, G.; Gelly, S.; Schölkopf, B.; Bachem, O. Challenging common assumptions in the unsupervised learning of disentangled representations. In Proceedings of the 36th International Conference on Machine Learning, Long Beach, CA, USA, 10–15 June 2019. [Google Scholar]
- Bowman, S.R.; Vilnis, L.; Vinyals, O.; Dai, A.; Jozefowicz, R.; Bengio, S. Generating Sentences from a Continuous Space. In Proceedings of the 20th SIGNLL Conference on Computational Natural Language Learning; Association for Computational Linguistics: Berlin, Germany, 2016; pp. 10–21. [Google Scholar] [CrossRef]
- Tomczak, J.M.; Welling, M. VAE with a VampPrior. In Proceedings of the 21st International Conference on Artificial Intelligence and Statistics (AISTATS), Playa Blanca, Lanzarote, Spain, 9–11 April 2018. [Google Scholar]
- Higgins, I.; Matthey, L.; Pal, A.; Burgess, C.; Glorot, X.; Botvinick, M.; Mohamed, S.; Lerchner, A. Beta-VAE: Learning basic visual concepts with a constrained variational framework. In Proceedings of the International Conference on Learning Representations (ICLR), Toulon, France, 24–26 April 2017. [Google Scholar]
- Hoffman, M.D.; Johnson, M.J. ELBO surgery: Yet another way to carve up the variational evidence lower bound. In Proceedings of the NIPS Symposium on Advances in Approximate Bayesian Inference, Montreal, QC, Canada, 2 December 2018; p. 4. [Google Scholar]
- Kim, H.; Mnih, A. Disentangling by factorising. arXiv **2018**, arXiv:1802.05983. [Google Scholar]
- Dupont, E. Learning disentangled joint continuous and discrete representations. In Proceedings of the Advances in Neural Information Processing Systems 2018, Montreal, QC, Canada, 3–8 December 2018. [Google Scholar]
- Gulrajani, I.; Kumar, K.; Ahmed, F.; Taiga, A.A.; Visin, F.; Vazquez, D.; Courville, A. PixelVAE: A latent variable model for natural images. In Proceedings of the International Conference on Learning Representations (ICLR) 2017, Toulon, France, 24–26 April 2017. [Google Scholar]
- Chung, J.; Kastner, K.; Dinh, L.; Goel, K.; Courville, A.C.; Bengio, Y. A recurrent latent variable model for sequential data. In Advances in Neural Information Processing Systems 28; Cortes, C., Lawrence, N.D., Lee, D.D., Sugiyama, M., Garnett, R., Eds.; Curran Associates, Inc.: Red Hook, NY, USA, 2015; pp. 2980–2988. [Google Scholar]
- Xu, J.; Durrett, G. Spherical latent spaces for stable variational autoencoders. In Proceedings of the Conference on Empirical Methods in Natural Language Processing 2018, Brussels, Belgium, 31 October–4 November 2018. [Google Scholar]
- Takahashi, H.; Iwata, T.; Yamanaka, Y.; Yamada, M.; Yagi, S. Variational autoencoder with implicit optimal priors. In Proceedings of the AAAI Conference on Artificial Intelligence 2019, Honolulu, HI, USA, 27 January–1 February 2019; Volume 33, pp. 5066–5073. [Google Scholar]
- Dilokthanakul, N.; Mediano, P.A.M.; Garnelo, M.; Lee, M.C.H.; Salimbeni, H.; Arulkumaran, K.; Shanahan, M. Deep unsupervised clustering with gaussian mixture variational autoencoders. arXiv **2017**, arXiv:1611.02648. [Google Scholar]
- Goyal, P.; Hu, Z.; Liang, X.; Wang, C.; Xing, E. Nonparametric variational auto-encoders for hierarchical representation learning. In Proceedings of the IEEE International Conference on Computer Vision 2017, Venice, Italy, 22–29 October 2017. [Google Scholar]
- Li, X.; Chen, Z.; Poon, L.K.M.; Zhang, N.L. Learning Latent Superstructures in Variational Autoencoders for Deep Multidimensional Clustering. In Proceedings of the International Conference on Learning Representations 2019, New Orleans, LA, USA, 6–9 May 2019. [Google Scholar]
- Murphy, K.P. Machine Learning: A Probabilistic Perspective; Adaptive Computation and Machine Learning Series; MIT Press: Cambridge, MA, USA, 2012. [Google Scholar]
- Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. In Proceedings of the 3rd International Conference for Learning Representations 2015, San Diego, CA, USA, 7–9 May 2015. [Google Scholar]
- Mohamed, S.; Rosca, M.; Figurnov, M.; Mnih, A. Monte Carlo gradient estimation in machine learning. arXiv **2019**, arXiv:1906.10652. [Google Scholar]
- Jang, E.; Gu, S.; Poole, B. Categorical reparameterization with gumbel-softmax. In Proceedings of the International Conference on Learning Representations (ICLR) 2017, Toulon, France, 24–26 April 2017. [Google Scholar]
- Makhzani, A.; Shlens, J.; Jaitly, N.; Goodfellow, I.; Frey, B. Adversarial autoencoders. arXiv **2016**, arXiv:1511.05644. [Google Scholar]
- Kingma, D.P.; Mohamed, S.; Jimenez Rezende, D.; Welling, M. Semi-supervised learning with deep generative models. In Advances in Neural Information Processing Systems 27; Ghahramani, Z., Welling, M., Cortes, C., Lawrence, N.D., Weinberger, K.Q., Eds.; Curran Associates, Inc.: Red Hook, NY, USA, 2014; pp. 3581–3589. [Google Scholar]
- Lecun, Y.; Bottou, L.; Bengio, Y.; Haffner, P. Gradient-based learning applied to document recognition. Proc. IEEE **1998**, 86, 2278–2324. [Google Scholar] [CrossRef]
- Xiao, H.; Rasul, K.; Vollgraf, R. Fashion-MNIST: A novel image dataset for benchmarking machine learning algorithms. arXiv **2017**, arXiv:1708.07747. [Google Scholar]
- Lake, B.M.; Salakhutdinov, R.; Tenenbaum, J.B. Human-level concept learning through probabilistic program induction. Science **2015**, 350, 1332–1338. [Google Scholar] [CrossRef] [PubMed]
- Glorot, X.; Bengio, Y. Understanding the difficulty of training deep feedforward neural networks. In Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, Sardinia, Italy, 13–15 May 2010; pp. 249–256. [Google Scholar]
- Dauphin, Y.N.; Fan, A.; Auli, M.; Grangier, D. Language modeling with gated convolutional networks. In Proceedings of the 34th International Conference on Machine Learning, Sydney, Australia, 6–11 August 2017. [Google Scholar]
- Salakhutdinov, R.; Murray, I. On the quantitative analysis of deep belief networks. In Proceedings of the 25th International Conference on Machine Learning, Helsinki, Finland, 5–9 July 2008; pp. 872–879. [Google Scholar]

**Figure 1.** To generate new examples from the learned data distribution $p_{\theta}(\mathbf{x})$, we sample the discrete and continuous latent variables from the two-level prior and pass those through the decoder.
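As a concrete illustration, the two-level sampling path described in the caption can be sketched as below. Everything here (the per-category Gaussian prior parameters, the stand-in linear `decoder`, and all dimensions) is a hypothetical placeholder, not the paper's actual architecture:

```python
import numpy as np

rng = np.random.default_rng(0)
n_categories, latent_dim = 150, 32

# Hypothetical conditional-prior parameters: one Gaussian per category.
prior_mu = rng.normal(size=(n_categories, latent_dim))
prior_sigma = np.ones((n_categories, latent_dim))

def decoder(z):
    # Stand-in linear decoder followed by a sigmoid; the paper's decoder
    # is a neural network.
    W = rng.normal(scale=0.01, size=(latent_dim, 784))
    return 1.0 / (1.0 + np.exp(-z @ W))  # Bernoulli means for the pixels

def sample_generative(n_samples=16):
    # c ~ p(c): uniform categorical over the prior components.
    c = rng.integers(0, n_categories, size=n_samples)
    # z ~ p(z | c): Gaussian whose parameters are indexed by the category.
    z = prior_mu[c] + prior_sigma[c] * rng.normal(size=(n_samples, latent_dim))
    # x ~ p(x | z): decode the continuous code into data space.
    return decoder(z)

x = sample_generative()
```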

**Figure 2.** The encoder infers the parameters of the discrete and continuous approximate posteriors using a gated layer for the hierarchical conditioning. First, it outputs the parameters of the discrete latent variable, $\pi_{\varphi}$. Subsequently, an extra layer takes as input $\pi_{\varphi}$ concatenated with the last hidden layer of the encoder and infers the parameters of the continuous latent variable.
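A minimal sketch of this hierarchical conditioning, with the gated layer replaced by plain linear maps for brevity; all weights, names, and dimensions are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
hidden_dim, n_categories, latent_dim = 64, 150, 32

# Hypothetical weights; the shapes only illustrate the wiring in the figure.
W_pi = rng.normal(scale=0.1, size=(hidden_dim, n_categories))
W_mu = rng.normal(scale=0.1, size=(hidden_dim + n_categories, latent_dim))
W_lv = rng.normal(scale=0.1, size=(hidden_dim + n_categories, latent_dim))

def softmax(a):
    e = np.exp(a - a.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def encode(h):
    # Step 1: parameters of the discrete posterior q(c | x).
    pi = softmax(h @ W_pi)
    # Step 2: concatenate pi with the last hidden layer and infer the
    # parameters of the continuous posterior q(z | x, c).
    hc = np.concatenate([h, pi], axis=-1)
    return pi, hc @ W_mu, hc @ W_lv  # pi, mean, log-variance

h = rng.normal(size=(8, hidden_dim))  # stand-in for the encoder's last hidden layer
pi, mu, logvar = encode(h)
```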

**Figure 3.** Histograms of the data generated from the ground truth $\mathbf{x}\sim p(\mathbf{x})=\frac{1}{2}\left(N(0.3,0.05)+N(0.7,0.05)\right)$ and the distributions learned using CP-VAE, MoG, VampPrior, INDq, INDp, and INDqp with 2 and 150 categories, and VAE. Our CP method recovers the bi-modal structure of the data correctly, irrespective of the number of categories used for the latent categorical component.
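For reference, the ground-truth data of this experiment can be generated as below; we read the second parameter of $N(\cdot,0.05)$ as the standard deviation of each component, which is an assumption:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_ground_truth(n):
    # x ~ 1/2 N(0.3, 0.05) + 1/2 N(0.7, 0.05): pick a component with
    # probability 1/2, then sample from the corresponding Gaussian.
    comp = rng.integers(0, 2, size=n)
    means = np.where(comp == 0, 0.3, 0.7)
    return rng.normal(loc=means, scale=0.05)

x = sample_ground_truth(10_000)
```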

**Figure 4.** Histograms of conditionally generated samples using our conditional prior variational autoencoder (CP-VAE) model with a latent discrete variable with two categories. In subfigure (**a**) we conditionally generate samples from the first category, and in subfigure (**b**) from the second category.

**Figure 5.** CP-VAE (150-category case) generations sampled from the marginal posterior (**a**) and from the uniform prior (**b**), and CP-VAE reconstructions (**c**). The CP-VAE learns to ignore the excess capacity of the disjoint latent space by assigning near-zero probability to some of the categories in the discrete latent space. These parts of the latent space are ignored for the reconstructions and, by sampling the categorical variable from the marginal posterior $q_{\varphi}(c)$, can correctly be ignored for the generations as well.

**Figure 6.** New data examples from FashionMNIST generated by our CP-VAE model with a latent discrete variable with 150 categories. Examples in the same subplot were generated from the same discrete category. To generate the samples, we randomly select 20 categories with probability higher than 1/150. CP-VAE accurately captures not only the main sources of variation in the data but also finds subcategories within each main category, in a totally unsupervised manner. For example, conditioning on category 61 (2nd subplot of the second row) generates flat sandals, while conditioning on category 34 (4th subplot of the fourth row) generates sandals with heels.

**Figure 7.** New data examples from FashionMNIST generated by the INDq model with a latent discrete variable with 150 categories. Examples in the same subplot were generated from the same discrete category. To generate the samples, we use all of the categories with probability higher than 1/150. The INDq model is not always able to conditionally generate new samples from the individual modes of the underlying distribution; in some subplots it generates a mix of images from different modes. For example, the 1st subplot of the second row mixes t-shirts, dresses, pullovers, and trousers, and the 3rd subplot of the second row mixes flat sandals with sneakers and ankle boots.

**Figure 8.** New data examples from FashionMNIST generated by the INDp model (1st–2nd row) and the INDqp model (3rd–4th row) with a latent discrete variable with 150 categories. Samples in the same subplot were generated from the same discrete category. For both models, we randomly pick 10 categories with probability higher than 1/150 to generate the samples. Neither the INDp nor the INDqp model is able to conditionally generate new samples from the individual modes of the underlying distribution; both generate a mix of images from different modes.

**Figure 9.** FashionMNIST: Marginal categorical posterior of CP-VAE (**a**), INDq (**b**), INDp (**c**), and INDqp (**d**) with a discrete latent variable with 150 categories.

**Figure 10.** FashionMNIST: Marginal categorical posterior conditioned on each label of CP-VAE with a discrete latent variable with 150 categories.

**Figure 11.** FashionMNIST: Marginal categorical posterior conditioned on each label of INDq with a discrete latent variable with 150 categories.

**Figure 12.** FashionMNIST: Marginal categorical posterior conditioned on the first 5 labels of INDp (1st row) and INDqp (2nd row) with a discrete latent variable with 150 categories.

**Figure 13.** New variations of label-specific individual FashionMNIST digits generated by the CP-VAE model with a latent discrete variable with 150 categories. Samples in the same row belong to the same class label, and samples in the same subplot were generated from the same discrete category.

**Figure 14.** New data examples from the Omniglot dataset generated by the various methods with an increasing number of prior components.

**Figure 15.** New data examples from the MNIST dataset generated by the various methods with an increasing number of prior components.

**Figure 16.** New data examples from the FashionMNIST dataset generated by the various methods with an increasing number of prior components.

**Table 1.** Independence assumptions for discrete-continuous latent variable models and the corresponding decomposition of the B and C terms in Equation (19).

Model | $q_{\varphi}(\mathbf{z},\mathbf{c}\mid\mathbf{x})$ | $p_{\phi}(\mathbf{z},\mathbf{c})$ | B1 | C1 | Refs.
---|---|---|---|---|---
CP-VAE | $q_{\varphi}(\mathbf{z}\mid\mathbf{x},\mathbf{c})\,q_{\varphi}(\mathbf{c}\mid\mathbf{x})$ | $p_{\phi}(\mathbf{z}\mid\mathbf{c})\,p(\mathbf{c})$ | $\mathbb{I}_{q}(\mathbf{z},\mathbf{x}\mid\mathbf{c})$ | $\mathbb{E}_{q(\mathbf{c})}[KL(q_{\varphi}(\mathbf{z}\mid\mathbf{c})\parallel p_{\phi}(\mathbf{z}\mid\mathbf{c}))]$ | |
INDq | $q_{\varphi}(\mathbf{z}\mid\mathbf{x})\,q_{\varphi}(\mathbf{c}\mid\mathbf{x})$ | $p_{\phi}(\mathbf{z}\mid\mathbf{c})\,p(\mathbf{c})$ | $\mathbb{E}_{q_{\varphi}(\mathbf{c},\mathbf{x})}[KL(q_{\varphi}(\mathbf{z}\mid\mathbf{x})\parallel q_{\varphi}(\mathbf{z}\mid\mathbf{c}))]$ | $\mathbb{E}_{q(\mathbf{c})}[KL(q_{\varphi}(\mathbf{z}\mid\mathbf{c})\parallel p_{\phi}(\mathbf{z}\mid\mathbf{c}))]$ | [7] |
INDp | $q_{\varphi}(\mathbf{z}\mid\mathbf{x},\mathbf{c})\,q_{\varphi}(\mathbf{c}\mid\mathbf{x})$ | $p(\mathbf{z})\,p(\mathbf{c})$ | $\mathbb{I}_{q}(\mathbf{z},(\mathbf{c},\mathbf{x}))$ | $KL(q_{\varphi}(\mathbf{z})\parallel p(\mathbf{z}))$ | [26,34] |
INDqp | $q_{\varphi}(\mathbf{z}\mid\mathbf{x})\,q_{\varphi}(\mathbf{c}\mid\mathbf{x})$ | $p(\mathbf{z})\,p(\mathbf{c})$ | $\mathbb{I}_{q}(\mathbf{z},\mathbf{x})$ | $KL(q_{\varphi}(\mathbf{z})\parallel p(\mathbf{z}))$ | [13,21] |

**Table 2.** FashionMNIST: first five categories with the highest probability for each label, based on the marginal categorical posterior conditioned on each label, for CP-VAE and INDq with a discrete latent variable with 150 categories. With bold we mark the categories that appear for more than one label.

Label | CP-VAE (top five categories) | INDq (top five categories)
---|---|---
0 | 9, 80, **127**, 24, 62 | **135**, **133**, **149**, **42**, **64**
1 | 143, 1, 144, **64**, **135** | **135**, **149**, **71**, **64**, **3**
2 | 146, 25, 145, 41, 29 | **135**, **149**, **71**, **42**, **3**
3 | **135**, 91, 17, **138**, **64** | **135**, **149**, **133**, **3**, 148
4 | 83, **138**, 109, 37, 147 | **135**, **71**, **149**, **42**, 114
5 | 34, 88, 77, 3, 85 | **63**, **13**, **34**, 39, **31**
6 | 79, 122, 81, 26, **127** | **149**, **135**, **42**, **133**, **71**
7 | 53, 16, 43, 46, 19 | **63**, **34**, **13**, **31**, 126
8 | 47, 95, 137, 97, 63 | 7, **63**, 105, 123, **145**
9 | 94, 130, 101, 119, 132 | **13**, **63**, **34**, **31**, **145**

**Table 3.** Comparison of negative variational lower bounds for the different methods over the test datasets.

Model | MNIST ($c{=}10$) | MNIST ($c{=}150$) | MNIST ($c{=}500$) | FashionMNIST ($c{=}10$) | FashionMNIST ($c{=}150$) | FashionMNIST ($c{=}500$) | Omniglot ($c{=}50$) | Omniglot ($c{=}500$)
---|---|---|---|---|---|---|---|---
CP | 87.15 | 88.53 | 89.91 | 232.52 | 233.79 | 234.89 | 117.51 | 120.64
INDq | 89.67 | 92.15 | 93.38 | 232.71 | 234.41 | 234.93 | 125.28 | 124.48
INDp | 88.20 | 88.77 | 88.53 | 229.83 | 230.41 | 231.34 | 120.92 | 121.83
INDqp | 87.93 | 88.21 | 88.98 | 228.65 | 230.98 | 231.18 | 119.99 | 120.82
VAE | 88.75 | — | — | 231.49 | — | — | 115.06 | —
MG | 89.43 | 88.96 | 88.85 | 267.07 | 272.60 | 274.55 | 116.31 | 116.12
VP | 87.94 | 86.55 | 86.07 | 230.87 | 229.82 | 270.83 | 114.01 | 113.74
HVAE | 86.7 | — | — | 230.10 | — | — | 110.81 | —
HVP | 85.90 | 85.09 | 85.01 | 229.67 | 229.36 | 229.62 | 110.50 | 110.16

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Lavda, F.; Gregorová, M.; Kalousis, A.
Data-Dependent Conditional Priors for Unsupervised Learning of Multimodal Data. *Entropy* **2020**, *22*, 888.
https://doi.org/10.3390/e22080888
