# Comparing Classical and Quantum Generative Learning Models for High-Fidelity Image Synthesis


## Abstract


## 1. Introduction

#### 1.1. Generative Modeling

#### 1.2. Trilemma of Generative Learning

## 2. Background

#### 2.1. Classical Image Synthesis

#### 2.1.1. Restricted Boltzmann Machine

#### 2.1.2. Variational Autoencoder

#### 2.1.3. Generative Adversarial Networks

#### 2.1.4. Denoising Diffusion Probabilistic Model

- Increasing depth versus width, holding model size relatively constant.
- Increasing the number of attention heads.
- Using attention at 32 × 32, 16 × 16, and 8 × 8 resolutions rather than only at 16 × 16.
- Using the BigGAN residual block for upsampling and downsampling the activations.
- Rescaling residual connections with $\frac{1}{\sqrt{2}}$.
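The last item above can be made concrete with a small sketch (not the authors' code): if a residual branch output is roughly independent of its input and both have unit variance, their sum has variance 2, so dividing by $\sqrt{2}$ keeps activation magnitudes stable as depth grows. A minimal NumPy illustration under those assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def residual_add(x, fx, rescale=True):
    """Residual connection, optionally rescaled by 1/sqrt(2).

    If x and f(x) are independent with unit variance, x + f(x) has
    variance 2; dividing by sqrt(2) restores unit variance.
    """
    out = x + fx
    return out / np.sqrt(2.0) if rescale else out

x = rng.standard_normal(100_000)
fx = rng.standard_normal(100_000)  # stand-in for a residual branch output
print(residual_add(x, fx, rescale=False).var())  # ~2.0
print(residual_add(x, fx).var())                 # ~1.0
```

In an actual DDPM network the rescaling is applied at every residual block, so the variance argument compounds over depth.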

#### 2.2. Quantum Machine Learning

#### 2.2.1. Quantum Boltzmann Machine

#### 2.2.2. Image Classification

#### 2.2.3. Image Synthesis

## 3. Methods

#### 3.1. Goal

- perform the image synthesis directly on the QBM,
- evaluate the performance of the QBM against an RBM, a VAE, a GAN, and a DDPM,
- evaluate various generative modeling methods on FID, KID, and Inception scores,
- model a richer image dataset, CIFAR-10.

#### 3.2. Data

#### 3.3. Classical Models

#### 3.4. Quantum Model

#### 3.5. Hyper-Parameters

#### 3.6. Metrics

#### 3.6.1. Inception Score

- (i) Fidelity is captured by the probability distribution produced as classification output by the Inception classifier on a generated image [27]. A highly skewed distribution with a single peak indicates that the Inception classifier identifies the image as belonging to a specific class with high confidence; such an image is considered high fidelity.
- (ii) Diversity is captured by summing the probability distributions produced for the individually generated images. The more uniform the resulting marginal distribution, the more diverse the generated images. For example, a model trained on CIFAR-10 that only manages to produce high-fidelity images of dogs would severely fail to be diverse.
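The two properties combine into a single number: the Inception Score exponentiates the mean KL divergence between each image's conditional label distribution $p(y|x)$ and the marginal $p(y)$. A minimal NumPy sketch (the probability matrix here is synthetic, not real Inception output):

```python
import numpy as np

def inception_score(probs, eps=1e-12):
    """probs: (n_images, n_classes) rows of p(y|x) from a classifier.

    IS = exp( mean_x KL( p(y|x) || p(y) ) ), where p(y) is the marginal
    obtained by averaging the rows. eps guards against log(0).
    """
    probs = np.asarray(probs, dtype=float)
    marginal = probs.mean(axis=0)  # p(y)
    kl = np.sum(probs * (np.log(probs + eps) - np.log(marginal + eps)), axis=1)
    return float(np.exp(kl.mean()))

# Confident AND diverse predictions -> score near the number of classes.
sharp = np.eye(4)[[0, 1, 2, 3]]
# Confident but mode-collapsed predictions -> score near 1.
collapsed = np.eye(4)[[0, 0, 0, 0]]
print(inception_score(sharp))      # ~4
print(inception_score(collapsed))  # ~1
```

The score is bounded below by 1 (no information gained over the marginal) and above by the number of classes, which is why the CIFAR-10 scores reported later fall between 1 and 10.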

#### 3.6.2. Fréchet Inception Distance (FID)

#### 3.6.3. Kernel Inception Distance (KID)

#### 3.6.4. Quantitative Metrics

#### 3.6.5. Qualitative Metrics

## 4. Results

#### 4.1. Restricted Boltzmann Machine (RBM)

#### 4.2. Variational Autoencoder (VAE)

#### 4.3. Generative Adversarial Networks (GANs)

#### 4.4. Denoising Diffusion Probabilistic Model (DDPM)

#### 4.5. Quantum Boltzmann Machine (QBM)

## 5. Analysis

#### 5.1. Scores

#### 5.1.1. Inception Score

#### 5.1.2. Fréchet Inception Distance

#### 5.1.3. Kernel Inception Distance

#### 5.2. Feature Extraction

#### 5.3. Trilemma of Generative Learning

#### 5.3.1. High-Quality Sampling

#### 5.3.2. Mode Coverage and Diversity

#### 5.3.3. Fast Sampling

#### 5.3.4. Conclusions

## 6. Conclusions and Future Work

- Restricted Boltzmann Machine
- Variational Autoencoder
- Generative Adversarial Network
- Denoising Diffusion Probabilistic Model

#### 6.1. Image Preprocessing

#### 6.2. Quantum Computing

## Author Contributions

## Funding

## Institutional Review Board Statement

## Informed Consent Statement

## Data Availability Statement

## Acknowledgments

## Conflicts of Interest

## References

1. Ho, J.; Jain, A.; Abbeel, P. Denoising Diffusion Probabilistic Models. In Advances in Neural Information Processing Systems; Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M., Lin, H., Eds.; Curran Associates, Inc.: Red Hook, NY, USA, 2020; Volume 33, pp. 6840–6851.
2. Dhariwal, P.; Nichol, A. Diffusion Models Beat GANs on Image Synthesis. arXiv **2021**, arXiv:2105.05233.
3. Jain, S.; Ziauddin, J.; Leonchyk, P.; Yenkanchi, S.; Geraci, J. Quantum and classical machine learning for the classification of non-small-cell lung cancer patients. SN Appl. Sci. **2020**, 2, 1088.
4. Thulasidasan, S. Generative Modeling for Machine Learning on the D-Wave; Technical Report; Los Alamos National Lab. (LANL): Los Alamos, NM, USA, 2016.
5. Amin, M.H.; Andriyash, E.; Rolfe, J.; Kulchytskyy, B.; Melko, R. Quantum Boltzmann Machine. Phys. Rev. X **2018**, 8, 021050.
6. Xiao, Z.; Kreis, K.; Vahdat, A. Tackling the Generative Learning Trilemma with Denoising Diffusion GANs. arXiv **2021**, arXiv:2112.07804.
7. Smolensky, P. Information processing in dynamical systems: Foundations of harmony theory. In Parallel Distributed Processing: Explorations in the Microstructure of Cognition: Foundations; MIT Press: Cambridge, MA, USA, 1986; Volume 1.
8. Freund, Y.; Haussler, D. Unsupervised learning of distributions on binary vectors using two layer networks. In Advances in Neural Information Processing Systems; Moody, J., Hanson, S., Lippmann, R., Eds.; Morgan-Kaufmann: Burlington, MA, USA, 1991; Volume 4.
9. Hopfield, J.J. Neural networks and physical systems with emergent collective computational abilities. Proc. Natl. Acad. Sci. USA **1982**, 79, 2554–2558.
10. Hinton, G.E. A Practical Guide to Training Restricted Boltzmann Machines. In Lecture Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 2012; pp. 599–619.
11. Carreira-Perpiñán, M.Á.; Hinton, G.E. On Contrastive Divergence Learning. In Proceedings of the AISTATS, Bridgetown, Barbados, 6–8 January 2005.
12. Kingma, D.P.; Welling, M. Auto-Encoding Variational Bayes. arXiv **2014**, arXiv:1312.6114.
13. Rocca, J. Understanding Variational Autoencoders (VAEs). 2021. Available online: https://towardsdatascience.com/understanding-variational-autoencoders-vaes-f70510919f73 (accessed on 3 December 2023).
14. Goodfellow, I.J.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative Adversarial Networks. arXiv **2014**, arXiv:1406.2661.
15. A Beginner's Guide to Generative Adversarial Networks (GANs). Available online: https://wiki.pathmind.com/generative-adversarial-network-gan (accessed on 3 December 2023).
16. Arjovsky, M.; Bottou, L. Towards Principled Methods for Training Generative Adversarial Networks. arXiv **2017**, arXiv:1701.04862.
17. Hinton, G.E. Training Products of Experts by Minimizing Contrastive Divergence. Neural Comput. **2002**, 14, 1771–1800.
18. What is Quantum Annealing? D-Wave System Documentation. Available online: https://docs.dwavesys.com/docs/latest/c_gs_2.html (accessed on 3 December 2023).
19. Lu, B.; Liu, L.; Song, J.Y.; Wen, K.; Wang, C. Recent progress on coherent computation based on quantum squeezing. AAPPS Bull. **2023**, 33, 7.
20. Wittek, P.; Gogolin, C. Quantum Enhanced Inference in Markov Logic Networks. Sci. Rep. **2017**, 7, 45672.
21. Li, W.; Deng, D.L. Recent advances for quantum classifiers. Sci. China Phys. Mech. Astron. **2022**, 65, 220301.
22. Wei, S.; Chen, Y.; Zhou, Z.; Long, G. A quantum convolutional neural network on NISQ devices. AAPPS Bull. **2022**, 32, 2.
23. Sleeman, J.; Dorband, J.E.; Halem, M. A hybrid quantum enabled RBM advantage: Convolutional autoencoders for quantum image compression and generative learning. arXiv **2020**, arXiv:2001.11946.
24. Krizhevsky, A.; Nair, V.; Hinton, G. CIFAR-10 (Canadian Institute for Advanced Research). Available online: https://www.cs.toronto.edu/~kriz/cifar.html (accessed on 3 December 2023).
25. Krizhevsky, A. Learning Multiple Layers of Features from Tiny Images. 2009. Available online: https://www.cs.toronto.edu/~kriz/learning-features-2009-TR.pdf (accessed on 3 December 2023).
26. Eckersley, P.; Nasser, Y. EFF AI Progress Measurement Project. 2017. Available online: https://www.eff.org/ai/metrics (accessed on 3 December 2023).
27. Mack, D. A Simple Explanation of the Inception Score. 2019. Available online: https://medium.com/octavian-ai/a-simple-explanation-of-the-inception-score-372dff6a8c7a (accessed on 3 December 2023).
28. Szegedy, C.; Vanhoucke, V.; Ioffe, S.; Shlens, J.; Wojna, Z. Rethinking the Inception Architecture for Computer Vision. arXiv **2015**, arXiv:1512.00567.
29. Salimans, T.; Goodfellow, I.; Zaremba, W.; Cheung, V.; Radford, A.; Chen, X. Improved Techniques for Training GANs. arXiv **2016**, arXiv:1606.03498.
30. Heusel, M.; Ramsauer, H.; Unterthiner, T.; Nessler, B.; Hochreiter, S. GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium. In Advances in Neural Information Processing Systems; Guyon, I., Luxburg, U.V., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S., Garnett, R., Eds.; Curran Associates, Inc.: Red Hook, NY, USA, 2017; Volume 30.
31. Heusel, M.; Ramsauer, H.; Unterthiner, T.; Nessler, B.; Hochreiter, S. GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium. arXiv **2017**, arXiv:1706.08500.
32. Bińkowski, M.; Sutherland, D.J.; Arbel, M.; Gretton, A. Demystifying MMD GANs. arXiv **2018**, arXiv:1801.01401.
33. Cloud Tensor Processing Units (TPUs) | Google Cloud. Available online: https://cloud.google.com/tpu/docs/tpus (accessed on 3 December 2023).
34. Dhillon, P.S.; Foster, D.; Ungar, L. Transfer Learning Using Feature Selection. arXiv **2009**, arXiv:0905.4022.

**Figure 1.** Generative Learning Trilemma [6]. Labels show frameworks that tackle two of the three requirements well.

**Figure 2.** Restricted Boltzmann Machine architecture [3].

**Figure 3.** Variational Autoencoder architecture [13].

**Figure 4.** GANs architecture [15].

**Figure 5.** DDPM Markov chain [1].

**Figure 6.** Generated samples on CelebA-HQ 256 × 256 by DDPM [1].

**Figure 7.** D-Wave Quantum Processing Unit (QPU) topology Chimera graph [18].

**Figure 8.** Hybrid approach that used a classical autoencoder to map the image space to a compressed space [23].

**Figure 9.** Ten random images from each class of CIFAR-10 with respective class labels [24].

**Figure 10.** Binarization of a normalized vector to a set of binary vectors [3].

**Figure 11.** RBM-generated image synthesis output from respective input. (**a**) RBM input images; (**b**) RBM output images.

**Figure 12.** VAE-generated image synthesis output from respective input. (**a**) VAE input images; (**b**) VAE output images.

**Figure 13.** GAN-generated image synthesis output from respective input. (**a**) GAN input images; (**b**) GAN output images.

| | QBM | RBM | VAE | GAN | DDPM |
| --- | --- | --- | --- | --- | --- |
| Epochs | 10 | 10 | 50 | 50 | 30,000 |
| Batch Size | 256 | 256 | 512 | 128 | - |
| # of Hidden Nodes | 128 | 2500 | 32 | 64 | 32 |
| Learning Rate (${10}^{-3}$) | 0.0035 | 0.0035 | 0.2 | 0.2 | 0.2 |

| Metric | Description | Performance |
| --- | --- | --- |
| Inception | K-L divergence between conditional and marginal label distributions over generated data | Higher is better |
| FID | Wasserstein distance between multivariate Gaussians fitted to data embedded into a feature space | Lower is better |
| KID | Dissimilarity between two probability distributions ${P}_{r}$ and ${P}_{g}$, measured using samples drawn independently from each distribution | Lower is better |
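To make the FID and KID rows concrete, here is a hedged sketch over synthetic feature vectors (function names and data are illustrative, not the authors' pipeline). `fid_diagonal` uses a diagonal-covariance simplification of the Fréchet distance; the full FID, $\lVert\mu_r-\mu_g\rVert^2 + \mathrm{Tr}\,(C_r + C_g - 2(C_r C_g)^{1/2})$, requires a matrix square root. `kid_poly` is the unbiased squared MMD with the cubic polynomial kernel commonly used for KID [32].

```python
import numpy as np

def fid_diagonal(feat_r, feat_g):
    """Simplified FID assuming diagonal covariances: the trace term
    reduces to the sum of squared per-dimension std-dev differences."""
    mu_r, mu_g = feat_r.mean(0), feat_g.mean(0)
    s_r, s_g = feat_r.std(0), feat_g.std(0)
    return float(((mu_r - mu_g) ** 2).sum() + ((s_r - s_g) ** 2).sum())

def kid_poly(feat_r, feat_g, degree=3):
    """Unbiased MMD^2 with polynomial kernel k(x, y) = (x.y / d + 1)^degree."""
    d = feat_r.shape[1]
    k = lambda a, b: (a @ b.T / d + 1.0) ** degree
    m, n = len(feat_r), len(feat_g)
    k_rr, k_gg, k_rg = k(feat_r, feat_r), k(feat_g, feat_g), k(feat_r, feat_g)
    term_rr = (k_rr.sum() - np.trace(k_rr)) / (m * (m - 1))  # drop diagonal
    term_gg = (k_gg.sum() - np.trace(k_gg)) / (n * (n - 1))
    return float(term_rr + term_gg - 2.0 * k_rg.mean())

rng = np.random.default_rng(0)
real = rng.normal(0.0, 1.0, size=(500, 8))
fake_close = rng.normal(0.0, 1.0, size=(500, 8))  # same distribution
fake_far = rng.normal(2.0, 1.0, size=(500, 8))    # shifted distribution
print(fid_diagonal(real, fake_close) < fid_diagonal(real, fake_far))  # True
print(kid_poly(real, fake_close) < kid_poly(real, fake_far))          # True
```

Both metrics correctly rank the matched distribution closer to the real one, which is the property the quantitative comparisons below rely on; in practice the features would come from an Inception network rather than a Gaussian sampler.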

| | QBM | RBM | VAE | GAN | DDPM |
| --- | --- | --- | --- | --- | --- |
| Inception | 1.77 | 3.84 | 7.87 | 2.72 | 3.319 |
| FID | 210.83 | 379.65 | 93.48 | 122.49 | 307.51 |
| KID | 0.068 | 0.191 | 0.024 | 0.033 | 0.586 |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Jain, S.; Geraci, J.; Ruda, H.E.
Comparing Classical and Quantum Generative Learning Models for High-Fidelity Image Synthesis. *Technologies* **2023**, *11*, 183.
https://doi.org/10.3390/technologies11060183
