# From Beginning to BEGANing: Role of Adversarial Learning in Reshaping Generative Models


## Abstract


## 1. Introduction

## 2. Non-Adversarial Generative Networks

#### 2.1. Explicit Density Models

#### 2.1.1. Structures of Tractable Density

**Fully visible belief networks (FVBNs):** FVBNs fall among the three most popular approaches to generative modeling, along with generative adversarial networks (GANs) and variational autoencoders. This model uses the chain rule of probability to decompose a probability distribution of an n-dimensional vector into a product of one-dimensional probability distributions:
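The decomposition in question is the standard chain-rule factorization:

```latex
p_{\text{model}}(\mathbf{x}) = \prod_{i=1}^{n} p_{\text{model}}\left(x_i \mid x_1, \ldots, x_{i-1}\right)
```

Sampling from such a model is inherently sequential: each $x_i$ must be generated before $x_{i+1}$ can be, which is why FVBN-family models are slow to sample from compared with GANs.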

**Nonlinear independent components analysis (Nonlinear ICA):** Nonlinear ICAs are another popular tractable density method and are often mentioned in comparison to FVBNs and GANs. They are based on the definition of a continuous, non-linear transformation of data between two different spaces or dimensionalities. As the name suggests, the method attempts to represent the observed data as statistically independent component variables.

**Neural Autoregressive Distribution Estimator (NADE):** Neural autoregressive distribution estimator (NADE) models are neural network architectures that can be applied to the problem of unsupervised distribution and density estimation. They leverage the probability product rule and a weight-sharing scheme inspired by restricted Boltzmann machines to yield an estimator that is both tractable and has good generalization performance [24].

**Masked Autoencoder for Distribution Estimation (MADE):** Masked autoregressive models use a binary mask matrix in an element-wise multiplication with each weight matrix to zero out connections so as to fulfill the autoregressive property. Here, computing the negative log-likelihood is equivalent to sequentially predicting each dimension of the input x [25].
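As a minimal illustration of the masking idea (an illustrative sketch, not the authors' exact construction), the snippet below builds binary masks for a one-hidden-layer network and verifies that the composite input-to-output connectivity is strictly lower triangular, i.e., output dimension d depends only on inputs 1..d−1:

```python
import numpy as np

def made_masks(n_in, n_hidden, rng):
    """Build binary masks enforcing the autoregressive property.

    Input unit d (1-indexed) gets degree d; each hidden unit gets a
    random degree in [1, n_in - 1]; a hidden unit may see inputs with
    degree <= its own, and output d may only see hidden units with
    degree < d, so output d never depends on inputs >= d.
    """
    deg_in = np.arange(1, n_in + 1)                       # degrees 1..D
    deg_hid = rng.integers(1, n_in, size=n_hidden)        # degrees 1..D-1
    mask_hid = (deg_hid[:, None] >= deg_in[None, :]).astype(float)  # (H, D)
    mask_out = (deg_in[:, None] > deg_hid[None, :]).astype(float)   # (D, H)
    return mask_hid, mask_out

rng = np.random.default_rng(0)
mask_hid, mask_out = made_masks(5, 16, rng)
# Composite connectivity: entry (d, j) > 0 means output d depends on input j.
conn = mask_out @ mask_hid
# Strictly lower triangular: zero on and above the diagonal.
assert np.allclose(np.triu(conn), 0.0)
```

These masks would be multiplied element-wise into the weight matrices of an ordinary feed-forward autoencoder before each forward pass.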

**PixelRNN:** PixelRNN is a deep neural network that sequentially predicts the pixels in an image along the two spatial dimensions. This method models the discrete probability of the raw pixel values and encodes the complete set of dependencies in the image. Architectural novelties include fast two-dimensional recurrent layers and an effective use of residual connections in deep recurrent networks [25].

#### 2.1.2. Variational Approximations

**Variational Autoencoder (VAE):** VAEs are appealing because they are built upon standard function approximators (neural networks) and can be trained with stochastic gradient descent. VAEs have already shown promise in generating many kinds of complicated data, including handwritten digits, faces, house numbers, CIFAR images, physical models of scenes, segmentations, and predictions of the future from static images [26].
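The training objective maximized by stochastic gradient descent is the evidence lower bound (ELBO) on the data log-likelihood:

```latex
\log p_\theta(x) \;\ge\; \mathbb{E}_{q_\phi(z \mid x)}\left[\log p_\theta(x \mid z)\right]
- D_{\mathrm{KL}}\!\left(q_\phi(z \mid x)\,\|\,p(z)\right)
```

The first term is a reconstruction objective; the KL term regularizes the approximate posterior $q_\phi(z \mid x)$ toward the prior $p(z)$, which is what distinguishes a VAE from a plain autoencoder.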

#### 2.1.3. Markov Chain Approximations

**Restricted Boltzmann Machines:** A restricted Boltzmann machine (RBM) [27] is a generative stochastic artificial neural network that can learn a probability distribution over its set of inputs. As the taxonomy indicates, RBMs are a variant of Boltzmann machines, with the restriction that the neurons must form a bipartite graph: a pair of nodes from each of the two groups of units (commonly referred to as the “visible” and “hidden” units respectively) may have a symmetric connection between them; and there are no connections between nodes within a group.
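The bipartite restriction is visible in the RBM energy function, which couples visible and hidden units only across the two groups (no v–v or h–h terms):

```latex
E(\mathbf{v}, \mathbf{h}) = -\mathbf{a}^{\top}\mathbf{v} - \mathbf{b}^{\top}\mathbf{h}
- \mathbf{v}^{\top} W \mathbf{h},
\qquad
p(\mathbf{v}, \mathbf{h}) = \frac{e^{-E(\mathbf{v}, \mathbf{h})}}{Z}
```

where $\mathbf{a}$, $\mathbf{b}$ are the visible and hidden biases and $W$ the connection weights. The partition function $Z$ is intractable, which is why RBM training relies on Markov chain approximations.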

#### 2.2. Implicit Density Models

#### 2.2.1. Goal-Seeking Neural Networks (GSN)

#### 2.2.2. Adversarial Networks

## 3. Generative Adversarial Networks

The generator learns a distribution $p_g$ over the data x, which is learnt by defining a prior (i.e., prior knowledge) over the input noise variables, $p_z(z)$. This is then mapped to a data space in the form of $G(z; \Theta_g)$, where G is the differentiable function of the multilayer perceptron with parameters $\Theta_g$.
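Written out, the generator G and discriminator D are trained jointly through the two-player minimax game of [14]:

```latex
\min_G \max_D V(D, G) =
\mathbb{E}_{x \sim p_{\text{data}}(x)}\left[\log D(x)\right] +
\mathbb{E}_{z \sim p_z(z)}\left[\log\left(1 - D(G(z))\right)\right]
```

D is trained to assign high probability to real samples, while G is trained to produce samples that D scores as real.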

#### 3.1. Convergence and Stability Issues of Generative Adversarial Networks

- Instability;
- Failure to converge.

#### 3.2. Comparative Analysis of Generative Adversarial Networks

#### 3.3. Critical Analysis of Generative Adversarial Networks

## 4. Conditional Generative Adversarial Nets

The conditional model combines the prior input noise, $p_z(z)$, and the ground truth, y, in the generator. This allows the input for conditioning and the prior noise input to be considered in one single layer. The extent of complex generation mechanisms between these two abstract entities can be modified using higher-order interactions.
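Conditioning both players on y changes the GAN value function only by making each term conditional:

```latex
\min_G \max_D V(D, G) =
\mathbb{E}_{x \sim p_{\text{data}}(x)}\left[\log D(x \mid y)\right] +
\mathbb{E}_{z \sim p_z(z)}\left[\log\left(1 - D(G(z \mid y))\right)\right]
```

In practice, y (e.g., a class label) is fed to both networks as an additional input alongside x or z.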

#### 4.1. Comparative Analysis of Conditional Generative Adversarial Nets

#### 4.2. Critical Analysis of Conditional Generative Adversarial Nets

## 5. Deep Multi Scale Video Prediction beyond Mean Square Error

#### 5.1. Comparative Analysis of Deep Multi Scale Video Prediction beyond Mean Square Error

#### 5.2. Critical Analysis of Deep Multi Scale Video Prediction beyond Mean Square Error

## 6. Adversarial Autoencoders (AAE)

- A traditional reconstruction error criterion;
- An adversarial learning criterion that shapes the distribution of the encoder's output to match a chosen prior.

#### 6.1. Comparative Analysis of Adversarial Autoencoders

#### 6.2. Critical Analysis of Adversarial Autoencoders

## 7. Deep Convolutional Generative Adversarial Networks

- Instability of training that makes it difficult to reproduce results;
- Blurriness of generated real-world images (i.e., improvement of accuracy);
- Explaining the role of different convolution filters in the network.

- Deterministic spatial pooling functions were replaced with strided convolutions;
- Fully connected layers on top of convolutional features were eliminated.

#### 7.1. Comparative Analysis of DCGANs

#### 7.2. Critical Analysis of DCGANs

## 8. Energy-Based GANs

In the pull-away term, $S_i$ denotes the ith image in the set and $S_i^T$ is the transpose of image $S_i$. Here, bs refers to the batch size that has been chosen for processing.

#### 8.1. Comparative Analysis of Energy-Based Generative Adversarial Network

#### 8.2. Critical Analysis of Energy-Based Generative Adversarial Network

## 9. Least Squares Generative Adversarial Networks

#### 9.1. Comparative Analysis of LSGANs

#### 9.2. Critical Analysis of LSGANs

## 10. AdaGAN: Boosting Generative Models

#### 10.1. Analysis of AdaGAN Algorithm

#### 10.2. Critical Analysis of AdaGAN

## 11. Wasserstein GAN

The Wasserstein GAN minimizes the Wasserstein distance from the real data distribution $P_r$ to the generated distribution $P_g$. This distance, also called the earth mover’s distance, is meant to improve the convergence and allow the generator to learn faster. This is based on two theorems which state that:

- If the generator function is continuous on the noise latent space, locally Lipschitz, and adheres to regularity assumption 1, then the Wasserstein distance between the two distributions in question will also be continuous everywhere and differentiable almost everywhere;
- The total variation distance and Jensen–Shannon divergence reach zero while comparing two distributions where the original distribution is P and the generated distribution is ${P}_{n}, n \in \mathbb{N}, n \to \infty$. This also happens for the Wasserstein distance, but only when the two distributions converge as $P_n$ converges to P.
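For intuition, in one dimension the earth mover’s distance between two equal-size empirical samples reduces to the mean absolute difference of their sorted values; a small sketch (illustrative only, not the WGAN training procedure):

```python
import numpy as np

def wasserstein1_1d(a, b):
    """W1 (earth mover's) distance between two equal-size 1-D
    empirical samples: pair the sorted values and average the
    transport cost |a_(i) - b_(i)|."""
    a = np.sort(np.asarray(a, dtype=float))
    b = np.sort(np.asarray(b, dtype=float))
    if a.shape != b.shape:
        raise ValueError("samples must have equal size")
    return float(np.mean(np.abs(a - b)))

# Shifting a sample by a constant c changes the distance by exactly |c|,
# which is why W1 gives the generator a useful gradient even when the
# supports of the two distributions do not overlap.
print(wasserstein1_1d([0.0, 1.0, 2.0], [1.0, 2.0, 3.0]))  # 1.0
```

This smooth dependence on distributional shift is the property the two theorems above formalize, in contrast to the total variation and Jensen–Shannon distances, which saturate for non-overlapping supports.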

#### 11.1. Comparative Analysis of Wasserstein GAN

#### 11.2. Critical Analysis of Wasserstein GAN

- WGANs suffer from the lack of scalability of the critic, which means that networks with different critics cannot be compared;
- The critics do not have infinite capacity, and one must rely on intuition to estimate how close they come to the true EM distance;
- The architecture becomes unstable when a momentum-based optimizer is used to train it, as the loss function is non-stationary; hence, RMSProp was used;
- Training WGANs takes much longer than training other popular GAN models.

## 12. BEGAN: Boundary Equilibrium Generative Adversarial Networks

$\mu_1$ is the loss distribution $L(x)$ and $\mu_2$ is the loss distribution $L(G(z))$. In order to minimize $|m_1 - m_2|$, either of the equations in (28) can be used. Since the minimization of $m_1$ is conducive to autoencoding the images, (28(b)) is used for BEGAN. The equilibrium factor, as stated before, can then be given by (29).

- When the diversity ratio is lowered, the discriminator focuses on autoencoding real images, and the diversity of the generated samples decreases;
- When the diversity ratio is raised, more emphasis is placed on discriminating the generated images, and hence the diversity of the produced images increases.
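The balance between these two regimes is maintained with a proportional control term; the BEGAN objective of [19] can be written as:

```latex
\begin{aligned}
\mathcal{L}_D &= L(x) - k_t \, L(G(z_D)) \\
\mathcal{L}_G &= L(G(z_G)) \\
k_{t+1} &= k_t + \lambda_k \left(\gamma \, L(x) - L(G(z_G))\right)
\end{aligned}
```

where $\gamma$ is the diversity ratio, $k_t \in [0, 1]$ controls how much weight the discriminator places on penalizing generated samples, and $\lambda_k$ is the proportional gain (the "learning rate" of k).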

#### 12.1. Comparative Analysis of BEGAN: Boundary Equilibrium Generative Adversarial Networks

#### 12.2. Critical Analysis of BEGANs

- The question of the necessity to have an autoencoder discriminator;
- The question of the latent space size for the autoencoder from the previous point;
- The possible improvement from using variational autoencoders;
- The problem of knowing when to add noise to the input.

## 13. Creative Adversarial Networks

#### 13.1. Comparative Analysis of CANs

- Whether the art was created by humans or a computer;
- Whether the art was original and held novelty or not.

#### 13.2. Critical Analysis of CANs

## 14. Mini-Batch Processing and Other Improved Techniques for Training GANs

#### 14.1. Comparative Analysis of Improved Techniques for Training GANs

#### 14.2. Critical Analysis of Improved Techniques for Training GANs

## 15. Generative Visual Manipulation on the Natural Image Manifold

- Alteration in shape and color;
- Transformation of an image;
- Generation of a new image pertaining to the user data.

#### 15.1. Comparative Analysis of Generative Visual Manipulation on the Natural Image Manifold

#### 15.2. Critical Analysis of Generative Visual Manipulation on the Natural Image Manifold

## 16. Image-to-Image Translation with Conditional Adversarial Networks

#### 16.1. Comparative Analysis of Image-to-Image Translation with Conditional Adversarial Networks

#### 16.2. Critical Analysis of Image-to-Image Translation with Conditional Adversarial Networks

## 17. SEGAN: Speech Enhancement Generative Adversarial Network

#### 17.1. Analysis of SEGAN

#### 17.2. Critical Analysis of SEGAN

## 18. Recent Developments

#### 18.1. Transfer Learning

#### 18.2. Progressive Growing

## 19. Future Avenues

#### 19.1. Adversarial Noise

#### 19.2. Pruning of Adversarial Networks

#### 19.3. Adversarial Compression

#### 19.4. Single Image Super Resolution

#### 19.5. New Architectures

#### 19.6. Other Avenues for Future Work

## 20. Conclusions

## Funding

## Data Availability Statement

## Conflicts of Interest

## References

- Kriegeskorte, N.; Tal, G. Neural network models and deep learning. Curr. Biol. **2019**, 29, R231–R236.
- Telgarsky, M. Benefits of depth in neural networks. In Proceedings of the Conference on Learning Theory, New York, NY, USA, 23–26 June 2016; PMLR Workshop and Conference Proceedings, Volume 49, pp. 1–23.
- Theis, L.; Oord, A.V.D.; Bethge, M. A note on the evaluation of generative models. Published as a conference paper at ICLR 2016. arXiv **2015**, arXiv:1511.01844.
- Albahar, M.; Jameel, A. Deepfakes: Threats and countermeasures systematic review. J. Theor. Appl. Inf. Technol. **2019**, 97, 3242–3250.
- Alqahtani, H.; Kavakli-Thorne, M.; Kumar, G. Applications of Generative Adversarial Networks (GANs): An Updated Review. Arch. Comput. Methods Eng. **2021**, 28, 525–552.
- Zheng, Q.; Yang, M.; Yang, J.; Zhang, Q.; Zhang, X. Improvement of Generalization Ability of Deep CNN via Implicit Regularization in Two-Stage Training Process. IEEE Access **2018**, 6, 15844–15869.
- Zhao, M.; Jha, A.; Liu, Q.; Millis, B.A.; Jensen, A.M.; Lu, L.; Landman, B.A.; Tyska, M.J.; Huo, Y. Faster Mean-shift: GPU-accelerated clustering for cosine embedding-based cell segmentation and tracking. Med. Image Anal. **2021**, 71, 102048.
- Duan, M.; Li, K.; Liao, X.; Li, K. A Parallel Multiclassification Algorithm for Big Data Using an Extreme Learning Machine. IEEE Trans. Neural Netw. Learn. Syst. **2018**, 29, 2337–2351.
- Zhao, M.; Liu, Q.; Jha, A.; Deng, R.; Yao, T.; Mahadevan-Jansen, A.; Tyska, M.J.; Millis, B.A.; Huo, Y. VoxelEmbed: 3D Instance Segmentation and Tracking with Voxel Embedding based Deep Learning. In Machine Learning in Medical Imaging, MLMI 2021; Lecture Notes in Computer Science; Lian, C., Cao, X., Rekik, I., Xu, X., Yan, P., Eds.; Springer: Berlin/Heidelberg, Germany, 2021; Volume 12966, pp. 437–446.
- Chen, C.; Li, K.; Wei, W.; Zhou, J.T.; Zeng, Z. Hierarchical Graph Neural Networks for Few-Shot Learning. IEEE Trans. Circuits Syst. Video Technol. **2022**, 32, 240–252.
- Pu, B.; Li, K.; Li, S.; Zhu, N. Automatic Fetal Ultrasound Standard Plane Recognition Based on Deep Learning and IIoT. IEEE Trans. Ind. Inform. **2021**, 17, 7771–7780.
- Jin, B.; Cruz, L.; Gonçalves, N. Pseudo RGB-D Face Recognition. IEEE Sens. **2022**, 22, 21780–21794.
- Zhou, S.; Chen, L.; Sugumaran, V. Hidden Two-Stream Collaborative Learning Network for Action Recognition. Comput. Mater. Contin. **2020**, 63, 1545–1561.
- Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative adversarial nets. Commun. ACM **2014**, 63, 139–144.
- Ranzato, M.; Szlam, A.; Bruna, J.; Mathieu, M.; Collobert, R.; Chopra, S. Video (Language) Modeling: A Baseline for Generative Models of Natural Videos. 2016. Available online: http://arxiv.org/abs/1412.6604 (accessed on 1 September 2022).
- Adate, A.; Tripathy, B.K. Understanding single image super-resolution techniques with generative adversarial networks. In Proceedings of the 7th International Conference on Soft Computing for Problem Solving, SocPros 2017, IIT Bhubaneswar, Odisha, India, 23–24 December 2017.
- Adate, A.; Tripathy, B.K. S-LSTM-GAN: Shared recurrent neural networks with adversarial training. In Proceedings of the 2nd International Conference on Data Engineering and Communication Technology, ICDECT 2017, Symbiosis University, Pune, India, 15–16 December 2018.
- Tolstikhin, I.O.; Gelly, S.; Bousquet, O.; Simon-Gabriel, C.-J.; Schölkopf, B. AdaGAN: Boosting Generative Models. Advances in Neural Information Processing Systems. arXiv **2017**, arXiv:1701.02386.
- Berthelot, D.; Schumm, T.; Metz, L. BEGAN: Boundary Equilibrium Generative Adversarial Networks. 2017. Available online: http://arxiv.org/abs/1703.10717 (accessed on 1 September 2022).
- Likas, A. Probability density estimation using artificial neural networks. Comput. Phys. Commun. **2021**, 118, 167–175.
- Scholz, F. Maximum likelihood estimation. In Encyclopedia of Statistical Sciences; John Wiley & Sons: Hoboken, NJ, USA, 2006.
- Hauschild, M.; Pelikan, M. An introduction and survey of estimation of distribution algorithms. Swarm Evol. Comput. **2011**, 1, 111–128.
- Lappalainen, H.; Honkela, A. Bayesian non-linear independent component analysis by multi-layer perceptron. In Advances in Independent Component Analysis, 1st ed.; Girolami, M., Ed.; Springer: London, UK, 2000; pp. 93–121.
- Uria, B.; Cote, M.-A.; Gregor, K.; Murray, I.; Larochelle, H. Neural autoregressive distribution estimation. J. Mach. Learn. Res. **2016**, 17, 7184–7220.
- Oord, A.V.D.; Kalchbrenner, N.; Kavukcuoglu, K. Pixel recurrent neural networks. In Proceedings of the 33rd International Conference on Machine Learning, New York, NY, USA, 11 June 2016; JMLR: Brookline, MA, USA, 2016; Volume 48, pp. 1747–1756.
- Pu, Y.; Gan, Z.; Henao, R.; Yuan, X.; Li, C.; Stevens, A.; Carin, L. Variational autoencoder for deep learning of images, labels and captions. In Advances in Neural Information Processing Systems; NeurIPS: San Diego, CA, USA, 2016; pp. 2352–2360.
- Sutskever, I.; Hinton, G.E.; Taylor, G.W. The recurrent temporal restricted Boltzmann machine. In Advances in Neural Information Processing Systems; NeurIPS: San Diego, CA, USA, 2009; pp. 1601–1608.
- Henrion, M. Propagating uncertainty in Bayesian networks by probabilistic logic sampling. Mach. Intell. Patt. Rec. North-Holl. **1988**, 5, 149–163.
- Filho, E.C.D.B.C.; Bisset, D.L.; Fairhurst, M.C. A Goal Seeking Neuron for Boolean Neural Networks. In International Neural Network Conference, Paris, France, 9–13 July 1990; Springer: Dordrecht, The Netherlands, 1990; pp. 894–897.
- Mirza, M.; Osindero, S. Conditional Generative Adversarial Nets. 2014. Available online: http://arxiv.org/abs/1411.1784 (accessed on 1 September 2022).
- Elgammal, A.; Liu, B.; Elhoseiny, M.; Mazzone, M. CAN: Creative Adversarial Networks, generating “art” by learning about styles and deviating from style norms. arXiv **2017**, arXiv:1706.07068.
- Makhzani, A.; Shlens, J.; Jaitly, N.; Goodfellow, I.J. Adversarial Autoencoders. 2016. Available online: http://arxiv.org/abs/1511.05644 (accessed on 1 September 2022).
- Radford, A.; Metz, L.; Chintala, S. Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks. 2016. Available online: http://arxiv.org/abs/1511.06434 (accessed on 1 December 2022).
- Zhao, J.J.; Mathieu, M.; LeCun, Y. Energy-based Generative Adversarial Network. 2017. Available online: http://arxiv.org/abs/1609.03126 (accessed on 1 December 2022).
- Mao, X.; Li, Q.; Xie, H.; Lau, R.Y.K.; Wang, Z. Multi-Class Generative Adversarial Networks with the L2 Loss Function. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018. Available online: http://arxiv.org/abs/1611.04076 (accessed on 1 December 2022).
- Arjovsky, M.; Chintala, S.; Bottou, L. Wasserstein Generative Adversarial Networks. In Proceedings of the 34th International Conference on Machine Learning, Sydney, Australia, 6–11 August 2017; pp. 214–223. Available online: http://proceedings.mlr.press/v70/arjovsky17a/arjovsky17a.pdf (accessed on 1 December 2022).
- Pascual, S.; Serrà, J.; Bonafonte, A. Towards generalized speech enhancement with generative adversarial networks. arXiv **2019**, arXiv:1904.03418.
- Wang, J.; Yang, Y.; Wang, T.; Sherratt, R.; Zhang, J. Big Data Service Architecture: A Survey. J. Internet Technol. **2020**, 21, 393–405.
- Zhang, J.; Zhong, S.; Wang, T.; Chao, H.-C.; Wang, J. Blockchain-Based Systems and Applications: A Survey. J. Internet Technol. **2020**, 21, 1–14.
- Mescheder, L.; Geiger, A.; Nowozin, S. Which Training Methods for GANs do actually Converge? In Proceedings of the 35th International Conference on Machine Learning, Stockholm, Sweden, 10–15 July 2018.
- Nagarajan, V.; Kolter, J.Z. Gradient descent GAN optimization is locally stable. In Proceedings of the 31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA, 4–9 December 2017.
- Mescheder, L.; Nowozin, S.; Geiger, A. The numerics of GANs. In Proceedings of the 31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA, 4–9 December 2017; Advances in Neural Information Processing Systems, Volume 30.
- Gulrajani, I.; Ahmed, F.; Arjovsky, M.; Dumoulin, V.; Courville, A.C. Improved Training of Wasserstein GANs. Advances in Neural Information Processing Systems (NeurIPS 2017), 2017; Volume 30, pp. 1–11. Available online: https://proceedings.neurips.cc/paper/2017/file/892c3b1c6dccd52936e27cbd0ff683d6-Paper.pdf (accessed on 1 December 2022).
- Kodali, N.; Abernethy, J.; Hays, J.; Kira, Z. On Convergence and Stability of GANs. arXiv **2017**, arXiv:1705.07215.
- Sønderby, C.K.; Raiko, T.; Maaløe, L.; Sønderby, S.K.; Winther, O. Ladder Variational Autoencoders. In Proceedings of the 30th Conference on Neural Information Processing Systems (NIPS 2016), Barcelona, Spain, 5 December 2016; Advances in Neural Information Processing Systems, Volume 29.
- Roth, K.; Lucchi, A.; Nowozin, S.; Hofmann, T. Stabilizing training of generative adversarial networks through regularization. In Advances in Neural Information Processing Systems; NeurIPS: San Diego, CA, USA, 2017; Volume 30.
- Arjovsky, M.; Bottou, L. Towards principled methods for training generative adversarial networks. arXiv **2017**, arXiv:1701.04862.
- Karras, T.; Aittala, M.; Hellsten, J.; Laine, S.; Lehtinen, J.; Aila, T. Training Generative Adversarial Networks with Limited Data. In Proceedings of the 34th Conference on Neural Information Processing Systems (NeurIPS 2020), Vancouver, BC, Canada, 12 December 2020.
- Mo, S.; Cho, M.; Shin, J. Freeze the discriminator: A simple baseline for fine-tuning GANs. arXiv **2020**, arXiv:2002.10964.
- Noguchi, A.; Harada, T. Image generation from small datasets via batch statistics adaptation. In Proceedings of the ICCV, Seoul, Republic of Korea, 27 October 2019.
- Heusel, M.; Ramsauer, H.; Unterthiner, T.; Nessler, B.; Hochreiter, S. GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium. arXiv **2018**, arXiv:1706.08500v6.
- Miyato, T.; Kataoka, T.; Koyama, M.; Yoshida, Y. Spectral normalization for generative adversarial networks. arXiv **2018**, arXiv:1802.05957v1.
- Lee, H.; Grosse, R.; Ranganath, R.; Ng, A.Y. Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations. In Proceedings of the 26th International Conference on Machine Learning, ICML, Montreal, QC, Canada, 14–18 June 2009; pp. 609–616.
- Bengio, Y.; Thibodeau-Laufer, E.; Yosinski, J. Deep Generative Stochastic Networks Trainable by Backprop. 2014. Available online: http://arxiv.org/abs/1306.1091 (accessed on 1 December 2022).
- Bengio, Y.; Mesnil, G.; Dauphin, Y.; Rifai, S. Better Mixing via Deep Representations. 2012. Available online: http://arxiv.org/abs/1207.4404 (accessed on 1 December 2022).
- Goodfellow, I.; Mirza, M.; Courville, A.; Bengio, Y. Multi-prediction deep Boltzmann machines. In Advances in Neural Information Processing Systems; Burges, C.J.C., Bottou, L., Welling, M., Ghahramani, Z., Weinberger, K.Q., Eds.; Curran Associates, Inc.: Red Hook, NY, USA, 2013; pp. 548–556. Available online: http://papers.nips.cc/paper/5024-multi-prediction-deep-boltzmann-machines.pdf (accessed on 1 December 2022).
- Pascanu, R.; Mikolov, T.; Bengio, Y. Understanding the Exploding Gradient Problem. 2012. Available online: http://arxiv.org/abs/1211.5063 (accessed on 1 December 2022).
- Masci, J.; Meier, U.; Ciresan, D.; Schmidhuber, J. Stacked convolutional auto-encoders for hierarchical feature extraction. In Proceedings of the International Conference on Artificial Neural Networks, Bratislava, Slovakia, 14–17 September 2011; Springer: Berlin/Heidelberg, Germany, 2011; pp. 52–59.
- Mathieu, M.; Couprie, C.; LeCun, Y. Deep Multi-Scale Video Prediction Beyond Mean Square Error. ICLR 2016. Available online: http://arxiv.org/abs/1511.05440 (accessed on 1 December 2022).
- Springenberg, J.T. Unsupervised and Semi-supervised Learning with Categorical Generative Adversarial Networks. 2016. Available online: https://doi.org/10.48550/arXiv.1511.06390 (accessed on 1 December 2022).
- Kingma, D.P.; Welling, M. Auto-Encoding Variational Bayes. arXiv **2014**, arXiv:1312.6114.
- Rasmus, A.; Berglund, M.; Honkala, M.; Valpola, H.; Raiko, T. Semi-supervised learning with ladder networks. In Advances in Neural Information Processing Systems; NeurIPS: San Diego, CA, USA, 2015; pp. 3546–3554.
- Maaloe, L.; Sonderby, C.K.; Sonderby, S.K.; Winther, O. Auxiliary Deep Generative Models. 2016. Available online: https://arxiv.org/pdf/1602.05473.pdf (accessed on 1 December 2022).
- LeCun, Y.; Chopra, S.; Hadsell, R.; Ranzato, M.; Huang, F. A tutorial on energy-based learning. In Predicting Structured Data; Bakir, G., Hofman, T., Schölkopf, B., Smola, A., Taskar, B., Eds.; MIT Press: Cambridge, MA, USA, 2006.
- Salimans, T.; Goodfellow, I.J.; Zaremba, W.; Cheung, V.; Radford, A.; Chen, X. Improved Techniques for Training GANs. 2016. Available online: http://arxiv.org/abs/1606.03498 (accessed on 1 September 2022).
- Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet classification with deep convolutional neural networks. In Proceedings of the 25th International Conference on Neural Information Processing Systems, Lake Tahoe, NV, USA, 3–8 December 2012; pp. 1097–1105.
- Springenberg, J.T.; Dosovitskiy, A.; Brox, T.; Riedmiller, M.A. Striving for Simplicity: The All Convolutional Net. 2014. Available online: http://arxiv.org/abs/1412.6806 (accessed on 1 September 2022).
- Ioffe, S.; Szegedy, C. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. 2015. Available online: http://arxiv.org/abs/1502.03167 (accessed on 1 December 2022).
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016. Available online: http://arxiv.org/abs/1512.03385 (accessed on 1 September 2022).
- Zhu, J.-Y.; Krähenbühl, P.; Shechtman, E.; Efros, A.A. Generative Visual Manipulation on the Natural Image Manifold. In Lecture Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 2016; pp. 597–613.
- Isola, P.; Zhu, J.Y.; Zhou, T.; Efros, A.A. Image-to-image translation with conditional adversarial networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017; pp. 1125–1134.
- Hwang, S.; Kim, H. Self-transfer learning for fully weakly supervised object localization. arXiv **2016**, arXiv:1602.01625.
- Mahapatra, D.; Ge, Z. Training data independent image registration with GANs using transfer learning and segmentation information. In Proceedings of the 2019 IEEE 16th International Symposium on Biomedical Imaging (ISBI 2019), Venice, Italy, 8–11 April 2019; IEEE: New York, NY, USA, 2019; pp. 709–713.
- Soomro, K.; Zamir, A.R.; Shah, M. UCF101: A dataset of 101 human actions classes from videos in the wild, CRCV-TR-12-01. arXiv **2012**, arXiv:1212.0402.
- Karras, T.; Aila, T.; Laine, S.; Lehtinen, J. Progressive Growing of GANs for Improved Quality, Stability, and Variation. ICLR 2018. Available online: http://research.nvidia.com/publication/2017-10Progressive-Growing-of5 (accessed on 1 September 2022).
- Adate, A.; Saxena, R.; Don, S. Understanding how adversarial noise affects single image classification. In Smart Secure Systems: IoT and Analytics Perspective; Venkataramani, G.P., Sankaranarayanan, K., Mukherjee, S., Arputharaj, K., Narayanan, S.S., Eds.; Springer: Singapore, 2018; pp. 287–295.
- Yu, C.; Pool, J. Self-supervised GAN compression. arXiv **2020**, arXiv:2007.01491.
- Song, X.; Chen, Y.; Feng, Z.H.; Hu, G.; Yu, D.J.; Wu, X.J. SP-GAN: Self-Growing and Pruning Generative Adversarial Networks. IEEE Trans. Neural Netw. Learn. Syst. **2020**, 32, 2458–2469.
- Adate, A.; Saxena, R.; Gnana Kiruba, G. Analyzing image compression using generative adversarial networks. In Proceedings of the 7th International Conference on Soft Computing for Problem Solving, SocPros 2017, IIT Bhubaneswar, Odisha, India, 23–24 December 2017.
- Ma, R.; Junying, L.; Peng, L.; Jing, G. Reconstruction of Generative Adversarial Networks in Cross Modal Image Generation with Canonical Polyadic Decomposition. Wirel. Commun. Mob. Comput. **2021**, 2021, 8868781.
- Takamoto, M.; Morishita, Y. An Empirical Study of the Effects of Sample-Mixing Methods for Efficient Training of Generative Adversarial Networks. In Proceedings of the 2021 IEEE 4th International Conference on Multimedia Information Processing and Retrieval (MIPR), Tokyo, Japan, 8–10 September 2021; pp. 49–55.
- Tseng, H.Y. Generative Adversarial Networks for Content Creation. Ph.D. Thesis, University of California, Merced, CA, USA, 2021.
- Armandpour, M.; Ali, S.; Chunyuan, L.; Zhou, M. Partition-Guided GANs. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 25 June 2021; pp. 5099–5109.

**Figure 4.** Multi-scale architecture [59].

**Figure 5.** Adversarial autoencoder [32].

**Figure 6.** Generator for the large-scale scene understanding (LSUN) bedroom dataset DCGAN, which uses fractional convolutional layers (deconvolutional layers) to generate images from a 100-dimensional noise distribution.

**Figure 7.** Energy-based generative adversarial network with Z as the noise space, X as the original image, and E as the energy assigned to the given generated variable.

**Figure 8.** Comparison of Inception score vs. bin percentages for three different cases of GAN and EBGAN.

**Figure 9.** Generated ImageNet outputs showing how 256 × 256 images are generated using EBGAN with the pull-away term.

**Figure 10.** Model architecture used for the LSUN dataset to compare the obtained results: (**a**) the generator; and (**b**) the discriminator.

**Figure 11.** A comparison of how vanilla GANs and LSGANs iterate through the dataset to produce the target output.

**Figure 14.** The architecture of the generator/decoder (**Left**) and the encoder (**Right**) for the given network.

**Figure 16.** Images generated by the CAN that were ranked as the most realistic and unique by human subjects.

**Figure 17.** (**a**) Samples generated by the model during semi-supervised training; and (**b**) samples generated with minibatch discrimination.

**Figure 18.** Images generated while performing the task of realistic photo manipulation of color and shape.

**Figure 20.** Encoder-decoder architecture for speech enhancement (the G network). The arrows between encoder and decoder blocks denote skip connections.

**Figure 21.** Waveform and spectrogram of: (**a**) clean; (**b**) noisy; and (**c**) enhanced speech corresponding to the sentence “We were surprised to see”.

**Figure 23.** 1024 × 1024 resolution images generated by transfer learning. The rightmost column shows images in the latent space of the model.

**Figure 25.** The fading-in of layers during the training process. The layers go from (**a**) to (**b**) by treating the already-trained layers as a residual block whose weights move from 0 to 1.

**Figure 26.** Results generated on the CelebA dataset using: (**a**) the progressive growing GAN; and (**b**) WGANs with gradient penalty.

Model | Year | Author | Generative Model | Discriminative Model |
---|---|---|---|---|
Generative Adversarial Networks (GANs) | 2014 | Goodfellow [14] | Multilayer Perceptron | Multilayer Perceptron |
Conditional Adversarial Networks (CGANs) | 2014 | Mirza & Osindero [30] | Multilayer Perceptron with conditional y | Multilayer Perceptron with conditional y |
Deep Multi-Scale Video Prediction beyond Mean Square Error | 2015 | Mathieu et al. [31] | Padded convolutions interlaced with ReLU non-linearities | Standard non-padded convolutions followed by fully connected layers and ReLU non-linearities |
Adversarial Autoencoders (AAE) | 2015 | Makhzani et al. [32] | Encoder and decoder using a universal approximator posterior that encodes the features as a distribution | Discriminator using a universal approximator posterior |
Deep Convolutional GANs (DCGANs) | 2016 | Radford et al. [33] | CNN with batch normalization, connecting the highest convolutional features to input/output, ReLU activation | CNN with strided convolutions, batch normalization and flattening of the last convolution layer, leaky ReLU activation |
Energy-Based GANs (EBGAN) | 2016 | Zhao et al. [34] | Ladder Network (LN) model with an autoencoder using batch normalization to generate low-energy output | Ladder Network model with a CNN as autoencoder using batch normalization to assign high energy to generated output |
Least Squares GANs (LSGAN) | 2017 | Mao et al. [35] | Following DCGANs, ReLU activation on a CNN without batch normalization | Least-squares loss function with leaky ReLU on a CNN |
AdaGAN: Boosting Generative Model | 2017 | Tolstikhin et al. [18] | Two hidden ReLU layers of size 10 and 5, respectively, with latent space Z = R^{5} | Two hidden ReLU layers of size 20 and 10, respectively |
Wasserstein GANs | 2017 | Arjovsky et al. [36] | Multilayer Perceptron with 4 hidden layers and 512 units per layer, using the Wasserstein distance | Tested with both an MLP discriminator and a DCGAN discriminator |
Boundary Equilibrium GANs (BEGAN) | 2017 | Berthelot et al. [19] | Same architecture as the discriminator's decoder, with different weights | Deep convolutional network built as an autoencoder, as proposed in EBGAN, with ELUs |
Creative Adversarial Networks | 2017 | Elgammal et al. [31] | Similar to the DCGAN architecture, starting from a 4 × 4 spatial extent with 2048 feature maps and upsampled to a final 256 × 256 pixel image | A ‘body’ of convolutional layers, each followed by a leaky ReLU activation, and two heads representing the multi-label loss and the fake-image loss |
Speech Enhancement GANs | 2017 | Pascual et al. [37] | Fully convolutional network with strided convolutions and parametric ReLUs, with skip connections between encoding and decoding units; LSGAN loss | One-dimensional convolutional structure using batch normalization and leaky ReLU non-linearities |
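All of the generator/discriminator pairings catalogued above optimize some variant of the original minimax objective V(D, G) = E_x[log D(x)] + E_z[log(1 − D(G(z)))]. As a minimal illustrative sketch (toy discriminator outputs rather than a real network), the value can be estimated by Monte Carlo:

```python
import numpy as np

def gan_value(d_real, d_fake, eps=1e-12):
    """Monte Carlo estimate of the GAN minimax value
    V(D, G) = E_x[log D(x)] + E_z[log(1 - D(G(z)))]."""
    d_real = np.clip(d_real, eps, 1 - eps)  # guard against log(0)
    d_fake = np.clip(d_fake, eps, 1 - eps)
    return np.mean(np.log(d_real)) + np.mean(np.log(1 - d_fake))

# At the theoretical equilibrium the discriminator outputs 0.5
# everywhere, so V = log(1/2) + log(1/2) = -log 4.
v = gan_value(np.full(8, 0.5), np.full(8, 0.5))
```

In practice the generator usually maximizes log D(G(z)) instead of minimizing log(1 − D(G(z))), which avoids vanishing gradients early in training.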

**Table 2.** Window-based mean log-likelihood estimates for adversarial networks versus deep belief networks, stacked contractive autoencoders and deep generative stochastic networks.

Model | MNIST (×10^{2}) | TFD (×10^{3}) |
---|---|---|
Deep GSN [54] | 2.14 ± 0.011 | 1.890 ± 0.029 |
DBN [55] | 1.38 ± 0.02 | 1.909 ± 0.066 |
Stacked CAE [55] | 1.21 ± 0.016 | 2.110 ± 0.05 |
Adversarial nets | 2.25 ± 0.02 | 2.057 ± 0.026 |
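The "window-based" estimates in Table 2 are Parzen (kernel density) window log-likelihoods: a Gaussian kernel is fitted to a set of generated samples, and the mean log-likelihood of held-out test points under that density is reported. A minimal numpy sketch, assuming a shared bandwidth sigma chosen on a validation set:

```python
import numpy as np

def parzen_log_likelihood(test_x, samples, sigma):
    """Mean log-likelihood of test points under a Gaussian Parzen
    window fitted to generated samples (bandwidth sigma assumed
    to have been selected on a validation set)."""
    n, d = samples.shape
    # Pairwise squared distances, shape (num_test, num_samples)
    sq = ((test_x[:, None, :] - samples[None, :, :]) ** 2).sum(-1)
    log_k = -sq / (2.0 * sigma ** 2)
    # log-sum-exp over samples for numerical stability
    m = log_k.max(axis=1, keepdims=True)
    log_p = (m[:, 0] + np.log(np.exp(log_k - m).sum(axis=1))
             - np.log(n) - 0.5 * d * np.log(2.0 * np.pi * sigma ** 2))
    return float(log_p.mean())
```

Note that Parzen estimates are known to be noisy in high dimensions, which is one reason later work moved to other evaluation metrics.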

**Table 3.** Results for different models evaluated on the MNIST dataset [30].

Model | MNIST |
---|---|
Deep Belief Network [57] | 138 ± 2 |
Stacked ConvAutoEncoder [58] | 121 ± 1.6 |
Deep Stochastic Network [54] | 214 ± 1.1 |
Generative Adversarial Network [14] | 225 ± 2 |
Conditional Adversarial Network [30] | 132 ± 1.8 |

**Table 4.** Comparison results obtained after the addition of different loss functions [59].

Type of Loss | 1st Frame PSNR | 1st Frame SSIM | 2nd Frame PSNR | 2nd Frame SSIM |
---|---|---|---|---|
Single scale ${l}_{2}$ loss | 26.5 | 0.84 | 22.4 | 0.82 |
Multi scale ${l}_{2}$ loss | 27.6 | 0.86 | 22.5 | 0.81 |
Multi scale ${l}_{1}$ loss | 28.7 | 0.88 | 23.8 | 0.83 |
Multi Scale Gradient Difference ${l}_{1}$ Loss | 29.4 | 0.90 | 24.9 | 0.84 |
Multi Scale Gradient Difference ${l}_{1}^{*}$ Loss | 29.9 | 0.90 | 26.4 | 0.87 |
Adversarial Loss * | 30.6 | 0.89 | 26.1 | 0.85 |
Adversarial Gradient Difference Loss * | 31.5 | 0.91 | 28.0 | 0.87 |
Adversarial Fine-Tuned Gradient Difference Loss * | 32.0 | 0.92 | 28.9 | 0.89 |
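The PSNR column in Table 4 is derived from the mean squared error between the predicted and ground-truth frames, PSNR = 10 log₁₀(MAX² / MSE). A minimal sketch for 8-bit images:

```python
import numpy as np

def psnr(ref, pred, max_val=255.0):
    """Peak signal-to-noise ratio in dB between two images."""
    mse = np.mean((ref.astype(np.float64) - pred.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)

# A uniform error of 16 grey levels gives MSE = 256:
ref = np.zeros((8, 8))
pred = ref + 16.0
# psnr(ref, pred) is about 24.05 dB
```

Higher PSNR means smaller pixel-wise error; SSIM complements it by measuring perceived structural similarity rather than raw error energy.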

**Table 5.** Results on the window estimate obtained in [32].

Model | MNIST (10K) | MNIST (10M) | TFD (10K) | TFD (10M) |
---|---|---|---|---|
Deep Belief Nets | 138 ± 2 | - | 1909 ± 66 | - |
Stacked Convolutional AE | 121 ± 1.6 | - | 2110 ± 50 | - |
Deep GSN | 214 ± 1.1 | - | 1890 ± 29 | - |
GAN | 225 ± 2 | 386 | 2057 ± 26 | - |
Generative Moment Matching Nets + AE | 282 ± 2 | - | 2204 ± 20 | - |
Adversarial Autoencoders | 340 ± 2 | 427 | 2252 ± 16 | 2522 |

**Table 6.** Results for the classification of SVHN digits where GANs are used as feature extractors [33].

Model | Error Rate |
---|---|
KNN | 77.93% |
TSVM | 66.55% |
M1 + KNN | 65.63% |
M1 + TSVM | 54.33% |
M1 + M2 | 36.02% |
SWWAE without dropout | 27.83% |
SWWAE with dropout | 23.56% |
Supervised CNN with the same architecture as proposed | 28.87% |
DCGAN + L2-SVM | 22.48% |

Model | Modes: 1 | Modes: 2 | Modes: 3 | Modes: 5 | Modes: 7 | Modes: 10 |
---|---|---|---|---|---|---|
Vanilla GAN | 0.97 (0.9; 1.0) | 0.88 (0.4; 1.0) | 0.63 (0.5; 1.0) | 0.72 (0.5; 0.8) | 0.58 (0.4; 0.8) | 0.59 (0.2; 0.7) |
Best of T (T = 3) | 0.99 (1.0; 1.0) | 0.96 (0.9; 1.0) | 0.91 (0.7; 1.0) | 0.80 (0.7; 0.9) | 0.84 (0.7; 0.9) | 0.70 (0.6; 0.8) |
Best of T (T = 10) | 0.99 (1.0; 1.0) | 0.99 (1.0; 1.0) | 0.98 (0.8; 1.0) | 0.80 (0.8; 0.9) | 0.87 (0.8; 0.9) | 0.71 (0.7; 0.8) |
Ensemble (T = 3) | 0.99 (1.0; 1.0) | 0.98 (0.9; 1.0) | 0.93 (0.8; 1.0) | 0.78 (0.6; 1.0) | 0.85 (0.6; 1.0) | 0.80 (0.6; 1.0) |
Ensemble (T = 10) | 1.00 (1.0; 1.0) | 0.99 (1.0; 1.0) | 1.00 (1.0; 1.0) | 0.91 (0.8; 1.0) | 0.88 (0.8; 1.0) | 0.89 (0.7; 1.0) |
TopKLast0.5 (T = 3) | 0.98 (0.9; 1.0) | 0.98 (0.9; 1.0) | 0.95 (0.9; 1.0) | 0.95 (0.8; 1.0) | 0.86 (0.7; 1.0) | 0.86 (0.6; 0.9) |
TopKLast0.5 (T = 10) | 0.99 (1.0; 1.0) | 0.98 (0.9; 1.0) | 0.98 (1.0; 1.0) | 0.99 (0.8; 1.0) | 0.99 (0.8; 1.0) | 1.00 (0.8; 1.0) |
Boosted (T = 3) | 0.99 (1.0; 1.0) | 0.99 (0.9; 1.0) | 0.98 (0.9; 1.0) | 0.91 (0.8; 1.0) | 0.91 (0.8; 1.0) | 0.86 (0.7; 1.0) |
Boosted (T = 10) | 1.00 (1.0; 1.0) | 1.00 (1.0; 1.0) | 1.00 (1.0; 1.0) | 1.00 (1.0; 1.0) | 1.00 (1.0; 1.0) | 1.00 (1.0; 1.0) |
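The coverage numbers above (with min-max ranges in parentheses) measure how much of the target mixture each boosted model reproduces. As a rough illustration only (not AdaGAN's exact probability-mass metric), one can count which modes of a known mixture receive at least one generated sample; the function below, its 1-D setting, and the hit radius are all simplifying assumptions:

```python
import numpy as np

def mode_coverage(samples, mode_centers, radius=1.0):
    """Illustrative metric: fraction of target modes that receive
    at least one generated sample within `radius` (1-D case).
    A simplification of the covered-mass metric in the table."""
    dists = np.abs(samples[:, None] - mode_centers[None, :])
    hit = (dists <= radius).any(axis=0)  # per-mode: any sample close?
    return float(hit.mean())

modes = np.array([0.0, 5.0, 10.0])
samples = np.array([0.1, 4.8, 4.9])  # the mode at 10 is dropped
# mode_coverage(samples, modes) == 2/3
```

A collapsed generator concentrates samples near a few modes, so its coverage stays low even as sample quality looks good, which is exactly the failure mode boosting is designed to repair.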

Method (Unsupervised) | Score |
---|---|
Real Data | 11.24 |
BEGAN | 5.62 |
ALI | 5.34 |
MIX + WGAN | 4.04 |
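The unsupervised scores above are Inception scores, IS = exp(E_x[KL(p(y|x) ∥ p(y))]): confident per-image class predictions and a diverse marginal both push the score up. A minimal sketch, assuming a matrix of per-image class probabilities from a pretrained classifier (the real Inception score uses the Inception network's outputs):

```python
import numpy as np

def inception_score(p_yx, eps=1e-12):
    """IS = exp( mean_x KL(p(y|x) || p(y)) ), computed from a
    matrix of per-image class probabilities (rows sum to 1)."""
    p_y = p_yx.mean(axis=0, keepdims=True)  # marginal label distribution
    kl = (p_yx * (np.log(p_yx + eps) - np.log(p_y + eps))).sum(axis=1)
    return float(np.exp(kl.mean()))

# Perfectly confident and perfectly diverse predictions give the
# maximum score, equal to the number of classes:
p = np.eye(4)  # 4 images, each assigned a different class
```

The score is bounded above by the number of classes, which is why real CIFAR-10 data scores around 11 on a 1000-class Inception model only after sampling effects are accounted for.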

**Table 9.** Results of the subjective comparison of CAN against art created by human artists: the Abstract Expressionist set, the Art Basel 2016 set, and the two combined.

Painting Set | Question 1 (std) Intentionality | Question 2 (std) Structure | Question 3 (std) Communication | Question 4 (std) Inspiration |
---|---|---|---|---|
CAN | 3.3 (0.47) | 3.2 (0.47) | 2.7 (0.46) | 2.5 (0.41) |
Abstract Expressionist | 2.8 (0.43) | 2.6 (0.35) | 2.4 (0.41) | 2.3 (0.27) |
Art Basel 2016 | 2.5 (0.72) | 2.4 (0.64) | 2.1 (0.59) | 1.9 (0.54) |
Artist sets combined | 2.7 (0.6) | 2.5 (0.52) | 2.2 (0.54) | 2.1 (0.45) |

**Table 10.** Average error measured for image reconstruction per dataset [70].

Method | Shoes | Church Outdoor | Outdoor Natural | Handbags | Shirts |
---|---|---|---|---|---|
Optimization-based | 0.155 | 0.319 | 0.176 | 0.299 | 0.284 |
Network-based | 0.210 | 0.338 | 0.198 | 0.302 | 0.265 |
Hybrid (Zhu et al.) | 0.140 | 0.250 | 0.145 | 0.242 | 0.184 |

**Table 11.** FCN-scores for different generator architectures [71].

Loss | Per-pixel acc. | Per-class acc. | Class IOU |
---|---|---|---|
Encoder-decoder (L2) | 0.35 | 0.12 | 0.08 |
Encoder-decoder (L1+cGAN) | 0.29 | 0.09 | 0.05 |
U-net (L1) | 0.48 | 0.18 | 0.13 |
U-net (L1+cGAN) [71] | 0.55 | 0.20 | 0.14 |

**Table 12.** The results of various metrics on the original data, the data generated by the benchmark model, and the data generated by SEGAN [37].

Metric | Original Noisy Signal | Wiener Enhancement | SEGAN Enhancement |
---|---|---|---|
PESQ | 1.97 | 2.22 | 2.16 |
CSIG | 3.35 | 3.23 | 3.48 |
CBAK | 2.44 | 2.68 | 2.94 |
COVL | 2.63 | 2.67 | 2.80 |
SSNR | 1.68 | 5.07 | 7.73 |
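Among these metrics, SSNR (segmental SNR) is simple enough to sketch directly: the signal is cut into short frames, the SNR of each frame is computed against the clean reference, clipped to a conventional range, and averaged. A minimal numpy sketch; the 256-sample frame length and the [-10, 35] dB clipping range are common defaults, not necessarily those of [37]:

```python
import numpy as np

def segmental_snr(clean, enhanced, frame_len=256, floor=-10.0, ceil=35.0):
    """Segmental SNR in dB: per-frame SNR against the clean
    reference, clipped to [floor, ceil], averaged over frames."""
    n_frames = len(clean) // frame_len
    snrs = []
    for i in range(n_frames):
        s = clean[i * frame_len:(i + 1) * frame_len]
        e = enhanced[i * frame_len:(i + 1) * frame_len]
        num = np.sum(s ** 2)
        den = np.sum((s - e) ** 2) + 1e-12  # avoid division by zero
        snrs.append(np.clip(10.0 * np.log10(num / den + 1e-12), floor, ceil))
    return float(np.mean(snrs))
```

Unlike PESQ or the composite CSIG/CBAK/COVL measures, SSNR is a purely waveform-level criterion, which explains why SEGAN's large SSNR gain does not translate into a proportional PESQ gain in Table 12.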

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Bhandari, A.; Tripathy, B.; Adate, A.; Saxena, R.; Gadekallu, T.R.
From Beginning to BEGANing: Role of Adversarial Learning in Reshaping Generative Models. *Electronics* **2023**, *12*, 155.
https://doi.org/10.3390/electronics12010155
