Review

Deep Learning Approaches for Data Augmentation in Medical Imaging: A Review

Université Rouen Normandie, INSA Rouen Normandie, Université Le Havre Normandie, Normandie Univ, LITIS UR 4108, F-76000 Rouen, France
*
Author to whom correspondence should be addressed.
J. Imaging 2023, 9(4), 81; https://doi.org/10.3390/jimaging9040081
Submission received: 13 March 2023 / Revised: 31 March 2023 / Accepted: 7 April 2023 / Published: 13 April 2023
(This article belongs to the Topic Medical Image Analysis)

Abstract

Deep learning has become a popular tool for medical image analysis, but the limited availability of training data remains a major challenge, particularly in the medical field where data acquisition can be costly and subject to privacy regulations. Data augmentation techniques offer a solution by artificially increasing the number of training samples, but these techniques often produce limited and unconvincing results. To address this issue, a growing number of studies have proposed the use of deep generative models to generate more realistic and diverse data that conform to the true distribution of the data. In this review, we focus on three types of deep generative models for medical image augmentation: variational autoencoders, generative adversarial networks, and diffusion models. We provide an overview of the current state of the art in each of these models and discuss their potential for use in different downstream tasks in medical imaging, including classification, segmentation, and cross-modal translation. We also evaluate the strengths and limitations of each model and suggest directions for future research in this field. Our goal is to provide a comprehensive review about the use of deep generative models for medical image augmentation and to highlight the potential of these models for improving the performance of deep learning algorithms in medical image analysis.

1. Introduction

In recent years, advances in deep learning have been remarkable in many fields, including medical imaging. Deep learning is used to solve a wide variety of tasks such as classification [1,2], segmentation [3], and detection [4] using different types of medical imaging modalities, for instance, magnetic resonance imaging (MRI) [5], computed tomography (CT) [6], and positron emission tomography (PET) [7]. Most of these modalities produce very high-dimensional data, and the number of training samples is often limited in the medical domain (e.g., the rarity of certain diseases). As deep learning algorithms rely on large amounts of data, running such applications in a low-sample-size regime can be very challenging. Data augmentation can increase the size of the training set by artificially synthesizing new samples. It is a very popular technique in computer vision [8] and has become inseparable from deep learning applications when rich training sets are not available. Data generation is also used in the case of missing modalities for multimodal image segmentation [9]. As a result, the model can be trained to generalize better and to avoid overfitting. In addition, some deep learning frameworks, including PyTorch [10], allow for on-the-fly data augmentation during training, rather than physically expanding the training dataset. Basic data augmentation operations include random rotations, cropping, flipping, or noise injection. However, these simple operations are not sufficient when dealing with complex data such as medical images.
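As an illustration, the following minimal sketch shows how such basic operations (rotation, cropping, flipping, and noise injection) can be applied on the fly with PyTorch/torchvision; the transform parameters and output size are arbitrary choices for the example and are not taken from any of the reviewed studies.

```python
# A minimal sketch of on-the-fly augmentation with PyTorch/torchvision, combining the
# basic operations listed above (rotation, cropping, flipping, noise injection). The
# transform parameters and output size are arbitrary choices for the example.
import torch
from torchvision import transforms

basic_augmentation = transforms.Compose([
    transforms.RandomRotation(degrees=10),                          # random rotation
    transforms.RandomResizedCrop(224, scale=(0.9, 1.0)),            # random crop + resize
    transforms.RandomHorizontalFlip(p=0.5),                         # random flip
    transforms.ToTensor(),
    transforms.Lambda(lambda x: x + 0.01 * torch.randn_like(x)),    # Gaussian noise injection
])
```

Passed as the transform argument of a torchvision dataset, such a pipeline re-augments each sample every time it is drawn, so the stored training set is never physically expanded.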
Several studies have been conducted to propose data augmentation schemes more suitable for the medical domain. The ultimate goal would be to reproduce a data distribution as close as possible to the real data, such that it is impossible, or at least difficult, to distinguish the newly sampled data from the real data. Recent performance improvements in deep generative models have made them particularly attractive for data augmentation. For example, generative adversarial networks (GANs) [11] have demonstrated their ability to generate realistic images. As a result, this architecture has been widely used in the medical field [12,13] and has been included in several data augmentation reviews [14,15,16]. Nevertheless, GANs also have their drawbacks, such as learning instability, difficulty in converging, and suffering from mode collapse [17], which is a state where the generator produces only a few samples. Variational autoencoders (VAEs) [18] are another type of deep generative model that has received less attention in data augmentation. VAEs outperform GANs in terms of output diversity and are free of mode collapse. However, the major problem is their tendency to often produce blurry and hazy output images. This undesirable effect is due to the regularization term in the loss function. Recently, a new type of deep generative model called diffusion models (DMs) [19,20] has emerged and promises remarkable results with a great ability to generate realistic and diverse outputs. However, DMs are still in their infancy and are not yet well established in the medical field, but are expected to be a promising alternative to previous generative models. One of the drawbacks of DMs is their high computational cost and huge sampling time.
Different approaches have been proposed to solve this generative learning trilemma of quality sampling, fast sampling, and diversity [21]. In this paper, we review the state of the art of deep learning architectures for data augmentation, focusing on three types of deep generative models for medical image augmentation: VAEs, GANs, and DMs. To provide an accurate review, we harvested a large number of publications via the PubMed and Google Scholar search engines. We selected only publications dating from 2017 onwards using various keywords related to data augmentation in medical imaging. Following this, a second manual filtering was performed to eliminate all cases of false positives (publications not related to the medical field and/or data augmentation). In total, 72 publications were retained, mainly from journals such as IEEE Transactions on Medical Imaging or Medical Image Analysis and conferences such as Medical Image Computing and Computer Assisted Intervention and IEEE International Symposium on Biomedical Imaging. Some publications will be described in more detail in Section 3; these have been selected according to two criteria: date of publication and number of citations. Nevertheless, all the articles are listed in descriptive tables, along with other information such as the datasets used for training. These papers were organized according to the deep generative model employed and the main downstream tasks targeted by the generated data (i.e., classification, segmentation, and cross-modal translation). Given the dominance of GANs for data augmentation in the medical imaging domain, the objective of this article is to highlight other generative models. To the best of our knowledge, this is the first review article that compares different deep generative models for data augmentation in medical imaging and does not focus exclusively on GANs [14,15,16] nor on traditional data augmentation methods [22,23]. The quantitative ratio of GAN-based articles to the rest of the deep generative models is very unbalanced; nevertheless, we try to bring some equilibrium to this ratio in the hope that an unbiased comparative study following this paper may be possible in the future. To further illustrate our findings, we present a graphical representation of the selected publications in Figure 1. This figure provides a comprehensive overview of the number of publications per year, per modality, and per downstream task. By analyzing these graphics, we can observe the trends and the preferences of the scientific community in terms of the use of deep generative models for data augmentation in medical imaging.
This article is organized as follows: Section 2 presents a brief theoretical view of the above deep generative models. Section 3 reviews deep generative models for medical imaging data augmentation, grouped by the targeted application. Section 4 discusses the advantages and disadvantages of each architecture and proposes a direction for future research. Finally, Section 5 concludes the paper.

2. Background

The main goal of deep generative models is to learn the underlying distribution of the data and to generate new samples that are similar to the real data. Our deep generative model can be represented as a function g : z ↦ x that maps a low-dimensional latent vector z ∈ ℝ^d to a high-dimensional data point x ∈ ℝ^D, with d ≪ D. The latent variable z is a realization of a random vector that is sampled from a prior distribution p(z). The data point x is another realization sampled from the data distribution p(x). The goal of the deep generative model is to learn the mapping function g such that the generated data g(z) are similar to the real data x associated with z. Each deep generative model proposes its own approach to learn the mapping function g. In this section we present a brief overview of the most popular deep generative models. Figure 2 provides a visual representation of their respective architectures.

2.1. Generative Adversarial Networks

GAN [11] is a class of deep generative models composed of two separate networks: a generator and a discriminator. The generator can be seen as a mapping function G from a random latent vector z to a data point x, where z is sampled from a fixed prior distribution p(z), commonly modelled as a Gaussian distribution. The discriminator D is a binary classifier that takes a data point x as input and outputs the probability D(x) that x is a real data point. During the training process, the generator G is trained to replicate data points x_g so that the discriminator cannot distinguish between real data points x_r and the generated data points x_g. On the other hand, the discriminator D is trained to differentiate the fake from the real data points. These two networks are trained simultaneously in an adversarial manner, hence the name generative adversarial network. The loss functions of G and D can be expressed as follows:
$$\mathcal{L}_G = \min_{\theta} \, -\mathbb{E}_{z \sim p(z)}\left[\log D_{\phi}(G_{\theta}(z))\right], \qquad \mathcal{L}_D = \max_{\phi} \; \mathbb{E}_{x \sim p(x)}\left[\log D_{\phi}(x)\right] + \mathbb{E}_{z \sim p(z)}\left[\log\left(1 - D_{\phi}(G_{\theta}(z))\right)\right]$$
where θ and ϕ are the corresponding learnable parameters for the generator and discriminator neural networks, respectively.
This adversarial learning has proven to be effective in capturing the underlying distribution of the real data p(x). It is inspired by game theory and can be seen as a minimax game between the generator and the discriminator. It is ultimately desirable to reach a Nash equilibrium where both the generator and discriminator are equally effective at their tasks. The overall loss function can be summarized as follows:
$$\mathcal{L}_{GAN} = \min_{\theta} \max_{\phi} \; \mathbb{E}_{x \sim p(x)}\left[\log D_{\phi}(x)\right] + \mathbb{E}_{z \sim p(z)}\left[\log\left(1 - D_{\phi}(G_{\theta}(z))\right)\right]$$
Once trained, new data points can be synthesized by sampling a random latent vector z from the prior distribution p ( z ) and feeding it to the generator.
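As a concrete illustration of the objectives above, the following minimal PyTorch sketch performs one adversarial training step with the non-saturating generator loss; the generator, discriminator (assumed to output probabilities), optimizers, and latent dimension are hypothetical placeholders rather than an architecture from the reviewed studies.

```python
# A minimal sketch of one adversarial training step with the non-saturating generator
# loss. `generator`, `discriminator` (assumed to end in a sigmoid), the optimizers,
# and `latent_dim` are hypothetical placeholders.
import torch
import torch.nn.functional as F

def gan_training_step(generator, discriminator, g_opt, d_opt, real_x, latent_dim=128):
    batch = real_x.size(0)

    # Discriminator update: maximize log D(x) + log(1 - D(G(z)))
    z = torch.randn(batch, latent_dim)
    fake_x = generator(z).detach()                # block gradients into the generator
    d_real = discriminator(real_x)
    d_fake = discriminator(fake_x)
    d_loss = F.binary_cross_entropy(d_real, torch.ones_like(d_real)) \
           + F.binary_cross_entropy(d_fake, torch.zeros_like(d_fake))
    d_opt.zero_grad()
    d_loss.backward()
    d_opt.step()

    # Generator update: minimize -log D(G(z)) (non-saturating form of L_G)
    z = torch.randn(batch, latent_dim)
    g_fake = discriminator(generator(z))
    g_loss = F.binary_cross_entropy(g_fake, torch.ones_like(g_fake))
    g_opt.zero_grad()
    g_loss.backward()
    g_opt.step()
    return d_loss.item(), g_loss.item()
```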

2.2. Variational Autoencoders

Variational inference is a Bayesian inference technique that allows us to approximate the posterior distribution p(z|x) with a simpler distribution q(z|x). The aim of variational inference is to minimize a divergence between the posterior distribution p_θ(z|x) and the variational distribution q_ϕ(z|x), where θ and ϕ are the posterior and variational distribution parameters, respectively; the Kullback–Leibler divergence is the most commonly used choice. The resulting objective is defined as follows:
$$\min_{\theta,\phi} D_{KL}\left(q_{\phi}(z|x) \,\|\, p_{\theta}(z|x)\right) = \min_{\theta,\phi} \mathbb{E}_{z \sim q_{\phi}}\left[\log \frac{q_{\phi}(z|x)}{p_{\theta}(z|x)}\right]$$
With further simplifications, and applying Jensen’s inequality, we can rewrite the above equation as:
$$\log p_{\theta}(x) = -\mathbb{E}_{z \sim q_{\phi}}\left[\log q_{\phi}(z|x)\right] + \mathbb{E}_{z \sim q_{\phi}}\left[\log p_{\theta}(z,x)\right] + D_{KL}\left(q_{\phi}(z|x) \,\|\, p_{\theta}(z|x)\right)$$
$$\log p_{\theta}(x) \geq -\mathbb{E}_{z \sim q_{\phi}}\left[\log q_{\phi}(z|x)\right] + \mathbb{E}_{z \sim q_{\phi}}\left[\log p_{\theta}(z,x)\right] = \mathbb{E}_{z \sim q_{\phi}}\left[\log p_{\theta}(x|z)\right] - \mathbb{E}_{z \sim q_{\phi}}\left[\log \frac{q_{\phi}(z|x)}{p(z)}\right] = \mathrm{ELBO}$$
where log p_θ(x) is the marginal log-likelihood of the data x, p(z) is the prior distribution of the latent variable z, generally modeled as a Gaussian distribution, and ELBO is the evidence lower bound. The variational distribution q_ϕ(z|x) can be learned by minimizing D_KL(q_ϕ(z|x) || p_θ(z|x)), which is equivalent to maximizing the ELBO given a fixed θ. This ELBO term can be further decomposed into two terms: the reconstruction term and the regularization term. The reconstruction term measures the difference between the input data and its reconstruction, and it is typically calculated using binary cross-entropy loss. The regularization term ensures that the latent variables follow a desired distribution, such as a normal distribution, and it is calculated using the Kullback–Leibler divergence between the latent distribution and the desired distribution. Together, these two terms form the ELBO loss function, which is used to train the VAE model. The VAE is composed of an encoder q_ϕ(z|x) and a decoder p_θ(x|z). The encoder q_ϕ(z|x) is a neural network that maps the data x to the latent variable z. The decoder p_θ(x|z) is a neural network that maps the latent variable z to the data x. The VAE is trained by minimizing the reconstruction and regularization terms given below:
$$\mathcal{L}_{rec} = -\mathbb{E}_{q_{\phi}(z|x)}\left[\log p_{\theta}(x|z)\right], \qquad \mathcal{L}_{reg} = \mathbb{E}_{q_{\phi}(z|x)}\left[\log \frac{q_{\phi}(z|x)}{p(z)}\right]$$
Once trained, new data points can be synthesized by sampling a random latent vector z from the prior distribution p(z) and feeding it to the decoder. In other words, the decoder represents the generative model.
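To make the two terms concrete, the following minimal PyTorch sketch computes the negative ELBO for a Gaussian encoder and a Bernoulli decoder; the encoder (returning the mean and log-variance of q_ϕ(z|x)) and the decoder (ending in a sigmoid) are hypothetical placeholder modules.

```python
# A minimal sketch of the VAE training objective (negative ELBO) with a Gaussian
# encoder and a Bernoulli decoder. `encoder` and `decoder` are hypothetical modules.
import torch
import torch.nn.functional as F

def vae_loss(encoder, decoder, x):
    mu, logvar = encoder(x)                                       # parameters of q_phi(z|x)
    z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)       # reparameterization trick
    x_hat = decoder(z)

    # Reconstruction term: -E_q[log p_theta(x|z)], here a binary cross-entropy
    rec = F.binary_cross_entropy(x_hat, x, reduction="sum")

    # Regularization term: KL(q_phi(z|x) || N(0, I)), in closed form for Gaussians
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())

    return rec + kl                                               # negative ELBO
```

At generation time, one simply draws z from the prior N(0, I) and calls the decoder, as described above.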

2.3. Diffusion Probabilistic Models

Diffusion models [19,20] are a class of generative models that are based on the diffusion process. The diffusion process is a stochastic process that can be seen as a parameterized Markov chain. Each transition in the chain gradually adds Gaussian noise to an initial data point x_0 drawn from a distribution q(x). The diffusion process can be expressed as follows:
$$q(x_t \mid x_{t-1}) = \mathcal{N}\left(\sqrt{\alpha_t}\, x_{t-1}, \beta_t \mathbf{I}\right), \qquad q(x_{1:T} \mid x_0) = \prod_{t=1}^{T} q(x_t \mid x_{t-1})$$
where β_t ∈ [0, 1], t = 1, …, T, is the predefined noise variance at step t, α_t = 1 − β_t, and T is the total number of steps. The diffusion model is trained to reverse the diffusion process, starting with a noise input x_T ∼ N(0, I) and reconstructing the initial data point x_0. This denoising process can be seen as a generative model. The reverse diffusion process can be expressed as follows:
$$p_{\theta}(x_{0:T}) = p(x_T)\prod_{t=1}^{T} p_{\theta}(x_{t-1} \mid x_t), \qquad p_{\theta}(x_{t-1} \mid x_t) = \mathcal{N}\left(\mu(x_t, t), \Sigma(x_t, t)\right)$$
where μ(x_t, t) and Σ(x_t, t) are the mean and the variance of the denoising model at step t. Similarly to the VAE, diffusion models learn to recreate the true sample at each step by maximizing the evidence lower bound (ELBO), matching the true denoising distribution q(x_{t-1} | x_t) and the learned denoising distribution p_θ(x_{t-1} | x_t). By the end of the training, the diffusion model will be able to map a noise input x_T to the initial data point x_0 through reverse diffusion; hence, new data points can be synthesized by sampling a random noise vector x_T from the prior distribution N(0, I) and feeding it to the model.
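For illustration, the sketch below implements the forward noising in its closed form q(x_t | x_0) = N(√(ᾱ_t) x_0, (1 − ᾱ_t)I), where ᾱ_t is the cumulative product of the α_t, together with the simplified noise-prediction objective commonly used to train DDPMs; the noise-prediction network eps_model and the linear β schedule are illustrative assumptions.

```python
# A minimal sketch of DDPM training: the forward process is applied in its closed form
# q(x_t | x_0) = N(sqrt(alpha_bar_t) x_0, (1 - alpha_bar_t) I), and the network is
# trained to predict the injected noise. `eps_model` and the linear beta schedule are
# illustrative assumptions.
import torch
import torch.nn.functional as F

T = 1000
betas = torch.linspace(1e-4, 0.02, T)            # predefined noise variances beta_t
alphas = 1.0 - betas                             # alpha_t = 1 - beta_t
alpha_bars = torch.cumprod(alphas, dim=0)        # cumulative products alpha_bar_t

def diffusion_loss(eps_model, x0):
    """x0: a batch of images shaped (B, C, H, W)."""
    t = torch.randint(0, T, (x0.size(0),))       # a random timestep for each sample
    eps = torch.randn_like(x0)                   # Gaussian noise to inject
    a_bar = alpha_bars[t].view(-1, 1, 1, 1)
    x_t = torch.sqrt(a_bar) * x0 + torch.sqrt(1.0 - a_bar) * eps   # noised sample x_t
    return F.mse_loss(eps_model(x_t, t), eps)    # learn to predict the noise
```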

2.4. Exploring the Trade-Offs in Deep Generative Models: The Generative Learning Trilemma

2.4.1. Generative Adversarial Networks

The design and training of VAEs, GANs, and DMs is often subject to trade-offs between fast sampling, high-quality samples, and mode coverage, known as the generative learning trilemma [21]. Among these models, GANs have received particular attention due to their ability to generate realistic images and are the first deep generative models to be extensively used for medical image augmentation. They are known for their ability to generate high-quality samples that are difficult to distinguish from real data. However, they may suffer from mode collapse, a phenomenon where the model only generates samples from a limited number of modes or patterns in the data distribution, potentially leading to poor coverage of the data distribution and a lack of diversity in the generated samples. To address mode collapse, several variations of GAN have been proposed. One popular approach is the Wasserstein GAN (WGAN) [24], which replaces the Jensen–Shannon divergence used in the original GAN with the Wasserstein distance, a metric that measures the distance between two probability distributions. This has the benefit of improving the quality of the generated samples. Another widely used extension is the conditional GAN (CGAN) [25], which adds a conditioning variable y to the latent vector z in the generator, allowing for more control over the generated samples and partially mitigating mode collapse. The CGAN can be seen as a generative model that can generate data points x conditioned on y and models the joint distribution p ( x , y ) . A GAN with a conditional generator has been introduced by Isola et al. [26] to learn to translate images from one domain to another by replacing the traditional noise-to-image generator with a U-Net [27]. The adversarial learning process allows the U-Net to generate more realistic images based on a better understanding of the underlying data distribution.
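To illustrate the conditioning mechanism of a CGAN, the following sketch concatenates a learned label embedding to the latent vector z before generation; the fully connected architecture, dimensions, and class count are arbitrary choices for the example.

```python
# A minimal sketch of CGAN conditioning: a learned label embedding is concatenated to
# the latent vector z before generation. Layers, dimensions, and the number of classes
# are arbitrary choices for the example.
import torch
import torch.nn as nn

class ConditionalGenerator(nn.Module):
    def __init__(self, latent_dim=128, n_classes=3, embed_dim=32, out_dim=64 * 64):
        super().__init__()
        self.embed = nn.Embedding(n_classes, embed_dim)   # class label -> embedding
        self.net = nn.Sequential(
            nn.Linear(latent_dim + embed_dim, 512),
            nn.ReLU(),
            nn.Linear(512, out_dim),
            nn.Tanh(),
        )

    def forward(self, z, y):
        zy = torch.cat([z, self.embed(y)], dim=1)         # condition the generation on y
        return self.net(zy)

# Sampling eight images of class 2, for instance:
# g = ConditionalGenerator()
# x = g(torch.randn(8, 128), torch.full((8,), 2))
```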
Other variations of the GAN include deep convolutional GAN (DCGAN) [28], progressive growing GAN (PGGAN) [29], CycleGAN [30], auxiliary classifier GAN (ACGAN) [31], VAE-GAN [32], and many others, which have been proposed to address various issues such as training stability, scalability, and quality of the generated samples. While these variants have achieved good results in a variety of tasks, they also come with their own set of trade-offs. Despite these limitations, GANs are generally fast at generating new images, making them a good choice for data augmentation when well-trained. As an example, Figure 3 showcases the capacity of a CycleGAN to generate realistic synthetic medical images.

2.4.2. Variational Autoencoders

VAEs are another type of deep generative model that has gained popularity for their ease of training and good coverage of the data distribution. Unlike GANs, VAEs are trained to maximize the likelihood of the data rather than adversarially, making them a good choice for tasks that require fast sampling and good coverage of the data distribution. Using variational inference methods, VAEs are able to better approximate the real data distribution given a random noise vector, thus making them less vulnerable to mode collapse. Moreover, VAEs enable the extraction of relevant features and can learn a smooth latent representation of the data, which allows for the interpolation of points in the latent space, providing more control over the generated samples [33].
VAEs have not been as commonly used for data augmentation compared to GANs due to the blurry and hazy nature of the generated samples. However, several proposals, such as inverse autoregressive flow [34], InfoVAE [35], or VQ-VAE2 [36], have been made to improve the quality of VAE-generated samples as well as the variational aspect of the model. Despite this, most of these extensions have not yet been applied to medical image augmentation. A more effective approach to addressing the limitations of VAEs in this context is to utilize a hybrid model called a VAE-GAN, which combines the strengths of both VAEs and GANs to generate high-quality, diverse, and realistic synthetic samples. While VAE-GANs cannot fully fix the low-quality generation of VAEs, they do partially address this issue by incorporating the adversarial training objective of GANs, which allows for the improvement of visual quality and sharpness of the generated samples while still preserving the ability of VAEs to learn a compact latent representation of the data. In addition to VAE-GANs, another common architecture for medical image augmentation is the use of conditional VAEs (CVAEs), which allows for the control of the output samples by conditioning the generation process on additional information, such as class labels or attributes. This can be particularly useful in medical imaging, as it allows for the generation of synthetic samples that are representative of specific subgroups or conditions within the data. By using conditional VAEs, it is possible to generate synthetic samples that are more targeted and relevant to specific tasks or analyses. In summary, VAEs, VAE-GANs, and conditional VAEs are all viable approaches for medical image augmentation, each offering different benefits and trade-offs in terms of diversity, quality, and fidelity of the generated samples.
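As a small illustration of the latent-space interpolation mentioned above, the sketch below decodes samples along the straight line between the encodings of two images; the trained encoder and decoder follow the same hypothetical conventions as the earlier VAE sketch.

```python
# A small sketch of latent-space interpolation with a trained VAE: samples are decoded
# along the straight line between the encodings of two images. `encoder` and `decoder`
# are hypothetical trained modules (the encoder returns a mean and log-variance).
import torch

@torch.no_grad()
def interpolate(encoder, decoder, x_a, x_b, steps=8):
    mu_a, _ = encoder(x_a)
    mu_b, _ = encoder(x_b)
    outputs = []
    for alpha in torch.linspace(0.0, 1.0, steps):
        z = (1 - alpha) * mu_a + alpha * mu_b     # linear interpolation in latent space
        outputs.append(decoder(z))
    return torch.stack(outputs)
```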

2.4.3. Diffusion Models

There has been a recent surge in the use of DMs for image synthesis in the academic literature due to their superior performance in generating high-quality and realistic synthesized images compared to other deep generative models such as VAEs and GANs [37]. This success can be attributed to the way in which DMs model the data distribution by approximating it using a series of simple distributions combined through the diffusion process, allowing them to capture complex, high-dimensional distributions and generate samples that are highly representative of the underlying data. This is especially useful for synthesizing images as natural images often have a wide range of textures, colors, and other visual features that can be difficult to model using simpler parametric models. This can also be applied to medical imaging where data tends to be complex. However, DMs can also have some limitations, such as being computationally intensive to solve, especially for large or complex systems, and requiring a significant amount of data to be accurately calibrated. In addition, DMs have a long sampling time compared to other deep generative models such as VAEs and GANs due to the high number of steps in the reverse diffusion process (ranging from several hundreds to thousands). This issue is compounded when the model is being used in real-time applications or when it is necessary to generate large numbers of samples. As a result, researchers have proposed several solutions and variants of diffusion models that aim to improve the sampling speed while maintaining high-quality and diverse samples. These include strategies such as progressive distillation [38]. This method involves distilling a trained deterministic diffusion sampler, using many steps, into a new diffusion model that takes half as many sampling steps. Another way to improve the sampling time is the use of improved variants such as Fast Diffusion Probabilistic Model (FastDPM) [39], which uses a modified optimization algorithm to reduce the sampling time and introduces a concept of continuous diffusion process, or with non-Markovian diffusion models such as Denoising Diffusion Implicit Model (DDIM) [40]. Similarly to VAE-GAN, ref. [21] proposes the denoising diffusion GAN, which is a hybrid architecture between DMs and multimodal conditional GANs [25], which have been shown to produce high-quality and diverse samples at a much faster sampling speed compared to the original diffusion models (factor of ×2000). Overall, while diffusion models have demonstrated great potential in the field of image synthesis, their long sampling time remains a challenge that researchers are actively working to address.

3. Deep Generative Models for Medical Image Augmentation

Medical image processing and analysis using deep learning has developed rapidly in recent years, and it has been able to achieve state-of-the-art results in many tasks. However, the lack of data is still a major issue in this field. To address this, medical image augmentation has become a crucial task, and many studies have been conducted in this direction. In this section, we review the different deep generative models that have been proposed to generate synthetic medical images. This review is organized into three categories corresponding to each of the deep generative models. The publications are further classified according to the downstream task targeted by the generated images. We address here the most common tasks in medical imaging: classification, segmentation, and cross-modal image translation, which are summarized in the form of tables.

3.1. Generative Adversarial Networks

As part of their study, Han et al. [41] proposed the use of two variants of GANs for generating (2D) MRI sequences: a WGAN [24] and a DCGAN [28], in which combinations of convolutions and batch normalizations replace the fully-connected layers. The results of this study were presented in the form of a visual Turing test where an expert physician was asked to classify real and synthetic images. For all MRI sequences except FLAIR images, WGAN was significantly more successful at deceiving the physician than DCGAN (62% compared to 54%). The same authors further propose using PGGAN [29] combined with traditional data augmentation techniques such as geometric transformations. PGGAN is a GAN with a multi-stage training strategy that progressively increases the resolution of the generated images. The results indicate that combining PGGAN with traditionally augmented data can slightly improve the performance of the classifier when compared to using PGGAN alone.
Conditional synthesis is a technique that allows the generation of images conditioned on a specific variable y. This is particularly useful in medical imaging, where tasks such as segmentation or cross-modal translation are widespread. A variable y serves as the ground truth for the generated images and can be expressed in various ways, including class labels, segmentation maps, or translation maps. In this context, Frid-Adar et al. [42] propose to use an ACGAN [31] for synthesizing liver lesions in CT images. The ACGAN is a GAN with a discriminator conditioned on a class label. Three label classes were considered: cysts, metastases, and hemangiomas. Based solely on conventional data augmentation, the classification results produced a sensitivity of 78.6% and a specificity of 88.4%. By adding the synthetic data augmentation, the results increased to a sensitivity of 85.7% and a specificity of 92.4%. Guibas et al. [43] propose a two-stage pipeline for generating synthetic images of fundus photographs with associated blood vessel segmentation masks. In the first stage, synthetic segmentation masks are generated using DCGAN, and in the second stage, these synthetic masks are translated into photorealistic fundus images using CGAN. Comparing the Kullback–Leibler divergence between the real and synthetic images revealed no significant differences between the two distributions. In addition, the authors evaluated the generated images on a segmentation task using only synthetic images, showing an F1 score of 0.887 versus 0.898 when using real images. This negligible difference indicates the quality of the generated images. By the same token, Platscher et al. [44] propose using a two-step image translation approach to generate MRI images with ischemic stroke lesion masks. The first step consists of generating synthetic stroke lesion masks using a WGAN. The newly generated fake lesions are implanted on healthy brain anatomical segmentation masks. Finally, those segmentation masks are fed into a pretrained image-translation model that maps the mask into a real ischemic stroke MRI. The authors studied three different image translation models, CycleGAN [30], Pix2Pix [26], and SPADE [45], and reported that Pix2Pix was the most successful in terms of visual quality. A U-Net [27] was trained using both clinical and generated images and showed an improvement in the Dice score compared to the model trained only on clinical images (63.7% to 72.8%).
Regarding cross-modal translation, Yurt et al. [46] propose a multi-stream approach for generating missing or corrupted MRI contrasts from other high-quality ones using a GAN-based architecture. The generator is composed of multiple one-to-one streams and a joint many-to-one stream, which are designed to learn latent representations sensitive to unique and common features of the source, respectively. The complementary feature maps generated in the one-to-one streams and the shared feature maps generated in the many-to-one stream are combined with a fusion block and fed to a joint network that infers the final image. In their experiments, the authors compare their approach to other state-of-the-art translation GANs and show that the proposed method is more effective in terms of quantitative and radiological assessments. The synthesized images presented in this study demonstrate the effectiveness of deep learning approaches applied to data augmentation in medical imaging. Specifically, the study investigated two tasks: (a) T1-weighted image synthesis from T2- and PD-weighted images and (b) PD-weighted image synthesis from T1- and T2-weighted images. The results obtained from the proposed method outperformed other variants of GANs such as pGAN [47] and MM-GAN [48], highlighting its effectiveness for image synthesis in medical imaging.
In summary, the use of GANs for data augmentation has been demonstrated to be a successful approach. The studies discussed in this section have employed some of the most innovative and known GAN architectures in the medical field, including WGAN, DCGAN, and Pix2Pix, and have primarily focused on three tasks: classification, segmentation, and cross-modal translation. Custom-made GAN variants have also been proposed in the current state of the art (see Table 1), some of which could be explored further. Notably, conditional synthesis has proven to be particularly useful for tasks such as segmentation and cross-modal translation, as seen with the ACGAN and Pix2Pix, resulting in an improved classification performance. Additionally, two-stage pipeline approaches have been proposed for generating synthetic images conditioned on segmentation masks. To further illustrate the use of GANs for medical image augmentation, we present a summary of the relevant studies in Table 1. This table includes information about the dataset, imaging modality, and evaluation metrics used in each study, as well as the specific type of GAN architecture employed. A further discussion will be presented in Section 4.

3.2. Variational Autoencoders

Zhuang et al. [51] present an empirical evaluation of 3D functional MRI data augmentation using deep generative models such as VAEs and GANs. The results indicate that CVAE and conditional WGAN can produce diverse, high-quality brain images. A 3D convolutional neural network (CNN) was used to further evaluate the generated samples on the original and augmented data in a classification task, demonstrating an accuracy improvement of 3.17% when using CVAE-augmented data and 3.72% when using CWGAN-augmented data. Pesteie et al. [81] propose a revised variant of the CVAE, called the ICVAE, which separates the embedding space of the input data and the conditioning variables. This allows the generated image characteristics to be independent of the conditioning variables, resulting in a more diverse output. In contrast, the standard CVAE encodes the data and conditioning variables in a shared embedding space. The authors evaluate the ICVAE on classification and segmentation tasks using transverse ultrasound images of the spine and FLAIR MRI images of the brain, respectively. The results demonstrate an improvement of 8.0 ± 1.0% in classification accuracy and 4.5 ± 0.5% in the Dice score compared to the model trained on real images only. The ICVAE model is able to generate more realistic MRI images by encoding appearance features independently of the structures in its latent space. The authors demonstrate the generation of synthetic MRI and ultrasound images using the ICVAE architecture, which are conditioned on a tumor segmentation mask and a label indicating the center-line of the spine, respectively. The CVAE architecture is also shown for comparison. Chadebec et al. [82] introduce a novel Geometry-aware VAE for high-dimensional data augmentation in low-sample-size settings. This model combines Riemannian metric learning with normalizing flows to improve the expressiveness of the posterior distribution and learn meaningful latent representations of the data. Additionally, the authors propose a new non-prior sampling scheme based on Hamiltonian Monte Carlo, since the standard procedure utilizing the prior distribution is highly dependent upon the data, especially for small datasets. As a result, the generated samples are remarkably more realistic than those generated by a conventional VAE, and the model is more resilient to the lack of data. An evaluation of the synthetic data on a classification task shows an improvement in accuracy from 66.3% to 74.3% using 50 real + 5000 synthetic MRIs, compared to using only the original data. The original paper by Chadebec et al. [82] includes a challenge in which readers are invited to distinguish the real brain MRIs from fake ones.
Other studies suggest the use of VAEs to improve the segmentation task performance. Huo et al. [83] introduce a progressive VAE-based architecture (PAVAE) for generating synthetic brain lesions with associated segmentation masks. The authors propose a two-step pipeline where the first step consists in generating synthetic segmentation masks based on a conditional adversarial VAE. The CVAE is assisted by a “condition embedding block” that encodes high-level semantic information of the lesion into the feature space. The second step involves generating photorealistic lesion images conditioned on the lesion mask using “mask embedding blocks”, which encodes the lesion mask into the feature space during generation, similar to SPADE. The authors compare their approach to other state-of-the-art methods and show that PAVAE can produce more realistic synthetic lesions with associated segmentation masks. A segmentation network is trained using both real and synthetic lesions and shows an improvement in the Dice score compared to the model trained only on real images (66.69% to 74.18%).
In a recent paper, Yang et al. [78] propose a new model for cross-domain translation called conditional variational autoencoding GAN (CAE-ACGAN). CAE-ACGAN combines the advantages of both VAEs and GANs in a single end-to-end architecture. The integration of VAE and GAN, along with the implementation of an auxiliary discriminative classifier network, allows for a partial resolution of the challenges posed by image blurriness and mode collapse. Moreover, the VAE incorporates skip connections between the encoder and decoder, which enhances the quality of the images generated. In addition to translating 3D CT images into their corresponding MR, the CAE-ACGAN generates more realistic images as a result of its discriminator, which serves as a quality-assurance mechanism. Based on PSNR and SSIM scores, the CAE-ACGAN model showed a mild improvement over other state-of-the-art architectures, such as Pix2Pix and WGAN-GP [84].
Table 2 compiles a summary of the relevant studies using VAEs in medical data augmentation. In contrast to GANs, the number of studies employing VAEs for data augmentation in medical imaging is relatively low. However, almost half of these studies have utilized hybrid architectures, combining VAEs with adversarial learning. Interestingly, we observe that unlike GANs, there are not many VAE variants in medical imaging. Most commonly used VAE architectures are either conditional, such as vanilla CVAE and ICVAE, or hybrid architectures, such as IntroVAE, PAVAE, and ALVAE. Further discussion on the effectiveness of VAEs for medical image augmentation and the specific architectures utilized in previous studies will be presented in Section 4.

3.3. Diffusion Models

In their study, Pinaya et al. [97] introduce a new approach for generating high-resolution 3D MR images using a latent diffusion model (LDM) [98]. LDMs are a type of generative model that combine autoencoders and diffusion models to synthesize new data. The autoencoder component of the LDM compresses the input data into a lower-dimensional latent representation, while the diffusion model component generates new data samples based on this latent representation. The LDM in this work was trained on data from the UK Biobank dataset and conditioned on clinical variables such as age and sex. The authors compare the performance of their LDM to VAE-GAN [32] and LSGAN [99], using the Fréchet inception distance [100] as the evaluation metric. The results show that the LDM outperforms the other models, with an FID of 0.0076 compared to 0.1567 for VAE-GAN and 0.0231 for LSGAN (where a lower FID score indicates a better performance). Even when conditioned on specific variables, the synthetic MRIs generated by this model demonstrate its ability to produce diverse and realistic brain MRI samples based on the ventricular volume and brain volume. As a valuable contribution to the scientific community, the authors also created a dataset of 100,000 synthetic MRIs that was made openly available for further research.
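To clarify how the two components of an LDM interact, the following conceptual sketch runs a plain DDPM-style ancestral sampling loop in the autoencoder's latent space and then decodes the result; the denoiser, the decoder, and the β schedule are hypothetical placeholders, and a real LDM such as [98] additionally uses conditioning mechanisms and more elaborate samplers not shown here.

```python
# A conceptual sketch of latent diffusion sampling: a plain DDPM-style ancestral
# sampling loop runs in the autoencoder's latent space, and the final latent is decoded
# back to image space. `denoiser`, `decoder`, and the beta schedule are hypothetical.
import torch

@torch.no_grad()
def sample_latent_diffusion(denoiser, decoder, latent_shape, betas):
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)
    z = torch.randn(latent_shape)                         # z_T ~ N(0, I) in latent space
    for t in reversed(range(len(betas))):
        eps = denoiser(z, torch.tensor([t]))              # predicted noise at step t
        z = (z - betas[t] / torch.sqrt(1.0 - alpha_bars[t]) * eps) / torch.sqrt(alphas[t])
        if t > 0:
            z = z + torch.sqrt(betas[t]) * torch.randn_like(z)   # stochastic transition
    return decoder(z)                                     # map the clean latent to an image
```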
Fernandez et al. [101] introduce a generative model, named brainSPADE, for synthesizing labeled brain MRI images that can be used for training segmentation models. The model combines a diffusion model with a VAE-GAN, with the GAN component particularly utilizing SPADE normalization to incorporate the segmentation mask. The model consists of two components: a segmentation map generator and an image generator. The segmentation map generator is a VAE that takes as input a segmentation map, then encodes and builds a latent space from it. To focus on semantic information and disregard insignificant details, the latent code is then diffused and denoised using LDMs. This creates an efficient latent space that emphasizes meaningful information while filtering out noise and other unimportant details. A VAE decoder then generates an artificial segmentation map from this latent space. The image generator is a SPADE model that builds a style latent space from an arbitrary style and combines it with the artificial segmentation map to decode the final output image. The performance of the brainSPADE model is evaluated on a segmentation task using nnU-Net [102], and the results show that the model performs comparably when trained on synthetic data compared to when it is trained on real data, and that using a combination of both significantly improves the model’s performance.
Lyu and Wang [103] conducted a study that investigated the use of diffusion models for image translation in medical imaging, specifically the conversion of MRI to CT scans. In their study, the authors utilized two diffusion-based approaches: a conditional DDPM and a conditional score-based model that utilizes stochastic differential equations [104]. These methods involved conditioning the reverse process on T2-weighted MRI images. To evaluate the performance of these diffusion models in comparison to other methods (conditional WGAN and U-Net), the authors conducted experiments on the Gold Atlas male pelvis dataset [105] using three novel sampling methods and compared the results to those obtained using GAN- and CNN-based approaches. The results indicated that the diffusion models outperformed both the GAN- and CNN-based methods in terms of structural similarity index (SSIM) and peak signal-to-noise ratio (PSNR).
We present a summary of the relevant studies utilizing diffusion models for medical image augmentation in Table 3. This table includes details about the dataset, imaging modality, and evaluation metrics used in each study, as well as the specific diffusion model employed. Upon examining this table, we notice that all the studies included are relatively recent, with the earliest study dating back to 2022. This suggests that diffusion models have gained increasing attention in the field of medical image augmentation and synthesis in recent years. Additionally, we see that in 2022, diffusion models received more attention for these tasks compared to GANs and VAEs, highlighting their growing popularity and potential for use in various scenarios.

4. Key Findings and Implications

In this review, we focused on deep generative models applied to medical data augmentation, specifically VAEs, GANs, and diffusion models. These approaches each have their own strengths and limitations, as described by the generative learning trilemma [21], which states that it is generally difficult to achieve high-quality sampling, fast sampling, and mode coverage simultaneously. As illustrated in Figure 1a, the number of publications on data augmentation using VAEs increased by approximately 81% from 2017 to 2022, while the number using GANs has remained relatively stagnant. This trend may be due to the fact that most possible fields of research using GANs have already been explored, making it difficult to go beyond current methods using these architectures. However, we have also seen an increase in the use of more complex architectures combining multiple generative models [64,78], which have shown promising results in terms of both quality and mode coverage. On the other hand, the number of studies using diffusion models has drastically increased starting from 2022, and these models have shown particular potential for synthesizing high-quality images with good mode coverage [120].
Basic data augmentation operators such as Gaussian noise addition, cropping, and padding are commonly used to augment data and generate new images for training [8]. However, the complex structures of medical images, which encompass anatomical variation and irregular tumor shapes, may render these basic operations unsuitable: they can produce irrelevant images that disrupt the logical image structure [22], introduce unrealistic deformations, and generate aberrant data that adversely impact model performance. One basic data augmentation operator that is not well suited for medical images is image flipping, which can sometimes cause anatomical inconsistencies [121]. To overcome this issue, deformable augmentation techniques have been introduced, such as random displacement fields and spline interpolation, to augment the data in a more realistic way. These techniques have proved to be useful [22]; however, they are strongly dependent on the data and limited in some cases. Recent advances in deep learning have led to the development of generative models that can be trained to generate realistic images and simulate the underlying data distribution. These synthesized images are more faithful to the real data than those generated using traditional data augmentation techniques. They better preserve the coherence of the overall structure of medical images and offer greater variability, providing a more effective way to generate realistic and diverse data.
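As an example of the deformable techniques mentioned above, the following hedged sketch applies an elastic deformation built from a smoothed random displacement field with spline interpolation (in the spirit of the approaches referenced in [22]); the parameter values and the 2D grayscale input are illustrative assumptions.

```python
# A hedged sketch of a deformable augmentation: an elastic deformation built from a
# random displacement field smoothed with a Gaussian filter and resampled with spline
# interpolation. Parameters and the 2D grayscale input are illustrative assumptions.
import numpy as np
from scipy.ndimage import gaussian_filter, map_coordinates

def elastic_deform(image, alpha=30.0, sigma=4.0, seed=None):
    """image: a 2D array; alpha scales the displacement, sigma controls its smoothness."""
    rng = np.random.default_rng(seed)
    shape = image.shape
    # Random displacement fields, smoothed and scaled
    dy = gaussian_filter(rng.uniform(-1, 1, shape), sigma) * alpha
    dx = gaussian_filter(rng.uniform(-1, 1, shape), sigma) * alpha
    y, x = np.meshgrid(np.arange(shape[0]), np.arange(shape[1]), indexing="ij")
    coords = np.array([y + dy, x + dx])
    # Spline interpolation at the displaced coordinates
    return map_coordinates(image, coords, order=3, mode="reflect")
```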
The use of GANs in medical imaging, as seen in Table 1, has been widespread and applied to a variety of modalities and datasets, demonstrating their versatility and potential for various applications within the field. When it comes to classification, DCGAN and WGAN have been the most-commonly used architectures and are considered safe bets in this domain. For example, Zhuang et al. [51] demonstrated a 3% accuracy improvement in generating fMRIs using an improved WGAN. These architectures, with their capacity for high-quality generation and good mode coverage, offer significant potential for the generation of synthetic images for medical imaging classification. In the case of segmentation and translation, the architectures that have shown the most promise include Pix2Pix, CycleGAN, and SPADE, all of which have proven their potential for conditional generation and cross-modal translation. Platscher et al. [44] conducted a comparative study of these three architectures, demonstrating their capacity to generate high-quality images suitable for medical image segmentation and translation tasks (improvement of 9.1% in Dice score). These architectures can significantly reduce the need for manual annotation of medical images and thus significantly reduce the time and cost required for data annotation.
On the other hand, VAEs have been utilized in fewer studies for medical image augmentation, as shown in Table 2. They have been employed in other tasks such as reconstruction, as demonstrated by Biffi et al. [92] and Volokitin et al. [93], who used CVAE for 3D volume reconstruction, and interpretability of features, as exemplified by Hyang et al. [94], who identified biomarkers using VAEs. Furthermore, VAEs are often used in hybrid architectures with adversarial learning techniques. The most promising architectures include PAVAE [83] and IntroVAE [122], alongside conditional VAEs, for various purposes including classification, segmentation, and translation tasks. However, while VAEs have shown potential in these areas, there is still room for improvement. One study that particularly shows promising results is that of Chadebec and Allassonnière [88], who propose to model the latent space of a VAE as a Riemannian manifold, allowing high-quality image generation comparable to GANs. Chadebec and Allassonnière [88] demonstrated an improvement of 8% in accuracy using synthetic images generated with their proposed VAE model. Nevertheless, this architecture requires a high computational cost and time, which is a significant drawback in practical applications.
Table 3 presents a summary of the relevant studies utilizing diffusion models for medical image augmentation. These studies, all of which are relatively recent, with the earliest dating back to 2022, suggest that diffusion models have gained increasing attention in medical image augmentation and synthesis in recent years. Furthermore, in 2022, diffusion models have been the most-commonly used generative models for medical image augmentation compared to GANs and VAEs, highlighting their growing popularity and potential for use in various scenarios. Of the diffusion models studied, DDPM and LDM are the most prevalent, alongside conditional variants such as CDDPM [103] and CLDM [97]. Notably, the difference between LDM and DDPM is the ability of LDM to model long-range dependencies within the data by constructing a low-dimensional latent representation and diffusing it, while DDPMs apply the diffusion process directly to the input images. This can be especially useful for medical image augmentation tasks that require capturing complex patterns and structures. For instance, Saeed et al. [114] demonstrated the capacity of LDM conditioned on text for a task of lesion identification, achieving an accuracy improvement of 5.8%. These findings suggest that diffusion models have a promising potential for future medical image augmentation and synthesis research. To further exemplify the potential of diffusion models in generating realistic medical images, we present in Figure 4 a set of synthesized MRI images using a DDPM. These generated images exhibit high visual fidelity and are almost indistinguishable from the real images. One of the reasons for this high quality is the DDPM’s ability to model the diffusion process of the image density function. By doing so, the DDPM can generate images with increased sharpness and fine details, as seen in the synthesized MRI images.
These studies have covered a range of modalities, including MRI, CT, and ultrasound, as well as dermoscopy and otoscopy. Classification is the most common downstream task targeted in these studies, but there have also been multiple state-of-the-art solutions proposed for more complex tasks such as generating multimodal missing images (e.g., from CT to MRI) and multi-contrast MRI images. In order to provide ground truth segmentation masks for tasks such as segmentation, most studies have explored the field of conditional synthesis. This allows for greater control over the synthesized images and can help to stabilize training [25], as the model is given explicit guidance on the desired output. For our discussion on medical image augmentation, we have also compiled two summary tables to provide a comprehensive overview of the datasets and metrics used in the reviewed studies. Table 4 presents a summary of the datasets used in the reviewed studies. This table includes information about the title of the dataset, a reference, and a link to the public repository if available, as well as the studied modality and anatomy. From examining this table, we see that MRI is the most-commonly used modality, followed by CT. In terms of anatomy, brain studies dominate, with lung studies coming in second. It is worth noting that the BraTS dataset is widely used across multiple studies, highlighting its importance in the field. Additionally, we notice the presence of private datasets in this table, which is not surprising given that many medical studies are associated with specific medical centers and may not be publicly available. When we consider the state of the art of medical imaging studies (see Figure 1b), we notice that the PET and ultrasound modalities are less represented compared to the others. One reason for the scarcity of PET studies is the limited availability of nuclear doctors compared to radiologists. Nuclear doctors specialize in nuclear medicine, and PET is one such imaging modality that uses radioactive tracers to produce 3D images of the body. Due to the limited number of nuclear doctors, there are fewer medical exams that use PET, leading to less publicly available data for research purposes [123]. On the other hand, ultrasound is an operator-dependent modality and requires a certain level of field knowledge. Additionally, ultrasound is not as effective as other modalities such as CT and MRI in detecting certain pathologies, which may also contribute to its lower representation in the state of the art. Despite these limitations, both PET and ultrasound remain important imaging modalities in clinical practice, and future research should aim to explore their full potential in the field of medical imaging.
Second, Table 5 provides a summary of the metrics used to evaluate the performance of the various models discussed in the review. It is clear from this table that a variety of metrics are employed, ranging from traditional evaluation measures to more recent ones. Currently, many studies rely on shallow metrics such as the mean absolute error, peak signal-to-noise ratio [138], or structural similarity [139], which do not accurately reflect the visual quality of the image. For instance, while optimizing pixel-wise loss can produce a clearer image, it may result in lower numerical scores compared to using adversarial loss [140]. To address this challenge, researchers have proposed different methods for evaluation. The most well-known approach is to validate the quality of the generated samples through downstream tasks such as segmentation or classification. An overview of the augmentation process using a downstream task is depicted in Figure 5. Another approach is to use deep-learning-based metrics such as the learned perceptual image patch similarity (LPIPS) [141], Fréchet inception distance (FID) [100], or inception score (IS) [142], which are designed to better reflect human judgments of image quality. These deep-learning-based metrics take into account not only pixel-wise similarities, but also high-level features and semantic information in the images, making them more effective in evaluating the visual quality of the generated images. LPIPS, for instance, measures the perceptual similarity between two images by using a pretrained deep neural network. FID and IS are other popular deep-learning-based metrics for image generation, and they have been widely used in various image generation tasks to assess the quality and diversity of the generated samples. However, these metrics may not always align perfectly with human perception, and further studies are needed to assess their effectiveness for different types of medical images.
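For reference, the brief sketch below computes two of the shallow metrics discussed above with scikit-image; the random arrays simply stand in for a real/synthetic image pair normalized to [0, 1].

```python
# A brief sketch of computing two of the shallow metrics discussed above with
# scikit-image; the random arrays stand in for a real/synthetic image pair in [0, 1].
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

real = np.random.rand(256, 256)         # placeholder "real" image
synthetic = np.random.rand(256, 256)    # placeholder "synthetic" image

psnr = peak_signal_noise_ratio(real, synthetic, data_range=1.0)
ssim = structural_similarity(real, synthetic, data_range=1.0)
print(f"PSNR: {psnr:.2f} dB, SSIM: {ssim:.3f}")
```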
Despite the advancements made by generative models in medical data augmentation, several challenges still remain. A common issue in GANs, known as mode collapse, occurs when the generator only produces a limited range of outputs, rather than the full range of possibilities. While techniques such as minibatch discrimination and the incorporation of auxiliary tasks [142] have been suggested as potential solutions, further research is needed to effectively address this issue. In addition, there is a balance to be struck between the sample quality and the generation speed, which affects all generative models. GANs are known for their ability to generate high-quality samples quickly, allowing them to be widely used in medical imaging and data augmentation [14,15,153]. Another approach for stabilizing the training of GANs is to use WGAN [24]. WGAN improves upon the original GAN by using the Wasserstein distance instead of the Jensen–Shannon divergence as the cost function for training the discriminator network. While these approaches have demonstrated success in improving GAN images and partially addressing mode collapse and training instability, there is still room for improvement. Diffusion models have overshadowed GANs during the latest years, particularly due to the success of text-to-image generation architectures such as DALL-E [154], Imagen [155], and stable diffusion [98]. These diffusion models naturally produce more realistic images than GANs. However, in our view, GANs have only been set aside and not entirely disregarded. With the recent release of GigaGAN [156] and StyleGAN-T [157], GANs have made a resurgence by producing comparable or even better results than diffusion models. This renewed interest in GANs demonstrates the continued relevance of this approach to image generation and indicates that GANs may still have much to offer in advancing the field. Future research could explore hybrid models that combine the strengths of both GANs and diffusion models to create even more realistic and high-quality images.
VAEs have not gained as much attention in the medical imaging field, due to their tendency to produce blurry and hazy generated images. However, some studies have used conditional VAEs or hybrid architectures to address this issue and improve the quality of the samples produced. Researchers are therefore exploring the use of hybrid models that combine the strengths of multiple generative models, as well as improved VAE variations that offer enhanced image quality. Hybrid architectures, such as VAE-GANs [32], have demonstrated the potential to partially address the issues of both VAEs and GANs, allowing a better-quality generation and good mode coverage. Interestingly, recent research has even combined all three generative models into a single pipeline [101]. This study has shown comparable results on a segmentation task when using a fully synthetic dataset compared to using the real dataset. These promising results suggest that hybrid architectures could open up new possibilities. However, these models can be complex and challenging to train, and more research is needed to fully realize their potential. In fact, many VAEs used in medical imaging are hybrid architectures, as they offer a good balance between the strengths and weaknesses of both VAEs and GANs [85,101]. It is important to note that VAEs have an advantage over GANs in operating better with smaller datasets due to the presence of an encoder [158], which can extract relevant features from the input images and significantly reduce the search space required for generating new images through the process of reconstruction. This feature also makes VAEs a form of dimensionality reduction, and the representation obtained by the encoder can provide a better starting point for the decoder to approximate the real data distribution more accurately. In contrast, GANs have a wider search space, which may lead to challenges in learning features effectively. For instance, we show in Figure 6 a comparison between synthesized MRIs using vanilla VAE [18] and the Hamiltonian VAE [159]. In addition to the advantage of operating better with smaller datasets, VAEs also offer a disentangled, interpretable, and editable latent space. This means that the encoded representation of an input image can be separated into independent and interpretable features, allowing for better understanding and manipulation of the underlying data. Another option is the use of improved variants of VAEs, which have been proposed to generate high-quality images. There has been limited exploration of improved VAE variants such as VQ-VAE2 [36], IAF-VAE [34], or Hamiltonian VAE [159] in the medical imaging field, but these variants have shown promise in generating high-quality images in other domains. It may be worth exploring their potential for medical image augmentation, as they offer the possibility of improving the quality of the generated images without sacrificing other important characteristics such as fast sampling and good mode coverage.
Diffusion models have been applied to medical imaging more recently [120], and some studies have already demonstrated high-quality results [106]. These models can synthesize highly realistic images and offer good mode coverage with stable training, but they suffer from long sampling times due to the large number of steps in the reverse diffusion process. This limitation may matter less in medical imaging applications, which typically do not require real-time generation, but researchers are nevertheless working on faster sampling. It may also be possible to trade some sample quality for faster sampling [21], although realism remains a key requirement for data augmentation in medical imaging. For example, Song et al. [40] propose the Denoising Diffusion Implicit Model (DDIM), which accelerates sampling, without retraining, by replacing the Markovian generative process of the DDPM with a non-Markovian one; this yields a faster sampling procedure that does not significantly compromise sample quality. The Fast Diffusion Probabilistic Model (FastDPM) [39] introduces a continuous formulation of the diffusion process that allows sampling with a reduced number of time steps. These efforts to improve the efficiency of diffusion models demonstrate the ongoing interest in balancing sample quality and generation speed in medical imaging applications.
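For intuition, the sketch below implements a deterministic DDIM-style update (the case without added noise) in PyTorch; `model`, `alphas_bar`, and the strided `timesteps` schedule are assumed to come from an already trained noise-prediction network and its noise schedule, and the code is a simplified illustration rather than the exact procedure of [40].

```python
import torch

@torch.no_grad()
def ddim_step(x_t, eps_pred, a_bar_t, a_bar_prev):
    # Deterministic DDIM update: estimate x0, then jump to the previous timestep.
    x0_pred = (x_t - torch.sqrt(1.0 - a_bar_t) * eps_pred) / torch.sqrt(a_bar_t)
    return torch.sqrt(a_bar_prev) * x0_pred + torch.sqrt(1.0 - a_bar_prev) * eps_pred

@torch.no_grad()
def ddim_sample(model, shape, alphas_bar, timesteps):
    # timesteps is a strided, descending schedule ending in -1, e.g.
    # list(range(999, -1, -20)) + [-1]  -> about 50 network calls instead of 1000.
    x = torch.randn(shape)
    for t, t_prev in zip(timesteps[:-1], timesteps[1:]):
        eps = model(x, t)  # trained noise-prediction network (assumed interface)
        a_bar_t = alphas_bar[t]
        a_bar_prev = alphas_bar[t_prev] if t_prev >= 0 else torch.tensor(1.0)
        x = ddim_step(x, eps, a_bar_t, a_bar_prev)
    return x
```

With such a strided schedule, the number of network evaluations drops from the full diffusion length to a few dozen, which is the practical benefit reported for DDIM-style samplers.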
There are several other factors to consider when discussing the use of generative models for medical data augmentation. One important factor is the incorporation of domain-specific techniques and knowledge into the design of these models [160]; by incorporating knowledge of anatomy and physiology, for example, researchers can improve the realism and utility of the generated data. Another important consideration is the ethics of using synthetic data for medical applications, including the potential for biased or unrealistic generated data and the need for proper validation and testing. To further improve the performance and efficiency of medical data augmentation, researchers are also exploring the combination of generative models with other techniques such as transfer learning [23,161] or active learning [162,163]. The role of interpretability and explainability in these models is also important to consider, particularly in the context of clinical decision making and regulatory requirements. Beyond data augmentation, generative models also have the potential to be used for other medical applications, such as generating synthetic patient records or synthesizing medical images from non-image data [109].

5. Conclusions

In this review, we have examined the use of deep generative models for medical image augmentation. The limited availability of training data remains a major challenge in deep-learning-based medical image analysis, and data augmentation is a natural remedy; however, traditional augmentation techniques still produce limited and often unconvincing results. We focused on three types of deep generative models for medical image augmentation, namely VAEs, GANs, and DMs, and provided an overview of the current state of the art for each of them. While deep generative models offer several advantages over traditional data augmentation techniques, including the ability to generate realistic new images that capture the underlying distribution of the training dataset, they also have limitations. VAEs learn a meaningful and disentangled representation of the data, which is useful for interpretability and latent space manipulation; however, they may produce fuzzy images that lack important details, which can be especially problematic in medical imaging. To address this limitation, improved variants have been developed, such as the vector-quantized VAE, which uses powerful learned priors to generate synthetic samples with higher coherence and fidelity; another approach combines VAEs with adversarial learning to improve the level of detail in the generated images. GANs, in turn, generate high-quality images with fine details and can be memory-efficient thanks to their upsampling-only generator, but they can be difficult to train and may suffer from mode collapse. Techniques such as the WGAN and minibatch discrimination help stabilize GAN training, and increasing the size of the training set can also be effective. Diffusion models generate high-quality images with increased sharpness and fine detail, often surpassing previous generative models, but they require significant computational resources to train and may be less interpretable. Researchers are currently exploring ways to reduce their sampling time, for example through progressive distillation, FastDPM, and DDIM variants.

Overall, each approach has its own strengths and weaknesses, and continued research will be crucial for improving the effectiveness of deep generative models in medical imaging and beyond. This evaluation of the strengths and limitations of each model suggests several directions for future research, including the exploration of hybrid architectures and improved variants, the incorporation of domain-specific knowledge, and the combination with other techniques such as transfer learning or active learning. The aim of this review is to emphasize the potential of deep generative models in enhancing the performance of deep learning algorithms for medical image analysis and, by identifying the shortcomings of current methods, to encourage further contributions in this field.

Author Contributions

Conceptualization, A.K., S.R. and J.L.-L.; methodology, A.K., S.R. and J.L.-L.; software, A.K.; validation, S.R. and J.L.-L.; writing—original draft preparation, A.K.; writing—review and editing, S.R. and J.L.-L.; supervision, S.R. and J.L.-L. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Conflicts of Interest

The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

References

  1. Amyar, A.; Modzelewski, R.; Vera, P.; Morard, V.; Ruan, S. Weakly Supervised Tumor Detection in PET Using Class Response for Treatment Outcome Prediction. J. Imaging 2022, 8, 130. [Google Scholar] [CrossRef] [PubMed]
  2. Brochet, T.; Lapuyade-Lahorgue, J.; Huat, A.; Thureau, S.; Pasquier, D.; Gardin, I.; Modzelewski, R.; Gibon, D.; Thariat, J.; Grégoire, V.; et al. A Quantitative Comparison between Shannon and Tsallis–Havrda–Charvat Entropies Applied to Cancer Outcome Prediction. Entropy 2022, 24, 436. [Google Scholar] [CrossRef] [PubMed]
  3. Zhou, T.; Ruan, S.; Vera, P.; Canu, S. A Tri-Attention fusion guided multi-modal segmentation network. Pattern Recognit. 2022, 124, 108417. [Google Scholar] [CrossRef]
  4. Chen, X.; Konukoglu, E. Unsupervised detection of lesions in brain MRI using constrained adversarial auto-encoders. arXiv 2018, arXiv:1806.04972. [Google Scholar]
  5. Lundervold, A.S.; Lundervold, A. An overview of deep learning in medical imaging focusing on MRI. Zeitschrift für Medizinische Physik 2019, 29, 102–127. [Google Scholar] [CrossRef]
  6. Song, Y.; Zheng, S.; Li, L.; Zhang, X.; Zhang, X.; Huang, Z.; Chen, J.; Wang, R.; Zhao, H.; Chong, Y.; et al. Deep learning enables accurate diagnosis of novel coronavirus (COVID-19) with CT images. IEEE/ACM Trans. Comput. Biol. Bioinform. 2021, 18, 2775–2780. [Google Scholar] [CrossRef]
  7. Islam, J.; Zhang, Y. GAN-based synthetic brain PET image generation. Brain Inform. 2020, 7, 1–12. [Google Scholar] [CrossRef]
  8. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. Imagenet classification with deep convolutional neural networks. Commun. ACM 2017, 60, 84–90. [Google Scholar] [CrossRef] [Green Version]
  9. Zhou, T.; Vera, P.; Canu, S.; Ruan, S. Missing Data Imputation via Conditional Generator and Correlation Learning for Multimodal Brain Tumor Segmentation. Pattern Recognit. Lett. 2022, 158, 125–132. [Google Scholar] [CrossRef]
  10. Paszke, A.; Gross, S.; Massa, F.; Lerer, A.; Bradbury, J.; Chanan, G.; Killeen, T.; Lin, Z.; Gimelshein, N.; Antiga, L.; et al. Pytorch: An imperative style, high-performance deep learning library. Adv. Neural Inf. Process. Syst. 2019, 32. [Google Scholar]
  11. Goodfellow, I.J.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative adversarial networks. arXiv 2014, arXiv:1406.2661. [Google Scholar] [CrossRef]
  12. Sandfort, V.; Yan, K.; Pickhardt, P.J.; Summers, R.M. Data augmentation using generative adversarial networks (CycleGAN) to improve generalizability in CT segmentation tasks. Sci. Rep. 2019, 9, 16884. [Google Scholar] [CrossRef] [Green Version]
  13. Mahapatra, D.; Bozorgtabar, B.; Garnavi, R. Image super-resolution using progressive generative adversarial networks for medical image analysis. Comput. Med. Imaging Graph. 2019, 71, 30–39. [Google Scholar] [CrossRef]
  14. Yi, X.; Walia, E.; Babyn, P. Generative adversarial network in medical imaging: A review. Med. Image Anal. 2019, 58, 101552. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  15. Ali, H.; Biswas, M.R.; Mohsen, F.; Shah, U.; Alamgir, A.; Mousa, O.; Shah, Z. The role of generative adversarial networks in brain MRI: A scoping review. Insights Imaging 2022, 13, 98. [Google Scholar] [CrossRef]
  16. Chen, Y.; Yang, X.H.; Wei, Z.; Heidari, A.A.; Zheng, N.; Li, Z.; Chen, H.; Hu, H.; Zhou, Q.; Guan, Q. Generative adversarial networks in medical image augmentation: A review. Comput. Biol. Med. 2022, 105382. [Google Scholar] [CrossRef] [PubMed]
  17. Mescheder, L.; Geiger, A.; Nowozin, S. Which training methods for GANs do actually converge? In Proceedings of the International Conference on Machine Learning, PMLR, Stockholm, Sweden, 10–15 July 2018; pp. 3481–3490. [Google Scholar]
  18. Kingma, D.P.; Welling, M. Auto-encoding variational bayes. arXiv 2013, arXiv:1312.6114. [Google Scholar]
  19. Sohl-Dickstein, J.; Weiss, E.; Maheswaranathan, N.; Ganguli, S. Deep unsupervised learning using nonequilibrium thermodynamics. In Proceedings of the International Conference on Machine Learning, PMLR, Lille, France, 6–11 July 2015; pp. 2256–2265. [Google Scholar]
  20. Ho, J.; Jain, A.; Abbeel, P. Denoising diffusion probabilistic models. Adv. Neural Inf. Process. Syst. 2020, 33, 6840–6851. [Google Scholar]
  21. Xiao, Z.; Kreis, K.; Vahdat, A. Tackling the generative learning trilemma with denoising diffusion gans. arXiv 2021, arXiv:2112.07804. [Google Scholar]
  22. Chlap, P.; Min, H.; Vandenberg, N.; Dowling, J.; Holloway, L.; Haworth, A. A review of medical image data augmentation techniques for deep learning applications. J. Med. Imaging Radiat. Oncol. 2021, 65, 545–563. [Google Scholar] [CrossRef]
  23. Shorten, C.; Khoshgoftaar, T.M. A survey on image data augmentation for deep learning. J. Big Data 2019, 6, 1–48. [Google Scholar] [CrossRef]
  24. Arjovsky, M.; Chintala, S.; Bottou, L. Wasserstein generative adversarial networks. In Proceedings of the International Conference on Machine Learning, PMLR, Sydney, Australia, 6–11 August 2017; pp. 214–223. [Google Scholar]
  25. Mirza, M.; Osindero, S. Conditional generative adversarial nets. arXiv 2014, arXiv:1411.1784. [Google Scholar]
  26. Isola, P.; Zhu, J.Y.; Zhou, T.; Efros, A.A. Image-to-image translation with conditional adversarial networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 1125–1134. [Google Scholar]
  27. Ronneberger, O.; Fischer, P.; Brox, T. U-net: Convolutional networks for biomedical image segmentation. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, Munich, Germany, 5–9 October 2015; pp. 234–241. [Google Scholar]
  28. Radford, A.; Metz, L.; Chintala, S. Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv 2015, arXiv:1511.06434. [Google Scholar]
  29. Karras, T.; Aila, T.; Laine, S.; Lehtinen, J. Progressive growing of gans for improved quality, stability, and variation. arXiv 2017, arXiv:1710.10196. [Google Scholar]
  30. Zhu, J.Y.; Park, T.; Isola, P.; Efros, A.A. Unpaired image-to-image translation using cycle-consistent adversarial networks. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2223–2232. [Google Scholar]
  31. Odena, A.; Olah, C.; Shlens, J. Conditional image synthesis with auxiliary classifier gans. arXiv 2016, arXiv:1610.09585. [Google Scholar]
  32. Larsen, A.B.L.; Sønderby, S.K.; Larochelle, H.; Winther, O. Autoencoding beyond pixels using a learned similarity metric. arXiv 2015, arXiv:1512.09300. [Google Scholar]
  33. Higgins, I.; Matthey, L.; Glorot, X.; Pal, A.; Uria, B.; Blundell, C.; Mohamed, S.; Lerchner, A. Early visual concept learning with unsupervised deep learning. arXiv 2016, arXiv:1606.05579. [Google Scholar]
  34. Kingma, D.P.; Salimans, T.; Jozefowicz, R.; Chen, X.; Sutskever, I.; Welling, M. Improved variational inference with inverse autoregressive flow. Adv. Neural Inf. Process. Syst. 2016, 29. [Google Scholar]
  35. Zhao, S.; Song, J.; Ermon, S. Infovae: Information maximizing variational autoencoders. arXiv 2017, arXiv:1706.02262. [Google Scholar]
  36. Razavi, A.; Van den Oord, A.; Vinyals, O. Generating diverse high-fidelity images with vq-vae-2. Adv. Neural Inf. Process. Syst. 2019, 32. [Google Scholar]
  37. Dhariwal, P.; Nichol, A. Diffusion models beat gans on image synthesis. Adv. Neural Inf. Process. Syst. 2021, 34, 8780–8794. [Google Scholar]
  38. Salimans, T.; Ho, J. Progressive distillation for fast sampling of diffusion models. arXiv 2022, arXiv:2202.00512. [Google Scholar]
  39. Kong, Z.; Ping, W. On fast sampling of diffusion probabilistic models. arXiv 2021, arXiv:2106.00132. [Google Scholar]
  40. Song, J.; Meng, C.; Ermon, S. Denoising diffusion implicit models. arXiv 2020, arXiv:2010.02502. [Google Scholar]
  41. Han, C.; Hayashi, H.; Rundo, L.; Araki, R.; Shimoda, W.; Muramatsu, S.; Furukawa, Y.; Mauri, G.; Nakayama, H. GAN-based synthetic brain MR image generation. In Proceedings of the IEEE 15th International Symposium on Biomedical Imaging, New York, NY, USA, 16–19 April 2018; pp. 734–738. [Google Scholar]
  42. Frid-Adar, M.; Diamant, I.; Klang, E.; Amitai, M.; Goldberger, J.; Greenspan, H. GAN-based synthetic medical image augmentation for increased CNN performance in liver lesion classification. Neurocomputing 2018, 321, 321–331. [Google Scholar] [CrossRef] [Green Version]
  43. Guibas, J.T.; Virdi, T.S.; Li, P.S. Synthetic medical images from dual generative adversarial networks. arXiv 2017, arXiv:1709.01872. [Google Scholar]
  44. Platscher, M.; Zopes, J.; Federau, C. Image Translation for Medical Image Generation–Ischemic Stroke Lesions. arXiv 2020, arXiv:2010.02745. [Google Scholar] [CrossRef]
  45. Park, T.; Liu, M.Y.; Wang, T.C.; Zhu, J.Y. Semantic image synthesis with spatially-adaptive normalization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 16–20 June 2019; pp. 2337–2346. [Google Scholar]
  46. Yurt, M.; Dar, S.U.; Erdem, A.; Erdem, E.; Oguz, K.K.; Çukur, T. mustGAN: Multi-stream generative adversarial networks for MR image synthesis. Med. Image Anal. 2021, 70, 101944. [Google Scholar] [CrossRef]
  47. Dar, S.U.; Yurt, M.; Karacan, L.; Erdem, A.; Erdem, E.; Cukur, T. Image synthesis in multi-contrast MRI with conditional generative adversarial networks. IEEE Trans. Med. Imaging 2019, 38, 2375–2388. [Google Scholar] [CrossRef] [Green Version]
  48. Sun, Y.; Yuan, P.; Sun, Y. MM-GAN: 3D MRI data augmentation for medical image segmentation via generative adversarial networks. In Proceedings of the 2020 IEEE International Conference on Knowledge Graph (ICKG), Nanjing, China, 9–11 August 2020; pp. 227–234. [Google Scholar]
  49. Han, C.; Rundo, L.; Araki, R.; Nagano, Y.; Furukawa, Y.; Mauri, G.; Nakayama, H.; Hayashi, H. Combining noise-to-image and image-to-image GANs: Brain MR image augmentation for tumor detection. IEEE Access 2019, 7, 156966–156977. [Google Scholar] [CrossRef]
  50. Kwon, G.; Han, C.; Kim, D.s. Generation of 3D brain MRI using auto-encoding generative adversarial networks. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, Shenzhen, China, 13–17 October 2019; pp. 118–126. [Google Scholar]
  51. Zhuang, P.; Schwing, A.G.; Koyejo, O. Fmri data augmentation via synthesis. In Proceedings of the 2019 IEEE 16th International Symposium on Biomedical Imaging (ISBI 2019), Venice, Italy, 8–11 April 2019; pp. 1783–1787. [Google Scholar]
  52. Waheed, A.; Goyal, M.; Gupta, D.; Khanna, A.; Al-Turjman, F.; Pinheiro, P.R. Covidgan: Data augmentation using auxiliary classifier gan for improved covid-19 detection. IEEE Access 2020, 8, 91916–91923. [Google Scholar] [CrossRef] [PubMed]
  53. Han, C.; Rundo, L.; Araki, R.; Furukawa, Y.; Mauri, G.; Nakayama, H.; Hayashi, H. Infinite brain MR images: PGGAN-based data augmentation for tumor detection. In Neural Approaches to Dynamics of Signal Exchanges; Springer: Singapore, 2019; pp. 291–303. [Google Scholar]
  54. Sun, L.; Wang, J.; Huang, Y.; Ding, X.; Greenspan, H.; Paisley, J. An adversarial learning approach to medical image synthesis for lesion detection. IEEE J. Biomed. Health Inform. 2020, 24, 2303–2314. [Google Scholar] [CrossRef] [Green Version]
  55. Wang, Q.; Zhang, X.; Chen, W.; Wang, K.; Zhang, X. Class-aware multi-window adversarial lung nodule synthesis conditioned on semantic features. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, Lima, Peru, 4–8 October 2020; pp. 589–598. [Google Scholar]
  56. Geng, X.; Yao, Q.; Jiang, K.; Zhu, Y. Deep neural generative adversarial model based on VAE+ GAN for disorder diagnosis. In Proceedings of the 2020 International Conference on Internet of Things and Intelligent Applications (ITIA), Zhenjiang, China, 27–29 November 2020; pp. 1–7. [Google Scholar]
  57. Pang, T.; Wong, J.H.D.; Ng, W.L.; Chan, C.S. Semi-supervised GAN-based radiomics model for data augmentation in breast ultrasound mass classification. Comput. Methods Programs Biomed. 2021, 203, 106018. [Google Scholar] [CrossRef] [PubMed]
  58. Barile, B.; Marzullo, A.; Stamile, C.; Durand-Dubief, F.; Sappey-Marinier, D. Data augmentation using generative adversarial neural networks on brain structural connectivity in multiple sclerosis. Comput. Methods Programs Biomed. 2021, 206, 106113. [Google Scholar] [CrossRef]
  59. Shen, T.; Hao, K.; Gou, C.; Wang, F.Y. Mass image synthesis in mammogram with contextual information based on gans. Comput. Methods Programs Biomed. 2021, 202, 106019. [Google Scholar] [CrossRef] [PubMed]
  60. Ambita, A.A.E.; Boquio, E.N.V.; Naval, P.C. COViT-GAN: Vision Transformer for COVID-19 Detection in CT Scan Images with Self-Attention GAN for Data Augmentation. In Proceedings of the International Conference on Artificial Neural Networks, Bratislava, Slovakia, 14–17 September 2021; pp. 587–598. [Google Scholar]
  61. Hirte, A.U.; Platscher, M.; Joyce, T.; Heit, J.J.; Tranvinh, E.; Federau, C. Realistic generation of diffusion-weighted magnetic resonance brain images with deep generative models. Magn. Reson. Imaging 2021, 81, 60–66. [Google Scholar] [CrossRef]
  62. Kaur, S.; Aggarwal, H.; Rani, R. MR image synthesis using generative adversarial networks for Parkinson’s disease classification. In Proceedings of the International Conference on Artificial Intelligence and Applications, Jiangsu, China, 15–17 October 2021; pp. 317–327. [Google Scholar]
  63. Guan, Q.; Chen, Y.; Wei, Z.; Heidari, A.A.; Hu, H.; Yang, X.H.; Zheng, J.; Zhou, Q.; Chen, H.; Chen, F. Medical image augmentation for lesion detection using a texture-constrained multichannel progressive GAN. Comput. Biol. Med. 2022, 145, 105444. [Google Scholar] [CrossRef]
  64. Ahmad, B.; Sun, J.; You, Q.; Palade, V.; Mao, Z. Brain Tumor Classification Using a Combination of Variational Autoencoders and Generative Adversarial Networks. Biomedicines 2022, 10, 223. [Google Scholar] [CrossRef]
  65. Pombo, G.; Gray, R.; Cardoso, M.J.; Ourselin, S.; Rees, G.; Ashburner, J.; Nachev, P. Equitable modelling of brain imaging by counterfactual augmentation with morphologically constrained 3d deep generative models. Med. Image Anal. 2022, 102723. [Google Scholar] [CrossRef]
  66. Neff, T.; Payer, C.; Stern, D.; Urschler, M. Generative adversarial network based synthesis for supervised medical image segmentation. In Proceedings of the OAGM and ARW Joint Workshop, Vienna, Austria, 10–12 May 2017; p. 4. [Google Scholar]
  67. Mok, T.C.; Chung, A. Learning data augmentation for brain tumor segmentation with coarse-to-fine generative adversarial networks. In Proceedings of the International MICCAI Brainlesion Workshop, Granada, Spain, 16 September 2018; pp. 70–80. [Google Scholar]
  68. Shin, H.C.; Tenenholtz, N.A.; Rogers, J.K.; Schwarz, C.G.; Senjem, M.L.; Gunter, J.L.; Andriole, K.P.; Michalski, M. Medical image synthesis for data augmentation and anonymization using generative adversarial networks. In Proceedings of the International Workshop on Simulation and Synthesis in Medical Imaging, Granada, Spain, 16 September 2018; pp. 1–11. [Google Scholar]
  69. Jiang, J.; Hu, Y.C.; Tyagi, N.; Zhang, P.; Rimner, A.; Deasy, J.O.; Veeraraghavan, H. Cross-modality (CT-MRI) prior augmented deep learning for robust lung tumor segmentation from small MR datasets. Med. Phys. 2019, 46, 4392–4404. [Google Scholar] [CrossRef]
  70. Jiang, Y.; Chen, H.; Loew, M.; Ko, H. COVID-19 CT image synthesis with a conditional generative adversarial network. IEEE J. Biomed. Health Inform. 2020, 25, 441–452. [Google Scholar] [CrossRef] [PubMed]
  71. Qasim, A.B.; Ezhov, I.; Shit, S.; Schoppe, O.; Paetzold, J.C.; Sekuboyina, A.; Kofler, F.; Lipkova, J.; Li, H.; Menze, B. Red-GAN: Attacking class imbalance via conditioned generation. Yet another medical imaging perspective. In Proceedings of the Medical Imaging with Deep Learning, Montreal, QC, Canada, 6–9 July 2020; pp. 655–668. [Google Scholar]
  72. Shi, H.; Lu, J.; Zhou, Q. A novel data augmentation method using style-based GAN for robust pulmonary nodule segmentation. In Proceedings of the 2020 Chinese Control and Decision Conference (CCDC), Hefei, China, 22–24 August 2020; pp. 2486–2491. [Google Scholar]
  73. Shen, Z.; Ouyang, X.; Xiao, B.; Cheng, J.Z.; Shen, D.; Wang, Q. Image synthesis with disentangled attributes for chest X-ray nodule augmentation and detection. Med. Image Anal. 2022, 102708. [Google Scholar] [CrossRef] [PubMed]
  74. Chartsias, A.; Joyce, T.; Dharmakumar, R.; Tsaftaris, S.A. Adversarial image synthesis for unpaired multi-modal cardiac data. In Proceedings of the International Workshop on Simulation and Synthesis in Medical Imaging, Québec City, QC, Canada, 10 September 2017; pp. 3–13. [Google Scholar]
  75. Wolterink, J.M.; Dinkla, A.M.; Savenije, M.H.; Seevinck, P.R.; van den Berg, C.A.; Išgum, I. Deep MR to CT synthesis using unpaired data. In Proceedings of the International Workshop on Simulation and Synthesis in Medical Imaging, Québec City, QC, Canada, 10 September 2017; pp. 14–23. [Google Scholar]
  76. Nie, D.; Trullo, R.; Lian, J.; Wang, L.; Petitjean, C.; Ruan, S.; Wang, Q.; Shen, D. Medical image synthesis with deep convolutional adversarial networks. IEEE Trans. Biomed. Eng. 2018, 65, 2720–2730. [Google Scholar] [CrossRef]
  77. Armanious, K.; Jiang, C.; Fischer, M.; Küstner, T.; Hepp, T.; Nikolaou, K.; Gatidis, S.; Yang, B. MedGAN: Medical image translation using GANs. Comput. Med. Imaging Graph. 2020, 79, 101684. [Google Scholar] [CrossRef]
  78. Yang, H.; Lu, X.; Wang, S.H.; Lu, Z.; Yao, J.; Jiang, Y.; Qian, P. Synthesizing multi-contrast MR images via novel 3D conditional Variational auto-encoding GAN. Mob. Netw. Appl. 2021, 26, 415–424. [Google Scholar] [CrossRef]
  79. Sikka, A.; Skand; Virk, J.S.; Bathula, D.R. MRI to PET Cross-Modality Translation using Globally and Locally Aware GAN (GLA-GAN) for Multi-Modal Diagnosis of Alzheimer’s Disease. arXiv 2021, arXiv:2108.02160. [Google Scholar]
  80. Amirrajab, S.; Lorenz, C.; Weese, J.; Pluim, J.; Breeuwer, M. Pathology Synthesis of 3D Consistent Cardiac MR Images Using 2D VAEs and GANs. In Proceedings of the International Workshop on Simulation and Synthesis in Medical Imaging, Singapore, 18 September 2022; pp. 34–42. [Google Scholar]
  81. Pesteie, M.; Abolmaesumi, P.; Rohling, R.N. Adaptive augmentation of medical data using independently conditional variational auto-encoders. IEEE Trans. Med. Imaging 2019, 38, 2807–2820. [Google Scholar] [CrossRef]
  82. Chadebec, C.; Thibeau-Sutre, E.; Burgos, N.; Allassonnière, S. Data augmentation in high dimensional low sample size setting using a geometry-based variational autoencoder. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 45, 2879–2896. [Google Scholar] [CrossRef]
  83. Huo, J.; Vakharia, V.; Wu, C.; Sharan, A.; Ko, A.; Ourselin, S.; Sparks, R. Brain Lesion Synthesis via Progressive Adversarial Variational Auto-Encoder. In Proceedings of the International Workshop on Simulation and Synthesis in Medical Imaging, Singapore, 18 September 2022; pp. 101–111. [Google Scholar]
  84. Gulrajani, I.; Ahmed, F.; Arjovsky, M.; Dumoulin, V.; Courville, A.C. Improved training of wasserstein gans. Adv. Neural Inf. Process. Syst. 2017, 30. [Google Scholar]
  85. Imran, A.-A.-Z.; Terzopoulos, D. Multi-adversarial variational autoencoder networks. In Proceedings of the 2019 18th IEEE International Conference On Machine Learning And Applications (ICMLA), Boca Raton, FL, USA, 16–19 December 2019; pp. 777–782. [Google Scholar]
  86. Qiang, N.; Dong, Q.; Liang, H.; Ge, B.; Zhang, S.; Sun, Y.; Zhang, C.; Zhang, W.; Gao, J.; Liu, T. Modeling and augmenting of fMRI data using deep recurrent variational auto-encoder. J. Neural Eng. 2021, 18, 0460b6. [Google Scholar] [CrossRef]
  87. Madan, Y.; Veetil, I.K.; V, S.; EA, G.; KP, S. Synthetic Data Augmentation of MRI using Generative Variational Autoencoder for Parkinson’s Disease Detection. In Evolution in Computational Intelligence; Springer: Berlin/Heidelberg, Germany, 2022; pp. 171–178. [Google Scholar]
  88. Chadebec, C.; Allassonnière, S. Data Augmentation with Variational Autoencoders and Manifold Sampling. In Deep Generative Models, and Data Augmentation, Labelling, and Imperfections; Springer: Berlin/Heidelberg, Germany, 2021; pp. 184–192. [Google Scholar]
  89. Liang, J.; Chen, J. Data augmentation of thyroid ultrasound images using generative adversarial network. In Proceedings of the 2021 IEEE International Ultrasonics Symposium (IUS), Xi’an, China, 12–15 September 2021; pp. 1–4. [Google Scholar]
  90. Gan, M.; Wang, C. Esophageal optical coherence tomography image synthesis using an adversarially learned variational autoencoder. Biomed. Opt. Express 2022, 13, 1188–1201. [Google Scholar] [CrossRef]
  91. Hu, Q.; Li, H.; Zhang, J. Domain-Adaptive 3D Medical Image Synthesis: An Efficient Unsupervised Approach. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, Singapore, 18–22 September 2022; pp. 495–504. [Google Scholar]
  92. Biffi, C.; Oktay, O.; Tarroni, G.; Bai, W.; Marvao, A.D.; Doumou, G.; Rajchl, M.; Bedair, R.; Prasad, S.; Cook, S.; et al. Learning interpretable anatomical features through deep generative models: Application to cardiac remodeling. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, Granada, Spain, 16–20 September 2018; pp. 464–471. [Google Scholar]
  93. Volokitin, A.; Erdil, E.; Karani, N.; Tezcan, K.C.; Chen, X.; Gool, L.V.; Konukoglu, E. Modelling the distribution of 3D brain MRI using a 2D slice VAE. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, Lima, Peru, 4–8 October 2020; pp. 657–666. [Google Scholar]
  94. Huang, Q.; Qiao, C.; Jing, K.; Zhu, X.; Ren, K. Biomarkers identification for Schizophrenia via VAE and GSDAE-based data augmentation. Comput. Biol. Med. 2022, 105603. [Google Scholar] [CrossRef] [PubMed]
  95. Beetz, M.; Banerjee, A.; Sang, Y.; Grau, V. Combined Generation of Electrocardiogram and Cardiac Anatomy Models Using Multi-Modal Variational Autoencoders. In Proceedings of the 2022 IEEE 19th International Symposium on Biomedical Imaging (ISBI), Kolkata, India, 28–31 March 2022; pp. 1–4. [Google Scholar]
  96. Sundgaard, J.V.; Hannemose, M.R.; Laugesen, S.; Bray, P.; Harte, J.; Kamide, Y.; Tanaka, C.; Paulsen, R.R.; Christensen, A.N. Multi-modal data generation with a deep metric variational autoencoder. arXiv 2022, arXiv:2202.03434. [Google Scholar] [CrossRef] [PubMed]
  97. Pinaya, W.H.; Tudosiu, P.D.; Dafflon, J.; Da Costa, P.F.; Fernandez, V.; Nachev, P.; Ourselin, S.; Cardoso, M.J. Brain imaging generation with latent diffusion models. In Proceedings of the MICCAI Workshop on Deep Generative Models, Singapore, 22 September 2022; pp. 117–126. [Google Scholar]
  98. Rombach, R.; Blattmann, A.; Lorenz, D.; Esser, P.; Ommer, B. High-resolution image synthesis with latent diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 10684–10695. [Google Scholar]
  99. Mao, X.; Li, Q.; Xie, H.; Lau, R.; Wang, Z.; Smolley, S. Least squares generative adversarial networks. arXiv 2016, arXiv:1611.04076. [Google Scholar]
  100. Heusel, M.; Ramsauer, H.; Unterthiner, T.; Nessler, B.; Hochreiter, S. Gans trained by a two time-scale update rule converge to a local nash equilibrium. Adv. Neural Inf. Process. Syst. 2017, 30. [Google Scholar]
  101. Fernandez, V.; Pinaya, W.H.L.; Borges, P.; Tudosiu, P.D.; Graham, M.S.; Vercauteren, T.; Cardoso, M.J. Can segmentation models be trained with fully synthetically generated data? In Proceedings of the International Workshop on Simulation and Synthesis in Medical Imaging, Singapore, 18–22 September 2022; pp. 79–90. [Google Scholar]
  102. Isensee, F.; Petersen, J.; Klein, A.; Zimmerer, D.; Jaeger, P.F.; Kohl, S.; Wasserthal, J.; Koehler, G.; Norajitra, T.; Wirkert, S.; et al. nnu-net: Self-adapting framework for u-net-based medical image segmentation. arXiv 2018, arXiv:1809.10486. [Google Scholar]
  103. Lyu, Q.; Wang, G. Conversion Between CT and MRI Images Using Diffusion and Score-Matching Models. arXiv 2022, arXiv:2209.12104. [Google Scholar]
  104. Song, Y.; Sohl-Dickstein, J.; Kingma, D.P.; Kumar, A.; Ermon, S.; Poole, B. Score-based generative modeling through stochastic differential equations. arXiv 2020, arXiv:2011.13456. [Google Scholar]
  105. Nyholm, T.; Svensson, S.; Andersson, S.; Jonsson, J.; Sohlin, M.; Gustafsson, C.; Kjellén, E.; Söderström, K.; Albertsson, P.; Blomqvist, L.; et al. MR and CT data with multiobserver delineations of organs in the pelvic area—Part of the Gold Atlas project. Med. Phys. 2018, 45, 1295–1300. [Google Scholar] [CrossRef] [Green Version]
  106. Dorjsembe, Z.; Odonchimed, S.; Xiao, F. Three-Dimensional Medical Image Synthesis with Denoising Diffusion Probabilistic Models. In Proceedings of the Medical Imaging with Deep Learning, Zurich, Switzerland, 18–22 September 2022. [Google Scholar]
  107. Packhäuser, K.; Folle, L.; Thamm, F.; Maier, A. Generation of anonymous chest radiographs using latent diffusion models for training thoracic abnormality classification systems. arXiv 2022, arXiv:2211.01323. [Google Scholar]
  108. Moghadam, P.A.; Van Dalen, S.; Martin, K.C.; Lennerz, J.; Yip, S.; Farahani, H.; Bashashati, A. A Morphology Focused Diffusion Probabilistic Model for Synthesis of Histopathology Images. arXiv 2022, arXiv:2209.13167. [Google Scholar]
  109. Chambon, P.; Bluethgen, C.; Delbrouck, J.B.; Van der Sluijs, R.; Połacin, M.; Chaves, J.M.Z.; Abraham, T.M.; Purohit, S.; Langlotz, C.P.; Chaudhari, A. RoentGen: Vision-Language Foundation Model for Chest X-ray Generation. arXiv 2022, arXiv:2211.12737. [Google Scholar]
  110. Wolleb, J.; Sandkühler, R.; Bieder, F.; Cattin, P.C. The Swiss Army Knife for Image-to-Image Translation: Multi-Task Diffusion Models. arXiv 2022, arXiv:2204.02641. [Google Scholar]
  111. Sagers, L.W.; Diao, J.A.; Groh, M.; Rajpurkar, P.; Adamson, A.S.; Manrai, A.K. Improving dermatology classifiers across populations using images generated by large diffusion models. arXiv 2022, arXiv:2211.13352. [Google Scholar]
  112. Peng, W.; Adeli, E.; Zhao, Q.; Pohl, K.M. Generating Realistic 3D Brain MRIs Using a Conditional Diffusion Probabilistic Model. arXiv 2022, arXiv:2212.08034. [Google Scholar]
  113. Ali, H.; Murad, S.; Shah, Z. Spot the fake lungs: Generating synthetic medical images using neural diffusion models. In Proceedings of the Artificial Intelligence and Cognitive Science: 30th Irish Conference, AICS 2022, Munster, Ireland, 8–9 December 2022; pp. 32–39. [Google Scholar]
  114. Saeed, S.U.; Syer, T.; Yan, W.; Yang, Q.; Emberton, M.; Punwani, S.; Clarkson, M.J.; Barratt, D.C.; Hu, Y. Bi-parametric prostate MR image synthesis using pathology and sequence-conditioned stable diffusion. arXiv 2023, arXiv:2303.02094. [Google Scholar]
  115. Weber, T.; Ingrisch, M.; Bischl, B.; Rügamer, D. Cascaded Latent Diffusion Models for High-Resolution Chest X-ray Synthesis. arXiv 2023, arXiv:2303.11224. [Google Scholar]
  116. Khader, F.; Mueller-Franzes, G.; Arasteh, S.T.; Han, T.; Haarburger, C.; Schulze-Hagen, M.; Schad, P.; Engelhardt, S.; Baessler, B.; Foersch, S.; et al. Medical Diffusion–Denoising Diffusion Probabilistic Models for 3D Medical Image Generation. arXiv 2022, arXiv:2211.03364. [Google Scholar]
  117. Özbey, M.; Dar, S.U.; Bedel, H.A.; Dalmaz, O.; Özturk, Ş.; Güngör, A.; Çukur, T. Unsupervised medical image translation with adversarial diffusion models. arXiv 2022, arXiv:2207.08208. [Google Scholar]
  118. Meng, X.; Gu, Y.; Pan, Y.; Wang, N.; Xue, P.; Lu, M.; He, X.; Zhan, Y.; Shen, D. A Novel Unified Conditional Score-based Generative Framework for Multi-modal Medical Image Completion. arXiv 2022, arXiv:2207.03430. [Google Scholar]
  119. Kim, B.; Ye, J.C. Diffusion deformable model for 4D temporal medical image generation. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, Singapore, 18–22 September 2022; pp. 539–548. [Google Scholar]
  120. Kazerouni, A.; Aghdam, E.K.; Heidari, M.; Azad, R.; Fayyaz, M.; Hacihaliloglu, I.; Merhof, D. Diffusion models for medical image analysis: A comprehensive survey. arXiv 2022, arXiv:2211.07804. [Google Scholar]
  121. Abdollahi, B.; Tomita, N.; Hassanpour, S. Data Augmentation in Training Deep Learning Models for Medical Image Analysis; Springer: Berlin/Heidelberg, Germany, 2020; pp. 167–180. [Google Scholar]
  122. Huang, H.; Li, Z.; He, R.; Sun, Z.; Tan, T. Introvae: Introspective variational autoencoders for photographic image synthesis. Adv. Neural Inf. Process. Syst. 2018, 31. [Google Scholar]
  123. Amyar, A.; Ruan, S.; Vera, P.; Decazes, P.; Modzelewski, R. RADIOGAN: Deep convolutional conditional generative adversarial network to generate PET images. In Proceedings of the 2020 7th International Conference on Bioinformatics Research and Applications, Berlin, Germany, 13–15 September 2020; pp. 28–33. [Google Scholar]
  124. Bullitt, E.; Zeng, D.; Gerig, G.; Aylward, S.; Joshi, S.; Smith, J.K.; Lin, W.; Ewend, M.G. Vessel tortuosity and brain tumor malignancy: A blinded study. Acad. Radiol. 2005, 12, 1232–1240. [Google Scholar] [CrossRef] [Green Version]
  125. Staal, J.; Abràmoff, M.D.; Niemeijer, M.; Viergever, M.A.; Van Ginneken, B. Ridge-based vessel segmentation in color images of the retina. IEEE Trans. Med. Imaging 2004, 23, 501–509. [Google Scholar] [CrossRef] [PubMed]
  126. Bernard, O.; Lalande, A.; Zotti, C.; Cervenansky, F.; Yang, X.; Heng, P.A.; Cetin, I.; Lekadir, K.; Camara, O.; Ballester, M.A.G.; et al. Deep learning techniques for automatic MRI cardiac multi-structures segmentation and diagnosis: Is the problem solved? IEEE Trans. Med. Imaging 2018, 37, 2514–2525. [Google Scholar] [CrossRef]
  127. Wang, X.; Peng, Y.; Lu, L.; Lu, Z.; Bagheri, M.; Summers, R.M. Chestx-ray8: Hospital-scale chest x-ray database and benchmarks on weakly-supervised classification and localization of common thorax diseases. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2097–2106. [Google Scholar]
  128. Bai, W.; Shi, W.; de Marvao, A.; Dawes, T.J.; O’Regan, D.P.; Cook, S.A.; Rueckert, D. A bi-ventricular cardiac atlas built from 1000+ high resolution MR images of healthy subjects and an analysis of shape and motion. Med. Image Anal. 2015, 26, 133–145. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  129. Van Ginneken, B.; Stegmann, M.B.; Loog, M. Segmentation of anatomical structures in chest radiographs using supervised methods: A comparative study on a public database. Med. Image Anal. 2006, 10, 19–40. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  130. Van Essen, D.C.; Smith, S.M.; Barch, D.M.; Behrens, T.E.; Yacoub, E.; Ugurbil, K.; WU-Minn HCP Consortium. The WU-Minn human connectome project: An overview. Neuroimage 2013, 80, 62–79. [Google Scholar] [CrossRef] [Green Version]
  131. Kermany, D.S.; Goldbaum, M.; Cai, W.; Valentim, C.C.; Liang, H.; Baxter, S.L.; McKeown, A.; Yang, G.; Wu, X.; Yan, F.; et al. Identifying medical diagnoses and treatable diseases by image-based deep learning. Cell 2018, 172, 1122–1131. [Google Scholar] [CrossRef]
  132. Groh, M.; Harris, C.; Soenksen, L.; Lau, F.; Han, R.; Kim, A.; Koochek, A.; Badri, O. Evaluating deep neural networks trained on clinical images in dermatology with the fitzpatrick 17k dataset. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 1820–1828. [Google Scholar]
  133. Yang, X.; He, X.; Zhao, J.; Zhang, Y.; Zhang, S.; Xie, P. COVID-CT-dataset: A CT scan dataset about COVID-19. arXiv 2020, arXiv:2003.13865. [Google Scholar]
  134. Soares, E.; Angelov, P.; Biaso, S.; Froes, M.H.; Abe, D.K. SARS-CoV-2 CT-scan dataset: A large dataset of real patients CT scans for SARS-CoV-2 identification. medRxiv 2020. [Google Scholar] [CrossRef]
  135. Johnson, A.E.; Pollard, T.J.; Greenbaum, N.R.; Lungren, M.P.; Deng, C.y.; Peng, Y.; Lu, Z.; Mark, R.G.; Berkowitz, S.J.; Horng, S. MIMIC-CXR-JPG, a large publicly available database of labeled chest radiographs. arXiv 2019, arXiv:1901.07042. [Google Scholar]
  136. Jones, S.; Tillin, T.; Park, C.; Williams, S.; Rapala, A.; Al Saikhan, L.; Eastwood, S.V.; Richards, M.; Hughes, A.D.; Chaturvedi, N. Cohort Profile Update: Southall and Brent Revisited (SABRE) study: A UK population-based comparison of cardiovascular disease and diabetes in people of European, South Asian and African Caribbean heritage. Int. J. Epidemiol. 2020, 49, 1441–1442e. [Google Scholar] [CrossRef]
  137. Saha, A.; Twilt, J.; Bosma, J.; van Ginneken, B.; Yakar, D.; Elschot, M.; Veltman, J.; Fütterer, J.; de Rooij, M.; Huisman, H. Artificial Intelligence and Radiologists at Prostate Cancer Detection in MRI: The PI CAI Challenge. In Proceedings of the RSNA, Chicago, IL, USA, 27 November–1 December 2022. [Google Scholar]
  138. Kynkäänniemi, T.; Karras, T.; Laine, S.; Lehtinen, J.; Aila, T. Improved precision and recall metric for assessing generative models. Adv. Neural Inf. Process. Syst. 2019, 32. [Google Scholar]
  139. Wang, Z.; Bovik, A.C.; Sheikh, H.R.; Simoncelli, E.P. Image quality assessment: From error visibility to structural similarity. IEEE Trans. Image Process. 2004, 13, 600–612. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  140. Ledig, C.; Theis, L.; Huszár, F.; Caballero, J.; Cunningham, A.; Acosta, A.; Aitken, A.; Tejani, A.; Totz, J.; Wang, Z.; et al. Photo-realistic single image super-resolution using a generative adversarial network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 4681–4690. [Google Scholar]
  141. Zhang, R.; Isola, P.; Efros, A.A.; Shechtman, E.; Wang, O. The unreasonable effectiveness of deep features as a perceptual metric. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 586–595. [Google Scholar]
  142. Salimans, T.; Goodfellow, I.; Zaremba, W.; Cheung, V.; Radford, A.; Chen, X. Improved techniques for training gans. Adv. Neural Inf. Process. Syst. 2016, 29. [Google Scholar]
  143. Sudre, C.H.; Li, W.; Vercauteren, T.; Ourselin, S.; Jorge Cardoso, M. Generalised dice overlap as a deep learning loss function for highly unbalanced segmentations. In Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support, Proceedings of the Third International Workshop, DLMIA 2017, and 7th International Workshop, ML-CDS 2017, Held in Conjunction with MICCAI 2017, Québec City, QC, Canada, 14 September 2017; Springer: Berlin/Heidelberg, Germany, 2017; pp. 240–248. [Google Scholar]
  144. Rockafellar, R.T.; Wets, R.J.B. Variational Analysis; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2009; Volume 317. [Google Scholar]
  145. Gretton, A.; Borgwardt, K.M.; Rasch, M.J.; Schölkopf, B.; Smola, A. A kernel two-sample test. J. Mach. Learn. Res. 2012, 13, 723–773. [Google Scholar]
  146. Cover, T.; Hart, P. Nearest neighbor pattern classification. IEEE Trans. Inf. Theory 1967, 13, 21–27. [Google Scholar] [CrossRef] [Green Version]
  147. Bounliphone, W.; Belilovsky, E.; Blaschko, M.B.; Antonoglou, I.; Gretton, A. A test of relative similarity for model selection in generative models. arXiv 2015, arXiv:1511.04581. [Google Scholar]
  148. Vaserstein, L.N. Markov processes over denumerable products of spaces, describing large systems of automata. Probl. Peredachi Informatsii 1969, 5, 64–72. [Google Scholar]
  149. Fawcett, T. An introduction to ROC analysis. Pattern Recognit. Lett. 2006, 27, 861–874. [Google Scholar] [CrossRef]
  150. Nguyen, X.; Wainwright, M.J.; Jordan, M.I. Estimating divergence functionals and the likelihood ratio by convex risk minimization. IEEE Trans. Inf. Theory 2010, 56, 5847–5861. [Google Scholar] [CrossRef] [Green Version]
  151. Sheikh, H.R.; Bovik, A.C. A visual information fidelity approach to video quality assessment. First Int. Workshop Video Process. Qual. Metrics Consum. Electron. 2005, 7, 2117–2128. [Google Scholar]
  152. Wang, Z.; Bovik, A.C. A universal image quality index. IEEE Signal Process. Lett. 2002, 9, 81–84. [Google Scholar] [CrossRef]
  153. Tavse, S.; Varadarajan, V.; Bachute, M.; Gite, S.; Kotecha, K. A Systematic Literature Review on Applications of GAN-Synthesized Images for Brain MRI. Future Internet 2022, 14, 351. [Google Scholar] [CrossRef]
  154. Ramesh, A.; Pavlov, M.; Goh, G.; Gray, S.; Voss, C.; Radford, A.; Chen, M.; Sutskever, I. Zero-shot text-to-image generation. In Proceedings of the International Conference on Machine Learning, PMLR, Online, 18–24 July 2021; pp. 8821–8831. [Google Scholar]
  155. Saharia, C.; Chan, W.; Saxena, S.; Li, L.; Whang, J.; Denton, E.L.; Ghasemipour, K.; Gontijo Lopes, R.; Karagol Ayan, B.; Salimans, T.; et al. Photorealistic text-to-image diffusion models with deep language understanding. Adv. Neural Inf. Process. Syst. 2022, 35, 36479–36494. [Google Scholar]
  156. Kang, M.; Zhu, J.Y.; Zhang, R.; Park, J.; Shechtman, E.; Paris, S.; Park, T. Scaling up GANs for Text-to-Image Synthesis. arXiv 2023, arXiv:2303.05511. [Google Scholar]
  157. Sauer, A.; Karras, T.; Laine, S.; Geiger, A.; Aila, T. Stylegan-t: Unlocking the power of gans for fast large-scale text-to-image synthesis. arXiv 2023, arXiv:2301.09515. [Google Scholar]
  158. Delgado, J.M.D.; Oyedele, L. Deep learning with small datasets: Using autoencoders to address limited datasets in construction management. Appl. Soft Comput. 2021, 112, 107836. [Google Scholar] [CrossRef]
  159. Caterini, A.L.; Doucet, A.; Sejdinovic, D. Hamiltonian variational auto-encoder. Adv. Neural Inf. Process. Syst. 2018, 31. [Google Scholar]
  160. He, Y.; Wang, L.; Yang, F.; Clarysse, P.; Robini, M.; Zhu, Y. Effect of different configurations of diffusion gradient directions on accuracy of diffusion tensor estimation in cardiac DTI. In Proceedings of the 16th IEEE International Conference on Signal Processing (ICSP), Beijing, China, 21–24 October 2022; Volume 1, pp. 437–441. [Google Scholar]
  161. Talo, M.; Baloglu, U.B.; Yıldırım, Ö.; Acharya, U.R. Application of deep transfer learning for automated brain abnormality classification using MR images. Cogn. Syst. Res. 2019, 54, 176–188. [Google Scholar] [CrossRef]
  162. Ren, P.; Xiao, Y.; Chang, X.; Huang, P.Y.; Li, Z.; Gupta, B.B.; Chen, X.; Wang, X. A survey of deep active learning. ACM Comput. Surv. (CSUR) 2021, 54, 1–40. [Google Scholar] [CrossRef]
  163. Rahimi, S.; Oktay, O.; Alvarez-Valle, J.; Bharadwaj, S. Addressing the exorbitant cost of labeling medical images with active learning. In Proceedings of the International Conference on Machine Learning in Medical Imaging and Analysis, Barcelona, Spain, 24–25 May 2021; p. 1. [Google Scholar]
Figure 1. Distribution of publications on deep generative models applied to medical imaging data augmentation as of 2022. (a) The number of publications per architecture type and year. (b) The distribution of publications by modality, with CT and MRI being the most-commonly studied imaging modalities. Note that for cross-modal translation tasks, both the source and target modalities are counted in this plot. (c) The distribution of publications by downstream task, with segmentation and classification being the most common tasks in medical imaging. This figure illustrates the increasing interest in using deep generative models for data augmentation in medical imaging and highlights the diversity of tasks and modalities that have been addressed in the literature.
Figure 2. Illustration of the three deep generative models that are commonly used for medical image augmentation: (a) generative adversarial networks (GANs), which consist of a generator and a discriminator network trained adversarially to generate realistic data; (b) variational autoencoders (VAEs), which consist of an encoder and a decoder network trained to reconstruct data and learn a compact latent representation; and (c) diffusion models, which model the data distribution through a forward noising process and a learned reverse denoising process applied over a series of steps.
Figure 3. Adapted from Sandfort et al. [12], the study presented examples of true contrast CT scans and synthetic non-contrast CT scans generated using a CycleGAN. The left columns show the true contrast CT scans, while the right columns present the synthetic non-contrast CT scans. It is observed that the synthetic non-contrast images generated with CycleGAN appeared convincing, even in the presence of significant abnormalities in the contrast CT scans. The last column on the right displays unrelated examples of non-contrast images. The letters A to F in this figure represent various abnormalities/pathologies, and the arrows indicate their corresponding synthetic non-contrast CT images. However, they are not essential for understanding the main purpose of the figure, which is to demonstrate the generator's ability to produce realistic images.
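The contrast-to-non-contrast translation shown in Figure 3 relies on CycleGAN's cycle-consistency constraint, which can be sketched as follows; the generators `G` and `F` and the weighting factor are hypothetical stand-ins for the trained mappings used in [12].

```python
import torch

def cycle_consistency_loss(G, F, x_contrast, y_noncontrast, lam=10.0):
    # G: contrast CT -> synthetic non-contrast CT; F: the reverse mapping.
    # Each image must be recoverable after a round trip through both generators.
    forward_cycle = torch.mean(torch.abs(F(G(x_contrast)) - x_contrast))
    backward_cycle = torch.mean(torch.abs(G(F(y_noncontrast)) - y_noncontrast))
    return lam * (forward_cycle + backward_cycle)
```

The full CycleGAN objective adds an adversarial loss for each translation direction; only the cycle term is shown here.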
Figure 4. Synthesized MRIs using a denoising diffusion probabilistic model (DDPM) [20] trained on the BraTS2020 dataset. The first row shows a sample of original images, while the second row shows a sample of synthesized images generated using the DDPM.
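The DDPM behind Figure 4 is trained by corrupting images with a closed-form forward noising process and regressing the added noise; a minimal sketch of that step is given below, with a linear beta schedule chosen purely for illustration.

```python
import torch

def q_sample(x0, t, alphas_bar):
    # Closed-form forward noising q(x_t | x_0): corrupt a clean image in one shot.
    a_bar = alphas_bar[t].view(-1, 1, 1, 1)          # broadcast over (C, H, W)
    noise = torch.randn_like(x0)
    x_t = torch.sqrt(a_bar) * x0 + torch.sqrt(1.0 - a_bar) * noise
    return x_t, noise                                # the network regresses `noise`

# Illustrative linear beta schedule with 1000 steps
betas = torch.linspace(1e-4, 0.02, 1000)
alphas_bar = torch.cumprod(1.0 - betas, dim=0)
```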
Figure 5. Illustration of the augmentation pipeline for generative-model-based data augmentation. The input data, x, are fed into the generative model, g, which synthesizes additional data samples to augment the training set. The downstream architecture, e, which may take the form of a convolutional neural network or a U-Net, is then trained on a combination of the synthesized data and real data from the training set. The training set is split into training and validation sets, where the validation set contains only real data for evaluation purposes. After training, the model can be evaluated using various test sets.
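A minimal version of the data-mixing step depicted in Figure 5 can be written with standard PyTorch utilities; the tensors below are random placeholders for real and synthesized images, and the 80/20 split is an arbitrary illustrative choice.

```python
import torch
from torch.utils.data import ConcatDataset, DataLoader, TensorDataset

# Random placeholders standing in for real and generator-synthesized images/labels.
real_imgs, real_labels = torch.randn(200, 1, 64, 64), torch.randint(0, 2, (200,))
synth_imgs, synth_labels = torch.randn(400, 1, 64, 64), torch.randint(0, 2, (400,))

real_train = TensorDataset(real_imgs[:160], real_labels[:160])
real_val = TensorDataset(real_imgs[160:], real_labels[160:])   # validation stays real-only
synthetic = TensorDataset(synth_imgs, synth_labels)

# The downstream network e is trained on real + synthetic, validated on real only.
train_loader = DataLoader(ConcatDataset([real_train, synthetic]), batch_size=16, shuffle=True)
val_loader = DataLoader(real_val, batch_size=16)
```

Keeping the validation and test sets purely real, as in Figure 5, allows the contribution of the synthetic samples to be measured without bias.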
Figure 6. Comparison between synthesized MRIs generated by a vanilla VAE and a Hamiltonian VAE [159]. Both models were trained on a limited training set of 100 images from the BraTS2020 Challenge dataset. The first row showcases original images, while the second and third rows present synthesized images generated by the VAE and the Hamiltonian VAE, respectively. While the images generated by both models appear slightly fuzzy, the Hamiltonian VAE demonstrates enhanced performance in generating realistic images. This comparison highlights the robustness of the VAE and Hamiltonian VAE for generating new images from a small dataset [158].
Table 1. Overview of GAN-based architectures for medical image augmentation, including the hybrid status of each architecture (if applicable), indicating the combinations of VAEs, GANs, and DMs used.
Reference | Architecture | Hybrid Status | Dataset | Modality | 3D | Eval. Metrics
Classification
[42]DCGAN, ACGAN PrivateCT Sens., Spec.
[41]DCGAN, WGAN BraTS2016MR Acc.
[49]PGGAN, MUNIT BraTS2016MRAcc., Sens., Spec.,
[50]AE-GANHybrid (V + G)BraTS2018, ADNIMRMMD, MS-SSIM
[51]ICW-GAN OpenfMRI, HCPMRAcc., Prec., F1
NeuroSpin, IBC Recall
[52]ACGAN IEEE CCXX-ray Acc., Sens., Spec.
Prec., Recall, F1
[53]PGGAN BraTS2016MR Acc., Sens., Spec.
[54]ANT-GAN BraTS2018MR Acc.
[55]MG-CGAN LIDC-IDRICT Acc., F1
[56]FC-GANHybrid (V + G)ADHD, ABIDEMR Acc., Sens., Spec., AUC
[57]TGAN PrivateUltrasound Acc., Sens., Spec.
[58]AAE PrivateMR Prec., Recall, F1
[59]DCGAN, InfillingGAN DDSMCT LPIPS, Recall
[60]SAGAN COVID-CT, SARS-COV2CT Acc.
[61]StyleGAN PrivateMR -
[62]DCGAN PPMIMR Acc., Spec., Sens.
[63]TMP-GAN CBIS-DDMS, PrivateCT Prec., Recall, F1, AUC
[64]VAE-GANHybrid (V + G)PrivateMR Acc., Sens., Spec.
[65]CounterSynth UK Biobank, OASISMRAcc., MSE, SSIM, MAE
Segmentation
[43]CGAN DRIVEFundus photography KLD, F1
[66]DCGAN SCRX-ray Dice, Hausdorff
[67]CB-GAN BraTS2015MR Dice, Prec., Sens.
[68]Pix2Pix BraTS2015, ADNIMRDice
[12]CycleGAN NIHPCTCT Dice
[69]CM-GAN PrivateMR KLD, Dice
Hausdorff
[70]CGAN COVID-CTCT FID, PSNR, SSIM, RMSE
[71]Red-GAN BraTS2015, ISICMR Dice
[44]Pix2Pix, SPADE, CycleGAN PrivateMR Dice
[72]StyleGAN LIDC-IDRICT Dice, Prec., Sens.
[73]DCGAN, GatedConv PrivateX-ray MAE, PSNR, SSIM, FID, AUC
Cross-modal translation
[74]CycleGAN PrivateMR ↔ CTDice
[75]CycleGAN PrivateMR → CT MAE, PSNR
[76]Pix2Pix ADNI, PrivateMR → CTMAE, PSNR, Dice
[77]MedGAN PrivatePET → CT SSIM, PSNR, MSE
VIF, UQI, LPIPS
[47]pGAN, CGAN BraTS2015, MIDAS, IXIT1 ⟷ T2 SSIM, PSNR
[69]CM-GAN PrivateMR KLD, Dice
Hausdorff
[46]mustGAN IXI, ISLEST1 ↔ T2 ↔ PD SSIM, PSNR
[78]CAE-ACGANHybrid (V + G)PrivateCT → MRPSNR, SSIM, MAE
[79]GLA-GAN ADNIMR → PET SSIM, PSNR, MAE
Acc., F1
Other
[80]VAE-CGANHybrid (V + G)ACDCMR-
Note: V = variational autoencoders, G = generative adversarial networks.
Table 2. Overview of VAE-based architectures for medical image augmentation, including hybrid status of architectures (if applicable), indicating the combination of VAEs and GANs used in each study.
Reference | Architecture | Hybrid Status | Dataset | Modality | 3D | Eval. Metrics
Classification
[81]ICVAE PrivateMR Acc., Sens., Spec.
Ultrasound Dice, Hausdorff,
[51]CVAE OpenfMRI, HCPMRAcc., Prec., F1
NeuroSpin, IBC Recall
[82]GA-VAE ADNI, AIBLMRAcc., Spec., Sens.
[85]MAVENsHybrid (V + G)APCXRX-ray FID, F1
[61]IntroVAEHybrid (V + G)PrivateMR -
[86]DR-VAE HCPMR -
[64]VAE-GANHybrid (V + G)PrivateMR Acc., Sens., Spec.
[87]VAE PrivateMR Acc.
[88]RH-VAE OASISMRAcc.
Segmentation
[89]VAE-GANHybrid (V + G)PrivateUltrasound MMD, 1-NN, MS-SSIM
[90]AL-VAEHybrid (V + G)PrivateOCT 1 MMD, MS, WD
[83]PA-VAEHybrid (V + G)PrivateMRPSNR, SSIM, Dice
NMSE, Jacc.,
Cross-modal translation
[78]CAE-ACGANHybrid (V + G)PrivateCT → MRPSNR, SSIM, MAE
[91]3D-UDA PrivateFLAIR ↔ T1 ↔ T2SSIM, PSNR, Dice
Other
[92]CVAE ACDC, PrivateMR-
[92]CVAE PrivateMRDice, Hausdorff
[93]Slice-to-3D-VAE HCPMRMMD, MS-SSIM
[94]GS-VDAE MLSPMR Acc.
[80]VAE-CGANHybrid (V + G)ACDCMR-
[95]MM-VAE UK BiobankMRMMD
[96]DM-VAE PrivateOtoscopy -
1 OCT stands for “esophageal optical coherence tomography”. V = variational autoencoders, G = generative adversarial networks.
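For readers less familiar with the VAE family summarized above, the minimal sketch below illustrates the mechanism these variants share: encoding to a Gaussian posterior, reparameterized sampling, decoding, and, at augmentation time, decoding latent vectors drawn from the prior. The architecture and layer sizes are arbitrary placeholders and are not taken from any of the reviewed papers.

```python
import torch
import torch.nn as nn

class TinyVAE(nn.Module):
    """Deliberately small VAE shown only to illustrate the common mechanism of
    the Table 2 variants; all dimensions are illustrative assumptions."""

    def __init__(self, in_dim=64 * 64, latent_dim=32):
        super().__init__()
        self.latent_dim = latent_dim
        self.encoder = nn.Sequential(nn.Flatten(), nn.Linear(in_dim, 256), nn.ReLU())
        self.to_mu = nn.Linear(256, latent_dim)
        self.to_logvar = nn.Linear(256, latent_dim)
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(),
                                     nn.Linear(256, in_dim), nn.Sigmoid())

    def forward(self, x):
        h = self.encoder(x)
        mu, log_var = self.to_mu(h), self.to_logvar(h)
        z = mu + torch.exp(0.5 * log_var) * torch.randn_like(mu)  # reparameterization trick
        # Training minimizes a reconstruction term plus the KL term
        # -0.5 * sum(1 + log_var - mu**2 - exp(log_var)), i.e., the negative ELBO.
        return self.decoder(z), mu, log_var

    @torch.no_grad()
    def sample(self, n):
        z = torch.randn(n, self.latent_dim)   # draw from the standard normal prior
        return self.decoder(z)                # flattened synthetic images for augmentation
```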
Table 3. Overview of the diffusion-model-based architectures for medical image augmentation published to date (to our knowledge, no such studies were released before 2022). The table lists the reference, architecture name, and hybrid status (if applicable), indicating the combination of VAEs, GANs, and DMs used in each study.
Reference | Architecture | Hybrid Status | Dataset | Modality | 3D | Eval. Metrics
Classification
[97] | CLDM | | UK Biobank | MR | ✓ | FID, MS-SSIM
[106] | DDPM | | ICTS | MR | ✓ | MS-SSIM
[107] | LDM | | CXR8 | X-ray | | AUC
[108] | MF-DPM | | TCGA | Dermoscopy | | Recall
[109] | RoentGen | Hybrid (D + V) | MIMIC-CXR | X-ray | | Acc.
[110] | IITM-Diffusion | | BraTS2020 | MR | | -
[111] | DALL-E2 | | Fitzpatrick | Dermoscopy | | Acc.
[112] | CDDPM | | ADNI | MR | ✓ | MMD, MS-SSIM, FID
[113] | DALL-E2 | | Private | X-ray | | -
[114] | DDPM | | OPMR | MR | ✓ | Acc., Dice
[115] | LDM | | MaCheX | X-ray | | MSE, PSNR, SSIM
Segmentation
[116] | DDPM | | ADNI, MRNet, LIDC-IDRI | MR, CT | | Dice
[101] | brainSPADE | Hybrid (V + G + D) | SABRE, BraTS2015, OASIS, ABIDE | MR | | Dice, Acc., Prec., Recall
[110] | IITM-Diffusion | | BraTS2020 | MR | | -
Cross-modal translation
[117] | SynDiff | Hybrid (D + G) | IXI, BraTS2015, MRI-CT-PTGA | CT → MR | | PSNR, SSIM
[118] | UMM-CSGM | | BraTS2019 | FLAIR ↔ T1 ↔ T1c ↔ T2 | | PSNR, SSIM, MAE
[103] | CDDPM | | MRI-CT-PTGA | CT ↔ MR | | PSNR, SSIM
Other
[119] | DDM | | ACDC | MR | ✓ | PSNR, NMSE, Dice
Note: V = variational autoencoders, G = generative adversarial networks, D = diffusion models.
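As a brief reminder of the mechanism underlying the diffusion models listed above, the sketch below implements the closed-form forward (noising) process of a DDPM; a denoising network trained to invert this process is then sampled, starting from pure noise, to produce synthetic images. The beta schedule and tensor shapes are illustrative assumptions, not the configuration of any specific study.

```python
import torch

# Example linear beta schedule over T = 1000 steps (assumed values, for illustration).
T = 1000
betas = torch.linspace(1e-4, 0.02, T)
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)

def forward_diffuse(x0, t, alphas_cumprod=alphas_cumprod):
    """Closed-form forward (noising) step of a DDPM, q(x_t | x_0):
    x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps, eps ~ N(0, I).
    A denoising network is trained to predict eps from (x_t, t)."""
    a_bar = alphas_cumprod[t].view(-1, 1, 1, 1)   # alpha_bar_t for each sample in the batch
    eps = torch.randn_like(x0)
    x_t = torch.sqrt(a_bar) * x0 + torch.sqrt(1.0 - a_bar) * eps
    return x_t, eps
```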
Table 4. Summary of the datasets used in the reviewed publications on deep generative models, organized by modality and anatomy. For each dataset, availability is indicated as public, private, or under certain conditions (UC); a public link is provided where available.
Abbreviation | Reference | Availability | Dataset | Modality | Anatomy
ADNI | | UC | Alzheimer's Disease Neuroimaging Initiative | MR, PET | Brain
BraTS2015 | | Public | Brain Tumor Segmentation Challenge | MR | Brain
BraTS2016 | | Public | Brain Tumor Segmentation Challenge | MR | Brain
BraTS2017 | | Public | Brain Tumor Segmentation Challenge | MR | Brain
BraTS2019 | | Public | Brain Tumor Segmentation Challenge | MR | Brain
BraTS2020 | | Public | Brain Tumor Segmentation Challenge | MR | Brain
IEEE CCX | | Public | IEEE COVID Chest X-ray dataset | X-ray | Lung
UK Biobank | | UC | UK Biobank | MR | Brain, Heart
NIHPCT | | Public | National Institutes of Health Pancreas-CT dataset | CT | Kidney
DataDecathlon | | Public | Medical Segmentation Decathlon dataset | CT | Liver, Spleen
MIDAS | [124] | Public | Michigan Institute for Data Science | MR | Brain
IXI | | Public | Information eXtraction from Images dataset | MR | Brain
DRIVE | [125] | Public | Digital Retinal Images for Vessel Extraction | Fundus photography | Retinal fundus
ACDC | [126] | Public | Automated Cardiac Diagnosis Challenge | MR | Heart
MRI-CT PTGA | [105] | Public | MRI-CT part of the Gold Atlas project | CT, MR | Pelvis
ICTS | [50] | Public | National Taiwan University Hospital's Intracranial Tumor Segmentation dataset | MR | Brain
CXR8 | [127] | Public | ChestX-ray8 | X-ray | Lung
C19CT | | Public | COVID-19 CT segmentation dataset | CT | Lung
TCGA | | Private | The Cancer Genome Atlas Program | Microscopy | -
UKDHP | [128] | UC | UK Digital Heart Project | MR | Heart
SCR | [129] | Public | SCR database: Segmentation in Chest Radiographs | X-ray | Lung
HCP | [130] | Public | Human Connectome Project dataset | MR | Brain
AIBL | | UC | Australian Imaging Biomarkers and Lifestyle Study of Ageing | MR, PET | Brain
OpenfMRI | | Public | OpenfMRI | MR | Brain
IBC | | Public | Individual Brain Charting | MR | Brain
NeuroSpin | | Private | Institut des sciences du vivant Frédéric Joliot | MR | Brain
OASIS | | Public | The Open Access Series of Imaging Studies | MR | Brain
APCXR | [131] | Public | The anterior-posterior Chest X-ray dataset | X-ray | Lung
Fitzpatrick | [132] | Public | Fitzpatrick17k dataset | Dermoscopy | Skin
ISIC | | Public | The International Skin Imaging Collaboration dataset | Dermoscopy | Skin
DDSM | | Public | The Digital Database for Screening Mammography | CT | Breast
CBIS-DDSM | | Public | Curated Breast Imaging Subset of DDSM | CT | Breast
LIDC-IDRI | | Public | The Lung Image Database Consortium (LIDC) and Image Database Resource Initiative (IDRI) | CT | Lung
COVID-CT | [133] | Public | - | CT | Lung
SARS-COV2 | [134] | Public | - | CT | Lung
MIMIC-CXR | [135] | Public | Massachusetts Institute of Technology | X-ray | Lung
PPMI | | Public | Parkinson's Progression Markers Initiative | MR | Brain
ADHD | | Public | Attention Deficit Hyperactivity Disorder | MR | Brain
MRNet | | Public | MRNet dataset | MR | Knee
MLSP | | Public | MLSP 2014 Schizophrenia Classification Challenge | MR | Brain
SABRE | [136] | Public | The Southall and Brent Revisited cohort | MR | Brain, Heart
ABIDE | | Public | The Autism Brain Imaging Data Exchange | MR | Brain
OPMR | [137] | Public | Open-source prostate MR data | MR | Pelvis
MaCheX | [115] | Public | Massive Chest X-ray Dataset | X-ray | Lung
Table 5. Summary of the quantitative measures used in the reviewed publications.
Abbrv. | Reference | Metric Name | Description
Dice | [143] | Sørensen–Dice coefficient | A measure of the similarity between two sets, calculated as twice the size of their intersection divided by the sum of their sizes
Hausdorff | [144] | Hausdorff distance | A measure of the similarity between two sets of points in a metric space, given by the greatest distance from a point in one set to the closest point in the other
FID | [100] | Fréchet inception distance | A measure of the distance between the distributions of features extracted from real and generated images, based on the activations of a pretrained Inception model
IS | [142] | Inception score | A measure of the quality and diversity of generated images, based on the class predictions of a pretrained Inception model
MMD | [145] | Maximum mean discrepancy | A measure of the difference between two probability distributions, defined as the largest difference between their means over a class of test functions
1-NN | [146] | 1-nearest neighbor score | A two-sample test that classifies each sample by its nearest neighbor in the pooled real and generated data; an accuracy close to 50% indicates similar distributions
(MS-)SSIM | [139] | (Multi-scale) structural similarity | A measure of the similarity between two images based on their structural information, taking into account luminance, contrast, and structure
MS | [147] | Mode score | A measure of the quality of samples generated by probabilistic generative models, based on the difference in maximum mean discrepancies between a reference distribution and the simulated distribution
WD | [148] | Wasserstein distance | A measure of the distance between two probability distributions, defined as the minimum amount of work required to transform one distribution into the other
PSNR | [138] | Peak signal-to-noise ratio | A measure of image or video quality, based on the ratio between the maximum possible power of a signal and the power of the noise that distorts it
(N)MSE | - | (Normalized) mean squared error | A measure of the average squared difference between the predicted and actual values
Jacc. | [143] | Jaccard index | A measure of the overlap between two sets, calculated as the ratio of the size of their intersection to the size of their union
MAE | - | Mean absolute error | A measure of the average magnitude of the errors between the predicted and actual values
AUC | [149] | Area under the curve | A measure of the performance of a binary classifier, calculated as the area under the receiver operating characteristic curve
LPIPS | [141] | Learned perceptual image patch similarity | A metric that measures the distance between two images in a perceptual space defined by the activations of a deep CNN
KLD | [150] | Kullback–Leibler divergence | A measure of the difference between two probability distributions; a smaller KL divergence indicates greater similarity
VIF | [151] | Visual information fidelity | A measure that quantifies the Shannon information shared between the reference and the distorted image
UQI | [152] | Universal quality index | A measure of the quality of restored images, based on the correlation between the original and restored images
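To make a few of these definitions concrete, the snippet below implements the simpler, closed-form measures from the table (Dice, PSNR, MAE, and KL divergence) with NumPy; metrics that rely on pretrained networks (FID, IS, LPIPS) are omitted. The function names and normalization choices are ours, for illustration only.

```python
import numpy as np

def dice(a, b):
    """Sørensen–Dice coefficient between two binary masks: 2|A ∩ B| / (|A| + |B|)."""
    a, b = a.astype(bool), b.astype(bool)
    denom = a.sum() + b.sum()
    return 2.0 * np.logical_and(a, b).sum() / denom if denom else 1.0  # both empty -> perfect match

def psnr(ref, test, data_range=1.0):
    """Peak signal-to-noise ratio: 10 * log10(MAX^2 / MSE)."""
    mse = np.mean((ref - test) ** 2)
    return 10.0 * np.log10((data_range ** 2) / mse) if mse else np.inf

def mae(ref, test):
    """Mean absolute error between two images."""
    return float(np.mean(np.abs(ref - test)))

def kl_divergence(p, q, eps=1e-12):
    """Kullback–Leibler divergence KL(p || q) between two discrete distributions."""
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log(p / q)))
```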