Review

Generative Models for Medical Image Creation and Translation: A Scoping Review

by Haowen Pang 1,2,†, Tiande Zhang 1,3,†, Yanan Wu 4, Shannan Chen 1,5, Wei Qian 1,5, Yudong Yao 6, Chuyang Ye 2, Patrice Monkam 1,5,* and Shouliang Qi 1,5,*

1 College of Medicine and Biological Information Engineering, Northeastern University, Shenyang 110016, China
2 School of Integrated Circuits and Electronics, Beijing Institute of Technology, Beijing 100081, China
3 School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China
4 School of Health Management, China Medical University, Shenyang 110042, China
5 Key Laboratory of Intelligent Computing in Medical Image, Ministry of Education, Northeastern University, Shenyang 110819, China
6 Department of Electrical and Computer Engineering, Stevens Institute of Technology, Hoboken, NJ 07030, USA
* Authors to whom correspondence should be addressed.
† These authors contributed equally to this work.
Sensors 2026, 26(3), 862; https://doi.org/10.3390/s26030862
Submission received: 15 December 2025 / Revised: 20 January 2026 / Accepted: 26 January 2026 / Published: 28 January 2026

Abstract

Generative models play a pivotal role in the field of medical imaging. This paper provides an extensive and scholarly review of the application of generative models in medical image creation and translation. In the creation aspect, the goal is to generate new images based on potential conditional variables, while in translation, the aim is to map images from one or more modalities to another, preserving semantic and informational content. The review begins with a thorough exploration of a diverse spectrum of generative models, including Variational Autoencoders (VAEs), Generative Adversarial Networks (GANs), Diffusion Models (DMs), and their respective variants. The paper then delves into an insightful analysis of the merits and demerits inherent to each model type. Subsequently, a comprehensive examination of tasks related to medical image creation and translation is undertaken. For the creation aspect, papers are classified based on downstream tasks such as image classification, segmentation, and others. In the translation facet, papers are classified according to the target modality. A chord diagram depicting medical image translation across modalities, including Magnetic Resonance Imaging (MRI), Computed Tomography (CT), Cone Beam CT (CBCT), X-ray radiography, Positron Emission Tomography (PET), and ultrasound imaging, is presented to illustrate the direction and relative quantity of previous studies. Additionally, the chord diagram of MRI image translation across contrast mechanisms is also provided. The final section offers a forward-looking perspective, outlining prospective avenues and implementation guidelines for future research endeavors.

1. Background

In recent years, deep learning has gained widespread prominence in medical image analysis [1,2,3,4,5,6,7]. Within the scope of this review, we focus on one of the most compelling applications of deep learning: generative AI in medical imaging, a dynamic and rapidly advancing field of research. The rapid advancement of deep learning and computer vision over the past few decades has had profound implications across a wide range of applications, with the field of medical image generation significantly benefiting from these developments [8,9,10,11,12,13,14].
In this review, we focus on the application of generative models to medical image creation and modality translation. As shown in Figure 1, image creation aims to generate new images based on potential conditional variables. In deep learning-based image generation, a large dataset of real images is typically used to train the model initially. Subsequently, random noise or conditional inputs are utilized to produce new images. This approach is primarily employed to address challenges in medical imaging, such as data scarcity, insufficient annotations, and severe class imbalances, which are common obstacles in training robust deep learning models [15,16].
Imaging modalities, including Magnetic Resonance Imaging (MRI), Computed Tomography (CT), and Positron Emission Tomography (PET), are commonly used in clinical workflow, each providing unique structural, functional, and metabolic information [17]. Image translation aims to map images from one or more modalities to another while preserving semantic and informational content. The primary goal of medical image translation is to optimize clinical workflows, particularly in situations where traditional imaging methods are impractical due to constraints related to time, labor, or cost [18].
In this review, the terms generation, creation, synthesis, and translation are used with specific distinctions to ensure conceptual clarity. Generation is used as an umbrella term referring to all processes in which generative models are employed to produce medical images. Creation denotes the generation of new images without a direct one-to-one correspondence to existing source images. Synthesis refers to the production of images under explicit clinical or modality-related constraints, emphasizing anatomical plausibility and diagnostic relevance. Translation describes the process of transforming images from one modality or representation into another.
In this context, several key questions guide our investigation: What are the latest advancements in generative models for medical image creation and cross-modality translation? How do different generative model architectures, such as generative adversarial networks (GANs), variational autoencoders (VAEs), and diffusion models, perform in the context of medical imaging, and what are their respective strengths and limitations? Furthermore, how do advanced optimization strategies, including adversarial training, uncertainty modeling, and gradient perturbation, contribute to improving the fidelity, realism, and clinical utility of generated medical images? Finally, what are the primary evaluation metrics used to assess the quality, anatomical accuracy, and clinical applicability of these generated images, and how do these metrics align with the standards of real-world medical practice? By addressing these questions, this review aims to provide a comprehensive understanding of the current state of generative models in medical imaging, highlight emerging trends, and identify areas for future research that could further enhance the capabilities and clinical integration of these powerful technologies.
In this review, we categorize the relevant literature according to its respective applications and thoroughly examine their clinical implications. Furthermore, we explore recent trends and potential future directions in the field. To summarize, the primary contributions of our work are as follows:
1. This review provides a thorough examination of three widely employed generative models: VAEs, GANs, and diffusion models (DMs). We outline algorithms within these model families that have found extensive application in medical image analysis and provide analyses thereof.
2. This review categorizes the applications of generative models in medical image analysis into creation and translation. We present an extensive review of creation methods and classify their downstream applications into three distinct categories: classification, segmentation, and others. We classify translation methods based on the target modality.
3. This review organizes previous studies into categories and offers practical implementation guidelines gleaned from the lessons learned in these works.
The architecture of this review is shown in Figure 2. In Section 2, we provide a comprehensive comparison with related works. In Section 3, we introduce the literature search methods and analyze publication trends for generative models. In Section 4, we introduce the three most widely used generative models, VAEs, GANs, and DMs, and their variants. In Section 5, we review medical image creation and classify the literature according to downstream tasks. In Section 6, we review medical image translation and classify the literature according to target modalities. In Section 7, we summarize the application of generative models in medical image creation and translation and provide implementation guidelines, as well as limitations and directions for future research.

2. Related Works

Numerous studies have reviewed the application of generative models in medical image analysis, reflecting the rapid development and growing interest in this field. Yi et al. [19] conducted an early review of the applications of GANs in medical image analysis, covering research up to October 2018. Similar to the work of Yi et al. [19], Kazeminia et al. [20] extended this work by reviewing the applications of GANs in medical image analysis up to October 2019. Their review comprehensively categorized the use of GANs across various tasks, including medical image synthesis, segmentation, reconstruction, detection, denoising, registration, and classification.
Beyond GAN-specific reviews, Wang et al. [18] provided a broader perspective by examining deep learning-based methods for medical image translation, highlighting advancements in cross-modality image synthesis. Dayarathna et al. [17] conducted a comprehensive survey on deep learning-based medical image translation, covering research from 2018 to 2023. Their review focused on the generation of pseudo-CT, MRI, and PET images, providing a detailed overview of synthetic contrasts in medical imaging. Additionally, they summarized the most frequently employed deep learning architectures for medical image synthesis, highlighting key methodologies and their applications in cross-modality image generation.
Additionally, several studies have focused specifically on the role of GANs in medical image augmentation. Chen et al. [21], Goceri et al. [15], and Kebaili et al. [16] conducted a comprehensive and systematic review and analysis of GAN-based medical image augmentation work. Osuala et al. [22] reviewed the application of image synthesis and adversarial networks in the field of cancer imaging. Zhao et al. [23] summarized the application of GAN based on attention mechanisms in tasks such as medical image segmentation, synthesis, and detection.
While these reviews provide valuable insights into the use of GANs and related techniques across diverse medical imaging tasks, there remains a significant gap in the literature. To date, no comprehensive review focuses exclusively on the application of deep learning-based generative models for medical image creation and cross-modality translation. Given the increasing complexity of modern generative architectures, such as diffusion models, VAEs, and transformer-based models, and their transformative potential in medical imaging, a dedicated review in this area is both timely and necessary. This work aims to address this gap by systematically analyzing recent advancements in deep learning-based generative models for medical image creation and translation, with a focus on their clinical relevance, methodological innovations, and future research directions.

3. Methodology

This review was conducted and reported in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Scoping Reviews (PRISMA-ScR) guidelines. In line with the scope of a scoping review, no formal risk-of-bias assessment was performed. We conducted a rigorous and comprehensive literature search across multiple well-established academic databases, including Web of Science Core Collection, IEEE Xplore Digital Library, ScienceDirect, SpringerLink, and Google Scholar, to ensure the inclusion of high-quality and diverse studies on generative models for medical image creation and translation. Our search strategy was meticulously designed to capture a broad spectrum of relevant research while maintaining precision and relevance. We employed a combination of targeted keywords and phrases, such as “generative models,” “medical image synthesis,” “GAN,” “diffusion models,” and “image translation,” using Boolean operators (e.g., AND, OR) to construct complex search queries that enhanced the sensitivity and specificity of the search. To maintain a contemporary focus, we restricted the search to peer-reviewed articles published between 2018 and 2023, thereby reflecting the latest advancements and emerging trends in the field. Preprint papers were deliberately excluded from our analysis due to the absence of rigorous peer-review processes, ensuring that only validated and credible research findings were considered.
The selection process was conducted in multiple stages to uphold methodological rigor and reduce potential bias. Initially, we performed a broad screening of article titles and abstracts to identify studies potentially meeting our inclusion criteria. This was followed by a comprehensive full-text review of shortlisted articles, which was independently performed by two authors. Studies were included if they explicitly applied generative models to medical image generation, provided detailed descriptions of model architectures and training methodologies, and quantitatively evaluated model performance using established metrics. We excluded articles that lacked methodological transparency, focused solely on theoretical aspects without empirical validation, or addressed non-medical applications. Any disagreements between the two reviewers were resolved through discussion, with unresolved discrepancies adjudicated by a third author to ensure consensus and objectivity. This multi-step screening approach minimized selection bias and enhanced the reliability and reproducibility of our study identification process.
In this review, we exclude the applications of generative models in medical image denoising, reconstruction, super-resolution, registration, etc. This review focuses on modalities primarily used for clinical diagnosis, such as CT, MRI, X-Ray, and PET. These modalities can non-invasively obtain images of entire organs or systems, aiding in clinical diagnosis and treatment monitoring. In this review, we exclude imaging modalities used for studying the microscopic structure of cells and tissues, such as histology and fluorescence microscopy. These modalities are common in pathology, cell biology, and molecular biology research, and are mainly used in laboratory settings to study features at the cellular and molecular levels. They rely on tissue sections and staining techniques, making them suitable for detailed observation at the cellular and tissue levels.
Through this meticulous and systematic screening, we identified a total of 232 articles that met all predefined inclusion criteria. These articles were incorporated into our review, providing a more extensive and structured analysis compared to prior surveys on the same topic [18,19,21,23,24]. Our systematic approach not only ensures comprehensive coverage of the literature but also facilitates a critical examination of the methodologies, optimization strategies, and clinical implications of generative models in medical imaging. To provide a clear visual representation of our search and selection process, Figure 3 presents a detailed flowchart outlining each stage, from the initial identification of studies to the final inclusion. This figure also illustrates the distribution of the selected articles across different model architectures and medical imaging applications, offering valuable insights into current research trends and gaps in the field.

4. Generative Models

Generative models are designed to learn the underlying distribution of a given dataset in order to generate new data points that resemble the original dataset [25]. These models can generate new data samples that are similar to, but not identical to, the training data. Popular generative models include VAEs, GANs, and DMs.

4.1. Variational Autoencoder

VAEs [26] have already shown promise in generating complicated nature images [27,28,29] and medical images [30,31]. As shown in Figure 4, the VAE model comprises an encoder network that transforms input data into a latent space representation and a decoder network that reconstructs new data samples from this latent space. Unlike conventional autoencoders, VAEs learn a probabilistic representation of the input data, enabling them to generate novel data samples that closely resemble the original input data [32]. VAEs can generate new medical images that are similar to the original training data, which can be used to augment training datasets and improve the performance of machine learning models. However, the images generated by VAEs tend to be blurrier compared to those generated by other generative models like GANs. This is due to the inherent nature of the VAE’s probabilistic framework, which averages over many possible outputs.
In Figure 4, x represents the input image, which is fed into the encoder to obtain two sets of encodings: the mean encoding μ and the variance encoding σ. ϵ represents a random noise encoding. Combining the original encodings with the noise encoding through weighted allocation (z = μ + σ ⊙ ϵ, the reparameterization trick) yields a new latent code z, which is then passed to the decoder to reconstruct the original image.
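As a minimal, hedged illustration (not drawn from any reviewed work), the reparameterization step described above and the KL regularization term of the VAE objective can be sketched in NumPy, where `mu` and `log_var` stand in for the encoder outputs:

```python
import numpy as np

def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, I) (the reparameterization trick)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def kl_divergence(mu, log_var):
    """KL(q(z|x) || N(0, I)) for a diagonal Gaussian, summed over latent dimensions."""
    return -0.5 * np.sum(1.0 + log_var - mu**2 - np.exp(log_var), axis=-1)
```

Sampling z this way keeps the stochastic node outside the computation graph, so gradients can flow through mu and log_var during training.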

4.2. Generative Adversarial Network

As shown in Figure 5a, a GAN [33] consists of two neural networks trained in an adversarial manner: a generator that produces ‘fake’ data samples intended to be indistinguishable from real data, and a discriminator that learns to distinguish between generated and real samples [34]. The generator produces new data samples by transforming a low-dimensional input noise vector into a high-dimensional output space that resembles the original data. The discriminator is trained to differentiate between generated and real data. These two networks are optimized through a minimax game framework, wherein the generator aims to create data that can deceive the discriminator, while the discriminator strives to correctly classify the generated data as fake [14].
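The minimax objective above can be sketched with the standard binary cross-entropy formulation. The following is a framework-agnostic NumPy illustration, not code from any reviewed paper; it uses the non-saturating generator loss that most practical implementations substitute for the original minimax form:

```python
import numpy as np

def bce(p, target):
    """Binary cross-entropy for probabilities p in (0, 1)."""
    eps = 1e-12  # guards against log(0)
    return -np.mean(target * np.log(p + eps) + (1 - target) * np.log(1 - p + eps))

def discriminator_loss(d_real, d_fake):
    """D maximizes log D(x) + log(1 - D(G(z))): BCE with labels 1 for real, 0 for fake."""
    return bce(d_real, np.ones_like(d_real)) + bce(d_fake, np.zeros_like(d_fake))

def generator_loss(d_fake):
    """Non-saturating G loss: minimize -log D(G(z)), i.e., BCE with label 1 on fakes."""
    return bce(d_fake, np.ones_like(d_fake))
```

A confident discriminator (scores near 1 on real, near 0 on fake) yields a low discriminator loss; the generator loss is largest precisely when the discriminator rejects its samples.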
As shown in Figure 5, variants of GANs have been proposed to address some of the challenges of traditional GANs. For example, conditional GANs (cGANs) add an additional input layer to the generator and discriminator networks, allowing the generator to generate data that satisfies specific conditions, such as class labels or image attributes. Similarly, deep convolutional GANs (DCGANs) use convolutional layers to learn hierarchical features from image data, improving the quality of the generated images. For image translation, Pix2Pix [35] and CycleGAN [36] are the two most commonly used models, and currently, most medical image translation models are modified based on these two models [37,38,39]. GANs and their variants have shown remarkable success in various applications. However, they can be challenging to train and require careful tuning to avoid issues such as mode collapse and instability.

4.3. Diffusion Model

Diffusion models have been applied to various fields of image generation [34]. As shown in Figure 6, a diffusion model is a probabilistic generative model that gradually adds noise to the data to break its structure and then learns a corresponding reverse process to denoise it, thereby learning the distribution of the original data. The forward diffusion process incrementally adds noise to the input data, progressively increasing the noise level until the data is entirely transformed into pure Gaussian noise. This process systematically disrupts the underlying structure of the data distribution. The reverse diffusion process, often referred to as denoising, is then employed to reconstruct the original data structure from the perturbed distribution. This step effectively reverses the degradation introduced by the forward diffusion process. As a result, a highly flexible and tractable generative model is achieved, capable of accurately modeling complex data distributions starting from random noise [34].
Recently, diffusion models and their variants have been applied to medical image analysis, including medical image creation [40,41], translation [42], reconstruction [43,44], denoising [45], registration [46], classification [47], and segmentation [48,49].

4.4. Hybrid Generative Models

In addition to standalone generative paradigms, recent studies have increasingly explored hybrid generative models that integrate complementary mechanisms from multiple frameworks. These hybrid approaches aim to mitigate the inherent limitations of individual models while leveraging their respective strengths. Typical hybrid designs include diffusion–GAN or diffusion–autoencoder hybrids that employ diffusion processes for global structure modeling and adversarial losses for enhancing local realism [50]. Compared with standalone generative models, hybrid approaches often demonstrate improved perceptual quality and training stability, particularly in scenarios involving limited annotated data [51]. However, these advantages come at the cost of increased architectural complexity, higher computational requirements, and more challenging optimization procedures [52,53].

4.5. Training Stability and Computational Requirements

In summary, VAEs generally exhibit stable and well-behaved training dynamics with relatively modest computational requirements, but often suffer from limited image fidelity. In contrast, GANs are capable of producing high-quality images but are known to be sensitive to hyperparameter settings and prone to training instability, which may require careful optimization and increased computational overhead. Diffusion-based models demonstrate superior robustness during training and strong generation performance, albeit at the cost of substantially higher computational complexity and longer training and inference times. Hybrid approaches aim to balance these trade-offs by combining complementary strengths of different paradigms, though they often introduce additional architectural complexity.

5. Creation

Due to the inherent structural complexity and large parameter scale of deep learning models, a significant amount of labeled data is typically required for their effective training. The acquisition of labeled medical image data heavily depends on the subjective expertise and professional judgment of radiologists [54]. Additionally, it is susceptible to issues related to image quality, leading to significant challenges such as data scarcity, insufficient annotations, and pronounced class imbalances. These limitations significantly hinder the broader adoption of deep learning models and represent a critical obstacle in the development of deep learning-based medical diagnostic systems [15].
Medical image data augmentation serves as a technique employed to augment the quantity and diversity of available medical images for training machine learning models [16]. Traditional data augmentation techniques include methods such as image quality enhancement, adjustments to brightness or contrast, and geometric transformations like rotation, scaling, and deformation [15]. The ascendancy of deep learning-based generative models in the generation of data has garnered substantial attention. Within the domain of medical image analysis, the utilization of deep learning-based generative models for the generation of medical image data assumes paramount significance. This approach can simulate a substantial volume of challenging-to-obtain medical image data, effectively mitigating the adverse impact of data scarcity on the domain of medical image analysis [22].
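For contrast with the generative approaches that follow, the traditional augmentations mentioned above (geometric transformations and intensity adjustment) can be sketched in a few lines. This NumPy sketch assumes images are 2D arrays normalized to [0, 1]; the specific operations and ranges are illustrative choices, not prescriptions from the reviewed works:

```python
import numpy as np

def augment(image, rng):
    """Apply one randomly chosen traditional augmentation to an image in [0, 1]."""
    op = rng.integers(0, 3)
    if op == 0:
        return np.fliplr(image)                                 # geometric: horizontal flip
    if op == 1:
        return np.rot90(image)                                  # geometric: 90-degree rotation
    return np.clip(image + rng.uniform(-0.1, 0.1), 0.0, 1.0)   # intensity: brightness shift
```

Such transforms only recombine existing appearance; unlike generative models, they cannot introduce genuinely new anatomical or pathological variation.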
In this section, we summarize the application of generative models in medical image creation. We review the literature based on downstream tasks, namely classification, segmentation, and other tasks. As shown in Figure 7a, medical image data can be created for classification tasks: images of various classes are created from random noise, and the created data is then used to train a classification model. Figure 7b shows creation for segmentation tasks: medical images are created from segmentation masks, and the created data and masks are then used to train a segmentation model. Figure 7c shows creation for other downstream tasks, such as regression, object detection, and survival prediction.

5.1. Metrics of Medical Image Creation

In order to verify the performance of a proposed medical image creation method, metrics are needed to evaluate the similarity between generated and real images. Table 1 lists several commonly used image similarity evaluation metrics. P_r denotes the real image distribution and P_g the generated image distribution. p_M(y|x) denotes the label distribution of x as predicted by a pre-trained model M, with marginals p_M(y) = ∫ p_M(y|x) dP_g and p_M(y*) = ∫ p_M(y|x) dP_r for samples from the generated and real data distributions, respectively. Γ(P_r, P_g) denotes the set of all joint distributions (i.e., probabilistic couplings) whose marginals are P_r and P_g, and d(x_r, x_g) denotes the base distance between two samples.
The Inception Score (IS) uses a pre-trained Inception-v3 model and computes the exponential of the expected KL-divergence between the conditional class distribution p_M(y|x) of generated images and their marginal class distribution p_M(y), rewarding both confident predictions and overall diversity [55]. A higher IS indicates better image quality and diversity. The Mode Score (MS) extends the IS with a measure of similarity between the label distributions of generated and real samples. Kernel Maximum Mean Discrepancy (MMD) quantifies the difference between the probability distributions of generated and real samples using a fixed kernel function. The Wasserstein Distance (WD) measures the similarity between two distributions, where a smaller WD indicates greater similarity. The Fréchet Inception Distance (FID) computes the Wasserstein-2 distance between the distributions of feature vectors derived from generated and real images, utilizing a pre-trained Inception-v3 model for feature extraction [55]. A lower FID indicates better image quality.
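The FID reduces to the Fréchet (Wasserstein-2) distance between two Gaussians fitted to the Inception feature vectors of real and generated images. The distance itself can be sketched as follows; this is a hedged NumPy illustration that omits the Inception feature extraction and uses the identity Tr((C₁C₂)^{1/2}) = Tr((C₁^{1/2} C₂ C₁^{1/2})^{1/2}) to keep the matrix square root symmetric:

```python
import numpy as np

def _sqrtm_psd(m):
    """Matrix square root of a symmetric PSD matrix via eigendecomposition."""
    vals, vecs = np.linalg.eigh(m)
    vals = np.clip(vals, 0.0, None)  # clamp tiny negative eigenvalues from round-off
    return (vecs * np.sqrt(vals)) @ vecs.T

def frechet_distance(mu1, cov1, mu2, cov2):
    """Frechet distance between N(mu1, cov1) and N(mu2, cov2), as used by FID."""
    s1 = _sqrtm_psd(cov1)
    covmean = _sqrtm_psd(s1 @ cov2 @ s1)  # sqrt of C1^{1/2} C2 C1^{1/2}
    diff = mu1 - mu2
    return float(diff @ diff + np.trace(cov1 + cov2 - 2.0 * covmean))
```

In a full FID computation, mu and cov would be the sample mean and covariance of Inception-v3 pool features over each image set; identical distributions give a distance of zero.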

5.2. Classification

In recent years, there have been notable developments in medical image classification techniques, driven by advancements in deep learning algorithms [56]. However, several challenges and limitations remain. First, acquiring large-scale medical image datasets is often difficult due to privacy concerns, limited accessibility, and ethical constraints. Second, training medical image classification models necessitates the involvement of expert radiologists, pathologists, or clinicians to manually annotate the images with appropriate labels or categories. This annotation process is not only labor-intensive but also requires specialized expertise, creating significant barriers to effectively training deep learning models. Third, medical datasets frequently exhibit class imbalance, where certain disease categories are underrepresented compared to others, further complicating model training and evaluation [57]. Detecting rare diseases or conditions with limited training samples poses a challenge, as models tend to favor the majority classes during training.
In this section, we undertake a comprehensive review of the pertinent literature on medical image creation for classification. We compile essential information from the literature and present it in Table 2.
Table 2 provides a comprehensive overview of 27 literature sources, with most of them being based on GAN. Among these 27 sources, the highest number of publications is focused on the chest, with 14 of them specifically targeting chest-related studies. The most common application is in the generation of X-ray and CT images. Additionally, most of the literature is based on 2D models, possibly due to limitations in GPU memory.
Pesteie et al. [30] introduced a variational generative model to learn the probability distribution of image data conditioned on latent variables and corresponding labels. The trained model is employed to generate new images for data augmentation. The efficacy of this approach is demonstrated through its application to ultrasound images (US) of the spine and brain MRI. This model resulted in a notable enhancement in the accuracy of the classification task.
Salehinejad et al. [61] proposed a DCGAN to create chest X-rays. They utilized both real and created images to train a model for the detection of pathology across five classes of chest X-rays. A comparative analysis of DCNNs trained with a mixture of real and created images revealed that the model outperformed its counterparts trained exclusively with real images.
Pan et al. [40] proposed an image creation framework based on a diffusion model utilizing a Swin-transformer-based network. This model encompasses a forward Gaussian noise process and a reverse process employing the transformer-based diffusion model for denoising. COVID-19 classification models were trained using real images, created images, and combinations of both.
Applying generative model–based data augmentation to medical image classification has been extensively explored as a strategy to mitigate data scarcity and class imbalance. Existing studies suggest that generative models can approximate the underlying data distribution and produce samples that resemble real medical images, which may enhance classification performance when incorporated into the training set [83].
In particular, generative augmentation has been shown to be beneficial in scenarios involving limited training data or severe class imbalance [84]. Several studies report measurable improvements in accuracy and AUC when generated samples are used to augment minority classes, especially for rare disease categories or small-scale datasets. By increasing the effective sample size and improving class balance, generative models can help reduce bias toward majority classes during training. Beyond dataset expansion, generative models can introduce controlled intra-class variability by creating samples with diverse appearances, shapes, or textures. This diversity may facilitate the learning of more robust and discriminative features, thereby improving generalization to unseen data. Such benefits are more likely to be observed when the generated images are anatomically consistent and preserve clinically relevant label information. However, the effectiveness of generative model–based data augmentation is highly task- and data-dependent. Empirical evidence indicates that performance gains are not guaranteed. Several studies report marginal improvements or even performance degradation when generated images contain subtle artifacts, blur class-discriminative structures, or introduce distributional shifts relative to real data. These issues are particularly pronounced in fine-grained classification tasks, where minor anatomical differences carry critical diagnostic significance. Moreover, generative augmentation tends to offer limited benefits when sufficient real training data are available. In such cases, classifiers may overfit to generated patterns rather than learning robust representations from authentic clinical images. Importantly, increasing the proportion of generated data does not necessarily result in monotonic performance improvements; multiple studies have observed performance saturation or decline when generated samples dominate the training set.
Overall, generative model–based data augmentation should be viewed as a complementary tool rather than a universal solution for medical image classification. Its effectiveness depends on the quality of the generated samples, the characteristics of the target task, and the balance between generated and real data. Careful empirical validation is therefore essential to determine when generative augmentation provides meaningful performance gains and when it may compromise classification reliability.
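The saturation effect noted above suggests explicitly capping the synthetic share of the training set rather than adding all available generated samples. The sketch below is a minimal illustration of this idea, not a method from any reviewed study; the function name `build_training_set`, the 0.5 default cap, and the list-based dataset representation are assumptions for demonstration.

```python
import random

def build_training_set(real, synthetic, max_synth_fraction=0.5, seed=0):
    """Combine real and generated samples while capping the synthetic share.

    Several studies observe performance saturation or decline once generated
    samples dominate training, so the synthetic fraction is limited here.
    """
    if not 0.0 <= max_synth_fraction < 1.0:
        raise ValueError("max_synth_fraction must be in [0, 1)")
    rng = random.Random(seed)
    # Largest synthetic count n satisfying n / (len(real) + n) <= fraction.
    limit = int(len(real) * max_synth_fraction / (1.0 - max_synth_fraction))
    picked = rng.sample(synthetic, min(limit, len(synthetic)))
    combined = list(real) + picked
    rng.shuffle(combined)
    return combined
```

For example, with 100 real and 400 generated samples and the 0.5 cap, at most 100 generated samples are retained; validating several cap values empirically, as the text recommends, is the intended usage.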

5.3. Segmentation

Developing a medical image segmentation model necessitates the expertise of radiologists or clinical professionals to manually annotate the images, thereby establishing ground truth data that serves as a reference for training and evaluating the segmentation model [85]. Manual annotation is a time-intensive, subjective process reliant on expert knowledge, rendering the task of constructing extensive and diverse datasets a formidable endeavor.
Generative models can provide the paired images and masks required for training medical image segmentation models by synthesizing images from masks. This approach significantly mitigates the demand for annotated data. In this section, we embark on an exhaustive review of the pertinent literature, which we present comprehensively in Table 3.
Table 3 offers a comprehensive overview of 26 literature sources, the majority of which center on GANs. Similar to the emphasis on data creation for classification tasks, a significant number of publications focus on chest and lung-related topics. The most prevalent applications involve the generation of X-ray, CT, and ultrasound images. Likewise, most of the literature is based on 2D models.
Guo et al. [110] introduced a confidence-guided synthetic anatomic and molecular MR image network (CG-SAMR) that generates multi-modal MR images conditioned on lesion contour information. Additionally, they extended the proposed architecture to support training with unpaired data. The generated data proves valuable for data augmentation, especially in the context of images containing pathological information related to gliomas.
Zhang et al. [94] presented an improved Dense GAN for data augmentation. They harnessed the power of Dense GAN to generate CT images, facilitating effective semi-supervised segmentation.
Amirrajab et al. [95] proposed a method for generating cardiac MR images with plausible heart shapes and appearances to create labeled data. The approach dissects image generation into two tasks: label deformation and label-to-image translation. Label deformation is achieved through latent space interpolation within the VAE model, while label-to-image translation is accomplished using a conditional GAN.
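Latent space interpolation of the kind used for label deformation can be sketched in a few lines. The snippet below is illustrative only: linear interpolation and the plain-list latent representation are assumptions, and the actual scheme in [95] may differ (e.g., spherical interpolation). Decoding each intermediate code through the VAE decoder would yield deformed label maps between the two endpoint anatomies.

```python
def lerp_latent(z_a, z_b, t):
    """Linearly interpolate between two latent codes z_a and z_b (0 <= t <= 1)."""
    return [(1.0 - t) * a + t * b for a, b in zip(z_a, z_b)]

def interpolation_path(z_a, z_b, steps):
    """Return `steps` evenly spaced latent codes from z_a to z_b, inclusive.

    Each intermediate code would be decoded into a deformed label map,
    which a conditional GAN then translates into an image.
    """
    return [lerp_latent(z_a, z_b, k / (steps - 1)) for k in range(steps)]
```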

5.4. Other Tasks

In addition to classification and segmentation tasks, there are other tasks in the field of medical image analysis, such as regression, object detection, and survival prediction. Many data augmentation methods based on generative models have been proposed for these tasks. We compile essential information from the literature and present it in Table 4. We also collected some studies without a specified downstream task; these are likewise listed in Table 4.
Han et al. [112] introduced a 3D Multi-Conditional GAN (MCGAN) to generate nodules on lung CT images to enhance sensitivity in object detection. The MCGAN incorporates two discriminators: the context discriminator and the nodule discriminator. The results demonstrate that 3D CNN-based detection achieves increased sensitivity for nodules of any size or attenuation at fixed false positive rates, effectively addressing the scarcity of medical data by leveraging MCGAN-generated realistic nodules.
Kamli [113] proposed a Synthetic Medical Image Generator (SMIG) with the primary aim of generating MRI using a GAN to provide anonymized data. Furthermore, to predict the growth of glioblastoma multiforme tumors, the authors developed a tumor growth predictor and emphasized the significance of employing data generated by SMIG. Despite the limited size of the available public dataset, the results demonstrate useful accuracy in predicting tumor growth.
Li et al. [121] introduced DeepAnat, a method that generates high-quality T1-weighted images from diffusion MRI; the generated T1 images are then used for brain segmentation and to assist co-registration. This study underscores the advantages and practical feasibility of creating medical images to support various diffusion MRI data analyses and their utility in neuroscientific applications.

6. Translation

Medical image modality translation refers to the process of converting images from one or more modalities into different modalities [122], such as transforming from CT to MRI or from T1 and T2 to FLAIR. Medical image modality translation proves invaluable when medical imaging data is scarce or when patients cannot undergo specific imaging modalities due to medical or technical constraints. Modality translation empowers medical professionals and researchers to access more comprehensive information about a patient’s medical condition, enhancing the accuracy of diagnosis and treatment planning [18].
Conventional methods entail the utilization of models with predefined rules to effectuate the conversion of images from one modality to another. These models necessitate manual parameter adjustments to achieve optimal performance and are often tailored to specific applications, contingent upon the distinctive characteristics of the involved imaging modality [18]. Consequently, numerous intricate and application-specific techniques have been developed. However, these methods confront challenges when the two imaging modalities provide disparate information, rendering the establishment of an effective model a formidable undertaking.
In tandem with the advancement of deep learning, an increasing array of modality translation methods grounded in deep learning principles has emerged. Deep learning-based generative models, exemplified by GANs and diffusion models, have exhibited tremendous potential in the domain of medical image modality translation [20]. They excel by acquiring the capability to learn the mapping between different modalities and generating high-quality images.
In this section, we classify the collected modality translation literature according to the target modality. In Figure 8, the literature quantity is shown for translations between CT, MRI, CBCT, X-ray, PET, and ultrasound images. Ranked by number of studies, the source modalities are MRI (63), CT (18), CBCT (15), PET (7), X-ray (3), and US (1), and the target modalities are CT (78), MRI (18), PET (6), X-ray (4), and US (1). There are four kinds of translations worthy of attention. The first is the translation from MRI to CT (59 studies), primarily focusing on dose calculation for MRI-guided radiation therapy. The second is the translation from CT to MRI (13 studies), primarily aiming for more accurate segmentation. The third is the translation from CBCT to CT (12 studies), primarily serving the objectives of image denoising and dose calculation. The fourth is the translation from PET to CT, specifically for attenuation correction.
In addition, the translation between non-contrast images and contrast images has also been a research hotspot in recent years, and we will separately organize them in Section 6.7.

6.1. Metrics of Medical Image Translation

In order to verify the performance of the proposed modality translation method, it is necessary to use metrics to evaluate the similarity between the synthesized image and the real image. Table 5 lists several commonly used image similarity evaluation metrics.
Mean Absolute Error (MAE) provides a straightforward, easy-to-interpret measurement of error. It gives equal weight to all errors, regardless of their magnitude, making it less sensitive to outliers. Mean Squared Error (MSE) gives a more significant penalty to large errors compared to MAE, which can be desirable in some contexts. It is also widely used and mathematically convenient. Peak Signal-to-Noise Ratio (PSNR) is based on MSE and shares some of its limitations [123]. It does not always align with human perception, especially for complex images or artifacts like blockiness. Structural Similarity Index (SSIM) is designed to measure the similarity between two images in terms of luminance, contrast, and structure, which aligns better with human perception [124]. It is often considered more accurate than PSNR for evaluating image quality. In summary, MAE and MSE are simple and widely used metrics that are easy to compute but may not always align with human perception. PSNR is also widely used and easy to interpret, but may not correlate well with perceptual quality. SSIM, on the other hand, is more aligned with human perception but can be more computationally expensive. Choosing the right metric depends on the specific requirements of the application and the aspects of image quality that are most important.
In the equations in Table 5, x_i and y_i are the pixel values at position i in images x and y, respectively. MAX is the maximum possible pixel value. μ_x and μ_y are the mean values of images x and y, respectively; σ_x^2 and σ_y^2 are their variances; σ_xy is the covariance of images x and y. C_1 and C_2 are constants.
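The metrics in Table 5 can be computed directly from pixel values. The sketch below assumes 8-bit images flattened to plain Python lists, and uses the single-window form of SSIM given in Table 5; practical SSIM implementations instead average this quantity over local sliding windows, so values will differ on real images.

```python
import math

def mae(x, y):
    """Mean Absolute Error between two flattened images."""
    return sum(abs(a - b) for a, b in zip(x, y)) / len(x)

def mse(x, y):
    """Mean Squared Error; penalizes large errors more heavily than MAE."""
    return sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)

def psnr(x, y, max_val=255.0):
    """Peak Signal-to-Noise Ratio in dB, derived from MSE."""
    m = mse(x, y)
    return float("inf") if m == 0 else 10.0 * math.log10(max_val ** 2 / m)

def ssim_global(x, y, max_val=255.0):
    """Single-window SSIM (Table 5 form) with the usual K1=0.01, K2=0.03."""
    c1, c2 = (0.01 * max_val) ** 2, (0.03 * max_val) ** 2
    n = len(x)
    mu_x, mu_y = sum(x) / n, sum(y) / n
    var_x = sum((a - mu_x) ** 2 for a in x) / n
    var_y = sum((b - mu_y) ** 2 for b in y) / n
    cov = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    return ((2 * mu_x * mu_y + c1) * (2 * cov + c2)) / (
        (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2))
```

Identical images give MAE = MSE = 0, PSNR = ∞, and SSIM = 1, which is a quick sanity check when wiring up an evaluation pipeline.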

6.2. Generating MRI

6.2.1. Multi-Contrast MRI Translation

MRI stands as a non-invasive medical imaging technique utilizing a potent magnetic field and radio waves to generate intricate images of internal organs and tissues within the human body [38]. Various MRI modalities, including T1-weighted (T1w), T2-weighted (T2w), Diffusion-Weighted Imaging (DWI), Magnetic Resonance Angiography (MRA), and Fluid-Attenuated Inversion Recovery (FLAIR), offer distinctive characteristics and applications. In tumor analysis, T1-weighted scans excel at differentiating gray and white matter in brain images, while T2-weighted images enhance the contrast between fluid and cortical tissue. FLAIR sequences are particularly effective in suppressing cerebrospinal fluid signals, improving lesion visibility. T1 contrast-enhanced (T1ce) images are valuable for delineating tumor regions in brain scans. MRA is primarily used to evaluate vascular anatomy and detect abnormalities that may predispose to hemorrhages. Proton density (PD) images are widely utilized in radiology for inferring water content, aiding in lesion classification, and multispectral segmentation. The integration of these multimodal MRI scans provides complementary information, with each modality offering unique insights into the body’s internal structures and functions. Together, they deliver a comprehensive assessment of the patient’s condition [38].
In some cases, it may be difficult to collect complete modalities for medical image analysis due to factors such as the cost of long-term examinations and uncooperative patients, particularly children and the elderly [125]. In such situations, synthesizing missing or damaged modalities using successfully acquired modalities can improve the availability of diagnosis-related images and enhance analysis tasks such as classification and segmentation. In recent years, with the development of deep learning-based generative models, there has been an increasing amount of work on the translation between MRI modalities. Table 6 lists essential information about these works.
In Figure 9a, based on the number of studies, the primary translations can be ranked as T1-to-T2 (17), T2-to-T1 (13), T1-to-FLAIR (10), T2-to-PD (7), T2-to-FLAIR (6), PD-to-T2 (6), FLAIR-to-T1 (4), FLAIR-to-T2 (4), T1-to-PD (2), PD-to-T1 (2), T2-to-DWI (2), and DWI-to-T2 (1). In Figure 9b, according to the number of studies, the main translations can be ranked as (T1, FLAIR)-to-(T2, T1ce) (4), (T1, T1ce)-to-(T2, FLAIR) (4), (T1, T2)-to-(T1ce, FLAIR) (4), (T1ce, FLAIR)-to-(T1, T2) (4), (T2, FLAIR)-to-(T1, T1ce) (4), and (T2, T1ce)-to-(T1, FLAIR) (4). The fundamental objective of MRI image translation across contrast mechanisms is to avoid the acquisition of actual scans and provide the unavailable MRI modality necessary for diagnosis and treatment.
In most cases, as the number of source modalities increases, the model’s performance tends to improve. On the same dataset, the performance of multi-to-single translation is superior to single-to-single translation. This is because more modalities can provide complementary information to each other, leading to a more realistic target modality. However, when the number of source modalities remains the same, and the number of target modalities increases, no fixed trend in performance has been observed.
As shown in Table 6, there are several widely used datasets in cross-modality MRI translation, such as IXI, BraTS, and ISLES. The IXI dataset comprises nearly 600 MRIs obtained from normal and healthy subjects. The MRI acquisition protocol for each subject includes a comprehensive set of sequences: T1, T2, PD, MRA, and DWI. These data have been collected across three different hospitals. The Brain Tumor Segmentation (BraTS) dataset is a widely recognized and frequently used collection of medical images specifically designed for brain tumor research, particularly in the field of medical image analysis and machine learning. The dataset includes multimodal brain MRI scans, typically comprising T1, T1ce, T2, and FLAIR images. The Ischemic Stroke Lesion Segmentation (ISLES) challenge is dedicated to evaluating infarct segmentation in both acute and sub-acute stroke cases, leveraging multimodal MRI data. The inaugural ISLES challenge, held in 2015, was divided into two sub-challenges: Sub-acute Stroke Lesion Segmentation (SISS) and Stroke Perfusion Estimation (SPES). SISS aimed to segment subacute stroke lesions using conventional post-stroke MRI sequences, including T1, T2, FLAIR, and DWI. The ISLES 2018 challenge focused on predicting infarct core delineation in DWI using CT perfusion data. The primary objective of the ISLES 2022 challenge is to segment stroke lesions from DWI, ADC, and FLAIR sequences, with a dataset comprising 400 cases.
Currently, most methods are based on GANs, and most of these methods utilize 2D network architecture, possibly due to memory constraints. Furthermore, algorithms that require paired data for training are more prevalent than those that can use unpaired data because paired images can provide better supervision, leading to improved model performance.
Salman et al. [38] proposed pGAN and cGAN for multi-contrast MRI translation, leveraging conditional GANs. The proposed approach preserves intermediate-to-high frequency details through an adversarial loss, providing enhanced synthesis performance using pixel-wise and perceptual losses for registered multi-contrast images and a cycle-consistency loss for unregistered images.
Zhou et al. [133] introduced a Hybrid-fusion Network (Hi-Net) for multi-modal MRI translation, which learns a mapping from multi-modal source images to target images. In Hi-Net, a modality-specific network is employed to learn representations for each individual modality, and a fusion network is utilized to learn the common latent representation of multi-modal data. Subsequently, a multi-modal translation network is designed to densely combine the latent representation with hierarchical features from each modality, acting as a generator to synthesize the target images.
Muzaffer et al. [42] proposed SynDiff, employing an adversarial diffusion model for multi-contrast MRI translation. To capture a direct correlate of the image distribution, SynDiff utilizes a conditional diffusion process that progressively maps noise and source images onto the target image. For efficient and accurate image sampling during inference, large diffusion steps are taken with adversarial projections in the reverse diffusion direction.
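Diffusion-based translators such as SynDiff build on the standard DDPM forward process, which admits the closed form x_t = sqrt(ᾱ_t)·x_0 + sqrt(1 − ᾱ_t)·ε with ᾱ_t = ∏_{s≤t}(1 − β_s) and ε ~ N(0, I). The sketch below illustrates only this generic forward step on a toy vector; SynDiff's conditional and adversarial components are omitted, and the noise schedule in the test is illustrative.

```python
import math
import random

def ddpm_forward(x0, t, betas, rng):
    """Sample x_t ~ q(x_t | x_0) in closed form:
    x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps, eps ~ N(0, I).
    """
    alpha_bar = 1.0
    for s in range(t):  # alpha_bar_t = product of (1 - beta_s) up to step t
        alpha_bar *= 1.0 - betas[s]
    a, b = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    xt = [a * v + b * rng.gauss(0.0, 1.0) for v in x0]
    return xt, alpha_bar
```

As t grows, ᾱ_t tends to zero and x_t approaches pure Gaussian noise; the learned reverse process undoes these steps, conditioned on the source-modality image in translation settings.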

6.2.2. Generating MRI from Other Modalities

In this section, we summarize the papers on the translation from non-MRI modalities to MRI. The number of papers on CT-to-MRI is the highest. Table 7 lists essential information about these works.
Wang et al. [173] introduced a bidirectional learning model, denoted as dual contrast CycleGAN (DC-CycleGAN), designed to synthesize MRI from CT. Specifically, a dual contrast loss is incorporated into the discriminators to indirectly establish constraints between real source and synthetic images. This is achieved by leveraging samples from the source domain as negative samples, enforcing the synthetic images to diverge significantly from the source domain. Additionally, cross-entropy and the structural similarity index (SSIM) are integrated into the DC-CycleGAN to consider both the luminance and structure of samples during image translation.
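The cycle-consistency constraint underlying CycleGAN variants such as DC-CycleGAN can be written as an L1 penalty on round-trip reconstruction. Below is a toy sketch with simple functions standing in for the CT-to-MRI and MRI-to-CT generators; the dual contrast and SSIM terms of [173] are not included, and the batch-of-lists representation is an assumption for illustration.

```python
def cycle_consistency_loss(x_batch, y_batch, g_xy, g_yx):
    """Mean L1 cycle loss: |G_yx(G_xy(x)) - x| plus |G_xy(G_yx(y)) - y|."""
    def l1(u, v):
        return sum(abs(a - b) for a, b in zip(u, v)) / len(u)

    total = 0.0
    for x in x_batch:  # forward cycle: X -> Y -> X
        total += l1(g_yx(g_xy(x)), x)
    for y in y_batch:  # backward cycle: Y -> X -> Y
        total += l1(g_xy(g_yx(y)), y)
    return total / (len(x_batch) + len(y_batch))
```

With mutually inverse stand-in generators the loss is zero; during training this term is minimized jointly with the adversarial losses so that synthetic images remain structurally consistent with their unpaired sources.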
Lei et al. [175] proposed a method for generating MRIs with superior soft-tissue contrast from CBCT images to aid CBCT segmentation. The entire segmentation process comprises three major steps. Firstly, CycleGAN is utilized to estimate a synthetic MRI (sMRI) from CBCT images. Secondly, a deep attention network is trained based on sMRI and its corresponding manual contours. Finally, segmented contours for a query patient are obtained by feeding the patient’s CBCT images into the trained sMRI estimation and segmentation model.
Bazangani et al. [178] proposed a separable convolution-based Elicit Generative Adversarial Network (E-GAN). The architecture can generate a 3D T1-weighted MRI corresponding to FDG-PET.

6.3. Generating CT

CT is a potent medical imaging technique that employs X-ray technology and computer processing to generate cross-sectional images of the human body. CT delivers highly detailed cross-sectional views of internal structures, allowing for precise examination and analysis of anatomical features, organs, and bones [180]. CT scanning plays a pivotal role in diagnosing a wide array of medical conditions, including traumatic injuries like fractures and internal hemorrhaging, as well as the detection and assessment of tumors, vascular disorders like aneurysms and blockages, lung diseases such as pneumonia and cancer, and neurological disorders like strokes, brain tumors, and related conditions [181,182].
However, it is imperative to consider potential risks associated with CT scans due to their use of ionizing radiation, particularly when repeated imaging is necessary [183]. Furthermore, CT serves as the primary imaging modality for radiation therapy, as it provides essential electron density data for dose calculations. While MRI excels in visualizing soft tissues and tumors, it lacks the tissue attenuation information required for accurate dose calculations in radiation therapy. The utilization of generative models to translate MRI into CT images is pivotal in enabling MRI-only radiotherapy, which can yield cost savings, reduce patient radiation exposure, and eliminate registration errors associated with using two distinct imaging modalities [184].
Cone Beam Computed Tomography (CBCT) represents an advanced medical imaging technique widely applied in fields such as dentistry and maxillofacial radiology [185]. CBCT employs a cone-shaped X-ray beam and a specialized detector to produce high-resolution, three-dimensional images of specific regions of interest within the human body, primarily focusing on the craniofacial area. Notably, CBCT offers the advantage of lower radiation doses, enhancing patient safety, while still delivering exceptional image clarity for detailed visualization of anatomical structures like teeth, bones, and soft tissues. However, CBCT does have inherent limitations, including lower contrast for soft tissues and reduced spatial resolution compared to conventional CT. Additionally, CBCT is more susceptible to metal artifacts, potentially compromising image quality when scanning patients with dental restorations or implants. Therefore, the development of generative models to translate CBCT images into CT is of considerable significance [186].
In this section, we provide a comprehensive summary of research papers related to the translation from various imaging modalities, including CBCT, MRI, PET, and X-Ray, into CT. Table 8 lists essential information about these works.
Zhang et al. [186] decomposed CBCT-to-CT translation into artifact reduction and intensity correction. They proposed a Multimodal Unsupervised Representation Disentanglement (MURD) learning framework that disentangles content, style, and artifact representations from CBCT and CT images in the latent space. MURD can synthesize different forms of images by recombining disentangled representations. Additionally, they introduced a multipath consistency loss to enhance structural consistency in synthesis and a multidomain generator to improve synthesis performance.
Dong et al. [196] proposed a 3D CycleGAN framework to synthesize CT images from non-attenuation corrected PET (NAC PET). The method learns a transformation that minimizes the difference between sCT, generated from NAC PET, and true CT. It also learns an inverse transformation such that the cycle NAC PET image generated from the sCT is close to the true NAC PET image.
Zhou et al. [248] proposed a deep learning framework that synchronously constructs multimodality MRI from a single T1-weighted image for synthetic CT (sCT) generation in MRI-guided radiation therapy (MRIgRT). The network is primarily based on a GAN with sequential subtasks of intermediate synthetic MRI generation and joint sCT image generation from the single T1 MRI. It comprises a multitask generator and a multibranch discriminator, where the generator consists of a shared encoder and a split multibranch decoder.

6.4. Generating X-Ray Image

In this section, we provide a comprehensive summary of research papers related to the translation from various imaging modalities into X-rays. A summarized overview of these works is presented in Table 9, highlighting essential information for reference.
Yuen et al. [252] introduced a CT-based Chest X-ray (CXR) synthesis framework, named ct2cxr, for data augmentation in pneumonia classification. Leveraging GANs and a customized loss function tailored for model training, the approach aims to preserve target pathology and maintain high image fidelity. The results indicate that CXR images generated through style mixing enhance the performance of general pneumonia classification models. Evaluation on a COVID-19 dataset demonstrates similar improvements over baseline models.
Huang et al. [250] proposed a sigmoid-based intensity transform, utilizing the nonlinear optical properties of X-ray films, to enhance image contrast in synthetic cephalograms generated from 3D volumes. Super-resolution deep learning techniques are explored to improve image resolution. For low-dose purposes, Pix2pix is introduced for 2D cephalogram synthesis directly from two cone-beam projections. An efficient automatic landmark detection method for synthetic cephalograms is proposed, combining LeNet5 and ResNet50.
Shen et al. [253] proposed a strategy for obtaining X-ray projection images at novel view angles without the need for actual projection measurements. Specifically, a Deep Learning-based Geometry-Integrated Projection Synthesis (DL-GIPS) framework is proposed for generating novel-view X-ray projections. The deep learning model extracts geometry and texture features from a source-view projection, then performs geometry transformation on the extracted features to accommodate the change in view angle. In the final stage, the X-ray projection in the target view is synthesized from the transformed geometry and shared texture features via an image generator.

6.5. Generating PET Image

Positron Emission Tomography (PET) is a powerful medical imaging technique. It is based on the principle of detecting and visualizing the distribution of positron-emitting radionuclides within the body [254]. PET imaging has a wide range of clinical applications. PET is used to detect and stage various types of cancers by highlighting areas with increased metabolic activity. PET is valuable in studying brain function and diagnosing conditions such as Alzheimer’s disease, Parkinson’s disease, and epilepsy. PET can assess blood flow and myocardial viability, helping in the evaluation of heart conditions, including coronary artery disease and myocardial infarction. PET is used to identify sites of infection or inflammation in the body, which can aid in the diagnosis and monitoring of infectious diseases and inflammatory disorders. PET allows scientists to study various physiological processes, develop new drugs, and better understand diseases at the molecular level. PET provides functional and metabolic information, complementing the structural information obtained from techniques like CT and MRI [255]. It can detect diseases at an early stage when structural changes may not yet be apparent. PET has high sensitivity and specificity, making it a valuable tool for accurate disease detection and treatment monitoring [256].
PET also has some limitations. It involves exposure to ionizing radiation due to the use of radiopharmaceuticals, requires specialized equipment and trained personnel, and can be expensive compared to some other imaging modalities. Consequently, some work has been dedicated to converting other commonly used medical image modalities, such as MRI and CT, into PET. In this section, we provide a comprehensive summary of research papers related to the translation from CT or MRI into a PET image. A summarized overview of these works is presented in Table 10, highlighting essential information for reference.
Hu et al. [254] introduced a 3D end-to-end translation network named Bidirectional Mapping GAN (BMGAN) for brain MR-to-PET translation, effectively utilizing image contexts and latent vectors. The proposed bidirectional mapping mechanism is designed to embed the semantic information of PET images into the high-dimensional latent space. Furthermore, the architecture includes a 3D Dense-UNet generator and hybrid loss functions to enhance the visual quality of cross-modality synthetic images.
Ben-Cohen et al. [256] combined a fully convolutional network (FCN) with a conditional GAN to simulate PET data from input CT data. From a clinical perspective, such solutions may facilitate lesion detection and drug treatment evaluation in a CT-only environment, potentially reducing the need for more expensive and radioactive PET/CT scans.

6.6. Generating Ultrasound Image

Ultrasound imaging is a non-invasive medical imaging technique that uses high-frequency sound waves to create real-time, dynamic images of the internal structures of the human body [260]. These images, known as ultrasound scans or ultrasound images, are valuable in medical diagnosis, monitoring pregnancies, and guiding various medical procedures.
Ultrasound imaging does not involve radiation or invasive procedures. It provides dynamic, real-time images, making it suitable for observing movement and function. Ultrasound is safe for pregnant women, infants, and individuals with contraindications to other imaging methods. Ultrasound machines come in various sizes, including handheld devices, making them highly portable for use in different clinical settings.
Grimwood et al. [261] proposed the use of CycleGAN to create synthetic endoscopic ultrasound (EUS) images from CT data, which can serve as a data augmentation strategy when EUS data is scarce.

6.7. Non-Contrast and Contrast-Enhanced Image

Non-contrast-enhanced medical imaging entails the acquisition of images without the administration of contrast agents. This imaging modality relies on the inherent contrast of natural tissues to visualize anatomical structures and identify potential abnormalities. Non-contrast imaging is commonly employed for routine screenings, initial assessments, and follow-up examinations. It is considered a safer option for patients with contraindications or allergies to contrast agents. Nonetheless, there are scenarios where non-contrast imaging may be limited, and the use of contrast-enhanced imaging could offer additional diagnostic insights.
Contrast-enhanced medical imaging involves the introduction of contrast agents, typically through intravenous administration, to enhance the visualization of specific anatomical structures or physiological processes [262]. These contrast agents contain substances that augment the visibility of blood vessels, organs, tumors, or regions with altered perfusion. Contrast-enhanced imaging proves particularly valuable in situations where non-contrast imaging may not provide adequate diagnostic information. For instance, in contrast-enhanced CT scans, iodine-based contrast agents are intravenously injected to accentuate blood vessels, tumors, and regions with abnormal blood flow, thereby improving the detection and characterization of lesions, vascular anomalies, and tumors. In contrast-enhanced MRI, gadolinium-based contrast agents are commonly utilized to enhance the visualization of blood vessels, brain tumors, and areas with compromised blood–brain barrier integrity, making it indispensable in neuroimaging and the diagnosis of conditions such as multiple sclerosis [263].
Contrast-enhanced imaging plays a pivotal role in diagnosing and characterizing various medical conditions, including tumors, vascular irregularities, inflammation, and ischemia. It furnishes critical insights into the dynamic behavior of tissues, enhancing the specificity and sensitivity of imaging investigations. However, certain patients may not be eligible for contrast agent injections due to various factors. To address this challenge, generative models can be employed to translate non-contrast-enhanced images into contrast-enhanced images [264].
In this section, we provide an overview of research papers focused on the translation between non-contrast-enhanced images and contrast-enhanced images. Table 11 presents essential details from these studies, offering a valuable reference for further exploration of this topic.
Zhao et al. [267] introduced a Tripartite Generative Adversarial Network (Tripartite-GAN) for synthesizing contrast-enhanced MRI (CEMRI) to detect tumors without the need for contrast agent injection. The Tripartite-GAN comprises three interconnected networks—an attention-aware generator, a convolutional neural network-based discriminator, and a region-based convolutional neural network-based detector. This integrated framework facilitates the synthesis of CEMRI and tumor detection, with the generator aiding accurate tumor detection by synthesizing tumor-specific CEMRI and the detector enhancing the generator for precise CEMRI synthesis through back-propagation.
Chen et al. [274] proposed a deep learning-based approach for contrast-enhanced T1 synthesis in brain tumor patients. A 3D high-resolution FCN designed to maintain high-resolution information and aggregate multi-scale information in parallel is employed to map pre-contrast MRI sequences (T1, T2, and ADC) to CEMRI sequences. To address data imbalance between normal tissues and tumor regions, a local loss is introduced to enhance the contribution of tumor regions, resulting in improved tumor enhancement.
Ristea et al. [278] presented a novel approach for translating NCCT scans to CECT scans and vice versa. The approach, named CyTran (cycle-consistent generative adversarial convolutional transformers), is trainable on unpaired images due to the integration of a multi-level cycle consistency loss. In addition to the standard cycle-consistency loss at the image level, additional cycle-consistency losses between intermediate feature representations are proposed, enforcing cycle-consistency at multiple representation levels, and leading to superior results. To handle high-resolution images, a hybrid architecture based on convolutional and multi-head attention layers is designed.

7. Discussion

This review provides a comprehensive summary of prior research on the utilization of generative models in the domain of medical image analysis. Through a synthesis of relevant literature, we categorize the applications of generative models in medical image analysis into two main segments: creation and translation. Building upon the diverse application scenarios of generative models in medical image analysis, this paper organizes previous studies into categories and offers practical implementation guidelines gleaned from the lessons learned in these works.
In Section 4, we conduct a thorough review of three widely employed generative models: VAEs, GANs, and diffusion models. We outline algorithms within these generative models that have found extensive applications in the domain of medical image analysis and provide analyses thereof.
In Section 5, we present an extensive review of creation methods. Depending on the downstream task, we classify the applications of image creation into three distinct categories: classification, segmentation, and others. Among these, 27 studies focus on classification tasks, 26 on segmentation, and 11 on various other tasks. Our literature review indicates that, across downstream tasks, data augmentation methods grounded in generative models consistently enhance model performance, particularly when annotation resources are limited.
Section 6 classifies translation methods based on the target modality. For the MRI modality, we identify 61 studies: 42 primarily address translation between MRI contrasts, concentrated mainly on brain images, while 19 cover translation to MRI from other modalities such as CT and PET. Additionally, there are 77 studies targeting CT, 5 targeting X-ray, 6 targeting PET, and 1 targeting US. Furthermore, we conduct a separate analysis of 19 studies involving translation between non-contrast-enhanced and contrast-enhanced images.
Our comprehensive literature review underscores the notable advancements in GANs over recent years, with the majority of translation tasks predominantly relying on GAN-based methodologies. Furthermore, since the introduction of DDPMs, an increasing number of diffusion models have been employed for translating medical images across different modalities. The remarkable image generation capabilities of diffusion models have significantly elevated the quality of synthesized images, although their inherently slow inference speed remains a critical concern [34].
Given the promising and rapidly evolving nature of medical image generation research, alongside the ongoing exploration of optimal image generation algorithms, researchers are encouraged to not only fine-tune strategies and pre-trained weights but also systematically investigate self-supervised learning techniques across various categories within their medical image datasets [39]. Additionally, testing newly developed strategies on multiple datasets, ideally encompassing diverse modalities and medical imaging domains, is recommended to foster a more comprehensive understanding of their potential and limitations.

7.1. Implementation Suggestion

Given the rapidly evolving nature and significant practical implications of medical image generation and translation, along with the increasing prominence of diffusion models, the pursuit of an optimal medical image generative model remains an ongoing challenge. In response, we have conducted a thorough survey and detailed comparative analysis of prior research. Our goal is to offer researchers a comprehensive set of implementation guidelines to support their exploration of methodologies in medical image generation and translation.

7.1.1. Unified Model or Task-Specific Model?

Unified models refer to generative architectures designed to perform multiple medical image generation tasks using a shared network structure [38,139,147]. In contrast, task-specific models are optimized for a single generation task or imaging scenario and are typically tailored to particular modalities, anatomical regions, or downstream objectives [160,161,162]. Task-specific designs can achieve strong performance within their targeted scope but may exhibit limited generalizability when applied to substantially different tasks. Despite increasing interest in unified generative frameworks, drawing definitive conclusions regarding their superiority over task-specific models remains challenging. The selection of an appropriate generative modeling strategy is influenced by multiple factors, including dataset size, imaging modality, anatomical complexity, and the requirements of downstream clinical tasks [17,34].
Within the scope of this review, only a limited number of studies have rigorously evaluated generative models across multiple imaging modalities and anatomical regions. The majority of existing work focuses on single-modality or single-organ settings, which limits the ability to assess model generalization in diverse clinical scenarios. In practice, substantial variations across imaging modalities, organs, tissues, and pathological conditions pose significant challenges for unified models, as a single architecture may struggle to capture highly heterogeneous data distributions with consistent performance. Current empirical evidence does not support a definitive preference for either unified or task-specific generative models. Instead, available studies suggest that task-specific tailoring may offer practical advantages in scenarios where imaging characteristics or anatomical structures differ substantially. However, this observation should be interpreted as a practical consideration rather than a prescriptive recommendation. Systematic benchmarking across multiple modalities and anatomical regions is required to establish clear guidelines on when unified models can effectively generalize and when task-specific designs are more appropriate.

7.1.2. GAN or Diffusion Model?

Since their introduction in 2014, GANs have been widely adopted in medical image generation. Their popularity can largely be attributed to the adversarial learning paradigm, which enables GANs to generate high-fidelity images and has consistently demonstrated superior perceptual quality compared to earlier generative approaches such as VAEs and flow-based models [282]. Nevertheless, GANs are known to suffer from training instability, with performance being sensitive to hyperparameter selection, network architecture, and regularization strategies. Despite these challenges, GAN-based methods remain the most extensively used generative framework in medical image generation to date [21].
The generative modeling landscape shifted notably with the introduction of DDPMs in 2020 [52]. Diffusion models have demonstrated strong theoretical properties, including stable training dynamics and improved mode coverage, and have achieved state-of-the-art performance in several natural image generation benchmarks. Empirical studies suggest that diffusion models are capable of capturing a broader range of sample diversity compared to GANs, while maintaining high structural fidelity [283]. However, this advantage is accompanied by increased computational cost, as diffusion models typically require multiple iterative denoising steps during sampling, resulting in slower inference compared to GAN-based approaches.
This trade-off reflects a broader generative modeling dilemma, as discussed by Kazerouni et al. [34]. GANs excel at fast generation and high visual fidelity but often struggle with limited mode coverage, whereas VAEs and normalizing flows favor diversity at the expense of perceptual quality. Diffusion models aim to reconcile these competing objectives by achieving both broad mode coverage and high-quality generation. Nonetheless, their iterative sampling process remains a practical limitation, motivating ongoing research into accelerated sampling strategies and efficiency-oriented model variants [283,284].
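The sampling cost discussed above follows directly from the structure of the DDPM reverse process: each denoising step requires one evaluation of the trained network. The following minimal NumPy sketch of ancestral sampling (after Ho et al., 2020) uses a dummy noise predictor in place of a real network; the schedule length and helper names are illustrative.

```python
import numpy as np

def ddpm_sample(eps_model, shape, betas, rng):
    """Minimal DDPM ancestral sampling loop. Each of the T steps calls
    the denoising network once, which is why inference is slower than a
    single GAN forward pass."""
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    x = rng.standard_normal(shape)           # start from pure noise x_T
    for t in range(len(betas) - 1, -1, -1):
        eps = eps_model(x, t)                # one network evaluation per step
        x = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        if t > 0:                            # add noise except at the final step
            x += np.sqrt(betas[t]) * rng.standard_normal(shape)
    return x

rng = np.random.default_rng(0)
betas = np.linspace(1e-4, 0.02, 50)          # 50 steps here; DDPM used 1000
sample = ddpm_sample(lambda x, t: np.zeros_like(x), (8, 8), betas, rng)
print(sample.shape)  # (8, 8)
```

With a 1000-step schedule, generating one image costs 1000 network passes, which motivates the accelerated samplers cited above.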
In the context of image creation tasks, both GANs and diffusion models have been successfully applied to unconditional and conditional image generation. Existing evidence indicates that diffusion models often produce samples with improved diversity and structural consistency, particularly in complex anatomical settings [285]. In contrast, for image translation tasks, GAN-based methods continue to dominate the field. Well-established frameworks such as Pix2Pix and CycleGAN provide strong baselines for paired and unpaired translation, respectively, and have been extensively validated across a wide range of medical imaging modalities.
At present, there is no widely accepted diffusion-based baseline model that offers a comparable level of maturity and empirical validation for medical image translation tasks. Consequently, while diffusion models exhibit substantial theoretical potential and promising preliminary results, their practical superiority over established GAN-based approaches in translation scenarios has not yet been conclusively demonstrated. Comprehensive benchmarking across diverse modalities, anatomical regions, and clinical settings is required before definitive conclusions can be drawn regarding the relative merits of GANs and diffusion models in medical image generation.

7.1.3. Translation with Prior Knowledge

Medical images from different modalities often carry substantially different, and sometimes entirely distinct, information content, which makes incorporating prior knowledge into medical image generation tasks essential. Integrating prior knowledge is a crucial step toward improving the quality, authenticity, and clinical relevance of the generated images. It guides the generative process and ensures that the resulting images maintain anatomical accuracy and clinical utility [128,286].
One strategy for integrating prior knowledge is the design of custom loss functions engineered to impose constraints rooted in that knowledge [162,267]. A concrete example is the addition of penalties or regularization terms to the loss function that encourage the generated images to adhere closely to known anatomical structures or established clinical guidelines.
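One simple instance of such a prior-driven penalty is to up-weight reconstruction errors inside a clinically important region (a tumor or vessel mask obtained from prior segmentation), in the spirit of the local loss of Chen et al. [274]. The sketch below is our own NumPy illustration under that assumption, not code from any cited work.

```python
import numpy as np

def prior_weighted_loss(pred, target, roi_mask, roi_weight=5.0):
    """L1 reconstruction loss with an extra penalty inside a prior
    anatomical region of interest. roi_weight > 1 up-weights errors
    where clinical fidelity matters most."""
    err = np.abs(pred - target)
    weights = np.where(roi_mask, roi_weight, 1.0)
    return float(np.mean(weights * err))
```

With `roi_weight=5.0`, a voxel error inside the mask contributes five times as much to the loss as the same error elsewhere, counteracting the data imbalance between small lesions and large backgrounds.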
Furthermore, a critical aspect of leveraging prior knowledge involves preprocessing the training data. This preprocessing aims to highlight or extract specific features or anatomical structures of interest. Techniques such as image segmentation, registration, and other image processing methods can be strategically applied to enhance the quality of the input dataset, thereby providing the model with more robust and informative data.
To further strengthen a model's capacity to exploit prior knowledge, pre-trained models or knowledge derived from related medical imaging tasks can be leveraged. Transfer learning is a potent technique in this regard, allowing the model to draw on prior knowledge encoded in models trained on analogous tasks or datasets.
Recent work, such as GradXcepUNet [287], highlights the importance of incorporating explainability and prior knowledge into medical image analysis pipelines. Although GradXcepUNet is primarily designed for segmentation rather than image generation, its use of Grad-CAM to identify diagnostically salient regions provides valuable insights for generative modeling. In particular, attention maps derived from downstream tasks could be leveraged to guide generative data augmentation, ensuring that synthetic images preserve clinically critical structures.

7.1.4. Paired Versus Unpaired Image Translation

Across the reviewed literature, image translation methods that rely on paired training data remain dominant. Based on our analysis of Table 6, Table 7, Table 8, Table 9, Table 10 and Table 11, the majority of studies employ paired datasets, reflecting the strong supervision signal provided by spatially aligned image pairs. Paired approaches generally demonstrate superior quantitative performance, particularly in tasks requiring precise voxel-wise correspondence, such as MRI-to-CT or multi-contrast MRI translation [17,38,115,121]. In contrast, unpaired methods are primarily adopted in scenarios where paired data are impractical or unavailable, such as cross-institutional studies or retrospective data collection. While unpaired approaches offer greater flexibility and broader applicability, they often suffer from weaker supervision and increased ambiguity in structure preservation, which may lead to inconsistent anatomical mappings [36,77]. Performance comparisons reported in the literature suggest that paired methods consistently outperform unpaired ones on reference-based similarity metrics, while unpaired methods are more prone to training instability and mode collapse. Nevertheless, unpaired translation remains indispensable in real-world clinical settings where strict data pairing cannot be guaranteed. Future research should explore hybrid or semi-supervised strategies that can leverage limited paired data while maintaining the scalability of unpaired approaches.

7.1.5. Other Possible Optimization Strategies for Training

In addition to adversarial training, alternative optimization strategies such as gradient perturbation and knowledge distillation have been proposed to enhance model robustness [288,289]. Gradient perturbation techniques improve the robustness and generalization of generative models by introducing controlled noise during training [290]. This strategy strengthens the model’s ability to adapt to unseen data and reduces overfitting to specific training distributions [291]. Knowledge distillation optimizes medical image generation by transferring knowledge from complex teacher models to lightweight student models [292]. It enhances the model’s generalization across multiple tasks, enabling it to perform diverse medical image generation tasks such as denoising, super-resolution, and modality translation [293]. By leveraging the stable generative capabilities of the teacher model, knowledge distillation mitigates the instability issues commonly encountered during generative model training [294,295].
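The distillation objective referred to above is typically a temperature-softened Kullback-Leibler divergence between teacher and student outputs (the Hinton-style formulation). The following NumPy sketch illustrates that loss in isolation; it is a generic example, not the specific formulation of the cited works.

```python
import numpy as np

def softmax(z, T=1.0):
    """Temperature-scaled softmax; higher T softens the distribution."""
    z = np.asarray(z, dtype=float) / T
    z -= z.max(axis=-1, keepdims=True)   # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, T=2.0):
    """KL divergence between softened teacher and student distributions.
    The stable teacher guides a lightweight student, which can tame the
    instability of training a generative model from scratch."""
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    # scale by T^2 so gradient magnitudes stay comparable across temperatures
    return float(np.sum(p * (np.log(p + 1e-12) - np.log(q + 1e-12))) * T * T)
```

The loss vanishes when the student matches the teacher and grows as their predictions diverge, giving the student a dense, stable training signal.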

7.2. Challenges in Medical Image Creation and Translation

7.2.1. Privacy Preservation and Data Protection

Given the sensitive nature of medical imaging data, privacy preservation is a critical consideration in the development and deployment of generative models. Although generative approaches are often promoted as an effective means of alleviating data scarcity through synthetic sample generation, recent studies have demonstrated that inadequately trained models may inadvertently memorize and reproduce identifiable patient information [296,297].
To address these concerns, several privacy-preserving strategies have been proposed. Differential privacy mechanisms introduce controlled noise during the training process to reduce the risk of data leakage [296,298], while federated learning enables decentralized model training across multiple institutions without direct data sharing [84]. In addition, techniques such as secure multi-party computation and encrypted model inference have been explored to further enhance data protection in collaborative medical environments [299].
However, incorporating privacy constraints typically introduces a trade-off between data utility and image fidelity. In the medical domain, where subtle anatomical structures may carry critical diagnostic significance, excessive privacy enforcement can degrade clinically relevant features [296,298]. Therefore, future research should focus on balancing privacy preservation with diagnostic integrity, particularly in high-stakes clinical applications where both data security and image quality are essential.
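The noise-injection mechanism and the privacy/utility trade-off described above can be sketched with a DP-SGD-style update: each per-example gradient is clipped to bound any single patient's influence, and calibrated Gaussian noise is added to the average. This NumPy fragment is an assumed, simplified illustration (parameter names are ours), not a complete differentially private training procedure.

```python
import numpy as np

def dp_sgd_step(per_example_grads, clip_norm=1.0, noise_mult=1.1, rng=None):
    """One DP-SGD style gradient aggregation: clip each per-example
    gradient, average, then add Gaussian noise. A larger noise_mult
    strengthens privacy but degrades the utility of the update."""
    rng = rng or np.random.default_rng()
    clipped = []
    for g in per_example_grads:
        norm = np.linalg.norm(g)
        clipped.append(g * min(1.0, clip_norm / max(norm, 1e-12)))
    mean = np.mean(clipped, axis=0)
    noise = rng.normal(0.0, noise_mult * clip_norm / len(per_example_grads),
                       size=mean.shape)
    return mean + noise
```

Setting `noise_mult=0.0` recovers plain clipped averaging, which makes the trade-off explicit: privacy is purchased entirely through the injected noise.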

7.2.2. Safe Deployment and Clinical Reliability

Beyond algorithmic performance, the safe deployment of generative models in clinical practice presents substantial challenges. Synthetic medical images may contain subtle artifacts or statistically plausible yet clinically implausible structures, which can potentially mislead downstream diagnostic models or clinicians if not rigorously validated [300,301].
Another critical concern is distribution shift. Generative models trained on historical datasets may fail to generalize to evolving imaging protocols, scanner upgrades, or demographic changes. Without continuous monitoring and periodic revalidation, such models risk producing outputs that are inconsistent with current clinical standards.
Furthermore, the long-term diagnostic implications of incorporating synthetic images into clinical decision-making pipelines remain largely unexplored. Prolonged reliance on generated data may introduce systematic biases or reinforce model-specific artifacts over time [301]. Consequently, rigorous longitudinal studies, post-deployment auditing mechanisms, and continuous performance assessment are essential to ensure sustained clinical safety and reliability.

7.2.3. Open Problems and Research Gaps

Despite significant progress, several open challenges remain unresolved in the application of generative models to medical imaging. First, there is a lack of standardized evaluation metrics. Many existing studies rely primarily on image similarity measures that may not adequately reflect clinical utility or diagnostic relevance [301].
Second, generalization across institutions and modalities remains limited, as most models are evaluated on single-center datasets, restricting conclusions about real-world robustness and transferability [84].
Third, training stability and reproducibility continue to pose major obstacles, particularly for GAN-based and hybrid architectures that are sensitive to hyperparameter selection and training dynamics.
Fourth, integration with clinical workflows is insufficiently explored, with relatively few studies addressing how generative models can be deployed in a transparent, interpretable, and clinically acceptable manner. Finally, the long-term clinical impact of synthetic data usage, including its effects on diagnostic accuracy and clinician trust, remains poorly understood [301].
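The first gap above is easy to see in the metrics themselves: a measure such as PSNR reduces image quality to average pixel-wise error and is blind to whether diagnostically critical structures survive. A minimal sketch of PSNR (our own illustration):

```python
import numpy as np

def psnr(ref, test, data_range=1.0):
    """Peak signal-to-noise ratio in dB. A high PSNR means low average
    pixel-wise error, but says nothing about whether small,
    diagnostically relevant structures are preserved."""
    mse = np.mean((ref - test) ** 2)
    if mse == 0:
        return float("inf")
    return float(10.0 * np.log10(data_range ** 2 / mse))
```

Two synthetic images can share the same PSNR while differing entirely in a small lesion region, which is why clinically grounded evaluation protocols remain an open need.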
Addressing these challenges will require close interdisciplinary collaboration among machine learning researchers, clinicians, and regulatory bodies to ensure the responsible and effective adoption of generative models in medical imaging.

7.3. Limitations and Future Research

This review paper summarizes recent advancements in deep learning-based medical image generation methods. However, several limitations should be acknowledged. First, it was not feasible to aggregate or statistically compare the performance of different generative models due to the heterogeneity of the included studies. These studies employed diverse imaging modalities, reported varied evaluation metrics, and applied models to different downstream tasks, making direct comparisons challenging.
Second, our classification approach was unidimensional. In Section 5, we categorized studies based on downstream tasks, while in Section 6, we grouped them by target modality for image generation. This approach may not facilitate a comprehensive comparative analysis of generative model methods. Future reviews could adopt a more nuanced classification framework, such as categorizing studies by image generation techniques, which might provide deeper insights into the strengths and weaknesses of different generation methods.
Third, while diffusion models have rapidly advanced in natural image generation since 2022 [285] and are increasingly being applied to medical image generation [34], most relevant papers are currently in preprint form. Due to the absence of peer review, these preprints were excluded from this review, resulting in limited coverage of diffusion models. As these models gain traction in medical image generation, future reviews should incorporate peer-reviewed studies to provide a more comprehensive assessment.
The study selection process may also have introduced biases. Reliance on specific databases and keywords might have excluded relevant studies from other sources, such as conference proceedings and technical reports. Additionally, the focus on English-language literature and timeframe restrictions may have led to the omission of significant non-English studies or earlier research. These limitations highlight the need for broader search strategies in future reviews.
Regarding gaps in the literature, several underexplored areas were identified. For instance, the application of physics-based generative models in medical image generation has received limited attention. Similarly, research on the generalizability of generative models across different disease stages or imaging devices remains scarce. Addressing these gaps could enhance the comprehensiveness of future assessments.
Finally, several challenges were encountered during the review process. Standardizing data extraction was difficult due to variations in metrics and reporting methods across studies, which hindered direct comparisons. The heterogeneity of studies, such as differences in dataset size, quality, and evaluation metrics, further complicated the synthesis of findings. Additionally, insufficient methodological detail in some studies made it challenging to fully interpret their results and methodologies. These challenges underscore the need for more standardized reporting practices in future research.

8. Conclusions

Medical image generation based on deep learning is an emerging and rapidly growing field. In this review, we categorize medical image generation into two main types: creation and translation. Creation focuses on generating new images from potential conditional variables, while translation involves mapping images from one or more modalities to another, preserving semantic and informational content.
Currently, deep learning-based medical image generation primarily relies on three model families: VAEs, GANs, and diffusion models. Each has distinct characteristics, and the choice among them depends on the specific requirements of the task. Diffusion models, in particular, have demonstrated outstanding performance in natural image generation, earning widespread recognition. It is reasonable to anticipate that future research will increasingly focus on medical image creation and translation using diffusion models. Exploring methods to reduce the training and inference time of diffusion models while maintaining high generation quality may represent a promising research direction in the years to come.
Through an analysis of the literature included in this review, it is evident that deep learning-based medical image creation is a powerful technique for enhancing the performance of downstream tasks. It addresses data limitations, improves generalization, reduces overfitting, and increases model robustness to diverse imaging conditions and variations. Modality translation, on the other hand, can supplement missing modalities, aiding in diagnosis and enhancing downstream tasks such as attenuation correction for PET/MRI, radiotherapy planning without CT images, and CBCT denoising. Additionally, translation between contrast-enhanced and non-contrast images primarily aims to reduce costs, both in terms of time and financial resources, while also offering the potential for more accurate diagnoses, particularly for patients with kidney diseases. Overall, medical image translation based on generative models is multifaceted, highly effective, and holds significant promise for the future.

Author Contributions

H.P. and T.Z. conducted the literature search and analysis and drafted the work. Y.W. and S.C. made substantial contributions to the interpretation of data and the literature. W.Q., Y.Y., C.Y., P.M. and S.Q. designed the work and made major contributions in writing the manuscript. All authors have read and agreed to the published version of the manuscript.

Funding

This work was partly supported by the National Natural Science Foundation of China under Grant (Nos. 82472076, 62271131), and the Fundamental Research Funds for the Central Universities (N25BJD013, N2419004).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data sharing is not applicable to this article as no datasets were generated or analyzed during the current study.

Conflicts of Interest

The authors declare that they have no competing interests.

Abbreviations

CBCT: Cone Beam Computed Tomography
CT: Computed Tomography
DM: Diffusion Model
DWI: Diffusion-Weighted Image
FID: Fréchet Inception Distance
FLAIR: Fluid-Attenuated Inversion Recovery
GAN: Generative Adversarial Network
IS: Inception Score
MAE: Mean Absolute Error
MMD: Maximum Mean Discrepancy
MRA: Magnetic Resonance Angiography
MRI: Magnetic Resonance Imaging
MS: Mode Score
MSE: Mean Squared Error
PD: Proton density image
PET: Positron Emission Tomography
PSNR: Peak Signal-to-Noise Ratio
RMSE: Root Mean Square Error
SSIM: Structural Similarity Index
T1w: T1-weighted image
T2w: T2-weighted image
US: Ultrasound imaging
VAE: Variational Autoencoder
WD: Wasserstein distance

References

  1. Chen, X.; Wang, X.; Zhang, K.; Fung, K.-M.; Thai, T.C.; Moore, K.; Mannel, R.S.; Liu, H.; Zheng, B.; Qiu, Y. Recent advances and clinical applications of deep learning in medical image analysis. Med. Image Anal. 2022, 79, 102444.
  2. Zhou, Y.; Chia, M.A.; Wagner, S.K.; Ayhan, M.S.; Williamson, D.J.; Struyven, R.R.; Liu, T.; Xu, M.; Lozano, M.G.; Woodward-Court, P. A foundation model for generalizable disease detection from retinal images. Nature 2023, 622, 156–163.
  3. Ma, J.; He, Y.; Li, F.; Han, L.; You, C.; Wang, B. Segment anything in medical images. Nat. Commun. 2024, 15, 654.
  4. Cao, K.; Xia, Y.; Yao, J.; Han, X.; Lambert, L.; Zhang, T.; Tang, W.; Jin, G.; Jiang, H.; Fang, X. Large-scale pancreatic cancer detection via non-contrast CT and deep learning. Nat. Med. 2023, 29, 3033–3043.
  5. Liu, C.; Zhuo, Z.; Qu, L.; Jin, Y.; Hua, T.; Xu, J.; Tan, G.; Li, Y.; Duan, Y.; Wang, T. DeepWMH: A deep learning tool for accurate white matter hyperintensity segmentation without requiring manual annotations for training. Sci. Bull. 2024, 69, 872–875.
  6. Lu, Q.; Liu, W.; Zhuo, Z.; Li, Y.; Duan, Y.; Yu, P.; Qu, L.; Ye, C.; Liu, Y. A transfer learning approach to few-shot segmentation of novel white matter tracts. Med. Image Anal. 2022, 79, 102454.
  7. Liu, W.; Zhuo, Z.; Liu, Y.; Ye, C. One-shot segmentation of novel white matter tracts via extensive data augmentation and adaptive knowledge transfer. Med. Image Anal. 2023, 90, 102968.
  8. Vellmer, S.; Tabelow, M.; Zhang, H. Diffusion MRI GAN Synthesizing Fibre Orientation Distributions for White Matter Simulation. Commun. Biol. 2025, 8, 7936.
  9. Schuit, G.; Parra, D.; Besa, C. Perceptual Evaluation of GANs and Diffusion Models for Generating X-rays. arXiv 2025, arXiv:2508.07128.
  10. Ejiga, O.O.; Anifowose, M.; Yuan, L. Advancing AI-Powered Medical Image Synthesis: Insights from the MedVQA-GI Challenge. arXiv 2025, arXiv:2502.20667.
  11. Yang, Z.; Li, Y.; Wang, W. seg2med: A Segmentation-based Medical Image Generation Framework Using Denoising Diffusion Probabilistic Models. arXiv 2025, arXiv:2504.09182.
  12. Zhao, C.; Guo, P.; Xu, Y. MAISI-v2: Accelerated 3D High-Resolution Medical Image Synthesis with Rectified Flow and Region-specific Contrastive Loss. arXiv 2025, arXiv:2508.05772.
  13. Kim, J.; Lee, S. FMed-Diffusion: Federated Learning on Medical Image Diffusion Models for Privacy-Preserving Data Generation. bioRxiv 2025.
  14. Chakraborty, T.; Naik, S.M.; Panja, M.; Manvitha, B. Ten Years of Generative Adversarial Nets (GANs): A survey of the state-of-the-art. arXiv 2023, arXiv:2308.16316.
  15. Goceri, E. Medical image data augmentation: Techniques, comparisons and interpretations. Artif. Intell. Rev. 2023, 56, 12561–12605.
  16. Kebaili, A.; Lapuyade-Lahorgue, J.; Ruan, S. Deep Learning Approaches for Data Augmentation in Medical Imaging: A Review. J. Imaging 2023, 9, 81.
  17. Dayarathna, S.; Islam, K.T.; Uribe, S.; Yang, G.; Hayat, M.; Chen, Z. Deep learning based synthesis of MRI, CT and PET: Review and analysis. Med. Image Anal. 2023, 92, 103046.
  18. Wang, T.H.; Lei, Y.; Fu, Y.B.; Wynne, J.F.; Curran, W.J.; Liu, T.; Yang, X.F. A review on medical imaging synthesis using deep learning and its clinical applications. J. Appl. Clin. Med. Phys. 2021, 22, 11–36.
  19. Yi, X.; Walia, E.; Babyn, P. Generative adversarial network in medical imaging: A review. Med. Image Anal. 2019, 58, 101552.
  20. Kazeminia, S.; Baur, C.; Kuijper, A.; van Ginneken, B.; Navab, N.; Albarqouni, S.; Mukhopadhyay, A. GANs for medical image analysis. Artif. Intell. Med. 2020, 109, 101938.
  21. Chen, Y.Z.; Yang, X.H.; Wei, Z.H.; Heidari, A.A.; Zheng, N.G.; Li, Z.C.; Chen, H.L.; Hu, H.G.; Zhou, Q.W.; Guan, Q. Generative Adversarial Networks in Medical Image augmentation: A review. Comput. Biol. Med. 2022, 144, 105382.
  22. Osuala, R.; Kushibar, K.; Garrucho, L.; Linardos, A.; Szafranowska, Z.; Klein, S.; Glocker, B.; Diaz, O.; Lekadir, K. Data synthesis and adversarial networks: A review and meta-analysis in cancer imaging. Med. Image Anal. 2022, 84, 102704.
  23. Zhao, J.; Hou, X.Y.; Pan, M.Q.; Zhang, H. Attention-based generative adversarial network in medical imaging: A narrative review. Comput. Biol. Med. 2022, 149, 105948.
  24. Frangi, A.F.; Tsaftaris, S.A.; Prince, J.L. Simulation and Synthesis in Medical Imaging. IEEE Trans. Med. Imaging 2018, 37, 673–679.
  25. Oussidi, A.; Elhassouny, A. Deep generative models: Survey. In Proceedings of the 2018 International Conference on Intelligent Systems and Computer Vision (ISCV), Fez, Morocco, 2–4 April 2018; pp. 1–8.
  26. Kingma, D.P.; Welling, M. Auto-encoding variational bayes. arXiv 2013, arXiv:1312.6114.
  27. Salimans, T.; Kingma, D.; Welling, M. Markov chain monte carlo and variational inference: Bridging the gap. In Proceedings of the 32nd International Conference on Machine Learning, Lille, France, 6–11 July 2015; PMLR: Pittsburgh, PA, USA, 2015; pp. 1218–1226.
  28. Kulkarni, T.D.; Whitney, W.F.; Kohli, P.; Tenenbaum, J. Deep convolutional inverse graphics network. Adv. Neural Inf. Process. Syst. 2015, 28, 2539–2547.
  29. Gregor, K.; Danihelka, I.; Graves, A.; Rezende, D.; Wierstra, D. Draw: A recurrent neural network for image generation. In Proceedings of the 32nd International Conference on Machine Learning, Lille, France, 6–11 July 2015; PMLR: Pittsburgh, PA, USA, 2015; pp. 1462–1471.
  30. Pesteie, M.; Abolmaesumi, P.; Rohling, R.N. Adaptive augmentation of medical data using independently conditional variational auto-encoders. IEEE Trans. Med. Imaging 2019, 38, 2807–2820.
  31. Alex, L.Y.H.; Galeotti, J. Ultrasound Variational Style Transfer to Generate Images Beyond the Observed Domain. In Proceedings of the 1st Workshop on Deep Generative Models for Medical Image Computing and Computer Assisted Intervention (DGM4MICCAI)/1st MICCAI Workshop on Data Augmentation, Labelling, and Imperfections (DALI), Strasbourg, France, 1 October 2021; pp. 14–23.
  32. Wei, R.; Mahmood, A. Recent advances in variational autoencoders with representation learning for biomedical informatics: A survey. IEEE Access 2020, 9, 4939–4956.
  33. Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative adversarial networks. Commun. ACM 2020, 63, 139–144.
  34. Kazerouni, A.; Aghdam, E.K.; Heidari, M.; Azad, R.; Fayyaz, M.; Hacihaliloglu, I.; Merhof, D. Diffusion models in medical imaging: A comprehensive survey. Med. Image Anal. 2023, 88, 102846.
  35. Isola, P.; Zhu, J.-Y.; Zhou, T.; Efros, A.A. Image-to-image translation with conditional adversarial networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 1125–1134.
  36. Zhu, J.-Y.; Park, T.; Isola, P.; Efros, A.A. Unpaired image-to-image translation using cycle-consistent adversarial networks. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2223–2232.
  37. Sun, B.; Jia, S.F.; Jiang, X.L.; Jia, F.C. Double U-Net CycleGAN for 3D MR to CT image synthesis. Int. J. Comput. Assist. Radiol. Surg. 2023, 18, 149–156.
  38. Dar, S.U.H.; Yurt, M.; Karacan, L.; Erdem, A.; Erdem, E.; Cukur, T. Image Synthesis in Multi-Contrast MRI With Conditional Generative Adversarial Networks. IEEE Trans. Med. Imaging 2019, 38, 2375–2388.
  39. Pang, H.; Qi, S.; Wu, Y.; Wang, M.; Li, C.; Sun, Y.; Qian, W.; Tang, G.; Xu, J.; Liang, Z. NCCT-CECT image synthesizers and their application to pulmonary vessel segmentation. Comput. Methods Programs Biomed. 2023, 231, 107389.
  40. Pan, S.; Wang, T.; Qiu, R.L.; Axente, M.; Chang, C.W.; Peng, J.; Patel, A.B.; Shelton, J.; Patel, S.A.; Roper, J.; et al. 2D medical image synthesis using transformer-based denoising diffusion probabilistic model. Phys. Med. Biol. 2023, 68, 105004.
41. Dorjsembe, Z.; Odonchimed, S.; Xiao, F. Three-dimensional medical image synthesis with denoising diffusion probabilistic models. Med. Imaging Deep Learn. 2022. [Google Scholar]
  42. Özbey, M.; Dalmaz, O.; Dar, S.U.; Bedel, H.A.; Özturk, Ş.; Güngör, A.; Çukur, T. Unsupervised medical image translation with adversarial diffusion models. IEEE Trans. Med. Imaging 2023, 42, 3524–3539. [Google Scholar] [CrossRef]
  43. Cui, Z.-X.; Cao, C.; Liu, S.; Zhu, Q.; Cheng, J.; Wang, H.; Zhu, Y.; Liang, D. Self-score: Self-supervised learning on score-based models for mri reconstruction. arXiv 2022, arXiv:2209.00835. [Google Scholar]
  44. Peng, C.; Guo, P.; Zhou, S.K.; Patel, V.M.; Chellappa, R. Towards performant and reliable undersampled MR reconstruction via diffusion model sampling. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, Singapore, 18–22 September 2022; Springer: Berlin/Heidelberg, Germany, 2022; pp. 623–633. [Google Scholar]
  45. Hu, D.; Tao, Y.K.; Oguz, I. Unsupervised denoising of retinal OCT with diffusion probabilistic model. In Proceedings of the Medical Imaging 2022: Image Processing, San Diego, CA, USA, 20 February–28 March 2022; SPIE: Bellingham, WA, USA, 2022; pp. 25–34. [Google Scholar]
  46. Kim, B.; Han, I.; Ye, J.C. DiffuseMorph: Unsupervised deformable image registration using diffusion model. In Proceedings of the European Conference on Computer Vision, Tel Aviv, Israel, 23–27 October 2022; Springer: Berlin/Heidelberg, Germany, 2022; pp. 347–364. [Google Scholar]
  47. Yang, Y.; Fu, H.; Aviles-Rivero, A.; Schönlieb, C.-B.; Zhu, L. DiffMIC: Dual-Guidance Diffusion Network for Medical Image Classification. arXiv 2023, arXiv:2303.10610. [Google Scholar]
  48. Rahman, A.; Valanarasu, J.M.J.; Hacihaliloglu, I.; Patel, V.M. Ambiguous medical image segmentation using diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 11536–11546. [Google Scholar]
  49. Wolleb, J.; Sandkühler, R.; Bieder, F.; Valmaggia, P.; Cattin, P.C. Diffusion models for implicit image segmentation ensembles. In Proceedings of the 5th International Conference on Medical Imaging with Deep Learning, Zürich, Switzerland, 6–8 July 2022; PMLR: Pittsburgh, PA, USA, 2022; pp. 1336–1348. [Google Scholar]
50. Larsen, A.B.L.; Sønderby, S.K.; Larochelle, H.; Winther, O. Autoencoding beyond pixels using a learned similarity metric. In Proceedings of the 33rd International Conference on Machine Learning, New York, NY, USA, 19–24 June 2016; PMLR: Pittsburgh, PA, USA, 2016; pp. 1558–1566. [Google Scholar]
  51. Rosca, M.; Lakshminarayanan, B.; Warde-Farley, D.; Mohamed, S. Variational approaches for auto-encoding generative adversarial networks. arXiv 2017, arXiv:1706.04987. [Google Scholar] [CrossRef]
  52. Ho, J.; Jain, A.; Abbeel, P. Denoising diffusion probabilistic models. Adv. Neural Inf. Process. Syst. 2020, 33, 6840–6851. [Google Scholar]
53. Dhariwal, P.; Nichol, A. Diffusion models beat GANs on image synthesis. Adv. Neural Inf. Process. Syst. 2021, 34, 8780–8794. [Google Scholar]
  54. Huang, S.-C.; Pareek, A.; Jensen, M.; Lungren, M.P.; Yeung, S.; Chaudhari, A.S. Self-supervised learning for medical image classification: A systematic review and implementation guidelines. npj Digit. Med. 2023, 6, 74. [Google Scholar] [CrossRef]
55. Obukhov, A.; Krasnyanskiy, M. Quality assessment method for GAN based on modified metrics inception score and Fréchet inception distance. In Software Engineering Perspectives in Intelligent Systems, Proceedings of the 4th Computational Methods in Systems and Software 2020, Virtual, 14–16 October 2020; Springer: Berlin/Heidelberg, Germany, 2020; Volume 1, pp. 102–114. [Google Scholar]
  56. Miranda, E.; Aryuni, M.; Irwansyah, E. A survey of medical image classification techniques. In Proceedings of the 2016 International Conference on Information Management and Technology (ICIMTech), Bandung, Indonesia, 16–18 November 2016; pp. 56–61. [Google Scholar]
  57. Gao, L.; Zhang, L.; Liu, C.; Wu, S. Handling imbalanced medical image data: A deep-learning-based one-class classification approach. Artif. Intell. Med. 2020, 108, 101935. [Google Scholar] [CrossRef]
  58. Frid-Adar, M.; Diamant, I.; Klang, E.; Amitai, M.; Goldberger, J.; Greenspan, H. GAN-based synthetic medical image augmentation for increased CNN performance in liver lesion classification. Neurocomputing 2018, 321, 321–331. [Google Scholar] [CrossRef]
59. Zhang, Q.Q.; Wang, H.F.; Lu, H.Y.; Won, D.; Yoon, S.W. Medical Image Synthesis with Generative Adversarial Networks for Tissue Recognition. In Proceedings of the 6th IEEE International Conference on Healthcare Informatics (ICHI), New York, NY, USA, 4–7 June 2018; pp. 199–207. [Google Scholar]
60. Lin, Y.J.; Chung, I.F. Medical Data Augmentation Using Generative Adversarial Networks X-ray Image Generation for Transfer Learning of Hip Fracture Detection. In Proceedings of the International Conference on Technologies and Applications of Artificial Intelligence (TAAI), Kaohsiung, Taiwan, 21–23 November 2019. [Google Scholar]
  61. Salehinejad, H.; Colak, E.; Dowdell, T.; Barfett, J.; Valaee, S. Synthesizing Chest X-Ray Pathology for Training Deep Convolutional Neural Networks. IEEE Trans. Med. Imaging 2019, 38, 1197–1206. [Google Scholar] [CrossRef] [PubMed]
  62. Yang, J.; Liu, S.Q.; Grbic, S.; Setio, A.A.A.; Xu, Z.B.; Gibson, E.; Chabin, G.; Georgescu, B.; Laine, A.F.; Comaniciu, D. Class-Aware Adversarial Lung Nodule Synthesis in CT Images. In Proceedings of the 16th IEEE International Symposium on Biomedical Imaging (ISBI), Venice, Italy, 8–11 April 2019; pp. 1348–1352. [Google Scholar]
  63. Choong, R.Z.J.; Harding, S.A.; Tang, B.Y.; Liao, S.W. 3-To-1 Pipeline: Restructuring Transfer Learning Pipelines for Medical Imaging Classification via Optimized GAN Synthetic Images. In Proceedings of the 42nd Annual International Conference of the IEEE-Engineering-in-Medicine-and-Biology-Society (EMBC), Montreal, QC, Canada, 20–24 July 2020; pp. 1596–1599. [Google Scholar]
  64. Menon, S.; Galita, J.; Chapman, D.; Gangopadhyay, A.; Mangalagiri, J.; Nguyen, P.; Yesha, Y.; Yesha, Y.; Saboury, B.; Morris, M. Generating Realistic COVID-19 x-rays with a Mean Teacher plus Transfer Learning GAN. In Proceedings of the 8th IEEE International Conference on Big Data (Big Data), Atlanta, GA, USA, 10–13 December 2020; pp. 1216–1225. [Google Scholar]
  65. Ahmad, P.; Wang, Y.; Havaei, M. CT-SGAN: Computed Tomography Synthesis GAN. In Proceedings of the 1st Workshop on Deep Generative Models for Medical Image Computing and Computer Assisted Intervention (DGM4MICCAI)/1st MICCAI Workshop on Data Augmentation, Labelling, and Imperfections (DALI), Strasbourg, France, 1 October 2021; pp. 67–79. [Google Scholar]
  66. Ambita, A.A.E.; Boquio, E.N.V.; Naval, P.C. COViT-GAN: Vision Transformer for COVID-19 Detection in CT Scan Images with Self-Attention GAN for Data Augmentation. In Proceedings of the 30th International Conference on Artificial Neural Networks (ICANN), Bratislava, Slovakia, 14–17 September 2021; pp. 587–598. [Google Scholar]
  67. Che, H.; Ramanathan, S.; Foran, D.J.; Nosher, J.L.; Patel, V.M.; Hacihaliloglu, I. Realistic Ultrasound Image Synthesis for Improved Classification of Liver Disease. In Proceedings of the 2nd International Workshop on Advances in Simplifying Medical UltraSound (ASMUS), Strasbourg, France, 27 September 2021; pp. 179–188. [Google Scholar]
68. Pang, T.; Wong, J.H.D.; Ng, W.L.; Chan, C.S. Semi-supervised GAN-based Radiomics Model for Data Augmentation in Breast Ultrasound Mass Classification. Comput. Methods Programs Biomed. 2021, 203, 106018. [Google Scholar] [CrossRef]
  69. Toda, R.; Teramoto, A.; Tsujimoto, M.; Toyama, H.; Imaizumi, K.; Saito, K.; Fujita, H. Synthetic CT image generation of shape-controlled lung cancer using semi-conditional InfoGAN and its applicability for type classification. Int. J. Comput. Assist. Radiol. Surg. 2021, 16, 241–251. [Google Scholar] [CrossRef]
  70. Venu, S.K. Improving the Generalization of Deep Learning Classification Models in Medical Imaging Using Transfer Learning and Generative Adversarial Networks. In Proceedings of the 13th International Conference on Agents and Artificial Intelligence (ICAART), Virtual, 4–6 February 2021; pp. 218–235. [Google Scholar]
  71. Zhang, G.Y.; Chen, K.X.; Xu, S.L.; Cho, P.C.A.; Nan, Y.; Zhou, X.; Lv, C.A.F.; Li, C.S.; Xie, G.T. Lesion synthesis to improve intracranial hemorrhage detection and classification for CT images. Comput. Med. Imaging Graph. 2021, 90, 101929. [Google Scholar] [CrossRef]
  72. Abirami, R.N.; Vincent, P.; Rajinikanth, V.; Kadry, S. COVID-19 Classification Using Medical Image Synthesis by Generative Adversarial Networks. Int. J. Uncertain. Fuzziness Knowl.-Based Syst. 2022, 30, 385–401. [Google Scholar] [CrossRef]
  73. Fernandez-Quilez, A.; Parvez, O.; Eftestol, T.; Kjosavik, S.R.; Oppedal, K. Improving prostate cancer triage with GAN-based synthetically generated prostate ADC MRI. In Proceedings of the Conference on Medical Imaging—Computer-Aided Diagnosis, San Diego, CA, USA, 20 February–28 March 2022. [Google Scholar]
  74. Guan, Q.; Chen, Y.Z.; Wei, Z.H.; Heidari, A.A.; Hu, H.G.; Yang, X.H.; Zheng, J.W.; Zhou, Q.W.; Chen, H.L.; Chen, F. Medical image augmentation for lesion detection using a texture-constrained multichannel progressive GAN. Comput. Biol. Med. 2022, 145, 105444. [Google Scholar] [CrossRef]
75. Liang, Z.; Huang, J.X.; Antani, S. Image translation by Ad CycleGAN for COVID-19 X-ray images: A new approach for controllable GAN. Sensors 2022, 22, 9628. [Google Scholar] [CrossRef] [PubMed]
  76. Mao, J.W.; Yin, X.S.; Zhang, G.D.; Chen, B.W.; Chang, Y.Q.; Chen, W.B.; Yu, J.Y.; Wang, Y.G. Pseudo-labeling generative adversarial networks for medical image classification. Comput. Biol. Med. 2022, 147, 105729. [Google Scholar] [CrossRef]
  77. Moris, D.I.; de Moura, J.; Novo, J.; Ortega, M. Unsupervised contrastive unpaired image generation approach for improving tuberculosis screening using chest X-ray images. Pattern Recognit. Lett. 2022, 164, 60–66. [Google Scholar] [CrossRef]
  78. Ovalle-Magallanes, E.; Avina-Cervantes, J.G.; Cruz-Aceves, I.; Ruiz-Pinales, J. Improving convolutional neural network learning based on a hierarchical bezier generative model for stenosis detection in X-ray images. Comput. Methods Programs Biomed. 2022, 219, 106767. [Google Scholar] [CrossRef] [PubMed]
  79. Shah, P.M.; Ullah, H.; Ullah, R.; Shah, D.; Wang, Y.L.; Islam, S.U.; Gani, A.; Rodrigues, J. DC-GAN-based synthetic X-ray images augmentation for increasing the performance of EfficientNet for COVID-19 detection. Expert Syst. 2022, 39, e12823. [Google Scholar] [CrossRef] [PubMed]
  80. Chen, Y.F.; Lin, Y.L.; Xu, X.D.; Ding, J.Z.; Li, C.Z.; Zeng, Y.M.; Xie, W.F.; Huang, J.L. Multi-domain medical image translation generation for lung image classification based on generative adversarial networks. Comput. Methods Programs Biomed. 2023, 229, 107200. [Google Scholar] [CrossRef] [PubMed]
  81. Kim, Y.; Lee, J.H.; Kim, C.; Jin, K.N.; Park, C.M. GAN based ROI conditioned Synthesis of Medical Image for Data Augmentation. In Proceedings of the Conference on Medical Imaging—Image Processing, San Diego, CA, USA, 19–24 February 2023. [Google Scholar]
  82. Wali, A.; Ahmad, M.; Naseer, A.; Tamoor, M.; Gilani, S.A.M. StynMedGAN: Medical images augmentation using a new GAN model for improved diagnosis of diseases. J. Intell. Fuzzy Syst. 2023, 44, 10027–10044. [Google Scholar] [CrossRef]
  83. Chlap, P.; Min, H.; Vandenberg, N.; Dowling, J.; Holloway, L.; Haworth, A. A review of medical image data augmentation techniques for deep learning applications. J. Med. Imaging Radiat. Oncol. 2021, 65, 545–563. [Google Scholar] [CrossRef]
  84. Kaissis, G.A.; Makowski, M.R.; Rückert, D.; Braren, R.F. Secure, privacy-preserving and federated machine learning in medical imaging. Nat. Mach. Intell. 2020, 2, 305–311. [Google Scholar] [CrossRef]
  85. Wang, R.; Lei, T.; Cui, R.; Zhang, B.; Meng, H.; Nandi, A.K. Medical image segmentation using deep learning: A survey. IET Image Process. 2022, 16, 1243–1267. [Google Scholar] [CrossRef]
  86. Tom, F.; Sheet, D. Simulating Patho-Realistic Ultrasound Images Using Deep Generative Networks with Adversarial Learning. In Proceedings of the 15th IEEE International Symposium on Biomedical Imaging (ISBI), Washington, DC, USA, 4–7 April 2018; pp. 1174–1177. [Google Scholar]
  87. Bargsten, L.; Schlaefer, A. SpeckleGAN: A generative adversarial network with an adaptive speckle layer to augment limited training data for ultrasound image processing. Int. J. Comput. Assist. Radiol. Surg. 2020, 15, 1427–1436. [Google Scholar] [CrossRef]
  88. Cronin, N.J.; Finni, T.; Seynnes, O. Using deep learning to generate synthetic B-mode musculoskeletal ultrasound images. Comput. Methods Programs Biomed. 2020, 196, 105583. [Google Scholar] [CrossRef]
  89. Qu, Y.L.; Su, W.Q.; Lv, X.; Deng, C.F.; Wang, Y.; Lu, Y.T.; Chen, Z.G.; Xiao, N. Synthesis of Registered Multimodal Medical Images with Lesions. In Proceedings of the 29th International Conference on Artificial Neural Networks (ICANN), Bratislava, Slovakia, 15–18 September 2020; pp. 774–786. [Google Scholar]
90. Zaman, A.; Park, S.H.; Bang, H.; Park, C.W.; Park, I.; Joung, S. Generative approach for data augmentation for deep learning-based bone surface segmentation from ultrasound images. Int. J. Comput. Assist. Radiol. Surg. 2020, 15, 931–941. [Google Scholar] [CrossRef]
  91. Fernandez-Quilez, A.; Larsen, S.V.; Goodwin, M.; Gulsrud, T.O.; Kjosavik, S.R.; Oppedal, K. Improving Prostate Whole Gland Segmentation in T2-Weighted MRI with Synthetically Generated Data. In Proceedings of the 18th IEEE International Symposium on Biomedical Imaging (ISBI), Nice, France, 13–16 April 2021; pp. 1915–1919. [Google Scholar]
  92. Liang, J.Z.; Chen, J.Y. Data Augmentation of Thyroid Ultrasound Images Using Generative Adversarial Network. In Proceedings of the IEEE International Ultrasonics Symposium (IEEE IUS), Xi’an, China, 11–16 September 2021. [Google Scholar]
93. Yao, S.Z.; Tan, J.H.; Chen, Y.; Gu, Y.H. A weighted feature transfer GAN for medical image synthesis. Mach. Vis. Appl. 2021, 32, 22. [Google Scholar] [CrossRef]
  94. Zhang, J.; Yu, L.D.; Chen, D.C.; Pan, W.D.; Shi, C.; Niu, Y.; Yao, X.W.; Xu, X.B.; Cheng, Y. Dense GAN and multi-layer attention based lesion segmentation method for COVID-19 CT images. Biomed. Signal Process. Control 2021, 69, 102901. [Google Scholar] [CrossRef]
  95. Amirrajab, S.; Lorenz, C.; Weese, J.; Pluim, J.; Breeuwer, M. Pathology Synthesis of 3D Consistent Cardiac MR Images Using 2D VAEs and GANs. In Proceedings of the 7th International Workshop on Simulation and Synthesis in Medical Imaging (SASHIMI), Singapore, 18 September 2022; pp. 34–42. [Google Scholar]
  96. Gao, J.; Zhao, W.H.; Li, P.; Huang, W.; Chen, Z.K. LEGAN: A Light and Effective Generative Adversarial Network for medical image synthesis. Comput. Biol. Med. 2022, 148, 105878. [Google Scholar] [CrossRef]
  97. Liang, J.M.; Yang, X.; Huang, Y.H.; Li, H.M.; He, S.C.; Hu, X.D.; Chen, Z.J.; Xue, W.F.; Cheng, J.; Ni, D. Sketch guided and progressive growing GAN for realistic and editable ultrasound image synthesis. Med. Image Anal. 2022, 79, 102461. [Google Scholar] [CrossRef] [PubMed]
  98. Lustermans, D.; Amirrajab, S.; Veta, M.; Breeuwer, M.; Scannell, C.M. Optimized automated cardiac MR scar quantification with GAN-based data augmentation. Comput. Methods Programs Biomed. 2022, 226, 107116. [Google Scholar] [CrossRef]
  99. Lyu, F.; Ye, M.; Ma, A.J.; Yip, T.C.F.; Wong, G.L.H.; Yuen, P.C. Learning from Synthetic CT Images via Test-Time Training for Liver Tumor Segmentation. IEEE Trans. Med. Imaging 2022, 41, 2510–2520. [Google Scholar] [CrossRef]
  100. Platscher, M.; Zopes, J.; Federau, C. Image translation for medical image generation: Ischemic stroke lesion segmentation. Biomed. Signal Process. Control 2022, 72, 103283. [Google Scholar] [CrossRef]
  101. Sasuga, S.; Kudo, A.; Kitamura, Y.; Iizuka, S.; Simo-Serra, E.; Hamabe, A.; Ishii, M.; Takemasa, I. Image Synthesis-Based Late Stage Cancer Augmentation and Semi-supervised Segmentation for MRI Rectal Cancer Staging. In Proceedings of the 2nd MICCAI International Workshop on Data Augmentation, Labeling, and Imperfections (DALI), Singapore, 22 September 2022; pp. 1–10. [Google Scholar]
  102. Shabani, S.; Homayounfar, M.; Vardhanabhuti, V.; Mahani, M.A.N.; Koohi-Moghadam, M. Self-supervised region-aware segmentation of COVID-19 CT images using 3D GAN and contrastive learning. Comput. Biol. Med. 2022, 149, 106033. [Google Scholar] [CrossRef] [PubMed]
  103. Sirazitdinov, I.; Schulz, H.; Saalbach, A.; Renisch, S.; Dylov, D.V. Tubular shape aware data generation for segmentation in medical imaging. Int. J. Comput. Assist. Radiol. Surg. 2022, 17, 1091–1099. [Google Scholar] [CrossRef] [PubMed]
  104. Tomar, D.; Bozorgtabar, B.; Lortkipanidze, M.; Vray, G.; Rad, M.S.; Thiran, J.P. Self-Supervised Generative Style Transfer for One-Shot Medical Image Segmentation. In Proceedings of the 22nd IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), Waikoloa, HI, USA, 3–8 January 2022; pp. 1737–1747. [Google Scholar]
  105. Beji, A.; Blaiech, A.G.; Said, M.; Abdallah, A.B.; Bedoui, M.H. An innovative medical image synthesis based on dual GAN deep neural networks for improved segmentation quality. Appl. Intell. 2023, 53, 3381–3397. [Google Scholar] [CrossRef]
  106. Mendes, J.; Pereira, T.; Silva, F.; Frade, J.; Morgado, J.; Freitas, C.; Negrao, E.; de Lima, B.F.; da Silva, M.C.; Madureira, A.J.; et al. Lung CT image synthesis using GANs. Expert Syst. Appl. 2023, 215, 119350. [Google Scholar] [CrossRef]
  107. Shen, Z.R.; Ouyang, X.; Xiao, B.; Cheng, J.Z.; Shen, D.G.; Wang, Q. Image synthesis with disentangled attributes for chest X-ray nodule augmentation and detection. Med. Image Anal. 2023, 84, 102708. [Google Scholar] [CrossRef]
  108. Xing, X.; Papanastasiou, G.; Walsh, S.; Yang, G. Less is More: Unsupervised Mask-guided Annotated CT Image Synthesis with Minimum Manual Segmentations. IEEE Trans. Med. Imaging 2023, 42, 2566–2576. [Google Scholar] [CrossRef] [PubMed]
  109. Zhang, Y.P.; Wang, Q.; Hu, B.L. MinimalGAN: Diverse medical image synthesis for data augmentation using minimal training data. Appl. Intell. 2023, 53, 3899–3916. [Google Scholar] [CrossRef]
  110. Guo, P.F.; Wang, P.Y.; Yasarla, R.; Zhou, J.Y.; Patel, V.M.; Jiang, S.S. Anatomic and Molecular MR Image Synthesis Using Confidence Guided CNNs. IEEE Trans. Med. Imaging 2021, 40, 2832–2844. [Google Scholar] [CrossRef]
  111. Han, C.; Hayashi, H.; Rundo, L.; Araki, R.; Shimoda, W.; Muramatsu, S.; Furukawa, Y.; Mauri, G.; Nakayama, H. GAN-Based Synthetic Brain MR Image Generation. In Proceedings of the 15th IEEE International Symposium on Biomedical Imaging (ISBI), Washington, DC, USA, 4–7 April 2018; pp. 734–738. [Google Scholar]
112. Han, C.; Kitamura, Y.; Kudo, A.; Ichinose, A.; Rundo, L.; Furukawa, Y.; Umemoto, K.; Li, Y.Z.; Nakayama, H. Synthesizing Diverse Lung Nodules Wherever Massively: 3D Multi-Conditional GAN-based CT Image Augmentation for Object Detection. In Proceedings of the 7th International Conference on 3D Vision (3DV), Quebec City, QC, Canada, 16–19 September 2019; pp. 729–737. [Google Scholar]
  113. Kamli, A.; Saouli, R.; Batatia, H.; Naceur, M.B.B.; Youkana, I. Synthetic medical image generator for data augmentation and anonymisation based on generative adversarial network for glioblastoma tumors growth prediction. IET Image Process. 2020, 14, 4248–4257. [Google Scholar] [CrossRef]
  114. Lee, L.H.; Noble, J.A. Generating Controllable Ultrasound Images of the Fetal Head. In Proceedings of the IEEE 17th International Symposium on Biomedical Imaging (ISBI), Iowa City, IA, USA, 3–7 April 2020; pp. 1761–1764. [Google Scholar]
  115. Wang, Z.W.; Lin, Y.; Cheng, K.T.; Yang, X. Semi-supervised mp-MRI data synthesis with StitchLayer and auxiliary distance maximization. Med. Image Anal. 2020, 59, 101565. [Google Scholar] [CrossRef]
  116. Rodriguez-de-la-Cruz, J.A.; Acosta-Mesa, H.G.; Mezura-Montes, E. Evolution of Generative Adversarial Networks Using PSO for Synthesis of COVID-19 Chest X-ray Images. In Proceedings of the IEEE Congress on Evolutionary Computation (IEEE CEC), Kraków, Poland, 28 June–1 July 2021; pp. 2226–2233. [Google Scholar]
  117. Shen, Z.R.; Ouyang, X.; Wang, Z.C.; Zhan, Y.Q.; Xue, Z.; Wang, Q.; Cheng, J.Z.; Shen, D.G. Nodule Synthesis and Selection for Augmenting Chest X-ray Nodule Detection. In Proceedings of the 4th Chinese Conference on Pattern Recognition and Computer Vision (PRCV), Beijing, China, 29 October–1 November 2021; pp. 536–547. [Google Scholar]
118. Hong, S.; Marinescu, R.; Dalca, A.V.; Bonkhoff, A.K.; Bretzner, M.; Rost, N.S.; Golland, P. 3D-StyleGAN: A Style-Based Generative Adversarial Network for Generative Modeling of Three-Dimensional Medical Images. In Proceedings of the 1st Workshop on Deep Generative Models for Medical Image Computing and Computer Assisted Intervention (DGM4MICCAI)/1st MICCAI Workshop on Data Augmentation, Labelling, and Imperfections (DALI), Strasbourg, France, 1 October 2021; pp. 24–34. [Google Scholar]
  119. Kiru, M.U.; Belaton, B.; Chew, X.; Almotairi, K.H.; Hussein, A.M.; Aminu, M. Comparative analysis of some selected generative adversarial network models for image augmentation: A case study of COVID-19 x-ray and CT images. J. Intell. Fuzzy Syst. 2022, 43, 7153–7172. [Google Scholar] [CrossRef]
  120. Cepa, B.; Brito, C.; Sousa, A. Generative Adversarial Networks in Healthcare: A Case Study on MRI Image Generation. In Proceedings of the IEEE 7th Portuguese Meeting on Bioengineering (ENBENG), Porto, Portugal, 22–23 June 2023; pp. 48–51. [Google Scholar]
  121. Li, Z.Y.; Fan, Q.Y.; Bilgic, B.; Wang, G.Z.; Wu, W.C.; Polimeni, J.R.; Miller, K.L.; Huang, S.Y.; Tian, Q.Y. Diffusion MRI data analysis assisted by deep learning synthesized anatomical images (DeepAnat). Med. Image Anal. 2023, 86, 102744. [Google Scholar] [CrossRef]
  122. Kong, L.; Lian, C.; Huang, D.; Hu, Y.; Zhou, Q. Breaking the dilemma of medical image-to-image translation. Adv. Neural Inf. Process. Syst. 2021, 34, 1964–1978. [Google Scholar]
  123. Korhonen, J.; You, J. Peak signal-to-noise ratio revisited: Is simple beautiful? In Proceedings of the 2012 Fourth International Workshop on Quality of Multimedia Experience, Melbourne, Australia, 5–7 July 2012; pp. 37–38. [Google Scholar]
  124. Brunet, D.; Vrscay, E.R.; Wang, Z. On the mathematical properties of the structural similarity index. IEEE Trans. Image Process. 2011, 21, 1488–1499. [Google Scholar] [CrossRef]
  125. Sharma, A.; Hamarneh, G. Missing MRI Pulse Sequence Synthesis Using Multi-Modal Generative Adversarial Network. IEEE Trans. Med. Imaging 2020, 39, 1170–1183. [Google Scholar] [CrossRef] [PubMed]
  126. Yu, B.T.; Zhou, L.P.; Wang, L.; Fripp, J.; Bourgeat, P. 3D cGAN Based Cross-Modality MR Image Synthesis for Brain Tumor Segmentation. In Proceedings of the 15th IEEE International Symposium on Biomedical Imaging (ISBI), Washington, DC, USA, 4–7 April 2018; pp. 626–630. [Google Scholar]
  127. Yu, B.; Zhou, L.; Wang, L.; Shi, Y.; Fripp, J.; Bourgeat, P. Ea-GANs: Edge-aware generative adversarial networks for cross-modality MR image synthesis. IEEE Trans. Med. Imaging 2019, 38, 1750–1762. [Google Scholar] [CrossRef]
  128. Cao, B.; Zhang, H.; Wang, N.; Gao, X.; Shen, D. Auto-GAN: Self-Supervised Collaborative Learning for Medical Image Synthesis. In Proceedings of the 34th AAAI Conference on Artificial Intelligence/32nd Innovative Applications of Artificial Intelligence Conference/10th AAAI Symposium on Educational Advances in Artificial Intelligence, New York, NY, USA, 7–12 February 2020; pp. 10486–10493. [Google Scholar]
129. Shen, Z.M.; Chen, Y.F.; Zhou, S.K.; Georgescu, B.; Liu, X.Q.; Huang, T.S. Learning a Self-Inverse Network for Bidirectional MRI Image Synthesis. In Proceedings of the IEEE 17th International Symposium on Biomedical Imaging (ISBI), Iowa City, IA, USA, 3–7 April 2020; pp. 1765–1769. [Google Scholar]
  130. Wu, K.; Qiang, Y.; Song, K.; Ren, X.T.; Yang, W.K.; Zhang, W.J.; Hussain, A.; Cui, Y.F. Image synthesis in contrast MRI based on super resolution reconstruction with multi-refinement cycle-consistent generative adversarial networks. J. Intell. Manuf. 2020, 31, 1215–1228. [Google Scholar] [CrossRef]
  131. Xin, B.Y.; Hu, Y.F.; Zheng, Y.F.; Liao, O.G. Multi-Modality Generative Adversarial Networks with Tumor Consistency Loss for Brain MR Image Synthesis. In Proceedings of the IEEE 17th International Symposium on Biomedical Imaging (ISBI), Iowa City, IA, USA, 3–7 April 2020; pp. 1803–1807. [Google Scholar]
  132. Yu, B.T.; Zhou, L.P.; Wang, L.; Shi, Y.H.; Fripp, J.; Bourgeat, P. Sample-Adaptive GANs: Linking Global and Local Mappings for Cross-Modality MR Image Synthesis. IEEE Trans. Med. Imaging 2020, 39, 2339–2350. [Google Scholar] [CrossRef]
  133. Zhou, T.; Fu, H.Z.; Chen, G.; Shen, J.B.; Shao, L. Hi-Net: Hybrid-Fusion Network for Multi-Modal MR Image Synthesis. IEEE Trans. Med. Imaging 2020, 39, 2772–2781. [Google Scholar] [CrossRef] [PubMed]
134. Islam, M.; Wijethilake, N.; Ren, H.L. Glioblastoma multiforme prognosis: MRI missing modality generation, segmentation and radiogenomic survival prediction. Comput. Med. Imaging Graph. 2021, 91, 101906. [Google Scholar] [CrossRef]
  135. Kumar, V.; Sharma, M.K.; Jehadeesan, R.; Venkatraman, B.; Suman, G.; Patra, A.; Goenka, A.H.; Sheet, D. Learning to Generate Missing Pulse Sequence in MRI using Deep Convolution Neural Network Trained with Visual Turing Test. In Proceedings of the 43rd Annual International Conference of the IEEE-Engineering-in-Medicine-and-Biology-Society (IEEE EMBC), Virtual, 1–5 November 2021; pp. 3419–3422. [Google Scholar]
  136. Luo, Y.M.; Nie, D.; Zhan, B.; Li, Z.A.; Wu, X.; Zhou, J.L.; Wang, Y.; Shen, D.G. Edge-preserving MRI image synthesis via adversarial network with iterative multi-scale fusion. Neurocomputing 2021, 452, 63–77. [Google Scholar] [CrossRef]
  137. Ren, M.W.; Kim, H.; Dey, N.; Gerig, G. Q-space Conditioned Translation Networks for Directional Synthesis of Diffusion Weighted Images from Multi-modal Structural MRI. In Proceedings of the International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI), Strasbourg, France, 27 September–1 October 2021; pp. 530–540. [Google Scholar]
138. Upadhyay, U.; Sudarshan, V.P.; Awate, S.P. Uncertainty-aware GAN with Adaptive Loss for Robust MRI Image Enhancement. In Proceedings of the 18th IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, BC, Canada, 11–17 October 2021; pp. 3248–3257. [Google Scholar]
  139. Wang, C.J.; Yang, G.; Papanastasiou, G.; Tsaftaris, S.A.; Newby, D.E.; Gray, C.; Macnaught, G.; MacGillivray, T.J. DiCyc: GAN-based deformation invariant cross-domain information fusion for medical image synthesis. Inf. Fusion 2021, 67, 147–160. [Google Scholar] [CrossRef]
  140. Yan, K.; Liu, Z.Z.; Zheng, S.; Guo, Z.Y.; Zhu, Z.F.; Zhao, Y. Coarse-to-Fine Learning Framework for Semi-supervised Multimodal MRI Synthesis. In Proceedings of the 6th Asian Conference on Pattern Recognition (ACPR), Jeju Island, Republic of Korea, 9–12 November 2021; pp. 370–384. [Google Scholar]
  141. Yang, H.R.; Sun, J.; Yang, L.W.; Xu, Z.B. A Unified Hyper-GAN Model for Unpaired Multi-contrast MR Image Translation. In Proceedings of the International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI), Strasbourg, France, 27 September–1 October 2021; pp. 127–137. [Google Scholar]
  142. Yurt, M.; Dar, S.U.H.; Erdem, A.; Erdem, E.; Oguz, K.K.; Cukur, T. mustGAN: Multi-stream Generative Adversarial Networks for MR Image Synthesis. Med. Image Anal. 2021, 70, 101944. [Google Scholar] [CrossRef]
  143. Zhan, B.; Li, D.; Wang, Y.; Ma, Z.Q.; Wu, X.; Zhou, J.L.; Zhou, L.P. LR-cGAN: Latent representation based conditional generative adversarial network for multi-modality MRI synthesis. Biomed. Signal Process. Control 2021, 66, 102457. [Google Scholar] [CrossRef]
  144. Zhou, T.X.; Canu, S.; Vera, P.; Ruan, S. Feature-enhanced generation and multi-modality fusion based deep neural network for brain tumor segmentation with missing MR modalities. Neurocomputing 2021, 466, 102–112. [Google Scholar] [CrossRef]
145. Zhu, N.Y.; Liu, C.; Feng, X.Y.; Sikka, D.; Gjerswold-Selleck, S.; Small, S.A.; Guo, J. Deep Learning Identifies Neuroimaging Signatures of Alzheimer's Disease Using Structural and Synthesized Functional MRI Data. In Proceedings of the 18th IEEE International Symposium on Biomedical Imaging (ISBI), Nice, France, 13–16 April 2021; pp. 216–220. [Google Scholar]
  146. Amirkolaee, H.A.; Bokov, D.O.; Sharma, H. Development of a GAN architecture based on integrating global and local information for paired and unpaired medical image translation. Expert Syst. Appl. 2022, 203, 117421. [Google Scholar] [CrossRef]
  147. Dalmaz, O.; Yurt, M.; Cukur, T. ResViT: Residual Vision Transformers for Multimodal Medical Image Synthesis. IEEE Trans. Med. Imaging 2022, 41, 2598–2614. [Google Scholar] [CrossRef]
  148. Huang, P.; Li, D.W.; Jiao, Z.C.; Wei, D.M.; Cao, B.; Mo, Z.H.; Wang, Q.; Zhang, H.; Shen, D.G. Common feature learning for brain tumor MRI synthesis by context-aware generative adversarial network. Med. Image Anal. 2022, 79, 102472. [Google Scholar] [CrossRef]
  149. Li, J.X.; Chen, H.J.; Li, Y.F.; Peng, Y.H.; Sun, J.; Pan, P. Cross-modality synthesis aiding lung tumor segmentation on multi-modal MRI images. Biomed. Signal Process. Control 2022, 76, 103655. [Google Scholar] [CrossRef]
  150. Lin, Y.; Han, H.; Zhou, S.K. Deep Non-Linear Embedding Deformation Network for Cross-Modal Brain MRI Synthesis. In Proceedings of the 19th IEEE International Symposium on Biomedical Imaging (IEEE ISBI), Kolkata, India, 28–31 March 2022. [Google Scholar]
  151. Xu, L.M.; Zhang, H.; Song, L.Y.; Lei, Y.R. Bi-MGAN: Bidirectional T1-to-T2 MRI images prediction using multi-generative multi-adversarial nets. Biomed. Signal Process. Control 2022, 78, 103994. [Google Scholar] [CrossRef]
  152. Yurt, M.; Ozbey, M.; Dar, S.U.H.; Tinaz, B.; Oguz, K.K.; Cukur, T. Progressively volumetrized deep generative models for data-efficient contextual learning of MR image recovery. Med. Image Anal. 2022, 78, 102429. [Google Scholar] [CrossRef] [PubMed]
  153. Zhan, B.; Zhou, L.; Li, Z.; Wu, X.; Pu, Y.; Zhou, J.; Wang, Y.; Shen, D. D2FE-GAN: Decoupled dual feature extraction based GAN for MRI image synthesis. Knowl.-Based Syst. 2022, 252, 109362. [Google Scholar] [CrossRef]
  154. Zhang, X.Z.; He, X.Z.; Guo, J.; Ettehadi, N.; Aw, N.; Semanek, D.; Posner, J.; Laine, A.; Wang, Y. PTNet3D: A 3D High-Resolution Longitudinal Infant Brain MRI Synthesizer Based on Transformers. IEEE Trans. Med. Imaging 2022, 41, 2925–2940. [Google Scholar] [CrossRef]
  155. Zhu, L.; He, Q.; Huang, Y.; Zhang, Z.H.; Zeng, J.M.; Lu, L.; Kong, W.M.; Zhou, F.Q. DualMMP-GAN: Dual-scale multi-modality perceptual generative adversarial network for medical image segmentation. Comput. Biol. Med. 2022, 144, 105387. [Google Scholar] [CrossRef]
  156. Cao, B.; Bi, Z.W.; Hu, Q.H.; Zhang, H.; Wang, N.N.; Gao, X.B.; Shen, D.G. AutoEncoder-Driven Multimodal Collaborative Learning for Medical Image Synthesis. Int. J. Comput. Vis. 2023, 131, 1995–2014. [Google Scholar] [CrossRef]
  157. Kawahara, D.; Yoshimura, H.; Matsuura, T.; Saito, A.; Nagata, Y. MRI image synthesis for fluid-attenuated inversion recovery and diffusion-weighted images with deep learning. Phys. Eng. Sci. Med. 2023, 46, 313–323. [Google Scholar] [CrossRef]
  158. Liu, J.; Pasumarthi, S.; Duffy, B.; Gong, E.; Datta, K.; Zaharchuk, G. One model to synthesize them all: Multi-contrast multi-scale transformer for missing data imputation. IEEE Trans. Med. Imaging 2023, 42, 2577–2591. [Google Scholar] [CrossRef] [PubMed]
  159. Touati, R.; Kadoury, S. A least square generative network based on invariant contrastive feature pair learning for multimodal MR image synthesis. Int. J. Comput. Assist. Radiol. Surg. 2023, 18, 971–979. [Google Scholar] [CrossRef]
  160. Touati, R.; Kadoury, S. Bidirectional feature matching based on deep pairwise contrastive learning for multiparametric MRI image synthesis. Phys. Med. Biol. 2023, 68, 125010. [Google Scholar] [CrossRef] [PubMed]
  161. Wang, B.; Pan, Y.; Xu, S.; Zhang, Y.; Ming, Y.; Chen, L.; Liu, X.; Wang, C.; Liu, Y.; Xia, Y. Quantitative Cerebral Blood Volume Image Synthesis from Standard MRI Using Image-to-Image Translation for Brain Tumors. Radiology 2023, 308, e222471. [Google Scholar] [CrossRef] [PubMed]
162. Yu, Z.Q.; Han, X.Y.; Zhang, S.J.; Feng, J.F.; Peng, T.Y.; Zhang, X.Y. MouseGAN++: Unsupervised Disentanglement and Contrastive Representation for Multiple MRI Modalities Synthesis and Structural Segmentation of Mouse Brain. IEEE Trans. Med. Imaging 2023, 42, 1197–1209. [Google Scholar] [CrossRef]
  163. Jiang, J.; Hu, Y.-C.; Tyagi, N.; Zhang, P.; Rimner, A.; Mageras, G.S.; Deasy, J.O.; Veeraraghavan, H. Tumor-aware, adversarial domain adaptation from CT to MRI for lung cancer segmentation. In Medical Image Computing and Computer Assisted Intervention–MICCAI 2018, Proceedings of the 21st International Conference, Granada, Spain, 16–20 September 2018; Proceedings, Part II 11; Springer: Berlin/Heidelberg, Germany, 2018; pp. 777–785. [Google Scholar]
  164. Jin, C.B.; Kim, H.; Jung, W.; Joo, S.; Park, E.; Ahn, Y.S.; Han, I.H.; Lee, J.I.; Cui, X.N. CT-based MR Synthesis using Adversarial Cycle-consistent Networks with Paired Data Learning. In Proceedings of the 11th International Congress on Image and Signal Processing, BioMedical Engineering and Informatics (CISP-BMEI), Beijing, China, 13–15 October 2018. [Google Scholar]
  165. Dong, X.; Lei, Y.; Tian, S.; Wang, T.; Patel, P.; Curran, W.J.; Jani, A.B.; Liu, T.; Yang, X. Synthetic MRI-aided multi-organ segmentation on male pelvic CT using cycle consistent deep attention network. Radiother. Oncol. 2019, 141, 192–199. [Google Scholar] [CrossRef]
  166. Yang, H.; Xia, K.J.; Bi, A.Q.; Qian, P.J.; Khosravi, M.R. Abdomen MRI synthesis based on conditional GAN. In Proceedings of the 6th Annual Conference on Computational Science and Computational Intelligence (CSCI), Las Vegas, NV, USA, 5–7 December 2019; pp. 1021–1025. [Google Scholar]
  167. Chen, X.; Lian, C.F.; Wang, L.; Deng, H.N.; Fung, S.H.; Nie, D.; Thung, K.H.; Yap, P.T.; Gateno, J.; Xia, J.J.; et al. One-Shot Generative Adversarial Learning for MRI Segmentation of Craniomaxillofacial Bony Structures. IEEE Trans. Med. Imaging 2020, 39, 787–796. [Google Scholar] [CrossRef]
  168. Xu, L.M.; Zeng, X.H.; Zhang, H.; Li, W.S.; Lei, J.B.; Huang, Z.W. BPGAN: Bidirectional CT-to-MRI prediction using multi-generative multi-adversarial nets with spectral normalization and localization. Neural Netw. 2020, 128, 82–96. [Google Scholar] [CrossRef]
  169. Chen, J.X.; Wei, J.; Li, R. TarGAN: Target-Aware Generative Adversarial Networks for Multi-modality Medical Image Translation. In Proceedings of the International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI), Strasbourg, France, 27 September–1 October 2021; pp. 24–33. [Google Scholar]
  170. Lei, Y.; Wang, T.H.; Tian, S.B.; Fu, Y.B.; Patel, P.; Jani, A.B.; Curran, W.J.; Liu, T.; Yang, X.F. Male pelvic CT multi-organ segmentation using synthetic MRI-aided dual pyramid networks. Phys. Med. Biol. 2021, 66, 085007. [Google Scholar] [CrossRef] [PubMed]
  171. Touati, R.; Le, W.T.; Kadoury, S. A feature invariant generative adversarial network for head and neck MRI/CT image synthesis. Phys. Med. Biol. 2021, 66, 095001. [Google Scholar] [CrossRef]
  172. Kang, H.; Podgorsak, A.R.; Venkatesulu, B.P.; Saripalli, A.L.; Chou, B.; Solanki, A.A.; Harkenrider, M.; Shea, S.; Roeske, J.C.; Abuhamad, M. Prostate segmentation accuracy using synthetic MRI for high-dose-rate prostate brachytherapy treatment planning. Phys. Med. Biol. 2023, 68, 155017. [Google Scholar] [CrossRef] [PubMed]
  173. Wang, J.Y.; Wu, Q.M.J.; Pourpanah, F. DC-cycleGAN: Bidirectional CT-to-MR synthesis from unpaired data. Comput. Med. Imaging Graph. 2023, 108, 102249. [Google Scholar] [CrossRef] [PubMed]
  174. Han, X. MR-based synthetic CT generation using a deep convolutional neural network method. Med. Phys. 2017, 44, 1408–1419. [Google Scholar] [CrossRef]
  175. Lei, Y.; Wang, T.H.; Tian, S.B.; Dong, X.; Jani, A.B.; Schuster, D.; Curran, W.J.; Patel, P.; Liu, T.; Yang, X.F. Male pelvic multi-organ segmentation aided by CBCT-based synthetic MRI. Phys. Med. Biol. 2020, 65, 035013. [Google Scholar] [CrossRef]
  176. Sun, H.F.; Xi, Q.Y.; Sun, J.W.; Fan, R.B.; Xie, K.; Ni, X.Y.; Yang, J.H. Research on new treatment mode of radiotherapy based on pseudo-medical images. Comput. Methods Programs Biomed. 2022, 221, 106932. [Google Scholar] [CrossRef]
  177. Jiang, C.H.; Zhang, X.; Zhang, N.; Zhang, Q.Y.; Zhou, C.; Yuan, J.M.; He, Q.; Yang, Y.F.; Liu, X.; Zheng, H.R.; et al. Synthesizing PET/MR (T1-weighted) images from non-attenuation-corrected PET images. Phys. Med. Biol. 2021, 66, 135006. [Google Scholar] [CrossRef] [PubMed]
  178. Bazangani, F.; Richard, F.J.; Ghattas, B.; Guedj, E. FDG-PET to T1 Weighted MRI Translation with 3D Elicit Generative Adversarial Network (E-GAN). Sensors 2022, 22, 4640. [Google Scholar] [CrossRef] [PubMed]
  179. Jiao, J.B.; Namburete, A.I.L.; Papageorghiou, A.T.; Noble, J.A. Self-Supervised Ultrasound to MRI Fetal Brain Image Synthesis. IEEE Trans. Med. Imaging 2020, 39, 4413–4424. [Google Scholar] [CrossRef]
  180. Thummerer, A.; de Jong, B.A.; Zaffino, P.; Meijers, A.; Marmitt, G.G.; Seco, J.; Steenbakkers, R.; Langendijk, J.A.; Both, S.; Spadea, M.F.; et al. Comparison of the suitability of CBCT- and MR-based synthetic CTs for daily adaptive proton therapy in head and neck patients. Phys. Med. Biol. 2020, 65, 235036. [Google Scholar] [CrossRef]
  181. Zhang, T.; Pang, H.; Wu, Y.; Xu, J.; Liang, Z.; Xia, S.; Jin, C.; Chen, R.; Qi, S. InspirationOnly: Synthesizing expiratory CT from inspiratory CT to estimate parametric response map. Med. Biol. Eng. Comput. 2025, 63, 2277–2294. [Google Scholar] [CrossRef]
  182. Zhang, T.; Pang, H.; Wu, Y.; Xu, J.; Liu, L.; Li, S.; Xia, S.; Chen, R.; Liang, Z.; Qi, S. BreathVisionNet: A pulmonary-function-guided CNN-transformer hybrid model for expiratory CT image synthesis. Comput. Methods Programs Biomed. 2025, 259, 108516. [Google Scholar] [CrossRef]
  183. Yu, P.; Zhang, H.; Wang, D.; Zhang, R.; Deng, M.; Yang, H.; Wu, L.; Liu, X.; Oh, A.S.; Abtin, F.G. Spatial resolution enhancement using deep learning improves chest disease diagnosis based on thick slice CT. npj Digit. Med. 2024, 7, 335. [Google Scholar] [CrossRef] [PubMed]
  184. Yang, H.R.; Sun, J.; Carass, A.; Zhao, C.; Lee, J.; Prince, J.L.; Xu, Z.B. Unsupervised MR-to-CT Synthesis Using Structure-Constrained CycleGAN. IEEE Trans. Med. Imaging 2020, 39, 4249–4261. [Google Scholar] [CrossRef]
  185. Wei, R.; Liu, B.; Zhou, F.G.; Bai, X.Z.; Fu, D.S.; Liang, B.; Wu, Q.W. A patient-independent CT intensity matching method using conditional generative adversarial networks (cGAN) for single x-ray projection-based tumor localization. Phys. Med. Biol. 2020, 65, 145009. [Google Scholar] [CrossRef]
  186. Zhang, Y.W.; Li, C.P.; Dai, Z.H.; Zhong, L.M.; Wang, X.T.; Yang, W. Breath-Hold CBCT-Guided CBCT-to-CT Synthesis via Multimodal Unsupervised Representation Disentanglement Learning. IEEE Trans. Med. Imaging 2023, 42, 2313–2324. [Google Scholar] [CrossRef]
  187. Li, Y.H.; Zhu, J.H.; Liu, Z.B.; Teng, J.J.; Xie, Q.Y.; Zhang, L.W.; Liu, X.W.; Shi, J.P.; Chen, L.X. A preliminary study of using a deep convolution neural network to generate synthesized CT images based on CBCT for adaptive radiotherapy of nasopharyngeal carcinoma. Phys. Med. Biol. 2019, 64, 145010. [Google Scholar] [CrossRef]
  188. Liang, X.; Chen, L.Y.; Nguyen, D.; Zhou, Z.G.; Gu, X.J.; Yang, M.; Wang, J.; Jiang, S. Generating synthesized computed tomography (CT) from cone-beam computed tomography (CBCT) using CycleGAN for adaptive radiation therapy. Phys. Med. Biol. 2019, 64, 125002. [Google Scholar] [CrossRef]
  189. Zhang, Y.G.; Pei, Y.R.; Qin, H.F.; Guo, Y.K.; Ma, G.Y.; Xu, T.M.; Zha, H.B. Masseter Muscle Segmentation from Cone-Beam CT Images Using Generative Adversarial Network. In Proceedings of the 16th IEEE International Symposium on Biomedical Imaging (ISBI), Venice, Italy, 8–11 April 2019; pp. 1188–1192. [Google Scholar]
  190. Thummerer, A.; Zaffino, P.; Meijers, A.; Marmitt, G.G.; Seco, J.; Steenbakkers, R.; Langendijk, J.A.; Both, S.; Spadea, M.F.; Knopf, A.C. Comparison of CBCT based synthetic CT methods suitable for proton dose calculations in adaptive proton therapy. Phys. Med. Biol. 2020, 65, 095002. [Google Scholar] [CrossRef]
  191. Chen, L.Y.; Liang, X.; Shen, C.Y.; Nguyen, D.; Jiang, S.; Wang, J. Synthetic CT generation from CBCT images via unsupervised deep learning. Phys. Med. Biol. 2021, 66, 115019. [Google Scholar] [CrossRef]
192. Deng, L.W.; Zhang, M.X.; Wang, J.; Huang, S.J.; Yang, X. Improving cone-beam CT quality using a cycle-residual connection with a dilated convolution-consistent generative adversarial network. Phys. Med. Biol. 2022, 67, 145010. [Google Scholar] [CrossRef] [PubMed]
  193. Deng, L.W.; Ji, Y.F.; Huang, S.J.; Yang, X.; Wang, J. Synthetic CT generation from CBCT using double-chain-CycleGAN. Comput. Biol. Med. 2023, 161, 106889. [Google Scholar] [CrossRef] [PubMed]
  194. Joseph, J.; Biji, I.; Babu, N.; Pournami, P.N.; Jayaraj, P.B.; Puzhakkal, N.; Sabu, C.; Patel, V. Fan beam CT image synthesis from cone beam CT image using nested residual UNet based conditional generative adversarial network. Phys. Eng. Sci. Med. 2023, 46, 703–717. [Google Scholar] [CrossRef] [PubMed]
  195. Szmul, A.; Taylor, S.; Lim, P.; Cantwell, J.; Moreira, I.; Zhang, Y.; D’Souza, D.; Moinuddin, S.; Gaze, M.N.; Gains, J.; et al. Deep learning based synthetic CT from cone beam CT generation for abdominal paediatric radiotherapy. Phys. Med. Biol. 2023, 68, 105006. [Google Scholar] [CrossRef]
  196. Dong, X.; Wang, T.H.; Lei, Y.; Higgins, K.; Liu, T.; Curran, W.J.; Mao, H.; Nye, J.A.; Yang, X.F. Synthetic CT generation from non-attenuation corrected PET images for whole-body PET imaging. Phys. Med. Biol. 2019, 64, 215016. [Google Scholar] [CrossRef]
  197. Hu, Z.L.; Li, Y.C.; Zou, S.J.; Xue, H.Z.; Sang, Z.R.; Liu, X.; Yang, Y.F.; Zhu, X.H.; Liang, D.; Zheng, H.R. Obtaining PET/CT images from non-attenuation corrected PET images in a single PET system using Wasserstein generative adversarial networks. Phys. Med. Biol. 2020, 65, 215010. [Google Scholar] [CrossRef]
  198. Rao, F.; Yang, B.; Chen, Y.W.; Li, J.S.; Wang, H.K.; Ye, H.W.; Wang, Y.F.; Zhao, K.; Zhu, W.T. A novel supervised learning method to generate CT images for attenuation correction in delayed pet scans. Comput. Methods Programs Biomed. 2020, 197, 105764. [Google Scholar] [CrossRef]
  199. Li, J.T.; Wang, Y.W.; Yang, Y.; Zhang, X.; Qu, Z.J.; Hu, S.B. Small animal PET to CT image synthesis based on conditional generation network. In Proceedings of the 14th International Congress on Image and Signal Processing, BioMedical Engineering and Informatics (CISP-BMEI), Shanghai, China, 23–25 October 2021. [Google Scholar]
  200. Ying, X.D.; Guo, H.; Ma, K.; Wu, J.; Weng, Z.X.; Zheng, Y.F. X2CT-GAN: Reconstructing CT from Biplanar X-Rays with Generative Adversarial Networks. In Proceedings of the 32nd IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 10611–10620. [Google Scholar]
  201. Lewis, A.; Mahmoodi, E.; Zhou, Y.Y.; Coffee, M.; Sizikova, E. Improving Tuberculosis (TB) Prediction using Synthetically Generated Computed Tomography (CT) Images. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCVW), Montreal, BC, Canada, 11–17 October 2021; pp. 3258–3266. [Google Scholar]
  202. Li, G.; Bai, L.; Zhu, C.W.; Wu, E.H.; Ma, R.B. A Novel Method of Synthetic CT Generation from MR Images based on Convolutional Neural Networks. In Proceedings of the 11th International Congress on Image and Signal Processing, BioMedical Engineering and Informatics (CISP-BMEI), Beijing, China, 13–15 October 2018. [Google Scholar]
  203. Maspero, M.; Savenije, M.H.F.; Dinkla, A.M.; Seevinck, P.R.; Intven, M.P.W.; Jurgenliemk-Schulz, I.M.; Kerkmeijer, L.G.W.; van den Berg, C.A.T. Dose evaluation of fast synthetic-CT generation using a generative adversarial network for general pelvis MR-only radiotherapy. Phys. Med. Biol. 2018, 63, 185001. [Google Scholar] [CrossRef]
  204. Nie, D.; Trullo, R.; Lian, J.; Wang, L.; Petitjean, C.; Ruan, S.; Wang, Q.; Shen, D. Medical Image Synthesis with Deep Convolutional Adversarial Networks. IEEE Trans. Biomed. Eng. 2018, 65, 2720–2730. [Google Scholar] [CrossRef]
  205. Xiang, L.; Wang, Q.; Nie, D.; Zhang, L.C.; Jin, X.Y.; Qiao, Y.; Shen, D.G. Deep embedding convolutional neural network for synthesizing CT image from T1-Weighted MR image. Med. Image Anal. 2018, 47, 31–44. [Google Scholar] [CrossRef]
  206. Ge, Y.H.; Wei, D.M.; Xue, Z.; Wang, Q.; Zhou, X.; Zhan, Y.Q.; Liao, S. Unpaired MR to CT Synthesis with Explicit Structural Constrained Adversarial Learning. In Proceedings of the 16th IEEE International Symposium on Biomedical Imaging (ISBI), Venice, Italy, 8–11 April 2019; pp. 1096–1099. [Google Scholar]
207. Largent, A.; Nunes, J.C.; Saint-Jalmes, H.; Baxter, J.; Greer, P.; Dowling, J.; de Crevoisier, R.; Acosta, O. Pseudo-CT Generation for MRI-Only Radiotherapy: Comparative Study Between a Generative Adversarial Network, a U-Net Network, a Patch-Based, and an Atlas Based Methods. In Proceedings of the 16th IEEE International Symposium on Biomedical Imaging (ISBI), Venice, Italy, 8–11 April 2019; pp. 1109–1113. [Google Scholar]
  208. Liu, Y.Z.; Lei, Y.; Wang, Y.N.; Shafai-Erfani, G.; Wang, T.H.; Tian, S.B.; Patel, P.; Jani, A.B.; McDonald, M.; Curran, W.J.; et al. Evaluation of a deep learning-based pelvic synthetic CT generation technique for MRI-based prostate proton treatment planning. Phys. Med. Biol. 2019, 64, 205022. [Google Scholar] [CrossRef]
  209. Liu, Y.Z.; Lei, Y.; Wang, Y.N.; Wang, T.H.; Ren, L.; Lin, L.Y.; McDonald, M.; Curran, W.J.; Liu, T.; Zhou, J.; et al. MRI-based treatment planning for proton radiotherapy: Dosimetric validation of a deep learning-based liver synthetic CT generation method. Phys. Med. Biol. 2019, 64, 145015. [Google Scholar] [CrossRef] [PubMed]
  210. Zeng, G.; Zheng, G. Hybrid generative adversarial networks for deep MR to CT synthesis using unpaired data. In Medical Image Computing and Computer Assisted Intervention–MICCAI 2019, Proceedings of the 22nd International Conference, Shenzhen, China, 13–17 October 2019; Proceedings, Part IV 22; Springer: Berlin/Heidelberg, Germany, 2019; pp. 759–767. [Google Scholar]
  211. Arabi, H.; Zeng, G.; Zheng, G.; Zaidi, H. Novel adversarial semantic structure deep learning for MRI-guided attenuation correction in brain PET/MRI. Eur. J. Nucl. Med. Mol. Imaging 2019, 46, 2746–2759. [Google Scholar] [CrossRef]
  212. Boni, K.; Klein, J.; Vanquin, L.; Wagner, A.; Lacornerie, T.; Pasquier, D.; Reynaert, N. MR to CT synthesis with multicenter data in the pelvic area using a conditional generative adversarial network. Phys. Med. Biol. 2020, 65, 075002. [Google Scholar] [CrossRef] [PubMed]
  213. Emami, H.; Dong, M.; Glide-Hurst, C.K. Attention-Guided Generative Adversarial Network to Address Atypical Anatomy in Synthetic CT Generation. In Proceedings of the 21st IEEE International Conference on Information Reuse and Integration for Data Science (IEEE IRI), Las Vegas, NV, USA, 11–13 August 2020; pp. 188–193. [Google Scholar]
214. Fetty, L.; Löfstedt, T.; Heilemann, G.; Furtado, H.; Nesvacil, N.; Nyholm, T.; Georg, D.; Kuess, P. Investigating conditional GAN performance with different generator architectures, an ensemble model, and different MR scanners for MR-sCT conversion. Phys. Med. Biol. 2020, 65, 105004. [Google Scholar] [CrossRef] [PubMed]
  215. Liu, L.L.; Johansson, A.; Cao, Y.; Dow, J.; Lawrence, T.S.; Balter, J.M. Abdominal synthetic CT generation from MR Dixon images using a U-net trained with ‘semi-synthetic’ CT data. Phys. Med. Biol. 2020, 65, 125001. [Google Scholar] [CrossRef]
  216. Massa, H.A.; Johnson, J.M.; McMillan, A.B. Comparison of deep learning synthesis of synthetic CTs using clinical MRI inputs. Phys. Med. Biol. 2020, 65, 23NT03. [Google Scholar] [CrossRef]
217. Oulbacha, R.; Kadoury, S. MRI to CT Synthesis of the Lumbar Spine from a Pseudo-3D Cycle GAN. In Proceedings of the IEEE 17th International Symposium on Biomedical Imaging (ISBI), Iowa City, IA, USA, 3–7 April 2020; pp. 1784–1787. [Google Scholar]
  218. Abu-Srhan, A.; Almallahi, I.; Abushariah, M.A.M.; Mahafza, W.; Al-Kadi, O.S. Paired-unpaired Unsupervised Attention Guided GAN with transfer learning for bidirectional brain MR-CT synthesis. Comput. Biol. Med. 2021, 136, 104763. [Google Scholar] [CrossRef] [PubMed]
  219. Bajger, M.; To, M.S.; Lee, G.; Wells, A.; Chong, C.; Agzarian, M.; Poonnoose, S. Lumbar Spine CT synthesis from MR images using CycleGAN—A preliminary study. In Proceedings of the International Conference on Digital Image Computing—Techniques and Applications (DICTA), Gold Coast, Australia, 29 November–1 December 2021; pp. 420–427. [Google Scholar]
  220. Chourak, H.; Barateau, A.; Mylona, E.; Cadin, C.; Lafond, C.; Greer, P.; Dowling, J.; de Crevoisier, R.; Acosta, O. Voxel-Wise Analysis for Spatial Characterisation of Pseudo-CT Errors in MRI-Only Radiotherapy Planning. In Proceedings of the 18th IEEE International Symposium on Biomedical Imaging (ISBI), Nice, France, 13–16 April 2021; pp. 395–399. [Google Scholar]
  221. Emami, H.; Dong, M.; Nejad-Davarani, S.P.; Glide-Hurst, C.K. SA-GAN: Structure-Aware GAN for Organ-Preserving Synthetic CT Generation. In Proceedings of the International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI), Strasbourg, France, 27 September–1 October 2021; pp. 471–481. [Google Scholar]
  222. Kang, S.K.; An, H.J.; Jin, H.; Kim, J.I.; Chie, E.K.; Park, J.M.; Lee, J.S. Synthetic CT generation from weakly paired MR images using cycle-consistent GAN for MR-guided radiotherapy. Biomed. Eng. Lett. 2021, 11, 263–271. [Google Scholar] [CrossRef] [PubMed]
  223. Liu, R.R.; Lei, Y.; Wang, T.H.; Zhou, J.; Roper, J.; Lin, L.Y.; McDonald, M.W.; Bradley, J.D.; Curran, W.J.; Liu, T.; et al. Synthetic dual-energy CT for MRI-only based proton therapy treatment planning using label-GAN. Phys. Med. Biol. 2021, 66, 065014. [Google Scholar] [CrossRef] [PubMed]
  224. Liu, Y.X.; Chen, A.N.; Shi, H.Y.; Huang, S.J.; Zheng, W.J.; Liu, Z.Q.; Zhang, Q.; Yang, X. CT synthesis from MRI using multi-cycle GAN for head-and-neck radiation therapy. Comput. Med. Imaging Graph. 2021, 91, 101953. [Google Scholar] [CrossRef]
  225. Olberg, S.; Chun, J.; Choi, B.S.; Park, I.; Kim, H.; Kim, T.; Kim, J.S.; Green, O.; Park, J.C. Abdominal synthetic CT reconstruction with intensity projection prior for MRI-only adaptive radiotherapy. Phys. Med. Biol. 2021, 66, 204001. [Google Scholar] [CrossRef]
  226. Wang, R.Z.; Zheng, G.Y. Disentangled Representation Learning for Deep MR to CT Synthesis Using Unpaired Data. In Proceedings of the IEEE International Conference on Image Processing (ICIP), Anchorage, AK, USA, 19–22 September 2021; pp. 274–278. [Google Scholar]
  227. Shi, Z.; Mettes, P.; Zheng, G.; Snoek, C. Frequency-Supervised MR-to-CT Image Synthesis. In Proceedings of the 1st Workshop on Deep Generative Models for Medical Image Computing and Computer Assisted Intervention (DGM4MICCAI)/1st MICCAI Workshop on Data Augmentation, Labelling, and Imperfections (DALI), Strasbourg, France, 1 October 2021; pp. 3–13. [Google Scholar]
  228. Ang, S.P.; Phung, S.L.; Field, M.; Schira, M.M. An Improved Deep Learning Framework for MR-to-CT Image Synthesis with a New Hybrid Objective Function. In Proceedings of the 19th IEEE International Symposium on Biomedical Imaging (IEEE ISBI), Kolkata, India, 28–31 March 2022. [Google Scholar]
  229. Dovletov, G.; Pham, D.D.; Lörcks, S.; Pauli, J.; Gratz, M.; Quick, H.H. Grad-CAM guided U-net for MRI-based pseudo-CT synthesis. In Proceedings of the 2022 44th Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC), Glasgow, UK, 11–15 July 2022; pp. 2071–2075. [Google Scholar]
  230. Dovletov, G.; Pham, D.D.; Pauli, J.; Gratz, M.; Quick, H. Improved MRI-based Pseudo-CT Synthesis via Segmentation Guided Attention Networks. In Proceedings of the 15th International Joint Conference on Biomedical Engineering Systems and Technologies (BIOSTEC)/9th International Conference on Bioimaging (BIOIMAGING), Virtual, 9–11 February 2022; pp. 131–140. [Google Scholar]
  231. Boroojeni, P.E.; Chen, Y.; Commean, P.K.; Eldeniz, C.; Skolnick, G.B.; Merrill, C.; Patel, K.B.; An, H. Deep-learning synthesized pseudo-CT for MR high-resolution pediatric cranial bone imaging (MR-HiPCB). Magn. Reson. Med. 2022, 88, 2285–2297. [Google Scholar] [CrossRef]
  232. Hernandez, A.G.; Fau, P.; Rapacchi, S.; Wojak, J.; Mailleux, H.; Benkreira, M.; Adel, M. Generation of synthetic CT with Deep Learning for Magnetic Resonance Guided Radiotherapy. In Proceedings of the 16th International Conference on Signal-Image Technology and Internet-Based Systems (SITIS), Dijon, France, 19–21 October 2022; pp. 368–371. [Google Scholar]
  233. Jabbarpour, A.; Mahdavi, S.R.; Sadr, A.V.; Esmaili, G.; Shiri, I.; Zaidi, H. Unsupervised pseudo CT generation using heterogenous multicentric CT/MR images and CycleGAN: Dosimetric assessment for 3D conformal radiotherapy. Comput. Biol. Med. 2022, 143, 105277. [Google Scholar] [CrossRef]
  234. Liu, H.; Sigona, M.K.; Manuel, T.J.; Chen, L.M.; Caskey, C.F.; Dawant, B.M. Synthetic CT Skull Generation for Transcranial MR Imaging-Guided Focused Ultrasound Interventions with Conditional Adversarial Networks. In Proceedings of the Conference on Medical Imaging—Image-Guided Procedures, Robotic Interventions, and Modeling, San Diego, CA, USA, 20 February–28 March 2022. [Google Scholar]
  235. Lyu, Q.; Wang, G. Conversion between CT and MRI images using diffusion and score-matching models. arXiv 2022, arXiv:2209.12104. [Google Scholar] [CrossRef]
  236. Park, S.H.; Choi, D.M.; Jung, I.H.; Chang, K.W.; Kim, M.J.; Jung, H.H.; Chang, J.W.; Kim, H.; Chang, W.S. Clinical application of deep learning-based synthetic CT from real MRI to improve dose planning accuracy in Gamma Knife radiosurgery: A proof of concept study. Biomed. Eng. Lett. 2022, 12, 359–367. [Google Scholar] [CrossRef]
  237. Ranjan, A.; Lalwani, D.; Misra, R. GAN for synthesizing CT from T2-weighted MRI data towards MR-guided radiation treatment. Magn. Reson. Mater. Phys. Biol. Med. 2022, 35, 449–457. [Google Scholar] [CrossRef]
  238. Sun, H.F.; Xi, Q.Y.; Fan, R.B.; Sun, J.W.; Xie, K.; Ni, X.Y.; Yang, J.H. Synthesis of pseudo-CT images from pelvic MRI images based on an MD-CycleGAN model for radiotherapy. Phys. Med. Biol. 2022, 67, 035006. [Google Scholar] [CrossRef]
  239. Estakhraji, S.I.Z.; Pirasteh, A.; Bradshaw, T.; McMillan, A. On the effect of training database size for MR-based synthetic CT generation in the head. Comput. Med. Imaging Graph. 2023, 107, 102227. [Google Scholar] [CrossRef] [PubMed]
  240. Li, Y.; Xu, S.S.; Chen, H.B.; Sun, Y.; Bian, J.; Guo, S.S.; Lu, Y.; Qi, Z.Y. CT synthesis from multi-sequence MRI using adaptive fusion network. Comput. Biol. Med. 2023, 157, 106738. [Google Scholar] [CrossRef] [PubMed]
  241. Liu, X.M.; Pan, J.L.; Li, X.; Wei, X.K.; Liu, Z.P.; Pan, Z.F.; Tang, J.S. Attention Based Cross-Domain Synthesis and Segmentation From Unpaired Medical Images. IEEE Trans. Emerg. Top. Comput. Intell. 2023, 8, 917–929. [Google Scholar] [CrossRef]
  242. Parrella, G.; Vai, A.; Nakas, A.; Garau, N.; Meschini, G.; Camagni, F.; Molinelli, S.; Barcellini, A.; Pella, A.; Ciocca, M.; et al. Synthetic CT in Carbon Ion Radiotherapy of the Abdominal Site. Bioengineering 2023, 10, 250. [Google Scholar] [CrossRef]
  243. Wang, J.Y.; Wu, Q.M.J.; Pourpanah, F. An attentive-based generative model for medical image synthesis. Int. J. Mach. Learn. Cybern. 2023, 14, 3897–3910. [Google Scholar] [CrossRef]
  244. Wang, L.F.; Liu, Y.; Mi, J.; Zhang, J. MSE-Fusion: Weakly supervised medical image fusion with modal synthesis and enhancement. Eng. Appl. Artif. Intell. 2023, 119, 105744. [Google Scholar] [CrossRef]
  245. Zhao, B.; Cheng, T.T.; Zhang, X.R.; Wang, J.J.; Zhu, H.; Zhao, R.C.; Li, D.W.; Zhang, Z.J.; Yu, G. CT synthesis from MR in the pelvic area using Residual Transformer Conditional GAN. Comput. Med. Imaging Graph. 2023, 103, 102150. [Google Scholar] [CrossRef]
  246. Nyholm, T.; Svensson, S.; Andersson, S.; Jonsson, J.; Sohlin, M.; Gustafsson, C.; Kjellén, E.; Söderström, K.; Albertsson, P.; Blomqvist, L. MR and CT data with multiobserver delineations of organs in the pelvic area—Part of the Gold Atlas project. Med. Phys. 2018, 45, 1295–1300. [Google Scholar] [CrossRef]
  247. Zhong, L.M.; Chen, Z.L.; Shu, H.; Zheng, Y.K.; Zhang, Y.W.; Wu, Y.K.; Feng, Q.J.; Li, Y.; Yang, W. QACL: Quartet attention aware closed-loop learning for abdominal MR-to-CT synthesis via simultaneous registration. Med. Image Anal. 2023, 83, 102692. [Google Scholar] [CrossRef]
  248. Zhou, X.R.; Cai, W.W.; Cai, J.J.; Xiao, F.; Qi, M.K.; Liu, J.W.; Zhou, L.H.; Li, Y.B.; Song, T. Multimodality MRI synchronous construction based deep learning framework for MRI-guided radiotherapy synthetic CT generation. Comput. Biol. Med. 2023, 162, 107054. [Google Scholar] [CrossRef]
  249. Zhang, Y.; Miao, S.; Mansi, T.; Liao, R. Unsupervised X-ray image segmentation with task driven generative adversarial networks. Med. Image Anal. 2020, 62, 101664. [Google Scholar] [CrossRef] [PubMed]
  250. Huang, Y.X.; Fan, F.X.; Syben, C.; Roser, P.; Mill, L.; Maier, A. Cephalogram synthesis and landmark detection in dental cone-beam CT systems. Med. Image Anal. 2021, 70, 102028. [Google Scholar] [CrossRef] [PubMed]
  251. Peng, C.; Liao, H.F.; Wong, N.; Luo, J.B.; Zhou, S.K.; Chellappa, R. XraySyn: Realistic View Synthesis From a Single Radiograph Through CT Priors. In Proceedings of the 35th AAAI Conference on Artificial Intelligence/33rd Conference on Innovative Applications of Artificial Intelligence/11th Symposium on Educational Advances in Artificial Intelligence, Virtual, 2–9 February 2021; pp. 436–444. [Google Scholar]
  252. Yuen, P.H.H.; Wang, X.H.; Lin, Z.P.; Chow, N.K.W.; Cheng, J.; Tan, C.H.; Huang, W.M. CT2CXR: CT-based CXR Synthesis for COVID-19 Pneumonia Classification. In Proceedings of the 13th International Workshop on Machine Learning in Medical Imaging (MLMI), Singapore, 18 September 2022; pp. 210–219. [Google Scholar]
  253. Shen, L.Y.; Yu, L.Q.; Zhao, W.; Pauly, J.; Xing, L. Novel-view X-ray projection synthesis through geometry-integrated deep learning. Med. Image Anal. 2022, 77, 102372. [Google Scholar] [CrossRef]
  254. Hu, S.Y.; Lei, B.Y.; Wang, S.Q.; Wang, Y.; Feng, Z.G.; Shen, Y.Y. Bidirectional Mapping Generative Adversarial Networks for Brain MR to PET Synthesis. IEEE Trans. Med. Imaging 2022, 41, 145–157. [Google Scholar] [CrossRef]
  255. Raichle, M.E. Positron emission tomography. Annu. Rev. Neurosci. 1983, 6, 249–267. [Google Scholar] [CrossRef]
256. Ben-Cohen, A.; Klang, E.; Raskin, S.P.; Soffer, S.; Ben-Haim, S.; Konen, E.; Amitai, M.M.; Greenspan, H. Cross-modality synthesis from CT to PET using FCN and GAN networks for improved automated lesion detection. Eng. Appl. Artif. Intell. 2019, 78, 186–194.
257. Yan, Y.; Lee, H.; Somer, E.; Grau, V. Generation of Amyloid PET Images via Conditional Adversarial Training for Predicting Progression to Alzheimer’s Disease. In Proceedings of the 1st International Workshop on PRedictive Intelligence in MEdicine (PRIME), Granada, Spain, 16 September 2018; pp. 26–33.
258. Emami, H.; Dong, M.; Glide-Hurst, C. CL-GAN: Contrastive Learning-Based Generative Adversarial Network for Modality Transfer with Limited Paired Data. In Proceedings of the European Conference on Computer Vision, Tel Aviv, Israel, 23–27 October 2022; Springer: Berlin/Heidelberg, Germany, 2022; pp. 527–542.
259. Zhang, J.; He, X.H.; Qing, L.B.; Gao, F.; Wang, B. BPGAN: Brain PET synthesis from MRI using generative adversarial network for multi-modal Alzheimer’s disease diagnosis. Comput. Methods Programs Biomed. 2022, 217, 106676.
260. Aldrich, J.E. Basic physics of ultrasound imaging. Crit. Care Med. 2007, 35, S131–S137.
261. Grimwood, A.; Ramalhinho, J.; Baum, Z.M.C.; Montana-Brown, N.; Johnson, G.J.; Hu, Y.P.; Clarkson, M.J.; Pereira, S.P.; Barratt, D.C.; Bonmati, E. Endoscopic Ultrasound Image Synthesis Using a Cycle-Consistent Adversarial Network. In Proceedings of the 2nd International Workshop on Advances in Simplifying Medical UltraSound (ASMUS), Strasbourg, France, 27 September 2021; pp. 169–178.
262. Kim, R.J.; Wu, E.; Rafael, A.; Chen, E.-L.; Parker, M.A.; Simonetti, O.; Klocke, F.J.; Bonow, R.O.; Judd, R.M. The use of contrast-enhanced magnetic resonance imaging to identify reversible myocardial dysfunction. N. Engl. J. Med. 2000, 343, 1445–1453.
263. Edelman, R.R. Contrast-enhanced MR imaging of the heart: Overview of the literature. Radiology 2004, 232, 653–668.
264. Mallio, C.A.; Radbruch, A.; Deike-Hofmann, K.; van der Molen, A.J.; Dekkers, I.A.; Zaharchuk, G.; Parizel, P.M.; Zobel, B.B.; Quattrocchi, C.C. Artificial intelligence to reduce or eliminate the need for gadolinium-based contrast agents in brain and cardiac MRI: A literature review. Investig. Radiol. 2023, 58, 746–753.
265. Olut, S.; Sahin, Y.H.; Demir, U.; Unal, G. Generative Adversarial Training for MRA Image Synthesis Using Multi-contrast MRI. In Proceedings of the 1st International Workshop on PRedictive Intelligence in MEdicine (PRIME), Granada, Spain, 16 September 2018; pp. 147–154.
266. Campello, V.M.; Martin-Isla, C.; Izquierdo, C.; Petersen, S.E.; Ballester, M.A.G.; Lekadir, K. Combining Multi-Sequence and Synthetic Images for Improved Segmentation of Late Gadolinium Enhancement Cardiac MRI. In Proceedings of the 10th International Workshop on Statistical Atlases and Computational Modelling of the Heart (STACOM), Shenzhen, China, 13 October 2019; pp. 290–299.
267. Zhao, J.F.; Li, D.W.; Kassam, Z.; Howey, J.; Chong, J.; Chen, B.; Li, S. Tripartite-GAN: Synthesizing liver contrast-enhanced MRI to improve tumor detection. Med. Image Anal. 2020, 63, 101667.
268. Bone, A.; Ammari, S.; Lamarque, J.P.; Elhaik, M.; Chouzenoux, E.; Nicolas, F.; Robert, P.; Balleyguier, C.; Lassau, N.; Rohe, M.M. Contrast-Enhanced Brain MRI Synthesis with Deep Learning: Key Input Modalities and Asymptotic Performance. In Proceedings of the 18th IEEE International Symposium on Biomedical Imaging (ISBI), Nice, France, 13–16 April 2021; pp. 1159–1163.
269. Pan, M.Q.; Zhang, H.; Tang, Z.C.; Zhao, Y.H.; Tian, J. Attention-Based Multi-Scale Generative Adversarial Network for synthesizing contrast-enhanced MRI. In Proceedings of the 43rd Annual International Conference of the IEEE-Engineering-in-Medicine-and-Biology-Society (IEEE EMBC), Virtual, 1–5 November 2021; pp. 3650–3653.
270. Xu, C.C.; Zhang, D.; Chong, J.; Chen, B.; Li, S. Synthesis of gadolinium-enhanced liver tumors on nonenhanced liver MR images using pixel-level graph reinforcement learning. Med. Image Anal. 2021, 69, 101976.
271. Chen, H.W.; Yan, S.A.; Xie, M.X.; Huang, J.L. Application of cascaded GAN based on CT scan in the diagnosis of aortic dissection. Comput. Methods Programs Biomed. 2022, 226, 107130.
272. Hu, T.; Oda, M.; Hayashi, Y.; Lu, Z.Y.; Kumamaru, K.K.; Akashi, T.; Aoki, S.; Mori, K. Aorta-aware GAN for non-contrast to artery contrasted CT translation and its application to abdominal aortic aneurysm detection. Int. J. Comput. Assist. Radiol. Surg. 2022, 17, 97–105.
273. Xue, Y.; Dewey, B.E.; Zuo, L.R.; Han, S.; Carass, A.; Duan, P.Y.; Remedios, S.W.; Pham, D.L.; Saidha, S.; Calabresi, P.A.; et al. Bi-directional Synthesis of Pre- and Post-contrast MRI via Guided Feature Disentanglement. In Proceedings of the 7th International Workshop on Simulation and Synthesis in Medical Imaging (SASHIMI), Singapore, 18 September 2022; pp. 55–65.
274. Chen, C.; Raymond, C.; Speier, W.; Jin, X.Y.; Cloughesy, T.F.; Enzmann, D.; Ellingson, B.M.; Arnold, C.W. Synthesizing MR Image Contrast Enhancement Using 3D High-Resolution ConvNets. IEEE Trans. Biomed. Eng. 2023, 70, 401–412.
275. Khan, R.A.; Luo, Y.G.; Wu, F.X. Multi-level GAN based enhanced CT scans for liver cancer diagnosis. Biomed. Signal Process. Control 2023, 81, 104450.
276. Killekar, A.; Kwiecinski, J.; Kruk, M.; Kepka, C.; Shanbhag, A.; Dey, D.; Slomka, P. Pseudo-contrast cardiac CT angiography derived from non-contrast CT using conditional generative adversarial networks. In Proceedings of the Conference on Medical Imaging—Image Processing, San Diego, CA, USA, 19–24 February 2023.
277. Kim, E.; Cho, H.H.; Kwon, J.; Oh, Y.T.; Ko, E.S.; Park, H. Tumor-Attentive Segmentation-Guided GAN for Synthesizing Breast Contrast-Enhanced MRI Without Contrast Agents. IEEE J. Transl. Eng. Health Med. 2023, 11, 32–43.
278. Ristea, N.-C.; Miron, A.-I.; Savencu, O.; Georgescu, M.-I.; Verga, N.; Khan, F.S.; Ionescu, R.T. CyTran: A Cycle-Consistent Transformer with Multi-Level Consistency for Non-Contrast to Contrast CT Translation. Neurocomputing 2023, 538, 126211.
279. Welland, S.H.; Melendez-Corres, G.; Teng, P.Y.; Coy, H.; Li, A.; Wahi-Anwar, M.W.; Raman, S.; Brown, M.S. Using a GAN for CT contrast enhancement to improve CNN kidney segmentation accuracy. In Proceedings of the Conference on Medical Imaging—Image Processing, San Diego, CA, USA, 19–24 February 2023.
280. Zhang, H.X.; Zhang, M.H.; Gu, Y.; Yang, G.Z. Deep anatomy learning for lung airway and artery-vein modeling with contrast-enhanced CT synthesis. Int. J. Comput. Assist. Radiol. Surg. 2023, 18, 1287–1294.
281. Zhong, L.M.; Huang, P.Y.; Shu, H.; Li, Y.; Zhang, Y.W.; Feng, Q.J.; Wu, Y.K.; Yang, W. United multi-task learning for abdominal contrast-enhanced CT synthesis through joint deformable registration. Comput. Methods Programs Biomed. 2023, 231, 107391.
282. Alqahtani, H.; Kavakli-Thorne, M.; Kumar, G. Applications of generative adversarial networks (GANs): An updated review. Arch. Comput. Methods Eng. 2021, 28, 525–552.
283. Croitoru, F.-A.; Hondru, V.; Ionescu, R.T.; Shah, M. Diffusion models in vision: A survey. IEEE Trans. Pattern Anal. Mach. Intell. 2023, 45, 10850–10869.
284. Song, J.; Meng, C.; Ermon, S. Denoising diffusion implicit models. arXiv 2020, arXiv:2010.02502.
285. Yang, L.; Zhang, Z.; Song, Y.; Hong, S.; Xu, R.; Zhao, Y.; Zhang, W.; Cui, B.; Yang, M.-H. Diffusion models: A comprehensive survey of methods and applications. ACM Comput. Surv. 2023, 56, 1–39.
286. Sun, L.; Wang, J.; Huang, Y.; Ding, X.; Greenspan, H.; Paisley, J. An adversarial learning approach to medical image synthesis for lesion detection. IEEE J. Biomed. Health Inform. 2020, 24, 2303–2314.
287. Kaur, A.; Dong, G.; Basu, A. GradXcepUNet: Explainable AI based medical image segmentation. In Proceedings of the International Conference on Smart Multimedia, Marseille, France, 25–27 August 2022; Springer International Publishing: Cham, Switzerland, 2022; pp. 174–188.
288. Azadmanesh, M.; Ghahfarokhi, B.S.; Talouki, M.A.; Eliasi, H. On the local convergence of GANs with differential privacy: Gradient clipping and noise perturbation. Expert Syst. Appl. 2023, 224, 120006.
289. Hinton, G.; Vinyals, O.; Dean, J. Distilling the knowledge in a neural network. arXiv 2015, arXiv:1503.02531.
290. Miyato, T.; Kataoka, T.; Koyama, M.; Yoshida, Y. Spectral normalization for generative adversarial networks. arXiv 2018, arXiv:1802.05957.
291. Kim, C.; Park, S.; Hwang, H.J. Local stability of Wasserstein GANs with abstract gradient penalty. IEEE Trans. Neural Netw. Learn. Syst. 2021, 33, 4527–4537.
292. Yim, J.; Joo, D.; Bae, J.; Kim, J. A gift from knowledge distillation: Fast optimization, network minimization and transfer learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 4133–4141.
293. Wang, X.; Yu, K.; Wu, S.; Gu, J.; Liu, Y.; Dong, C.; Qiao, Y.; Loy, C.C. ESRGAN: Enhanced super-resolution generative adversarial networks. In Proceedings of the European Conference on Computer Vision (ECCV) Workshops, Munich, Germany, 8–14 September 2018; pp. 63–79.
294. Salimans, T.; Goodfellow, I.; Zaremba, W.; Cheung, V.; Radford, A.; Chen, X. Improved techniques for training GANs. Adv. Neural Inf. Process. Syst. 2016, 29.
295. Wang, T.-C.; Liu, M.-Y.; Zhu, J.-Y.; Tao, A.; Kautz, J.; Catanzaro, B. High-resolution image synthesis and semantic manipulation with conditional GANs. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 8798–8807.
296. Beaulieu-Jones, B.K.; Wu, Z.S.; Williams, C.; Lee, R.; Bhavnani, S.P.; Byrd, J.B.; Greene, C.S. Privacy-preserving generative deep neural networks support clinical data sharing. Circ. Cardiovasc. Qual. Outcomes 2019, 12, e005122.
297. Xie, L.; Lin, K.; Wang, S.; Wang, F.; Zhou, J. Differentially private generative adversarial network. arXiv 2018, arXiv:1802.06739.
298. Chen, Q.; Xiang, C.; Xue, M.; Li, B.; Borisov, N.; Kaarfar, D.; Zhu, H. Differentially private data generative models. arXiv 2018, arXiv:1812.02274.
299. Rieke, N.; Hancox, J.; Li, W.; Milletari, F.; Roth, H.R.; Albarqouni, S.; Bakas, S.; Galtier, M.N.; Landman, B.A.; Maier-Hein, K.; et al. The future of digital health with federated learning. npj Digit. Med. 2020, 3, 119.
300. Zech, J.R.; Badgeley, M.A.; Liu, M.; Costa, A.B.; Titano, J.J.; Oermann, E.K. Variable generalization performance of a deep learning model to detect pneumonia in chest radiographs: A cross-sectional study. PLoS Med. 2018, 15, e1002683.
301. Finlayson, S.G.; Chung, H.W.; Kohane, I.S.; Beam, A.L. Adversarial attacks against medical deep learning systems. arXiv 2018, arXiv:1804.05296.
Figure 1. Creation and translation in medical image generation.
Figure 2. Section organization.
Figure 3. Literature search and analysis. (a) The PRISMA flowchart for this review. (b) The distribution of articles by year of publication. (c) The distribution of articles by task and modality. A: Creation for classification; B: Creation for segmentation; C: Creation for other tasks; D: Translate to MRI; E: Translate to CT; F: Translate to X-ray; G: Translate to PET; H: Translate to ultrasound; I: Translation between non-contrast and contrast-enhanced images.
Figure 4. Architecture of VAE.
Figure 5. Architecture of GANs. (a) Vanilla GAN; (b) CGAN; (c) ACGAN; (d) InfoGAN; (e) CycleGAN; (f) Pix2Pix; (g) UNIT.
Figure 6. Architecture of the diffusion model.
Figure 7. Creation for downstream tasks: (a) creation for classification; (b) creation for segmentation; (c) creation for other tasks.
Figure 8. Chord diagram of medical image translation across the modalities of MRI, CT, CBCT, X-ray, PET, and US.
Figure 9. Chord diagram of MRI image translation across contrast mechanisms. (a) Single-to-single translation; (b) multi-to-multi translation.
Table 1. Quantitative evaluation metrics of medical image creation.

| Symbol | Name | Formula |
| --- | --- | --- |
| IS | Inception Score | $IS(P_g) = \exp\left(\mathbb{E}_{x \sim P_g}\, KL\big(p_M(y \mid x) \,\|\, p_M(y)\big)\right)$ |
| MS | Mode Score | $MS(P_g) = \exp\left(\mathbb{E}_{x \sim P_g}\, KL\big(p_M(y \mid x) \,\|\, p_M(y)\big) - KL\big(p_M(y) \,\|\, p_M(y^*)\big)\right)$ |
| MMD | Kernel Maximum Mean Discrepancy | $MMD^2(P_r, P_g) = \mathbb{E}_{x_r, x_r' \sim P_r,\; x_g, x_g' \sim P_g}\big[k(x_r, x_r') - 2k(x_r, x_g) + k(x_g, x_g')\big]$ |
| WD | Wasserstein Distance | $WD(P_r, P_g) = \inf_{\gamma \in \Gamma(P_r, P_g)} \mathbb{E}_{(x_r, x_g) \sim \gamma}\big[d(x_r, x_g)\big]$ |
| FID | Fréchet Inception Distance | $FID(P_r, P_g) = \|\mu_r - \mu_g\|_2^2 + \mathrm{Tr}\big(C_r + C_g - 2(C_r C_g)^{1/2}\big)$ |
Table 2. The studies on medical image creation for the classification task.

| Paper | Model | Anatomy | Modality | Dimension |
| --- | --- | --- | --- | --- |
| [58] | DCGAN, ACGAN | Liver | CT | 2D |
| [59] | DCGAN, WGAN, BEGAN | Thyroid | OCT | 2D |
| [60] | ACGAN | Limb | X-ray | 2D |
| [30] | ICVAE | Spine, brain | Ultrasound, MRI | 2D |
| [61] | DCGAN | Chest | X-ray | 2D |
| [62] | - | Lung | CT | 3D |
| [63] | PGGAN | Chest | X-ray | 2D |
| [64] | MTT-GAN | Chest | X-ray | 2D |
| [65] | CT-SGAN | Chest | CT | 3D |
| [66] | COViT-GAN | Chest | CT | 2D |
| [67] | Two-stage GAN | Liver | Ultrasound | 2D |
| [68] | TripleGAN | Breast | Ultrasound | 2D |
| [69] | InfoGAN | Lung | CT | 2D |
| [70] | GAN | Chest | X-ray | 2D |
| [71] | LSN | Brain | CT | 2D |
| [72] | StyleGAN2 | Chest | X-ray | 2D |
| [73] | DCGAN, cGAN | Prostate | MRI | 2D |
| [74] | TMP-GAN | Breast, pancreas | X-ray, CT | 2D |
| [75] | CycleGAN | Chest | X-ray | 2D |
| [76] | PLGAN | Ophthalmology, brain, lung | OCT, MRI, CT, X-ray | 2D |
| [77] | CUT | Chest | X-ray | 2D |
| [78] | HBGM | Coronary | X-ray | 2D |
| [79] | DC-GAN | Chest | X-ray | 2D |
| [80] | MI-GAN | Chest | CT | 2D |
| [81] | StyleGAN2 | Chest | X-ray | 2D |
| [40] | DDPM | Chest, heart, pelvis, abdomen | MRI, CT, X-ray | 2D |
| [82] | StynMedGAN | Chest, brain | MRI, CT, X-ray | 2D |
Table 3. The studies on medical image creation for the segmentation task.

| Paper | Model | Anatomy | Modality | Dimension |
| --- | --- | --- | --- | --- |
| [86] | Two-stage GAN | Intravascular | Ultrasound | 2D |
| [87] | SpeckleGAN | Intravascular | Ultrasound | 2D |
| [88] | CycleGAN | Gastrocnemius medialis muscle | Ultrasound | 2D |
| [89] | Private | - | - | 2D |
| [90] | Pix2Pix | Bone surface | Ultrasound | 2D |
| [31] | VAE | - | Ultrasound | 2D |
| [91] | Pix2Pix | Prostate | MRI | 2D |
| [86] | CG-SAMR | Brain | MRI | 3D |
| [92] | GAN, VAE | Thyroid | Ultrasound | 2D |
| [93] | WFT-GAN | - | - | 2D |
| [94] | Dense GAN | Lung | CT | 2D |
| [95] | VAE, GAN | Cardiac | MRI | 3D |
| [96] | LEGAN | Retinal | Digital retinal images | 2D |
| [97] | spGAN | Lung, hip joint, ovary | Ultrasound | 2D |
| [98] | cGAN | Cardiac | MRI | 2D |
| [99] | SR-TTT | Liver | CT | 2D |
| [100] | Pix2Pix, CycleGAN, SPADE | Brain | MRI | 2D |
| [101] | SPADE | Rectal | MRI | 3D |
| [102] | Three-dimensional GAN | Lung | CT | 3D |
| [103] | - | Lung | X-ray | 2D |
| [104] | - | Brain | MRI | 3D |
| [105] | DCGAN | Retinal, coronary, knee | X-ray, MRI | 2D |
| [106] | Pix2Pix | Lung | CT | 2D |
| [107] | - | Chest | X-ray | 2D |
| [108] | Pix2Pix | Lung | CT | 2D |
| [109] | MinimalGAN | Retinal fundus | Nature | 2D |
Table 4. The studies on medical image creation for other tasks.

| Paper | Model | Anatomy | Modality | Dimension | Task |
| --- | --- | --- | --- | --- | --- |
| [111] | DCGAN, WGAN | Brain | MRI | 2D | None |
| [112] | MCGAN | Lung nodules | CT | 3D | Object detection |
| [113] | SMIG | Brain glioblastoma | MRI | 3D | Tumor growth prediction |
| [114] | InfoGAN | Fetal head | Ultrasound | 2D | None |
| [115] | Private | Prostate | MRI | 2D | Prostate cancer localization |
| [116] | DCGAN-PSO | Lung | X-ray | 2D | None |
| [117] | U-Net | Lung nodules | X-ray | 2D | Object detection |
| [118] | 3D-StyleGAN | Brain | MRI | 3D | None |
| [119] | CGAN, DCGAN, f-GAN, WGAN, CycleGAN | Lung | X-ray, CT | 2D | None |
| [120] | DCGAN | Brain | MRI | 2D | None |
| [121] | DeepAnat | Brain | MRI | 3D | Neuroscientific applications |
Table 5. Quantitative evaluation metrics of medical image translation.

| Symbol | Name | Formula |
| --- | --- | --- |
| MAE | Mean Absolute Error | $\frac{1}{m}\sum_{i=1}^{m} \lvert y_i - x_i \rvert$ |
| MSE | Mean Squared Error | $\frac{1}{m}\sum_{i=1}^{m} (y_i - x_i)^2$ |
| RMSE | Root Mean Squared Error | $\sqrt{MSE}$ |
| PSNR | Peak Signal-to-Noise Ratio | $20 \cdot \log_{10}\!\left(\frac{MAX_I}{RMSE}\right)$ |
| SSIM | Structural Similarity Index | $\frac{(2\mu_x \mu_y + C_1)(2\sigma_{xy} + C_2)}{(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)}$ |
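The Table 5 metrics can all be computed directly from a reference image $x$ and a translated image $y$. The sketch below, in plain NumPy, follows the formulas above; note that it uses a simplified global SSIM (one window over the whole image) rather than the usual local-window average, and the function name and `max_val` default are illustrative choices, not from any cited paper.

```python
import numpy as np

def translation_metrics(x, y, max_val=255.0):
    """MAE, MSE, RMSE, PSNR, and a simplified global SSIM between a
    reference image x and a translated image y (Table 5 formulas)."""
    x = np.asarray(x, dtype=float).ravel()
    y = np.asarray(y, dtype=float).ravel()
    mae = np.mean(np.abs(y - x))
    mse = np.mean((y - x) ** 2)
    rmse = np.sqrt(mse)
    psnr = 20.0 * np.log10(max_val / rmse) if rmse > 0 else float("inf")
    # Standard SSIM stabilization constants C1 = (0.01 L)^2, C2 = (0.03 L)^2.
    c1, c2 = (0.01 * max_val) ** 2, (0.03 * max_val) ** 2
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    ssim = ((2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)) / (
        (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2))
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "PSNR": psnr, "SSIM": ssim}

rng = np.random.default_rng(0)
ref = rng.uniform(0, 255, size=(64, 64))
m = translation_metrics(ref, ref + 1.0)  # constant offset of one gray level
print(m["MAE"], m["PSNR"])  # MAE ≈ 1.0, PSNR = 20·log10(255) ≈ 48.13 dB
```

A constant one-gray-level offset gives MAE = RMSE ≈ 1 while SSIM stays near 1, illustrating why the review's translation studies typically report intensity metrics (MAE/PSNR) alongside the structural metric (SSIM).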
Table 6. The studies on multi-contrast MRI translation.

| Paper | Dataset | Dimension | Modality Translation | Model | Paired Image |
| --- | --- | --- | --- | --- | --- |
| [126] | BraTS 2015 | 3D | T1→FLAIR | Three-dimensional cGAN | Yes |
| [38] | MIDAS, IXI, BraTS | 2D | T1↔T2 | pGAN, cGAN | Yes, No |
| [127] | BraTS 2015, IXI | 3D | T1→FLAIR; T1→T2 | Ea-GANs | Yes |
| [128] | BraTS 2018 | 2D | T1, T2, T1ce, FLAIR (three-to-one) | Auto-GAN | Yes |
| [125] | ISLES 2015, BraTS 2018 | 2D | T1, T2, DWI; T1, T1ce, T2, FLAIR (generating the missing contrast(s)) | MM-GAN | Yes |
| [129] | BraTS 2018 | 2D | T1↔T2 | - | Yes |
| [130] | Private | 2D | T1↔T2 | CACGAN | No |
| [131] | BraTS 2018 | 2D | T2→(FLAIR, T1, T1ce) | TC-MGAN | Yes |
| [132] | BraTS 2015, SISS 2015 | 3D | T1→FLAIR; T1→T2 | SA-GAN | Yes |
| [133] | BraTS 2018 | 2D | T1↔T2; T1↔FLAIR; T2↔FLAIR; T1, T2, FLAIR (two-to-one) | Hi-Net | Yes |
| [134] | BraTS 2017, TCGA | 2D | (T1ce, FLAIR)→T2 | - | Yes |
| [135] | BraTS 2018 | 2D | T1, T2, T1ce, FLAIR (generating the missing contrast(s)) | - | Yes |
| [136] | BraTS 2015 | 2D | T1→FLAIR; T1→T2 | EP-IMF-GAN | Yes |
| [137] | HCP 500 | 2D | B0→DWI; B0, T2→DWI; B0, T1, T2→DWI | - | Yes |
| [138] | Private, IXI | 2.5D | T1→T2 | - | Yes |
| [139] | IXI | 2D | T2↔PD | DiCyc | No |
| [140] | BraTS 2015 | 2D | T1↔T2 | - | No |
| [141] | IXI, BraTS 2019 | 2D | Unified model | Hyper-GAN | Yes |
| [142] | IXI, ISLES | 2D | T1↔T2; T1↔PD; T2↔PD; T1↔FLAIR; T2↔FLAIR; T1, T2, PD (two-to-one); T1, T2, FLAIR (two-to-one) | mustGAN | Yes |
| [143] | BraTS 2015 | 2D | T1, T1ce→FLAIR; T1, T2→FLAIR; T1, T1ce→T2 | LR-cGAN | Yes |
| [144] | BraTS 2018 | 3D | T1, T2, T1ce, FLAIR (generating the missing contrast(s)) | - | Yes |
| [145] | ADNI | 2D | T1→CBV | DeepContrast | Yes |
| [146] | Private | 2D | PD↔T2 | - | No |
| [147] | IXI, BraTS | 2D | T1, T2, PD (two-to-one); T1, T2, FLAIR (two-to-one); PD↔T2; FLAIR↔T2 | ResViT | Yes |
| [94] | IXI | 2D | T2→PD | TR-GAN | Yes |
| [148] | BraTS 2019 | 3D | T1, T2, T1ce, FLAIR (generating the missing contrast(s)) | CoCa-GAN | Yes |
| [149] | - | 2D | T2↔DWI | CICVAE | No |
| [150] | BraTS 2019 | 2D | T1→T2 | NEDNet | Yes |
| [151] | BraTS, Brain, SPLP | 2D | T1↔T2 | Bi-MGAN | No |
| [152] | IXI, in vivo brain dataset | 2D | T1, T2, PD (two-to-one); T1, T2, T1ce, FLAIR (three-to-one) | ProvoGAN | Yes |
| [153] | BraTS 2015, IXI | 2D | T1↔T2; T1→FLAIR; T2→FLAIR; T2↔PD | D2FE-GAN | Yes |
| [154] | dHCP, BCP | 3D | T1↔T2 | PTNet3D | Yes |
| [155] | BraTS 2018 | 2D | T1↔FLAIR; T1↔T2 | DualMMP-GAN | No |
| [156] | BraTS 2020, ISLES 2015, CBMFM | 2D | T1, T2, FLAIR, T1ce (three-to-one); T1, T2, FLAIR, DWI (three-to-one) | AE-GAN | Yes |
| [157] | Private | 2D | T1→DWI; T2→DWI; T1, T2→DWI; T1→FLAIR; T2→FLAIR; T1, T2→FLAIR | GAN | Yes |
| [158] | IXI, BraTS 2021 | 2D | T1, T2, PD; T1, T1ce, T2, PD (generating the missing contrast(s)) | MMT | Yes |
| [42] | BraTS, IXI | 2D | T1↔T2; T1↔PD; T2↔PD; T1↔FLAIR; T2↔FLAIR | SynDiff | No |
| [159] | BraTS 2018, IXI | 2D | PD, MRA, T2 (two-to-one) | LSGAN | No |
| [160] | BraTS 2018, IXI | 2D | PD, MRA, T2 (two-to-one) | - | Yes |
| [161] | Private | 2D | T1, T2, ADC, T1ce, FLAIR→CBV | - | Yes |
| [162] | MRM NeAt Dataset; Private | 2D | T1↔T2 | MouseGAN | No |
Table 7. The studies on image translation to MRI from other modalities.

| Paper | Origin Modality | Anatomy | Dataset | Dimension | Model | Paired Image |
| --- | --- | --- | --- | --- | --- | --- |
| [163] | CT | Lung | NSCLC | 2D | CycleGAN | No |
| [164] | CT | Brain | Private | 2D | - | Yes |
| [165] | CT | Pelvis | Private | 3D | CycleGAN | No |
| [166] | CT | Abdomen | Private | 2D | Pix2Pix | Yes |
| [167] | CT | Brain | ADNI | 3D | - | Yes |
| [168] | CT | Brain, abdomen | Private | 2D | BPGAN | Yes |
| [169] | CT | Liver | CHAOS | 2D | TarGAN | Yes |
| [170] | CT | Pelvis | Private | 3D | CycleGAN | No |
| [171] | CT | Head and neck | Private | 2D | - | Yes |
| [93] | CT | Abdomen | CHAOS | 2D | WFT-GAN | No |
| [146] | CT | Brain | Private | 2D | - | No |
| [172] | CT | Prostate | Private | 2D | PxCGAN | Yes |
| [173] | CT | Brain | From [174] | 2D | DC-CycleGAN | No |
| [175] | CBCT | Prostate | Private | 3D | CycleGAN | Yes |
| [176] | CBCT | Brain | Private | 3D | TGAN | Yes |
| [177] | PET | Brain | Private | 2D | - | Yes |
| [178] | PET | Brain | ADNI | 3D | E-GAN | Yes |
| [179] | Ultrasound | Brain | INTERGROWTH-21st, CRL | 2D | - | No |
Table 8. The studies on image translation to CT from other modalities.

| Paper | Origin Modality | Anatomy | Dataset | Dimension | Model | Paired Image |
| --- | --- | --- | --- | --- | --- | --- |
| [187] | CBCT | Nasopharyngeal carcinoma | Private | 2D | U-Net | Yes |
| [188] | CBCT | Head and neck | Private | 2D | CycleGAN | No |
| [189] | CBCT | Masseter | Private | 2D | CycleGAN-based | No |
| [180] | CBCT, MRI | Head and neck | Private | 2D | U-Net | Yes |
| [190] | CBCT | Head and neck | Private | 2D | U-Net | Yes |
| [191] | CBCT | Head and neck | Private | 2D | USsCTU-net | No |
| [192] | CBCT | Head and neck, pelvic | Private | 2D | Cycle-RCDC-GAN | Yes |
| [176] | CBCT, MRI | Brain | Private | 3D | TGAN | Yes |
| [193] | CBCT | Head and neck, pelvic | Private | 2D | DCC-GAN | No |
| [194] | CBCT | Brain | Private | 2D | CGAN | Yes |
| [195] | CBCT | Abdomen | Private | 2D | CycleGAN | No |
| [186] | CBCT | Lung | Private | 2D | MURD | No |
| [196] | NAC-PET | Whole body | Private | 3D | CycleGAN | No |
| [197] | NAC-PET | Whole body | Private | 2D | Wasserstein GAN | Yes |
| [198] | PET | Whole body | Private | 2D | U-Net | Yes |
| [199] | PET | Animal | Private | 2D | - | Yes |
| [146] | PET, MRI | Brain, whole body | Private | 2D | - | No |
| [200] | X-ray | Lung | LIDC-IDRI | 2D-3D | X2CT-GAN | Yes |
| [201] | X-ray | Lung | PadChest | 2D-3D | X2CT-GAN | Yes |
| [202] | MRI | Brain | Private | 2D | U-Net | Yes |
| [203] | MRI | Pelvis | Private | 2D | Pix2Pix | Yes |
| [204] | MRI | Brain, pelvis | ADNI, Private | 3D | - | Yes |
| [205] | MRI | Brain, prostate | Private | 3D | DECNN | Yes |
| [206] | MRI | Whole body | Private | 2D | CycleGAN | No |
| [207] | MRI | Prostate | Private | 2D | U-Net, GAN | Yes |
| [208] | MRI | Pelvis | Private | 3D | Dense-Cycle-GAN | No |
| [209] | MRI | Liver | Private | 3D | CycleGAN | No |
| [210] | MRI | Brain | From [211] | 3D | hGAN | No |
| [212] | MRI | Pelvis | Private | 2D | Pix2PixHD | Yes |
| [128] | MRI | Brain | ADNI | 2D | Auto-GAN | Yes |
| [167] | MRI | Brain | ADNI | 3D | - | Yes |
| [213] | MRI | Brain | Private | 2D | Attention-GAN | Yes |
| [214] | MRI | Pelvis | Private | 2D | - | Yes |
| [215] | MRI | Liver | Private | 2D | U-Net | Yes |
| [216] | MRI | Brain | Private | 2D | U-Net | Yes |
| [217] | MRI | Lumbar spine | SpineWeb | 3D | CycleGAN | No |
| [168] | MRI | Brain, abdomen | Private | 2D | BPGAN | Yes |
| [184] | MRI | Brain, abdomen | Private, CHAOS | 2D | SC-CycleGAN | No |
| [218] | MRI | Brain | Han et al. [112] and the JUH dataset | 2D | uagGAN | Yes |
| [219] | MRI | Lumbar spine | Private | 2D | CycleGAN | No |
| [169] | MRI | Liver | CHAOS | 2D | TarGAN | Yes |
| [220] | MRI | Pseudo | Private | 2D | U-Net, GAN | Yes |
| [221] | MRI | Abdomen | Private | 2D | SA-GAN | Yes |
| [222] | MRI | Pelvis, thorax, abdomen | Private | 2.5D | CycleGAN | No |
| [223] | MRI | Head and neck | Private | 3D | Label-GAN | Yes |
| [224] | MRI | Head and neck | Private | 2D | Multi-Cycle GAN | No |
| [225] | MRI | Abdomen | Private | 2D | - | Yes |
| [171] | MRI | Head and neck | Private | 2D | - | Yes |
| [139] | MRI | Brain | IXI, MA3RS | 2D | DiCyc | Yes |
| [226] | MRI | Brain | Private | 2D | - | No |
| [93] | MRI | Abdomen | CHAOS | 2D | WFT-GAN | No |
| [227] | MRI | Brain | Private | 3D | - | Yes |
| [228] | MRI | Head and neck | Private | 2D | - | Yes |
| [147] | MRI | Pelvis | Private | 2D | ResViT | Yes |
| [229] | MRI | Brain | RIRE | 2D | GCG U-Net | Yes |
| [230] | MRI | Head | RIRE | 2D | U-NetE-SGA, cWGANE-SGA | Yes |
| [231] | MRI | Head | Private | 3D | ResUNet | Yes |
| [232] | MRI | Abdomen | Private | 2D | U-Net, cGAN | Yes |
| [233] | MRI | Brain | Private | 2D | CycleGAN | Yes |
| [234] | MRI | Brain | Private | 3D | cGAN | Yes |
| [235] | MRI | Pelvis | Gold Atlas | 2D | Diffusion | Yes |
| [236] | MRI | Brain | GKRS | 2D | Pix2Pix | Yes |
| [237] | MRI | Brain | Atlas project | 2D | Pix2Pix | Yes |
| [238] | MRI | Pelvis | VMAT | 3D | MD-CycleGAN | No |
| [156] | MRI | Brain | CBMFM | 2D | AE-GAN | Yes |
| [239] | MRI | Brain | Private | 2D | CycleGAN | No |
| [240] | MRI | Brain | Private | 2D | AMSF-Net | Yes |
| [241] | MRI | Abdomen | CHAOS | 2D | SSA-Net | No |
| [42] | MRI | Pelvis | Private | 2D | SynDiff | No |
| [242] | MRI | Abdomen | Private | 2D | Pix2Pix | Yes |
| [37] | MRI | Brain | ABCs | 2.5D | DU-CycleGAN | No |
| [243] | MRI | Brain | From [173] | 2D | DC-cycleGAN | No |
| [244] | MRI | Brain | MedPix, Private | 2D | MSE-Fusion | Yes |
| [245] | MRI | Pelvis | From [246] | 2D | RTCGAN | Yes |
| [247] | MRI | Abdomen | Private | 3D | QACL | Yes |
| [248] | MRI | Head and neck | Private | 2D | CMSG-Net | Yes |
Table 9. The studies on medical image translation to X-ray from other modalities.

| Paper | Origin Modality | Anatomy | Dataset | Dimension | Model | Paired Image |
| --- | --- | --- | --- | --- | --- | --- |
| [249] | DRR | Chest | JSRT, NIH | 2D | TD-GAN | No |
| [250] | CBCT | Head | CQ500 | 2D | Pix2Pix | Yes |
| [251] | CT | Chest | LIDC-IDRI, TBX11K | 2D | XraySyn | No |
| [252] | CT | Chest | CheXpert | 2D | CT2CXR | No |
| [253] | X-ray | Chest | LIDC-IDRI | 2D | DL-GIPS | Yes |
Table 10. The studies on image translation to PET from other modalities.

| Paper | Origin Modality | Anatomy | Dataset | Dimension | Model | Paired Image |
| --- | --- | --- | --- | --- | --- | --- |
| [257] | MRI | Brain | ADNI | 3D | - | Yes |
| [258] | MRI | Brain | ADNI | 2D | CL-GAN | Yes |
| [254] | MRI | Brain | ADNI | 3D | BMGAN | Yes |
| [259] | MRI | Brain | ADNI | 3D | BPGAN | Yes |
| [256] | CT | Liver | Private | 2D | FCN-GAN | Yes |
| [146] | CT | Whole body | Private | 2D | - | No |
Table 11. The studies on the translation between NC and CE images.

| Paper | Modality | Translation | Anatomy | Dataset | Dimension | Model | Paired Image |
| --- | --- | --- | --- | --- | --- | --- | --- |
| [265] | MRI | NC to CE | Brain | IXI | 2D | Steerable GAN | Yes |
| [266] | MRI | NC to CE | Cardiac | MS-CMRSeg | 2D | CycleGAN | No |
| [267] | MRI | NC to CE | Liver | Private | 2D | Tripartite-GAN | Yes |
| [268] | MRI | NC to CE | Brain | Private | 3D | V-net | Yes |
| [269] | MRI | NC to CE | Ankylosing spondylitis | Private | 2D | AMCGAN | Yes |
| [270] | MRI | NC to CE | Liver | Private | 2D | Pix-GRL | Yes |
| [271] | CT | NC to CE | Aorta | Private | 2D | Cascade GAN | Yes |
| [272] | CT | NC to CE | Aorta | Private | 2.5D | aGAN | Yes |
| [273] | MRI | NC to CE | Brain | Private | 3D | BICEPS | Yes |
| [274] | MRI | NC to CE | Brain | Private | 3D | - | Yes |
| [275] | CT | NC to CE | Liver | Ircadb, Sliver07, LiTS | 2D | - | Yes |
| [276] | CT | NC to CE | Cardiac | Private | 2D | Pix2Pix | Yes |
| [277] | MRI | NC to CE | Breast | Private | 2D | TSGAN | Yes |
| [39] | CT | Mutual synthesis | Lung | Private | 3D | Pix2Pix | Yes |
| [278] | CT | Mutual synthesis | Lung | Coltea-Lung-CT-100W | 2D | CyTran | No |
| [279] | CT | NC to CE | Kidney | Private | 2D | CycleGAN | No |
| [280] | CT | NC to CE | Lung | LIDC-IDRI, EXACT09, CARVE14, PARSE | 3D | CGAN | No |
| [281] | CT | NC to CE | Abdomen | CHAOS, Private | 3D | UMTL | Yes |