Enhanced Magnetic Resonance Image Synthesis with Contrast-Aware Generative Adversarial Networks

A magnetic resonance imaging (MRI) exam typically consists of the acquisition of multiple MR pulse sequences, which are required for a reliable diagnosis. With the rise of generative deep learning models, approaches for the synthesis of MR images are developed to either synthesize additional MR contrasts, generate synthetic data, or augment existing data for AI training. While current generative approaches allow only the synthesis of specific sets of MR contrasts, we developed a method to generate synthetic MR images with adjustable image contrast. Therefore, we trained a generative adversarial network (GAN) with a separate auxiliary classifier (AC) network to generate synthetic MR knee images conditioned on various acquisition parameters (repetition time, echo time, and image orientation). The AC determined the repetition time with a mean absolute error (MAE) of 239.6 ms, the echo time with an MAE of 1.6 ms, and the image orientation with an accuracy of 100%. Therefore, it can properly condition the generator network during training. Moreover, in a visual Turing test, two experts mislabeled 40.5% of real and synthetic MR images, demonstrating that the image quality of the generated synthetic and real MR images is comparable. This work can support radiologists and technologists during the parameterization of MR sequences by previewing the yielded MR contrast, can serve as a valuable tool for radiology training, and can be used for customized data generation to support AI training.


Introduction
In magnetic resonance imaging (MRI), multiple contrasts are usually acquired within a single exam that are required to make a reliable diagnosis. Each contrast is based on the parameterization of an MR sequence through multiple acquisition (pulse sequence) parameters. In general, the acquisition parameters for an MR sequence affect image contrast, image resolution, signal-to-noise ratio, and scan time. Important acquisition parameters that affect the image contrast are the repetition time (TR) and echo time (TE). The parameterization of a sequence is subject to clinical guidelines, the MR system (vendor, model, software version, and field strength), the clinical protocol (i.e., the set of parameterized sequences used for the exam), internal guidelines (e.g., slot time), and radiologists' preferences. Clinical guidelines (e.g., ACR-SPR-SSR Practice Parameters [1]) offer recommendations but "are not inflexible rules or requirements of practice and are not intended, nor should they be used, to establish a legal standard of care" [1]. Thus, the guidelines do not specify exact acquisition parameter settings as they will depend on the field strength and desired contrast weighting.
Consequently, sequence parameterizations differ significantly across different radiology sites [2,3]. However, due to the various degrees of freedom in protocol configuration and sequence parameterization, this can also be true for a single radiology site. These differences additionally increase the complexity of MRI protocoling (i.e., selecting a set of parameterized sequences for an MRI exam), image interpretation and diagnostics, and the development of AI (artificial intelligence) applications for MRI. Since most AI applications are trained and evaluated on a limited set of MR sequences with fixed or narrowly defined acquisition parameter values [4], the applicability of AI-based applications to sequences with different parameterizations is not guaranteed. Consequently, retraining with a new set of MR images may be necessary. However, since the abundance of clinical images for AI training is limited, this is not always possible.
To mitigate the problem of the availability of MR images for varying contrast settings, we developed an approach for MR image synthesis that can be parameterized with acquisition parameters. Our method can generate customized training data for AI applications and serve as an additional data augmentation tool through contrast augmentation. Moreover, it can be used as a tool for MRI training for technologists and radiologists and can support protocoling and sequence parameterization by visualizing the yielded contrast of a parameterized sequence.
We developed a generative adversarial network (GAN) that can generate synthetic MR images of the knee conditioned on the repetition and echo time. The model can additionally synthesize MR images conditioned on the image orientation. In contrast to existing AI-based approaches for the synthesis of MR images, our model is not conditioned on different sets of contrasts (e.g., T1w, T2w, or PDw) but on the acquisition parameters that determine the contrast weighting. This allows the generation of MR images with fine-tuned contrast, adjusted to the application's specific needs.
The rest of our paper is structured as follows: first, we shortly review the current literature of generative adversarial networks with the focus on MR image contrast synthesis (Section 2). Then, we describe our approach, including the data used to train the GAN in Section 3. We evaluate our approach qualitatively and quantitatively (Section 4), discuss our results (Section 5), and then conclude our work (Section 6).

Background
Generative adversarial networks [5] learn to generate images through the adversarial training of a generator network G, which is trained to produce realistic samples and tries to fool the discriminator network D that learns to distinguish between real and synthetic samples. Commonly, the generator takes as input a latent or noise vector z, randomly sampled from a normal distribution. The original GAN formulation is adapted in several ways, e.g., introducing conditional GANs [6,7], adapting the input for imageto-image [8], and text-to-image [9] translations, enhancing the network architecture [10], the loss formulation [11,12], and training procedure [13]. GANs are successfully applied to various domains, and a detailed overview of GANs and their variants are given in [14].
GANs have also been successfully applied to the field of medical imaging (e.g., [15,16]), with applications that can mainly be divided into seven categories: synthesis, segmentation, reconstruction, detection, denoising, registration, and classification, whereas the majority of publications address synthesis applications [17].
In the following, we review publications focused on synthesizing MR images with noise-to-image GANs (Table 1). A detailed general overview of GANs in the field of medical imaging is given in [18].
Most publications in the field of MR image synthesis with noise-to-image GANs address the generation of different contrasts (e.g., T1w, T2w, and PDw) in a categorical manner, without the incorporation of the acquisition parameter values. Their main target application is enhanced data augmentation for deep learning applications (see Table 1).
In ref. [19], a Laplacian GAN [20] was trained on 2D image slices from sagittal brain MR 3D-T1w slices to enhance the augmentation of biomedical datasets.
A semi-coupled GAN was used as a data generation method for deep learning-based detection of incomplete left-ventricle coverage [21]. Additionally, a Wasserstein-GAN was used to synthesize brain MR images of different contrasts (T1w, T1c, T2w, and FLAIR) with a resolution of 128 × 128 pixels [22] and a progressive growing GAN (PGGAN) [13] to both synthesize brain MR images and place brain metastases on synthetic MR images (256 × 256 pixels) for enhanced data augmentation for AI training [23,24]. Multi-modal MR images were synthesized with a PGGAN in [25]. Moreover, GANs were used for enhanced image denoising [26] for brain MR images and the generation of additional training data for brain tissue segmentation networks [27].
Performance benefits through additional GAN-based data augmentation are reported for different medical deep learning applications [28,29]. However, the improvement gain depends on the amount of real data seen during training, with the greatest improvements observed for training procedures with a limited amount of real data [27].
Current GAN-based MR image and contrast synthesis methods only generate single categorical contrasts with no scale of similarity between the different contrasts [19,23,24,27,30]. Consequently, the GAN does not learn to disentangle the underlying anatomy from the contrast specifications, limiting the GAN's capability for MR image synthesis. We solve this by conditioning the GAN on the acquisition parameters that determine the MR image contrast, therefore disentangling anatomy and contrast synthesis.  [10]. WGAN: Wasserstein GAN [12]. PGGAN: progressive growing GAN. CPGGAN: conditional progressive growing GAN. MUNIT: multimodal unsupervised image-to-image translation framework [30]. SimGAN: semantic image manipulation using generative adversarial networks [33].

Progressive Growing WGAN-GP
In order to train a network on the synthesis of MR images, we used a progressive growing GAN proposed by [13] that started with learning low-resolution images and progressively increases the resolution by stacking additional layers to the network. Progressive growing allowed the network to learn large-scale features of the data distribution first and refined structures when adding additional layers and progressing in training, which stabilized GAN training significantly. Our network architecture was trained with 800 k images before doubling the resolution (following the training procedure proposed in [13]). A new layer was faded in during training with additional 800 k images until we reached the final resolution of 256 × 256. We used a Wasserstein GAN with gradient penalty loss (WGAN-GP) [12] following the proposed training procedure in [13]: where E[·] denotes the expected value, P r is the real data distribution, P g is the data distribution generated through x = G(z, c), and c denotes the target labels (conditions). A gradient penalty term, weighted by λ gp (λ gp = 10), was added for the random samplex ∼ Px, with ∇ŷ denoting the gradient operator toward the generated samples.
Px described the distribution of points uniformly sampled along straight lines from pairs of points from P r and P g [11]. The utilization of a gradient penalty term resulted in a more stable training process than weight clipping of the discriminator as proposed in [12].

Separate Auxiliary Classifier
We deviated from the conventional auxiliary classifier GAN (ACGAN) network architecture [7], which uses a classification layer in the discriminator to learn the conditions by employing a separate auxiliary classifier (AC) that is only trained on the conditions. This allowed us to use data augmentation on the training data for the AC. In general, this enhances the classification performance of a trained network by avoiding overfitting [34,35], and only a well-trained auxiliary classifier can provide good guidance for the training of the generator. In contrast to training the AC, heavy data augmentation diminishes the GAN performance of generating sharp, realistic images. Consequently, no data augmentation was used for training the discriminator that learns to score the realness of a given image in a WGAN-GP.
We used the DenseNet-121 architecture [36] for the AC and jointly trained the network to determine TR, TE, and imaging orientation of the patient (IOP) from MR images only. The categorical cross-entropy loss (CCE) was used for the image orientation and the mean squared error (MSE) loss for TR and TE, which were both scaled to values between 0 and 1. The AC was trained to minimize the following loss: where c = C(x) is the output of the auxiliary classifier for an image x with labels c. We heuristically set λ IOP = 1 and λ TE = λ TR = 10.

Controllable GAN
To avoid overfitting the generator to the conditioning, we incorporated the training procedure of ControlGAN [37]. Overfitting can occur if the auxiliary classifier reaches imperfect classification performance and consequently an issue for many machine learningbased classification and regression tasks. We used an adaptive loss weighting to avoid overfitting the generator on the conditions and balanced the GAN training to produce realistic images. An adaptive weight loss parameter γ t for time step t for each condition c of the GAN was introduced: where r is a learning rate parameter for γ t , γ 0 is set to zero, and τ is a maximum constraint for γ t . The parameter τ was set to 100 and the learning rate r to 0.01 for our training procedure.Ê balances the ratio between the classification loss on the real training images and the generated images and was set to one. L c denotes the condition-specific loss (CCE for the image orientation or MSE for TR and TE). Thus, the GAN was trained to minimize the conditioning loss: Minimizing the sum of the WGAN-GP loss (Equation (1)) and the auxiliary classifier loss with adaptive weights (Equation (4)) led to the generation of realistic MR images with the intended MR contrast. Thus, the overall GAN loss function was given as the sum of L GAN−GP and L AC−GAN .

Data
For training and evaluation of our GAN, we used the fastMRI dataset [38]. It contained DICOM data from 10,000 clinical knee MRI studies, each comprising a set of multiple pulse sequence parameterizations. We applied several data filters based on the DICOM header information to obtain a dataset with a comparable image impression, a dense and homogenous acquisition parameter distribution (TR, TE), and a high variance in anatomy. We wanted the image impression and contrast within our training set to depend on the acquisition parameters TR and TE. Therefore, other parameters affecting the image impression, such as field strength and manufacturer, were removed by selecting the most common parameter value within the fastMRI dataset (1.5T field strength and scanners from Siemens Healthcare, Erlangen, Germany). Several attributes were missing in the DICOM header, but we could deduce the manufacturer for DICOM images with the missing manufacturer by adopting the manufacturer of the used receiver coil. The MR images from our filtered dataset were acquired on five different Siemens Healthcare scanners (MAGNETOM Aera, MAGNETOM Avanto, MAGNETOM Espree, MAGNETOM Sonata, MAGNETOM Symphony). To create a dense data distribution for the conditioning parameters TR and TE, we only took image series with TR values between 1800 ms and 5000 ms, set the upper limit of TE to 50 ms, and discarded fat saturated images. Then, we took the six central slices from each volume to discard peripheral slices. The final dataset contained MR images from 5387 different studies and 8535 image series. The joint distribution of TR and TE values in the training dataset is shown in Figure 1. The dataset of 51,205 images was split into a training, validation, and test dataset randomly by study IDs, with 2000 images each used for validation and testing. The remaining images were used for training. Each slice was normalized to intensity values between −1 and 1 and resized to the smallest common resolution within the dataset (256 × 256 pixels) using bilinear interpolation to obtain identical image resolution. Thus, the complete dataset could be used for training without discarding images due to insufficient resolution and image upsampling was avoided, which reduced image quality.

Training Details
We trained the discriminator and generator networks with a balanced number of weight updates (Figure 2). No conditioning was applied during the progressive growing of the networks until the final resolution was reached. The AC loss was incorporated with adaptive loss weights as defined in Equation (4). The AC was pre-trained on the same dataset as the GAN for the final resolution for 200 epochs (batch size of 64, Adam optimizer [39] with learning rate 0.001, β 1 = 0, β 2 = 0.99). We trained the GAN (batch size of 16, Adam optimizer with learning rate 0.001, β 1 = 0.9, β 2 = 0.99) until it observed ten million images, at which point no further improvement for the conditioning loss could be observed. Training and inference phase of our GAN. The generator was trained to synthesize MR images for a given latent vector z and a set of acquisition parameters c 1 , guided by two networks, the discriminator, and the auxiliary classifier. After training, a synthetic MR image can be generated for a given latent vector z with any acquisition parameters c 2 (prediction phase). The shown images were generated for a random latent vector with two different sets of acquisition parameters (represented by c 1 , c 2 ), corresponding to coronal imaging orientation, and a TR of 3000 ms, TE of 15 ms, and 45 ms for c 1 and c 2 .

Results
The evaluation of our model consisted of a qualitative and quantitative component: we evaluated how indifferentiable the synthetic MR images were from real MR images and how well the intended contrast settings (through the acquisition parameters TR and TE) were reflected within the synthetic MR images.

Qualitative Evaluation
While different measures have been proposed for the assessment of GANs without reference images (i.e., without corresponding ground truth image pairs), a human observer study remains the gold standard for image quality evaluation [18]. The so-called visual Turing tests are commonly used in computer vision and medical imaging to evaluate how indistinguishable generated images are from real ones [19,23,24,40,41]. However, when asking medical experts to label randomly displayed images as either real or synthetic in a visual Turing test, the experiment may be subject to bias toward labeling more images as synthetic [23,24]. Thus, the reported (accuracy) metrics are also biased and less conclusive. Therefore, we adapted the commonly used, biased experiment by using a grid of images (3 × 2 images) with an equal number of real and synthetic images displayed. The evaluator or expert labels each image as real or synthetic (by either clicking the left or the right mouse button) and must mark the same number of images as real and synthetic within each grid. Additionally, for the images marked as synthetic, the expert can explain why the image appears synthetic. This experiment setup removes potential labeling bias by enforcing the true label distribution for the predicted labels.
We asked two experts to label 150 images (75 synthetic, 75 real images; displayed in random, but the same order for both experts) as either synthetic or real with the experiment mentioned above. The experts had more than 15 years' experience in MRI as an MRI technologist (expert 1) and five years' experience as a radiologist (expert 2). No prior information (e.g., examples of synthetic MR images) was provided before the experiment, and only the MR images and the TR and TE values were shown. The TR and TE values and the imaging orientation of the displayed synthetic images matched the values from the real images. No feedback was provided during the experiment on whether the labeling was correct. The confusion matrix of the visual Turing test is presented in Table 2. The experts reached an accuracy score of 71% (expert 1) and 48% (expert 2) on the identification of real and synthetic MR images. The experts were unable to distinguish a significant share of real and synthetic images correctly, which shows the generator's ability to synthesize MR images indistinguishable from real images with reference to anatomy and contrast. The low inter-reader agreement (IRA) for true positives (29/75 images) and true negatives (24/75 images) shows that only the minority of cases can clearly be identified as either real or synthetic.
According to the experts, image quality impediments of synthetic MR images were mainly attributed to overly smooth tissue (muscles, fat tissue, and bones) compared to fibrous muscle tissue, granular texture in fat tissue, and fine structures in bones of real MR knee images. Additionally, due to the downsampling of several of the real images to a resolution of 256 × 256 pixels using bilinear interpolation to obtain identical image resolution, the image quality of the real images was described as inferior in certain cases, making it hard for the experts to classify. An example of acquisition parameter interpolation is shown in Figure 3, and additional examples of varying anatomy demonstrating the variability of the generated samples are shown in Figure 4. The effect of TE on the tissue contrast can be seen in signal changes of the muscle tissue ( Figure 3). Varying TR results in signal differences in the fluid-cartilage contrast [42] and mainly affects contrast on T1weighted images [43], which are not available within the dataset (see Section 5. Discussion). Therefore, the signal changes through varying TR are less prominent.

Quantitative Evaluation
The performance of a GAN to generate conditional samples depends on proper guidance during training by a well-trained auxiliary classifier. Therefore, we evaluated different conditional GAN architectures with reference to the classification and regression performance to determine the acquisition parameters on the test dataset (Table 3). Although the mean squared error was used as regression loss for TR and TE, we reported the mean absolute error (MAE), as the MAE is a more descriptive metric for these targets.   With the original ACGAN architecture, the discriminator is trained to differentiate between real and fake samples and the conditions [6]. However, training the same model on the GAN loss and the conditioning loss leads to training instabilities and, therefore, poor performance in generating realistic images and conditioning the image synthesis. Introducing a separate AC model with the DenseNet-121 architecture as the discriminator, which is only trained on the conditions, significantly improves performance, leading to an optimized auxiliary classifier performance. Decoupling the development of the auxiliary classifier from the actual generator and discriminator with a separate network breaks down the complexity of the overall architecture. This facilitates additional data augmentation during training and hyperparameter tuning on the AC, significantly improving the GAN's conditioning performance (Table 3).
The adaptive weighting scheme for the conditioning loss based on ControlGAN yields different benefits. It removes the necessity to manually tune the loss weights for the various conditions. Furthermore, it guides the training process of the GAN properly by focusing the training on the conditioning terms that still need improvement (TR, TE) and decreasing the loss weight for conditions that were already learned sufficiently (imaging orientation). It also balances the overall conditioning learning with the GAN loss (WGAN-GP). Therefore, it prevents overfitting the generator on the conditions and produces realistic MR images. The AC's performance on synthetic data (identical size and label distribution as the test set) and the test set are similar (Table 3), which shows that the generator is neither over-nor underfitting on the conditions.

Discussion
Our method generates realistic MR images with high variability in the displayed anatomy that are adjustable in their image contrast through the main contrast acquisition parameter TR and TE. The images are hard to distinguish from real images for medical experts.
Since no method is currently available to retrieve the TR and TE values from an MR image alone, the quantitative evaluation of the correct contrast is difficult. However, the auxiliary classifier performs well to determine the acquisition parameters on unseen test data and can serve as a reliable method to determine the contrast settings. The GAN's capability to adjust to the contrast settings properly (i.e., TR and TE values) mainly depends on the auxiliary classifier's performance. Consequently, the AC's training must be improved with additional training data and hyperparameter tuning in future works to enhance the GAN performance further.
The generator's conditioning on the acquisition parameters was limited to values between 1800 ms to 5000 ms for TR and 12 ms to 50 ms for TE due to training data availability. All sequences within the training data were denoted as PD-weighted (by the DICOM series description). A wider range of TR and TE values within additional training data is anticipated to enable T1-and T2-weighted MR image synthesis. However, this dataset has decisive advantages over multiple publicly available datasets (e.g., [31]) despite these limitations. The dataset comprises DICOM files that includes MR acquisition parameters and offers a clinically realistic distribution of the acquisition parameters values, as shown in Figure 1.
Another issue to address in future work is the latent vector's existing feature entanglement, representing the anatomical features, with the conditioning term representing image contrast (and imaging orientation). When adapting the conditions, the image contrast adjusts properly, but the anatomy also slightly changes (see Figure 3). Feature (dis)-entanglement of generative adversarial networks is a known problem and subject of current research [44].

Applications and Future Work
GANs have proven to be a useful data generation and augmentation tool in the medical data domain with the additional benefit of anonymizing patient identifiable information [45]. Therefore, our method can be used as an advanced data augmentation technique for AI training with MR images that can generate MR images with adaptable image contrast. It enables training data generation tailored to an MRI application's contrast requirements and is also anticipated to increase AI applications' robustness against contrast changes. While AI training with synthetic data quality is not necessarily anticipated to yield the same performance as training on real data, synthetic data has several advantages. Generally, it avoids data privacy issues and can therefore be more easily shared than a proprietary dataset with patient-identifiable information. Moreover, it allows customizable training data generation in terms of dataset size and parameter distribution. Medical image datasets are often imbalanced in terms of acquisition parameter distribution or pathologies. This proof-of-concept is able to solve the acquisition imbalance and is extendable to addi-tional conditioning (e.g., pathologies). Therefore, it may enhance the use of GAN-based, synthetic data for AI training in MRI.
Furthermore, our work can be used as a training tool for radiologists and technologists to visualize the influence of the acquisition parameters on the image contrast.
Additionally, it can serve as a tool to support protocol configuration by allowing to preview the yielded contrast for a parameterized sequence. Although medical guidelines exist, MR protocol configuration and sequence parameterization are still a matter of radiologists' and technologists' preferences and vary significantly (see Figure 1). A tool to visually support sequence parameterization is anticipated to enhance and simplify the protocol configuration workflow.
However, there are also limitations to point out. The AC's performance mainly limits the generator's ability to accurately produce an MR image with an arbitrary contrast. The use of more MR images with a wider acquisition parameter range for training of the AC is anticipated to boost the AC's performance further and, therefore, of the generator. Moreover, the incorporation of additional acquisition parameters (e.g., flip angle, inversion time, and scan options) and sequence techniques (e.g., gradient echo) can enhance the model's capability for protocoling support, training data generation, and as a training tool for radiology education.
This proof-of-concept has demonstrated the capabilities of GANs to generate customizable, synthetic MR images that are difficult to distinguish from real data. However, the performed visual Turing test must be extended to a higher number of experts in order to estimate the true "indistinguishability" of the synthetic MR images in future work. Moreover, the extension of the proposed approach to 3D is anticipated to enhance the synthesis performance as 3D image stacks offer additional information that can be utilized by a generative network to produce accurate synthetic samples. The extension to other body regions is only limited by the availability of a suitable dataset with a wide range of acquisition parameter settings as well as varying anatomy.
Additionally, more advanced GAN architectures and training procedures have been proposed recently [46] and their application to MRI will be subject to future work.

Conclusions
We have proposed a generative adversarial model based on the architecture presented in [13] that uses a separate auxiliary classifier and the adaptive conditioning loss from ControlGAN. The network is trained to generate synthetic MR knee images and is conditioned on the MR acquisition parameters TR, TE, and the image orientation. Our approach allows us to generate synthetic MR images that are difficult to distinguish from real images and adapt the MR image contrast based on the input acquisition parameters (TR and TE). Our MR image synthesis approach can support radiologists' and technologists' training and can adapt to AI-based MR applications' specific requirements by providing fine-tuned, custom MR images with the required contrast.