Multi-Contrast MRI Image Synthesis Using Switchable Cycle-Consistent Generative Adversarial Networks

Multi-contrast MRI images use different echo and repetition times to highlight different tissues. However, not all desired image contrasts may be available due to scan-time limitations, suboptimal signal-to-noise ratio, and/or image artifacts. Deep learning approaches have brought revolutionary advances in medical image synthesis, enabling the generation of unacquired image contrasts (e.g., T1-weighted MRI images) from available image contrasts (e.g., T2-weighted images). Particularly, CycleGAN is an advanced technique for image synthesis using unpaired images. However, it requires two separate image generators, demanding more training resources and computations. Recently, a switchable CycleGAN has been proposed to address this limitation and successfully implemented using CT images. However, it remains unclear if switchable CycleGAN can be applied to cross-contrast MRI synthesis. In addition, whether switchable CycleGAN is able to outperform original CycleGAN on cross-contrast MRI image synthesis is still an open question. In this paper, we developed a switchable CycleGAN model for image synthesis between multi-contrast brain MRI images using a large set of publicly accessible pediatric structural brain MRI images. We conducted extensive experiments to compare switchable CycleGAN with original CycleGAN both quantitatively and qualitatively. Experimental results demonstrate that switchable CycleGAN is able to outperform CycleGAN model on pediatric MRI brain image synthesis.


Introduction
Magnetic Resonance Imaging (MRI) has been widely utilized in radiology to noninvasively generate images of normal and abnormal anatomy as well as physiological functions of the body [1]. It is a versatile imaging technique that enables the generation of different tissue contrasts depending on the acquisition parameters. For instance, T1-weighted (T1w) MRI increases the signal of fat tissue and decreases the signal of water, while T2-weighted (T2w) MRI increases the signal of water. Taking full consideration of multi-contrast MRI allows for the comprehensive evaluation of scanned organs, potentially improving clinical diagnosis and patient outcomes [2]. However, all the desired contrasts/weightings may not be available due to scan-time limitations, suboptimal signal-to-noise ratio, and/or image artifacts. In addition, an unavailable contrast may also lead to an insufficient data issue for developing robust machine learning and deep learning models [3][4][5][6], which, consequently, may result in poor model performance in the clinical CycleGAN. Both models are trained with unpaired images and are evaluated based on both visual assessments and quantitative metrics (i.e., image synthesis quality, robustness on small datasets, and time efficiency).

MRI Data
We used the publicly available Adolescent Brain Cognitive Development (ABCD) Study database [31] for model development and validation. 1517 subjects with both T1w and T2w MRI scans available were included in the study. Prospective motion correction was originally included in the ABCD image protocol for all structural MRI acquisitions. Both T1w and T2w were acquired using three different 3T MRI scanner manufacturers with the following acquisition parameters: Siemens Healthineers (

Overview of Switchable CycleGAN
Suppose that domain A is composed of T1w brain images, while domain B is composed of T2w brain images. As shown in Figure 1a, a CycleGAN [11] framework for T1w and T2w image synthesis includes two separate generators: a forward generator from T1w images to T2w images ($G_{AB}$) and a backward generator from T2w images to T1w images ($G_{BA}$). In contrast, the switchable CycleGAN uses a single switchable generator for image synthesis between T1w and T2w MRI images. As shown in Figure 1b, the switchable generator includes two modules: the Autoencoder G and the AdaIN coder F. The Autoencoder module works as a baseline network that synthesizes the image "content" between domain A and domain B, while the AdaIN coder acts as a switch that adjusts the image "style" (e.g., F(0) for synthesis from T1w to T2w, and F(1) for synthesis from T2w to T1w).
The premise of AdaIN is that image representation estimation is possible by modifying the mean and variance of the feature map. More specifically, AdaIN-based image synthesis is performed by matching the mean and variance of the feature map of the input image to those of the reference target image. Suppose the input feature map is represented by $X = [x_1 \cdots x_N] \in \mathbb{R}^{N \times H \times W}$, where N is the number of channels in the input feature map and $x_n \in \mathbb{R}^{HW \times 1}$ refers to the n-th column vector of X, which represents the input feature map of size H × W at the n-th channel. Suppose the feature map of the reference target image is represented by $Y = [y_1 \cdots y_N] \in \mathbb{R}^{N \times H \times W}$. After encoding the input images and target images in feature space, an AdaIN layer aligns the mean and variance of $x_n$ to match those of $y_n$ using the following transformation:
$$z_n = \mathcal{T}(x_n, y_n) := \frac{\sigma(y_n)}{\sigma(x_n)}\left(x_n - \mu(x_n)\mathbf{1}\right) + \mu(y_n)\mathbf{1}, \quad n = 1, \ldots, N, \tag{1}$$
where $\mathbf{1} \in \mathbb{R}^{HW}$ is the HW-dimensional vector composed of ones, and $\mu(x_n)$, $\mu(y_n)$, $\sigma(x_n)$, and $\sigma(y_n)$ are the means and standard deviations, computed across spatial dimensions.
Figure 1. (a) The schema of CycleGAN [11] with two different generators $G_{BA}$ and $G_{AB}$. $D_A$ is the discriminator that differentiates generated T1w images from real T1w images, and $D_B$ is the discriminator that differentiates synthesized T2w images from real T2w images. $\mathcal{L}_{cyc}$ is the cycle-consistency loss, and $\mathcal{L}_{d}$ is the discriminator loss. (b) The schema of switchable CycleGAN with one single generator, which consists of an image Autoencoder G followed by the AdaIN coder F. Discriminators of switchable CycleGAN are the same as in CycleGAN.
The AdaIN for switching between domain A and domain B is similar to [27] and can be represented as follows:
$$z_n = \mathcal{T}(x_n; \gamma) := \frac{\sigma(\gamma)}{\sigma(x_n)}\left(x_n - \mu(x_n)\mathbf{1}\right) + \mu(\gamma)\mathbf{1}. \tag{2}$$
With Equation (2), the AdaIN coder F in Figure 1b is defined as:
$$F(\gamma) := \left(\sigma(\gamma), \mu(\gamma)\right) = (1 - \gamma)\,(1, 0) + \gamma\,(\sigma_B, \mu_B), \tag{3}$$
where $\sigma_B$ and $\mu_B$ are learnable parameters during network training and $\gamma$ is a variable that represents the domain. In Equation (2), $\gamma = 0$ when it represents domain A, and $\gamma = 1$ when it represents domain B. Then, the synthesis from domain A to domain B can be written as:
$$G_{AB}(x) = G_{1,0}(x) := G(x; F(0)). \tag{4}$$
The synthesis from domain B to domain A can be described as follows:
$$G_{BA}(y) = G_{0,1}(y) := G(y; F(1)). \tag{5}$$
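The AdaIN transformation and the domain switch described above can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names, the `eps` stabilizer, and the assumption that the AdaIN code is a linear interpolation between the identity code (1, 0) and the learned pair (σ_B, µ_B) are illustrative choices, and the numeric values are arbitrary.

```python
import numpy as np

def adain(x, sigma_t, mu_t, eps=1e-5):
    """Align per-channel statistics of x (shape N x H x W) to (sigma_t, mu_t)."""
    mu_x = x.mean(axis=(1, 2), keepdims=True)      # per-channel spatial mean
    sigma_x = x.std(axis=(1, 2), keepdims=True)    # per-channel spatial std
    normalized = (x - mu_x) / (sigma_x + eps)      # eps is an assumed stabilizer
    return sigma_t[:, None, None] * normalized + mu_t[:, None, None]

def adain_code(gamma, sigma_b, mu_b):
    """Assumed switch: gamma = 0 gives the code (1, 0); gamma = 1 gives (sigma_b, mu_b)."""
    sigma = (1.0 - gamma) * np.ones_like(sigma_b) + gamma * sigma_b
    mu = gamma * mu_b
    return sigma, mu

rng = np.random.default_rng(0)
features = rng.normal(loc=3.0, scale=2.0, size=(4, 8, 8))  # toy feature map
sigma_b = np.array([0.5, 1.5, 2.0, 1.0])                   # stand-ins for learned values
mu_b = np.array([-1.0, 0.0, 2.0, 0.5])

out_b = adain(features, *adain_code(1.0, sigma_b, mu_b))   # switch set to 1
out_a = adain(features, *adain_code(0.0, sigma_b, mu_b))   # switch set to 0
```

With the switch at 1 the output channels take on the statistics (σ_B, µ_B); with the switch at 0 the layer reduces to plain instance normalization.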

Autoencoder Module
As shown in Figure 2, the Autoencoder module (light red color), which is based on the U-net architecture [32], consists of a contracting path and an expansive path. The contracting path consists of the repeated application of four convolution layers for image learning and down-sampling. The four convolution layers have a kernel size of 4, a stride of 2, and a padding of 1. At each down-sampling step, we doubled the number of feature channels. The AdaIN layers take a mean vector and a variance vector as input. Each of the four AdaIN layers is followed by a Leaky Rectified Linear Unit (LeakyReLU) layer.
Every step in the expansive path consists of an up-sampling of the feature map followed by a convolutional layer that halves the number of feature channels. The four convolutional layers of the expansive path perform image reconstruction and up-sampling, and have a kernel size of 4, a stride of 2, and a padding of 1. These are followed by a concatenation with the correspondingly cropped feature map from the contracting path, and three 4 × 4 convolutional layers, each connected with an AdaIN layer and a LeakyReLU layer. The cropping is necessary due to the loss of border pixels in every convolution. At the final layer, a 1 × 1 convolution is used to map each 64-component feature vector to a single output channel.
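The effect of the stated kernel size, stride, and padding on spatial resolution can be checked with the standard convolution output-size formula. The sketch below is only an arithmetic illustration: the 128 × 128 input (the training patch size reported later) and the starting width of 64 channels are assumptions, and the function names are hypothetical.

```python
def conv_output_size(size, kernel=4, stride=2, padding=1):
    """Spatial size after one convolution: floor((size + 2p - k) / s) + 1."""
    return (size + 2 * padding - kernel) // stride + 1

def contracting_path_shapes(size=128, channels=64, steps=4):
    """Return (channels, spatial size) after each down-sampling step.

    The starting channel count of 64 is an assumption for illustration.
    """
    shapes = []
    for _ in range(steps):
        size = conv_output_size(size)
        shapes.append((channels, size))
        channels *= 2  # the number of feature channels doubles at each step
    return shapes

shapes = contracting_path_shapes()
```

Under these assumptions each layer halves the spatial size exactly, so a 128 × 128 patch shrinks to 8 × 8 after four steps.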

AdaIN Coder Module
As shown in Figure 2, the AdaIN coder module (light blue color) connects to both the encoder and decoder of the Autoencoder module. The AdaIN coder takes a vector of size 1 × 128 as input and outputs nine pairs of mean and variance vectors. The AdaIN coder consists of two fully connected layers, one Rectified Linear Unit (ReLU) layer to prevent the variance vectors from becoming negative, an AdaIN layer, and a LeakyReLU layer. Accordingly, the AdaIN coder is very lightweight. Since the switchable CycleGAN employs a single generator, the number of model parameters is greatly reduced.

Discriminator
For the discriminator, the PatchGAN [23] structure was utilized to classify whether overlapping image patches are real or generated (Figure 3). Such a patch-level discriminator architecture not only has fewer parameters than a full-image discriminator but also can work on arbitrarily sized images [33]. The discriminator consists of five convolution layers, in which the first convolution layer uses a stride of 2, while the following four convolution layers use a stride of 1. The first convolution layer is followed by a LeakyReLU layer, and the other convolution layers are followed by batch normalization layers and LeakyReLU layers, except for the last convolution layer. The first convolution layer takes an input image with one channel and generates a feature map with 64 channels. After that, each time the feature map passes through a convolution layer, the number of channels is doubled. In the last layer, the output tensor is obtained by reducing the number of channels to one.

Model Training
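The switchable CycleGAN model for T1w and T2w image synthesis was trained in a manner similar to the CycleGAN network [11]. We trained the model by solving the following min-max optimization problem [10]:
$$G^*, F^* = \arg\min_{G, F} \max_{D_A, D_B} \mathcal{L}_{total}(G, F, D_A, D_B). \tag{6}$$
The total loss objective is:
$$\mathcal{L}_{total}(G, F, D_A, D_B) = -\lambda_{adv}\,\mathcal{L}_{adv}(G, F, D_A, D_B) + \lambda_{cyc}\,\mathcal{L}_{cyc}(G, F) + \lambda_{id}\,\mathcal{L}_{id}(G, F), \tag{7}$$
where $\lambda_{adv}$ is the weight parameter of the adversarial loss, $\lambda_{cyc}$ is the weight parameter of the cycle-consistency loss, and $\lambda_{id}$ is the weight parameter of the identity loss. For $\mathcal{L}_{adv}$ in Equation (7), we used the least-square loss [34], the same as CycleGAN. This least-square loss was more stable during training and generated higher quality results [11]. The adversarial loss is represented as follows:
$$\mathcal{L}_{adv}(G, F, D_A, D_B) = \mathbb{E}_{x \sim P_A}\left[\|D_A(x)\|_2^2\right] + \mathbb{E}_{y \sim P_B}\left[\|1 - D_A(G_{0,1}(y))\|_2^2\right] + \mathbb{E}_{y \sim P_B}\left[\|D_B(y)\|_2^2\right] + \mathbb{E}_{x \sim P_A}\left[\|1 - D_B(G_{1,0}(x))\|_2^2\right], \tag{8}$$
where $\|\cdot\|_2$ is the $l_2$ norm, $P_A$ and $P_B$ denote the image distributions of domains A and B, and $G_{1,0}(x)$ and $G_{0,1}(y)$ are defined in Equations (4) and (5). $D_A$ is the discriminator that differentiates generated T1w images from real T1w images, and $D_B$ is the discriminator that differentiates synthesized T2w images from real T2w images. The cycle-consistency loss is defined as
$$\mathcal{L}_{cyc}(G, F) = \mathbb{E}_{x \sim P_A}\left[\|G_{0,1}(G_{1,0}(x)) - x\|_1\right] + \mathbb{E}_{y \sim P_B}\left[\|G_{1,0}(G_{0,1}(y)) - y\|_1\right]. \tag{9}$$
We used the identity loss to encourage an identity mapping when real samples of the target domain are provided as the input of the generator. The identity loss is formulated as follows:
$$\mathcal{L}_{id}(G, F) = \mathbb{E}_{y \sim P_B}\left[\|G_{1,0}(y) - y\|_1\right] + \mathbb{E}_{x \sim P_A}\left[\|G_{0,1}(x) - x\|_1\right]. \tag{10}$$
The discriminators were trained to minimize the adversarial loss $\mathcal{L}_{adv}(G, F, D_A, D_B)$, while the generator G was trained to maximize it. The generator and discriminators were updated alternately for adversarial training.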
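How the three loss terms combine can be sketched in NumPy as follows. This is a hedged illustration rather than the training code: the target labels in the least-square term (0 for real, 1 for generated) are an assumed convention chosen so that the discriminators minimize the term as the text describes, the L1 distance for the cycle and identity terms follows common CycleGAN practice, and all function names are hypothetical. The default weights are the values reported under Implementation Details.

```python
import numpy as np

def least_square_adversarial(d_real, d_fake):
    """Least-square term for one discriminator.

    Assumed label convention: real outputs are pushed to 0 and generated
    outputs to 1, so the discriminator minimizes this quantity.
    """
    return np.mean(d_real ** 2) + np.mean((1.0 - d_fake) ** 2)

def l1_distance(a, b):
    """Mean absolute difference, used here for cycle and identity terms."""
    return np.mean(np.abs(a - b))

def total_loss(adv, cyc, idt, lam_adv=1.0, lam_cyc=10.0, lam_id=5.0):
    """Generator-side objective: the adversarial term enters with a minus sign."""
    return -lam_adv * adv + lam_cyc * cyc + lam_id * idt

# Toy values only, to show how the terms are weighted.
example = total_loss(adv=0.5, cyc=0.1, idt=0.2)
```

Because the adversarial term is negated in the total objective, minimizing the total over the generator corresponds to maximizing the adversarial loss, while maximizing the total over the discriminators corresponds to minimizing it.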

Implementation Details
For training, we iteratively trained a switchable CycleGAN with 200 epochs. All networks were trained using the optimizer ADAM solver [35] with β 1 = 0.5, β 2 = 0.999. The learning rate for the first 100 epochs was 0.0002, and the learning rate linearly decayed to zero over the next 100 epochs. The minibatch size was 1. For the hyperparameters in Equation (7), the loss weights λ cyc , λ id , and λ adv were set to 10, 5, and 1, respectively. The model was trained on NVIDIA GeForce RTX 3080.
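The learning-rate schedule described above (constant for 100 epochs, then linear decay to zero over the next 100) can be written as a small function. The zero-based epoch indexing and the function name are assumptions made for this sketch.

```python
def learning_rate(epoch, base_lr=2e-4, constant_epochs=100, decay_epochs=100):
    """Learning rate at a zero-based epoch index (indexing is an assumption)."""
    if epoch < constant_epochs:
        return base_lr
    remaining = constant_epochs + decay_epochs - epoch
    # Linear decay that reaches zero once all decay epochs have elapsed.
    return base_lr * max(remaining, 0) / decay_epochs

schedule = [learning_rate(e) for e in range(201)]
```

In PyTorch, a function of this shape (returning a multiplier rather than an absolute rate) is typically passed to `torch.optim.lr_scheduler.LambdaLR`.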
All the methods in this study were implemented in PyTorch v1.9.1. The input images were randomly cropped into small patches of size 128 × 128 during training. They were also randomly flipped both horizontally and vertically for data augmentation and model generalizability. Training images were provided in a randomized unpaired way, making it unlikely that both a T1w image and its registered corresponding T2w image were simultaneously shown to the GAN model. We also followed Shrivastava et al.'s strategy [36] and updated the discriminators using a history of generated images rather than the ones produced by the latest generators. An image buffer was implemented to store the 50 previously synthesized images.
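A history buffer of this kind can be sketched in plain Python. The replacement rule below (once the buffer is full, return a stored image half of the time and overwrite it with the new one) follows common CycleGAN implementations and is an assumption, not a detail stated in the text; the class name and the `seed` argument are hypothetical.

```python
import random

class ImageBuffer:
    """Keeps up to `capacity` previously generated images for discriminator updates."""

    def __init__(self, capacity=50, seed=None):
        self.capacity = capacity
        self.images = []
        self.rng = random.Random(seed)

    def query(self, image):
        """Store the new image and return either it or an older stored one."""
        if len(self.images) < self.capacity:
            self.images.append(image)   # buffer not full yet: keep and return
            return image
        if self.rng.random() < 0.5:     # assumed 50% chance to replay history
            idx = self.rng.randrange(self.capacity)
            old = self.images[idx]
            self.images[idx] = image    # the new image replaces the replayed one
            return old
        return image

buffer = ImageBuffer(capacity=50, seed=0)
returned = [buffer.query(i) for i in range(200)]  # integers stand in for images
```

Feeding the discriminator a mixture of current and older samples is intended to reduce oscillation during adversarial training.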

Model Evaluation and Statistical Analysis
Multi-contrast T1w and T2w MRI images from a given subject were registered using Advanced Normalization Tools [37]. We extracted 10 slices of brain MRI images from each subject, resulting in a total of 30,340 slices (i.e., 15,170 slices for each of the two contrasts). We randomly selected 1063 subjects (70%) for training, 151 subjects (10%) for validation, and 303 subjects (20%) for testing. We compared switchable CycleGAN with baseline CycleGAN [11], as well as pix2pix GAN models [23]. Tuning hyperparameters in deep neural networks, especially in complicated models such as GANs, can be computationally intensive [38,39]. Thus, it is quite common in deep learning research to perform one-fold cross-validation [40,41] or even directly adopt hyperparameter selection from published work [19,25,41]. We adopted the hyperparameters of switchable CycleGAN from a prior study [27]. The epoch numbers (in the range [100, 200]) were selected based on performance on the validation set through one-fold cross-validation. All methods were compared using the same training and test data.
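The subject-level split can be sketched as follows. Only the subject counts come from the text; the function name, the seed, and the use of integer indices as stand-ins for subject identifiers are illustrative assumptions.

```python
import random

def split_subjects(subject_ids, n_train=1063, n_val=151, seed=0):
    """Shuffle subjects once and cut the list into train/validation/test."""
    ids = list(subject_ids)
    random.Random(seed).shuffle(ids)       # seed is an arbitrary choice
    train = ids[:n_train]
    val = ids[n_train:n_train + n_val]
    test = ids[n_train + n_val:]           # the remaining subjects
    return train, val, test

# Integer indices stand in for the 1517 subject identifiers.
train, val, test = split_subjects(range(1517))
```

Splitting by subject rather than by slice keeps all slices of one subject in a single partition.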
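We used the structural similarity index (SSIM) [42] and peak signal-to-noise ratio (PSNR), two well-known metrics, to evaluate the quality of synthesized images. The equation for PSNR calculation is as follows:
$$\mathrm{PSNR}(x, \hat{x}) = 10 \log_{10} \frac{\mathrm{MAX}_x^2}{\mathrm{MSE}(x, \hat{x})}, \tag{11}$$
where $\mathrm{MAX}_x$ is the maximum possible value of image $x$ and $\mathrm{MSE}(x, \hat{x})$ is the mean squared error between $x$ and the synthesized image $\hat{x}$. The SSIM is calculated by the equation:
$$\mathrm{SSIM}(x, \hat{x}) = \frac{(2\mu_x \mu_{\hat{x}} + c_1)(2\sigma_{x\hat{x}} + c_2)}{(\mu_x^2 + \mu_{\hat{x}}^2 + c_1)(\sigma_x^2 + \sigma_{\hat{x}}^2 + c_2)}, \tag{12}$$
where $\mu$ is the mean of the image, $\sigma^2$ is the variance of the image, and $\sigma_{x\hat{x}}$ is the covariance of the images $x$ and $\hat{x}$. $L$ is the dynamic range of the pixel intensities, and the two variables $c_1 = (k_1 L)^2$ and $c_2 = (k_2 L)^2$ are used to stabilize the division. We used $k_1 = 0.01$ and $k_2 = 0.03$ as in the original work [42].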
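Both metrics can be sketched in a few lines of NumPy. Note the simplification: the SSIM below is computed once over the whole image, whereas the original SSIM formulation [42] averages the same expression over local windows, so the values will differ from those of a windowed implementation. Function names and the toy images are assumptions.

```python
import numpy as np

def psnr(x, x_hat, max_val):
    """Peak signal-to-noise ratio in dB (undefined when the images are identical)."""
    mse = np.mean((x - x_hat) ** 2)
    return 10.0 * np.log10(max_val ** 2 / mse)

def global_ssim(x, x_hat, dynamic_range, k1=0.01, k2=0.03):
    """Single-window SSIM over the whole image (a simplification of [42])."""
    c1 = (k1 * dynamic_range) ** 2
    c2 = (k2 * dynamic_range) ** 2
    mu_x, mu_y = x.mean(), x_hat.mean()
    var_x, var_y = x.var(), x_hat.var()
    cov = np.mean((x - mu_x) * (x_hat - mu_y))
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator

rng = np.random.default_rng(0)
reference = rng.random((16, 16))   # toy image with intensities in [0, 1)
shifted = reference + 0.1          # constant offset gives an MSE of 0.01

psnr_value = psnr(reference, shifted, max_val=1.0)
ssim_self = global_ssim(reference, reference, dynamic_range=1.0)
```

A constant offset of 0.1 on a unit-range image yields a PSNR of 20 dB, and comparing an image with itself yields an SSIM of 1.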
To compare different image generative models, we conducted nonparametric Wilcoxon signed-rank tests to test the performance difference. A p-value less than 0.05 was considered statistically significant. All statistical analyses were conducted in Python 3.8.5 and SciPy 1.7.3.
Table 1 presents PSNR and SSIM data across test images synthesized using CycleGAN and switchable CycleGAN. For T1w to T2w image synthesis, the switchable CycleGAN method was 1.2 dB higher (p-value < 0.001) in PSNR than CycleGAN. For the image synthesis from T2w to T1w, switchable CycleGAN was 0.1 dB (p-value < 0.001) higher in PSNR. The PSNR for switchable CycleGAN in the two directions was, on average, 0.65 dB higher than CycleGAN. As for SSIM, for T1w to T2w image synthesis, the switchable CycleGAN model was 9.5% higher (p-value < 0.001) than CycleGAN, while for the image synthesis from T2w to T1w, switchable CycleGAN was 12.5% higher (p-value < 0.001) than CycleGAN. The SSIM for switchable CycleGAN in the two directions was, on average, 11.0% higher than CycleGAN. Considering the two synthesis directions together, pix2pix GAN was 0.0023 higher (p-value < 0.001) in SSIM and 0.002 dB higher (p-value < 0.001) in PSNR than CycleGAN. Switchable CycleGAN outperformed pix2pix GAN, being 0.05 higher in SSIM (p-value < 0.001) and 0.652 dB higher in PSNR (p-value < 0.001). This demonstrated that switchable CycleGAN quantitatively outperformed CycleGAN in image synthesis of T1w and T2w pediatric brain images. This also demonstrated that switchable CycleGAN trained with unpaired data outperformed pix2pix GAN trained with paired data. Since the main aim of this work is to investigate the difference between CycleGAN and switchable CycleGAN using unpaired data, henceforth we will only focus on experiments with models using unpaired data.
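The paired comparison described at the start of this section can be sketched with `scipy.stats.wilcoxon`. The per-image scores below are synthetic numbers generated purely for illustration and do not reproduce any value in Table 1; the variable names are hypothetical.

```python
import numpy as np
from scipy.stats import wilcoxon

# Synthetic per-image SSIM scores for two models on the same 30 test images.
rng = np.random.default_rng(0)
ssim_baseline = rng.uniform(0.60, 0.70, size=30)
ssim_candidate = ssim_baseline + rng.uniform(0.01, 0.05, size=30)

# Paired, two-sided Wilcoxon signed-rank test on the per-image differences.
statistic, p_value = wilcoxon(ssim_candidate, ssim_baseline)
is_significant = p_value < 0.05
```

The test is paired, so both score arrays must be ordered by the same test images.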

Visualization
We compare visualization results in Figures 4 and 5. Figure 4 shows the T1w to T2w image synthesis, and Figure 5 shows the T2w to T1w image synthesis. For both directions, the results generated by switchable CycleGAN are more consistent with the target images, retaining fine structures and preserving more details of brain tissues than CycleGAN, as indicated by the red arrows. In particular, in the red boxes of the comparison results, we observed that the images generated by CycleGAN have some artifacts and missing details. These results demonstrate that switchable CycleGAN is also qualitatively superior to CycleGAN in synthesizing T1w and T2w images.

Robustness to Small Dataset
Since switchable CycleGAN utilizes a single generator, the number of model parameters is reduced, which results in robust training, even for small datasets. We set out to investigate the robustness of the two generative models to various training sizes using the ABCD dataset. We varied the number of image samples in the dataset as 300, 3000, and 30,000. We then calculated the SSIM results of CycleGAN and switchable CycleGAN (Table 2).

CycleGAN suffers from a greater loss in SSIM performance than switchable CycleGAN as dataset size decreases. From data size 30,000 to 300, the SSIM of CycleGAN dropped 15.2% for the synthesis from T1w to T2w images and decreased 23.4% for the synthesis from T2w to T1w images; comparatively, SSIM values for switchable CycleGAN demonstrated less dramatic decreases of 8.81% and 12.4%, respectively. This illustrates that switchable CycleGAN is more robust on small datasets than CycleGAN. For the t-test on the SSIM results between CycleGAN and switchable CycleGAN, the p-values are less than 0.001 when the number of image samples in the dataset is 300, 3000, and 30,000. From these analyses, we could see that switchable CycleGAN shows significantly improved performance in generating T1w and T2w MR brain images.

Time Efficiency
We further investigated the training time efficiency of switchable CycleGAN. We timed the training process of 30,000 dataset size on one single NVIDIA GeForce RTX 3080 GPU, and the training epochs in the two methods were both 200 epochs. Table 3 shows that the training time of switchable CycleGAN is 50.3% less than CycleGAN under the same experiment settings, indicating that switchable CycleGAN outperforms CycleGAN in model training efficiency.

Discussion
To develop a deep learning model that performs cross-contrast MRI image synthesis, it is desirable to collect a large dataset of paired training data to train a generative model (e.g., GAN) [12,17]. However, collecting all paired MRI images for different scanners, imaging protocols, and conditions is a very challenging task that requires careful data collection plans. Thus, we are particularly interested in developing image synthesis models that utilize unpaired data. CycleGAN has achieved promising results on a number of image synthesis tasks without paired data [17,[19][20][21][22]. More recently, a novel switchable CycleGAN was developed to reduce the model complexity of CycleGAN so as to improve the model training efficiency, and its effectiveness has been demonstrated using CT data [27,28]. Here, we conducted a comprehensive evaluation of the switchable CycleGAN using a large dataset of T1w and T2w images.
We believe this is the first study to develop a switchable CycleGAN model for multicontrast T1w-T2w structural MRI synthesis. The main innovation of switchable CycleGAN is that it designs an AdaIN coder (Figure 2) outside the autoencoder module ( Figure 2). The benefit of this design is twofold. First, it reduces the number of generators from two to one. Consequently, this decreases the trainable parameters and computational time. Second, it improves image quality and model robustness on smaller datasets due to decreased model complexity. Previously, these benefits have been illustrated with CT data [27,28]. In the current work, our results seem consistent with prior findings. We observed that switchable CycleGAN outperformed the original CycleGAN model with regards to image synthesis quality, robustness on small datasets, and time efficiency.
It is beneficial to design a U-net as the Autoencoder module within the switchable generator ( Figure 2). In this way, MRI image features from the contracting path layers are combined with expansive path layers. The input image features can be easily taken into account by the generator so that the brain structure ("image content") of the real MRI images can be attained by the generated MRI images [43]. This attribute is appealing for our image-to-image task: we expect our model to maintain the same structure of brain tissue. In addition, the skip-connections in U-net can mitigate the gradient vanishing/exploding problem, which often haunts deep learning models [44].
Similar to [27,28], in this paper, we used both identity and cycle-consistency loss. The identity loss, which is equal to the autoencoder loss, plays a role in preserving the structure with target domain images by providing pixel-wise constraints. The cycle-consistency loss also poses a strong pixel-wise constraint in that it forces self-consistency when reverting to the original domain, which prevents unexpected brain structures from being created. The generative models (e.g., GANs) lack these two constraints, and it has been reported that falsified structures were observed [27,28]. Thus, we believe that both identity and cycle-consistency loss have their own contributions during model training.
In our experiments, we observed that switchable CycleGAN outperformed baseline CycleGAN in terms of PSNR, SSIM, robustness on small datasets, and time efficiency. We believe that the image quality improvement is mainly due to the inclusion of AdaIN layers. AdaIN [29] was first proposed to better control image style transfer by adjusting the mean and variance of images [30]. Despite its simplicity, AdaIN has been formally justified by recent theoretical work [45] in which image-to-image translation by AdaIN implements the optimal transport map between two spatial distributions of image features, which are equipped with the i.i.d. Gaussian distributions. Therefore, AdaIN finds the optimal approximations of transport map from the input image distribution to the reference target image distribution. The model efficiency improvement is mainly attributed to the design of switchable AdaIN-enabled single shared generator. The shared generator enables the common latent representation learning of two contrasts and boosts the cross-contrast correlation learning. Compared to two generators in CycleGAN, the single shared image generator of switchable CycleGAN leads to a tremendous reduction in the trainable network parameters, which accelerates the training process and, in turn, enables handling of overfitting issues with relatively smaller training datasets. Such desirable robustness and reliability make the switchable CycleGAN a more practical solution for multi-contrast T1w and T2w structural MRI synthesis.
We also noticed that the performance of pix2pix GAN heavily relies on the quality of image registration. Unfortunately, perfect medical image registration approaches are typically lacking. Any less-than-perfectly registered image pairs may influence the performance of pix2pix GAN. Our proposed unpaired switchable CycleGAN outperformed paired pix2pix GAN. Besides the contributions of the AdaIN layers, this performance improvement is also partially attributable to CycleGAN's cycle-consistency loss, which facilitates learning the mapping between two contrasts without paired data supervision. This mitigates the impact of less-than-perfectly registered image pairs.
The multi-contrast data have been registered prior to modeling efforts. We do not expect this registration step to influence training in CycleGAN and switchable CycleGAN as training images were provided in a randomized unpaired way, making it unlikely that both a T1w/T2w image and its registered corresponding T1w/T2w image were simultaneously shown to the GAN model. In addition, images were randomly cropped into small patches of size 128 × 128 and randomly flipped both horizontally and vertically during the training, which partially cancels the efforts of registration. The registration is mostly and mainly for the test set, to make the testing evaluation metrics values more accurate and trustworthy. The same training strategy can be found in [11,19].
Our study has some limitations. First, there is large data heterogeneity in our testbed multi-contrast MRI data. As shown in Section 2, data were collected from the largest pediatric brain study, and their MRI data were acquired using multiple scanners from three different vendors. The scanner bias might be a confounding factor that impacts the quality of generated images. However, we believe this presented a good opportunity to test the generalizability of switchable CycleGAN without using well-planned, well-harmonized training data. Second, although MRI images generated using switchable CycleGAN demonstrated higher SSIM and PSNR values than those of the original CycleGAN model, much work remains in the area of cross-contrast image synthesis. The highest SSIM value achieved by switchable CycleGAN was 0.7468. Further investigations can be conducted to improve the quality of generated MRI images. Third, it is unclear how the model synthesizes brain pathology, if any is present, in the brain MRI images. Investigating this would require a large set of MRI images with pathological regions. Finally, we only focused on a portion of brain tissues (10 slices of axial brain MRI images in each subject). Further study may be necessary to synthesize the whole volume of the pediatric brain. In the current study, we mainly focused on providing a unified environment to conduct a fair comparison between switchable CycleGAN and original CycleGAN.

Conclusions
In this paper, we conducted pediatric brain image synthesis between T1w and T2w MRI data, which, to the best of our knowledge, is the first multi-contrast MRI image synthesis study using a switchable CycleGAN model. The model performance was evaluated both quantitatively and qualitatively. Experimental results demonstrate that switchable CycleGAN outperformed the original CycleGAN and pix2pix GAN models with higher PSNR and SSIM. We further illustrated that switchable CycleGAN was more robust on small datasets than the CycleGAN model. Additional time efficiency analysis showed that the training time of switchable CycleGAN was 50.3% less than that of CycleGAN.
The proposed work can be extended to generate super-resolution MRI images, as in [30], where AdaIN was used to modify the relative importance of features for the subsequent convolution operation to synthesize higher spatial resolution (e.g., 512 × 512, 1024 × 1024). The proposed work can also be implemented for three-modality learning, as in [27]. As the AdaIN is able to disentangle arbitrary high-level attributes in source and target modalities, the image synthesis between T1w and T2w can be naturally extended into the conversion among T1w, T2w, and diffusion-weighted imaging, as well as other non-standard contrasts. The performance of the proposed switchable CycleGAN may be further enhanced by incorporating transformer blocks [46,47] as transformers are proven to be robust in natural language processing and computer vision domains.