Deep CT to MR Synthesis Using Paired and Unpaired Data

Magnetic resonance (MR) imaging plays an important role in radiotherapy treatment planning for the segmentation of tumor volumes and organs. However, its use is limited by its high cost and by the growing number of patients with metal implants. This study targets patients for whom MR is contraindicated, for example because of claustrophobia or cardiac pacemakers, as well as the many scenarios in which only computed tomography (CT) images are available, such as emergencies, sites lacking an MR scanner, and cases in which the cost of an MR scan is prohibitive. In clinical practice, our approach could serve as a screening tool for radiologists to observe abnormal anatomical lesions in diseases that are difficult to diagnose from CT alone. The proposed approach estimates an MR image from a CT image using both paired and unpaired training data. In contrast to existing synthesis methods for medical imaging, which depend either on sparse pairwise-aligned data or on plentiful unpaired data, the proposed approach relaxes the rigid registration requirement of paired training and overcomes the context-misalignment problem of unpaired training. A generative adversarial network was trained to transform two-dimensional (2D) brain CT image slices into 2D brain MR image slices by combining adversarial, dual cycle-consistent, and voxel-wise losses. Qualitative and quantitative comparisons against independent paired and unpaired training methods demonstrate the superiority of our approach.


I. INTRODUCTION
CT-based planning is currently the standard in radiotherapy and performs well. Recently, radiotherapy devices using magnetic resonance (MR) imaging have been developed, because MR imaging offers much better soft-tissue contrast than computed tomography (CT). In particular, MR-based radiotherapy is increasingly used for brain tumors, and MR imaging is expected to play a central role in radiotherapy planning in the near future. However, an MR scan usually costs more than a CT scan and takes about 20 to 30 minutes to complete, whereas a CT scan is usually finished within 5 minutes. In addition, CT can also differentiate soft tissue, especially with an intravenous contrast agent, and offers higher imaging resolution and fewer motion artifacts owing to its high imaging speed. Furthermore, the use of MR-based radiotherapy is limited for the growing number of patients in aging societies with metal implants such as cardiac pacemakers and artificial joints. A common concern about CT is radiation exposure. The risk to an individual patient from routine examinations is small, even for a patient with lung tuberculosis who undergoes several X-rays in one year; the more substantial occupational exposure falls on professionals such as technicians and radiologists, although this remains a debated topic among experts. In this paper, we propose a synthetic approach that produces synthesized MR images from brain CT images. To the best of our knowledge, this is the first study that attempts to translate a CT image to an MR image.
The major contributions of this paper can be summarized as follows:
• The proposed approach uses paired and unpaired data together, to overcome the context-misalignment issue of unpaired training and to alleviate the registration burden and blurry results of paired training.
• The paper introduces the MR-GAN framework, which combines adversarial loss, dual cycle-consistent loss, and voxel-wise loss to train on paired and unpaired data together.
• The proposed approach can be easily extended to other synthesis tasks (e.g., MR-CT and CT-PET synthesis) to benefit the medical imaging community.
Recently, advances in deep learning and machine learning for medical computer-aided diagnosis (CAD) 1,2 have allowed systems to provide information on potential abnormalities in medical images. Many methods have synthesized a CT image from an available MR image for MR-only radiotherapy treatment planning 3 . One MR-based synthetic CT generation method 4 used deep convolutional neural networks (CNNs) with paired data that were rigidly aligned, minimizing the voxel-wise differences between CT and MR images.
However, minimizing the voxel-wise loss between the synthesized image and the reference image during training may lead to blurry outputs. To obtain sharper results, Nie et al. 5 proposed a method that combined the voxel-wise loss with an adversarial loss in a generative adversarial network (GAN) 6 . Concurrent work 7 proposed a similar idea to synthesize positron emission tomography (PET) images from CT images using the multi-channel information of the pix2pix framework by Isola et al. 8 . Ben-Cohen et al. 9 combined a fully convolutional network (FCN) 10 with the pix2pix model 8 , exporting initial results from each and blending the two outputs to generate a synthesized PET image from a CT image.
Although combining the voxel-wise loss with an adversarial loss addresses the problem of blurry synthesis, the voxel-wise loss depends on the availability of large numbers of aligned CT and MR images. Obtaining rigidly aligned data can be difficult and expensive. However, most medical institutions hold considerable unpaired data, scanned for different purposes and different radiotherapy treatments. Using unpaired data would increase the amount of training data dramatically and relax many of the constraints of current deep learning-based synthesis systems (Fig. 1). Unlike the paired-data methods in 4,5,7,9 , Wolterink et al. 11 used a CycleGAN model 12 , an image-to-image translation approach, with unpaired images to synthesize CT images from MR images. In an unpaired GAN paradigm, we want the synthesized image not only to look real, but also to correspond to its input image in a meaningful way. Therefore, a cycle-consistency loss is enforced as a regularizer: the synthesized image is translated back to the original image domain, and the difference between the input and the reconstructed image is minimized. Because of the large amount of unpaired data, the synthesized images are more realistic than the results of paired training methods. However, compared with the voxel-wise loss on paired data, the cycle-consistent loss still has limitations in correctly translating the contextual information of soft tissues and blood vessels.

II.A. Data acquisition
Our dataset consisted of the brain CT and MR images of 202 patients who were scanned for radiotherapy treatment planning for brain tumors. Among these patients, 98 had only CT images and 84 had only MR images; these constituted the unpaired data. For the remaining 20 patients, both CT and MR images were acquired during radiation treatment. CT images were acquired helically on a GE Revolution CT scanner (GE Healthcare, Chicago, Illinois, United States) at 120 kVp and 450 mA. T2 3D MR images (repetition time, 4320 ms; echo time, 95 ms; flip angle, 150°) were obtained with a Siemens 3.0 T Trio TIM MR scanner (Siemens, Erlangen, Germany). To generate paired sets of CT and MR images, the CT and MR images of the same patient were aligned and registered using an affine transformation based on mutual information. CT and MR images were resampled to the same voxel size (1.00 × 1.00 × 1.00 mm³). Before registration, the skull area in the CT images was removed by masking all voxels above a manually selected threshold; skull-stripped MR brain images were likewise registered. In this study, AFNI's 3dAllineate function was used for the registration process 13 . The resulting affine transformation parameters were then used to register the resampled CT and MR images with the skull. To maximize information inside the brain area, CT images were windowed with a window width of 80 Hounsfield units (HU) and a window center of 40 HU. After registration (Fig. 2), the CT and MR images were well aligned spatially.
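The brain-window step described above (window center 40 HU, window width 80 HU) can be sketched as follows; `window_ct` is an illustrative helper of ours, not code from the study.

```python
import numpy as np

def window_ct(hu_slice, center=40.0, width=80.0):
    """Clip a CT slice (in Hounsfield units) to a display window and
    rescale it to [0, 255]. The defaults match the brain window in the
    text: center 40 HU, width 80 HU, i.e. the range [0, 80] HU."""
    lo, hi = center - width / 2.0, center + width / 2.0
    clipped = np.clip(hu_slice.astype(np.float64), lo, hi)
    return (clipped - lo) / (hi - lo) * 255.0

# Air (-1000), water (0), brain-like (40), and bone-like (200) voxels.
slice_hu = np.array([[-1000.0, 0.0], [40.0, 200.0]])
out = window_ct(slice_hu)
```

Everything below 0 HU and above 80 HU saturates, so the full 8-bit range is spent on soft brain tissue.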

II.B. MR-GAN
The proposed approach has a structure similar to CycleGAN 12 , which contains a forward and a backward cycle.However, our model has a dual cycle-consistent term for paired and unpaired training data.The dual cycle-consistent term includes four cycles: forward unpaired-data, backward unpaired-data, forward paired-data, and backward paired-data cycles (Fig. 3).
The forward unpaired-data cycle contains three independent networks, each with a different goal. The network Syn MR attempts to translate a CT image I CT into a realistic MR image, such that the output cannot be distinguished from "real" MR images by the adversarially trained discriminator Dis MR , which is trained to discriminate the synthetic "fakes" as well as possible. In addition, to mitigate the well-known problem of mode collapse, the network Syn CT is trained to translate Syn MR (I CT ) back to the original CT domain. To improve training stability, the backward unpaired-data cycle is also enforced: it translates an MR image to a CT image and works with the opposite logic to the forward unpaired-data cycle. Unlike in the unpaired-data cycles, the discriminators in the paired-data cycles do not just discriminate between real and synthesized images; they also observe a pair of CT and MR images to differentiate between real and synthesized pairs. In addition, the voxel-wise loss between the synthesized and the reference image is included in the paired-data cycles. The synthesis networks Syn MR and Syn CT work exactly the same in the paired-data cycles as in the unpaired-data cycles.

II.C. Objective
Both networks in the GAN were trained simultaneously: the discriminators Dis MR and Dis CT estimated the probability that a sample came from real data rather than from the synthesis networks, while the synthesis networks Syn MR and Syn CT were trained to produce realistic synthetic data that the discriminators could not distinguish from real data. We applied adversarial losses 6 to the synthesis network Syn MR : I CT → I MR and its discriminator Dis MR , and express the objective as:

L_GAN(Syn_MR, Dis_MR, I_CT, I_MR) = E_{I_MR ∼ p_data(I_MR)}[log Dis_MR(I_MR)] + E_{I_CT ∼ p_data(I_CT)}[log(1 − Dis_MR(Syn_MR(I_CT)))] + E_{(I_CT, I_MR) ∼ p_data(I_CT, I_MR)}[log Dis_MR(I_CT, I_MR)] + E_{(I_CT, I_MR) ∼ p_data(I_CT, I_MR)}[log(1 − Dis_MR(I_CT, Syn_MR(I_CT)))]   (1)

where Syn MR tries to translate an I CT image into Syn MR (I CT ), an image that looks as if drawn from the MR domain. Through the first and second terms in Eq. (1), the discriminator Dis MR aims to distinguish the synthesized Syn MR (I CT ) from the real MR image I MR for unpaired data. For the paired data, the discriminator Dis MR also tries to discriminate between real and synthesized pairs, in which I CT is presented together with the synthesized MR image, through the third and fourth terms in Eq. (1). In the backward unpaired-data cycle, a CT image is instead synthesized from an input MR image by the network Syn CT , Syn MR reconstructs the MR image from the synthesized CT image, and Dis CT is trained to distinguish between real and synthesized CT images. The forward and backward paired-data cycles mirror the forward and backward unpaired-data cycles above; the difference is that Dis MR and Dis CT do not just discriminate between real and synthesized images, but learn to classify real versus synthesized pairs. In addition, the voxel-wise loss between the synthesized image and the reference image is included in the paired-data cycles.
To stabilize the training procedure, the negative log-likelihood objective for the unpaired data was replaced by a least-squares loss 14 in our work. Hence, the discriminator Dis MR aims to assign the label 1 to real MR images and the label 0 to synthesized MR images. However, we found that keeping the negative log-likelihood objective for the paired data generated higher-quality results. Eq. (1) then becomes:

L_GAN(Syn_MR, Dis_MR, I_CT, I_MR) = E_{I_MR ∼ p_data(I_MR)}[(Dis_MR(I_MR) − 1)²] + E_{I_CT ∼ p_data(I_CT)}[Dis_MR(Syn_MR(I_CT))²] + E_{(I_CT, I_MR) ∼ p_data(I_CT, I_MR)}[log Dis_MR(I_CT, I_MR)] + E_{(I_CT, I_MR) ∼ p_data(I_CT, I_MR)}[log(1 − Dis_MR(I_CT, Syn_MR(I_CT)))]   (2)

The dual cycle-consistent loss is enforced to further reduce the space of possible mapping functions. Previous approaches 15 have found it beneficial to combine the adversarial loss with a more traditional loss, such as the L1 distance. For the paired data (I CT , I MR ), the synthesis network Syn MR is tasked not only with generating realistic MR images, but also with staying near the reference I MR of the input I CT . Although we do not need the synthesis network Syn CT as a final product, adding the same constraint to Syn CT yields higher-quality synthesized MR images.
The L1 loss terms for Syn MR and Syn CT are:

L_L1(Syn_MR) = E_{(I_CT, I_MR) ∼ p_data(I_CT, I_MR)}[ ||I_MR − Syn_MR(I_CT)||_1 ],
L_L1(Syn_CT) = E_{(I_CT, I_MR) ∼ p_data(I_CT, I_MR)}[ ||I_CT − Syn_CT(I_MR)||_1 ]   (4)

The overall objective is:

L(Syn_MR, Syn_CT, Dis_MR, Dis_CT) = L_GAN(Syn_MR, Dis_MR, I_CT, I_MR) + L_GAN(Syn_CT, Dis_CT, I_MR, I_CT) + λ L_dual−cyc(Syn_MR, Syn_CT) + γ (L_L1(Syn_MR) + L_L1(Syn_CT))   (5)

where λ and γ control the relative importance of the adversarial, dual cycle-consistent, and voxel-wise losses. We aim to solve Eq. (6):

Syn*_MR, Syn*_CT = arg min_{Syn_MR, Syn_CT} max_{Dis_MR, Dis_CT} L(Syn_MR, Syn_CT, Dis_MR, Dis_CT)   (6)

The MR-GAN procedure is described in Algorithm 1.
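The generator side of the overall objective can be sketched in numpy as below. This is a loose illustration under our own assumptions: the function names are ours, images and discriminator score maps are stand-in arrays, and the least-squares form is used for the adversarial terms as in the unpaired case.

```python
import numpy as np

def lsgan_d_loss(d_real, d_fake):
    # Least-squares discriminator loss: push real scores to 1, fake scores to 0.
    return np.mean((d_real - 1.0) ** 2) + np.mean(d_fake ** 2)

def lsgan_g_loss(d_fake):
    # Least-squares generator loss: push fake scores toward 1.
    return np.mean((d_fake - 1.0) ** 2)

def l1(a, b):
    # Mean absolute (L1) distance between two images.
    return np.mean(np.abs(a - b))

def generator_objective(adv_mr, adv_ct, rec_ct, ct, rec_mr, mr,
                        syn_mr, ref_mr, syn_ct, ref_ct,
                        lam=10.0, gamma=100.0):
    """Generator objective: adversarial terms + lam * cycle-consistency
    + gamma * voxel-wise L1 on paired data (lam, gamma as in the text)."""
    adversarial = lsgan_g_loss(adv_mr) + lsgan_g_loss(adv_ct)
    cycle = l1(rec_ct, ct) + l1(rec_mr, mr)          # dual cycle terms
    voxel = l1(syn_mr, ref_mr) + l1(syn_ct, ref_ct)  # paired data only
    return adversarial + lam * cycle + gamma * voxel

# Toy usage: perfect adversarial scores and cycles, 0.5 voxel-wise error in MR.
ct, mr = np.zeros((2, 2)), np.ones((2, 2))
perfect = np.array([1.0])
loss = generator_objective(perfect, perfect, ct, ct, mr, mr,
                           mr, mr + 0.5, ct, ct)
```

With γ = 100 the paired voxel-wise term dominates whenever references are available, which matches the paper's emphasis on paired supervision.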
Algorithm 1 MR-GAN, proposed algorithm. All experiments in the paper used the default values m = 1, n_iter = 1.
Require: α, the learning rate; m, the batch size; n_iter, the number of iterations on the unpaired/paired data.
1: for number of training iterations do
2:   for n_iter steps do
3:     Sample a batch ∼ p_data(I_CT) from the unpaired CT data.
4:     Sample a batch ∼ p_data(I_MR) from the unpaired MR data.
5:     Update the discriminator Dis_MR by ascending its stochastic gradient of the unpaired adversarial loss.
6:     Update the generator Syn_MR by descending its stochastic gradient of the adversarial loss plus the cycle term ||Syn_CT(Syn_MR(I_CT)) − I_CT||_1.
7:     Update the discriminator Dis_CT by ascending its stochastic gradient of the unpaired adversarial loss.
8:     Update the generator Syn_CT by descending its stochastic gradient of the adversarial loss plus the cycle term ||Syn_MR(Syn_CT(I_MR)) − I_MR||_1.
9:   end for
10:  for n_iter steps do
11:    Sample a batch ∼ p_data(I_CT, I_MR) from the paired data.
12:    Update the discriminator Dis_MR by ascending its stochastic gradient of the paired (pair-conditioned) adversarial loss.
13:    Update the generator Syn_MR by descending its stochastic gradient of the adversarial, cycle-consistent, and voxel-wise L1 losses.
14:    Update the discriminator Dis_CT by ascending its stochastic gradient of the paired (pair-conditioned) adversarial loss.
15:    Update the generator Syn_CT by descending its stochastic gradient of the adversarial, cycle-consistent, and voxel-wise L1 losses.
16:  end for
17: end for
18: return the trained networks
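The alternating schedule of Algorithm 1 can be sketched structurally as follows. This is only a skeleton under our own assumptions: the four update steps are stubbed out as counters, and real training would perform the gradient steps on the losses above.

```python
from collections import Counter

def train_schedule(n_training_iters, n_iter=1):
    """Count how often each network is updated under Algorithm 1's
    alternating unpaired/paired schedule (updates themselves are stubs)."""
    updates = Counter()
    for _ in range(n_training_iters):
        for _ in range(n_iter):                 # unpaired phase
            # sample I_CT ~ p_data(I_CT) and I_MR ~ p_data(I_MR)
            updates["Dis_MR/unpaired"] += 1     # ascend adversarial gradient
            updates["Syn_MR/unpaired"] += 1     # descend adversarial + cycle
            updates["Dis_CT/unpaired"] += 1
            updates["Syn_CT/unpaired"] += 1
        for _ in range(n_iter):                 # paired phase
            # sample (I_CT, I_MR) ~ p_data(I_CT, I_MR)
            updates["Dis_MR/paired"] += 1       # real vs. synthesized *pairs*
            updates["Syn_MR/paired"] += 1       # adversarial + cycle + voxel L1
            updates["Dis_CT/paired"] += 1
            updates["Syn_CT/paired"] += 1
    return updates

schedule = train_schedule(3)
```

With the default n_iter = 1, every training iteration performs one unpaired and one paired pass over all four networks.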

II.D. Implementation
For the architecture of the synthesis networks Syn MR and Syn CT , we utilized the architecture from Johnson et al. 16 , a 2D fully convolutional network with one convolutional layer, followed by two strided convolutional layers, nine residual blocks 17 , two fractionally-strided convolutional layers, and one final convolutional layer. Instance normalization 18 and ReLU followed each convolution except the last one. The synthesis network takes a 256 × 256 input and generates an output image of the same size.
For the discriminators Dis MR and Dis CT , we adapted PatchGANs 8 , which try to classify each N × N patch in an image as real or fake. This way, the discriminators can better focus on high-frequency information in local image patches. The networks Dis MR and Dis CT used the same architecture, which had one convolution as an extra head for the different input data, four strided convolutions as a shared trunk, and two convolutions as an extra tail for the different tasks. Except for the first and last convolutions, each convolutional layer was followed by instance normalization 18 and a leaky ReLU 19 (Fig. 4).
To optimize our networks, we used minibatch SGD, applying the Adam optimizer 20 with a batch size of 1. The learning rate started at 2 × 10⁻⁴ for the first 10⁵ iterations and decayed linearly to zero over the next 2 × 10⁵ iterations. For all experiments, we set λ = 10 and γ = 100 in Eq. (5) empirically. At inference time, we ran only the synthesis network Syn MR , given a CT image.
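The learning-rate schedule just described can be sketched as a small helper; `learning_rate` and its parameter names are ours, written to mirror the constants in the text.

```python
def learning_rate(step, base_lr=2e-4, warm_steps=100_000, decay_steps=200_000):
    """Constant learning rate for the first 1e5 iterations, then linear
    decay to zero over the next 2e5 iterations (schedule from the text)."""
    if step < warm_steps:
        return base_lr
    frac = (step - warm_steps) / decay_steps  # fraction of the decay consumed
    return max(0.0, base_lr * (1.0 - frac))
```

The clamp to zero keeps the rate non-negative if training runs past the 3 × 10⁵-iteration budget.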

III.A. Data preprocessing
Among the data of the 202 patients, all of the unpaired data were used for training. The paired data were split into a training set with the data of 10 patients, and a separate test set containing the CT images and corresponding reference MR images of the other 10 patients.
Each CT or MR volume comprised more than 35 2D axial slices. These were resampled to 256 × 256 pixels in 256-level grayscale, with intensities mapped uniformly from the HU window for the CT data and from the intensity range for the MR data.
For training, we augmented the training data with random online transforms:
• Flip: batch data were horizontally flipped with probability 0.5.
Paired CT and MR images were augmented with the same factor; in the unpaired data, however, CT and MR images were augmented independently. Training the proposed approach took about 72 hours for 3 × 10⁵ iterations on a single GeForce GTX 1080Ti GPU.
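The distinction between paired and unpaired augmentation can be made concrete with a small numpy sketch; the helper names and the explicit random generator argument are our own illustrative choices.

```python
import numpy as np

def augment_paired(ct, mr, rng, p=0.5):
    """Horizontally flip a paired CT/MR slice with the SAME coin flip,
    so the pair stays anatomically aligned."""
    if rng.random() < p:
        return ct[:, ::-1], mr[:, ::-1]
    return ct, mr

def augment_unpaired(img, rng, p=0.5):
    """Unpaired slices are flipped independently of one another."""
    return img[:, ::-1] if rng.random() < p else img

rng = np.random.default_rng(0)
ct = np.arange(4).reshape(2, 2)
mr = ct + 10
ct_f, mr_f = augment_paired(ct, mr, rng, p=1.0)  # p=1.0 forces a flip
```

Sharing one random draw per pair is what preserves the voxel-wise correspondence that the paired losses rely on.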
At inference time, the system required 35 ms to translate a single CT slice into an MR slice.

III.B. Evaluation metrics
Real and synthesized MR images were compared using the mean absolute error (MAE):

MAE = (1/N) Σ_{i=1}^{N} | MR(i) − Syn_MR(CT(i)) |

where i is the index of the 2D axial slice in the aligned volumes, and N is the number of slices in the reference MR images. The MAE measures the average distance between each pixel of the synthesized and the reference MR image. In addition, the synthesized MR images were evaluated using the peak signal-to-noise ratio (PSNR), as proposed in 5,7,11 :

PSNR = 10 log₁₀ (MAX² / MSE)

where MAX = 255. The PSNR measures the ratio between the maximum possible intensity value and the mean squared error (MSE) of the synthesized and reference MR images.
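Both metrics are straightforward to implement; the sketch below is ours, computing the errors over whole arrays rather than slice-by-slice for brevity.

```python
import numpy as np

def mae(ref, syn):
    """Mean absolute error between reference and synthesized MR images."""
    return np.mean(np.abs(ref.astype(np.float64) - syn.astype(np.float64)))

def psnr(ref, syn, max_val=255.0):
    """Peak signal-to-noise ratio in dB (higher is better); infinite for
    identical images, since the MSE is then zero."""
    mse = np.mean((ref.astype(np.float64) - syn.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")
    return 20.0 * np.log10(max_val) - 10.0 * np.log10(mse)

ref = np.zeros((4, 4))
worst = np.full((4, 4), 255.0)  # maximally wrong 8-bit prediction
```

A maximally wrong 8-bit prediction gives MAE = 255 and PSNR = 0 dB, which bounds the scores reported in Table I from below.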

III.C. Analysis of MR synthesis using paired and unpaired data
We first compared synthesized MR images with reference MR images that had been carefully registered to become paired data with CT images. For brevity, we refer to our method as MR-GAN. Fig. 5 shows four examples of an input CT image, the synthesized MR image obtained by MR-GAN, the reference MR image, and the absolute difference map between the synthesized and reference MR images. MR-GAN learned to differentiate between anatomical structures with similar pixel intensities in CT images, such as bones, gyri, and soft brain tissues. The largest differences are in the area of bony structures, and the smallest differences are found in the soft brain tissues. This may be partly due to misalignment between the CT and reference MR images, and because the CT image provides more detail about bony structures, complementing the synthesized MR image, which focuses on soft brain tissues.
Table I shows a quantitative evaluation using MAE and PSNR to compare the different methods on the test set. We compare the proposed method with independent training on paired and on unpaired data. To train the paired-data system, a synthesis network with the same architecture as Syn MR and a discriminator network with the same architecture as Dis MR were trained using a combination of adversarial loss and voxel-wise loss, as in the pix2pix framework 8 . To train the unpaired-data system 11 , the cycle-consistent structure of the CycleGAN model 12 was used, which is the same as the forward and backward unpaired-data cycles of our approach, shown in Fig. 3. To ensure a fair comparison, we implemented all baselines with the same architecture and implementation details as our method. When trained with both paired and unpaired data, the quality of our results closely approximates the reference MR images, and some details in our results are even clearer than in the reference MR images.
We also present comparison results for several discriminator models (Table III).

IV. DISCUSSION
This paper has shown that a synthesis system can be trained using paired and unpaired data to synthesize an MR image from a CT image. Unlike other methods, the proposed approach combines the adversarial loss from a discriminator network, a dual cycle-consistent loss over paired and unpaired training data, and a voxel-wise loss on paired data to synthesize realistic-looking MR images. The quantitative evaluation results in Table I show that the average correspondence between synthesized and reference MR images is much better for our approach than for the other methods: the synthesized images are closer to the reference, achieving the lowest MAE of 19.36 ± 2.73 and the highest PSNR of 65.35 ± 0.86. Slight misalignments between CT images and reference MR images may have a large effect on the quantitative evaluation. Although quantitative measurement may be the gold standard for assessing the performance of a method, we found that numerical differences in the quantitative evaluation do not correctly reflect the qualitative differences. In future work, we will evaluate the accuracy of synthesized MR images through perceptual studies with medical experts.
A synthesis system using a CycleGAN model 12 trained with unpaired data generated realistic results. However, the results had poor anatomical definition relative to the corresponding CT images, as exemplified in Fig. 6. We found that, even though it was trained with limited paired data, the pix2pix model 8 outperformed the CycleGAN model trained on unpaired data in our experiments. The limitation of paired training is blurry output due to the voxel-wise loss. Qualitative analysis showed that MR images obtained by MR-GAN look more realistic and contain less blurring than those of the other methods. This may be due to combining the dual cycle-consistent loss with the voxel-wise loss on paired data.
The experimental results have implications for accurate CT-based radiotherapy treatment of patients who are contraindicated for an MR scan because of cardiac pacemakers or metal implants, and of patients who live in areas with poor medical services. Our synthesis system can be trained with any kind of data: paired, unpaired, or both. Using paired and unpaired data together obtains higher-quality synthesized images than using either kind of data alone.

V. CONCLUSION
We propose a system for synthesizing MR images from CT images. Our approach uses paired and unpaired data to solve the context-misalignment problem of unpaired training and to alleviate the rigid registration task and blurred results of paired training. Unpaired data are plentifully available and, together with limited paired data, can enable effective synthesis in many cases. Our results on the test set demonstrate that MR-GAN came much closer to the reference MR images than the other methods. These preliminary results indicate that the synthesis system can efficiently translate structures within complicated 2D brain slices, such as soft brain tissues, blood vessels, gyri, and bones. In future work, we will investigate the 3D information of anatomical structures present in CT and MR brain sequences to further improve performance based on paired and unpaired data. We suggest that our approach can potentially increase the quality of synthesized images for synthesis systems that depend on supervised and unsupervised settings, and can also be extended to support other applications, such as MR-CT and CT-PET synthesis.

Fig. 1
Fig. 1 Left: Deep networks trained with paired data, which comprise CT and MR slices taken from the same patient at the same anatomical location. Paired data must be intentionally collected and aligned, which is difficult; however, they give the network far more accurate regression constraints. Right: Deep networks trained with unpaired data, which comprise CT and MR slices taken from different patients at different anatomical locations. A considerable amount of unpaired data is available.

Fig. 2
Fig. 2 Examples showing registration between CT and MR images after the mutual-information affine transform.

Fig. 3
Fig. 3 The dual cycle-consistent structure consists of (a) a forward unpaired-data cycle, (b) a backward unpaired-data cycle, (c) a forward paired-data cycle, and (d) a backward paired-data cycle. In the forward unpaired-data cycle, the input CT image is translated to an MR image by the synthesis network Syn MR . The synthesized MR image is then translated back to a CT image that approximates the original CT image, and Dis MR is trained to distinguish between real and synthesized MR images.
In the forward cycle, for each I CT from the CT domain, the image translation cycle should be able to bring I CT back to the original image, i.e., I CT → Syn MR (I CT ) → Syn CT (Syn MR (I CT )) ≈ I CT . Similarly, for each I MR from the MR domain, Syn CT and Syn MR should also satisfy a backward cycle consistency: I MR → Syn CT (I MR ) → Syn MR (Syn CT (I MR )) ≈ I MR . The dual cycle-consistency loss is expressed as:

L_dual−cyc(Syn_MR, Syn_CT) = E_{I_CT ∼ p_data(I_CT)}[ ||Syn_CT(Syn_MR(I_CT)) − I_CT||_1 ] + E_{I_MR ∼ p_data(I_MR)}[ ||Syn_MR(Syn_CT(I_MR)) − I_MR||_1 ] + E_{(I_CT, I_MR) ∼ p_data(I_CT, I_MR)}[ ||Syn_CT(Syn_MR(I_CT)) − I_CT||_1 + ||Syn_MR(Syn_CT(I_MR)) − I_MR||_1 ]   (3)

Fig. 4
Fig. 4 Flow diagram of the discriminator Dis MR in the synthesis system. Dis MR has extra head and extra tail convolutional layers for the different inputs and loss functions of the paired and unpaired data. The discriminator Dis CT has the same architecture as Dis MR .

Fig. 5
Fig. 5 From left to right: input CT, synthesized MR, reference MR, and absolute error between the real and synthesized MR images.

Fig. 6
Fig. 6 From left to right: input CT image, synthesized MR image with paired training, synthesized MR image with unpaired training, synthesized MR image with paired and unpaired training (ours), and reference MR image.

Fig. 7
Fig. 7 From left to right: input CT image, synthesized MR image, reconstructed CT image, and relative difference error between the input and reconstructed CT images.
The synthesis network Syn MR tries to minimize this objective against an adversarial Dis MR that tries to maximize it, i.e., Syn*_MR = arg min_{Syn_MR} max_{Dis_MR} L_GAN(Syn_MR, Dis_MR, I_CT, I_MR). The other synthesis network Syn CT : I MR → I CT and its discriminator Dis CT have a similar adversarial loss, i.e., Syn*_CT = arg min_{Syn_CT} max_{Dis_CT} L_GAN(Syn_CT, Dis_CT, I_MR, I_CT).

Table I
MAE and PSNR evaluations between synthesized and real MR images when training with paired, unpaired, and paired with unpaired data (Ours).
Table III Comparison of MAE and PSNR for different discriminator networks and the least-squares loss. The leading scores are displayed in bold font.