Optimized Resolution-Oriented Many-to-One Intensity Standardization Method for Magnetic Resonance Images

: With the development of big data, Radiomics and deep-learning methods based on magnetic resonance (MR) images, it is necessary to conduct large databases containing MR images from multiple centers. Having huge intensity distribution differences among images reduced or even eliminated, robust computer-aided diagnosis models could be established. Therefore, an optimized intensity standardization model is proposed. The network structure, loss function, and data input strategy were optimized to better avoid the image resolution loss during transformation. The experimental dataset was obtained from ﬁve MR scanners located in four hospitals and was divided into nine groups based on the imaging parameters, during which 9152 MR images from 499 participants were collected. Experiments show the superiority of the proposed method to the previously proposed uniﬁed model in resolution metrics including the peak signal-to-noise ratio, structural similarity, visual information ﬁdelity, universal quality index, and image ﬁdelity criterion. Another experiment further shows the advantage of the proposed method in increasing the effectiveness of following computer-aided diagnosis models by better preservation of MR image details. Moreover, the advantage over conventional standardization methods are also shown. Thus, MR images from different centers can be standardized using the proposed method, which will facilitate numerous data-driven medical imaging studies. with different acquisition low-dimensional feature extraction residual convolution path in the generator, the loss data augmentation strategy and are all transformation The experiments show the superiority of the proposed method the original uniﬁed many-to-one method as well as the conventional methods. This method far-reaching implications for the establishment of large-scale, homogeneous MR image datasets. Based on such datasets, various identiﬁcation and prediction models can achieve higher robustness.


Introduction
As the most commonly used imaging method in the diagnosis of brain diseases, magnetic resonance imaging (MRI) is one of the research hotspots of computer-aided brain diagnosis in recent years [1]. Many magnetic resonance (MR) image based studies, such as computer aided diagnosis [2][3][4], differential diagnosis [5], treatment options selection [6], and prognosis estimation [7], have made great progress, which also put forward higher requirements for not only the quantity, but also the quality of the image data.
To conduct a larger unified training set contains MR images obtained from different MR scanners, the scale and intensity distribution difference of such images should be suppressed. There are already some mature methods for solving scale inconsistency, including image slice resampling [8] and scale-adaptive feature extraction [9]. To deal with different intensity distribution, preventing the increasingly complex diagnostic models from being over-fitted and unstable as well as making the model more generalizable, there is a great need for methods to eliminate the inter-group difference by intensity standardization, to ensure that the intensities of the same tissue type are the same in different images. In this case, many standardized data from multiple sources can be used for model training, and the models can be used in a wide range. To this end, many methods have been proposed, which can be divided into three main categories: the global histogram-matching methods, the joint histogram registration methods, and deep-learning-based methods.
The simplest method category among the three main categories is the histogram matching methods. These methods generate a series of corresponding intensity landmarks based on the features of the target images and the reference images, and then conduct a piecewise linear function based on these intensity landmarks, which is treated as the transformation equation. Nyúl et al. applied both overall and "foreground" percentile markers as intensity landmarks [10]. Collewet et al. utilized and compared three landmark series, which contain the maximum intensity of MR images, the mean intensity of MR images, and three sigma intensity points [11]. Madabhushi et al. employed the intensity features extracted from the largest fuzzily connected homogenous region to avoid the influence of the diseased tissue [12]. Sun et al. brought forward the maximum and the minimum decile along with the mean intensity [13]. Nunzio et al. firstly formed linear transforming functions for different brain tissues, and then built the final standardization function by joining the different transforming functions with spline smoothness [14]. However, the robustness of such methods is not sufficient. If there is a difference in tissue intensity relationship between the target images and the reference images, for example, if tissues with the same intensity in the target images differ in intensity in the reference ones, the piecewise linear function may not correctly describe the transforming function due to the absence of the multi-modality information as well as neighborhood area information. Meanwhile, such global transformation method is also unable to deal with spatial intensity nonuniformities due to the radio-frequency field inhomogeneities.
The joint histogram registration methods, on the other hand, use intensities of the same spatial location in multiple modalities among one scan as coordinates of a intensity vector and form a multi-dimensional point cloud (i.e., the joint histogram) based on the intensity vectors with different spatial location. Then, the elastic registration function of two point-clouds can be used as the intensity vector mapping function between the corresponding scans. Jager et al. directly applied the multimodality histograms to form the joint probability density functions (PDFs) [15,16]. Dzyubachyk et al. applied Jager's method into body MRI scan and further came up with regional equalization of the transforming function in different body parts [17]. Robitaille et al. firstly generated the joint histogram, and then extracted a set of characteristic points for histogram matching method [18]. Our group formed the target and reference point cloud with weighted sub-region intensity distribution instead of the joint PDFs [19,20]. These methods can better describe the transforming function but require that the target dataset and the reference one used for training the elastic registration function have the same multi-modality MR image data obtained from the same patient/volunteer and that these data should be accurately registered, which is quite difficult to achieve.
To construct a generalized MR image standardization model that could transform MR images acquired from different MR scanners and/or using different parameters, while multi-modality MR images are not needed for training and spatial intensity nonuniformities might be eliminated with the fusion of regional and global information, we propose a universal intensity standardization method based on cycle generative adversarial network [21]. This method applies a many-to-one framework with jump connections in the generators and weak-pair data augmentation strategy. This method is able to produce proper intensity standardization results. However, the original CycleGAN is mainly used in image segmentation as well as low-resolution image generation, while the cycle consistency is an indirect structural similarity indicator which is easily affected by other factors and unable to remove image blurring introduced by convolutional neural networks. Meanwhile, the weak-pair method used in the original method does not guarantee a consistent balance of randomness and structural similarity when the axial density of the dataset varies. These all result in a reduced resolution of the standardized images.
To achieve better preservation of MR image spatial resolution and generate precision standardization results, in this paper, we propose a resolution-oriented MR image intensity standardization method to generate MR image datasets with strong intensity and spatial uniformity. This method is based on the cycle generative adversarial network (CycleGAN) [22] under the extended many-to-one framework. Each generator in this framework, in particular, applies cascading residual blocks to enhance super resolution performance. The generators also apply normalized mutual information (NMI) as a part of the loss function, which is able to reduce and even eliminate image blurring by directly measuring structural errors. Moreover, an advanced weak-pair data augmentation method is applied to adapt to the varying MR image axial density.
The paper is organized as follows. Section 2 expatiates on the proposed method. The dataset and data preprocessing steps are shown in Section 3. Section 4 describes the experiments and the results. The discussion is presented in Section 5 and the conclusion is made in Section 6. Figure 1 illustrates the network architecture under the extended many-to-one framework. The aim of the intensity standardization generator G f orward is to transform every T2-FLAIR MR image slice x n,m , where n ∈ {1, 2, · · · , N} represents the group which the slice is from and m ∈ {1, 2, · · · , M n } is the slice number in the n th group, ensuring G f orward (x n,m ) "seems to be" from the N th group, which is treated as the reference, while keeping its own structure/tissue properties. N reverse transformations G backward n are established, ensuring that G backward n G f orward (x n,m ) ≈ x n,m , which could indicate the preservation of image-specific structure/tissue properties. A forward discriminator D f orward and N backward discriminators D backward n are also applied to form the GAN structures. Each G is optimized in turn with the corresponding D during the training process to form the adversarial scheme so that the intensity distribution of the generated images is gradually close to that of the corresponding reference ones. Each generator shown in Figure 2, in particular, consists of a convolutional concentration path, a low-dimensional feature extraction residual convolution path, a convolutional dispersion path, and jump connection paths. The convolutional concentration path contains a preprocessing convolution layer and two strided convolution layers. The dispersion path contains two strided transposed convolution layers and a synthetical transposed convolution layer. Two jump connections feed the feature maps before and during the concentration process into the corresponding position in the dispersion path to provide more details of input images [23]. The low-dimensional feature extraction residual convolution path employs cascading residual network (CARN) blocks in order to provide more "fast propagate" jump connections and acquire multi-level representations by integrating features between layers [24]. This may significantly increase the information richness of the generated images, thus be conducive to image resolution. Such residual convolution path consists of three cascading blocks. The 1 × 1 convolutional layers are instantiated after each cascading block to synthesize feature maps with fixed output channel number from all previous cascading blocks along with the original result of the concentration path. The cascading block consists of three residual blocks. Inside the cascading block, 1 × 1 convolutional layers are also instantiated after each residual block to build a fixed-channel intermediate feature map. A residual block contains two 3 × 3 convolutional layers, and the filter number is fixed to 256.

Network Architecture
The discriminators are classical PatchGAN with three successive 4 × 4 strided convolutions and two synthetical convolutions. Each convolution process applies leaky rectified linear unit (LeakyReLU) as the activation function. The discriminative feature map is 30 × 30 with a single layer, which is then used to create the final discriminative result.

Adversarial Loss and Cycle Consistency Loss
The adversarial loss is used to judge the distribution similarity between the transformed images and the reference images. Applying least squares generative adversarial networks (LSGAN), the forward adversarial losses is defined as (1) In GANs, the objective of generator optimization is to "fool" the corresponding discriminator by reducing the discriminator result gap between generated images and reference images while that of discriminator optimization is to better distinguish the two kinds of images. Therefore, the objective of the forward adversarial loss is min The forward generator and the backward one is combined to form an end-to-end structure similar to the self-encoder. Therefore, the G f orward (x n,m ) is treated as the "code". To ensure the preservation of image-specific feature in the "code", cycle consistency loss is applied to compare x n,m with G backward n G f orward (x n,m ) and compare x N,m with G f orward (G backward n (x N,m )). Utilizing the L 1 norm, the nth cycle consistency loss function is (2)

Normalized Mutual Information Loss
To ensure the consistency of the morphological features and tissue characteristics between two MR image slices, information consistency losses are applied. In particular, we apply the NMI loss as part of the entire loss function [25] so as to ensure that the transformed image is consistent with the target image in structure, that is, only the intensities corresponding to the tissues are about to change. One of the NMI losses is expressed as where marginal entropies H (x) = − ∑ Hist (x) log [Hist (x)], and the joint entropy H (x, y) = − ∑ JointHist (x, y) log [JointHist (x, y)]. Therefore, the NMI loss and the cycle consistency loss are applied together to make both direct and indirect consistency measurement.

Entire Loss Function
The entire loss function is defined as Therefore, the objective of the optimization is to find the optimal G f orward and the N optimal G backward n by Finally, the G f orward is used to standardize MR image slices by transforming any MR image slices into the reference domain.

Dataset
This study was approved by the Ethics Committee of all four participating hospitals while informed consent was obtained from every patient and volunteer.
In total, 8192 MR image slices were obtained from 489 patients and divided into nine groups according to the different acquisition parameter used. These image slices were used to train the proposed model. The details of the patient images are shown in Table 1. Image Groups 1-5 were from Department of Neurosurgery, Huashan Hospital, Fudan University, the first three groups of which were acquired with a Siemens Magnetom Verio 3.0T MRI scanner and the other two groups were acquired with a GE Discovery MR750 3.0T MRI scanner. Image Groups 6 and 7 were from Department of Neurosurgery/Neuro-oncology, Sun Yat-sen University Cancer Center using a Siemens Magnetom Trio Tim 3.0T MRI scanner. Image Group 8 was obtained from Department of Neurosurgery, Huadong Hospital, Fudan University with a Siemens Magnetom Verio 3.0T MRI scanner. Image Group 9 was obtained from Department of Neurosurgery, Shandong Provincial Hospital using a Siemens Magnetom Skyra 3.0T MRI scanner.
Meanwhile, two groups of T2-FLAIR MR images were acquired successively with the Siemens Magnetom Verio 3.0T MRI scanner (Group 3) and GE Discovery MR750 3.0T MRI scanner (Group 6) from ten volunteers. MR images of another two modalities were acquired at the same time to meet the requirements of the joint histogram registration method (Groups 1 and 2 with the Siemens scanner and Groups 4 and 5 with the GE scanner). Therefore, 60 image sets from the ten volunteers were treated as the paired gold standard to perform all paired comparisons. The details of the volunteer images are shown in Table 2.
Moreover, to prove that the proposed method as a pre-processing method not only improves the visual perception of the resolution and related indicators, but also is conducive to the performance of various resolution-sensitive post-processing algorithms, the effectiveness of the proposed intensity standardization method for aiding the computer-aided diagnostic algorithms was evaluated. Two sets of brain glioma MR images from two MR scanners were used to build a radiomics model for differentiating high grade glioma (HGG) from lower grade glioma (LGG). The clinical image data and radiomics model are briefly described as follows. The MR images of glioma patients from the Department of Neurosurgery, Huashan Hospital affiliated to Fudan University were used, which were collected from a SIEMENS MAGNETOM R Verio 3.0T MRI scanner and a GE Discovery TM MR750 3.0T MRI scanner, respectively. Among them, the images obtained from the SIEMENS scanner were treated as a subset of the reference images in the proposed standardization method, and the images obtained from the GE scanner comprised a subset of one target image group. HGG was considered as positive sample, and LGG was considered as negative sample. Therefore, we used the MR images obtained from the SIEMENS scanner (34 HGG and 32 LGG) to train the model and tested the model with the original images from the GE scanner and the standardized GE images (17 HGG and 11 LGG). The detail acquisition protocols are shown in Table 3.

Advanced Weak-Pair Data Input Strategy
For the case where the axial position of the input MR image slice is close to the boundary, since the axial density of the reference MR images is lower there, the structural similarity of the weak-pair selection of the reference image may be lowered if the same random criterion is applied. Therefore, to make the selected reference MR image structurally similar to the input MR image and guarantee a certain randomness, an advanced weak-pair data input strategy is proposed. Firstly, the axial location of each brain MR image slice in Montreal Neurological Institute (MNI) space is extracted with SPM12 tool. Then, in every epoch, 30 MR images in the reference dataset axially closest to x n,m is used to form a set Xsel1 n,m . Meanwhile, MR images in the reference dataset within the range of ±3 mm from x n,m is used to form another set Xsel2 n,m . One image x N, m in the larger set above is randomly selected to be paired for x n,m in the training process. After that, a random number rand n,m is generated according to a Gaussian distribution with µ = 256 and σ = N epoch le f t + 64. The augmented image size is size n,m = round (256 + abs (256 − rand n,m )) .
Therefore, the x n,m and corresponding x N, m are scaled to size n,m × size n,m , randomly cropped to 256 × 256, and then randomly flipped. The above random procedures are all synchronized within an image pair to eliminate the structure disparity.

Optimized Training Strategy with Synchronized Batch Normalization
Batch normalization regards the statistics of neuron outputs with mini-batch input as the estimation of the whole dataset and normalize the outputs according to the estimation in order to prevent internal covariate shift and thus improve model stability. When multiple GPUs are used in training process, the mini-batch is defined as the per-GPU input data, which works well in low resolution scenario.
However, when higher input image resolution and a deeper network are applied, the graphics memory within a GPU can handle several or only one input image. In this case, the statistics of the mini-batch can no longer represent the entire dataset thus the batch normalization degenerates into the instance normalization. Therefore, synchronized batch normalization is applied, which extracts the statistics among all input samples distributed on multiple GPUs. Thus, the estimate based on such samples are more accurate, so that the model stability is guaranteed.

Experiments and Results
We firstly evaluated the resolution preservation ability enhancement brought by the proposed method over the original unified method previously proposed by our group. As two unified models, they should be able to convert any MR image within a certain range into an image of a reference group. Therefore, we first use a larger number of images with the same imaging parameters regarded as the reference image group as the generator input, comparing the resolution loss of the output by measuring consistency of the input and the output. We then used the paired volunteer dataset to measure resolution parameters of the standardized images in the presence of intensity distribution variation between the target and reference datasets.
Then, we compared the proposed method in general properties with the original unified method and also the representative methods of the two major types of intensity standardization methods, the histogram matching method proposed by Sun et al. and the joint histogram registration method previously proposed by our group. To meet the dataset requirements of all above methods, the volunteer dataset was applied.
To conduct a numerical comparison between these methods, the resolution-oriented metrics including the peak signal-to-noise ratio (PSNR), the structural similarity index (SSIM), the visual information fidelity (VIF), the universal quality index (UQI), and the image fidelity criterion (IFC) were utilized. The widely-used general properties containing the gradient magnitude similarity deviation (GMSD) and the mean square error (MSE) were applied. The histogram correlation (HC) and the average disparity (AD) we proposed before were also employed. All numerical results were obtained by comparing the transformed (generated) three-dimensional (3D) image and the reference 3D image and then averaging between different participants of the corresponding experiment. The HC is defined as where the Cnt standard records the values of histogram bins in the standardized image and Cnt re f records the values of histogram bins in the reference image. The AD is defined as where I standard and I re f are the standardized and reference image, respectively. Better standardization result is denoted with larger PSNR, SSIM, IFC, UQI, VIF, and HC as well as smaller GMSD, MSE, and AD.

The Resolution Preservation Ability Enhancement Brought by the Optimizations on Methodology
To prove the proposed method's improvement over the resolution preservation of the standardized images, the resolution-related metrics of the methods was firstly explored. For the sake of achieving stable test results on a large test set, the first experiment was conducted by obtaining 320 images with the same image acquisition conditions as Group 1, which is the reference group, in the training set. Then, the standardized images were obtained with both the original unified method and the proposed method. In theory, the standardized images should be the same as the original images, thus we used the original images as the gold standard to calculate the resolution-related features of the two sets of standardized images. PSNR, SSIM, VIF, UQI and IFC were applied. Table 4 illustrates such a comparison. In this experiment, the proposed method was superior to the original unified method in most resolution-related metrics while the fidelity-related indicators were significantly improved. These demonstrate the improved effectiveness of the proposed method in maintaining MR image resolution. To better reflect the resolution preservation ability enhancement of the proposed method in the presence of intensity distribution variation between input MR images and output ones, we conducted another experiment by comparing the standardization result of volunteer MR images obtained from GE scanner with the paired MR images obtained from SIEMENS scanner. Table 5 illustrates the comparison of PSNR and SSIM. The proposed method not only successfully eliminated the huge difference between the original target image and the reference image, but also had a significant performance improvement compared with the original unified method, due to the absence of resolution-oriented optimizations.  Figure 3 shows the local details of some of the MR images within the experiment above. Since the proposed method uses a network structure that is more focused on improving image resolution, a loss function with normalized mutual information, an advanced weak-pair data augmentation method, and a multi-GPU training strategy, the output images of the proposed method have a higher local accuracy and resolution. This allows a series of image texture features to be extracted more accurately on the standardized image generated by the proposed method. The aim is to transform target images into the reference group: (a1,b1) the reference MR images, the red boxes are zoomed in (a2,b2), respectively; (a3,b3) the details of the target MR images; (a4,b4) the details of the results produced by the original many-to-one method; and (a5,b5) the details of the results produced by the proposed method.
Moreover, the performances of differential diagnosis with the original dataset and the datasets standardized with different methods were evaluated to reflect the advantage of the resolution preservation ability enhancement brought by the proposed method. Classical and highly interpretable radiomics model was applied to better illustrate the superiority of the proposed method. First, the obtained images from glioma patients and their standardized results were segmented using the GrowCut method to obtain a glioma region [26]. Then, for the segmented regions, 555 features including intensity features (21), shape features (15), texture features (39), and wavelet features (480) were extracted using a self-adaptive feature extraction method [9]. The minimum-redundancy-maximum-relevance (mRMR) based genetic-algorithm (GA) was applied for feature selection. Finally, the selected features were used to train the SVM classifier. Table 6 show the classification result of the original and standardized test set. The accuracy of differential diagnosis between HGG and LGG was noticeably increased. Therefore, we believe that the proposed method can effectively improve the accuracy of differential diagnosis methods by increasing the preservation of image details and texture features as well as wavelet features, which highly rely on image details.  Figure 4 shows the standardized results of the proposed method, the original unified many-to-one method, the histogram matching method, and the joint histogram registration method on the volunteer MR image dataset while Figure 5 shows the logarithmic absolute error image between the standardized image and corresponding reference image. The results of the histogram matching method are deviated in terms of overall image intensity and contrast of gray matter and white matter. The joint histogram registration method has good overall results, but has some image intensity errors in the skull, muscle, and skin areas. Both the original unified method and the proposed method could produce the state-of-the-art image intensity standardization results. In Figure 5, the mean intensity value of the error image corresponding to the proposed method is relatively small, but the spatial distribution of the errors is more extensive. This may be mainly because the standardization results of the proposed method are highly similar to the original target MR image structure, while the original target image and the reference "gold standard" image still have difference due to scanning interval and registration accuracy. Therefore, such difference might be reflected in the error images. Table 7 shows the numerical comparison for these different methods. The proposed method outperformed the original unified many-to-one method based on CycleGAN in the overwhelming majority of metrics. Meanwhile, the proposed method had a significant effect improvement over the two conventional standardization methods.

Other Visual and Numerical Comparison with the Previous Methods
In terms of runtime, the histogram matching method is mainly piecewise linear functions, which takes only 0.5867 s to standardize an image slice. The joint histogram registration method requires b-spline fitting, so the amount of calculation is large, and the time taken for processing a slice reaches 2.1651 s. For the original many-to-one universal method and the proposed method, since the convolution is very suitable for parallel computing using the graphics processing units (GPUs), with the help of two NVIDIA TITAN Xp GPUs, the original many-to-one universal method only needs 0.2868 s to transform an MR image slice. Because the parameters are shared between layers in the cascading block and between the cascading blocks, the proposed method only needs 0.1978 s, which is fast enough for a preprocessing step.

Analysis of the Model Stability of the Anomalies and Lesions
To support various post-processing computer-aided diagnosis models, the dataset we used in training the model includes MR images from patients with various brain diseases. Therefore, the resized and cropped MR images used in the training may be either completely normal or contain various types of lesion/tumor areas. Therefore, the diversity in data can guarantee the robustness of the transformation model in one aspect. Meanwhile, on the other hand, since we use a method based on GANs, all regions of the image, including lesions and tumors, will tend to match the intensity distribution corresponding to the same tissue type of the reference image group during training to minimize the adversarial loss. Furthermore, the consistency-based loss functions including both the mutual information loss and the cycle consistency loss ensure the stability of the structure (including the type of tissue) before and after the transformation (standardization). These loss functions ensure that, although the transformation model has undergone regional intensity changes, such changes should also be stable for abnormal tissues.

The Fusion of Various Losses during the Training Process
The loss functions we use in the proposed method include adversarial loss, cycle consistency loss, and NMI loss. Among them, the adversarial loss is the basis of the generative adversarial network and measures the difference in intensity distribution between the generated image and the reference image. Consistency-based losses including cycle consistency loss and NMI loss measure the feature preservation, especially structure retention of the original target after the intensity distribution transformations of the MR images. All the loss functions are essential for the generator to make high-resolution precise transformed MR images with standardized intensity distribution. However, in the case of multiple losses acting simultaneously, the weight of the loss largely determines the final performance of the model. In our study, the weights of these three losses have been manually fine-tuned to achieve a more consistent decay curve, which has resulted in better convergence results. Moreover, recent studies have pointed out that based on the noise or systematic uncertainty of each loss corresponding to the task, an optimal loss weight may be derived [27,28]. Therefore, we may try to give each loss element a learnable weight while giving a constraint to the overall loss, and automatically obtaining the optimal loss weights based on the systematic uncertainty estimates during the iteration process in the future.

Conclusions
In this paper, a resolution-oriented universal MR image standardization method is proposed in order to standardize MR images from different MR scanner with different acquisition parameter. The low-dimensional feature extraction residual convolution path in the generator, the loss function, the data augmentation strategy and training strategy are all optimized for achieving better transformation resolution. The experiments show the superiority of the proposed method over the original unified many-to-one method as well as the conventional methods. This method has far-reaching implications for the establishment of large-scale, homogeneous MR image datasets. Based on such datasets, various identification and prediction models can achieve higher robustness.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript: