Within-Modality Synthesis and Novel Radiomic Evaluation of Brain MRI Scans

Simple Summary: Brain MRI scans often require different imaging sequences depending on the tissue types of interest, which poses a common challenge. In our research, we propose a method that utilizes Generative Adversarial Networks (GANs) to translate T2-weighted Fluid-Attenuated Inversion Recovery (FLAIR) MRI volumes into T2-Weighted (T2W) volumes, and vice versa. To evaluate the effectiveness of our approach, we introduce a novel evaluation schema that incorporates radiomic features. We train two distinct GAN-based architectures, namely CycleGAN and the Dual Cycle-Consistent Adversarial Network (DC2Anet), using 510 slice pairs from 102 patients. Our findings indicate that the generative methods can produce results similar to the original sequence without significant changes in radiomic features. This method has the potential to assist clinicians in making informed decisions based on generated images when alternative sequences are unavailable or time constraints prevent re-scanning MRI patients.

Abstract: One of the most common challenges in brain MRI is the need to perform different MRI sequences depending on the type and properties of the tissues of interest. In this paper, we propose a generative method to translate a T2-Weighted (T2W) Magnetic Resonance Imaging (MRI) volume from a T2-weighted Fluid-Attenuated Inversion Recovery (FLAIR) volume, and vice versa, using Generative Adversarial Networks (GANs). To evaluate the proposed method, we propose a novel evaluation schema for generative and synthetic approaches based on radiomic features. For evaluation purposes, we consider 510 slice pairs from 102 patients to train two different GAN-based architectures, CycleGAN and the Dual Cycle-Consistent Adversarial Network (DC2Anet). The results indicate that the generative methods can produce results similar to the original sequence without significant changes in the radiomic features.
Therefore, such a method can assist clinicians in making decisions based on generated images when different sequences are not available or there is not enough time to re-perform the MRI scan.


Introduction
Medical imaging scans, including Magnetic Resonance Imaging (MRI) and Computed Tomography (CT), are routinely acquired and used clinically to macroscopically assess, diagnose, and monitor patients with brain abnormalities. MRI in particular can depict normal anatomy and apparent pathologies while providing data relating to the anatomical structure, tissue density, and microstructure, as well as tissue vascularization, depending on the acquired sequence [1][2][3]. Structural MRI sequences represent the basic scans acquired across comprehensive centers and community-based healthcare sites, comprising native T1-weighted (T1W), post-contrast T1-weighted (T1Gd), T2-weighted (T2W), and T2-weighted Fluid-Attenuated Inversion Recovery (FLAIR). T1W scans facilitate observation and analysis of the brain anatomy, with T1Gd scans in particular being able to easily identify the boundaries of an active tumor, while T2-weighted scans (T2W and FLAIR) help in identifying brain abnormalities, both those related to vascular lesions (e.g., stroke) and vasogenic edema [4]. The simultaneous assessment of multiple varying MRI scans (also known as multi-parametric MRI, or mpMRI) from the same patient is the standard clinical practice for the evaluation of patients suspected of stroke or diffuse glioma, as it offers the maximal available medical diagnostic information.
Acquisition of mpMRI might not be possible at all times, due to numerous reasons, including but not limited to the patient's cooperation during a scanning session, which could result in motion-degraded scans, thereby hindering further diagnostic usage [5][6][7]. Towards this end, the artificial synthesis of specific MRI scans has been an active area of research [8,9], with the intention of either substituting specific MRI scans corrupted by various artifacts, or generating scans that were not acquired at all. Although such synthetic scans have been successfully used in many applications, and Generative Adversarial Networks (GANs) have significantly improved their realism, the final result may not always look realistic and/or may contain information that adversely affects downstream quantitative analyses [10,11]. Cross-domain synthesis of medical images has drawn significant interest in the medical imaging community and describes the artificial generation of a target-modality scan by learning the relationship between paired source-modality scans and their associated target-modality scans [12,13]. Of note, the data are described here as paired when they arise from the same individual at different points in time.
In recent years, deep-learning methods, particularly Convolutional Neural Networks (CNNs) [14] and GANs, have rapidly dominated the domain of medical image synthesis [15,16]. GANs use two competing CNNs: one that generates new images and another that discriminates the generated images as either real or fake. To address the problem of unpaired cross-domain data, which is common in healthcare, the Cycle Generative Adversarial Network (CycleGAN) [17] is typically chosen to obtain high-quality information translatable across images. In CycleGAN, based on the image of a subject b1 in the source domain, the purpose is to estimate the relevant image of the same subject b2 in the target domain. In theory, the CycleGAN model entails two mapping functions, i.e., G1: X → Y and G2: Y → X, and associated adversarial discriminators DY and DX. DY encourages G1 to translate X into outputs indistinguishable from domain Y, and contrariwise for DX and G2. Nie et al. [18] trained a fully convolutional network (FCN) to generate CT scans from corresponding MRI scans, specifically using the adversarial training method to train their FCN. Welander et al. [19] evaluated two models, CycleGAN and UNIT [20], for image-to-image translation of T1W and T2W MRI slices by comparing synthetic MRI scans to real ones. They used paired T1W and T2W images from 1113 axial images (only slice 120). The scans were registered to a standard anatomical template, so they were in the same coordinate space and of the same size. The two models were compared using quantitative metrics, including mean absolute error (MAE), mutual information (MI), and peak signal-to-noise ratio (PSNR). It was shown that the executed GAN models can synthesize visually realistic MRI slices. Dar et al. [21] proposed a method for multi-contrast MRI synthesis based on Conditional GANs (CGANs), demonstrating how CGANs can generate a T2W scan from a T1W scan. Theis et al.
[22] found that GANs can generate more realistic training data to improve the classification performance of machine learning methods. They also showed that models creating more visually realistic synthetic images do not necessarily achieve better quantitative error measurements when compared to real images. Despite the mounting promise of GANs for healthcare, both optimal model selection and quantitative evaluation remain challenging tasks, and the solutions produced so far are use-specific and not generalizable. Specifically, for quantitative performance evaluation, several metrics (such as the MAE, Mean Squared Error (MSE), and PSNR) have been proposed in the literature [23], albeit no consensus has been reached on an optimal evaluation metric for a particular domain.
Radiomics describes a novel and rapidly advancing area in medical imaging. In contrast to the traditional clinical assessment, which considers medical images as pictures intended only for visual interpretation, radiomics represents visual and sub-visual quantitative measurements (also called "features") extracted from acquired radiology scans [24,25], following specific mathematical formulations, and resulting in measurements that are not even perceivable by the naked eye, i.e., sub-visual [26][27][28][29][30][31][32][33][34]. These features are widely used in both clinical and pre-clinical research studies attempting to identify associations between radiologic scans and clinical outcomes, or even molecular characteristics [35][36][37][38]. The hypothesis is that quantitative computational interrogation of medical scans can provide more and better information than the physician's visual assessment, as exemplified by observations related to texture analyses of different imaging modalities. Various open-source tools have been developed to facilitate the harmonized extraction of high-throughput radiomic features [39][40][41], contributing to the increasing evidence of their value. The primary purpose of these tools has been to expedite robust quantitative image analyses based on radiomics and to standardize both feature definitions and computation strategies, thereby guaranteeing the reproducibility and reliability of radiomic features. In this study, considering the importance of T2-weighted scans (T2W and FLAIR), we focus on generating FLAIR from T2W MRI scans, and vice versa, based on the CycleGAN [17] and Dual Cycle-Consistent Adversarial Network (DC2Anet) [42] architectures. We further consider radiomics as a novel way to quantify the dissimilarity between the distributions of the actual/real and synthesized scans.
We believe that radiomics can represent a first potential solution for the quantitative performance evaluation of GANs in the domain of radiology. For comparison, we also report traditional metrics, including MSE, MAE, and PSNR [43].

Dataset and Registration
The data utilized in our study were obtained from the public data collection 'ACRIN-DSC-MR-Brain (ACRIN 6677/RTOG 0625)' [44] at The Cancer Imaging Archive (TCIA) [45]. These data describe brain MRI scans from a multicenter phase-II trial of bevacizumab with temozolomide in recurrent glioblastoma (GBM) patients. We extracted 510 T2W/FLAIR paired slices from 102 patients, which were further divided into 410 pairs from 81 patients for training and 100 pairs from 21 patients for evaluating the synthesis results (test set). Due to the limited size of the dataset, no separate validation set was used during training. Each pair includes the axial paired T2W/FLAIR slices for the same patient at the same axial depth. Of note, the testing data are held out of the training process at all times. Our networks take 2D axial-plane slices of the volumes as inputs. In the preprocessing step, T2W 2D scans were first rigidly registered to FLAIR scans using the ITK-SNAP software (version 3.8.0) [46], considering 6 degrees of freedom (i.e., 3 translations and 3 rotations).
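A patient-level split like the one described above (410 training pairs from 81 patients, 100 test pairs from 21 patients) must ensure that no patient contributes slices to both sets. The following is a minimal sketch of such a split; the record layout of `(patient_id, t2w_slice, flair_slice)` tuples is our illustrative assumption, not the authors' actual implementation:

```python
import random

def split_by_patient(pairs, test_patients, seed=0):
    """Split (patient_id, t2w_slice, flair_slice) records so that no
    patient appears in both the training and the test set."""
    patients = sorted({p for p, _, _ in pairs})
    rng = random.Random(seed)
    rng.shuffle(patients)
    held_out = set(patients[:test_patients])   # patients reserved for testing
    train = [r for r in pairs if r[0] not in held_out]
    test = [r for r in pairs if r[0] in held_out]
    return train, test
```

Splitting by patient rather than by slice avoids data leakage, since adjacent slices from the same patient are highly correlated.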

CycleGAN
A GAN uses an image generator (G) to synthesize images of a target domain and a discriminator (D) to distinguish between real and synthesized images. A suitable analogy for visual data considers one network as an art forger and the other as an art specialist: the forger (generator G) creates forgeries, while the specialist (discriminator D) receives both forgeries and real images and attempts to tell them apart (Figure 1). The two networks compete with each other and are trained simultaneously. CycleGAN is a framework that allows image-to-image translation between two domains to be learned from unpaired data, thereby reducing the problems caused by the lack of paired data. A diagram of the CycleGAN model used in this study is presented in Figure 2. Let us assume nA images xA ∈ XA (e.g., T2W) and nB images xB ∈ XB (e.g., FLAIR). The CycleGAN for T2W and FLAIR images includes two mappings, GT2W: T2W → FLAIR and GFLAIR: FLAIR → T2W. Therefore, the proposed CycleGAN operates with two generators (GT2W, GFLAIR) and two discriminators (DT2W, DFLAIR). Given a T2W image, GT2W learns to generate the respective FLAIR image of the same anatomy that is indistinguishable from real FLAIR images, whereas DT2W learns to discriminate between synthetic and real FLAIR images. The architecture of the CycleGAN generator is adapted from [17], with 9 residual blocks after the early convolutional layers. Similarly, given a FLAIR image, GFLAIR learns to generate the respective T2W image of the same anatomy that is indistinguishable from real T2W images, whereas DFLAIR learns to discriminate between synthetic and real T2W images. We apply adversarial losses to both mapping functions to match the distribution of generated images to the data distribution of the target domain.
For the mapping function GT2W: T2W → FLAIR and its discriminator DT2W, the adversarial objective is expressed as follows:

L_GAN(GT2W, DT2W, T2W, FLAIR) = E_f∼FLAIR[log DT2W(f)] + E_t∼T2W[log(1 − DT2W(GT2W(t)))]. (1)

Similarly, the adversarial loss for the mapping function GFLAIR: FLAIR → T2W and its discriminator DFLAIR is:

L_GAN(GFLAIR, DFLAIR, FLAIR, T2W) = E_t∼T2W[log DFLAIR(t)] + E_f∼FLAIR[log(1 − DFLAIR(GFLAIR(f)))]. (2)

In addition, the cycle-consistency loss Lcyc (forward cycle-consistency and backward cycle-consistency) keeps the two sets of networks consistent:

Lcyc(GT2W, GFLAIR) = E_t∼T2W[‖GFLAIR(GT2W(t)) − t‖1] + E_f∼FLAIR[‖GT2W(GFLAIR(f)) − f‖1]. (3)

The loss of the whole CycleGAN network is:

L_CycleGAN = L_GAN(GT2W, DT2W, T2W, FLAIR) + L_GAN(GFLAIR, DFLAIR, FLAIR, T2W) + λ Lcyc(GT2W, GFLAIR). (4)
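The adversarial and cycle-consistency terms described above can be sketched numerically. The following NumPy functions are an illustrative re-implementation of the loss arithmetic only, not the actual TensorFlow training code used in the study:

```python
import numpy as np

def adversarial_loss(d_real, d_fake):
    """Standard GAN objective E[log D(real)] + E[log(1 - D(fake))];
    d_real and d_fake are discriminator outputs in (0, 1)."""
    return float(np.mean(np.log(d_real)) + np.mean(np.log(1.0 - d_fake)))

def cycle_consistency_loss(x, x_reconstructed):
    """L1 cycle-consistency term ||G_back(G_fwd(x)) - x||_1, averaged over pixels."""
    return float(np.mean(np.abs(x - x_reconstructed)))

def cyclegan_objective(adv_t2w_to_flair, adv_flair_to_t2w,
                       cyc_forward, cyc_backward, lam=10.0):
    """Total CycleGAN objective: two adversarial terms plus the
    lambda-weighted sum of the forward and backward cycle losses."""
    return adv_t2w_to_flair + adv_flair_to_t2w + lam * (cyc_forward + cyc_backward)
```

A perfect reconstruction yields a cycle loss of zero, so the λ-weighted term only penalizes mappings that fail to return to the original image.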

DC 2 Anet
The DC2Anet model, introduced by Jin et al. [42], follows a semi-supervised learning approach that alternates between supervised and unsupervised optimization in order to seek a global minimum for the optimal network. The forward and backward mappings generate the T2W image from a FLAIR image, and vice versa. In the forward cycle-consistent adversarial network with aligned learning, the GFLAIR network generates a synthetic T2W image from a FLAIR image, and this T2W image is then used by the GT2W network to reconstruct the original FLAIR image in order to learn the domain structures. The input to the discriminator DisT2W is either a sample T2W image from the real T2W data or a synthetic T2W image. In the backward cycle-consistent adversarial network with aligned learning, the GT2W network generates a synthetic FLAIR image from a T2W image, and this FLAIR image is then used by the GFLAIR network to reconstruct the original T2W image in order to learn the domain structures. The losses of the discriminators DisT2W and DisFLAIR are expressed as follows:

L(DisT2W) = E_t∼T2W[log DisT2W(t)] + E_f∼FLAIR[log(1 − DisT2W(GFLAIR(f)))],
L(DisFLAIR) = E_f∼FLAIR[log DisFLAIR(f)] + E_t∼T2W[log(1 − DisFLAIR(GT2W(t)))].

In the DC2Anet model, in addition to the adversarial and dual cycle-consistency losses used in the CycleGAN model, four further loss functions are used to achieve accurate and perceptual outputs: voxel-wise, gradient difference, perceptual, and structural similarity losses. We denote the loss function of the supervised training of our model as Lsup, defined as a weighted sum of Lsup-adversarial, Lsup-cycle-consistency, and these four losses (Lvoxel-wise, Lgradient, Lperceptual, Lstructural). The weights used to calculate Lsup are hyperparameters of the model, and we set all of them to one in our experiment. Hence, the terms are combined as follows:

Lsup = Lsup-adversarial + Lsup-cycle-consistency + Lvoxel-wise + Lgradient + Lperceptual + Lstructural.
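Two of the supervised terms, the voxel-wise and gradient-difference losses, can be sketched as follows. This is a simplified NumPy illustration of the loss structure; the exact formulations in [42] may differ (e.g., in the norm used), and the perceptual and structural similarity terms are omitted for brevity:

```python
import numpy as np

def voxel_wise_loss(y, y_hat):
    """Mean absolute voxel-wise difference between target and synthesis."""
    return float(np.mean(np.abs(y - y_hat)))

def gradient_difference_loss(y, y_hat):
    """Penalize differences between image gradient magnitudes along each
    spatial axis, encouraging sharp edges in the synthesized image."""
    loss = 0.0
    for axis in (0, 1):
        gy = np.diff(y, axis=axis)
        gy_hat = np.diff(y_hat, axis=axis)
        loss += float(np.mean(np.abs(np.abs(gy) - np.abs(gy_hat))))
    return loss

def supervised_loss(terms, weights=None):
    """DC2Anet-style weighted sum of loss terms; the paper sets all weights to one."""
    if weights is None:
        weights = [1.0] * len(terms)
    return float(sum(w * t for w, t in zip(weights, terms)))
```

With all weights fixed to one, `supervised_loss` reduces to the plain sum given in the equation above.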
A diagram outlining the forward and backward adversarial losses of the DC2Anet model is shown in Figure 4. Of note, the generator network of the DC2Anet model is the same as in the CycleGAN model (Figure 3a), but the discriminator network is different; its architecture is shown in Figure 5.

Implementation
To generate a T2W MRI from a FLAIR, and vice versa, the T2W and FLAIR intensity values are normalized to the [0, 1] range. The native resolutions of the FLAIR and T2W images in our dataset are 256 × 256 and 512 × 512, respectively. Therefore, in the first preprocessing step, the T2W images are rigidly registered to the FLAIR images so that all images share a 256 × 256 resolution. The axial T2W/FLAIR pairs, each 256 × 256 pixels, are then the input of the network. Of note, whole 2D images, rather than image patches, are used for training. CycleGAN and DC2Anet were trained for 400 epochs. The batch size was set to 2, and both the generator and the discriminator used the Adam optimizer [47]. In the first 200 epochs, the learning rate was fixed at 2·10−4; for the remaining 200 epochs, it linearly decayed from 2·10−4 to 0. Since the discriminators were observed to converge faster than the generators, the numbers of iterations for the generator and the discriminator were set to three and one, respectively. The loss function plays a crucial role in training the networks and achieving the desired translation between T2W and FLAIR images. In both CycleGAN and DC2Anet, adversarial losses are employed to match the distribution of generated images with the target-domain data distribution. Specifically, for the mapping function GT2W: T2W → FLAIR and its discriminator DT2W, we use the adversarial loss L_GAN(GT2W, DT2W, T2W, FLAIR); similarly, for the mapping function GFLAIR: FLAIR → T2W and its discriminator DFLAIR, the adversarial loss is L_GAN(GFLAIR, DFLAIR, FLAIR, T2W). Additionally, to maintain cycle consistency between the generator networks, a cycle-consistency loss (Lcyc) is utilized, which ensures that translated images can be converted back to the original domain.
The complete loss function for the CycleGAN network is L_CycleGAN = L_GAN(GT2W, DT2W, T2W, FLAIR) + L_GAN(GFLAIR, DFLAIR, FLAIR, T2W) + λ Lcyc(GT2W, GFLAIR).
The cycle-consistency loss Lcyc(GT2W, GFLAIR) was multiplied by a constant λ reflecting the importance of the cycle-consistency loss relative to the adversarial loss (Equation (4)); we therefore set λcyc = 10 and λGAN = 1. All experiments, including data preprocessing and analysis, were performed on the Google Cloud computing service "Google Colab" (colab.research.google.com) using Python 3.7 and TensorFlow 2.4.1. The training parameters and hardware configuration are provided in Table 1.
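The learning-rate schedule described above (constant at 2·10−4 for the first 200 epochs, then linear decay to 0 over the last 200) can be expressed as a small helper. This is a sketch; the per-epoch decay granularity is our assumption:

```python
def learning_rate(epoch, total_epochs=400, base_lr=2e-4):
    """Constant learning rate for the first half of training,
    then linear decay to zero over the second half."""
    half = total_epochs // 2
    if epoch < half:
        return base_lr
    # epochs remaining until the end of training
    remaining = total_epochs - epoch
    return base_lr * remaining / half
```

Note that the schedule is continuous at the midpoint: epoch 200 still yields the base rate, and the rate reaches exactly zero at the final epoch.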

Evaluation
Several metrics were used to compare the real and synthetic T2W and FLAIR images. These metrics, including MAE, MSE, and PSNR, have been used widely in the literature for the same purpose [48][49][50]. They are defined as:

MAE = (1/N) Σ_i |x_i − y_i|,
MSE = (1/N) Σ_i (x_i − y_i)²,
PSNR = 10 · log10(MAX² / MSE),

where N is the total number of voxels inside the input image, i is the index of the aligned pixel, x_i and y_i are the intensities of the real and synthetic images at pixel i, and MAX denotes the largest pixel value of the ground-truth T2W and synthetic T2W images (and likewise for the FLAIR images). Considering a use-inspired, generalizable evaluation approach that goes beyond essential quantification to also account for radiologic appearance, in this study we introduce a novel approach based on radiomic features to compare the real and synthetic images. After running the models and generating all the synthetic images, we segmented the whole brain volume, in both the real and synthetic images, using the ITK-SNAP module [51,52] within the Cancer Imaging Phenomics Toolkit (CaPTk) [41,53]. CaPTk is a software platform written in C++ for analyzing medical images; it leverages quantitative imaging analytics along with machine learning to derive phenotypic imaging signatures. The specific ITK-SNAP module is based on the geometric active contour model and defines the contours using energy forces and the geometric flow curve. The contour is a collection of points that undergo an interpolation process. In this study, manual delineation of the whole brain and skull was performed. Following this segmentation, radiomic features compliant with the Image Biomarker Standardization Initiative (IBSI) [54] were extracted from both the real and synthetic T2W images, as well as from the real and synthetic FLAIR images, using the CaPTk 1.9.0 software [53,55]. Twenty-three features were extracted, comprising 8 Gray Level Co-occurrence Matrix (GLCM) features [56,57], 8 Gray Level Size Zone Matrix (GLSZM) features [58], and 7 Gray Level Run Length Matrix (GLRLM) features [54] (Table 2).
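The three metrics can be implemented directly from the definitions above; a minimal NumPy sketch:

```python
import numpy as np

def mae(x, y):
    """Mean absolute error over aligned pixels."""
    return float(np.mean(np.abs(x - y)))

def mse(x, y):
    """Mean squared error over aligned pixels."""
    return float(np.mean((x - y) ** 2))

def psnr(x, y):
    """Peak signal-to-noise ratio in dB; MAX is taken as the largest
    pixel value across the two images, as described above."""
    m = mse(x, y)
    if m == 0:
        return float("inf")  # identical images
    max_val = float(max(x.max(), y.max()))
    return 10.0 * np.log10(max_val ** 2 / m)
```

Note that PSNR is unbounded for identical images, which is why it is reported alongside the bounded error metrics.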
The choice of these specific features, namely the 8 GLCM features, 8 GLSZM features, and 7 GLRLM features, was based on their established significance in characterizing textural patterns and capturing distinct image characteristics (Table 2). These features have been widely utilized in radiomics studies, have demonstrated their efficacy in quantifying spatial relationships, size variations, and run lengths within an image, and have shown promise in previous studies as reliable indicators of image heterogeneity and structural differences. By utilizing this comprehensive set of 23 radiomic features, derived from the GLCM, GLSZM, and GLRLM matrices, we aimed to capture a wide range of textural characteristics that could potentially distinguish real from synthetic images. Details of the radiomic features are included in the supplementary materials (Supplementary S1).
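As an illustration of what a GLCM-based texture feature measures, the co-occurrence matrix and two of the listed features (contrast and energy) can be computed from first principles. This is a didactic sketch for a single pixel offset, not the CaPTk implementation (which follows the IBSI definitions and aggregates multiple offsets):

```python
import numpy as np

def glcm(img, levels=8, offset=(0, 1)):
    """Build a normalized gray-level co-occurrence matrix for one pixel
    offset. `img` must already be quantized to integers in [0, levels)."""
    m = np.zeros((levels, levels), dtype=float)
    dr, dc = offset
    rows, cols = img.shape
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                m[img[r, c], img[r2, c2]] += 1
    return m / m.sum()

def glcm_contrast(p):
    """GLCM contrast: sum over (i, j) of (i - j)^2 * p(i, j)."""
    i, j = np.indices(p.shape)
    return float(np.sum((i - j) ** 2 * p))

def glcm_energy(p):
    """GLCM energy (angular second moment): sum of p(i, j)^2."""
    return float(np.sum(p ** 2))
```

A flat image yields zero contrast and maximal energy, whereas a checkerboard yields high contrast; differences in such feature values between a real and a synthetic scan are exactly what the proposed evaluation quantifies.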
The exact parameters used for the feature extraction were: Bins = 20, Radius = 1, and Offset = Combined. Available studies use bin numbers varying from 8 to 1000, as suggested by the IBSI [59], or bin widths from 1 to 75; in this study, textural features were computed using a fixed number of bins. Radiomic data were then analyzed using the GraphPad Prism 9.5 software (GraphPad, San Diego, CA, USA). The normality of the extracted radiomic features was evaluated with the D'Agostino test [60]. When the data followed a normal distribution, a t-test was used; otherwise, the Mann-Whitney U test was used. To determine whether the differences in radiomic features between the real and synthetic T2W images (as well as between the real and synthetic FLAIR images) were statistically significant, confidence intervals (CI) were calculated.
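The test-selection logic (a D'Agostino normality check, followed by either a t-test or a Mann-Whitney U test) can be sketched with SciPy, whose `normaltest` implements the D'Agostino-Pearson omnibus test. This is a simplified per-feature illustration, not the GraphPad Prism analysis used in the study:

```python
from scipy import stats

def compare_features(real_vals, synth_vals, alpha=0.05):
    """Pick the comparison test based on normality, as described above:
    t-test if both samples look normal, Mann-Whitney U otherwise.
    Returns (test_name, p_value)."""
    _, p_real = stats.normaltest(real_vals)
    _, p_synth = stats.normaltest(synth_vals)
    if p_real > alpha and p_synth > alpha:
        _, p = stats.ttest_ind(real_vals, synth_vals)
        return "t-test", p
    _, p = stats.mannwhitneyu(real_vals, synth_vals, alternative="two-sided")
    return "mann-whitney", p
```

In the study's terms, a feature with p above the significance threshold indicates no detectable radiomic difference between the real and synthetic images.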

Results
We evaluated the proposed CycleGAN and DC2Anet architectures on T2W and FLAIR brain tumor images. For the quantitative performance evaluation of the T2W and FLAIR synthesis, in line with the current literature, we considered the three metrics of MAE, MSE, and PSNR (Table 3). Qualitative comparisons are provided in the supplementary materials (Figures S1 and S2).
The differences observed between the real and synthetic imaging modalities can be attributed to multiple factors. Firstly, synthetic images are computer-generated simulations of the original images based on GAN models, whereas the real images are directly acquired from patients using MRI scanners; variations can therefore arise from the inherent limitations and assumptions of the synthesis process. Furthermore, physiological and technical factors, such as variations in tissue contrast, signal intensity, and image artifacts, can contribute to dissimilarities between synthetic and real images. To further investigate and address these differences, future studies should focus on refining the synthesis algorithms, incorporating more realistic training data, and exploring the impact of various imaging parameters on the synthesis process.
We then performed a secondary quantitative performance evaluation that considers the radiophenotypical properties of the images, by virtue of the extracted radiomic features. Unlike the MAE, MSE, and PSNR metrics, the significance levels and values of radiomic features vary depending on the type of feature and image. Following a comparison of the radiomic features for both the T2W and FLAIR images, our results showed that for most radiomic features there was no significant difference between the real and synthetic T2W images, or between the real and synthetic FLAIR images. The mean and standard error (SE) of the GLCM features for both T2W and FLAIR images, as well as their statistically significant differences for the CycleGAN and DC2Anet models, are shown in Tables 4 and 5. No significant differences were observed for any of the extracted GLCM features between the real and synthetic T2W images for the CycleGAN model. However, this was not the case for three features (cluster prominence, contrast, and correlation) extracted from the synthetic images of the DC2Anet model (Table 4). Notably, for the FLAIR images there was a significant difference for two features extracted with the DC2Anet model (cluster prominence and correlation), and for the cluster prominence feature with the CycleGAN model (Table 5). As with the GLCM features, no significant differences were observed for any of the extracted GLRLM features between the real and synthetic T2W images for the CycleGAN model (Table 6). However, for the DC2Anet synthetic images, there was a significant difference for two features (High Grey Level Run Emphasis and Long Run Low Grey Level Emphasis), both between the real and synthetic T2W images and between the real and synthetic FLAIR images (Table 7). The features extracted based on the GLSZM for the two models are shown in Tables 8 and 9.
By comparing the real and synthetic T2W images based on CycleGAN and DC2Anet, no significant differences were observed for the extracted GLSZM features, except for two features (Grey Level Nonuniformity and Large Zone Low Grey Level Emphasis) (Table 8). However, for the FLAIR images generated by the DC2Anet model, significant differences were observed for two features (High Grey Level Emphasis and Large Zone Low Grey Level Emphasis), as shown in Table 9.

Discussion
A within-modality synthesis strategy was presented for generating FLAIR images from T2W images, and vice versa, based on the CycleGAN and DC2Anet networks. Comprehensive evaluations were conducted for the two methods, with training images registered within single subjects. It has been shown, via a perceptual study and quantitative assessments based on the MAE, MSE, and PSNR metrics, as well as a novel radiomic evaluation, that CycleGAN and DC2Anet can generate visually realistic MR images. While our synthesis approaches were primarily evaluated for two specific brain MRI sequences, they have the potential to be applied to other image-to-image MRI synthesis tasks, as well as to synthesis across imaging modalities (such as MRI, CT, and PET). The proposed CycleGAN technique uses adversarial loss functions and a cycle-consistency loss to learn to synthesize from registered images for improved synthesis. The DC2Anet model additionally uses four loss functions: voxel-wise, gradient difference, perceptual, and structural similarity losses. In modern medical imaging, generating realistic medical images that are nearly indistinguishable from real ones remains a challenging objective, and only sufficiently realistic synthetic images can support a trustworthy diagnosis. Based on the quantitative evaluation, the CycleGAN model was found to be accurate for all metrics and outperformed the DC2Anet model.
The CycleGAN and DC2Anet models learn the mapping directly from the space of T2W images to the corresponding FLAIR images, and vice versa. Moreover, the three metrics of MAE, MSE, and PSNR, applied to the 510 T2W/FLAIR slice pairs from 102 patients, compared favorably with other results reported in the brain-imaging literature. For example, Krauss et al. [61] compared the assessment results of synthetic and conventional MRI for patients with Multiple Sclerosis (MS); the images were prospectively acquired for 52 patients with diagnosed MS. In terms of quantitative evaluation, our GAN-based CycleGAN model obtained better results than the study of Krauss et al. Han et al. [62] proposed a GAN-based approach to generate synthetic multi-sequence brain MRI using a Deep Convolutional GAN (DCGAN) and a Wasserstein GAN (WGAN); their model was validated by an expert physician using a Visual Turing test. In agreement with our study, their results revealed that GANs can generate realistic multi-sequence brain MRI images. Nevertheless, our study differs from other research attempts in terms of the quantitative evaluation employed and the proposed models, CycleGAN and DC2Anet: in addition to the MAE, MSE, and PSNR metrics, we also conducted a novel evaluation based on radiomic features to compare the real and synthetic MRI cases. Li et al. [63] proposed a procedure to synthesize brain MRI from CT images; they reported an MAE of 87.73 and an MSE of 1.392 × 10^4 between real and synthesized MRI. Although their results differed from our findings, both studies suggest that the application of the CycleGAN model remains subject to considerable error.
Quantitative evaluations can be used to assess generated images by comparing them with real images. However, even if a model achieves a relatively satisfactory score on quantitative measurements such as the MAE, MSE, and PSNR metrics, it does not necessarily generate visually realistic images. Although CycleGAN visually produced more realistic images, there is not much difference between the CycleGAN and DC2Anet models on the MAE, MSE, and PSNR metrics, as shown in Table 3. This implies that whether or not an image is visually realistic cannot be determined based on these metrics alone. The current study therefore employed radiomic features as a new evaluation approach to compare real MRI scans with their synthetic counterparts. Our results (Tables 4-9) revealed that, for the vast majority of radiomic features in both the T2W and FLAIR images, no significant difference was observed between real images and images synthesized using CycleGAN. On the other hand, several radiomic features showed significant differences for images synthesized by DC2Anet, indicating that the set of radiomic features is more successful in assessing the realism of the generated images than traditional metrics such as MAE, MSE, and PSNR. Therefore, according to the metrics used in this study, it can be concluded that performing evaluations based on radiomic features is a viable option for GAN models.
In this study, we used the ACRIN 6677/RTOG 0625 dataset; its multicenter nature is one of the strengths of this study. As a limitation, the sample size was modest, and future studies with larger sample sizes are suggested. This study considered synthesis for two-contrast brain MRI; the proposed models can also be used for other related tasks in medical image analysis, such as T1W synthesis and CT-PET or MR-CT translation. For future research, it is suggested that relevant evaluations based on radiomic features be carried out with larger datasets and in other anatomical areas.
Despite the demonstrated effectiveness of our method in generating a T2W MRI from a FLAIR, and vice versa, it is important to acknowledge that the applicability of our approach may have limitations in certain specific cases. The performance of our method may be influenced by factors such as extreme variations in tumor size, irregular tumor shapes, or cases with substantial edema or necrosis. While our methodology has shown promising results in brain tumor patients, further research is needed to investigate its robustness in challenging scenarios and to develop additional techniques to address these limitations. Future studies should also consider expanding the dataset to include a larger cohort of patients with a wider spectrum of brain pathologies to ensure the generalizability of our findings.

Conclusions
The CycleGAN method can be used to generate realistic synthetic T2W and FLAIR brain scans, as supported by both our qualitative and quantitative experimental results. Radiomic features, representing quantitative data extracted from radiology images, hold substantial promise as a novel approach to quantitatively evaluate similarities between synthetic and real MRI scans and to make decisions based on the radiologic quality of the synthetic scans. Synthesis of realistic MRI scans can facilitate imaging of uncooperative patients and significantly shorten the image acquisition time, thereby contributing to reducing costs to healthcare systems.

Supplementary Materials:
The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/cancers15143565/s1. Supplementary S1: The radiomic features extracted in our study encompassed a total of twenty-three features, categorized into three groups (GLCM, GLSZM, and GLRLM).

Informed Consent Statement: Ethical review and approval were waived for this study because the data come from a public dataset.

Data Availability Statement:
The data presented in this study are available in this article.