Franken-CT: Head and Neck MR-Based Pseudo-CT Synthesis Using Diverse Anatomical Overlapping MR-CT Scans

Featured Application: The Franken-CT approach allows synthesizing pseudo-CT images using diverse anatomical overlapping MR-CT datasets, with potential application in PET/MR attenuation correction.

Abstract: Typically, pseudo-Computerized Tomography (CT) synthesis schemes proposed in the literature rely on complete atlases acquired with the same field of view (FOV) as the input volume. However, clinical CTs are usually acquired in a reduced FOV to reduce patient radiation exposure. In this work, we present the Franken-CT approach, showing how the use of a non-parametric atlas composed of diverse anatomical overlapping Magnetic Resonance (MR)-CT scans and deep learning methods based on the U-net architecture enables synthesizing extended head and neck pseudo-CTs. Visual inspection of the results shows the high quality of the pseudo-CT and the robustness of the method, which is able to capture the details of the bone contours despite synthesizing the resulting image from knowledge obtained from images acquired with a completely different FOV. The experimental Zero-Normalized Cross-Correlation (ZNCC) was 0.9367 ± 0.0138 (mean ± SD), 95% confidence interval (0.9221, 0.9512); the experimental Mean Absolute Error (MAE) was 73.9149 ± 9.2101 HU, 95% confidence interval (66.3383, 81.4915); the Structural Similarity Index Measure (SSIM) was 0.9943 ± 0.0009, 95% confidence interval (0.9935, 0.9951); and the experimental Dice coefficient for bone tissue was 0.7051 ± 0.1126, 95% confidence interval (0.6125, 0.7977). The voxel-by-voxel correlation plot shows an excellent correlation between pseudo-CT and ground-truth CT Hounsfield Units (m = 0.87; adjusted R² = 0.91; p < 0.001). The Bland–Altman plot shows that the average of the differences is low (−38.6471 ± 199.6100; 95% CI (−429.8827, 352.5884)).
This work serves as a proof of concept demonstrating the great potential of deep learning methods for pseudo-CT synthesis using real clinical datasets.


Introduction
In the last 20 years, interest in synthesizing pseudo-Computerized Tomography (pseudo-CT) images from Magnetic Resonance (MR) images using computer vision and machine learning techniques has increased consistently, alongside the adoption of hybrid Positron Emission Tomography/Magnetic Resonance (PET/MR) scanners and improvements in external radiation therapy [1][2][3]. The first approaches used traditional im[…] and generalizable atlases to train our algorithms in a real clinical setting, we should be able to work with datasets containing lower-resolution but larger-FOV MR images alongside low-dose and reduced-FOV CT images.
In this work, we propose a method based on the idea of modality propagation using MR-CT atlases described in preceding developments [11,29] and a deep learning architecture inspired by our previous work [22]. However, our database is composed of head and neck MR images and local portions of CT covering the brain, paranasal sinuses, facial orbits, and neck. Additionally, the deep learning architecture presented here incorporates recent state-of-the-art techniques, namely residual operations and dilated (Atrous) convolutions. With this work, we demonstrate that constructing and using incomplete databases still enables accurate results without major limitations, as deep learning methods can be used with this kind of dataset provided that certain preprocessing steps and corrections are applied in the pipeline.

Materials and Methods
Our proposed Franken-CT approach for pseudo-CT synthesis can be divided into two main steps: (1) a non-overlapping and non-parametric atlas generation and (2) the implementation of modality propagation algorithms for pseudo-CT synthesis. Both steps are described below.

Franken-Computerized Tomography (Franken-CT) Approach
Typically, modality propagation schemes proposed in the literature rely on complete corresponding atlases (volumes acquired with the same FOV as the input volume). However, in clinical practice, CTs are usually acquired in a reduced FOV to decrease patient radiation exposure and acquisition time. Thus, we base our work on a diverse anatomical overlapping MR-CT atlas. This new atlas may contain complete MRI volumes but only partial CT scans covering different, overlapping anatomical regions, as shown in Figure 1.
Appl. Sci. 2021, 11, x FOR PEER REVIEW 3
Figure 1. Head and neck whole fields of view (FOVs) could be generated by joining several smaller parts from different patients, as shown in the anatomical drawing in (a). Thus, overlapping incomplete CT scans such as brain (blue box), facial orbits (green box), maxillofacial (red box), and neck (orange box) scans, as shown in (b), could be equivalent to a single scan covering the same FOV for pseudo-computerized tomography (pseudo-CT) synthesis pipelines.

Training Dataset
Scans of 15 subjects (mean age, 58.2 ± 18.1 years; range, 25–80 years; 9 females/6 males) that underwent both MR and CT imaging were selected. MR images were acquired in the sagittal plane to include the head and neck in the FOV, while CT images focused on local FOVs including the brain, paranasal sinuses, facial orbits, or neck. MR T1-weighted sequences differed depending on the MR scanner (images were acquired at different field strengths and on different scanner models). Additionally, images were acquired using different coils with different numbers of channels. Regarding CT images, all subjects underwent CT examinations, depending on their pathologies, on an Aquilion Prime CT scanner (Toshiba). Table 1 summarizes the demographic details for all subjects included in the training dataset as well as the vendors and models of their corresponding MR and CT scanners. Extended details are included in Table A1 in Appendix A. Figure 2 shows representative images from different subjects included in the training dataset.

Validation Dataset
Scans of 6 subjects (mean age, 39.5 ± 23.42 years; range, 21–83 years; 4 females/2 males) that underwent both MR and CT head and neck imaging were selected to validate the Franken-CT approach. Figure 3 shows representative images from different subjects included in the validation dataset. Table 2 summarizes the demographics and technical details of the MR and CT imaging protocols for all subjects included in the validation dataset. Extended details are included in Table A2 in Appendix A.

Figure 2. Representative magnetic resonance (MR)-CT pairs from the training dataset showing (a–c) full head and neck MRs and (d–f) their corresponding paranasal sinuses, facial orbits, and brain CT images, respectively.

Datasets Preprocessing
Image preprocessing was carried out to normalize all the images in the dataset to the same intensity range and the same spatial distribution, including:
• MRI bias correction of the anatomical T1-weighted images (N4ITK MRI Bias Correction, 3D Slicer) to correct inhomogeneities caused by subject-dependent load interactions and imperfections in radiofrequency coils.
• Resampling of MR and CT images to an isotropic 1 mm space (Resample Scalar Volume, 3D Slicer) to set a common resolution space for all images ((271, 271, 221) voxels) and avoid information loss in the following steps.
• Intra-patient rigid registration to align each MR-CT pair. The method consists of an initial manual registration using characteristic points (Fiducial Registration Wizard, 3D Slicer), an automatic rigid registration step (General Registration (BRAINS), 3D Slicer), and a manual adjustment of the registration (Transforms, 3D Slicer). This is a crucial step that guarantees the correspondence between each anatomical point in both imaging modalities.
• Reslicing and cropping of all MR and CT images to a reference image (Resample Image (BRAINS), 3D Slicer) to ensure the same matrix size prior to training our network.
• MR histogram matching (MATLAB, MathWorks Inc., Natick, MA, USA) to normalize intensity values between images, especially for images acquired with different scanners.
• CT intensity normalization from −1024 to 3071 Hounsfield Units (HU) (MATLAB, MathWorks Inc.) to ensure a representation of 4096 gray levels, as defined by HU.
• MR-CT image information matching (MATLAB, MathWorks Inc.) to remove MR or CT information in areas not covered by the other modality, so that the same anatomical area is represented in both MR and CT.
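The intensity-related steps in the list above (MR histogram matching, CT clipping to the HU range, and MR-CT information matching) can be sketched in NumPy as follows. This is a minimal illustration, not the MATLAB implementation used in the study: the histogram matching shown is a classic quantile (CDF) mapping, and the boolean FOV masks and fill values used for information matching are assumptions, since the text does not state how uncovered regions are detected or filled.

```python
import numpy as np

HU_MIN, HU_MAX = -1024, 3071  # 4096 gray levels, as stated in the text


def match_histogram(source, reference):
    """Quantile (CDF) mapping of source intensities onto the reference
    histogram; a stand-in for the MATLAB histogram-matching step."""
    src_vals, src_idx, src_counts = np.unique(
        source.ravel(), return_inverse=True, return_counts=True)
    ref_vals, ref_counts = np.unique(reference.ravel(), return_counts=True)
    src_cdf = np.cumsum(src_counts) / source.size
    ref_cdf = np.cumsum(ref_counts) / reference.size
    # For each source quantile, look up the reference intensity at that quantile.
    mapped = np.interp(src_cdf, ref_cdf, ref_vals)
    return mapped[src_idx].reshape(source.shape)


def normalize_ct(ct):
    """Clip CT values to the [-1024, 3071] HU range."""
    return np.clip(ct, HU_MIN, HU_MAX)


def match_fov(mr, ct, mr_fov, ct_fov, mr_fill=0.0, ct_fill=HU_MIN):
    """Keep only voxels covered by both modalities.

    mr_fov / ct_fov are hypothetical boolean masks marking where each scan
    has data. Outside the common FOV, MR is set to mr_fill and CT to
    ct_fill (air); both fill values are assumptions.
    """
    common = np.asarray(mr_fov, dtype=bool) & np.asarray(ct_fov, dtype=bool)
    return np.where(common, mr, mr_fill), np.where(common, ct, ct_fill), common
```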

Pseudo-CT Synthesis
In this work, we propose a deep learning architecture based on the U-net, which has been used in various approaches before [18,21,22]. U-net architectures have been implemented in different ways, but they are always based on a progressive down-sampling of the feature maps followed by an up-sampling stage that generates the final output. Generally, during the down-sampling phase, several convolution filters and sub-sampling operations, such as max-pooling or strided convolutions, are employed. During the up-sampling phase, the feature maps are up-sampled using operations such as unpooling or transposed convolutions. Additionally, before each sub-sampling operation in the down-sampling phase, the feature maps are usually connected to their same-size counterparts in the up-sampling phase (skip connections) to improve the quality of the output.
Considering this, we have redesigned the architecture of the U-net to incorporate residual operations and convolutions with dilation. These operations have been successfully employed for image classification [24] and image segmentation [30]. Figure 4 depicts an overview of our proposed U-net architecture incorporating these strategies.
The residual operation consists of a shortcut that adds the feature maps generated by a previous layer to the output of a later layer. Figure 5 depicts the residual operation used inside each blue box of our implementation shown in Figure 4. The residual operation has proven very effective for increasing the depth of neural networks without degrading the results [24]. Therefore, we added a residual shortcut after every two convolutions during both the down-sampling and the up-sampling phases. Convolutions with dilation form the so-called "Atrous" convolution [30], which applies kernels not to directly adjacent neighbors in the feature maps but to values separated by a few positions. Figure 6 gives a visual explanation of the Atrous convolution. We started with a dilation of 1 (i.e., a regular convolution filter), increased the dilation parameter by 1 after every two convolutions (i.e., after each residual block), and reset it after every sub-sampling or up-sampling operation.
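To make the residual shortcut and the dilation schedule concrete, the following sketch implements them in one dimension with NumPy. It is a simplified illustration under stated assumptions: the actual network operates on 3D volumes, batch normalization is omitted, and the position of the ReLU relative to the addition is a guess, since the text does not specify it. The helper `receptive_field` shows why increasing the dilation enlarges the context seen by each block: a 3-tap kernel with dilation d spans 2d + 1 samples.

```python
import numpy as np


def atrous_conv1d(x, w, dilation):
    """'Same'-padded 1D atrous (dilated) correlation: kernel taps are
    `dilation` samples apart, so a 3-tap kernel spans 2*dilation + 1 samples."""
    k = len(w)
    span = (k - 1) * dilation          # extra samples covered by the kernel
    pad = span // 2
    xp = np.pad(x, (pad, span - pad))  # zero padding keeps the output length
    return np.array([sum(w[j] * xp[i + j * dilation] for j in range(k))
                     for i in range(len(x))])


def relu(x):
    return np.maximum(x, 0.0)


def residual_block(x, w1, w2, dilation):
    """Two atrous convolutions with the same dilation plus an identity
    shortcut. Batch normalization is omitted; the ReLU placement is assumed."""
    h = relu(atrous_conv1d(x, w1, dilation))
    h = atrous_conv1d(h, w2, dilation)
    return relu(h + x)


def dilation_schedule(n_blocks):
    """Dilation of each residual block within one resolution stage: start at 1
    and add 1 per block; calling it once per stage models the reset after
    every sub-sampling or up-sampling operation."""
    return [b + 1 for b in range(n_blocks)]


def receptive_field(dilations, k=3):
    """Receptive field of stride-1 convolutions, two per residual block;
    each convolution with dilation d adds (k - 1) * d samples."""
    rf = 1
    for d in dilations:
        rf += 2 * (k - 1) * d
    return rf
```

For example, three blocks with the schedule 1, 2, 3 reach a receptive field of 25 samples, versus 13 for three blocks of regular convolutions.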
As for our concrete implementation of the sub-sampling and up-sampling operations during the U-net phases, we used convolutions with stride 2 for sub-sampling and transposed convolutions for up-sampling. Depending on the stage of sub-sampling and up-sampling, a different number of residual blocks is performed: the number increases as the feature maps are down-sampled and decreases as they are up-sampled. Finally, after every convolution, batch normalization is performed and a ReLU activation function is applied to generate the feature maps.
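The stride-2 convolutions and transposed convolutions must produce matching sizes so that each down-sampled feature map can be connected to its same-size up-sampled counterpart. The sketch below applies the standard per-axis output-size formulas (as documented for PyTorch `Conv3d` and `ConvTranspose3d`) to a 32-voxel patch. The kernel size 3, padding 1, output padding 1, and the three resolution levels are assumptions for illustration; the text does not state the number of levels or the padding used.

```python
def conv_out(n, k=3, s=2, p=1, d=1):
    """Per-axis output length of a strided convolution."""
    return (n + 2 * p - d * (k - 1) - 1) // s + 1


def tconv_out(n, k=3, s=2, p=1, d=1, output_padding=1):
    """Per-axis output length of a transposed convolution."""
    return (n - 1) * s - 2 * p + d * (k - 1) + output_padding + 1


def unet_shapes(patch=32, levels=3):
    """Feature-map sizes along one axis on the way down and back up
    (the number of levels is a hypothetical choice)."""
    down = [patch]
    for _ in range(levels):
        down.append(conv_out(down[-1]))
    up = [down[-1]]
    for _ in range(levels):
        up.append(tconv_out(up[-1]))
    return down, up
```

With these settings, `unet_shapes()` gives 32, 16, 8, 4 on the way down and 4, 8, 16, 32 on the way up, so every up-sampled map has a same-size partner for its skip connection.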

Training and Reconstruction
As input for the network, we used 3D patches of shape 32 × 32 × 32 and performed 3D convolutions with 3 × 3 × 3 kernels. We employed the Mean Absolute Error as the loss function and Adam optimization with a learning rate of 10⁻⁴. The mini-batch size was set to 8 patches, and random rotations were applied to the patches during training for data augmentation. To train the network, we generated a database of 3D patches extracted from all the training volumes with stride 8. We trained the model until convergence, which happened after 25 epochs at an MAE of ~45; training took around 90 h on an Nvidia RTX 2080Ti. To reconstruct a whole pseudo-CT volume, we divided every input 3D volume into cubes of shape 32 × 32 × 32 using stride 16 in every direction and used them as input to the trained network. To compose the pseudo-CT, however, we used only the inner 16 × 16 × 16 cube of every synthesized 3D patch, to improve the quality of the reconstruction and avoid artifacts at the border of each patch. The whole reconstruction of a pseudo-CT volume takes around one minute on the Nvidia RTX 2080Ti.
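The patch-based training database and the inner-cube reconstruction described above can be sketched as follows. `predict` is a hypothetical stand-in for the trained network, and the zero padding of the volume (by 8 voxels, plus enough to reach a multiple of 16) is an assumption that lets every voxel, including those at the volume border, fall inside the inner 16 × 16 × 16 cube of some patch. Data augmentation is not shown.

```python
import numpy as np

PATCH, TRAIN_STRIDE, RECON_STRIDE = 32, 8, 16
MARGIN = (PATCH - RECON_STRIDE) // 2  # 8: keep only the inner 16^3 cube


def starts(size, patch, stride):
    """Patch start indices along one axis (assumes size >= patch)."""
    return range(0, size - patch + 1, stride)


def training_patches(mr, ct, stride=TRAIN_STRIDE):
    """Yield co-located (MR, CT) 32^3 patch pairs extracted with `stride`."""
    for x in starts(mr.shape[0], PATCH, stride):
        for y in starts(mr.shape[1], PATCH, stride):
            for z in starts(mr.shape[2], PATCH, stride):
                sl = (slice(x, x + PATCH), slice(y, y + PATCH), slice(z, z + PATCH))
                yield mr[sl], ct[sl]


def reconstruct(volume, predict):
    """Tile `volume` with 32^3 patches (stride 16) and keep only the inner
    16^3 cube of each prediction. `predict` is a hypothetical callable
    mapping a (32, 32, 32) array to a (32, 32, 32) array."""
    shape = volume.shape
    # Assumed padding: MARGIN on each side, plus enough to make the tiling exact.
    pad = [(MARGIN, MARGIN + (-s) % RECON_STRIDE) for s in shape]
    padded = np.pad(volume, pad)
    out = np.zeros(padded.shape, dtype=np.float32)
    inner = slice(MARGIN, PATCH - MARGIN)
    for x in starts(padded.shape[0], PATCH, RECON_STRIDE):
        for y in starts(padded.shape[1], PATCH, RECON_STRIDE):
            for z in starts(padded.shape[2], PATCH, RECON_STRIDE):
                pred = predict(padded[x:x + PATCH, y:y + PATCH, z:z + PATCH])
                out[x + MARGIN:x + PATCH - MARGIN,
                    y + MARGIN:y + PATCH - MARGIN,
                    z + MARGIN:z + PATCH - MARGIN] = pred[inner, inner, inner]
    # Crop the padding away to return a volume of the original shape.
    return out[MARGIN:MARGIN + shape[0],
               MARGIN:MARGIN + shape[1],
               MARGIN:MARGIN + shape[2]]
```

With stride 8, a 271 × 271 × 221 volume yields 30 × 30 × 24 = 21,600 patch positions (before any filtering), which shows how a few volumes already produce a large training database.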

Evaluation
The zero-normalized cross-correlation (ZNCC) similarity metric, Mean Absolute Error (MAE), Structural Similarity Index Measure (SSIM), and the Dice coefficient for the bone class were computed to quantitatively measure the quality of the synthesized pseudo-CT volumes compared with the ground-truth CT volumes, and thereby to check the overlap of tissues, following Equations (1)–(4), where x, y, and z are the three image dimensions; N is the total number of voxels; pCT(x,y,z) and CT(x,y,z) are the pseudo-CT and ground-truth CT voxel values at a given (x,y,z) position, respectively; µ_pCT and µ_CT are the mean HU values of the pseudo-CT and ground-truth CT images, respectively; σ_pCT and σ_CT are the standard deviations of the pseudo-CT and ground-truth CT images, respectively, and σ_pCT,CT is their joint standard deviation; and c₁ = (k₁L)² and c₂ = (k₂L)² are two variables that stabilize the division with a weak denominator, where L is the dynamic range of the pixel values (typically 2^(#bits per pixel) − 1), and k₁ = 0.01 and k₂ = 0.03 by default. ZNCC ranges over the interval [−1, 1] (1 for perfect direct correlation, −1 for perfect inverse correlation, and 0 for no correlation).
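A reconstruction of Equations (1)–(4) in their standard forms, using the symbols defined above, is given below. The numbering assumes the order in which the metrics are listed, and the authors' exact expressions may differ in detail (e.g., in how the standard deviations are normalized).

$$
\mathrm{ZNCC} = \frac{1}{N}\sum_{x,y,z}\frac{\big(pCT(x,y,z)-\mu_{pCT}\big)\big(CT(x,y,z)-\mu_{CT}\big)}{\sigma_{pCT}\,\sigma_{CT}} \qquad (1)
$$

$$
\mathrm{MAE} = \frac{1}{N}\sum_{x,y,z}\big|\,pCT(x,y,z)-CT(x,y,z)\,\big| \qquad (2)
$$

$$
\mathrm{SSIM} = \frac{\left(2\mu_{pCT}\,\mu_{CT}+c_1\right)\left(2\sigma_{pCT,CT}+c_2\right)}{\left(\mu_{pCT}^2+\mu_{CT}^2+c_1\right)\left(\sigma_{pCT}^2+\sigma_{CT}^2+c_2\right)} \qquad (3)
$$

$$
\mathrm{Dice} = \frac{2\,\left|M_{pCT}\cap M_{CT}\right|}{\left|M_{pCT}\right|+\left|M_{CT}\right|} \qquad (4)
$$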
where M_pCT and M_CT are the mask segmentations obtained by thresholding the Hounsfield Unit values for bone tissue in the pseudo-CT and ground-truth CT, respectively. The mean ± standard deviation (SD) and the 95% confidence interval (CI) were computed for ZNCC, MAE, SSIM, and Dice.
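The four metrics and the per-subject summary statistics can be computed with NumPy as in the following sketch. Two assumptions should be noted: `ssim_global` evaluates the SSIM expression once over the whole volume (library implementations usually average it over local windows, so values are not directly comparable), with L = 4095 taken from the 4096 gray levels of the normalized CT; and the bone threshold for the Dice masks is not given in the text, so it is left as a parameter.

```python
import numpy as np


def zncc(pct, ct):
    """Zero-normalized cross-correlation (population standard deviations)."""
    a, b = pct - pct.mean(), ct - ct.mean()
    return float((a * b).mean() / (pct.std() * ct.std()))


def mae(pct, ct):
    """Mean absolute error, in the units of the inputs (HU)."""
    return float(np.abs(pct - ct).mean())


def ssim_global(pct, ct, data_range=4095.0, k1=0.01, k2=0.03):
    """Single-window SSIM over the whole volume (an assumption; see text)."""
    c1, c2 = (k1 * data_range) ** 2, (k2 * data_range) ** 2
    mu_p, mu_c = pct.mean(), ct.mean()
    cov = ((pct - mu_p) * (ct - mu_c)).mean()  # joint term sigma_pCT,CT
    num = (2 * mu_p * mu_c + c1) * (2 * cov + c2)
    den = (mu_p ** 2 + mu_c ** 2 + c1) * (pct.var() + ct.var() + c2)
    return float(num / den)


def dice(pct, ct, bone_threshold_hu):
    """Dice overlap of bone masks obtained by HU thresholding.
    The threshold used in the study is not stated; the caller supplies it."""
    m_p, m_c = pct >= bone_threshold_hu, ct >= bone_threshold_hu
    denom = m_p.sum() + m_c.sum()
    return float(2 * np.logical_and(m_p, m_c).sum() / denom) if denom else 1.0


def mean_sd_ci(values, t_crit):
    """Mean, sample SD and a t-based CI over per-subject metric values.
    t_crit is supplied by the caller, e.g. scipy.stats.t.ppf(0.975, n - 1)."""
    v = np.asarray(values, dtype=float)
    m, sd = v.mean(), v.std(ddof=1)
    half = t_crit * sd / np.sqrt(v.size)
    return m, sd, (m - half, m + half)
```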
Voxel-by-voxel analyses were performed to determine differences between the synthesized pseudo-CT and the ground-truth CT. Voxel-by-voxel correlation plots, Bland–Altman plots, bias and variability, and Pearson correlation coefficients were calculated for comparisons. Statistical significance was considered when the p value was lower than 0.01.

Franken-CT Approach Results
Figure 9 shows the MR and CT images as well as the pseudo-CT obtained using our Franken-CT method, demonstrating a promising correlation between pseudo-CT and CT, considering the relatively small training dataset. Visual inspection of the results showed the high quality of the resulting pseudo-CT and the robustness of the Franken-CT method, which is able to capture the details of the bone contours and spikes in non-smooth areas such as the sinuses and the cervical vertebrae. The shape of the skull was estimated correctly despite synthesizing the resulting image from knowledge obtained from images acquired with a completely different FOV. Neck areas show limited detail compared to upper brain areas, probably due to the difference in the number of atlases covering those specific areas. Moreover, certain differences can be noticed in the nasal sinus cavity, mastoid cells, and air cavity delineation, possibly due to the complexity of these anatomies. Overall, the image texture is slightly smoother in the pseudo-CT than in the ground-truth CT.

The experimental ZNCC was 0.9220 ± 0.0255, 95% confidence interval (0.9010, 0.9430); the experimental Mean Absolute Error (MAE) was 73.9149 ± 9.2101 HU, 95% confidence interval (66.3383, 81.4915); the Structural Similarity Index Measure (SSIM) was 0.9943 ± 0.0009, 95% confidence interval (0.9935, 0.9951); and the experimental Dice coefficient for bone tissue was 0.7051 ± 0.1126, 95% confidence interval (0.6125, 0.7977). Moreover, the voxel-by-voxel correlation plot and the Bland–Altman plot between pseudo-CT and CT were computed for all test participants (Figure 10). The correlation plot showed an excellent correlation between pseudo-CT and ground-truth CT Hounsfield Units (m = 0.87; adjusted R² = 0.91; p < 0.001). The Bland–Altman plot showed that the average of the differences was low (−38.6471 ± 199.6100; 95% CI (−429.8827, 352.5884)); the difference between methods tended to decrease as the average increased, with the error accumulating in voxels around 0 HU.
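The voxel-by-voxel statistics can be sketched as follows: a least-squares line with R² and adjusted R² (one predictor), and the Bland–Altman bias with limits of agreement. Treating CT as the independent variable and defining differences as pCT − CT are assumptions about the conventions behind Figure 10. The reported Bland–Altman interval is consistent with limits of agreement computed as bias ± 1.96 SD: −38.6471 ± 1.96 × 199.6100 = (−429.8827, 352.5885), which matches the reported (−429.8827, 352.5884) up to rounding.

```python
import numpy as np


def voxel_regression(ct, pct):
    """Least-squares fit pCT = m * CT + b over all voxels; returns slope,
    intercept, R^2 and adjusted R^2 for a single predictor."""
    x = np.ravel(ct).astype(float)
    y = np.ravel(pct).astype(float)
    m, b = np.polyfit(x, y, 1)
    resid = y - (m * x + b)
    r2 = 1.0 - resid.var() / y.var()  # SS_res / SS_tot (residual mean is 0)
    n = x.size
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - 2)
    return m, b, r2, adj_r2


def bland_altman(pct, ct, z=1.96):
    """Bias (mean of pCT - CT), sample SD of the differences, and the
    limits of agreement bias +/- z * SD."""
    d = np.ravel(pct).astype(float) - np.ravel(ct).astype(float)
    bias, sd = d.mean(), d.std(ddof=1)
    return bias, sd, (bias - z * sd, bias + z * sd)
```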

Discussion
The generation of precise pseudo-CT images, and thus the production of accurate attenuation correction (AC) maps, is a basic step for PET/MRI quantification. Several approaches with exciting results have been proposed in the literature, but most of them focus on specific areas of the body or rely on specific acquisitions to test those methods. However, clinical practice shows a trend toward minimizing FOVs in CT acquisitions, which makes it difficult to create high-quality atlases for these applications. Additionally, there is a tendency to extend FOVs to increase the amount of multimodal information and to move toward whole-body applications in PET/MR.
In this work, we proposed the use of deep learning (DL) to acquire knowledge from diverse anatomical areas in an overlapping MR-CT atlas and to use that information to synthesize a pseudo-CT volume corresponding to a larger-FOV image. Thus, we demonstrated how using different images, including brain, paranasal sinuses, facial orbits, and neck studies, can lead to the successful generation of continuous head and neck pseudo-CTs. To reproduce these results on a new dataset with the Franken-CT approach, all the preprocessing steps described in the Materials and Methods section must be followed.
Our results are in line with those recently described by other authors using complete and/or dedicated atlases. The qualitative (Figure 9) and quantitative (Figure 10) image quality analyses showed that the CT and the pseudo-CT obtained with our Franken-CT method are very similar. On the one hand, visual inspection shows good correspondence between both images but limited detail in the neck region, nasal sinus cavity, mastoid cells, and air cavity delineation. This is possibly due to the higher complexity of those regions and the limited number of neck scans in the training atlas; increasing the number of such MR-CT pairs in the training dataset should improve the resulting pseudo-CTs, as previously demonstrated in state-of-the-art investigations on the effect of training dataset size [31]. However, despite these limitations, visual comparison of our synthesized pseudo-CT images with those reported in previous works shows that our method provides better images than most classical and recently proposed deep learning methods in the literature [14,32,33]. On the other hand, the high ZNCC values indicate that our method can accurately approximate the patient-specific CT volume, despite using an atlas composed of diverse anatomical overlapping MR-CT scans. Previously described patch-based pseudo-CT synthesis methods reported an experimental ZNCC of 0.9349 ± 0.0049 for a whole head and neck atlas including 18 MR-CT datasets [11,29], which is very similar to the experimental ZNCC of 0.9220 ± 0.0255 achieved in this work. Likewise, other state-of-the-art patch-based methods report an average ZNCC of 0.91 ± 0.03 and a mean MAE of 125.46 ± 24.45 HU [34], compared with an average ZNCC of 0.9220 ± 0.0255 and an experimental MAE of 73.9149 ± 9.2101 HU in this work.
Works based on CNNs report an average SSIM of 0.92 ± 0.02 and a mean MAE of 75.7 ± 14.6 [35], compared with an average SSIM of 0.9943 ± 0.0009 and a mean MAE of 73.9149 ± 9.2101 HU achieved by our Franken-CT approach. In the evaluation of bone tissue overlap, other authors reported a Dice coefficient of 0.73 ± 0.08 [36], which is in line with the average Dice coefficient of 0.7051 ± 0.1126 for bone overlap reported in this work; this result is promising but probably affected by the neck region results discussed previously. These facts suggest that (i) the use of an atlas composed of diverse anatomical overlapping MR-CT scans can produce results similar to those obtained with complete datasets and (ii) deep learning methods enable extracting more features and information than classical methods. Recent works on deep learning pseudo-CT synthesis reported a Pearson correlation of up to 0.943 ± 0.009 using an architecture similar to the one presented in this work [37]; again, this demonstrates (i) the great potential of deep learning methods to extract features and information and (ii) the increased performance of architectures including residual operations, which avoid degradation of the generated feature maps. Additionally, the correlation and Bland–Altman plots (Figure 10) suggest that this method leads to very accurate results, in line with those previously reported in the literature, despite minor errors mainly accumulated in voxels between 25 and 125 HU (banding artifacts in the correlation and Bland–Altman plots). We further investigated this issue and found that most of these mislabeled voxels are located at image boundaries/edges and ears (due to slight differences between MR and CT), in air-filled cavities, and around dental implants, as can be seen in Figure 9.
Despite these weaknesses, because the synthesized image is intended only for PET attenuation correction and not for diagnostic reading by radiologists, the level of detail of the pseudo-CT does not need to match that of a real acquired CT. In this specific context, a minor loss of spatial resolution is acceptable, as the intrinsic spatial resolution of PET is lower than that of CT or MR.
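To make the resolution argument concrete, the following sketch blurs a volume (e.g., a pseudo-CT or a µ-map derived from it) with a separable Gaussian approximating a PET point-spread function. It is an illustration only and not part of the described pipeline: the 5 mm FWHM default is an assumed, scanner-dependent value, and the function names are hypothetical. Structures much smaller than the FWHM are largely averaged out after such blurring, which illustrates why small pseudo-CT inaccuracies matter less in this context.

```python
from typing import Sequence

import numpy as np

# Gaussian FWHM to standard deviation: sigma = FWHM / (2 * sqrt(2 * ln 2)), about 0.4247 * FWHM
FWHM_TO_SIGMA = 1.0 / (2.0 * np.sqrt(2.0 * np.log(2.0)))


def gaussian_kernel_1d(sigma_vox: float) -> np.ndarray:
    """Normalized 1D Gaussian kernel truncated at +/- 3 sigma."""
    radius = max(1, int(np.ceil(3.0 * sigma_vox)))
    x = np.arange(-radius, radius + 1, dtype=np.float64)
    kernel = np.exp(-0.5 * (x / sigma_vox) ** 2)
    return kernel / kernel.sum()


def _smooth_line(line: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Convolve one 1D profile, replicating edge values so borders are not darkened."""
    radius = (kernel.size - 1) // 2
    padded = np.pad(line, radius, mode="edge")
    return np.convolve(padded, kernel, mode="valid")  # output length equals len(line)


def smooth_to_pet_resolution(volume: np.ndarray, voxel_size_mm: Sequence[float],
                             pet_fwhm_mm: float = 5.0) -> np.ndarray:
    """Separable Gaussian blur emulating a PET point-spread function (FWHM is an assumed value)."""
    out = volume.astype(np.float64)
    for axis, size_mm in enumerate(voxel_size_mm):
        sigma_vox = pet_fwhm_mm * FWHM_TO_SIGMA / size_mm  # convert mm to voxels along this axis
        kernel = gaussian_kernel_1d(sigma_vox)
        out = np.apply_along_axis(_smooth_line, axis, out, kernel)
    return out
```

Applying `smooth_to_pet_resolution` to both the CT and the pseudo-CT before recomputing the MAE would give one rough indication of how much of the voxel-level error persists at PET resolution.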
The approach presented in this work could be of great value for tasks in which skull estimation and/or pseudo-CT computation is needed, such as PET/MR attenuation correction and radiotherapy planning, where paired MR-CT datasets are usually limited; because reduced-FOV scans need not be discarded, it enables larger, more flexible, and more generalizable atlases.
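The core idea of keeping reduced-FOV pairs instead of discarding them can be sketched as follows. This is a hypothetical illustration, not the authors' data pipeline: it assumes that each subject's MR and CT have already been co-registered and resampled to a common grid, that training uses axial slices, and that a per-slice flag (`ct_valid`) marks where the CT FOV exists; `PairedScan`, `collect_training_slices`, and `slice_coverage` are illustrative names.

```python
from dataclasses import dataclass
from typing import List, Tuple

import numpy as np


@dataclass
class PairedScan:
    """Hypothetical container for one subject on a common grid (axial slices on axis 0)."""
    mr: np.ndarray        # (Z, Y, X) MR volume covering the full head and neck
    ct: np.ndarray        # (Z, Y, X) CT volume; values outside the CT FOV are placeholders
    ct_valid: np.ndarray  # (Z,) boolean, True where the CT actually covers the slice


def collect_training_slices(scans: List[PairedScan]) -> Tuple[np.ndarray, np.ndarray]:
    """Keep every MR/CT slice pair where CT data exist, including reduced-FOV subjects."""
    mr = np.concatenate([s.mr[np.flatnonzero(s.ct_valid)] for s in scans], axis=0)
    ct = np.concatenate([s.ct[np.flatnonzero(s.ct_valid)] for s in scans], axis=0)
    return mr, ct


def slice_coverage(scans: List[PairedScan]) -> np.ndarray:
    """Number of subjects contributing CT data at each axial level (an effective N per level)."""
    return np.stack([s.ct_valid.astype(np.int64) for s in scans]).sum(axis=0)


# Toy example: one full-FOV subject plus two subjects whose CT covers only half of the volume
Z, Y, X = 10, 4, 4
full = np.ones(Z, dtype=bool)
upper = np.zeros(Z, dtype=bool)
upper[5:] = True
lower = ~upper
scans = [PairedScan(np.zeros((Z, Y, X)), np.zeros((Z, Y, X)), m) for m in (full, upper, lower)]
mr_slices, ct_slices = collect_training_slices(scans)
print(mr_slices.shape)        # (20, 4, 4)
print(slice_coverage(scans))  # [2 2 2 2 2 2 2 2 2 2]
```

In this toy example the two partial scans contribute ten extra slice pairs that a complete-atlas approach would discard; axial levels with low coverage are where a model would see the fewest examples.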
This method could be adapted to real clinical scenarios: although training these algorithms is time-consuming, applying them for synthesis is very fast, producing a complete pseudo-CT volume in a time similar to that needed to acquire an actual CT scan. Additionally, our technique could, in theory, be applied to other regions of the body, potentially enabling whole-body pseudo-CT synthesis with atlases designed under the same hypothesis. Further research will be directed toward this goal.
Our study has several limitations. First, our training set was relatively small, which has a larger impact with the described Franken-CT approach than with traditional methods. This is because only a subset of subjects includes information for each specific anatomical area, which decreases the effective N of the atlas. Additionally, our atlas was biased toward older subjects. However, our approach shows that retaining reduced-FOV images, rather than discarding them, still allows accurate results; therefore, using as many datasets as are available to build larger databases for DL training should allow more generalizable CNNs to be trained than those reported previously. The current model could be further improved by increasing both the number and the heterogeneity of the training volumes. Furthermore, in a real clinical scenario, the atlas should be designed to include the anatomical heterogeneity necessary to map potential conditions related to anatomical and pathological variability among patients (e.g., patients scanned with contrast agents, patients with lesions such as tumors or sclerotic lesions, or patients with implants). Again, this limitation implies the need for larger datasets to generalize our results; nevertheless, our method demonstrated its potential to use reduced-FOV datasets and, consequently, to facilitate the generation of such databases. Thus, the design of specific atlases and models that account for these conditions should be considered a future line of research. Finally, comparing PET images attenuation-corrected with pseudo-CT-derived and CT-derived µ-maps would help to assess the potential use of our pseudo-CTs and better illustrate the clinical utility of our method; this should be addressed in future work.
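As a first step toward that comparison, both the CT and the pseudo-CT would need to be converted into 511 keV µ-maps. A common choice is a bilinear HU-to-µ mapping, sketched below under stated assumptions: the water coefficient (about 0.096 cm⁻¹ at 511 keV) is a standard approximation, but the bone reference point (`HU_BONE_REF`, `MU_BONE_REF`) is an illustrative, hypothetical calibration, and this conversion is not necessarily the one used by any particular PET/MR system.

```python
import numpy as np

# Approximate linear attenuation coefficient of water at 511 keV (cm^-1).
MU_WATER_511 = 0.096
# Illustrative bone reference point (assumed values); real bilinear models calibrate
# this segment to the CT tube voltage.
HU_BONE_REF = 1000.0
MU_BONE_REF = 0.172  # cm^-1


def hu_to_mu_511(hu) -> np.ndarray:
    """Bilinear HU-to-mu conversion at 511 keV with illustrative parameters."""
    hu = np.clip(np.asarray(hu, dtype=np.float64), -1000.0, None)  # nothing below air
    # Air-water segment: 0 cm^-1 at -1000 HU rising linearly to mu_water at 0 HU.
    mu_low = MU_WATER_511 * (1.0 + hu / 1000.0)
    # Water-bone segment: mu_water at 0 HU rising linearly to the bone reference point.
    mu_high = MU_WATER_511 + hu * (MU_BONE_REF - MU_WATER_511) / HU_BONE_REF
    return np.where(hu <= 0.0, mu_low, mu_high)
```

Applying the same function to the CT and to the pseudo-CT yields two µ-maps whose voxel-wise differences indicate where attenuation-correction discrepancies might be expected, before any PET reconstruction is performed.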

Conclusions
In this work, we showed that extended head and neck pseudo-CTs can be synthesized using an atlas composed of diverse anatomical overlapping MR-CT scans and deep learning methods. We also showed that the proposed method introduces only minimal bias compared with typical pseudo-CT synthesis approaches described in the literature. This work serves as a proof of concept to demonstrate the great potential of deep learning methods for modality propagation as well as the feasibility of these methods using real clinical datasets.
Institutional Review Board Statement: Ethical review and approval were waived for this study, due to the retrospective design of the study and the fact that all data used were from existing and anonymized clinical datasets.
Informed Consent Statement: Patient consent was waived due to the retrospective nature of this study.

Acknowledgments:
The authors thank Ursula Alcañas Martinez (Hospital Universitario HM Puerta del Sur, HM Hospitales) for helping with data management.

Conflicts of Interest:
The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.