Phantom Study on the Robustness of MR Radiomics Features: Comparing the Applicability of 3D Printed and Biological Phantoms

The objectives of our study were to (a) evaluate the feasibility of using 3D printed phantoms in magnetic resonance imaging (MR) in assessing the robustness and repeatability of radiomic parameters and (b) to compare the results obtained from the 3D printed phantoms to metrics obtained in biological phantoms. To this end, three different 3D phantoms were printed: a Hilbert cube (5 × 5 × 5 cm3) and two cubic quick response (QR) code phantoms (a large phantom (large QR) (5 × 5 × 4 cm3) and a small phantom (small QR) (4 × 4 × 3 cm3)). All 3D printed and biological phantoms (kiwis, tomatoes, and onions) were scanned thrice on clinical 1.5 T and 3 T MR with 1 mm and 2 mm isotropic resolution. Subsequent analyses included analyses of several radiomics indices (RI), their repeatability and reliability were calculated using the coefficient of variation (CV), the relative percentage difference (RPD), and the interclass coefficient (ICC) parameters. Additionally, the readability of QR codes obtained from the MR images was examined with several mobile phones and algorithms. The best repeatability (CV ≤ 10%) is reported for the acquisition protocols with the highest spatial resolution. In general, the repeatability and reliability of RI were better in data obtained at 1.5 T (CV = 1.9) than at 3 T (CV = 2.11). Furthermore, we report good agreements between results obtained for the 3D phantoms and biological phantoms. Finally, analyses of the read-out rate of the QR code revealed better texture analyses for images with a spatial resolution of 1 mm than 2 mm. In conclusion, 3D printing techniques offer a unique solution to create textures for analyzing the reliability of radiomic data from MR scans.


Introduction
Conventional analyses of cancerous lesions rely on invasive histological samples, which have several limitations, including discomfort for the patient and potential problems extracting the whole lesion for total-lesion heterogeneity analysis. Therefore, alternatives to the invasive methods are desired. One possible solution is the non-invasive, in-vivo radiomic assessment of images acquired in radiological and nuclear medicine settings [1][2][3][4][5][6]. The major strength in using radiomics as an alternative to histological sampling is the multitude of imaging series, modalities, and follow-up acquisitions employed in the clinical assessment of patients. Radiomic assessments of the various imaging modalities may give more personalized treatment regimes without requiring additional testing of the patients.
However, several challenges have been identified in the clinical translation of radiomics which may affect their applicability in the clinical routine. First, the spatial resolution of the diagnostic imaging modalities (spatial resolutions of 0.5-5 mm) are orders of magnitude worse than the histological samples (spatial resolution of 10 −4 -10 −3 mm). Second, the choice of radiomic features may be affected by the diagnostic image modality and their modality-specific reconstruction settings [7][8][9][10][11]. Third, image discretization and normalization procedures may affect the radiomic analyses. While these problems have been identified, previous studies have reported great potential in the radiomic assessments for both CT and nuclear medicine imaging settings, suggesting that the discrepancy in spatial resolution nor the choice of imaging modality does not hinder radiomic assessments.
In terms of discretization of the images, two methods exist: the fixed bin size (FBS) and fixed bin number (FBN) [12]. Previous studies have identified that FBS may be the method of choice for quantitative imaging modalities (PET and CT); however, its feasibility on non-quantitative images (MRI) has not yet been evaluated. Therefore, it is of interest to assess how different discretization processes and radiomic features may be employed in MR imaging. Recently, the Image Biomarker Standardization Initiative (IBSI) has sought to standardize the radiomic feature extraction process, resulting in nearly 1000 different radiomics parameters [12]. One way to assess the influence of the discretization processes may include using physical phantoms, such as fruits and vegetables. However, identifying the most suitable phantom in MR imaging is not trivial, as the choice may affect both repeatability and reliability of the radiomics indexes (RIs) during analyses [9,10,[13][14][15][16][17][18][19][20][21][22][23][24][25][26][27]. Previous studies have mainly relied on biological phantoms, which decay over time, thus limiting their applicability in longitudinal and multi-center studies [14][15][16]. Recent developments in 3D printing have facilitated the possibility of printing image modality-specific phantoms at a low price and fast production time, which may serve as an alternative to biological phantoms [27][28][29][30][31]. The 3D-printing technique permits manufacturing complicated phantoms, which may arise from mathematical objects or real patient data. These phantoms are strict in shape and consistency and give the possibility to ignore or emphasize even a small number of details. However, dedicated 3D designed and printed MR radiomics phantoms have not yet appeared in the MR imaging literature.
This study aimed to compare radiomics obtained for biological and 3D printed phantoms. To this end, we developed fillable Quick Response (QR) code-and Hilbert curvebased 3D printed models to introduce new MR radiomic phantoms. QR codes and Hilbert curves are well structured and precisely defined mathematical objects, ensuring decent reproducibility in manufacturing [32,33]. QR codes may provide an additional analytical option through the potential of the read-out success rate of the information stored on a deeper level in MR images [34]. The 3D-printed phantoms were compared to biological phantoms that have previously been found useful in the literature (kiwis, onions, and tomatoes) [35]. QR codes and Hilbert curves are well structured and precisely defined mathematical objects, ensuring decent reproducibility in manufacturing [32]. QR codes may provide an additional analytical option through the potential of the read-out success rate of the information stored on a deeper level in MR images [34].

Biological Phantoms
Given the high water content, vegetables and fruits reflect different signal intensities, shapes, and "tissue" textures in MR images, making them ideal as biological phantoms [13,15,25,36]. For this study, the biological phantoms were selected using the following criteria: the phantom must be water-rich, have an appropriate size (below 5 × 5 × 5 cm 3 ), have certain degrees of hardness and heterogeneity, and have stable structural characteristics. Following the selection criteria, three fruits and vegetable types were considered: kiwis (n = 4), tomatoes (n = 3), and onions (n = 3). Representative MR images Diagnostics 2022, 12, 2196 3 of 24 of one of the selected kiwis are shown in Figure 1. For every acquisition, all the fruits and vegetables were placed in two separate holders to fix and standardize the place of the phantoms during the acquisition steps. Supplementary Photo S1 is shown the placement of biological phantoms. Further, the fixation ensured a harmonic orientation of the phantoms across different acquisitions. The first holder contained four kiwis, while the second contained the tomatoes and onions. A pre-selected kiwi was rotated perpendicularly to its primary axis's (labeled by kiwi rot ) between its three repetitions to examine the possible influence of the phantom orientation on the computed RI [36][37][38][39]. sities, shapes, and "tissue" textures in MR images, making them ideal as biological phantoms [20,22,23,36]. For this study, the biological phantoms were selected using the following criteria: the phantom must be water-rich, have an appropriate size (below 5 × 5 × 5 cm 3 ), have certain degrees of hardness and heterogeneity, and have stable structural characteristics. Following the selection criteria, three fruits and vegetable types were considered: kiwis (n = 4), tomatoes (n = 3), and onions (n = 3). Representative MR images of one of the selected kiwis are shown in Figure 1. For every acquisition, all the fruits and vegetables were placed in two separate holders to fix and standardize the place of the phantoms during the acquisition steps. Further, the fixation ensured a harmonic orientation of the phantoms across different acquisitions. The first holder contained four kiwis, while the second contained the tomatoes and onions. A pre-selected kiwi was rotated perpendicularly to its primary axis's (labeled by kiwirot) between its three repetitions to examine the possible influence of the phantom orientation on the computed RI [36][37][38][39].

3D Printed Phantoms
Two types of distinct, hollow plastic objects were planned and constructed. First, a Quick Response (QR) code was modeled within a fillable container, including the "UNIDEB MRI Texture Analysis Phantom" text information. We used a web application for QR code production (https://www.qrcode-monkey.com/#text (accessed on 6 May 2021)). The QR code was printed into two 3D models with different sizes (large QR (5 × 5 × 4 cm 3 ) and small QR (4 × 4 × 3 cm 3 ), with respective heights of the QR code of 3 and 2 cm)) using the Trimble SketchUp Pro 2020 (Trimble Inc., Sunnyvale, CA,

3D Printed Phantoms
Two types of distinct, hollow plastic objects were planned and constructed. First, a Quick Response (QR) code was modeled within a fillable container, including the "UNIDEB MRI Texture Analysis Phantom" text information. We used a web application for QR code production (https://www.qrcode-monkey.com/#text (accessed on 6 May 2021)). The QR code was printed into two 3D models with different sizes (large QR (5 × 5 × 4 cm 3 ) and small QR (4 × 4 × 3 cm 3 ), with respective heights of the QR code of 3 and 2 cm) using the Trimble SketchUp Pro 2020 (Trimble Inc., Sunnyvale, CA, USA). Both cubic QR code containers were constructed and saved in Standard Tessellation Language (STL) file format ( Figure 2, first row) [33,34,40,41].
As the second type of printed phantom, we chose a 3D Hilbert cube with a volume of 5 × 5 × 5 cm 3 [42]. The recursive process fills the entire space with one continuous line. We used an STL file of a pre-created Hilbert cube from the Thingiverse website to embed it to the mentioned volume (https://www.thingiverse.com/thing:1762713 (accessed on 10 June 2021)). From the downloaded STL files, the second recursion level was chosen due to its sufficient complexity and ease of redesign. We redesigned the original model containing several small vertical and horizontal support lines to achieve a more robust form to ensure successful printing and make an extra chessboard-like pattern in three dimensions. This model has an outside pattern with a plastic-air alternating rectangle and includes the so-called Hilbert square pipe ( Figure 2, second row).
All three phantoms were 3D printed from Polylactic Acid (3DJake ecoPLA, White) filament on a Creality Ender 3, Fused Deposition Modeling (FDM) type 3D printer with a 0.4 mm nozzle diameter [27,30]. The print plan was achieved using Repetier-Host (Hot-World GmbH and Co. KG, Willich, Germany) software with the following settings: 100% infill; 0.2 mm layer height; no support; brim adhesion; 40 mm/s print speed; 210 • C hot end, and 60 • C print bed temperatures. The print time was around 4 h for each object [30,43]. Each 3D printed phantom was filled with 0.01 mM NiCL 2 solution. Representative MR images of the QR and the Hilbert cube phantoms are presented in Figure 3.

MR Scanning
All phantoms were scanned at the MR imaging facility of the Clinical Center, University of Debrecen. The phantoms were scanned in two MR systems, a 3 T Philips Achieva and a 1.5 T Siemens Magnetom Essenza MRI scanner. The scans were performed with a 6-channel head coil in the 1.5 T MR and an 8-channel head or a 32channel neurovascular coil in the 3 T MRI system. In both devices, isotropic 3D T2weighted and 3D T1-weighted sequences were obtained using protocols employed in the clinical routine, with minor adjustments to the number of repetitions to ensure sufficient image quality in the smaller volumes. In the Philips MR system, the T1-and T2-weighted measurements were obtained using a 3D BrainVIEW protocol which provides high-resolution isotropic data. For the Siemens system, the SPACE (Sampling Perfection with Application optimized Contrasts using different flip angle Evolution) sequence was applied for T2 acquisitions, which provides isotropic 3D images. For T1-weighted Gradient-Echo (GRE) measurements, the sequence MPRAGE (Magnetization-Prepared Rapid GRE) was utilized [47][48][49].
The acquisition parameters for the acquisition protocols are listed in Table 1. Each imaging protocol was performed using two different isotropic voxel resolutions (1 × 1 × 1 mm 3 and 2 × 2 × 2 mm 3 ). The phantoms were scanned thrice for each setting in both MR systems, denoted as repeated scans or repetitions below. Specific to the Philips MR system, a new table position had to be chosen before each repetition. After completing imaging protocol, the acquisition software automatically moved the table from the gantry. However, the phantoms were always placed in the isocenter, the most homogeneous magnetic field region. Representative coronal MR images of 3D printed phantoms. The left is the large QR cube, and the right is the Hilbert cube. The interested readers can read the presented QR code MR image using their smartphone's QR code reader.

MR Scanning
All phantoms were scanned at the MR imaging facility of the Clinical Center, University of Debrecen. The phantoms were scanned in two MR systems, a 3 T Philips Achieva and a 1.5 T Siemens Magnetom Essenza MRI scanner. The scans were performed with a 6-channel head coil in the 1.5 T MR and an 8-channel head or a 32-channel neurovascular coil in the 3 T MRI system. In both devices, isotropic 3D T2-weighted and 3D T1-weighted sequences were obtained using protocols employed in the clinical routine, with minor adjustments to the number of repetitions to ensure sufficient image quality in the smaller volumes. In the Philips MR system, the T1-and T2-weighted measurements were obtained using a 3D BrainVIEW protocol which provides high-resolution isotropic data. For the Siemens system, the SPACE (Sampling Perfection with Application optimized Contrasts using different flip angle Evolution) sequence was applied for T2 acquisitions, which provides isotropic 3D images. For T1-weighted Gradient-Echo (GRE) measurements, the sequence MPRAGE (Magnetization-Prepared Rapid GRE) was utilized [44][45][46].
The acquisition parameters for the acquisition protocols are listed in Table 1. Each imaging protocol was performed using two different isotropic voxel resolutions (1 × 1 × 1 mm 3 and 2 × 2 × 2 mm 3 ). The phantoms were scanned thrice for each setting in both MR systems, denoted as repeated scans or repetitions below. Specific to the Philips MR system, a new table position had to be chosen before each repetition. After completing imaging protocol, the acquisition software automatically moved the table from the gantry. However, the phantoms were always placed in the isocenter, the most homogeneous magnetic field region.

Image Visualization and Segmentation
A radiologist and a radiologist MR specialist with 30 and 10 years of MRI experience, respectively, analyzed the image data qualitatively in 3D Slicer software [47,48] to determine how close the imaged texture of each object was to the actual pattern and heterogeneity.
Image alignment was performed semiautomatically using the 3D Slicer open-source software (version 4.10.2 r28257) [49]. A cubic volume of interest (VOI) was placed separately on each 3D printed phantom using the CreateModels tool from the SlicerIGT and the Segmentations modules. VOI sizes were 50 × 50 × 50, 55 × 55 × 55, and 45 × 45 × 45 mm 3 for the Hilbert cube, the large QR, and the small QR phantoms, respectively. The "Grow from seeds" algorithm was applied for biological phantoms to define VOIs that blend into the surface of fruits and vegetables as much as possible. The semiautomatically generated VOIs were manually corrected by excluding the border zone between fruit/vegetable and surrounding air and the most apical and basal slice of each fruit/vegetable using a brusherase tool. Corresponding label maps from segmentations of all images were exported and saved to Neuroimaging Informatics Technology Initiative (NIfTI) file format for further processing and calculation [50,51].

Normalization and Discretization
In this study, we used a so-called µ ± 3σ normalization technique with µ being the mean and σ the standard deviation of the image [52][53][54][55][56]. Two different discretization techniques were considered; the FBS and the FBN techniques. The FBS method is defined as where I(i) and I FBS (i) are the original and the transformed intensity level of the ith voxel [53,54,57,58]. The [] brackets stand for the ceil operation. We choose B = 0.15 for normalized images and B = 50 for non-normalized ones to have a similar number of bins in both cases [7,12,59].
The FBN method is calculated by where I FBN (i) is the new intensity value of the i-th voxel intensity after the FBN discretization, I max is the maximum, I min is the minimum original voxel intensity of the particular lesion, and D is the number of bin parameters. We set D to 64.

Texture Calculation
RIs were extracted using the GLCM (Gray Level Co-Occurrence Matrix), GLSZM (Gray Level Size Zone Matrix), and GLRLM (Gray Level Run Length Matrix)-based algorithms implemented in MATLAB (version 2020) [7]. A total of 40 radiomic features were extracted from each VOI, divided into 18 GLCM, 11 GLSZM, and 11 GLRLM metrics. All 40 RIs (Supplementary Table S1) were calculated according to the IBSI guideline [53,59]. In addition, five basic histogram-based statistical parameters were also determined as the minimum and maximum value, mean, median, and VOI volume in voxel numbers. All 45 features were determined for the segmented volumes for each acquisition and discretization setup ( Table 2).

Coefficient of Variation
For each RI and each acquisition, we report the mean, standard deviation, and coefficient of variation (CV), defined as where std(RI) and mean(RI) represent the standard deviation and mean for the three computed radiomics indices. We utilized the CV parameter as a measure of the repeatability of a group or object [9,18,60,61].

Relative Parameter Differences
The relative parameter difference (RPD) value was employed to measure the relative RI difference of a given phantom between two different measurement setups. The relative parameter difference was defined as: With mean(RI setup1 ) and mean(RI setup2 ) being the mean of the three RI at two different acquisition setups (for the definitions of acquisition setups, see Table 2). Table 3 shows all the acquisition setup pairs and the related abbreviation used in the comparative analysis for each phantom type for T1 contrast. For the T2 contrast, the number of comparisons was the same, giving a total of 14 comparisons for both contrasts. Table 3. List of abbreviations for calculating the RPD at T1 and T2 contrast. The first column contains the abbreviations, and the second and third columns show the serial number of the acquisition setup defined in Table 2.

Interclass Correlation Coefficient
For the repeatability test, the interclass correlation coefficient (ICC) for two-way mixed effects [62] was calculated for each RI based on absolute agreement, single rater/measurement model. The ICC was determined by matching results (averages of the repeated measures) from two different measurement setups and using every phantom with the following formula: where MS R stands for mean square for rows, MS E is the mean square of error, MS C the mean square of columns, n, and k are the numbers of subjects and the number of raters/measurements. Based on the ICC-values the RI were considered to be either excellent (ICC > 0.9), good (0.75 < ICC ≤ 0.9), moderate (0.5 < ICC ≤ 0.75) and poor (ICC ≤ 0.5) repeatability [21,[63][64][65][66]. Every calculation was performed using MATLAB (2020, The MathWorks Inc., Natick, MA, USA) and Microsoft Office Excel software. Figure 4A shows representative images of the large QR and the Hilbert phantoms at T1 weighted contrast and different acquisition resolutions (1 mm on the left and 2 mm on the right). We report a robust read-out of the Hilbert cube disregarding the image resolutions. In contrast, significant changes in the high-frequency patterns were reported for the QR cubes when the image resolution was changed ( Figure 4C). After visual inspection of the high-resolution and low-resolution images, a radiologist and a radiographer MR specialist reported that the textures observed in the biological phantoms were comparable to the real heterogeneity. However, the finer structures (high-frequency features) were blurred ( Figure 4B,D).

Coefficient of Variation
For repeatability purposes, CVs were determined using normalized and nonnormalized data from each object with each MR acquisition setup ( Supplementary  Figures S1 and S2). Improved repeatability (lower CV) measures were observed for all phantoms following the normalization of the data (Tables 4 and 5).

Coefficient of Variation
For repeatability purposes, CVs were determined using normalized and non-normalized data from each object with each MR acquisition setup ( Supplementary Figures S1 and S2). Improved repeatability (lower CV) measures were observed for all phantoms following the normalization of the data (Tables 4 and 5). Below, we present only results from the normalized data. Given the poor repeatability (CV > 10%), a subset of RIs were excluded from the subsequent analyses (Jmax, Energy, ClusterShade, HGRE, SRHGE, LRHGE, LZE, LZLGE, LZHGE).
The CVs for the remaining 31 RI were obtained using the different weighting (T1 and T2), and their repeatability coefficients are shown in Figures 5-7 and Supplementary Figure S3. Table 6 comprises the corresponding CV average values for different property groups (Acquisition setup, Texture parameter, and Discretization method). In general, it may be observed that most texture features' repeatability was better at 1.5 T field strengths than at 3 T (CV average is 1.94 at 1.5 T and 2.11 at 3 T) ( Figures 5-7 and Table 4). In addition, no major differences were observed between the FBN and FBS discretization (CV average (FBN) = 2.04 and CV average (FBS) = 2.06).
Each object's CV result from different points of view is presented in Table 6. The choice of discretization did not affect the repeatability measures (Table 6). In the texture parameters group, increased histogram-based CV values can be observed at the kiwis compared to the other objects' same calculations. This phenomenon was not identified for the kiwi rot (kiwi rotated through its primary axis between measurements). Columns of the acquisition setup group reveal that CV values are usually smaller at 1.5 T field strength and 1 mm resolution. Further, when excluding the kiwis, no difference in the repeatability was observed between the biological and 3D phantoms. Detailed CV data for all objects and acquisition setups are presented in Figures 5-7 heatmaps.     Figure 7 shows the sensitivity of the radiomics when the scanned object rotates in the MR system. The CV calculated from the three orthogonal orientations (kiwirot) is larger than the CVs obtained from three repetitions of a given kiwi in the conventional acquisition orientation.

Relative Difference
The relative parameter difference (RPD) values are shown in the Supplementary Figure 6. The biggest relative differences are observed when comparing results obtained at different MR field strengths.    Figure 7 shows the sensitivity of the radiomics when the scanned object rotates in the MR system. The CV calculated from the three orthogonal orientations (kiwi rot ) is larger than the CVs obtained from three repetitions of a given kiwi in the conventional acquisition orientation.

Relative Difference
The relative parameter difference (RPD) values are shown in the Supplementary Figure S6. The biggest relative differences are observed when comparing results obtained at different MR field strengths.

QR Code Readability Test
QR code readability outcomes show apparent dependency on the acquisition res olution with a standalone Python code and mobile phones. None of the applied read out methods could detect the information from the 2 × 2 × 2 mm 3 resolution scans, and only some of the 1 × 1 × 1 mm 3 resolution images could be successfully read. Read-ou ratios originating from the Python code-based process are presented in Figure 9. Ra tios are calculated from the three repeated measurements as the average read-out ra tio for each type of measurement.

QR Code Readability Test
QR code readability outcomes show apparent dependency on the acquisition resolution with a standalone Python code and mobile phones. None of the applied read-out methods could detect the information from the 2 × 2 × 2 mm 3 resolution scans, and only some of the 1 × 1 × 1 mm 3 resolution images could be successfully read. Read-out ratios originating from the Python code-based process are presented in Figure 9. Ratios are calculated from the three repeated measurements as the average read-out ratio for each type of measurement.
Mobile phone read-out numbers are shown in Table 7. The related score distribution is similar to the result shown in Figure 9. Many of the T1 images of the QR cubes were readable by the phones.
Representative QR code images are shown in Figure 10 at T1 and T2 weighting and 3T field strength.
olution with a standalone Python code and mobile phones. None of the applied readout methods could detect the information from the 2 × 2 × 2 mm 3 resolution scans, and only some of the 1 × 1 × 1 mm 3 resolution images could be successfully read. Read-out ratios originating from the Python code-based process are presented in Figure 9. Ratios are calculated from the three repeated measurements as the average read-out ratio for each type of measurement. Figure 9. Averaged read-out ratios of the three repeated measurements in the case of the Python program processing. All original and interpolated coronal images of small QR and large QR phantoms were read by the developed Phyton code. Data from the 2 × 2 × 2 mm 3 resolution are not presented due to the lack of successful decoding. Figure 9. Averaged read-out ratios of the three repeated measurements in the case of the Python program processing. All original and interpolated coronal images of small QR and large QR phantoms were read by the developed Phyton code. Data from the 2 × 2 × 2 mm 3 resolution are not presented due to the lack of successful decoding. Table 7. Read-out results by different smartphones. The middle coronal images of the three consecutive measurements were used to show the ratio of the successful and total decrypting of the QR code information. Data from the 2 × 2 × 2 mm 3 resolution are not presented due to the lack of successful decoding.

Read-Out Ratio
Phone 1 Phone 2 Phone 3 Phone 4 Phone 5 1/3 --2/3 -Representative QR code images are shown in Figure 10 at T1 and T2 weighting and 3T field strength. Figure 10. Representative coronal images from the small and the large QR phantom scans at 3 Tesla field strength and 1 × 1 × 1 mm 3 resolution. T1 and T2 contrast images are in the left and right columns, respectively. The encoded information can only be read out with a smartphone from the T1 image of the large QR phantom.

Discussion
This study aimed to test the feasibility of using printed 3D phantoms with structural information to analyze the robustness and repeatability of radiomics in MR imaging. To this end, this was performed using three different types of printed 3D phantom models and three types of biological phantoms (kiwis, tomatoes, and onions). The main finding of this study was that 3D printed phantoms provide similar robustness and repeatability metrics as biological phantoms and thus, may be favorable in identifying the optimal radiomics in MR imaging protocols. Of note, for high-resolution MR images (isotropic volume of 1 × 1 × 1 mm 3 ), it is possible to retrieve the structural QR information, thus providing acceptable image quality for radiomic analyses. To the best of our knowledge, this finding has not been previously demonstrated in studies evaluating radiomics.
Radiological assessments in tumor staging are primarily focused on the number and the size of lesions in radiological settings when using MR; nevertheless, there is also a growing interest in measuring and analyzing radiomic characteristics, which may provide more insight into the lesion composition [71][72][73][74][75][76][77][78][79][80]. Furthermore, it is not fully understood in all its details how the different MR systems and acquisition protocols affect the robustness and reliability of radiomics parameters [64,[81][82][83][84]. In general terms, radiomic assessments are challenged by the non-quantitative nature of MR imaging. The intensities in the MR images may be affected by shimming protocols and positioning of the scanned objects, among other factors, which may pose challenges in the test-retest assessment of the radiomic information [77,85]. In addition to the standard imaging parameters such as field of view, spatial resolution, and reconstruction algorithm, other factors such as magnetic field strength, repetition number, echo time, number of excitations (NEX or NSA), or the signal-to-noise ratio itself have a high impact on calculations [86]. Nevertheless, there is also a growing interest in measuring and analyzing radiomic characteristics [71][72][73][74][75][76][77][78][79][80]. One way to introduce radiomics into MR imaging protocols and to identify the relevant metrics for MR images may be through phantom studies. Previous studies have focused on biological phantoms, showing that fruits and vegetables are suitable as phantoms because they have sufficient water and organically fine texture [15,16,36,38,87]. In addition, Werz et al. showed T1 and T2 relaxation times for fruits and vegetables are similar to human tissues [36]. They found that the measured image quality of biological objects has good reproducibility, making them useful for test measurements, sequence developments, and optimization. Despite the positive findings as potential phantoms, the fruit and vegetables have limited applicability in the multi-center studies of radiomics because of their relatively rapid biological decay. Therefore, this study sought to identify if 3D printed phantoms may act as replacements. Figures 5 and 6 and Tables 4 and 6 showed that the 3D-printed phantoms provided repeatability metrics similar to those obtained from the biological phantoms. Further, it was found that the discretization process did not affect the repeatability metrics.
In radiomics, it is important to identify the most relevant metrics for tumor characterization. The literature and IBSI guidelines have identified several hundred radiomic metrics that may be extracted from segmented VOIs in medical images [1,3,53,54,57,[88][89][90][91]. While all metrics may contribute to the lesion assessment, many may be relevant only for a subset of organs and tumor phenotypes, while others may have imaging modality-specific performances. Therefore, a pre-selection of the metrics used for the initial radiomic assessments may improve both the reliability of the identified metrics and the overall processing speed of the applied machine-learning models [21,80,92,93]. Our proposed 3D printed phantoms may suit such purposes.
Several studies of biological phantoms have proven that the segmentation procedure may affect texture analysis; hence we chose two robust volume delineation methods: the fixed cube-shaped VOI definition and the "grow from seeds" 3D Slicer tool [7,47]. Figure 4A demonstrates that the visible robust texture of the Hilbert cube is not affected by the spatial resolution when using 1 mm and 2 mm isotropic volumes. However, the image structure of the finer patterned QR cubes deteriorates at 2 mm spatial resolution for both the large QR and small QR cubes ( Figure 4A,C). Similar findings were reported for the biological phantoms ( Figure 4B,D), where high-frequency objects such as the seeds were smeared. These results emphasize that the performance level of RI is highly dependent on the spatial resolution of the input images.
The radiomic analyses' repeatability is important in translating their use into the clinical routine. In this study, we investigated the repeatability measures at different steps in the analyses, the normalization process, and the RI, in addition to how they were affected by object size and imaging parameters. We report improved repeatability for both biological and printed phantoms when normalization is applied to the data (Supplementary Figures S1 and S2, Table 4), findings that are in concordance with previous studies [7,52,56,[94][95][96][97]. Similarly, when normalization was applied to the data, improvements were observed for the two discretization methods (FBS and FBN). This finding is concordant with previous findings for the FBS normalization [56,71,94,98]. Similarly, the radiomics parameter groups (Table 5) had improved repeatability when normalization was applied to the data, stressing the importance of using normalization of MR images before radiomic analyses. Despite normalization, we observed that a subset of RIs was performing poorly (CV > 10%) for all MR imaging protocols and objects. Because of their poor performance, Jmax, Energy, ClusterShade, HGRE, SRHGE, LRHGE, LZE, LZLGE, and LZHGE were omitted from further analyses in this study, and only normalized data were used in all following evaluations. The resulting RIs were observed to perform better at lower MR field strengths, in agreement with previous reports (Figures 5-7, Table 6, and Supplementary Figure S3), findings likely caused by worsened field homogeneity at 3 T [10].
In terms of parameter performance, we observed that the relative difference (RPD) might change by more than 20% for the same objects, depending on the MR systems' field strengths ( Table 3, Supplementary Figure S5, and Figure 6). Further, a bias was observed

Conclusions
We report good agreements between observations obtained from biological and 3Dprinted phantoms. Three-dimensional-printed QR codes provide a unique opportunity to analyze the reliability and challenges of radiomics in MR imaging protocols. This study found that the large QR phantom provides better insight into identifying radiomic features of interest than the usual Hilbert phantom. Further, the QR codes permit analyses of texture distortion through external validation of the readability of the QR codes using smartphone read-out success rates.
Supplementary Materials: The following supporting information can be downloaded at: https:// www.mdpi.com/article/10.3390/diagnostics12092196/s1, Table S1: Nomenclature of the calculated Texture Indices; Figure S1: Coefficient of Variation for all non-normalized Texture Indices and for every phantom; Figure S2: Coefficient of Variation for all normalized Texture Indices and for every phantom; Figure S3: Coefficient of Variation for the selected normalized Texture Indices; Figure S4: Relative Parameter Difference for the selected normalized Texture Indices; Figure S5: Interclass Correlation Coefficient for all normalized Texture Indices; Figure S6: Relative Parameter Difference with different acquisition setups for every fantom; Photo S1: Placement of biological phantoms.

Conflicts of Interest:
The authors declare no conflict of interest.