Voxel-Wise Comparison of Co-Registered Quantitative CT and Hyperpolarised Gas Diffusion-Weighted MRI Measurements in IPF

The patterns of idiopathic pulmonary fibrosis (IPF) lung disease that directly correspond to elevated hyperpolarised gas diffusion-weighted (DW) MRI metrics are currently unknown. This study aims to develop a spatial co-registration framework for a voxel-wise comparison of hyperpolarised gas DW-MRI and CALIPER quantitative CT patterns. Sixteen IPF patients underwent 3He DW-MRI and CT at baseline, and eleven patients had a 1-year follow-up DW-MRI. Six healthy volunteers underwent 129Xe DW-MRI at baseline only. Moreover, 3He DW-MRI was indirectly co-registered to CT via spatially aligned 3He ventilation and structural 1H MRI. A voxel-wise comparison of the overlapping 3He apparent diffusion coefficient (ADC) and mean acinar dimension (LmD) maps with CALIPER CT patterns was performed at baseline and after 1 year. The abnormal lung percentage classified with the LmD value, based on a healthy volunteer 129Xe LmD, and CALIPER was compared with a Bland–Altman analysis. The largest DW-MRI metrics were found in the regions classified as honeycombing, and longitudinal DW-MRI changes were observed in the baseline-classified reticular changes and ground-glass opacities regions. A mean bias of −15.3% (95% interval −56.8% to 26.2%) towards CALIPER was observed for the abnormal lung percentage. This suggests DW-MRI may detect microstructural changes in areas of the lung that are determined visibly and quantitatively normal by CT.


Introduction
Idiopathic pulmonary fibrosis (IPF) is a group of lung diseases that are defined by the lack of an underlying cause and are characterised by the presence of usual interstitial pneumonia (UIP) and pathological fibroblastic activity [1]. UIP is spatially heterogeneous, both macroscopically and microscopically, with a peripheral and basal predominant distribution. In a CT scan, visible patterns of UIP include honeycombing cysts, reticular opacities associated with traction bronchiectasis, and ground-glass opacities [2][3][4]. The presence of any UIP pattern in a CT scan is crucial for IPF diagnosis, and it typically involves a multi-disciplinary team [2][3][4][5]. Semi-quantitative disease severity scoring methods have been proposed that have some prognostic capabilities [6,7]. However, these methods are not currently standardised, and scoring can be subjective across independent radiologists [8].
Several texture-based or machine learning algorithms [9][10][11][12][13][14][15][16] have been proposed for the automated characterisation of CT scans for interstitial lung disease (ILD) patterns that demonstrate correlation and agreement with radiologists' scoring.Computer-Aided Lung Informatics for Pathology Evaluation and Rating (CALIPER), an image analysis software developed by the Mayo Clinic (Rochester, MN, USA), can automatically characterise and quantify volumetric CT images for patterns of ILD on a voxel-wise level [12].CALIPER-derived parameters have been shown to be associated with IPF disease progression [17], and were more accurate than visual CT scoring in IPF mortality prediction and prognostication [18,19].
Hyperpolarised gas diffusion-weighted MRI (DW-MRI) with inhaled helium-3 (3He) or xenon-129 (129Xe) is an imaging technique that is sensitive to changes in acinar microstructure [20][21][22][23]. In lungs with IPF, the global apparent diffusion coefficient (ADC) and mean acinar dimension (LmD) from 3He and 129Xe DW-MRI are elevated compared to healthy lungs, which is indicative of a loss of acinar integrity related to fibrosis [24][25][26][27]. Furthermore, DW-MRI metrics correlate with visual scoring of ILD severity on CT images, and the LmD demonstrated sensitivity to longitudinal change in IPF [26]. Regions of elevated DW-MRI metrics qualitatively appeared to correlate spatially with the ILD patterns visible in CT scans, and they were hypothesised to be related to regions of honeycomb cysts. A more regional or voxel-wise comparison is therefore required to help elucidate which ILD features directly correspond to the observed elevated DW-MRI metrics.
Multi-modality spatial co-registration of hyperpolarised gas lung MRI and CT has previously been successfully implemented to compare hyperpolarised gas MRI- and CT-based maps of lung ventilation in patients with asthma [28], chronic obstructive pulmonary disease (COPD) [29] and lung cancer [30]. Moreover, the co-registration of hyperpolarised gas DW-MRI with CT has facilitated quantitative multi-parametric response mapping and MRI-based emphysema indices, which have revealed subclinical features of COPD that were not detectable with DW-MRI or CT alone [31][32][33]. However, to date, there have been no studies that have spatially co-registered DW-MRI and CT in patients with IPF or ILD. The aim of this work was therefore to develop a multi-modality spatial co-registration framework for hyperpolarised gas DW-MRI and CALIPER CT. The framework will facilitate a voxel-wise comparison of DW-MRI metrics with quantitative CALIPER CT patterns in a cohort of IPF patients.

Study Participants
Sixteen patients (mean age 71 ± 5 years, 14 men) with a multi-disciplinary team IPF diagnosis, and six healthy volunteers (mean age 67 ± 3 years, 4 men) with no history of respiratory disorders or smoking, were recruited for this retrospective interpretation of prospectively acquired data from two separate studies that were approved by the Liverpool Central NHS Research Ethics Committee [26] (February 2016 to February 2018) and the Regional Ethical Review Board in Lund, Sweden [34] (March to May 2019), respectively. All participants provided written informed consent.
The inclusion criteria for the patients with IPF included a diagnosis of IPF within one year, oxygen saturations of ≥90% in room air, and an age of 18-80.The exclusion criteria included patients on immunosuppressive treatment, pregnancy, renal impairment, oxygen saturations of <90% in room air, an age of >80 years old (or an age of <18 years old at the onset of the study), an inability to lie supine comfortably for at least 60 min, a significant co-morbidity that was likely to reduce life expectancy to less than one year, a severe ischaemic heart disease (or symptoms of angina that could not be fully controlled), significant congestive cardiac failure, any contraindication(s) to MRI scanning, and previous allergies to MRI contrast agent (gadolinium).
All IPF patients underwent 3He MRI and CT at baseline, and eleven patients had a 1-year follow-up 3He MRI. Four IPF patients died between the baseline and follow-up examinations, and one patient was too unwell and withdrew from the study. All healthy volunteers underwent a baseline 129Xe MRI only. The difference in hyperpolarised gas between the IPF patients and healthy volunteers was due to the transition of the research community from 3He to 129Xe gas because of the scarcity of 3He gas [35]. Figure 1 summarises the participant imaging data and analyses for this study.

MRI and CT Image Acquisition
Hyperpolarised 3He and 129Xe lung MRI was acquired on a 1.5 T GE HDx scanner using 3He and 129Xe flexible quadrature chest radiofrequency coils (Clinical MR Solutions, Brookfield, WI, USA). All examinations involved the inhalation, from functional residual capacity (FRC), of a gas mixture of hyperpolarised 3He (~25% polarisation) or 129Xe (~25% polarisation) and nitrogen. Gases were polarised under Medicines & Healthcare products Regulatory Agency (MHRA) approved licences with in-house equipment and processes (POLARIS, University of Sheffield, UK). The volume of gas mixture was titrated based upon the subjects' heights, up to 1 L, to account for the differences in lung volume. Every subject was ≥160 cm tall and therefore inhaled a 1 L gas mixture. Before undergoing the MRI exam, each subject was coached by a lung physiologist to achieve FRC, and they practised by inhaling 1 L of room air. Each IPF participant underwent 3He DW-MRI and ventilation MRI, while healthy volunteers underwent 129Xe DW-MRI only. The 129Xe and 3He DW-MRI sequences were optimised such that comparable LmD values could be derived from both gases [36].

The sixteen IPF patients underwent non-contrast multi-detector row CT of the thorax at one tertiary centre on a 64-section scanner (LightSpeed; GE Medical) during a single full-inspiration breath hold. The CT images were reconstructed to 1.25 mm thick sections using either "Soft", "Lung", or "Chest" reconstruction kernels, and the mean dose-length product for all participants was 313 mGy·cm (range, 101-743 mGy·cm). CT was performed as close as was practical to the MRI examination (mean 56 ± 62 days).

Image Registration and Analysis
The undersampled hyperpolarised 3 He and 129 Xe DW-MRI data were reconstructed using in-house MATLAB (MathWorks, Inc., MA, USA) code [26,36].For each IPF participant, 3 He DW-MRI was spatially co-registered to the CT indirectly via spatially aligned 3 He ventilation and structural 1 H MRI [28], which was achieved using Advanced Normalization Tools (ANTs) software [37] (Figure 2).Furthermore, 3 He DW-MRI was co-registered to the 3 He ventilation images, while CT was co-registered to the structural 1 H MRI. CT images were segmented as part of the CALIPER software analysis; meanwhile, all 3 He and 1 H MRI images were segmented using in-house developed software.Each image registration involved a rigid pre-alignment transformation that was followed by affine and diffeomorphic transformations [28].For the diffeomorphic stage, a standard pyramidal approach was followed using mutual information at the higher levels [38] and cross correlation at the base as cost functions [39].Further details on the image registration transformations can be found in Tahir et al. [28].Image registrations were assessed by Dice similarity coefficients between the binary lung segmentation masks of warped 3 He DW-MRI and 3 He ventilation, as well as with warped CT and structural 1 H.
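The Dice similarity coefficient used above to assess each registration can be illustrated with a minimal sketch. It assumes the two binary lung masks have already been resampled onto the same grid and flattened to equal-length sequences; the function name `dice_coefficient` and the toy masks are hypothetical and are not taken from the study's in-house software.

```python
def dice_coefficient(mask_a, mask_b):
    """Dice similarity coefficient between two flattened binary masks."""
    if len(mask_a) != len(mask_b):
        raise ValueError("masks must contain the same number of voxels")
    size_a = sum(1 for a in mask_a if a)
    size_b = sum(1 for b in mask_b if b)
    overlap = sum(1 for a, b in zip(mask_a, mask_b) if a and b)
    if size_a + size_b == 0:
        # Assumption: two empty masks are treated as perfect agreement.
        return 1.0
    return 2.0 * overlap / (size_a + size_b)


# Toy example (illustrative values only, not study data).
warped_dw_mask = [0, 1, 1, 1, 0, 0]
ventilation_mask = [0, 1, 1, 0, 1, 0]
print(dice_coefficient(warped_dw_mask, ventilation_mask))
```

A value of 1 indicates identical masks and 0 indicates no overlap.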
The CT images were analysed with CALIPER software, wherein each parenchyma voxel was characterised into one of seven patterns: normal, honeycombing, reticular changes, ground-glass opacities, and lower attenuation areas (LAA) (mild, moderate, and severe) [12]. To reduce the number of CALIPER patterns and DW-MRI comparisons, two merged patterns were defined. Non-involved represents the physiologically normal lung and combines the normal and mild LAA patterns, because mild LAA patterns can appear in healthy lungs after deep inhalation. Hyperlucent represents the emphysematous regions of the lung and combines the moderate and severe LAA patterns. Thus, maps containing five CALIPER patterns (non-involved, honeycombing, reticular changes, ground-glass opacities, and hyperlucent) were derived for each CT image set.
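The reduction from seven CALIPER patterns to five can be written as a simple lookup. This is a sketch only: the label strings and the name `merge_patterns` are hypothetical, since the actual encoding of CALIPER output is not described here.

```python
# Hypothetical label strings; CALIPER's real output encoding is not given in the text.
CALIPER_TO_MERGED = {
    "normal": "non-involved",
    "mild LAA": "non-involved",
    "moderate LAA": "hyperlucent",
    "severe LAA": "hyperlucent",
    "honeycombing": "honeycombing",
    "reticular changes": "reticular changes",
    "ground-glass opacities": "ground-glass opacities",
}


def merge_patterns(voxel_labels):
    """Map each voxel's seven-pattern label to the five-pattern scheme."""
    return [CALIPER_TO_MERGED[label] for label in voxel_labels]


print(merge_patterns(["normal", "mild LAA", "severe LAA", "honeycombing"]))
```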
CALIPER maps were co-registered to DW-MRI using the same CT-1H deformation field transformation with nearest neighbour interpolation. Then, 3He ADC and LmD values were calculated for each original IPF 3He DW-MRI dataset on a voxel-by-voxel basis. The 3He ADC was calculated from a mono-exponential fit of two b-values (b = 0 and 1.6 s/cm²). The 3He LmD was derived from fitting all four b-values (b = 0, 1.6, 4.2, and 7.2 s/cm²) to the stretched exponential model (SEM) of gas diffusion in the lungs [34]. Voxel-wise maps of the 3He ADC and LmD were subsequently warped using the 3He DW-MRI-ventilation deformation field transformation with nearest neighbour interpolation. For each healthy volunteer, maps of 129Xe LmD were calculated voxel-by-voxel from the 129Xe DW-MRI data using the SEM for 129Xe b = 0, 12, 20, and 30 s/cm² [34].
In the absence of longitudinal CT imaging, the 1-year follow-up 3He DW-MRI, available for eleven of the IPF patients, was warped to the spatial domain of the baseline 3He ventilation images using the same ANTs registration pipeline detailed above for the baseline 3He DW-MRI. Thus, after this additional registration step, both baseline and longitudinal 3He DW-MRI were spatially co-registered to the baseline CT and CALIPER maps.
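For two b-values, the mono-exponential ADC calculation described above has a closed form: with S(b) = S(0)·exp(−b·ADC), the ADC is ln(S(0)/S(b))/b. The sketch below assumes noise-free, positive voxel signals; the function name `adc_two_point` and the example signals are hypothetical and are not the study's MATLAB code.

```python
import math


def adc_two_point(s0, s_b, b=1.6):
    """ADC (cm^2/s) from signals at b = 0 and at b (s/cm^2), assuming
    mono-exponential decay S(b) = S(0) * exp(-b * ADC)."""
    if s0 <= 0 or s_b <= 0 or b <= 0:
        raise ValueError("signals and b-value must be positive")
    return math.log(s0 / s_b) / b


# Synthetic voxel: signal generated with a known ADC of 0.2 cm^2/s.
s0 = 100.0
s_b = s0 * math.exp(-1.6 * 0.2)
print(adc_two_point(s0, s_b))  # recovers approximately 0.2
```

The LmD calculation requires the full stretched exponential model fit across all four b-values and is not reproduced here.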

Statistical Analyses
The overlapping voxels from spatially co-registered 3He ADC or LmD maps and CALIPER maps were compared across all of the IPF patients. Only 5% of the overlapping voxels (every 20th) were considered for statistical analyses, because the large number of overlapping voxels (~4 million) exceeded the computational limits of the statistical analysis software.
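Taking every 20th overlapping voxel amounts to a strided slice applied identically to the metric values and their pattern labels, so that the pairs stay aligned. The sketch below assumes the voxels are held in flat, paired lists and that sampling starts at the first voxel; both are assumptions, and the name `subsample_every_nth` is hypothetical.

```python
def subsample_every_nth(values, step=20):
    """Keep every `step`-th element, starting from the first (5% for step=20)."""
    return values[::step]


# Paired toy data: the same slice keeps metric values and labels aligned.
adc_values = [0.001 * i for i in range(4000)]
pattern_labels = ["non-involved"] * 4000
adc_subset = subsample_every_nth(adc_values)
label_subset = subsample_every_nth(pattern_labels)
print(len(adc_subset), len(label_subset))
```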
Statistical differences in the 3 He ADC or Lm D values between the five CALIPER patterns were assessed with one-way ANOVA and the post hoc Tukey multiple comparison tests.The IPF patients' 3 He Lm D values in CALIPER non-involved pattern voxels were compared to the healthy volunteers' 129 Xe Lm D values with an independent t-test.A 129 Xe Lm D threshold of 406 µm, corresponding to the 95% upper limit of healthy 129 Xe Lm D values (see Results), was used to classify the 3 He Lm D voxels in the IPF cohort that were greater than the threshold as an abnormal value.The percentage of lung voxels classified as abnormal by Lm D was compared to those co-registered CALIPER voxels with ILD patterns (honeycombing, reticular changes, and ground-glass opacities) using Bland-Altman analysis.In the sub-cohort of eleven IPF patients with longitudinal 3 He DW-MRI, the overlapping voxels in 3 He ADC or Lm D and CALIPER maps were compared at baseline and after 1 year.Independent t-tests for the overlapping voxels in each of the five CALIPER patterns were used to determine if the statistically significant longitudinal changes in the diffusion metrics were observed in each respective CALIPER pattern.
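The threshold classification and the Bland-Altman comparison can be sketched as follows. Several details are assumptions rather than reported methods: the difference is taken as CALIPER minus LmD, the 95% interval is computed as bias ± 1.96 standard deviations of the differences, and all function names and example percentages are hypothetical. The analyses in this study were run in GraphPad Prism, not with this code.

```python
from statistics import mean, stdev

LMD_THRESHOLD_UM = 406.0  # 95% upper limit of healthy 129Xe LmD, from the text


def percent_abnormal_lmd(lmd_values, threshold=LMD_THRESHOLD_UM):
    """Percentage of voxels whose LmD exceeds the abnormality threshold."""
    abnormal = sum(1 for value in lmd_values if value > threshold)
    return 100.0 * abnormal / len(lmd_values)


def bland_altman(method_a, method_b):
    """Return (bias, lower limit, upper limit) for paired measurements.

    Assumption: limits are bias +/- 1.96 * SD of the paired differences.
    """
    differences = [a - b for a, b in zip(method_a, method_b)]
    bias = mean(differences)
    half_width = 1.96 * stdev(differences)
    return bias, bias - half_width, bias + half_width


# Illustrative per-patient abnormal lung percentages (not study data).
caliper_percent = [10.0, 20.0, 30.0]
lmd_percent = [12.0, 25.0, 31.0]
print(bland_altman(caliper_percent, lmd_percent))
```

Under the assumed sign convention, a negative bias means the LmD threshold classifies a larger share of the lung as abnormal than CALIPER does.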
All statistical analyses were performed in GraphPad Prism (v9.5, La Jolla, CA, USA), and p-values of <0.05 indicated statistical significance. Any statistically significant difference in ADC or LmD values was compared to a respective a priori-defined confidence interval (CI) range to contextualise whether the statistical difference was a relevant one. The 95% CIs of the mean differences from the Tukey multiple comparison tests or independent t-tests were compared to the Bland-Altman 95% difference interval range that was previously reported for the same-day reproducibility of 3He ADC (±0.041 cm²/s) and LmD (±18.5 µm) values in IPF patients [26].
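One possible reading of the relevance check is that a significant difference counts as relevant only when its whole 95% CI lies outside the same-day reproducibility range. The sketch below encodes that reading as an assumption; the text does not state the exact rule, and the name `is_relevant` is hypothetical.

```python
# Same-day reproducibility ranges reported in the text [26].
REPRODUCIBILITY_RANGE = {"ADC": 0.041, "LmD": 18.5}  # cm^2/s and µm


def is_relevant(ci_low, ci_high, metric):
    """True if the mean-difference 95% CI lies entirely outside +/- the
    reproducibility range (assumed interpretation of 'relevant')."""
    limit = REPRODUCIBILITY_RANGE[metric]
    return ci_low > limit or ci_high < -limit


print(is_relevant(0.050, 0.060, "ADC"))  # CI fully above +0.041
print(is_relevant(10.0, 25.0, "LmD"))    # CI overlaps the +/-18.5 range
```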

Spatial Co-Registration
Table 1 summarises the demographics and DW-MRI metrics for the IPF patients and healthy volunteers at baseline. The spatial co-registration of CT and 3He DW-MRI was successfully implemented in all sixteen IPF patients, and the resultant spatial resolution of co-registered images was 1.8 × 1.8 × 5 mm³. The mean Dice similarity coefficient for the CT-3He DW-MRI spatial co-registration was 0.920 ± 0.013. The mean Dice coefficients for the two separate CT-structural 1H and 3He DW-MRI-ventilation co-registrations were 0.954 ± 0.008 and 0.922 ± 0.009, respectively. The individual IPF patients' co-registration Dice coefficients, CALIPER pattern percentages, and 3He DW-MRI metrics are summarised in Supplementary Materials, Table S1.

Baseline Voxel Comparison
An analysis of the overlapping co-registered voxels that were present in all baseline IPF patients demonstrated that the voxels classified as honeycombing had the largest 3He ADC and LmD values of all CALIPER patterns (Table 2). One-way ANOVA revealed that there was a statistically significant difference in the 3He ADC value (F(4, 203670) = 2618, p < 0.001) between at least two CALIPER patterns in the overlapping voxels (Figure 3a). The post hoc Tukey tests for multiple comparisons found significantly different 3He ADC values between all five CALIPER patterns (p < 0.001) (Table 3). When the 95% CIs of the differences were compared to the a priori-defined relevance range, the mean difference in the 3He ADC between all CALIPER patterns was relevant, except that between the reticular and ground-glass patterns (Figure 3b).
Similar trends were observed for 3He LmD, where one-way ANOVA indicated statistically significant differences between the CALIPER patterns (F(4, 186218) = 665.9, p < 0.001) (Figure 4a). The post hoc Tukey tests for multiple comparisons found significantly different 3He LmD values between all five CALIPER patterns (p < 0.001), except between the reticular and ground-glass (p = 0.87), hyperlucent and ground-glass (p = 0.16), and hyperlucent and reticular (p = 0.051) patterns (Table 3). All statistically significant 3He LmD differences between the CALIPER patterns were relevant when compared against the a priori 3He LmD relevance range (Figure 4b).
The 3He LmD values within the non-involved CALIPER pattern (mean = 380 ± 87 µm) were significantly larger (t(190731) = 168.7, p < 0.001) than the older healthy volunteer 129Xe LmD values (mean = 300 ± 61 µm) (Figure 4a). A threshold of 406 µm, corresponding to the 95% upper limit of older healthy 129Xe LmD values, was chosen to classify the 3He LmD maps for abnormality. A Bland-Altman analysis of the percentage of abnormal lung voxels between LmD (>406 µm) and CALIPER (all ILD patterns) classifications obtained a mean bias of −15.3% (95% confidence interval −56.8% to 26.2%) towards CALIPER for IPF participants (Figure 5a). These results suggest that DW-MRI classifies more of the lung as abnormal than CALIPER does. A trend towards increasing bias between CALIPER and LmD with an increased percentage of abnormal voxels was also observed (Figure 5b).

Longitudinal Voxel Comparison
For the sub-cohort of 11 IPF patients who underwent a 1-year follow-up DW-MRI, the overlapping co-registered ADC or LmD voxels were significantly different (p < 0.001) after 1 year globally and in all baseline CALIPER patterns, except in the hyperlucent and honeycomb patterns for ADC, and in the honeycomb pattern for LmD (Table 4). When significant longitudinal differences in CALIPER patterns were considered against the a priori 3He ADC and LmD relevance ranges, the largest and only relevant differences were observed for reticular (ADC and LmD) and ground-glass patterns (ADC). Meanwhile, the changes in non-involved and hyperlucent patterns were not relevant (Figure 6). This suggests that the longitudinal DW-MRI changes observed in this IPF cohort occur in the regions of the lung with ILD patterns, and are not due to increased emphysematous regions.
Table 4. Summary of the baseline and 1-year follow-up 3He DW-MRI metrics in the longitudinal sub-cohort of the 11 IPF patients. Values are given as the mean ± standard deviation. The mean differences (95% confidence interval range) for each CALIPER pattern, as classified on baseline CT, are shown for the 3He ADC and LmD values obtained from independent t-tests.

Figure 6. Two example spatially co-registered 3He LmD maps at baseline and after 1 year, shown with the corresponding baseline CALIPER CT map, in one representative IPF patient. In the regions classified as ground-glass and reticular, the largest longitudinal differences in the 3He LmD value were observed (purple circles). In contrast, the regions that were classified as CALIPER non-involved demonstrated much less of a change in the 3He LmD value after 1 year (blue circles).

Discussion
A framework for the spatial co-registration of 3 He DW-MRI and CT images was developed and implemented using images from the sixteen IPF participants.A high Dice similarity coefficient for the resultant spatial transformation indicates an excellent spatial overlap of the 3 He DW-MRI and CT imaging modalities.The choice of an indirect CT and DW-MRI registration framework using spatially aligned 3 He-1 H MRI was based on a previous study of CT and 3 He ventilation MRI registration, which demonstrated more accurate registrations with an indirect method that utilised same-breath 3 He-1 H MRI than a direct registration method [28].The slightly lower Dice coefficient for the 3 He DW-MRI-ventilation transformation may be related to the inherently lower spatial resolution of DW-MRI, which can result in fewer ventilation defects that are visible on the respective images and binary segmentation masks.
The CALIPER honeycombing pattern voxels had the largest ADC and Lm D values out of all the CALIPER patterns; this further supports the hypothesis that elevated hyperpolarised gas DW-MRI metrics are a result of honeycomb cysts [26].Smaller differences in the ADC and Lm D values between normal and reticular or ground-glass patterns, when compared to honeycombing, may suggest that, in a CT scan, these ILD patterns have minimal accompanying acinar microstructural changes.However, the Lm D values in the CALIPER non-involved or physiologically normal patterns were also significantly larger than those obtained from the healthy volunteers of a similar age range.This would suggest DW-MRI may detect microstructural changes in the regions of the IPF lung that quantitative CT characterises as physiologically normal.When the abnormal Lm D threshold, defined as the 95% upper limit of healthy volunteer 129 Xe Lm D values, was compared to the CALIPER ILD patterns in IPF patients, an overall bias towards a higher percentage of the lung being classified as abnormal with DW-MRI was obtained.This bias further suggests that hyperpolarised gas DW-MRI may be sensitive to aspects of acinar changes in IPF lung disease, such as microscopic cysts that are not resolved by CT.
In our IPF cohort, the percentage of lung voxels characterised as ground-glass opacities by CALIPER was relatively high (12.4%, Table 2), which is not typical for a radiologic UIP pattern in a CT scan. However, we can confirm that each IPF subject had either a definite UIP or probable UIP CT pattern, as visually assessed by thoracic radiologists during multi-disciplinary team diagnosis (see Supplementary Materials, Table S1), and this is indicative of no predominant regions of ground-glass opacities [2][3][4]. This discrepancy in ground-glass opacity classifications could be related to differences between CALIPER and radiologist visual scoring, as it has been previously shown that the honeycombing regions identified by radiologists were frequently characterised as reticular changes or ground-glass opacities by CALIPER [19].
A trend towards increased mean global ADC and Lm D values in spatially co-registered 1-year follow ups of 3 He DW-MRI is in keeping with the trends observed in the wider IPF patient cohort of this study [26].The largest increases in ADC and Lm D voxels after 1 year were observed for the patterns of reticular changes and ground-glass opacities, and not due to emphysema.These results suggest that if 1-year follow-up CALIPER CT images were acquired, then they would show an increased lung percentage of CALIPER ILD patterns [18,19].We also hypothesise that some baseline ground-glass and reticular patterns regions would be classified as honeycombing in a 1-year follow-up CT scan.
The main limitation was that our IPF patient cohort was small and no longitudinal CT imaging was available.More patient data, across the two modalities, are therefore required to confirm our baseline and longitudinal findings.There were also limitations in the voxel-wise comparison of CALIPER and DW-MRI.First, due to the relatively large differences in voxel resolution between DW-MRI and CT, DW-MRI voxels were up-sampled and interpolated during spatial co-registration.The combination of partial volume voxel effects and misalignment errors may result in the incorrect classification of DW-MRI voxels in the regions with subtle changes in CALIPER CT patterns.However, the large number of voxels in the comparison, albeit in a small cohort, is a strength of this study and helps minimise the possibility of registration errors affecting our voxel-wise comparison.Second, only overlapping co-registered image voxels were considered.DW-MRI metrics are derived from ventilated lung voxels only; therefore, a comparison was not possible in ventilation defect regions.This is, however, more relevant in obstructive lung diseases when compared to restrictive ones such as IPF, and this is supported by the small number of un-ventilated lung regions that were observed on 3 He DW-MRI.
The differences in imaging protocols between IPF patients and healthy volunteers, as well as between CT and DW-MRI, are also limitations of this study. The difference in hyperpolarised gas for DW-MRI was mitigated by implementing optimised diffusion imaging parameters that result in comparable DW-MRI metrics (see Supplementary Materials, Figure S1) [36]. However, the inherent lower diffusivity of 129Xe gas may slightly underestimate the healthy volunteers' LmD values that are in the upper limit of normal when compared to 3He gas. The CT scans were acquired at full inspiratory volume, while 3He DW-MRI was imaged at FRC + 1 L. Therefore, lung inflation volume differences could be a factor because DW-MRI metrics are more homogeneous and larger at full inspiration [40], and consequently the threshold for abnormal LmD values may be underestimated due to the smaller LmD values in the dependent lung at FRC + 1 L. The combination of inflation and diffusivity differences may lead to an overestimation of the true bias between DW-MRI and CT if both imaging modalities were acquired at the same inflation level and with 3He gas.

Conclusions
The spatial co-registration of hyperpolarised gas DW-MRI and CALIPER quantitative CT maps in IPF patients demonstrated that the largest ADC and Lm D values were observed in the regions classified as honeycombing.In addition, longitudinal DW-MRI changes were predominantly observed in reticular change and ground-glass opacity regions.Furthermore, the Lm D values in voxels with a CALIPER normal pattern were larger than those from age-matched healthy volunteers, thus suggesting DW-MRI may detect microstructural changes even in areas of the lung that are determined as structurally normal by CT scans.The quantitative biomarkers from hyperpolarised gas DW-MRI and CT could play a role in future clinical trials, whereby IPF disease progression and response to new treatments is assessed.With the transition from hyperpolarised 3 He to 129 Xe for clinical lung imaging studies [35], this spatial co-registration framework is immediately transferrable to 129 Xe DW-MRI.Furthermore, it can be used to explore quantitative CT patterns in different ILD subtypes and pulmonary diseases.

Figure 1. Flow chart summarising the imaging data obtained and the imaging analyses for this study. IPF patient and healthy volunteer data were from two separate prospective studies, respectively [26,34].

Figure 2 .
Figure 2. Framework for the spatial co-registration of CT images and 3 He diffusion-weighted (DW)-MRI.The original CT images and DW-MRI (a) were indirectly co-registered by utilising same-breath acquired 3 He ventilation and structural 1 H MRI (b).The resultant warped images had a spatial resolution of 1.8 × 1.8 × 5 mm 3 (c).The spatial transformation used to warp CT images was also used to deform CALIPER classifications maps.(d) Maps of the 3 He ADC and LmD were calculated from the original 3 He DW-MRI, and they were warped using the same spatial transformation as DW-MRI.The abnormal LmD values were defined from a threshold of 406 µm, and they were derived from older healthy volunteer 129 Xe DW-MRIs.

Figure 3. (a) Boxplots of the 3He ADC value for each CALIPER pattern from a voxel-wise comparison of all the overlapping co-registered voxels in all IPF patients. The boxplot whiskers represent the 5th and 95th percentiles. A significant difference between the CALIPER patterns for 3He ADC was obtained with a one-way ANOVA test (p < 0.001). (b) Plots of the mean difference confidence intervals (CI) for each post hoc Tukey multiple comparison test. The comparisons denoted by asterisks were significantly different (p < 0.001) and had a mean difference CI that was greater than the a priori-defined relevance range for 3He ADC in IPF patients (±0.041 cm²/s, dotted line).
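One plausible reading of the relevance criterion in panel (b) is that a comparison is flagged as relevant when its entire mean difference CI lies beyond the dotted relevance lines. The sketch below encodes that reading. This interpretation, the function name, and the example CI values are assumptions made for illustration; only the ±0.041 cm²/s relevance range is taken from the caption.

```python
ADC_RELEVANCE_RANGE = 0.041  # cm^2/s, relevance range quoted in the caption


def is_relevant(ci_low, ci_high, relevance_range):
    """Return True when the whole CI lies outside +/- relevance_range.

    Assumed reading of the criterion: a CI that overlaps the band
    [-relevance_range, +relevance_range] is treated as not relevant.
    """
    if ci_low > ci_high:
        raise ValueError("ci_low must not exceed ci_high")
    return ci_low > relevance_range or ci_high < -relevance_range


# Hypothetical mean difference CIs (cm^2/s); not values from the study.
example_cis = {
    "comparison 1": (0.050, 0.070),    # entirely above +0.041
    "comparison 2": (0.010, 0.060),    # overlaps the relevance band
    "comparison 3": (-0.090, -0.045),  # entirely below -0.041
}

relevance_flags = {
    name: is_relevant(low, high, ADC_RELEVANCE_RANGE)
    for name, (low, high) in example_cis.items()
}
for name, flag in relevance_flags.items():
    print(name, "relevant" if flag else "nr")
```

The same check would apply to LmD by swapping in its relevance range of 18.5 µm.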

Figure 4. (a) Boxplots of the 3He LmD values for each CALIPER pattern obtained via a voxel-wise comparison of all the overlapping co-registered voxels in all IPF patients, and the 129Xe LmD values for the older healthy volunteers. The boxplot whiskers represent the 5th and 95th percentiles. A significant difference between the CALIPER patterns for 3He LmD was obtained with a one-way ANOVA test (p < 0.001). The 3He LmD in the CALIPER non-involved pattern was significantly larger (p < 0.001, +80.1 µm) than the 129Xe LmD in the older healthy volunteers. (b) Plots of the mean difference confidence intervals (CI) for each post hoc Tukey multiple comparison test. The comparisons denoted by asterisks were significantly different (p < 0.001) and had a mean difference CI greater than the a priori-defined relevance range for 3He LmD in the IPF patients (±18.5 µm, dotted line).

Figure 5. (a) The Bland–Altman analysis of the percentage of voxels classified as abnormal by CALIPER or LmD (>406 µm) in all of the IPF patients. A mean bias of −15.3% towards abnormal values as classified by CALIPER was observed. (b) Two example patients with IPF, where one patient demonstrates a small difference in the percentage of their lungs classified as abnormal (blue: CALIPER = 18.3%, LmD = 13.4%), and one patient demonstrates a large difference (red: CALIPER = 22.2%, LmD = 69.6%).
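As a minimal sketch of the Bland–Altman calculation in panel (a), the bias is the mean of the paired differences and the 95% interval is the bias ± 1.96 standard deviations of those differences. The sign convention (CALIPER minus LmD) is an assumption chosen so that a negative bias means LmD classifies more lung as abnormal. The input uses only the two example patients from panel (b), so the output illustrates the arithmetic and does not reproduce the cohort result.

```python
from statistics import mean, stdev


def bland_altman(method_a, method_b):
    """Return (bias, lower_limit, upper_limit) for paired measurements.

    Differences are taken as method_a - method_b (assumed convention);
    the limits are bias +/- 1.96 * sample SD of the differences.
    """
    if len(method_a) != len(method_b) or len(method_a) < 2:
        raise ValueError("need at least two paired measurements")
    diffs = [a - b for a, b in zip(method_a, method_b)]
    bias = mean(diffs)
    half_width = 1.96 * stdev(diffs)
    return bias, bias - half_width, bias + half_width


# Abnormal lung percentages for the two example patients in Figure 5(b).
caliper_pct = [18.3, 22.2]
lmd_pct = [13.4, 69.6]

bias, lower, upper = bland_altman(caliper_pct, lmd_pct)
print(f"bias = {bias:.2f}%, 95% interval = {lower:.2f}% to {upper:.2f}%")
```

With the full cohort, the same function would take one CALIPER and one LmD percentage per patient.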

Table 4. Summary of the baseline and 1-year follow-up 3He DW-MRI metrics in the longitudinal sub-cohort of the 11 IPF patients. Values are given as the mean ± standard deviation. The mean differences (95% confidence interval range) for each CALIPER pattern, as classified on baseline CT, are shown for the 3He ADC and LmD values obtained from independent t-tests.

Figure 6. Two example spatially co-registered 3He LmD maps at baseline and after 1 year, following the corresponding baseline CALIPER CT map, in one representative IPF patient. The largest longitudinal differences in the 3He LmD value were observed in the regions classified as ground-glass and reticular (purple circles). In contrast, the regions that were classified as CALIPER non-involved demonstrated much less of a change in the 3He LmD value after 1 year (blue circles).

Table 1. A summary of the subject demographics and global DW-MRI metrics for IPF patients and older healthy volunteers. All values are given as the mean ± standard deviation.
* 3He DW-MRI was acquired in the IPF cohort, and 129Xe DW-MRI was acquired in the healthy volunteer cohort.

Table 2. A summary of 3He ADC and LmD values for each CALIPER pattern in the baseline voxel-wise comparison of the 16 IPF patients. All values are given as the mean ± standard deviation.

Table 3. Summary of mean differences (95% confidence interval range) for the post hoc Tukey multiple comparison tests of the 3He ADC and LmD with CALIPER patterns.
* = significant to the p < 0.001 level; # = relevant, and the mean difference 95% confidence interval range is larger than the respective relevance range for the 3He ADC (±0.041 cm²/s) and LmD (±18.5 µm) values; and nr = not relevant.