Robust Measures of Image-Registration-Derived Lung Biomechanics in SPIROMICS

Chronic obstructive pulmonary disease (COPD) is an umbrella term used to define a collection of inflammatory lung diseases that cause airflow obstruction and severe damage to the lung parenchyma. This study investigated the robustness of image-registration-based local biomechanical properties of the lung in individuals with COPD as a function of Global Initiative for Chronic Obstructive Lung Disease (GOLD) stage. Image registration was used to estimate the pointwise correspondences between the inspiration (total lung capacity) and expiration (residual volume) computed tomography (CT) images of the lung for each subject. In total, three biomechanical measures were computed from the correspondence map: the Jacobian determinant; the anisotropic deformation index (ADI); and the slab-rod index (SRI). CT scans from 245 subjects with varying GOLD stages were analyzed from the SubPopulations and InteRmediate Outcome Measures In COPD Study (SPIROMICS). Results show monotonic increasing or decreasing trends in the three biomechanical measures as a function of GOLD stage for the entire lung and on a lobe-by-lobe basis. Furthermore, these trends held across all five image registration algorithms. The consistency of the five image registration algorithms on a per individual basis is shown using Bland–Altman plots.


Introduction
Chronic obstructive pulmonary disease (COPD) refers to a collection of inflammatory lung diseases including chronic bronchitis and emphysema [1]. COPD continues to be the third leading cause of death worldwide [1], with an ever-increasing disease burden [2]. People with COPD may experience high morbidity, high healthcare costs, poor quality of life, activity limitation, and exacerbations [3]. Although COPD is not curable, available treatments can help relieve symptoms, improve exercise capacity, improve the quality of life, reduce the risk of death, and reduce the cost of healthcare [4]. Spirometry-a common pulmonary function test-is currently the gold standard for diagnosing and staging COPD. Based on spirometry, the Global Initiative for Chronic Obstructive Lung Disease (GOLD) recommends grading the disease into four categories, ranging from GOLD 1 to GOLD 4 [5]. However, spirometry alone cannot capture the heterogeneous manifestations of COPD, which calls for better diagnostic methods. Towards advancing our understanding of the disease, several multi-center studies such as SPIROMICS [6] and the Genetic Epidemiology of COPD, COPDGene [7] are underway. Along with spirometry, these studies rely on computed tomography (CT) scans of the subjects, which have led to the development of several imaging biomarkers for characterizing COPD.
Biomechanical properties of the lung can be used to characterize lung function and are important because they provide information on whether or not the lung is functioning normally or abnormally. One way to indirectly measure the biomechanical properties of the lung at the local level is to analyze the pointwise correspondences between inspiration and expiration CT image volumes. This can be achieved using image registration. Image registration is used to find the point-to-point correspondence between two images in the form of a transformation or deformation vector field (DVF). Biomechanical measurements (also called biomechanical biomarkers) can then be extracted from the transformation to describe the local expansion or contraction of the lung during the breathing cycle.
A complementary approach to extracting biomarkers from CT images is to compute CT intensity-based disease biomarkers. Intensity-based disease biomarkers are computed from CT image intensities with or without using image registration. These methods extract biomarkers at each CT voxel based on the voxel intensity or based on the intensity pattern (texture) of a local region of voxels. Intensity-based biomarkers are widely used to infer disease patterns, but they are susceptible to image noise and do not provide biomechanical measurements. An example of this type of analysis includes defining emphysematous regions by thresholding the inspiratory scan below −950 Hounsfield units (HU), while the cut-off for air-trapping regions in an expiratory scan is −856 HU. This technique is often referred to as CT densitometry, and is prone to noise arising from changes in volume of acquisition, CT dosage, and scanning parameters [8][9][10]. To overcome these challenges, texture-based CT analysis methods have been developed to assess parenchymal integrity and its relation to disease progression. Sørensen et al. [11] developed a CT texture classification method to assess COPD. A similar method was used to assess pulmonary emphysema using local binary patterns [12]. These methods rely on globally annotated CTs, and lack local functional information related to parenchymal disintegration. A texture-based approach called CALIPER was used to quantify interstitial lung abnormalities and showed good correlations between the fraction of abnormal lung texture and lung functional measurements [13]. More recently, a deep convolutional neural network (CNN) was employed for staging COPD and predicting disease progression [14]. Although this approach was able to learn effective representations associated with COPD severity and progression, it failed to provide insights into the regional distribution of lung function abnormalities.
Several image-registration-based biomechanical biomarkers have been derived from the DVF to understand, diagnose, and stage COPD. Galbàn et al. [15] used CT intensity and image registration to define the parametric response mapping (PRM), a two-dimensional histogram that measures the amounts of normal tissue, small airways disease, and emphysema within the lung. Amelon et al. [16] extracted three biomechanical indices from the DVF of pulmonary inspiration-to-expiration registration to characterize local and global lung deformation: the Jacobian determinant (J); the anisotropic deformation index (ADI), and the slab-rod index (SRI) [16]. Bodduluri et al. [17] evaluated the predictive performance of J, ADI, and strain extracted from image registration and compared their performance with conventional CT texture and densitometry [18,19] features. They demonstrated that for a complex task such as COPD severity prediction, the biomechanical features performed better than the conventional CT texture and density features. The current study differs from that of Bodduluri et al. in many ways. The work of Bodduluri et al. analyzed data from the Genetic Epidemiology of COPD (COPDGene) study whereas the current study analyzes data from the SPIROMICS study. The SPIROMICS study collected CT scans at total lung capacity (TLC) and residual volume (RV) whereas the COPDGene study collected CT scans at functional residual capacity (FRC), as opposed to RV. Registering TLC to RV CT images presents a more challenging problem than registering TLC to FRC because there is a comparatively larger shape change from TLC to RV than TLC to FRC. The current study extends the work of Bodduluri et al. from a single image-registration method to multiple image-registration algorithms. Finally, the current study analyzes nearly twice as many data sets compared to that of the previous study by Bodduluri et al. In another study, Bhatt et al. [20] showed an agreement between registration-based mechanics and spirometric measures of lung function and concluded that dual volume (inspiration-expiration) biomechanical measures are better indicators of declining lung function and emphysema than spirometry alone. Another important work by Bodduluri et al. [21] identified the mean of J to be significantly associated with different measures of lung function including forced expiratory volume in 1 s (FEV 1 ), emphysema, and sixminute walk distance (6MWD). In a recent work by Pan et al. [22], correlations between air-trapping regions or emphysema regions with mean J and mean ADI were found. This work showed that the trends of mean J and ADI decreased monotonically as GOLD stages increased for one registration algorithm. While the aforementioned studies point towards clinical effectiveness of registration-based mechanics, they were limited to global features derived from these biomechanical measures. Recently, Chaudhary et al. [23] showed the predictive capabilities of global statistical features extracted from J, ADI, and SRI across the four registration methods used in this study. The work by Chaudhary et al. [23] differs from the current paper in that it was limited to the development of classification methods for predicting COPD GOLD stage.
Several image registration algorithms have recently been proposed for analyzing CT lung images of COPD subjects, such as the total variation regularization method [24], key-points-based method [25], and Markov random field (MRF)-based discrete optimization [26]. In addition, deep learning image registration techniques bear the potential to perform image registration orders of magnitude faster than traditional iterative image registration techniques [27][28][29][30][31][32].
The goal of the current study was to understand and evaluate the robustness of different lung biomechanical measures across different image registration methods over a varying disease severity. In this study, we compared and contrasted five diverse image registration algorithms with different similarity costs and transformation models to investigate the impact of different algorithms on the biomechanical measures extracted from the DVF. Image registration performance was evaluated using the lung lobe Dice coefficient (LDC) to measure the overlap of the five lobes after registration; the worst 10% surface error (W10SE) to measure how well the lung surfaces aligned after registration; and the vessel tree position error (VTPE) and symmetric closest skeleton error (SCSE) to measure how well the lung vessel trees aligned. The transformations were used to compute the J, ADI and SRI measures [16] in order to analyze and characterize the lung function of individuals with COPD at varying GOLD stages.

Data
The CT images used in this study were part of the SubPopulations and Intermediate Outcome Measures in COPD Study (SPIROMICS) with institutional review board (IRB) approval number 201003733 [6]. SPIROMICS is an ongoing prospective cohort study designed to identify novel clinical stratifications in subjects with COPD. CT images were collected at baseline, one-year, three-year, and five-year follow-ups from fourteen university-based clinical centers across the United States [6].
In this study, we analyzed CT images from current and former smokers across varying degrees of disease severity, as defined by GOLD stage. The GOLD disease staging system for COPD ranges from GOLD 1 (mild), to GOLD 2 (moderate), GOLD 3 (severe) and GOLD 4 (very severe) [5]. GOLD 0 subjects are asymptomatic smokers without airflow obstruction but at risk for COPD due to their smoking history [6]. We analyzed the baseline scans of 245 subjects chosen randomly from the 14 clinical sites, with 49 subjects from GOLD 0, 50 subjects from GOLD 1, 49 subjects from GOLD 2, 50 from GOLD 3, 47 subjects from GOLD 4. At each visit, a pair of 3D breath-hold CT scans were acquired; one at total lung capacity (TLC), and the other at residual volume (RV) [33]. The SPIROMICS imaging protocol uses a CT dose index to standardize exposure across scanners in different sites. The slice collimation was set at 0.6 mm, rotation time 0.5 s and pitch to 1.0. Philips B, GE Standard, and Siemens B35 reconstruction kernels were used [33]. The resolution of the CT scans was approximately 0.6 × 0.6 × 0.5 mm 3 , and the image size was 512 × 512 per slice, with 500 to 600 slices per image. Full details on the scanning protocol are described by Sieren et al. [33]. Figure 1 shows the preprocessing pipeline applied to each CT image volume. All CT image volumes were resampled to be isotropic with spacing 1 × 1 × 1 mm 3 . The image volumes were cropped based on the 3D bounding box containing the union of the lung regions of the inspiration and expiration scans to reduce computer memory requirements and computation time. A multi-resolution convolutional neural network [34] was used to generate a segmentation of the entire lung from the CT volume. The lobes were segmented from the CT volume using the FissureNet deep learning method [35,36]. We used the recommended parameters from the cited papers for the lung and lobe segmentations.

Preprocessing
The lobe segmentations were tessellated into triangles using the surface mesh algorithm [37] implemented in the Computational Geometry Algorithms Library (CGAL) (https://www.cgal.org (accessed on 15 September 2017)) that implements the Variational Shape Approximation (VSA) method [38]. The mesh size was set to 2 mm 2 with smoothing to give accurate representation of the surfaces. The blood vessel trees were segmented from the CT images using the vesselness filter developed by Jerman et al. [39]. In this method, the eigenvalues of the Hessian matrix are computed from the intensity values of the CT image at each voxel location. Tubular structures are identified at voxel locations that have one near-zero eigenvalue and two non-zero eigenvalues with similar magnitudes. Vessels were detected at different scales by computing the Hessian matrix at different scales. For the vessel segmentation, we started from the parameters recommended in the cited papers, then slightly fine-tuned them. A binary vessel segmentation was computed from the vesselness probability map by thresholding. The threshold was computed using Otsu's method. Next, the skeletons were extracted from the binary vessel segmentation using the binary 3D thinning algorithm implemented with Insight Toolkit (ITK) in C++ [40] with default parameters. The TLC and RV CT data sets used in this study were collected in register with each other. Therefore, no affine registration was performed in this study before applying the nonrigid image registration algorithms used in this study.

Image Registration Algorithms
We selected five image-registration algorithms to cover a wide range of image registration algorithms. In general, image registration algorithms consist of three major components: an overall cost function to be minimized, a transformation model, and an optimization method. In this study, we focus on the two former components. In general, the overall cost function is a linear combination of a similarity cost and a regularization cost of the transformation. There are various choices for these two components, and therefore different combinations. In terms of similarity cost functions, we can divide them into two major categories: intensity-based and feature-based. Commonly employed intensity-based similarity cost functions used for matching CT scans include, but are not limited to the sum of squared difference (SSD) [41,42], cross correlation (CC) [43], mutual information (MI) [44], and the sum of squared tissue volume difference (SSTVD) [45][46][47]. Examples of intensity-based featured-matching cost functions are the SSD vesselness measure [45], and SSD of lobar segmentations [48] of the lung. For shape-based feature matching, the shapes could be corresponding or non-corresponding points, curves, and surfaces. A popular example is the Iterative Closest Point (ICP) algorithm, which can be used to match point clouds [49], curves (contours), and surfaces [50]. Another group of shape-matching methods adopt the concept of currents and varifolds, initially proposed by Charon et al. [51], and extensively studied by Durrelman et al. [52][53][54][55][56][57]. The idea of applying currents to pulmonary registration was first proposed by Gorbunova et al. [58], and later Pan et al. extended the study by using varifolds representation [59,60]. The advantages of representing shapes for registration purpose with currents and varifolds are (1) point correspondence is not required to define the distance between landmarks, curves, surfaces, and intensity with currents and varifolds representations; and (2) landmarks, curves, surfaces, and intensity can be unified in one framework.
The transformation model plays a critical role in an image registration algorithm since it determines the properties of the DVF. The aim of this study was to register TLC inspiration scans to RV expiration scans of the lungs. Non-rigid image registrationsalso often referred to as deformable image registration (DIR)-can be classified based on physical models, interpolation theory models, knowledge-based models and task-specific models [61]. In this manuscript, we classify the transformations as either small or large deformation models. The main differences between small and large deformation models are that the large deformation model allows for more curved particle paths from the moving image to target image, and guarantees a one-to-one correspondence between the moving and target images. The large deformation diffeomorphic metric mapping (LDDMM) model [41] refers to the large deformation model throughout this manuscript. The DVFs that are parameterized by basis functions are categorized into small deformation model, such as Fourier series [42], Thin-Plate spline [62], B-spline [46,[63][64][65], etc.

SSTVD
The SSTVD algorithm was selected as one of the algorithms used because it was developed particularly for registering pulmonary CT and won the Computed Tomography Ventilation Imaging Evaluation 2019 (CTVIE19) Grand Challenge at the American Association of Physicists in Medicine (AAPM) 2019 annual meeting. The SSTVD algorithm is currently being used in the Functional Avoidance Radiation Therapy Clinical trial NCT02843568 and has been used to treat more than 50 subjects in that trial. The SSTVD cost function models the local intensity changes seen in the CT images of the lung due to breathing and provides good correspondence information between inspiration and expiration CT scans.
The overall cost of the SSTVD algorithm is the sum of three terms: (1) the sum of squared tissue volume difference (SSTVD) similarity cost for matching pulmonary CT intensity [46,47], (2) the sum of squared vesselness measure difference (SSVMD) similarity cost for matching lung vessels, and (3) a regularization cost on the cubic B-Splines parameterized DVF to guarantee a smooth and plausible transformation [45]. The main idea behind SSTVD is that lung tissue volume remains relatively constant over the breathing cycle while the volume of air in the lung does not. This method can be considered as a lung-specific intensity-based and feature-based combined small deformation algorithm. The cost function of SSTVD algorithm [22,[45][46][47]68,69] is given by: where I f and I m are the fixed and moving tissue density images, respectively; I f vm and I mvm are the fixed and moving vesselness images, respectively; L is the linear elasticity differential operator of the type L = (−α∆ + γ) β I n×n ; Ω ⊂ R 3 is the domain of I f and I m ; The |J ϕ −1 (x)| term is the Jacobian determinant of ϕ −1 and used to accommodate intensity changes in the CT images due to changing air content by ensuring the total tissue volume of the lung remains constant; ϕ −1 (x) is the transformation from the fixed to the moving coordinate system; u(x) = x − ϕ −1 (x) is the displacement field; and γ 1 , γ 2 and γ 3 are weights that control the relative importance of each term in the cost function.

GDR
The GDR algorithm was chosen since it is an example of a large deformation image registration algorithm that uses the SSTVD cost function. The SSTVD and GDR image registration algorithms were chosen to compare and contrast the effect of using small vs. large deformation transformation models. The SSTVD and GDR image registrations also differed in that the SSTVD algorithm included the SSVMD similarity method while the GDR did not.
The Geodesic Density Regression (GDR) algorithm [66,67] finds the geodesic path in image space to align pulmonary CT images. The overall cost of the GDR includes the SSTVD term and the regularization term on the velocity fields, which are used to parameterize the DVF as a flow of diffeomorphism. The differences between GDR in this study and the previous SSTVD method are that the GDR algorithm is based on the LDDMM large deformation model [41], and uses the SSTVD similarity cost but does not use the SSVMD similarity cost. GDR can be categorized as a lung-specific intensity-based large deformation model. The cost function of the GDR is given by where the first term is the SSTVD cost function using the same definitions as in Equation (1). The time-varying velocity field v(t) parameterizes the transformation ϕ by the ordinary differential equation (O.D.E.): where t ∈ [0, 1], φ 0 = Identity and ϕ = φ v 1 (see [41] for details). The second term in Equation (2) is the regularization cost and is used to find the geodesic path in image space between I f and I m . The large deformation model estimates a diffeomorphic transformation ϕ that guarantees the transformation is smooth with a smooth inverse. A diffeomorphic transformation preserves the topological properties of objects in an image, i.e., connected structures remain connected, disjoint structures remain disjoint, and the smoothness of curves and surfaces are preserved. The relative importance of each term compared to each other are controlled by weights γ 1 and γ 2 .

GSyN
The Greedy Symmetric Normalization (GSyN) algorithm [43] is part of the open source ANTs (Advanced Normalization Tools) toolbox [70]. GSyN aligns two images with the deformable diffeomorphic transformation (LDDMM) model [43] in a symmetric scheme. The GSyN algorithm differs from the SSTVD and GDR methods in that it estimates the transformation from moving image to fixed image by deforming both images to the mid-point between the two images. The GSyN algorithm fits into the intensity-based largedeformation model image registration algorithm class. The transformation model of GSyN differs from the GDR algorithm in that GDR uses a time-varying velocity field whereas the GSyN uses a stationary velocity field. The difference is that a stationary velocity field is constant for all time t in Equation (3) and therefore uses less parameters to represent the transformation compared to the non-stationary velocity field parameterization and has fewer degrees of freedom than a time-varying velocity field.
The GSyN algorithm used the normalized cross correlation (NCC) similarity cost function compared to the SSTVD similarity cost used by the GDR algorithm. The NCC cost function does not accommodate changes in CT intensity due to breathing in contrast to the SSTVD cost function. This means that the NCC cost function may not always provide the proper correspondences between the moving and target images. To understand this, assume two regions of the parenchyma have the same intensity in the moving image. Next, assume that one region expanded and the other region did not as a result of breathing. The intensity of the expanded region will become darker due to an increase in air filling the regions (i.e., reduced tissue density) and the other region will show no intensity change. The NCC cost is then forced to do its best to match one intensity in the moving image with two different intensities in the target image.
The overall GSyN cost function is defined as where x is a coordinate of the mid-point coordinate system of I m and I f . Note that we use the fixed and moving notation to describe the GSyN method to be consistent with the other registration algorithms even though both images can be thought of as moving. The transformations ϕ m and ϕ f are from the mid-point coordinate system to the coordinate systems of I m and I f , respectively. Equation (3) is used to parameterize ϕ m and ϕ f by v m and v f , respectively. NCC is the normalized cross-correlation cost function [43]. The relative importance of each term compared to each other are controlled by weights γ 1 and γ 2 .

PVSV
The Pulmonary blood Vessel and lobe Surface Varifold (PVSV) algorithm [60] is a shape feature-based (LDDMM) large deformation registration approach that aligns varifold representations of the lung blood vessel skeletons and lobe surfaces. The process for extracting the blood vessel trees and lobe surfaces used for the PVSV algorithm is the same as described in Section 2.2. The skeleton of each branch of the vessel tree was parameterized by a quadratic line segment using a least squares fitting process. The skeletons of the vessel trees and the lobe surfaces were then represented with delta Dirac varifolds using the procedure in [60]. Varifolds were introduced to the field of computational anatomy to overcome the orientation problem of currents [51]. The advantages of matching shape with a varifolds representation are: it does not require pre-knowledge of the point correspondence between shapes, and it can match shapes with large geometric changes.
The PVSV algorithm was chosen to determine whether meaningful biomechanical properties could be extracted using only lung surfaces and vessel tree represented via varifolds. The advantage of using lung surfaces and vessel trees for registration is that it reduces the amount of information needed to represent the lung and that these features are invariant to CT intensity changes caused by breathing. Another advantage of this approach is that it can accurately match surface and vessel structures but it has the disadvantage of needing to interpolate the transformation between these features.
The vessel tree skeletons and lung surfaces are registered as a shape complex with varifolds representation [57], and the overall PVSV cost function is given by where C m and C f are the blood vessel tree skeletons represented with varifolds, while S m and S f are the lung surfaces with varifolds representation. ϕ * C m is the push forward action of the transformation ϕ on the varifold C m that transports C m into the space of C f , and same for ϕ * S m . The relative importance of each term is controlled by the weights γ 1 , γ 2 and γ 3 .

PLOSL
The PLOSL is a fast unsupervised-learning-based framework developed for 3D pulmonary CT based on population learning (PL) and one-shot learning (OSL) [32]. It uses the same tissue volume preserving and vesselness constraints similarity metrics as the SSTVD method. PLOSL has two training stages: (1) PL which serves as a base model; (2) OSL which generates an individual specific model. The PLOSL algorithm has been shown to produce state-of-the-art deep-learning-based image registration performance [32] with comparable accuracy to traditional iterative image registration algorithms and other deep-learning-based methods for lung CT registration.
The cost function for the PLOSL network consists of three components: SSTVD, SSVMD, and a regularization cost as follows: where the first two terms are the same as the two terms of the SSTVD method, and the third term is a regularization term and the operator ∇ = [ ∂

Image Registration Parameters
The registration algorithms employed in the study were fundamentally different and therefore, the corresponding parameters and multi-resolution setups were different for each method. All the registration algorithms used a coarse-to-fine multi-resolution framework to prevent themselves from becoming stuck in local minima, speed up computation, and achieve better registration performance. Parameters were determined by optimizing image registration performance by hand using a few typical data sets and hundreds of different parameter combinations for each algorithm. We optimized the algorithms independently to achieve the best performance. The following algorithm-specific parameters were used in this work.
The multi-resolution framework employed for the GDR algorithm consisted of three image resolutions equal to 1/8, 1/4 and 1/2 of the original resolution, respectively. The corresponding standard deviation of the deformation Gaussian kernel size for the transformation were 60, 30, and 15 mm, respectively. A total of ten time points were used for the large deformation model, and the weight on the image cost γ 1 = 0.06, and the regularization cost γ 2 = 1 at each resolution.
The multi-resolution framework used for GSyN algorithm consisted of five image resolutions. The input images were smoothed with a Gaussian kernel with standard deviation 4, 3, 2, 1, and 0 mm and down sampled by a factor of 1/16, 1/8, 1/4, 1/2, and 1, respectively.
The multi-resolution framework used for PVSV algorithm consisted of two resolutions. At the coarse resolution, the shape kernel for the vessel tree skeletons and lung surfaces were set to 5 and 15 mm, respectively. At the fine resolution, the shape kernel for the vessel tree skeletons and lung surfaces were set to 3 and 10 mm, respectively. The corresponding deformation kernel sizes were 40 and 30 mm, respectively. The weights on the vessel tree and surface cost terms were set to γ 1 = 10, 000 and γ 2 = 25 at coarse resolution, respectively, and to γ 1 = 4 and γ 2 = 0.015, respectively, at the fine resolution. The weight on the regularization term is γ 3 = 1 for the two resolutions.
The population learning U-Net of PLOSL was trained with 2042 subjects randomly selected from the SPIROMICS database. There was no overlap between training data used in PLOSL and the data sets (247 subjects) analyzed in this study. The weights on the terms of the cost function Equation (6) were set to γ 1 = 1, γ 2 = 1, and γ 3 = 0.01, respectively. The number of one-shot iterations was set to 50. The methods were implemented in TensorFlow version 2.3.0 and run on an NVIDIA GeForce GTX 2080Ti GPU with 11 GB of memory. The ADAM optimizer with a learning rate of 10 −4 was used for training. W10SE is an error measurement of the alignment of two lung surfaces. The W10SE captures the average surface error where two surfaces disagree the most. The process for extracting the lung lobe surface triangulation used for this evaluation is described in Section 2.2. For each vertex p of the triangles that parametrize the surface S 1 , its closet point q on S 2 is found, and the distances between p and q are stored. It is worth noting that q is not necessarily a vertex of S 2 and may instead be on the face of a triangle. As a next step, the distances for all the vertices of S 1 are sorted from large to small, and the mean of the largest 10% distance is computed, which is denoted as W10 S 1 →S 2 . Similarly, we compute the mean of worst 10% distance in the opposite direction, i.e., we rank the closest-point distances computed from each vertex of S 2 to S 1 , then compute the mean of worst 10% again, and denote it as W10 S 2 →S 1 . The W10SE is defined as: W10SE = 1 2 (W10 S 1 →S 2 + W10 S 2 →S 1 ). VTPE measures the error of the lung blood vessel alignment. The process for extracting the blood vessel tree skeleton used for this evaluation is described in Section 2.2. The closestpoint distances from the fixed to the moving segmentation were computed for each voxel in the fixed segmentation and averaged. This process was repeated to compute the average closest-point distance from the moving to the fixed segmentation. The VTPE is defined as the average of these two closest-point distances.

Image Registration Performance
SCSE measures the alignment of the skeletons of the vessel tree. SCSE is similar to VTPE, except the symmetric closest-point distance is computed using the vessel tree skeletons instead of binary vessel segmentations.

Biomechanical Measures
This section briefly summarizes the calculation of J, ADI, and SRI [16]. Let F(x) be the deformation gradient tensor at the point x in the lung, i.e., the gradient of the transformation from the inspiration-to-expiration CT images at x. For each point x in the lung, the quantity F T F is the right Cauchy-Green deformation tensor with corresponding eigenvalues λ 2 i such that λ 1 ≥ λ 2 ≥ λ 3 .
The Jacobian determinant J(x) measures the volume change at x and can be computed as J = λ 1 λ 2 λ 3 . A value of J(x) > 1 corresponds to local volume expansion at x and a value of J < 1 corresponds to local volume contraction at x.
ADI measures the magnitude of anisotropy deformation, i.e., it describes how much stretching occurs along one or two directions in a three-dimensional space. The formula for computing ADI is: If the volume changes uniformly along each direction (isotropically), then ADI equals zero. The larger the ADI value, the more anisotropic the deformation.
SRI is a measure of the nature of anisotropy, i.e., it measures whether the volume change is predominant along one direction (rod-like) or two directions (slab-like). SRI is computed using the formula: SRI = . Note that the value of SRI ranges from zero to one where zero corresponds to a slab-like shape and one corresponds to a rod-like shape. SRI is undefined when ADI is zero. Figure 2 illustrates the relationships between ADI and SRI. The α and β axes are defined as following: [16]. Notice that the terms in the parentheses for both α and β are always positive due to the relationship between the eigenvalues. Therefore, when J is greater than one (i.e., expansion), α and β are both positive (first quadrant of the graph in panel A). Likewise, when J is smaller than one (i.e., contraction), then α and β are both negative (third quadrant). Figure 2. Illustration of regional shape change with respect to ADI and SRI. (A) shows the cuboid shape space as a function of ADI and SRI and how these values are related to α and β in terms similar to polar coordinates. The cuboid shape is a uniform cube when ADI = 0. The cuboid shape is flat when SRI = 0 and rod like when SRI = 1. The shape of a uniform cube expands or contracts as the magnitude of ADI increases. (B) shows the cuboid shape space for particular values of ADI and SRI. Figure from [16].
Panel B shows the first quadrant of panel A. This figure illustrates how the ADI and SRI can be thought of as polar coordinates with ADI as the radius and SRI as the angle. Notice that the deformation becomes more anisotropic as ADI increases. At the origin, the ADI = 0 corresponding to an isotropic deformation.

Registration Performance
The performance of the TLC-to-RV image registration was evaluated using registrations computed from 245 unique individuals. The baseline scans of 247 subjects were randomly chosen from the 14 SPIROMICS clinical sites, with 50 subjects from each of GOLD 0, 1, 2, 3 and 47 subjects from GOLD 4. During analysis, one data set was excluded from the GOLD 0 cohort because the RV scan did not cover the entire lung. In addition, one data set was excluded from the GOLD 4 cohort because the RV and TLC scans were nearly identical, i.e., the subject was imaged twice at the same lung volume. The GDR image algorithm failed (i.e., produced a nonsensical correspondence map) on one of each of the GOLD 1, 2, and 3 data sets. The reason for the three GDR image registration failures was due to large shape differences between the TLC and RV scans which caused the GDR to become stuck in a local minimum during the optimization procedure. The measurements from these three registration results were excluded from the analysis. Figure 3 shows the registration performance for each registration algorithm. The numbers used to generate these plots are summarized in Table 1. The bars at the top, middle, and bottom of the violin plots correspond to the maximum, mean, and minimum of the dataset, respectively. Notice that SSTVD had the largest variance of the five registration methods. This is because the SSTVD method was unable to handle large deformations as well as the other four methods. The performances of GDR and GSyN are similar, and the performances of PVSV and PLOSL are also close and better than the other algorithms. Appendix B shows typical difference images for the five image-registration algorithms across GOLD stage. The results shown in Table 1 and Figure A1 demonstrate that the performance of the five registration algorithms were similar. The five remaining columns correspond to values for the five algorithms: Sum-of-Squared-Tissue Volume-Difference (SSTVD), Geodesic Density Regression (GDR), Greedy Symmetric Normalization (GSyN), Pulmonary blood Vessel and lobe Surface Varifold (PVSV) and Population Learning followed by One Shot Learning (PLOSL), respectively. Note that a larger LDC value is better whereas a smaller value is better for the other three evaluation methods.  Figure 4 shows the mean, standard deviation (std), entropy, and root mean squared (RMS) of J, mean of ADI, and entropy of SRI computed with respect to GOLD stage. The numbers used to generate these graphs are summarized in Table A1. This figure shows that these measures increased or decreased monotonically as GOLD increased and that these trends were consistent for all registration algorithms.  Figure 5 shows the J images for typical subjects selected from each GOLD stage. Each row corresponds to a given subject and each column shows the J image computed by a different registration algorithm. Notice that the J images in each row (i.e., for a given subject) appear similar to each other, demonstrating some degree of invariance to registration algorithm. Further, note that along each column the mean J decreases (i.e., the overall color transitions to darker blue) as GOLD stage increases, regardless of registration algorithm. Figure 5. Typical Jacobian determinant images for each GOLD stage and registration algorithm. Dark red regions correspond to regions of large local expansion and dark blue regions correspond to regions with little to no expansion. This figure shows that each algorithm produced similar results for each GOLD stage and that the amount of local lung expansion decreased as the GOLD stage increased. Figure 6 shows the distribution of the average J computed within the whole lung and each lobe for each registration algorithm. The numbers used to generate these graphs are summarized in Table A2. These plots show that the average Jacobian determinant decreased as the GOLD stage increased globally, on a lobe-by-lobe basis, and that these trends held for all the registration algorithms. Figure 7 shows Bland-Altman plots of the mean J and compares the measurements generated by all five image-registration algorithms. These Bland-Altman plots show the degree of agreement between the measurement methods on an individual basis and support the results shown in the violin plots (see Figures 3, 4 and 6). Each graph shows four Bland-Altman plots corresponding to PLOSL vs. SSTVD, PLOSL vs. GSyN, PLOSL vs. GDR, and PLOSL vs. PVSV, respectively. Note that the horizontal axis is different for each GOLD stage plot since the average amount of lung deformation from expiration to inspiration decreased as GOLD stage increased. The downward trend in the GOLD 0 plot as the average J increases implies that the GSyN, GDR, PVSV, and PLOSL methods all diverged from the SSTVD method for subjects that had more RV-TLC expansion. A similar downward trend is evident in the GOLD 1 plot. The GOLD 2, 3, and 4 plots show less disagreement between the five registration methods. This can be explained by the fact that there was much less RV-TLC expansion (note the ranges on the horizontal axes) for GOLD 2, 3, and 4 subjects compared to GOLD 0 and 1 subjects. These plots further confirm the consistent trends detected by the five registration algorithms that the average J decreases as GOLD stage increases (as shown in the x-axis).

Discussion and Conclusions
This paper investigated the effects of using different image registration algorithms to extract lung biomechanical features from inspiration and expiration CT images of subjects with COPD. In this work, we evaluated both small deformation (SSTVD and PLOSL) and large deformation (GDR, GSyN, and PVSV) image registration algorithms. The two main differences between large and small transformation models are that a large transformation model allows for curved particle paths from the moving image to target image, and large transformation models guarantee a one-to-one correspondence between the moving and target images. Figure 6 shows that the average J of the SSTVD method was smaller than those of the other four registration algorithms. One explanation for this is that the regularization used for the SSTVD algorithm produced smoother deformations than the other approaches and was therefore unable to accommodate as much local deformation. In addition, these results demonstrate that a deep-learning-based image registration method (PLOSL) can have similar average performance to traditional iterative image registration algorithms (GDR, GSyN, and PVSV).
In total, five biomechanical measures (mean, standard deviation, and entropy of J, mean of ADI, and energy of SRI) were shown to follow a statistically significant monotonically increasing or decreasing trend for the entire lung and on a lobe-by-lobe basis as GOLD stage increased. These trends held for all five registration algorithms. This study showed that the expansion between RV and TLC decreases as GOLD stage increases in the whole lung and within each lobe of the lung, which may potentially allow for detection, monitoring, and regional evaluation of COPD. These findings support the work of Ding et al. [71], who showed that lung function and mechanics vary regionally on a lobe-by-lobe basis. This work also complements the work by Bhatt et al. [3], who reported spatial correlations between regions of functional decline identified by the Jacobian determinant and regions of emphysematous lung tissue.
The SSTVD, GDR, and PLOSL algorithms employed in this study were implemented by our group. The GSyN used was from the ANTs [70] and PVSV registration results used the Deformetrica [72] software version 3.0.
The SSTVD, GDR, GSyN, and PVSV algorithms were run on a single high-memory Argon Phase 1 compute node on the University of Iowa High-Performance Argon Cluster (https://hpc.uiowa.edu/ (accessed on 10 October 2022)). The node has two Xeon E5-2680v4 (28 Cores at 2.4 GHz) for a total of 56 compute slots, 512 GB RAM, 1 Gbps Ethernet, 100 Gbps Omnipath, and HPL benchmark performance of 766.1 GFlops, and cost approximately USD 7500. The PLOSL algorithm was run on an NVIDIA Tesla 2080Ti GPU. The computation time for registration varied depending on the amount of deformation required. In general, GOLD 4 cases demonstrated smaller shape change between inspiration and expiration compared to GOLD 0 cases. On average, the SSTVD, GDR, GSyN, and PVSV took an hour to register a 3D pair of images. The GDR algorithm required the most computational resources and took almost 2 h for registrations that required extensive deformation. In contrast, the deep learning PLOSL algorithm was 1.5 orders of magnitude faster than the other four registration methods. Once trained, the PLOSL algorithm took approximately 1.25 min to register a 3D pair of images. Thus, for this study, the PLOSL algorithm seems to be the preferred registration algorithm due to its faster run time and comparable performance. The only caveat to using the PLOSL algorithm is that it needed a large number of data sets for training. In the absence of training data, one of the other registration algorithms should be selected.
The SSTVD and GDR both used the same loss function. However, these methods differ in how the transformations are parameterized. The loss function defines what features correspond for matching while the transformation parameterization defines the amount of deformation that is allowed when matching. The SSTVD transformation is parameterized by small deformation B-splines whereas GDR transformation is parameterized by a highdimensional temporal and spatially varying velocity vector field. The result of this is that the GDR algorithm has orders of magnitude more degrees of freedom to match two image volumes compared to the SSTVD algorithm. Having more degrees of freedom is not always better if they are not needed. This paper shows that both algorithms performed similarly for the current task of registering inspiration-to-expiration CT lung volumes. Thus, one can conclude that the extra degrees of freedom, extra computation time and additional computational resources for this task were not needed.
The results presented in this paper demonstrate that five different and varied registration methods can all produce similar measures of lung biomechanics. Such biomechanical measures, when analyzed on a local level, further strengthen the link between spirometry and quantitative CT and may help explain the functional changes missed using spirometry or CT density thresholds alone [3]. There are several current and potential implications of our work. Existing measures of lung function and structure are limited by the lack of spatial information and by the lack of sensitivity to detect early changes. These image-registrationbased biomechanical measures are robust in detecting early abnormalities that have been linked to important clinical outcomes, including respiratory quality of life, functional capacity, and all-cause mortality [21]. Spatial information on a lobar and sub-lobar level are increasingly being recognized as important in patient selection for interventional COPD therapies such as bronchoscopic and surgical lung volume reduction. These measures may provide additional information, beyond density-based measures, about the health of the target and ipsilateral non-target lobes for such procedures. Biomechanical measures may also help inform patient and target selection for radiation therapy and for lobar and segmental lung resection in lung cancer. Using spirometry, CT density measures, and biomechanical features together may help provide mechanistic insights to explain changes in lung function in COPD and related lung diseases.    Figure A1 shows difference images associated with the registrations used to compute the Jacobian images shown in Figure 5. This figure shows typical difference images for all five image registration algorithms across GOLD stages. All five image-registration algorithms performed equally well with respect to difference images except for three failures of the GDR algorithm. The GDR failed because it contained too many degrees of freedom and became stuck in local minima. The lungs were segmented from the original CT before registration. The most obvious registration errors occurred near the lung mask boundaries. Subtle correspondence errors in the lung parenchyma are difficult to observe in these difference images since a zero gray-scale intensity does not imply correct correspondence. Figure A1. Difference images for the registrations used to compute the Jacobian determinant images shown in Figure 5. The first column shows a 2D coronal cross section of the lungs before registration. The last five images in each row show the same coronal cross section of the lungs that resulted from each registration algorithm. Gray corresponds to a zero difference. Black corresponds to negative values and white corresponds to positive values. A lung mask was used for registration and is the reason why the area outside the lungs is gray.