Nonrigid Medical Image Registration Using an Information Theoretic Measure Based on Arimoto Entropy with Gradient Distributions

This paper introduces a new nonrigid registration approach for medical images applying an information theoretic measure based on Arimoto entropy with gradient distributions. A normalized dissimilarity measure based on Arimoto entropy is presented, which is employed to measure the independence between two images. In addition, a regularization term is integrated into the cost function to obtain the smooth elastic deformation. To take the spatial information between voxels into account, the distance of gradient distributions is constructed. The goal of nonrigid alignment is to find the optimal solution of a cost function including a dissimilarity measure, a regularization term, and a distance term between the gradient distributions of two images to be registered, which would achieve a minimum value when two misaligned images are perfectly registered using limited-memory Broyden–Fletcher–Goldfarb–Shanno (L-BFGS) optimization scheme. To evaluate the test results of our presented algorithm in non-rigid medical image registration, experiments on simulated three-dimension (3D) brain magnetic resonance imaging (MR) images, real 3D thoracic computed tomography (CT) volumes and 3D cardiac CT volumes were carried out on elastix package. Comparison studies including mutual information (MI) and the approach without considering spatial information were conducted. These results demonstrate a slight improvement in accuracy of non-rigid registration.


Introduction
Volume registration is an essential task of image processing, especially in medical field, such as aiding diagnosis, surgical applications, and image-guided radiation therapy [1,2]. The images to be registered are generally obtained at different times and from different imaging sensors, namely, multi-modality imaging. Different medical imaging modalities could provide various and complementary information. For instance, CT and MRI display the anatomic structures of an organ, while positron emission tomography (PET) and single photon emission computed tomography (SPECT) provide the functional and metabolic information. Therefore, in clinic, these multi-modal volumes are often registered and fused together, in this way, much complementary information derived from

Preliminaries
For this section, we briefly review the theoretical concept of information theory, and then introduce the Arimoto entropy and JAD, along with studying their properties. In addition, the gradient distributions of reference image and float image are constructed and a distance of them is derived.

Shannon Entropy and Mutual Information
For an arbitrary random variable X(x 1 , x 2 , . . . , x N ), with its probability distributions p(x 1 , x 2 , . . . , x N ), the Shannon entropy of X is used to measure the amount of average uncertainty included in this random variable, which is also employed to measure the account of information provided by this random variable. Then, considering another random variable Y, H(X|Y) is remarked as the conditional entropy of X when Y is known. The reduction in uncertainty due to Y is called the MI. MI of X and Y is defined by where p(x) and p(y) denote marginal probability of two random variables, as well as p(x, y) being the joint distribution of them. MI is applied as a measure of the dependence between X and Y. MI is symmetric in X and Y and always nonnegative [17].

Jensen Arimoto Divergence
For a random variable X, and P(p 1 , p 2 , . . . , p N ) is probability distributions on X. JAD is defined to be [15] where A α (·) denotes Arimoto entropy and ω i is a weighted vector to constrain ω i ≥ 0 and ∑ N i=1 ω i = 1. In the following, we review the properties of JAD. Proposition 1. JAD has these properties of non-negativity, symmetry. Also, JAD is identical to 0 when and only when all of the probability distributions are the same as each other. The proof has been reported in [15].
Li et al. pointed that a distance must fulfill four requirements [20]. JAD does not meet the triangle inequality, so JAD is not a true distance metric. Nonetheless, JAD can still be adopted to measure the disparity among the probability distributions of two random variables.

Proposition 2.
The JA divergence has the maximum when p 1 , p 2 , . . . , p N are degenerate distributions, where p i = δ ij = 1 when i = j and 0 otherwise. The proof had been provided in [21], and the maximum equals to A α (ω).

Gradient Distributions Distance
Given the reference image R and float image F, ∇F, and ∇R represent the gradients of F and R, along with q(∇F) and p(∇R) representing the gradient distributions of F and R. Kullback-Leibler divergence (KLD) can be used to calculate the distance of q(∇F) and p(∇R) as where x denotes any point of image gradient, x = [x, y, z] T . KLD is also known as relative entropy, measuring the diversity of two probability distributions. In (10), we use the convention that 0·log(0/0) = 0. In other word, when R and F are completely registered, KLD between gradient distributions defined in (10) achieves the minimum value. Considering the spatial transformation T µ , µ denoted by the parameters of transformation model, the gradient distribution distance of F and the transformed F is given by where ∇F d (T µ (x)) and ∇R d (x) represent the gradients of transformed float image and reference image, and d is the dimension of the images to be registered. Subsequently, the Parzen method is employed to estimate the gradient distributions. Denote β (0) and β (3) by zero-order and three-order B-spline, respectively. In the process of image registration, the transformation parameters µ does not affect the reference image, the gradient of reference image is also constant in registration process. Consequently, the gradient of reference image can be calculated before registration to improve the implementation efficiency. In addition, a zero-order where Ω is volume domain to estimate probabilities. V denotes the number of voxels of Ω domain, as well as d being the image dimension. r i represents intensity levels of ∇R d (x), and '∇' is the gradient operator. The three-order B-spline is adopted to compute the gradient distributions ∇F d (T µ (x)) of transformed float image, with the probability density function of ∇F d (T µ (x)) shown as In Equations (12) and (13), ∆b R and ∆b F are the widths of bins. ∇R 0 d and ∇F 0 d is the minimum values in two images, and f j represents intensity levels of ∇F d (T µ (x)). Substituting (12) and (13) into (11), we can obtain the following formula.
where H is the Shannon entropy. In this paper, KLD of gradient distributions will be applied as a distance term in medical image registration, regularizing that the gradient distribution of float image is similar to the gradient distribution of reference image.

Description of Proposed Nonrigid Registration Method
The details of the registration method using a normalized information theoretic measure based on Arimoto entropy with gradient distributions is described. Firstly, an appropriate transformation model needs to be selected, where the registration criteria (objective function or cost function) and optimization algorithm is applied to optimize the criteria. Finally, the images to be registered are aligned using an optimal solution obtained by the optimization scheme. Figure 1 displayed the block diagram of our nonrigid registration framework. as well as d being the image dimension. ri represents intensity levels of ∇Rd(x), and '∇' is the gradient operator. The three-order B-spline is adopted to compute the gradient distributions ∇Fd(Tµ(x)) of transformed float image, with the probability density function of ∇Fd(Tµ(x)) shown as In Equations (12) and (13), ΔbR and ΔbF are the widths of bins.  (12) and (13) into (11), we can obtain the following formula.
where H is the Shannon entropy. In this paper, KLD of gradient distributions will be applied as a distance term in medical image registration, regularizing that the gradient distribution of float image is similar to the gradient distribution of reference image.

Description of Proposed Nonrigid Registration Method
The details of the registration method using a normalized information theoretic measure based on Arimoto entropy with gradient distributions is described. Firstly, an appropriate transformation model needs to be selected, where the registration criteria (objective function or cost function) and optimization algorithm is applied to optimize the criteria. Finally, the images to be registered are aligned using an optimal solution obtained by the optimization scheme. Figure 1 displayed the block diagram of our nonrigid registration framework.

Gradient distributions p(∇R(x)), R(x)
Stop?  Figure 2a,b depict the corresponding planes in two 3D MR images, which account for T1-weighted MR and T2-weighted MR. Non-rigid registration is formulated as the process of searching for the optimal spatial deformation function of reference image R and float image F as

Formulation
where g(x; µ) is the deformation function, µ denotes the vector of deformation parameters, with x and y being the coordinates of arbitrary point in R and F, respectively. We can formulate the non-rigid registration of F to R as where D represents a dissimilarity measure that can achieves its minimum in registration of R(x) and F(g(x; µ)).
Entropy 2018, 20, x 6 of 15 weighted MR and T2-weighted MR. Non-rigid registration is formulated as the process of searching for the optimal spatial deformation function of reference image R and float image F as ( ) where g(x; µ) is the deformation function, µ denotes the vector of deformation parameters, with x and y being the coordinates of arbitrary point in R and F, respectively. We can formulate the non-rigid registration of F to R as where D represents a dissimilarity measure that can achieves its minimum in registration of R(x) and F(g(x; µ)). However, the process of image registration is an ill-posed issue, and a penalty term need to be incorporated to obtain a smooth transformation. Considering the regularization term, the cost function E is expressed by

Transformation Model
Clinically, there exist some large deformations between medical images. Therefore, a nonrigid transformation is generally employed to deal with organ or tissue deformation. Figure 2c,d display the deformation fields and vectors of the nonrigid transformation between Figure 2a,b. The free-form deformations (FFD) model can preferably described local deformations between medical images. Therefore, cubic B-splines are employed to constructed FFD model [22] to simulate this elastic deformation.
In a 3D image, Φ is a mesh with the size of nx × ny × nz, as well as the control points [ωi, ωj, ωk] T , and δ is the size of spacing. Then, 3D deformation function of x = [x, y, z] T could be defined as ( ) x; where µijk is the vector of deformation coefficients, and β (3) is the third-order B spine function.

Registration Criteria
The dissimilarity measure D is the most significant part of objective functions, which is used to measure the difference between two images to be registered. When D achieve the minimum value, the similarity of two volumes is maximized, resulting in the two volumes completely registered. We have employed JAD as similarity measure to register medical images in the presence of non-rigid transformation, in which a negative sign is assigned to the JAD to construct the dissimilarity measure [15]. In this paper, according to Proposition 2 in Section 2.3, a normalized dissimilarity measure based on JAD is introduced. Its definition is given as However, the process of image registration is an ill-posed issue, and a penalty term need to be incorporated to obtain a smooth transformation. Considering the regularization term, the cost function E is expressed by

Transformation Model
Clinically, there exist some large deformations between medical images. Therefore, a nonrigid transformation is generally employed to deal with organ or tissue deformation. Figure 2c,d display the deformation fields and vectors of the nonrigid transformation between Figure 2a,b. The free-form deformations (FFD) model can preferably described local deformations between medical images. Therefore, cubic B-splines are employed to constructed FFD model [22] to simulate this elastic deformation.
In a 3D image, Φ is a mesh with the size of n x × n y × n z , as well as the control points [ω i , ω j , ω k ] T , and δ is the size of spacing. Then, 3D deformation function of x = [x, y, z] T could be defined as where µ ijk is the vector of deformation coefficients, and β (3) is the third-order B spine function.

Registration Criteria
The dissimilarity measure D is the most significant part of objective functions, which is used to measure the difference between two images to be registered. When D achieve the minimum value, the similarity of two volumes is maximized, resulting in the two volumes completely registered. We have employed JAD as similarity measure to register medical images in the presence of non-rigid transformation, in which a negative sign is assigned to the JAD to construct the dissimilarity measure [15]. In this paper, according to Proposition 2 in Section 2.3, a normalized dissimilarity measure based on JAD is introduced. Its definition is given as In [15], JAD is expressed by where f = (f 1 , f 2 , . . . , f M ) and r = (r 1 , r 2 , . . . , r M ) are the intensity values in F(g(x; µ)) and R(x). Also, M is bins number. Also, p(f j |r i ) is the conditional probability. JAD and A α (ω) are substituted into (19). In consequence, the dissimilarity measure D in (19) is rewritten as where M is the number of bins. To address the problem of nonrigid registration, the smooth deformation needs to be acquired by regularizing the deformation model. Incorporated with the regularization term, the objective function E is rewritten by E(F(g(x;µ)), R(x)) = D(F(g(x;µ)), R(x)) + λS(g(x;µ)) where D is the dissimilarity measure defined in (21), and S is the regularization term, with its expression given as follows.
However, objective function shown in (22) does not take into account the spatial information between voxels. To deal with the issue, the distance between two gradient distributions q(∇F(g(x; µ))) and p(∇R(x)) displayed in (14) is introduced to (22). As a result, the nonrigid registration process is expressed by (x;µ)), R(x)) + λ 1 S(g(x;µ)) +λ 2 KLD(∇F(g(x;µ))||∇R(x))} where KLD represents gradient distribution distance, as well as λ 1 and λ 2 being weight parameters, balancing the tradeoff among a dissimilarity measure D, a regularization S, and a distance term KLD.

Optimization
Newton-Raphson algorithm has been widely exploited, in which second-order derivatives can show better convergence [23] compared with these strategies based on first-order gradient. L-BFGS [24] Entropy 2019, 21, 189 8 of 15 does not calculate second-order information. Thus, a high computation efficiency can be achieved. A second-order Taylor approximation [16] of E with respect to µ is given as where ∆µ is the increment of µ, ∇ is gradient operation. The deformation parameter µ of the L-BFGS optimization algorithm is updated as In the sequel, the derivative of objective function E with respect to µ need to computed.

∂E ∂µ
The pseudo code of our registration approach is displayed in Algorithm 1.
To solve the optimization process, we need calculate the analytical gradient of the objective function E. Traditionally, the probability distributions expressed in (21) was not continuous. Hence, the continuous probability density function (pdf) needs to be estimated by Parzen-window method. The continuous marginal and joint pdfs of two images to be registered have been calculated [15]. The continuous expression of gradient distribution distance has been also provided in Section 2.4. Equalization (18) is substituted into (23), the continuity of smoothness term is acquired. Consequently, the objective function E is continuous and its analytical derivative with respect to µ can be calculated.

Derivative of the Objective Function
The objective function defined in (24) includes dissimilarity measure D, a regularization term S, and a distance term KLD. The derivative of D is deduced as According to (20), we obtain d[D(F(g(x;µ)), R(x))] dµ = 1 where ∂ p( f j , r i )/∂µ represents derivative of estimated joint probability. The derivative of with β (0) and β (3) being the zero-order B-Splines and the derivative of the three-order B-Splines, respectively. R 0 and F 0 are the minimal intensities in R(x) and F(g(x; µ)), as well as ∂F(t)/∂t being the gradient of the deformed float image F(g(x; µ)), ∂(g(x; µ))/∂µ can be estimated by FFD model. To obtain the derivative of the penalty term S, we rewrite (23) as where x represents the points in image region, and V denotes the number of pixels. The derivative of S has been provided by Staring and Klein [25], where ∂ 2 g(x;µ)/∂x i ∂x j denotes Hessian matrix of deformation function g(x; µ), ∂ ∂ 2 T/∂x i ∂x j /∂µ is the Jacobi of Hessian matrix. Next, we calculate the derivative of KLD, where p d (r i ) and q d ( f j ) represent the gradient distributions of the transforms float image and reference image, respectively. The derivative of q d ( f j ) is calculated as where β (3) is derivative of three-order B-spline, and ∇ 2 F d (t) represents the second-order gradient of F(g(x; µ)), as well as ∂(g(x;µ))/∂µ being the derivative of deformation function g(x; µ) with respect to the parameter µ. Substituting (12), (13), and (35) into (34), we can obtain the derivative of gradient distribution distance. In the terms of (29), (33), and (34), the derivative of E will be easily calculated.

Experiments and Results
To evaluate the registration method using the normalized JAD with gradient distribution (NJAD-GD), we designed several groups of tests and performed on simulated and real 3D data, respectively. In Section 4.1, the experimental data is depicted, including simulated and real medical images. The non-rigid registration of simulated MR volumes is performed in Section 4.2. The tests on real 3D thoracic CT images and 3D cardiac data are implemented, and the experimental results are shown in Sections 4.3 and 4.4, respectively. Our nonrigid registration algorithm employing JAD and gradient distributions was implemented in the elastix package [26].

Experimental Data
In this paper, simulated brain MR volumes, thoracic CT volumes and real 3D cardiac CT images were exploited as experimental data. The detailed descriptions of brain MR and 3D thoracic CT images have been reported in [15]. Additionally, non-rigid tests were also performed on twelve 4D cardiac CT sequences acquired from twelve patients. Each of 4D CT sequence consists of 10 3D cardiac CT images, which were obtained from one whole cardiac cycle of one patient. These CT images have 256 × 256 pixels along axial direction.

Experimental Data
In this paper, simulated brain MR volumes, thoracic CT volumes and real 3D cardiac CT images were exploited as experimental data. The detailed descriptions of brain MR and 3D thoracic CT images have been reported in [15]. Additionally, non-rigid tests were also performed on twelve 4D cardiac CT sequences acquired from twelve patients. Each of 4D CT sequence consists of 10 3D cardiac CT images, which were obtained from one whole cardiac cycle of one patient. These CT images have 256 × 256 pixels along axial direction. Figure 3 exhibits 10 3D cardiac CT volumes of one 4D CT sequence. It is obviously observed that some elastic deformations are existed between 10 images.

Nonrigid Registration of Simulated Brain Images
Simulated brain volumes were firstly used to design the elastic alignment experiments. Furthermore, we employ a multiresolution hierarchical strategy with three levels to carry out these non-rigid tests. Also, a comparison with JAD without gradient distribution and MI is also reported.
We selected 60 warping indexes (see parameters m of the warping function in [15]), which were yielded randomly from the interval [1,7]. Consequently, 60 float images were produced based on the 60 deformations for each pair of test volumes and 540 nonrigid trials of three pairs of brain MR volumes for NJAD-GD, JAD, and MI algorithms in total.
To assess quantitatively test results of these trails, we exploit registration error as the evaluation standard. Here, the registration error is defined as the difference of true values that can be calculated by warping indexes and the obtained values by optimization strategy. In the registration trails of brain images, the involved parameters are set as follows: the nonextensive parameter α = 1.5, bins M = 16, the number of random samples N = 2000, δ = 20 × 20 × 20. The weighting parameters λ1 = 0.005 and λ2 = 0.001 can provide a good tradeoff among three terms: D, S, and gradient distribution term KLD in the objective function E. Figure 4 shows the test results of all 540 non-rigid registrations. From Figure 4, the NJAD-GD registration algorithm could result in the lower errors of three pairs of test volumes compared to other two approaches.

Nonrigid Registration of Simulated Brain Images
Simulated brain volumes were firstly used to design the elastic alignment experiments. Furthermore, we employ a multiresolution hierarchical strategy with three levels to carry out these non-rigid tests. Also, a comparison with JAD without gradient distribution and MI is also reported.
We selected 60 warping indexes (see parameters m of the warping function in [15]), which were yielded randomly from the interval [1,7]. Consequently, 60 float images were produced based on the 60 deformations for each pair of test volumes and 540 nonrigid trials of three pairs of brain MR volumes for NJAD-GD, JAD, and MI algorithms in total.
To assess quantitatively test results of these trails, we exploit registration error as the evaluation standard. Here, the registration error is defined as the difference of true values that can be calculated by warping indexes and the obtained values by optimization strategy. In the registration trails of brain images, the involved parameters are set as follows: the nonextensive parameter α = 1.5, bins M = 16, the number of random samples N = 2000, δ = 20 × 20 × 20. The weighting parameters λ 1 = 0.005 and λ 2 = 0.001 can provide a good tradeoff among three terms: D, S, and gradient distribution term KLD in the objective function E. Figure 4 shows the test results of all 540 non-rigid registrations. From Figure 4, the NJAD-GD registration algorithm could result in the lower errors of three pairs of test volumes compared to other two approaches.

eriments of 3D Thoracic CT Images
thoracic CT volumes were chosen as the test images to carry out non-rigid registr olumes consist of four 4D sequences, and each of them includes 10 3D volumes. A thre entation scheme was still employed to decrease registration accuracy and im tation efficiency. e denote the 10 volumes from each 4D sequence by T00-T90, in which the maximal inh aximum exhalation are included, with indicated by T10 and T60, respectively. Th d the following experiments: in each 4D sequence, the T60 frame is applied as reference residual nine frames are chosen as float image, leading to nine non-rigid tests. Hence, 3 ic alignments were yielded in total for four 4D CT sequences. We also compared the g NJAD-GD and JAD without considering spatial information. Finally, 72 elastic regis ere conducted for two methods. In order to quantify the test errors, target registratio nd Hausdorff distance meansure (HDM) were calculated. HDM is a widely-used mea te the distance of two clouds of points. In the 3D thoracic CT registration, the manually m rks can be applied to calculate HDM of two images. ures 5 and 6 demonstrate the registration results of 72 tests, along with TREs tion and after alignment. Figure 7 illustrates the box-and-whisker plots of HDM va CT sequences. It is observed from these results that the registration errors applying

Experiments of 3D Thoracic CT Images
3D thoracic CT volumes were chosen as the test images to carry out non-rigid registrations. These volumes consist of four 4D sequences, and each of them includes 10 3D volumes. A threelevel implementation scheme was still employed to decrease registration accuracy and improve computation efficiency.
We denote the 10 volumes from each 4D sequence by T00-T90, in which the maximal inhalation and maximum exhalation are included, with indicated by T10 and T60, respectively. Then, we designed the following experiments: in each 4D sequence, the T60 frame is applied as reference image and the residual nine frames are chosen as float image, leading to nine non-rigid tests. Hence, 36 trails of elastic alignments were yielded in total for four 4D CT sequences. We also compared the results adopting NJAD-GD and JAD without considering spatial information. Finally, 72 elastic registration tests were conducted for two methods. In order to quantify the test errors, target registration error (TRE) and Hausdorff distance meansure (HDM) were calculated. HDM is a widely-used measure to calculate the distance of two clouds of points. In the 3D thoracic CT registration, the manually marked landmarks can be applied to calculate HDM of two images. Figures 5 and 6 demonstrate the registration results of 72 tests, along with TREs before registration and after alignment. Figure 7 illustrates the box-and-whisker plots of HDM values of four 4D CT sequences. It is observed from these results that the registration errors applying NJAD-GD algorithm are less than these obtained by the method based on JAD without gradient distribution.
In implementation of experiments, the nonextensive parameter α = 1.5, bins M = 16, the number of random samples N = 8000, the spacing of mesh points δ = 20 × 20 × 20. The weighting parameters λ 1 = 0.005 and λ 2 = 0.001. calculate the distance of two clouds of points. In the 3D thoracic CT registration, the manually marked landmarks can be applied to calculate HDM of two images. Figures 5 and 6 demonstrate the registration results of 72 tests, along with TREs before registration and after alignment. Figure 7 illustrates the box-and-whisker plots of HDM values of four 4D CT sequences. It is observed from these results that the registration errors applying NJAD-GD algorithm are less than these obtained by the method based on JAD without gradient distribution.

Registration of 3D Cardiac CT Image
The cardiac CT data consists of 12 groups of 4D image sequence, and each of which includes 10 3D images acquired from one whole cardiac cycle. One cardiac cycle consists of the phase of systole and phase of diastole. In each 4D CT sequence, two 3D images with the maximum deformation were employed as the test images, and 12 nonrigid registration experiments were carried out adopting NJAD-GD approach. Figure 8 illustrates the checkboard of 12 examples adopting our non-rigid framework (α = 1.50) for 12 4D CT sequences. As it can be seen, the registration algorithm based on gradient distribution demonstrates the accuracy results. In these tests, a multiresolution scheme with three levels was also exploited to implement these nonrigid registrations. Due to the large deformation, the number of random samples N and the spacing of mesh points δ were set to 10,000 and 10 × 10 × 10, with other parameters being as follows: bins M = 16, the weighting parameters λ 1 = 0.005 and λ 2 = 0.001, the maximum number of iterations of the limited memory BFGS scheme NMAX = 200.

Conclusions
In this work, we review the definition and properties of Arimoto entropy, with an information measure based on Arimoto entropy, called JAD. The gradient distributions of reference image and float image are constructed and a distance between them is derived. Additionally, a normalized dissimilarity measure based on JAD was presented. A nonrigid registration method exploiting the normalized measure with gradient distributions is proposed.
Arimoto entropy is regarded as a generalized form of the classical Shannon entropy. In the aforementioned section, it is proofed that the JAD measure is equal to MI when α tends to 1. We adopted FFDs as the parameter space for non-rigid registration, along with objective function E including three elements: the normalized JAD as the dissimilarity measure, a regularization to acquire the smooth deformation and a distance term of the gradient distributions.