Two Phase Non-Rigid Multi-Modal Image Registration Using Weber Local Descriptor-Based Similarity Metrics and Normalized Mutual Information

Non-rigid multi-modal image registration plays an important role in medical image processing and analysis. Existing image registration methods based on similarity metrics such as mutual information (MI) and sum of squared differences (SSD) cannot achieve either high registration accuracy or high registration efficiency. To address this problem, we propose a novel two phase non-rigid multi-modal image registration method by combining Weber local descriptor (WLD) based similarity metrics with the normalized mutual information (NMI) using the diffeomorphic free-form deformation (FFD) model. The first phase aims at recovering the large deformation component using the WLD based non-local SSD (wldNSSD) or weighted structural similarity (wldWSSIM). Based on the output of the former phase, the second phase is focused on getting accurate transformation parameters related to the small deformation using the NMI. Extensive experiments on T1, T2 and PD weighted MR images demonstrate that the proposed wldNSSD-NMI or wldWSSIM-NMI method outperforms the registration methods based on the NMI, the conditional mutual information (CMI), the SSD on entropy images (ESSD) and the ESSD-NMI in terms of registration accuracy and computation efficiency.


Introduction
Non-rigid image registration is one of the most challenging problems in medical image processing. Given two medical images, the objective of the registration process is to find a reasonable non-rigid transformation, such that a transformed version of the float image is similar to the reference one. Despite phenomenal progress in medical image resolution, one modality is often not sufficient to produce a precise diagnosis since different imaging modalities differ in interpreting the anatomy, tissue and organ that they may capture, so multi-modal medical image registration is useful for relating clinically significant information from different images. For example, it can be used to improve the diagnostic tasks and image-guided interventions. However, accurate non-rigid multi-modal registration is highly challenging because of intensity variations and non-rigid transformations between images.
In general, image registration involves three main components: deformation model, similarity metric and optimization strategy. In the non-rigid image registration, the deformation model can be divided into two main categories [1].The first category is originated from physical models of materials including elastic body models [2], fluid flow models [3] and diffusion models [4]. Another category is relevant to interpolation and approximation theory including radial basis functions [5], elastic body splines [6], free-form deformations (FFD) [7,8].
As regards optimization strategy, numerous optimization methods have been proposed to optimize the parameters of the deformation model in the non-rigid image registration. Examples include gradient descent, Newton's method, Powell's method and discrete optimization [9,10]. Especially, the Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithm [11] has been used to handle the large number of variables and constraints of registration and is included in the Insight Segmentation and Registration Toolkit (ITK) [12][13][14]. Apart from deformation model and optimization strategy, similarity metrics have received much attention in the field of image registration. Many similarity metrics have been proposed for different applications [15]. The well-known metrics such as sum of squared differences (SSD) and sum of absolute difference (SAD) have been successfully used for mono-modal image registration, but they are not appropriate for direct application to multi-modal image registration because it is required that the image intensities at corresponding points of two images should be similar [16]. To address this problem, the generalized divergence measure based on Renyi Entropy [17], Kullback-Leibler Divergence [18], mutual information (MI) and cross-cumulative residual entropy (CCRE) [19] have been proposed. Among these metrics, MI has been investigated in-depth and widely applied to multi-modal image registration. MI was firstly introduced to realize the rigid registration of multi-modal scans by Maes et al. [20] and Viola et al [21]. Rueckert et al. [7] extended this similarity metric to non-rigid image registration. Studholme et al. proposed the normalized mutual information (NMI) as an overlap invariant generalization of mutual information [22]. Conditional mutual information (CMI) was introduced for non-rigid multi-modal image registration by Loeckx et al. to reduce the negative influence of bias fields [23]. However, the traditional global MI approach involves such disadvantages as high computational complexity, tendency to get trapped in local minima and suffering from erroneous transformations of anatomical images even though the MI between the fixed image and the float image achieves the maximum value with this transform [24,25].
In addition to information theory measures-based registration methods, structural representation methods have been investigated for multi-modal medical image registration. Pluim et al. [26] utilized the local gradient orientation and Nigris et al. [27] used gradient orientations of minimal uncertainty for image registration. In [28], Liu et al. proposed a registration method based on the local frequency estimated by calculating the local phase gradient of the most significant Gabor filter response. This method has the advantage that the local frequency is the same for corresponding structures in the two images, even when edge strength and contrast have significant differences. However, the above three methods were used for rigid registration, and not discussed for the non-rigid registration in [26][27][28]. Heinrich et al. [29] utilized modality independent neighbourhood descriptor (MIND) which is not a scalar representation but a vector-valued image descriptor for multi-modal deformable registration. Wachinger et al. proposed two structural representation methods. One method utilized the entropy of an image patch to assign a new intensity value and used the SSD on entropy images (ESSD) as similarity metric. Another method firstly used Laplacian Eigenmaps to embed image patches in a lower-dimensional manifold that preserves local distances and then computed L2 distance of Laplacian images. However, the entropy image based method can only meet certain requirements of a relaxed version of the theoretical properties, and the Laplacian image based method involves high computational complexity [30].
To achieve both accuracy and efficiency of the non-rigid multi-modal registration method, a similarity metric based on the Weber local descriptor (WLD) proposed in [31] is combined with the NMI to realize a two phase image registration in this paper. The introduction of the WLD in this paper results from the fact that it can extract image features more effectively than the well-known scale-invariant feature transform and local binary patterns. In the proposed method, the first phase aims at obtaining the parameters (i.e., the control points) relevant to the large deformation, ensuring that the anatomical features of the reference and float images are not destroyed, and reducing the opportunities of getting trapped in local minima by integrating the WLD with the non-local SSD or the structural similarity (SSIM) proposed in [32]. The second phase is focused on getting the parameters related to the small deformation using the NMI to obtain high registration accuracy. This paper is structured as follows: Section 2 presents the novel two phase non-rigid multi-modal image registration method. Section 3 provides discussions of key parameters in the proposed method and comparisons of registration accuracy and efficiency among our method and the NMI, CMI, ESSD and ESSD-NMI methods. Finally, the conclusion is given in Section 4.

Registration Framework
In general, image registration is stated as the following minimization problem: (1) where denotes an objective function defined by the similarity metric, and T denotes the transformation which is defined as coordinate mapping from the domain of the reference image I R to that of the float image I F . To obtain the optimal transformation , an appropriate optimization method should be employed.
In Equation (1), it is highly challenging to tackle large deformations. On the one hand, coarse-to-fine deformation schemes have been commonly applied. Wu et al. [33] used a wavelet-based deformation model to treat global and local information in a coarse-to-fine approach. The combination of coarse-scale landmark and B-splines deformable registration techniques was proposed in [34]. A hybrid method that combined surface and volume information to register cortical structures was proposed in [35]. Postelnicu et al. [36] started with a geometric registration which was used as the initialization and refined it with a non-linear optical flow registration method. On the other hand, to prevent from folding artifacts and preserve local orientation in case of large deformations, the deformation field should be diffeomorphisms. Rueckert et al. [8] enforced the transformation of FFD model to be a diffeomorphism by limiting the control points displacement.
Different from above-mentioned coarse-to-fine deformation registration methods, the proposed two phase image registration method aims at resolving the above minimization problem by using different similarity metrics in two registration phases as shown in Figure 1. In the large deformation phase, the structural representations of I R and I F are firstly obtained using the WLD to compute similarity metric wldNSSD or wldWSSIM. Then, the iterative optimization of the objective function f 1 defined by the wldNSSD or the wldWSSIM is realized to obtain a relatively good initial transformation T 1 while reducing the opportunities of getting trapped in the local minima. However, only using the wldNSSD or the wldWSSIM cannot ensure accurate registration results especially for medical images with complicated deformations because some useful image information may be lost when using the WLD to extract image features. Therefore, a refined registration is implemented by using the original image intensities. In the small deformation phase, the float image I F is firstly deformed to generate the image using T 1 which can provide a good initial value, and then the small deformation for is processed for minimizing the objective function f 2 defined by the NMI to obtain the final transformation T 2 . In our method, we used the diffeomorphic FFD model as the deformation model which uses a multi-resolution way by concatenating the FFDs with different grid sizes and limiting the control points displacement less than 0.4 × the grid sizes [8]. Meanwhile, we selected the BFGS algorithm as the optimization method because it is particularly suited to the registration problem with very large numbers of variables without requiring explicit computations.

Weber Local Descriptor
Ernst Weber observed that the ratio of the increment threshold to the background intensity is a constant in human perception [37]. This relationship between the physical magnitudes of stimuli and the perceived intensity of the stimuli was known as Weber's Law which can be expressed as: (2) where represents the increment threshold, I denotes the initial stimulus intensity and k signifies that the proportion on the left side of Equation (2) remains constant despite the variation of I.
Although Weber's Law describes fundamental relationships of the human perception, i.e., in a biological setting rather than for digital images, several researchers have extended its application to signal and image processing. Dabeer and Onkar introduced a regularized Weber sampler for smooth deterministic signals [38]. Bruni et al. used Weber's Law for scratch detection on digital film materials [39]. By using Weber's Law, Chen proposed a Weber local descriptor (WLD) which was used to extract local features [31].
The WLD has two components: differential excitation (ξ) and orientation (θ). In our paper, only differential excitation is used for registration because using both differential excitation and orientation result in increased computational complexity, and differential excitation has such advantages as detecting edges elegantly, robustness to noise and illumination change, and its powerful representation ability for textures [31]. Specifically, a differential excitation of a current pixel is computed as illustrated in Figure 2. The symbols and are the outputs of the filters f 00 and f 01 . It is easy to understand that . The difference between the center point and its neighbors is given by: where x i (i = 0,1,· · · , p−1) denotes the i-th neighbors of and p is the number of neighbors. The difference is a discrete representation of the Laplace operator. The constancy of Laplacian images is a well-known assumption and has been used e.g., in [40] in the context of optical flow. Normally, this feature is used for invariance under directional changes.
By applying the arctangent function which can limit the output to prevent from increasing or decreasing too quickly when the input becomes larger or smaller [31], the differential excitation of the pixel is computed as: From Equation (4), we can see that although the WLD is not invariant under global brightness changes, it is robust to changes in image contrast. The reason lies in the fact that a change in image contrast in which each pixel value is multiplied by a constant will multiply differences by the same constant, and this contrast change is canceled by the division between and [31]. Here it should be noted that to avoid dividing by zero in Equation (4), a small constant is added to the denominator in practical implementation. To tailor the WLD to our non-rigid multi-modal registration method, it is expressed as: (5) where R denotes the radius of the square symmetric neighborhood.
Actually, WLD features can be extracted from a square symmetric neighborhood of size (2R + 1) × (2R + 1). Figure 3 shows examples of the neighborhood with R = 1 and R = 2. It should be noted that for extracting WLD features, only the border pixels (highlighted in blue in Figure 3) are used instead of all the pixels in the square neighborhood to reduce computation time. To demonstrate the performance of the WLD with different R and its advantage over the entropy image for structural representation, the WLD and the entropy image for deformed T1, PD and original T2 weighted MR images are shown in Figure 4. The comparison between Figure 4(b) and Figure 4(c-e) shows that the detail information in the entropy image of T1 weighted MR images is blurrier than that in the WLD, which indicates that the WLD can provide better structural representation than the entropy image for registration. It should be noted that there exist the black squares and dots in Figure 4(b) because the considered pixel is set to be zero when the minimum intensity is equal to the maximum one in the image patch centered at this pixel. Meanwhile, it is easy to understand that the performance of the WLD changes with different R. (e) WLD with R = 3 of (a); (f) deformed PD image; (g) Entropy image of (f); (h) WLD with R = 1 of (f); (i) WLD with R = 2 of (f); (j) WLD with R = 3 of (f); (k) original T2 image; (l) Entropy image of (k); (m) WLD with R = 1 of (k); (n) WLD with R = 2 of (k); (o) WLD with R = 3 of (k).
From Figure 4(c-e), (h-j) and (m-o), we can see that the WLD with a small R (e.g., R = 1) generates relatively weak but thin edges and thus facilitates the accurate localization of relatively strong edges. By comparison, the WLD with a larger R (e.g., R = 2) produces thicker but stronger edges and thus assists with detecting the weak edges while a much larger R (e.g., R = 3) results in poor edge localization. The zooming of features marked with the red square in Figure 4(c-d) and those marked with the blue square in Figure 4(d-e) can further illustrate the above characteristics. Obviously, the combination of multiple selected R for the WLD produces "multi-granuality" features especially in regions with rich textures and discontinuities. Therefore，using the various radii in the WLD is preferable to using a single radius for the effective structural representation of medical images with complicated features. Based on the above analysis, we use the WLD with R = 1 and R = 2 for structural representation of images.

Weber Local Descriptor Based Non-Local Sum of Squared Differences
To describe the difference between the extracted WLD feature of I R and that of I F , the traditional SSD is improved to resist the disadvantageous influence of noise in the medical images. Inspired by the idea of non-local means proposed by Buades et al. [41], which aims at image denoising, that the pixel similarity can be represented more effectively using image patches than using individual pixels, the non-local SSD (NSSD) is introduced to represent the difference between the WLD features. The NSSD between two images I A and I B is presented as: (6) where N is the image size, denotes the Euclidean norm, and mean the square patch of size S P1 centered at of images I A and I B , respectively. As discussed in Section 2.2.1, by combining the WLD using the neighborhoods of different radii R 1 = 1 and R 2 = 2 with the NSSD, we can obtain the following similarity metric wldNSSD: Based on the wldNSSD, we define the objective function f 1 as: (8) where is a regularization term to constrain the FFD transformation to be smooth, is the weighting parameter which defines the tradeoff between the alignment of the two images and the smoothness of the transformation. The regularization term takes the following form: (9) where N denotes the number of pixels in I R or I F.

Weber Local Descriptor Based Weighted Structural Similarity
Given two images I A and I B to be compared, the SSIM involves such components as a luminance (mean) distortion term l(I A , I B ), a contrast (variance) distortion term c(I A , I B ) and a correlation term s(I A , I B ). By combining the three comparisons, the resultant SSIM index between I A and I B is presented in a simplified form as [32]: (10) where , and are small constants for characterizing the saturation effects of the visual system at the regions of low luminance and contrast and ensuring numerical stability when the denominators are close to zero; , , , , and represent the local mean of and , the local variance of and , and the local covariance between and , respectively. The first two terms l(I A , I B ) and c(I A , I B ) account for nonstructural distortion of the image, whereas the last term accounts for structural distortion of the image.
The SSIM index is not a metric. However, the distance is a single scalar-valued distance measure [42]. In [32], a mean SSIM has been used for image quality assessment. However, it is helpful to improve the performance of image quality assessment algorithms by giving different weights to different image patches [43]. Therefore, the WLD difference based weighted SSIM (wldWSSIM) is adopted as a metric: (11) where and mean the square patch of size S P2 centered at of images I A and I B , N is the image size and the weight is computed as: Similar to the wldNSSD, the similarity metric wldWSSIM is defined as: Based on the wldWSSIM, the objective function f 1 is defined as: where the regularization term and the constant are given as in Equation (8).

Analysis of the wldNSSD and the wldWSSIM
To demonstrate the advantage of the wldNSSD and the wldWSSIM over such metrics as the NMI and the ESSD, we compute the four distance measures on ten pairs of T1 and T2 weighted MR images rotated around the domain center with different angles and translated in x and y directions. Figure 5 shows an excerpt of the corresponding computation results. We can see from Figure 5 that the wldNSSD and wldWSSIM have no local extremes while the NMI has local extremes in Figure 5(a-c), and the ESSD involves local extremes in Figure 5(b). Similar findings also exist for other computation results, which indicate that the wldNSSD and the wldWSSIM are more effective similarity metrics for registration than the NMI and the ESSD.

Small Deformation Phase
To obtain more accurate registration results, the small deformation phase is needed for the refined registration. In this phase, the float image I F is firstly deformed using T 1 which is the output of the large deformation phase. So, we have . The small deformation is processed by minimizing the objective function f 2 defined by the NMI between and I R : (15) where and are given as in Equation (8), and the NMI is defined as: (16) where H denotes the Shannon entropy. , and are defined as: where and are sets of regularly spaced intensity bin centres, p is the discrete joint probability, and are the marginal discrete probabilities of the reference image I R and the float image , respectively [44].

Experiments
In this section, to determine the key parameters in the proposed method and make comparisons of registration performance among our method, the NMI method, the CMI method, the ESSD method and the ESSD-NMI method, extensive experiments have been performed on thirty T1, T2 and PD-weighted MR images of size 256 × 212 pixels from the BrainWeb database [45]. Registration efficiency of all these evaluated methods is appreciated by their computation time (in seconds) when implemented in a multi-resolution way by rescaling the FFD grid spacing 2 k × 2 k (each image is rescaled to a square of size 2 to the kth power) [8] using MATLAB 2010 on a personal computer with 2.40 GHz CPU and 4 GB RAM. Registration accuracy is appreciated by target registration error (TRE) [46] with simulated deformation and expert landmark annotations as ground truth, respectively.
As regards the simulated deformation , it is used as the ground truth to deform one of two different weighted MR images (e.g., T1 and T2). By implementing the various registration methods on the float image and the generated reference image, the estimated deformation will be obtained. Based on the whole image domain, is computed as: (20) where denotes the size of the whole image domain I. When we use expert landmark annotations as ground truth, for a estimated deformation and a set of anatomical landmark pairs = { , } (i=1, 2,· · · , m, where m is the number of anatomical landmarks), is defined as: Considering that expert landmark annotations based evaluation requires the anatomical landmarks in the reference image I R and the float image I F to be marked, we have invited five experts to manually select twenty landmarks defined according to the anatomical structures including the left and right lateral ventricles for the deformed T1, PD and original T2 images shown in Figure 6. The patch sizes S P1 and S P2 are very important for the similarity metrics wldNSSD and wldWSSIM. To quantitatively determine these parameters, we discuss their influence on registration accuracy and efficiency by using fifteen images. Figure 7 shows the for the wldNSSD and wldWSSIM methods using various patch sizes. Figure 8 shows the computation time for the two metrics using various patch sizes. We can see from Figure 7 that a too small or too large patch size has a disadvantageous influence on registration accuracy. For the wldNSSD method, the can achieve the minimum value of 7.5 mm when the patch size is 7 × 7 or 9 × 9 while the optimal patch size is 11 × 11 for the wldWSSIM method. Meanwhile, it is shown in Figure 8 that the computation time increases with the increasing patch size because a larger patch size means that more pixels need to be processed. Therefore, to achieve the tradeoff between registration precision and efficiency, the patch size is chosen to be 7 × 7 for the wldNSSD method and 11 × 11 for the wldWSSIM method in our experiments. It should be noted that the wldWSSIM method needs bigger patches for higher registration performance than the wldNSSD. The reason can be explained in this way. The wldNSSD depends on the difference of intensities of two WLD feature images while the wldSSIM takes into account not only the local mean and the local variance, but also the local covariance of two WLD feature images for representing their structural similarity. Therefore, for the wldWSSIM method, a larger patch size is needed to ensure the statistical significance.

Choice of the Weighting Term
The weighting term is specific to the processed images. To quantitatively determine , we make some tests on with fifteen T1, T2 and PD-weighted MR images.
The comparison results of registration precision with different for similarity measures wldWSSIM and wldNSSD are shown in Figure 9. We can see from this figure that a value of =0.01 provides good registration results for the various weighted MR images. The reason lies in two aspects. On the one hand, the target registration error TRE l becomes unstable with a larger because it will reduce the impact of the similarity measure which is important for registration precision. On the other hand, the regularization term which is used to constrain the FFD transformation to be smooth is less important because we have limited the control point displacement less than 0.4 × the grid sizes in the FFD model.

Comparison of Registration Performance
To demonstrate the advantage of the wldNSSD and wldWSSIM, wldNSSD-NMI and wldWSSIM-NMI methods, they are compared with other evaluated methods in terms of registration accuracy and efficiency.
The mean and standard deviation (std) of and for all the evaluated methods are shown in Table 1 and Table 2. Here, "/" in Tables 1 and 2 means that no registration is implemented. Obviously, the wldNSSD and wldWSSIM methods can achieve a smaller and than the NMI, CMI and ESSD methods. Meanwhile, we can see that the wldNSSD-NMI and wldWSSIM-NMI methods for two deformation phases can achieve higher registration accuracy than the corresponding wldWSSIM and wldNSSD methods, which demonstrates the advantage of combining the WLD similarity metrics with the NMI. It should be noted that the NMI method can be seen as a two phase method since it was implemented in a multi-resolution way in our test, but this method achieves a bigger and than the wldWSSIM-NMI and wldNSSD-NMI methods because the NMI method is easy to get trapped into local minima only using the image intensity information. Besides, the intra-observer errors and inter-observer errors are listed in Table 2 for reference. It is shown that the smallest registration error for the wldWSSIM-NMI method is 6.1 mm and it is still higher than the inter-observer errors. The main reason is that experts were requested to repeat the marking procedure if necessary to ensure that the inter-observer errors are less than 5.0 mm in the case of the large simulated deformation among different weighted MR images.    Table 3 lists the computation time of all these methods. The observation from Table 3 shows that our methods have less computation time than the NMI, CMI, ESSD and ESSD-NMI methods and thus they outperform these compared methods in terms of registration efficiency. Moreover, the wldWSSIM-NMI method is of lower registration efficiency than the wldNSSD-NMI while the former can provide higher registration accuracy. Therefore, our two phase methods can be applied according to the clinical requirements. If one application attaches more significance to registration accuracy, the wldWSSIM-NMI method is better choice. If registration efficiency is more important, the wldNSSD-NMI method is preferable.  Figure 10 shows an example of T1-T2 and PD-T2 weighted MR images and registration results for the wldWSSIM-NMI, wldNSSD-NMI and ESSD-NMI methods. From Figure 10(d-f), we can see that the results using the wldWSSIM-NMI and wldNSSD-NMI methods are anatomically more similar to the reference T2 weighted MR image than those using the ESSD-NMI, especially in the lateral ventricle which is marked with a yellow square. Similar results can be seen in Figure 10(g-i).

Conclusions
In this paper, we have proposed a two phase non-rigid multi-modal medical image registration method using the Weber local descriptor based similarity metrics and the normalized mutual information. In the first phase, the parameters relevant to the large deformation are obtained by minimizing the objective function defined by the novel similarity metric wldNSSD or wldWSSIM which is focused not on intensities of individual pixels but on structural information of images. With the good initial deformation value provided by the output of the large deformation phase, the parameters related to the small deformation can be accurately obtained using the NMI in the second phase. The non-rigid image registration experiments on the T1, T2 and PD weighted MR images demonstrate that compared with the NMI, CMI, ESSD and ESSD-NMI methods, our method can obtain smaller registration errors and higher computational efficiency. Future work will be focused on extending our method to multi-modal 3D medical image registration.