Robust Non-Rigid Feature Matching for Image Registration Using Geometry Preserving

In this paper, a robust non-rigid feature matching approach for image registration with geometry constraints is proposed. The non-rigid feature matching approach is formulated as a maximum likelihood (ML) estimation problem. The feature points of one image are represented by Gaussian mixture model (GMM) centroids, and are fitted to the feature points of the other image by moving coherently to encode the global structure. To preserve the local geometry of these feature points, two local structure descriptors of the connectivity matrix and Laplacian coordinate are constructed. The expectation maximization (EM) algorithm is applied to solve this ML problem. Experimental results demonstrate that the proposed approach has better performance than current state-of-the-art methods.


Introduction
Image registration is a fundamental task in many fields, such as computer vision, robotics, medical image processing, and remote sensing [1][2][3][4]. The main purpose of image registration is to align two or more images of the same scene taken from different viewpoints, at different times, and/or by different sensors.
Many algorithms have been developed for image registration. It can be roughly divided into area-based and feature-based methods. Area-based methods match image intensity values directly. They mainly include the cross-correlation (CC) methods, the Fourier methods, and mutual information (MI) methods. The normalized CC method is a classic in the area-based methods [5]. The similarity of window pairs from two images are computed and the maximum is considered as a correspondence. A method based on wavelet decomposition and correlation is proposed for image registration [6]. The Fourier methods find the Fourier representation of the images and a subpixel phase correlation with Gaussian mixture model (GMM) is used to register the images [7]. The MI method provides a measure of dependence between two images, and a deterministic explanation for MI-based image registration is proposed in [8].
Feature-based methods extract the salient structures, i.e., features, from the images. The extracted features are called control points [9], and the feature-based methods are considered as point set registration. The traditional for feature-based image registration approach uses a two-step strategy. In the first step, the distances of feature points from local descriptors, such as scale-invariant Fourier in Section 3. In Section 4, experiments are carried out to validate the performance of the proposed method. Finally, conclusions are given in Section 5.

Problem Formulation
Without loss of generality, the extracted feature points from pairwise images are represented as X = {x i x i ∈ D } N i=1 and Y = {y j y j ∈ D } M j=1 , respectively, where D denotes the dimension of the feature points. Our goal is to find a suitable transformation and to establish the correct correspondence between X and Y. We set the non-rigid transformation of point set X as the GMM centroids and the other point set Y as the data point generated by the GMM. That is, (1) where N denotes the Gaussian distribution, f means the non-rigid transformation, σ 2 represents the equal isotropic covariances, and I is the identity matrix. We introduce an indicator Z = [z 1 , z 2 , . . . , z M ], where z j is a 1 × M binary vector with elements z ji for i = 1, 2, . . . N. The z ji satisfy z ji ∈ {0, 1} and ∑ i z ji = 1. It means that only one element in vector z j is equal to 1 and all the other elements in vector z j are equal to 0. We have To account for outliers in Y, a uniform component p(y j |N + 1 ) = 1 M with weight w is added to the mixture model, where w is a weight of the uniform distribution for noise and outlier. Assuming independent and identically distributed data in Y, we have The negative log-likelihood function of Equation (1) can be represented as: where q(z ji ) is the posterior probability. The non-rigid transformation f is chosen as a displacement function [18] f (X) = X + GW (6) where G denotes a N × N dimensional Gaussian kernel matrix with element g ij = exp(− 1 2 , τ denotes the width of the Gaussian kernel, and W is a N × D dimensional weight matrix of the Gaussian kernel. In order to enforce the motion coherence for preserving the global topology, the constraint on the weight matrix W given as reference is used [18] E w (W) = Tr(W T GW) (7) where Tr denotes the trace of matrix and W T is the transposition of matrix W. The objective function of feature-based image registration can then be written as where α is the trade-off parameter.
In Equation (1), the membership probability π ji is chosen in advance. Some methods set the membership probability to be equal [18,[25][26][27]. For robustness, the feature of local shape is proposed here for membership probability initialization. In the 2D case, the shape context (SC) [28] is used as feature descriptor and the Hungarian method is used to perform point matching. In the 3D case, the fast point feature histograms (FPFH) [29] and a sample consensus initial alignment method are used for matching.
Then, two local structure descriptors are constructed to preserve the local structure of point set X. By computing the Euclidean distance between each point and its neighbors in X, the K nearest neighbors of each point in X can be obtained. Each point in X can be represented as a weighted linear combination of its K nearest neighbors. Let L = {L ij } j=1:N i=1:N be an N × N weighted matrix. If a point x j does not belong to the K nearest neighbors of a point x i , then L ij is set as 0. The matrix L can be obtained by minimizing the cost function: where the sum of each of row of L is equal to 1. After the non-rigid transformation, the local structure can be preserved by minimizing the transformed cost function (10) where G(i, .) means the i th row of G. The objective function of feature-based image registration in Equation (8) can be expressed as: where β is the trade-off parameter. For the second structure descriptor, the Laplacian coordinate is proposed to preserve the size of neighborhood structure. For a graph (V, E), where V and E is the set of vertices and the set of edges in the graph, respectively, the Laplacian coordinate is expressed as where (i, j) means the edge between vertices v i and vertices v j and h ij is the weight coefficient. The graph of point set X is constructed as follows: the vertex set is set as where ε is the threshold parameter to construct the neighborhood graph. h ij is chosen as After the non-rigid transformation, the Laplacian regularization can be preserved by minimizing the following term: The objective function of the feature-based image registration in Equation (11) can be updated as: where γ is the trade-off parameter.

EM for the Proposed Method
In order to estimate the Θ = [W, σ 2 , Z], the EM algorithm is proposed. There are two steps in the EM algorithm. (1).
where m refers to the mth iteration. By iterating these two steps, the parameters Θ are determined while the likelihood function can also be increased.

E-Step
We use q(z ji ) to denote p(z ji = 1 |Y, X ), which can be found using the Bayes' theorem

M-Step
The E L (Θ, Θ (m) ) can be rewritten as is the all-one column vector, I is the identity matrix, S denotes the N × N Laplacian matrix, S = C − H, H is the N × N adjacency matrix with element h ij , and C is the diagonal matrix whose i-th entry is the sum of (h ij ) j=1,2,...N . The estimates of σ 2 k and W are updated iteratively by taking the corresponding partial derivative of the expected log likelihood. That is, We have: Similarly, W can be obtained by solving the following system The proposed feature-based image registration algorithm is summarized in Algorithm 1.

Require:
The feature point , parameters w, α, β, and γ. 1: Initialize W = 0, P = I N×N 2: Initialize W = 0, P = I N×N 3: Search the K nearest neighbors for each point in X. 4: Perform L by minimizing the Equation (9)  Extract the local shape to assign the membership probability π ji 9: Update the q(z ji ) as Equation (16) 10: M-step: 11: Update σ 2 as Equation (19) 12: Update W by solving the linear system as Equation (21) 13: end while Ensure: The aligned point set is f (X) = X + GW The probability of correspondence is given by P

Performance Validation
In this section, experiments are carried out to evaluate the performance of the proposed method. The state-of-the-art algorithms ICP method http://www.cvlibs.net/software/libicp/ [14,30], TPS-RPM method https://www.cise.ufl.edu/~anand/publications.html [16], CPD method https: //sites.google.com/site/myronenko/research/cpd [18], and CPD-GL method https://sites.google. com/site/jiayima2013/ [31] are used for comparison. The criteria for performance evaluation is chosen as where S is the true matches between the two point sets. The error refers to the registration error.

Parameter Settings
In the proposed method, there are seven parameters: w, τ, K, ε, α, β, and γ. Parameter w is a weight of the uniform distribution for noise and outlier. Parameter τ is the width of the Gaussian kernel for non-rigid transformation function. Parameter K is the number of neighbors used to perform the matrix L in Equation (9) for the first local structure descriptor. Parameter ε is a threshold used to construct the neighborhood graph for the second local structure descriptor. Parameters α, β, and γ are three trade-off parameters for global and local constraints. The other four parameters w, τ, K, ε are selected as [31]. The model selection is shown in Figure 1. In this experiment, the fish dataset is performed and the registration error error is chosen as the metric. It is observed that the proposed method has almost the same performance when α ∈ [9,11], β ∈ [200, 400], and γ ∈ [20,40]. Therefore, we set w = 0.1, τ = 2, K = 5, ε = 0.05, α = 10, β =340, and γ = 24.

Synthesized Point Set Registration
This test is based on the synthesized dataset, which is constructed by Chui and Rangarajan [16]. It consists of two point sets: Chinese character and fish shape. For each model, we designed five sets of data to measure the robustness of registration algorithms with respect to different degrees of deformation and noise. In the deformation test, we use Gaussian radial basis functions to generate deformations, where the coefficients are sampled from a Gaussian distribution with zero mean and standard deviation ranging from 0.02 to 0.08. For the noise test, the standard deviation of the Gaussian noise added to the the original data ratio ranges from 0.01 to 0.05. In each test, one of the above distortions is applied to the model set to create an observed data set, and 100 samples are generated for each degradation level. Comparisons on the fish and Chinese data with varying noise degradation are shown in Figure 2. It is observed that the proposed method with β = 0 and γ = 0 has almost the same performance as the CPD-GL method, and the proposed method with β = 340 and γ = 0 can register the point set under different degradation slightly better than CPD-GL, whereas the proposed method with β = 340 and γ = 24 has the best performance. The metrics under different noise and deformation in fish and Chinese dataset are shown in Figure 3. Apparently, the proposed method with β = 340 and γ = 24 has the best performance among all. Furthermore, in order to show the smoothness of the transformation using the proposed method, the experiment of the standard deviation 0.03 for the Gaussian noise added to the Chinese data is performed and 100 samples are generated. The performances of smoothness of the transformation by CPD, CPD-GL, and the proposed method are given in Figure 4, where NP and SD denote the points in the Chinese dataset and the number of samples, respectively, | f (X)| is defined as | f (X)| = f x (X) 2 + f y (X) 2 , f x (X) and f y (X) denote the 1st and 2nd dimensions of f (X), respectively. As can be observed, the proposed method achieves smoother performance than other methods.

IMM Hand Landmark Registration
A hand landmark experiment is used to evaluate the performance of the proposed method. The benchmark IMM Hand Database http://www2.imm.dtu.dk/pubdb/views/publication_details. php?id=403 is used. It consists of 40 images of 4 groups of the different human left hand. Images were acquired with resolution 800 × 600 JPG format in December 2001. Each group has 10 samples with different poses. The hand structures in the dataset have 56 landmarks with the ground truth correspondences. In this experiment, we use 4 groups for evaluation. For each group, the first hand shape is used as the model point set, another 9 hand shapes are considered as data point sets. Therefore, the ground truth correspondences of nine hand landmark pairs for each group are constructed. The proposed method is used to align these face landmark pairs. Comparisons are given in Figure 5. It is observed that the proposed method with β = 340 and γ = 24 outperforms than the other methods. Then, comparing the registration error are given in Figure 6, which the performance of the proposed method with β = 340 and γ = 24 has the best performance. With respect to synthesized point set and IMM hand landmark, CPD-GL outperforms ICP, CPD, and TPS, whereas the proposed method with β = 0 and γ = 0 has the almost same performance with CPD-GL method. Moreover, the proposed method with β = 340 and γ = 0 has slightly better performance than the CPD-GL method, and the proposed method with β = 340 and γ = 24 has the best performance. Then, the computational complexity of the proposed method is analyzed.
The computational complexity for searching the K nearest neighbors for each local track in X needs O((K + N) log N) by using the k − d tree, the computational complexity for performing the matrix L needs O(K 3 N), the EM algorithm needs almost O(N 3 ). Therefore, the total computational complexity of the proposed algorithm is O(N 3 ). All algorithms are implemented in MATLAB, and the tests are performed on a Core i7-4790 3.6GHz with 8GB RAM. The run-times of these algorithms are given in Table 1, which illustrates that the proposed method has almost the same computation time with CPD-GL method.

4DCT_75 Dataset
The 4DCT_75 dataset of the benchmark data from the DIR-lab http://www.dir-lab.com/index. html is used. The 4DCT_75 data contains ten ("case1" to "case10") thoracic 4D CT images. Each 4D CT images have ten 3D CT images. In particular, we chose "4DCT_75_T00" as a model point set. The other five point sets ("4DCT_75_T10" to "4DCT_75_T50") are set as the scene point sets. Thus, we have 50 pairs of point sets. The example of the registration result of the proposed method are shown in Figure 7, which the proposed method with β = 340 and γ = 24 can align these point pairs. Furthermore, the registration errors of CPD method, CPD-GL method, and the proposed method are shown in Figure 8. It is shown that the proposed method with β = 340 and γ = 24 has a better performance than the other methods.

Real Image Feature Matching
In this experiments, the real image pairs are obtained from the data of Jiayi Ma [31], Tuytelaars [32], the Oxford buildings dataset http://www.robots.ox.ac.uk/~vgg/data/oxbuildings, and the Paris dataset http://www.robots.ox.ac.uk/~vgg/data/parisbuildings. The SIFT algorithm is used to extract the feature points of each image, and the putative correspondence is created by nearest neighbor matching. The goal is to find correspondences/matches between two sets of feature points. The performance is evaluated by precision, recall, and F1, where the F1 measure is used to evaluate the balance between recall and precision. The precision, recall, and F1 are defined by: Precision = N L N C , Recall = N L N P , and F1 = 2·Recall·Precision Recall+Precision , where N L , N C , and N P denote the preserved inlier number, the preserved correspondence number, and the inlier number contained in the putative correspondences, respectively. The ground truth is established in advance manually. The results of matching these image pairs are illustrated in Figure 9. It is observed that the proposed method can distinguish inliers from the outliers. Furthermore, the performance of CPD, CPD-GL and the proposed method in different scenarios is shown in Figure 10. It is observed that the proposed method achieves slightly better performance in precision, recall and F1 metrics compared to other methods. Furthermore, the registration error error in pixels is proposed to evaluate the performance of these algorithms. Figure 10d demonstrates that the proposed method performs better than the CPD method and CPD-GL method. Based on the above evaluation results, the proposed method has better performance than other methods.

Discussion
In this paper, the proposed method is compared with the CPD and CPD-GL methods. The CPD registration method only preserves the global structure and ignores the local structure of point set. Although the CPD-GL algorithm preserves the global and local structure, it only introduces one local structure descriptor. In the proposed method, two local structure descriptors are constructed to preserve the local structure of feature points. The first one is the connectivity matrix-based constraint, which retains the local neighborhood relationship, while the other is the Laplacian coordinate-based constraint, which encodes the local neighborhood scale. Therefore, the proposed method preserves the local structure in terms of the neighborhood relationship and neighborhood scale, and it is more flexible. To illustrate the advantages of this method, three experiments were performed based on the proposed method and compared with the state-of-the-art registration algorithms. In the first experiment, for the 2D point set registration, the experimental results show that the proposed algorithm outperforms other algorithms and has almost the same computation time with CPD-GL algorithm. In the second experiment, registration error is employed to estimate the proposed algorithm on the 3D data, and the proposed algorithm outperforms other algorithms. Furthermore, the proposed algorithm is tested on several image pairs under many different evaluation metrics. The performances of these registration algorithms are evaluated by precision, recall, F1, and registration error. From the above experimental results, the proposed method outperforms the other methods. In particular, it is better to use the proposed method which provides stable and accurate transformation estimation for handling complicated non-rigid deformation. When deformation is not significant, the CPD, CPD-GL, and the proposed method obtain almost similar results, and the CPD has lower computation complexity.
There is room for further development of the proposed framework, especially proposing a fast algorithm and developing more evaluation metrics for point set registration such as smoothness of the transformation. It is interesting to investigate some real-life scenarios, where point set correspondences are weaker such as clouds of points with different amounts of points, missing correspondences or points. Furthermore, the proposed point set registration framework can be applied to perform track-to-track association with sensor bias in distributed multi-target tracking [33].

Conclusions
In this paper, a GMM-based probabilistic method for image feature matching is proposed. The feature points of one image are represented by GMM centroids, and the feature points of the other image are fitted to the first image feature points by maximizing the likelihood. To preserve the local structure of these feature points, two local structure descriptors are constructed. The EM algorithm is used to perform image feature matching. Computer simulations confirm the effectiveness of the proposed method in registering different types of images compared to the state-of-the-art registration algorithms.