Non-Rigid Multi-Modal 3D Medical Image Registration Based on Foveated Modality Independent Neighborhood Descriptor

The non-rigid multi-modal three-dimensional (3D) medical image registration is highly challenging due to the difficulty in the construction of similarity measure and the solution of non-rigid transformation parameters. A novel structural representation based registration method is proposed to address these problems. Firstly, an improved modality independent neighborhood descriptor (MIND) that is based on the foveated nonlocal self-similarity is designed for the effective structural representations of 3D medical images to transform multi-modal image registration into mono-modal one. The sum of absolute differences between structural representations is computed as the similarity measure. Subsequently, the foveated MIND based spatial constraint is introduced into the Markov random field (MRF) optimization to reduce the number of transformation parameters and restrict the calculation of the energy function in the image region involving non-rigid deformation. Finally, the accurate and efficient 3D medical image registration is realized by minimizing the similarity measure based MRF energy function. Extensive experiments on 3D positron emission tomography (PET), computed tomography (CT), T1, T2, and (proton density) PD weighted magnetic resonance (MR) images with synthetic deformation demonstrate that the proposed method has higher computational efficiency and registration accuracy in terms of target registration error (TRE) than the registration methods that are based on the hybrid L-BFGS-B and cat swarm optimization (HLCSO), the sum of squared differences on entropy images, the MIND, and the self-similarity context (SSC) descriptor, except that it provides slightly bigger TRE than the HLCSO for CT-PET image registration. Experiments on real MR and ultrasound images with unknown deformation have also be done to demonstrate the practicality and superiority of the proposed method.


Introduction
In recent years, the non-rigid three-dimensional (3D) multi-modal medical image registration has attached significant attention [1][2][3][4]. This mainly stems from two aspects. Firstly, the different 3D imaging modalities are often fused to produce the precise diagnosis, since they can provide complementary information for interpreting the anatomy, tissue, and organ. As the necessary prerequisite for image fusion, multi-modal medical image registration is significant in relating clinically significant information from the different images. However, the relationship of intensity values in multi-modal 3D medical images might be highly complicated due to differences between the imaging in the case of the high-dimensional optimization problem. Another popular method is to reduce the dimension of transformation parameters while using the geometric transform models that are based on knowledge [27][28][29][30]. In these methods, it is required to have enough understanding of material properties of organs or tissues to establish a suitable geometric transform. However, some organs and tissues are so complicated that the existing methods cannot accurately characterize their material properties. Meanwhile, when determining the geometry and the boundary conditions, it is necessary to accurately segment the anatomy of medical images, which indeed is a very challenging task. Some alternative methods can be adopted to address this challenging problem. For example, by means of the mask image, the areas in the images that involve no non-rigid deformation can be covered up to reduce the number of the deformation field variables that are involved in the optimization process. However, the shape of such areas might often be irregular, thereby accurately leading to the difficulty in determining the mask image.
We have proposed a novel registration method using an improved modality independent neighborhood descriptor (MIND) based on the foveated nonlocal self-similarity to address these problems in the construction of similarity measure and the solution of non-rigid transformation parameters. The contributions of our work lie in the two aspects. For one thing, we have designed the foveated MIND (FMIND) for the effective structural representations of 3D medical images, thereby ensuring accurate image registration. One the other hand, the spatial constraint method based on the FMIND is proposed and introduced into the Markov random field (MRF) optimization to reduce the number of non-rigid transformation parameters and restrict the calculation of the energy function in the image regions involving local non-rigid deformation, thereby ensuring efficient image registration. Extensive experiments on multi-modal medical images demonstrate that the proposed method is provided with higher registration accuracy, except for computed tomography-positron emission tomography (CT-PET) images and higher computational efficiency than other evaluated registration methods. Figure 1 shows the flowchart of the proposed image registration based on the FMIND. Firstly, the FMIND is constructed based on the foveated nonlocal self-similarity and it is applied to the reference image I R and the float image I F to produce the corresponding structural representations FMIND (I R ) and FMIND (I F ), respectively. Afterwards, the objective function, i.e., the energy function, is established based on the free-from deformation (FFD) model and the similarity measure defined as the sum of absolute differences (SAD) between FMIND(I R ) and FMIND(I F ). Finally, the FMIND based spatial constraint is introduced to produce the mask image for the MRF discrete optimization. During the iterative optimization, the deformation vector, which is a vector of parameters defining the deformation field, is produced at each iteration. The final optimal deformation vector T' will be obtained once the optimization procedure is terminated, and it is utilized to produce the registration result. transform models that are based on knowledge [27][28][29][30]. In these methods, it is required to have enough understanding of material properties of organs or tissues to establish a suitable geometric transform. However, some organs and tissues are so complicated that the existing methods cannot accurately characterize their material properties. Meanwhile, when determining the geometry and the boundary conditions, it is necessary to accurately segment the anatomy of medical images, which indeed is a very challenging task. Some alternative methods can be adopted to address this challenging problem. For example, by means of the mask image, the areas in the images that involve no non-rigid deformation can be covered up to reduce the number of the deformation field variables that are involved in the optimization process. However, the shape of such areas might often be irregular, thereby accurately leading to the difficulty in determining the mask image. We have proposed a novel registration method using an improved modality independent neighborhood descriptor (MIND) based on the foveated nonlocal self-similarity to address these problems in the construction of similarity measure and the solution of non-rigid transformation parameters. The contributions of our work lie in the two aspects. For one thing, we have designed the foveated MIND (FMIND) for the effective structural representations of 3D medical images, thereby ensuring accurate image registration. One the other hand, the spatial constraint method based on the FMIND is proposed and introduced into the Markov random field (MRF) optimization to reduce the number of non-rigid transformation parameters and restrict the calculation of the energy function in the image regions involving local non-rigid deformation, thereby ensuring efficient image registration. Extensive experiments on multi-modal medical images demonstrate that the proposed method is provided with higher registration accuracy, except for computed tomography-positron emission tomography (CT-PET) images and higher computational efficiency than other evaluated registration methods. Figure 1 shows the flowchart of the proposed image registration based on the FMIND. Firstly, the FMIND is constructed based on the foveated nonlocal self-similarity and it is applied to the reference image IR and the float image IF to produce the corresponding structural representations FMIND(IR) and FMIND(IF), respectively. Afterwards, the objective function, i.e., the energy function, is established based on the free-from deformation (FFD) model and the similarity measure defined as the sum of absolute differences (SAD) between FMIND(IR) and FMIND(IF). Finally, the FMIND based spatial constraint is introduced to produce the mask image for the MRF discrete optimization. During the iterative optimization, the deformation vector, which is a vector of parameters defining the deformation field, is produced at each iteration. The final optimal deformation vector T' will be obtained once the optimization procedure is terminated, and it is utilized to produce the registration result.

The Foveated Modality Independent Neighborhood Descriptor
The FMIND is presented based on the characteristics of human visual system (HVS). In the HVS, the distribution of cone cells is uneven. The foveation has the highest density. If the foveation is taken as the center, the cell density will fall fast when it is extended around. The optic nerve cells have similar characteristics. Therefore, when we watch a point in an image, this point will have the highest sensitivity and the sensitivity will drop with the increasing distance to the point. Inspired by the characteristics of the HVS, Alessandro Foi et al. [31] have proposed calculating the patch similarity based on the Euclidean distance d FOV between the the foveated patches, defined as: where I FOV denote the foveated patches that were obtained by foveating the image I at the two fixation points x 1 and x 2 . By applying the foveation operator F to the image I, the foveated patch I FOV x is produced as: where u denotes the location of any pixel in the foveated image patch S. In [31], the designed foveation operators mainly include the isotropic and anisotropic foveation operators. As the latter has more advantages than the former in describing the image edges and textures, it will be used as the foveation operator. This operator is defined as: where v ρ,θ u denotes the blur kernel and it is mainly structured by the elliptical Gaussian probability density function (PDF), ρ determines the elongation of the Gaussian PDF, and θ denotes the angular offset, respectively. The blur kernel v ρ,θ u is defined as [31]: denote the elliptical Gaussian PDF with the standard deviation of 1 2 √ π and ∠u+θ determines the orientation of the axes of the elliptical Gaussian PDF. Figure 2 gives an example of two anisotropic foveation operators, where S is a 7 × 7 foveated patch, θ = 0, and the different kernel elongation parameters ρ = 2 and ρ = 6 are used, respectively. Clearly, this radial design of these anisotropic foveation operators is consistent with HVS features, which thereby leads to the effective structural representation of images for the FMIND.

The Foveated Modality Independent Neighborhood Descriptor
The FMIND is presented based on the characteristics of human visual system (HVS). In the HVS, the distribution of cone cells is uneven. The foveation has the highest density. If the foveation is taken as the center, the cell density will fall fast when it is extended around. The optic nerve cells have similar characteristics. Therefore, when we watch a point in an image, this point will have the highest sensitivity and the sensitivity will drop with the increasing distance to the point. Inspired by the characteristics of the HVS, Alessandro Foi et al. [31] have proposed calculating the patch similarity based on the Euclidean distance d FOV between the the foveated patches, defined as: where I x 1 FOV and I x 2 FOV denote the foveated patches that were obtained by foveating the image I at the two fixation points x 1 and x 2 . By applying the foveation operator F to the image I, the foveated patch I x FOV is produced as: where u denotes the location of any pixel in the foveated image patch S. In [31], the designed foveation operators mainly include the isotropic and anisotropic foveation operators. As the latter has more advantages than the former in describing the image edges and textures, it will be used as the foveation operator. This operator is defined as: where v u ρ,θ denotes the blur kernel and it is mainly structured by the elliptical Gaussian probability density function (PDF), ρ determines the elongation of the Gaussian PDF, and θ denotes the angular offset, respectively. The blur kernel v u ρ,θ is defined as [31]: where K(0) = ‖v u ‖ 1 , K(u) = ‖v u ‖ 2 , g 1 2√π denote the elliptical Gaussian PDF with the standard deviation of 1 2√π and ∠ u+θ determines the orientation of the axes of the elliptical Gaussian PDF. Figure 2 gives an example of two anisotropic foveation operators, where S is a 7 × 7 foveated patch, θ = 0, and the different kernel elongation parameters ρ = 2 and ρ = 6 are used, respectively. Clearly, this radial design of these anisotropic foveation operators is consistent with HVS features, which thereby leads to the effective structural representation of images for the FMIND.  We will propose the FMIND based on the foveated nonlocal self-similarity between different image patches in the same image borrowing the idea of self-similarity in the non-local means denoising. The FMIND is expressed as: where R denotes a search window centered at x, d FOV (I, x, x + r) denotes the distance between the foveated image patches I FOV x and I FOV x+r ; n is a normalization constant to ensure that the maximum of FMIND(I, x, r) is 1; V FOV (I, x) denotes the variance of the foveated image patch I FOV x centered at x in the image I, and it controls the attenuation degree of this function in Equation (5). The variance V FOV (I, x) is estimated as the mean of foveated distances for all the pixels in the foveated patch S.
where |S| denotes the number of pixels in S. The structural information around the pixel x in the image I will be described by one-dimensional vector of size |R|, where |R| denotes the number of pixels in the search window R by means of the FMIND. After obtaining the FMIND for the reference and float images, the similarity metric SADF(I(x), J(x)) between two pixels at the same position x in the images I and J can be expressed as the mean SAD between FMIND(I, x, r) and FMIND(J, x, r) of pixels in R.
where R takes a six-neighborhood in this paper.

Discrete Optimization Based on the MRF
After obtaining the similarity measure for the two different modal images, we will use the FFD as the transformation model and use Markov random field (MRF) optimization [32] to obtain the transformation parameters in the FFD. The reason for choosing this discrete optimization method is that it does not need to calculate the gradient of the energy function in the process of optimization, which thereby facilitates producing the good registration result by avoiding falling into the local minimum. In this method, the image registration problem will be converted into the MRF based discrete optimization problem.
where E denotes the general form of a first-order MRF, i.e., the energy function and λ is a constant; G is the set of vertices and |G| denotes the number of vertices in G, where G can be regarded as the vertex set in the FFD, because this method uses the FFD as the deformation model; N(p) and N(q) refer to the neighborhood of vertices p and q, respectively; l is the discrete labelling while l p and l q are the labels that are assigned to the vertices p and q, respectively; V p l p denotes the data item of the energy function E MRF (l), while V pq l p , l q represents its smooth regularization and it takes the L 1 -norm to encourage the neighboring nodes p and q to keep the displacement. Accordingly, the MRF optimization, actually, is to seek to assign a label that is associated with the deformation to each vertex, so that the energy function in Equation (8) is minimized. In this paper, the fast primal-dual (Fast-PD) [33] algorithm will be used for the MRF optimization to produce the registration result. More details about the Fast-PD algorithm can be found in [33].

Spatial Constraint Based on the FMIND
When the above registration method is applied to three-dimensional (3D) medical images, the number of deformation field variables will be large. If all pixels' displacement along the x, y, and z directions is considered, the number of dense deformation field variables will be 3·|I x |·|I y |·|I z |, where |I x |, |I y | and |I z | denote the number of pixels in the x, y, and z dimensions of the image I, respectively. For example, there will be 50,331,648 deformation field variables when |I x | =|I y |=|I z | = 256. It is indeed very time-consuming to address such a high dimensional optimization problem.
In the reference and float images, sometimes only some areas involve non-rigid deformation. In addition, the non-rigid registration is unnecessary for some smooth areas. For these regions, the mask can be used to indicate that they will be excluded from the registration process. In this way, we can not only reduce the number of variables for describing the deformation field, but also focus the calculation of the energy function in the image areas that indeed involve the local non-rigid deformation. However, the shape of areas without non-rigid deformation is often irregular. Generally, manual intervention or image segmentation is needed for obtaining the appropriate mask image. However, these technologies cannot ensure that the satisfactory mask image can be produced for US and PET images due to their low image contrast, blurriness, and edge discontinuousness.
We will put forward the spatial constraint method based on the FMIND to address the above problem. From Equation (7), it can be seen that SADF(I(x), J(x)) contains the corresponding relationship of the local spatial information at the pixel x in the images I and J. This information can be used to reduce the number of variables for describing the deformation field and limit the control nodes in the deformation field to move in the areas with the local non-rigid deformation. In the FMIND based spatial constraint method, the vertex set G will be divided into the set G s of static vertices and the set G d of dynamic vertices based on the local spatial information included in the FMIND. The vertices in G s are similar to those in the smooth areas and the areas that involve no deformation. Meanwhile, the vertices in G d are similar to those in the non-smooth areas involving the deformation. The calculation of the energy function can be restricted in the areas involving the non-rigid deformation through the movement of these dynamic vertices with the local non-rigid deformation. In this way, the number of deformation field variables will decrease from 3·|G x |·|G y |·|G z | to 3·|G d,x |·|G d,y |·|G d,z |, where |G d,x |, |G d,y |, and |G d,z | denote the number of vertices in x, y, and z dimensions of G d . By utilizing the FMIND based spatial constraint, we can obtain the division of vertices, thereby generating the mask image without manual intervention and image segmentation.
There will be two requirements that no vertices of the whole MRF model will be omitted and no repeated division of vertices will be done to ensure the effective division of G in the FMIND based spatial constraint method. Correspondingly, the logical relationship among G, G s , and G d can be expressed as G = G s ∪ G d and ∅ = G s ∩ G d , where ∅ denotes the empty set. We have designed the following vertex partition algorithm according to the above requirements. For any vertex p i, j, k of the set G in the MRF model, i.e., G = p i, j, k |1 ≤ i ≤ |G x |, 1 ≤ j ≤ |G y |, 1 ≤ k ≤ |G z , we will check the similarity metric SADF for each pixel x in the local image patch LP(p i, j, k ) with radius R LP , which takes p i, j, k in G as the center, as shown in Figure 3. Let con denote the number of pixels in LP(p i, j, k ) whose similarity 1-SADF(I(x), J(x)) is greater than a certain threshold δ. If the ratio of con to the patch size LP(p i, j, k ) is greater than the static factor ε, p i, j, k will be regarded as a static vertex. In a similar way, we can determine other static vertices to generate the final set G s . Accordingly, G d will be easily computed as Obviously, the performance of the vertex partition algorithm depends on three key parameters R LP , δ, and ε. Here, δ will influence the decision of whether two pixels are similar and ε is used to adjust the probability that p i, j, k is divided into G s . Section 3.1 discusses the choice of these parameters. Algorithm 1 shows the detailed implementation of the proposed vertex partition algorithm.  (14) end for (15) end for (16)

Results
In this section, we will first discuss the selection of several key parameters in the FMIND method, and then use the method based on the anatomical landmarks selected by doctors to compare registration accuracy and efficiency of the proposed FMIND method with those of the entropy images based SSD (ESSD) [17], MIND [18], SSC [19], and HLCSO [26] methods. For the appreciation of registration performance, we have used four datasets with synthetic deformation, including simulated 3D MR images in BrainWeb database [34], 3D CT and MR images in NA-MIC Obviously, the performance of the vertex partition algorithm depends on three key parameters R LP , δ, and ε. Here, δ will influence the decision of whether two pixels are similar and ε is used to adjust the probability that p i, j, k is divided into G s . Section 3.1 discusses the choice of these parameters. Algorithm 1 shows the detailed implementation of the proposed vertex partition algorithm.

Results
In this section, we will first discuss the selection of several key parameters in the FMIND method, and then use the method based on the anatomical landmarks selected by doctors to compare registration accuracy and efficiency of the proposed FMIND method with those of the entropy images based SSD (ESSD) [17], MIND [18], SSC [19], and HLCSO [26] methods. For the appreciation of registration performance, we have used four datasets with synthetic deformation, including simulated 3D MR images in BrainWeb database [34], 3D CT and MR images in NA-MIC database [35], 3D CT and PET images in NA-MIC database [36], and real 3D MR images from Retrospective Image Registration Evaluation project [37]. Besides, we have used the real MR and US images with unknown real deformation in the Brain Images of Tumors for Evaluation (BITE) database [38] available at [39] to appreciate the practicality of the proposed method. Here, the implementation efficiency of the evaluated registration methods is appreciated by their running time. For all evaluated methods, they are implemented on the personal computer with 2.40 GHz CPU and 4 GB RAM while using the mixed programming of Matlab and C++.
In the case of synthetic deformation, the registration accuracy is appreciated by the target registration error (TRE) [40], defined as: where T L is the synthetic deformation (i.e., the ground truth generated by using a linear combination of radial basis functions), T D is the deformation that is estimated by the registration methods, and N denotes the number of landmarks selected manually based on doctors' advice from the reference images. For each pair of reference and float images, different synthetic deformations will be applied to the float image for 25 times and we will manually select 90 (N = 90) landmarks from each 3D reference image to compute the TRE. The mean of TREs values for registering 25 deformed images will be used to appreciate the registration accuracy. Figure 4 gives an example of chosen landmarks in one slice of simulated 3D PD weighted image and real 3D T1 weighted image for MR image registration, 3D CT image for CT-MR image registration and 3D CT image for CT-PET image registration. database [35], 3D CT and PET images in NA-MIC database [36], and real 3D MR images from Retrospective Image Registration Evaluation project [37]. Besides, we have used the real MR and US images with unknown real deformation in the Brain Images of Tumors for Evaluation (BITE) database [38] available at [39] to appreciate the practicality of the proposed method. Here, the implementation efficiency of the evaluated registration methods is appreciated by their running time. For all evaluated methods, they are implemented on the personal computer with 2.40 GHz CPU and 4 GB RAM while using the mixed programming of Matlab and C++.
In the case of synthetic deformation, the registration accuracy is appreciated by the target registration error (TRE) [40], defined as: (11) where T L is the synthetic deformation (i.e., the ground truth generated by using a linear combination of radial basis functions), T D is the deformation that is estimated by the registration methods, and N denotes the number of landmarks selected manually based on doctors' advice from the reference images. For each pair of reference and float images, different synthetic deformations will be applied to the float image for 25 times and we will manually select 90 (N = 90) landmarks from each 3D reference image to compute the TRE. The mean of TREs values for registering 25 deformed images will be used to appreciate the registration accuracy. Figure 4 gives an example of chosen landmarks in one slice of simulated 3D PD weighted image and real 3D T1 weighted image for MR image registration, 3D CT image for CT-MR image registration and 3D CT image for CT-PET image registration.

Parameter Setting
In the FMIND method, we will fix θ = 0, S to be a 5 × 5 foveated patch in Equation (3) and λ = 0.01 in Equation (8). The remaining parameters include the kernel elongation parameter ρ, the image patch radius RLP, the similarity threshold δ, and the static factor ε. We will conduct experiments on three pairs of simulated MR images (T2-T1, PD-T2, and PD-T1) from BrainWeb database in order to effectively determine these parameters, where the former and the latter will be

Parameter Setting
In the FMIND method, we will fix θ = 0, S to be a 5 × 5 foveated patch in Equation (3) and λ = 0.01 in Equation (8). The remaining parameters include the kernel elongation parameter ρ, the image patch radius R LP , the similarity threshold δ, and the static factor ε. We will conduct experiments on three pairs of simulated MR images (T2-T1, PD-T2, and PD-T1) from BrainWeb database in order to effectively determine these parameters, where the former and the latter will be used as the reference and float images, respectively. These simulated MR images are realistic MRI data volumes that are produced by an MRI simulator while using three sequences (T1, T2, and PD weighted) and a variety of slice thicknesses, noise levels, and levels of intensity non-uniformity.  Figure 5 shows the TRE values of the FMIND method using different ρ values. For the purpose of evaluating the influence of ρ on registration accuracy, we have set the significant level α = 0.05 in one-way Analysis of Variance (ANOVA) [41]. The obtained significance value P is 0.001, which means that ρ has a significant impact on registration accuracy of the FMIND method. From Figure 5, we can see that the TRE achieves the minimum value when ρ = 2. Thus, we have fixed ρ = 2 in the proposed method.
Sensors 2019, 19, x FOR PEER REVIEW 9 of 18 used as the reference and float images, respectively. These simulated MR images are realistic MRI data volumes that are produced by an MRI simulator while using three sequences (T1, T2, and PD weighted) and a variety of slice thicknesses, noise levels, and levels of intensity non-uniformity.
3.1.1. The Kernel Elongation Parameter ρ Figure 5 shows the TRE values of the FMIND method using different ρ values. For the purpose of evaluating the influence of ρ on registration accuracy, we have set the significant level α = 0.05 in one-way Analysis of Variance (ANOVA) [41]. The obtained significance value P is 0.001, which means that ρ has a significant impact on registration accuracy of the FMIND method. From Figure 5, we can see that the TRE achieves the minimum value when ρ = 2. Thus, we have fixed ρ = 2 in the proposed method.  Figure 6 shows the TRE values and computational time of the FMIND method while using the different R LP values. Likewise, one-way ANOVA with α = 0.05 is used to evaluate the influence of R LP on the registration results. The obtained significance value P is 0.001 for registration accuracy and P is 0.007 for registration efficiency. Therefore, R LP has the significant impact on both registration accuracy and efficiency. It can be seen from Figure 6a that the TRE significantly declines when R LP varies from 2 to 7, and it tends to be stable when R LP varies from 7 to 10. Besides, Figure 6b indicates that the registration time gradually increases for the increasing R LP . The reason is that, for a larger R LP , the more pixels need to be processed in the vertex partition algorithm. Therefore, we have set R LP as 7 to achieve the trade-off between registration accuracy and efficiency.   Figure 6 shows the TRE values and computational time of the FMIND method while using the different R LP values. Likewise, one-way ANOVA with α = 0.05 is used to evaluate the influence of R LP on the registration results. The obtained significance value P is 0.001 for registration accuracy and P is 0.007 for registration efficiency. Therefore, R LP has the significant impact on both registration accuracy and efficiency. It can be seen from Figure 6a that the TRE significantly declines when R LP varies from 2 to 7, and it tends to be stable when R LP varies from 7 to 10. Besides, Figure 6b indicates that the registration time gradually increases for the increasing R LP . The reason is that, for a larger R LP , the more pixels need to be processed in the vertex partition algorithm. Therefore, we have set R LP as 7 to achieve the trade-off between registration accuracy and efficiency. used as the reference and float images, respectively. These simulated MR images are realistic MRI data volumes that are produced by an MRI simulator while using three sequences (T1, T2, and PD weighted) and a variety of slice thicknesses, noise levels, and levels of intensity non-uniformity.

The Image Patch Radius R LP
3.1.1. The Kernel Elongation Parameter ρ Figure 5 shows the TRE values of the FMIND method using different ρ values. For the purpose of evaluating the influence of ρ on registration accuracy, we have set the significant level α = 0.05 in one-way Analysis of Variance (ANOVA) [41]. The obtained significance value P is 0.001, which means that ρ has a significant impact on registration accuracy of the FMIND method. From Figure 5, we can see that the TRE achieves the minimum value when ρ = 2. Thus, we have fixed ρ = 2 in the proposed method.  Figure 6 shows the TRE values and computational time of the FMIND method while using the different R LP values. Likewise, one-way ANOVA with α = 0.05 is used to evaluate the influence of R LP on the registration results. The obtained significance value P is 0.001 for registration accuracy and P is 0.007 for registration efficiency. Therefore, R LP has the significant impact on both registration accuracy and efficiency. It can be seen from Figure 6a that the TRE significantly declines when R LP varies from 2 to 7, and it tends to be stable when R LP varies from 7 to 10. Besides, Figure 6b indicates that the registration time gradually increases for the increasing R LP . The reason is that, for a larger R LP , the more pixels need to be processed in the vertex partition algorithm. Therefore, we have set R LP as 7 to achieve the trade-off between registration accuracy and efficiency.   Figure 7 shows the effect of δ. For one-way ANOVA with α = 0.05, the obtained P values are 0.001 for both registration accuracy and efficiency, which means the significant impact of δ on the registration performance of the FMIND method. From Figure 7a, we can see that the TRE significantly declines when δ varies from 0.3 to 0.8. The reason is that for a larger δ in this range, fewer control vertices are divided into the static ones and the number of dynamic control vertices increases, thereby resulting in a smaller TRE. Meanwhile, the TRE tends to be stable when δ varies from 0.8 to 1.0. The reason is that, for the threshold δ in this range, the number of dynamic control vertices will increase to a certain value, so that the variation of δ will have little effect on the TRE. Besides, Figure 7b indicates that the registration time significantly increases with the increasing δ. It is easy to understand that, for a larger δ, the increasing dynamic control vertices will lead to more processing time. Therefore, we set δ as 0.8 to balance the registration accuracy and efficiency.  Figure 7 shows the effect of δ. For one-way ANOVA with α = 0.05, the obtained P values are 0.001 for both registration accuracy and efficiency, which means the significant impact of δ on the registration performance of the FMIND method. From Figure 7a, we can see that the TRE significantly declines when δ varies from 0.3 to 0.8. The reason is that for a larger δ in this range, fewer control vertices are divided into the static ones and the number of dynamic control vertices increases, thereby resulting in a smaller TRE. Meanwhile, the TRE tends to be stable when δ varies from 0.8 to 1.0. The reason is that, for the threshold δ in this range, the number of dynamic control vertices will increase to a certain value, so that the variation of δ will have little effect on the TRE. Besides, Figure 7b indicates that the registration time significantly increases with the increasing δ. It is easy to understand that, for a larger , the increasing dynamic control vertices will lead to more processing time. Therefore, we set as 0.8 to balance the registration accuracy and efficiency.  Figure 8 shows the effect of the static factor ε on registration accuracy and efficiency. As ε is also used to divide the control vertices, it has a similar effect on registration performance to the similarity threshold δ . According to Figure 8a,b, ε has the opposite effect on TRE and computational time. We have chosen ε = 0.9 for the FMIND method based on the comprehensive consideration of registration performance.   Figure 8 shows the effect of the static factor ε on registration accuracy and efficiency. As ε is also used to divide the control vertices, it has a similar effect on registration performance to the similarity threshold δ. According to Figure 8a,b, ε has the opposite effect on TRE and computational time. We have chosen ε = 0.9 for the FMIND method based on the comprehensive consideration of registration performance. ε Figure 7 shows the effect of δ. For one-way ANOVA with α = 0.05, the obtained P values are 0.001 for both registration accuracy and efficiency, which means the significant impact of δ on the registration performance of the FMIND method. From Figure 7a, we can see that the TRE significantly declines when δ varies from 0.3 to 0.8. The reason is that for a larger δ in this range, fewer control vertices are divided into the static ones and the number of dynamic control vertices increases, thereby resulting in a smaller TRE. Meanwhile, the TRE tends to be stable when δ varies from 0.8 to 1.0. The reason is that, for the threshold δ in this range, the number of dynamic control vertices will increase to a certain value, so that the variation of δ will have little effect on the TRE. Besides, Figure 7b indicates that the registration time significantly increases with the increasing δ. It is easy to understand that, for a larger , the increasing dynamic control vertices will lead to more processing time. Therefore, we set as 0.8 to balance the registration accuracy and efficiency.  Figure 8 shows the effect of the static factor ε on registration accuracy and efficiency. As ε is also used to divide the control vertices, it has a similar effect on registration performance to the similarity threshold δ . According to Figure 8a,b, ε has the opposite effect on TRE and computational time. We have chosen ε = 0.9 for the FMIND method based on the comprehensive consideration of registration performance.

Registration Results of Simulated T1, T2 and PD Images
In order to quantitatively and qualitatively compare the registration performance of the FMIND method and other methods on 3D T1, T2 and PD weighted MR images, we will test them on three pairs of simulated T2-T1, PD-T2, and PD-T1 images of size 256 × 256 × 32. For all evaluated methods, the mean and the standard deviation (std) of TRE values as well as the P values for the t-test with the significance level α = 0.05 are computed and are shown in Table 1. In Table 1, "/" means that no registration is implemented. It is shown that all of the P values are less than 0.002, which indicates that there exists significant difference between the FMIND method and any other compared method in terms of TRE. Specifically, as regards the registration of T2-T1 images, the mean and the standard deviation of TRE values for the MIND method are 2.2 voxel and 0.5 voxel, respectively. By comparison, the FMIND method has the lower mean (1.8 voxel) and standard deviation (0.2 voxel) of TRE values than the MIND method. This is mainly due to the advantage of the proposed FMIND in describing the structural information of multi-modal MR images over the MIND method.  Figure 9 visually shows the registration results of 3D PD-T1 images for all the evaluated methods. Here, it should be noted that the background regions in these images are removed and the same operation will be implemented for other experiments in the rest of this paper. The comparison among Figures 9f and 9c-e shows that the registration result of the FMIND method is more similar to the reference image that is shown in Figure 9a than those of the ESSD, MIND, and HLCSO methods. Especially for the tissue indicated by the red boxes in Figure 9, the FMIND can recover its deformation better than other evaluated methods.

Registration Results of Simulated T1, T2 and PD Images
In order to quantitatively and qualitatively compare the registration performance of the FMIND method and other methods on 3D T1, T2 and PD weighted MR images, we will test them on three pairs of simulated T2-T1, PD-T2, and PD-T1 images of size 256 × 256 × 32. For all evaluated methods, the mean and the standard deviation (std) of TRE values as well as the P values for the t-test with the significance level α = 0.05 are computed and are shown in Table 1. In Table 1, "/" means that no registration is implemented. It is shown that all of the P values are less than 0.002, which indicates that there exists significant difference between the FMIND method and any other compared method in terms of TRE. Specifically, as regards the registration of T2-T1 images, the mean and the standard deviation of TRE values for the MIND method are 2.2 voxel and 0.5 voxel, respectively. By comparison, the FMIND method has the lower mean (1.8 voxel) and standard deviation (0.2 voxel) of TRE values than the MIND method. This is mainly due to the advantage of the proposed FMIND in describing the structural information of multi-modal MR images over the MIND method. Figure 9 visually shows the registration results of 3D PD-T1 images for all the evaluated methods. Here, it should be noted that the background regions in these images are removed and the same operation will be implemented for other experiments in the rest of this paper. The comparison among Figure 9f and Figure 9c-e shows that the registration result of the FMIND method is more similar to the reference image that is shown in Figure 9a than those of the ESSD, MIND, and HLCSO methods. Especially for the tissue indicated by the red boxes in Figure 9, the FMIND can recover its deformation better than other evaluated methods.   Table 2 lists the implementation time of all evaluated methods on 3D T1, T2, and PD weighted MR images. It can be observed that the FMIND method, on average, takes approximately 44 min to produce the registration results, and it has the highest computational efficiency among all of the methods. The reason lies in that the FMIND method can generally reduce the number of deformation field variables by utilizing the FMIND based spatial constraint for MRF optimization.  Table 2 lists the implementation time of all evaluated methods on 3D T1, T2, and PD weighted MR images. It can be observed that the FMIND method, on average, takes approximately 44 min to produce the registration results, and it has the highest computational efficiency among all of the methods. The reason lies in that the FMIND method can generally reduce the number of deformation field variables by utilizing the FMIND based spatial constraint for MRF optimization.

Registration Results of CT and MR Images
To appreciate registration performance of all evaluated methods operating on CT and MR images from NA-MIC database, these methods are implemented to correct the synthetic deformation that is applied to the float image, where the CT and MR images will be used as the reference and float images, respectively. Here, the liver CT and MR images of size 256 × 256 × 32 are intra-operatively and pre-operatively acquired, respectively. Due to strong differences in image contrast between CT and MR images, their registration is difficult. Table 3 lists the TRE and P values of t-test for the FMIND method and other methods. As you can see, among all the compared methods, the FMIND method has the highest registration accuracy by providing the lower TRE than other methods. Meanwhile, all the P values are less than 0.003, which indicates the significant difference between the FMIND method and any other method in terms of TRE.  Figure 10 shows the registration results of 3D CT-MR images for all the evaluated methods. As shown in Figure 10c,d, the ESSD and MIND method cannot effectively correct the deformation that is involved in the MR image. The FMIND method can produce a more similar registration result to the reference image that is shown in Figure 10a than the ESSD and MIND methods. When compared with the most competitive HLCSO method, the proposed method performs better in that it can correct the deformation of some tissues more effectively, as indicated by the three red boxes that are shown in Figure 10e,f. shown in Figure 10c,d, the ESSD and MIND method cannot effectively correct the deformation that is involved in the MR image. The FMIND method can produce a more similar registration result to the reference image that is shown in Figure 10a than the ESSD and MIND methods. When compared with the most competitive HLCSO method, the proposed method performs better in that it can correct the deformation of some tissues more effectively, as indicated by the three red boxes that are shown in Figure 10e,f.  Table 4 lists the implementation time of all the evaluated methods. The comparison indicates the advantage of the FMIND method in computational efficiency. Here, it should be noted that the implementation time for all evaluated registration methods in Table 4 is very similar to that in Table 2, because the used CT and MR images have the same size (256 × 256 × 32) to T1, T2, and PD images.  Table 4 lists the implementation time of all the evaluated methods. The comparison indicates the advantage of the FMIND method in computational efficiency. Here, it should be noted that the implementation time for all evaluated registration methods in Table 4 is very similar to that in Table 2, because the used CT and MR images have the same size (256 × 256 × 32) to T1, T2, and PD images.

Registration Results of CT and PET Images
The 3D whole body CT-PET images from NA-MIC database are also used to demonstrate the advantage of the FMIND method. Here, the CT and PET images of size 168 × 168 × 149 are the reference image and the float image, respectively. It is difficult to realize accurate registration of CT images and blurry PET images of low resolution. Figure 11 shows the registration results of the 3D CT-PET images for the ESSD, MIND and HLCSO, and FMIND methods. It can be observed that the ESSD and MIND methods cannot correct the deformation in the regions that are marked with the red boxes in Figure 11c,d well. By comparison, the registration results of the HLCSO and FMIND methods are more similar to the reference image shown in Figure 11a than the ESSD and MIND methods.
The 3D whole body CT-PET images from NA-MIC database are also used to demonstrate the advantage of the FMIND method. Here, the CT and PET images of size 168 × 168 × 149 are the reference image and the float image, respectively. It is difficult to realize accurate registration of CT images and blurry PET images of low resolution. Figure 11 shows the registration results of the 3D CT-PET images for the ESSD, MIND and HLCSO, and FMIND methods. It can be observed that the ESSD and MIND methods cannot correct the deformation in the regions that are marked with the red boxes in Figure 11c,d well. By comparison, the registration results of the HLCSO and FMIND methods are more similar to the reference image shown in Figure 11a than the ESSD and MIND methods.   Table 5 lists the mean and standard deviation of TRE for all evaluated methods operating on 3D CT and PET images. The comparison of TRE values shows that the HLCSO method provides the minimum mean (2.6 voxel) and standard deviation (0.7 voxel) of TRE among all the compared methods. However, the mean (2.8 voxel) and standard deviation (0.9 voxel) of TRE for the FMIND method are lower than those for the ESSD and MIND methods. The reason can be explained in this way. For the PET image, its contrast and resolution are poor and the edge features are not obvious. Therefore, the FMIND method is slightly inferior to the HLCSO method in the registration of 3D CT-PET images. However, the proposed FMIND method can still provide better structural representation results than the ESSD and MIND methods, thereby leading to its improved registration accuracy than the latter.  Table 6 lists the calculation time of the various methods operating on CT and PET images. It can be seen from Table 6 that, as compared with other methods, the calculation time of the FMIND method is significantly reduced because the spatial constraint based on the FMIND, to a certain extent, helps to reduce the number of variables that are required by the deformation model. Especially, when compared with the HLCSO method, although the FMIND method has slightly lower registration accuracy, its computational efficiency is more than two times higher. Besides, as compared with the calculation time listed in Table 2, more calculation time will be involved in the registration of CT-PET images because their size (168 × 168 × 149) is bigger than that of T1, T2, and PD images.

Registration Results of Real MR Images
The MR images from RIRE database are chosen to verify the superiority of the FMIND method in registering the real MR images, where the T1 and PD weighted MR images are used as the reference and float images, respectively. These MR images were acquired while using a Siemens SP Tesla scanner, among which the T1 and PD image volumes were obtained with an echo time of 15 ms and 20 ms, respectively [42].
Here, we will only compare the proposed method with the MIND and SSC methods, which are most similar to our method. Table 7 lists the TRE values of the three methods. Clearly, the SSC method generally provides slightly smaller TRE values than the MIND methods. The two methods are outperformed by the FMIND method in terms of registration accuracy. The comparison of TRE values indeed demonstrates the effectiveness and advantage of the FMIND method in correcting the deformation of real MR images.

Registration Results of Real MR and US Images
We will use the pre-operative T1 weighted MR and intra-operative post-resection US images of 13 patients [43] from BITE database for registration performance appreciation to further demonstrate the practicality of the FMIND method. In [43], the MR images were obtained a few days before the surgery while the post-resection 2D US images were acquired while using Philips HDI 5000 ultrasound machine with a P7-4 MHz phased array transducer and they were reconstructed into ultrasound volume with a voxel size of 1 mm. The used MR data contain the tumor, which is replaced by the resection cavity, and thus will not exist in the post-resection US images. Therefore, to register 3D MR to 3D US images is highly challenging. For each patient, 15 landmarks in average selected in [43] are used for TRE evaluation. Table 8 lists the TRE values of the MIND, SSC, and FMIND methods. Clearly, the SSC method provides smaller TRE values than the MIND method. The FMIND method also performs better than the MIND method, in that the introduction of foveated nonlocal self-similarity ensures more effective structural representations of MR and US images. Please note that the proposed method cannot significantly outperform the SSC method for registration of US-MR images due to the disadvantageous influence of speckle noise that is inherent in US images.

Conclusions
In this paper, we have proposed a novel non-rigid multi-modal 3D medical image registration method that is based on the foveated independent neighborhood descriptor. The advantages of the proposed method lie in two aspects. Firstly, the proposed FMIND can effectively capture the structural feature information of 3D medical images, thereby providing better structural representations than the existing approaches. Secondly, the FMIND based spatial constraint method can help to reduce the number of non-rigid transformation parameters because the FMIND contains the corresponding relationship of the local spatial information at the same pixel in the reference and float images, thereby providing an effective means for solving the high-dimensional optimization problem that is involved in the medical image registration. Experiments on 3D ultrasound, CT, PET, T1, T2, and PD weighted MR images demonstrate that our method can provide higher computational efficiency and higher registration accuracy as compared with the HLCSO, ESSD, MIND and SSC methods, except that its TRE is slightly bigger than that of the HLCSO for CT-PET image registration. Future work will be focused on the acceleration of the method without compromising registration accuracy by using sparse data sampling and parallel data processing strategies to facilitate its clinical applications.