A Parallel Image Registration Algorithm Based on a Lattice Boltzmann Model

: Image registration is a key pre-procedure for high level image processing. However, taking into consideration the complexity and accuracy of the algorithm, the image registration algorithm always has high time complexity. To speed up the registration algorithm, parallel computation is a relevant strategy. Parallelizing the algorithm by implementing Lattice Boltzmann method (LBM) seems a good candidate. In consequence, this paper proposes a novel parallel LBM based model (LB model) for image registration. The main idea of our method consists in simulating the convection di ﬀ usion equation through a LB model with an ad hoc collision term. By applying our method on computed tomography angiography images (CTA images), Magnet Resonance images (MR images), natural scene image and artiﬁcial images, our model proves to be faster than classical methods and achieves accurate registration. In the continuity of 2D image registration model, the LB model is extended to 3D volume registration providing excellent results in domain such as medical imaging. Our method can run on massively parallel architectures, ranging from embedded ﬁeld programmable gate arrays (FPGAs) and digital signal processors (DSPs) up to graphics processing units (GPUs).


Introduction
Lattice Boltzmann (LB) method was first introduced as a powerful numerical tool for solving partial differential equations (PDE) in computational fluid dynamics (CFD) [1,2]. The biggest difference between the LB method and the traditional numerical methods is due to the fact that the LB method is based on the microscopic description of physical systems. By simulating the microscopic behavior of physical systems, the macroscopic equation of the LB method can match certain PDEs. The idea of the LB method is to transpose discrete dynamics for simulating the macroscopic model described by a PDE, using densities of particles colliding and streaming on a regular lattice [3]. The density of each lattice node follows the same simple evolution rules, in which only each node of the lattice and its neighbors are involved. Intrinsically adapted to parallel calculation, the LB method is a good candidate for massive parallel computation.
Some approaches can be used to design a parallel algorithm for image processing. One approach is to try to convert a sequential algorithm to a parallel algorithm. If a serial algorithm already exists for a certain image processing problem, then the inherent parallelism in that algorithm may be implemented in parallel. Inherent parallelism is parallelism that occurs naturally within an algorithm, not as a result of any special effort on the part of the algorithm or machine designer. For example, in [3], the three-dimensional data registration is improved by parallel programming for solving the simultaneous localization and mapping problem of the robot motion model. The main idea is to parallize the NNS (the nearest neighbor search) computation, which is the first step Information 2020, 11, 1 2 of 22 of three-dimensional data registration. In [4], the iterative digital image correlation is divided into three sequential stages. Each stage is implemented by parallel computing. It should be noted that exploiting inherent parallelism in a sequential algorithm might not always lead to an efficient parallel algorithm. Another approach is to design a totally new parallel algorithm that is more efficient than the existing one. For example, the operator splitting (OS) method [5,6] and the lattice Boltzmann method. The OS method decomposes high-dimensional PDEs into several low-dimensional ones. Therefore, the difficulty of solving PDEs is reduced. Both the OS and LB methods are widely used in parallel computation fields. However, compared with the OS method, the LB method also offers higher computational efficiency and stability besides the parallelism [7]. Because of these advantages, variant LB methods have been proposed as an alternative to the conventional numerical solvers of PDEs in CFD, e.g., diffusion equations, convection diffusion equations, reaction diffusion equations, etc. Inspired by the application of PDEs in image processing, LB models have been applied as a novel method to the field of PDE-based image processing [8] in the past decade, e.g., image segmentation [9], image denoising [10] and image enhancement [11] by elaborating specific LB models.
Image registration is a key procedure for many image processing tasks in which the information is extracted by mapping two individual images (the reference and template images) obtained at different times or by different devices. For example, as a critical medical image fusion preprocessing, multiple medical images should be aligned before they are fused together in order to gather different important information into single one image. The imaging process is always affected by many complex factors. This will cause non-rigid deformation between two images to be registered. Therefore, the registration method is required to have the ability to deal with non-rigid deformation, especially in the fields of medical imaging, remote sensing and so on, which require high registration accuracy. Among the large number of non-rigid image registration methods, the method based on PDEs has become one of the most popular registration methods because of its easy extensibility, intuitive construction and the ability to express continuous change. Over the last three decades, many PDE-based models of physics phenomena have been successfully applied in both digital image processing and computer vision domains. The main idea is to map the image processing problem to some physical phenomenon. For example, the optical flow model is used to recover image motion at each pixel from spatio-temporal image brightness variations. As a consequence, the image registration is achieved.
At present, various registration methods have found important applications in many areas such as target recognition and tracking [12,13], image mosaic [14], image-guided surgery, image-guided radiotherapy [15], and so on. Generally speaking, registration methods can be divided into two types: feature-based registration and intensity-based registration. Rather than the total information contained in the images, feature-based image registration focuses on image features such as the edges, corners [16], etc. Therefore, the algorithm can be significantly reduced from the computational complexity point of view. In recent decades, many feature-based registration algorithms have been proposed [17][18][19]. However, feature-based methods require sufficient features, which should be properly detected and matched, otherwise the registration accuracy will decrease. For the cases low on features, a good choice is to use intensity-based registration, which is formulated as a PDE. The intensity-based image registration method could be adopted to deal with nonlinear transformation; the full information of the images is used, and does not require feature detection and matching steps. However, the main problem is that the intensity-based method is time consuming.
To override the disadvantages in both the feature and intensity-based methods, an LB model is proposed for intensity-based image registration. The main idea of our method consists in simulating the convection diffusion equation by establishing an LB model. Based on the efficiency of the LB model in dealing with the PDE, our model outperforms the classical nonlinear transformation methods, such as Wangs demons [20], optical flow model [21] and B-spline model [22], in terms of MSE (mean square error), RE (relevant error) and NMI (normalized mutual information). Because of the natural parallelism of the LB model [23,24], our model is efficient and parallel. It has the potential to be efficiently executed on massively parallel architectures on embedded field programmable gate arrays (FPGAs) and digital signal processors (DSPs) as well as graphics processing units (GPUs).
To explain our theoretical approach, this paper is structured as follows. Section 2 introduces our method including the basic theory of the LB model and how the two-dimensional (2D) and three-dimensional (3D) LB models for image registration are constructed step by step. Then, the experiments and discussion are presented in Sections 3 and 4, and finally we reach the conclusion in Section 5.

Basic Theory of the Lattice Boltzmann Model
The basic idea of the LB model is the construction of simplified discrete dynamics for simulating the macroscopic model, which is described by partial differential equations (PDEs) using densities of particles moving on a regular lattice. On the lattice, each node follows the same parallel evolution rules. The general lattice Boltzmann modeling consists in two steps: a translation step, during which particles move from node to node on a lattice, and a collision step, during which particles are redistributed at each node. The two steps can be represented by the discrete lattice Boltzmann equation (LBE). In this paper, a D2Q9 mode is used, where "D2" stands for "2 dimensions", while "Q9" stands for "9 speeds". The discrete LBE is given by [1]: where i = 0, 1, 2, . . . , 8 and f i r + σ h e i , t + σ t is the particle density (also named density distribution function) on node r + σ h e i , at time t + σ t with time step σ t and designating lattice spacing σ h in direction e i : where v is the velocity of the particles. The right-hand side of Equation (1) represents the collision term, and the constant quantity τ is the single relaxation time, which controls the rate of approach to equilibrium. The equilibrium distribution functions f (0) i r , t in direction e i is only determined by the total density ρ which is defined in terms of particle distribution functions: and f i must be positive, and c is the regulator for the diffusion speed. Similar to [25], the macroscopic equation can be derived as follows.

of 22
Equation (6) is an anisotropic diffusion equation, also called Perona-Malik diffusion equation, which has an important application to the image denoising while preserving the image edges. In Section 2.2, a more general LB model will be developed in detail.

LB-Based 2D Image Registration Model
Suppose a reference image R and a template image T are given, the aim of image registration is to search a locale or global nonlinear optimal transformation F which satisfies F(T) = R. This means the transformed template T should match the reference R. In other words, we should find the displacement vector (X, Y) which must satisfy T(x − X, y − Y) = R(x, y) or T(x − X, y − Y) ≈ R(x, y). This problem can be expressed by the so-called sum of squared differences (SSD): where Ω is the image spatial domain. Thus, the problem changes to find the optimal (X, Y) to minimize energy E: Equation (8) can be resolved by the variational method [26]. We can obtain the Euler-Lagrange equation of the variational problem (8) as follows [27]: However, Equation (9) is ill-posed because of the none-unique transformation from T onto R. To make it well-posed, the regular term is always added to Equation (8), for example, a diffusion term: The first term of the right-hand side of the two equations in (10) is the regular term, while the second term is the SSD term, called image force term. Equation (10) has been proposed by Fisher et al. [27] for curvature-based image registration. In view of the PDE, Equation (10) comes to be a kind of convection diffusion equation which can be parallelly simulated by the LB method [28], but we need to modify the discrete LBE defined in Section 2.1 as follows: and Information 2020, 11, 1 (13) and f y Without loss of generality, Equations (11) and (12) can be simplified as: Note that ε is a small value. α is a positive one, and f (0) i is defined by Equations (4) and (5).
Expanding the left side of Equation (15) to the Taylor series at r , t , we have: Introducing the Chapman-Enskog expansion [25] into the LB model, we have: According to Equation (17), we obtain Considering formula (5), we conclude that Substituting (17) into (16) and comparing the coefficients of ε and ε 2 on both sides, the following equations are obtained: Equation (20) can be rewritten as Summing up both sides of Equation (22) from i = 0 to 8 leads to Information 2020, 11, 1 6 of 22 Summing up both sides of Equation (21) from i = 0 to 8, we have Substituting (22) into Equation (24) leads to Thus, according to Equations (23), (25) and (17), we can obtain We have Similarly, we can obtain where ε and σ t are small values and might as well assume σ t = ε. Therefore, Equations (28) and (29) can be rewritten as Overall, Equations (15), (4) and (5) describe a simple LB model for computing each component of the displacement. In the 2D LB-based registration model, we need two such simple LB models to solve Equations (28) and (29), respectively.

LB-Based 3D Volume Registration
Our original model can easily be easily extended to a three-dimensional model for performing volume registration. A D3Q15 model is chosen and used in our study.
In the three-dimensional model, the displacement is a vector with three components (X, Y, Z). Similar to the problem of 2D registration, three simple LB models are used to compute the displacement. According to the simplified LBE defined by Equation (15), the 3D LB model can be written as follows.
Information 2020, 11, 1 Considering the particles have 15 streaming directions in the three-dimensional model (as shown in Figure 1), we define the equilibrium distribution functions as: Directions e i (as shown in Figure 1) are defined as: As in Section 2.2, we obtain the macroscopic equations of the 3D LB model: Information 2020, 11, 1

Algorithm
According to Equations (4) and (10) to (14), the LB model can be applied to the image registration operation and achieves the same results as the curvature-based registration method proposed by Fisher et al. Take 3D registration for instance; the algorithm consists of the iterative following steps: (1) Initialize the values of X, Y and Z as zero; (2) Particles collision and streaming according to Equations (31) and (32); (3) Update X, Y and Z according to Equation (34); (4) Update equilibrium distribution functions according to Equation (33); (5) Go to step (2).

Experimental Results
To investigate the performance of our model, the registrations of 2D images and 3D volume data are shown in Figure 2. The 2D image registration testing was carried out on four types of test images: the registration of artificial images, computed tomography angiography (CTA) images, magnetic resonance (MR) images and video frames. The test images are given in Figure 2. The left column is the reference images and the right column, the template images. To measure the performance of the LB registration model, our method is compared with Wangs demons, B-spline registration model and optical flow registration model. In 3D volume registration, the MR volume data are random distorted as template data. In the experiments, the image/volume intensities are normalized between 0 and 1.
All the experiments are implemented on a PC with Intel(R) Core(TM) i7-4702MQ CPU, 8G RAM, with MATLAB 2016.

2D Image Registration
Parameter settings are presented in Table 1. Concerning the stability of the LB model, the relaxation factor τ should be larger than 0.5 because the diffusion coefficient should be positive, otherwise the diffusion function will be ill-posed. In the LB model, the lattice nodes correspond to the image pixels. Concerning the image coordinates system in which the pixel indices are integer values ranging from 1 to the length of the row or column, the grid spacing of the lattice should be 1. This means σ h = 1.
Parameter v and σ t should satisfy σ h = vσ t , otherwise the particles will stream into the place where the coordinate is not the integer in the next step. Therefore, we need to use interpolation technology to increase grid density. On one hand, the increased grid density can improve the computation accuracy, but on the other hand, the computational complexity is also increased. According to Equations (29) and (30), both parameter c and σ t contribute to the diffusion coefficients. However, the influence of parameter σ t on the diffusion coefficients can be replaced by adjusting parameter c which is used as a regulator of the diffusion coefficient. For simplicity, parameters v, σ t and σ h are set to be 1, while τ, c and α are chosen empirically. In the experiments, the values of τ, σ t , σ h , α and c are, respectively, 1, 1, 1, 2.5, and 1.5. First, we subjectively compared the LB model with three popular methods as mentioned in Section 3.2 by exhibiting the errors D = T(x − X, y − Y) − R , and the displacement grid.
In experiment 1, a pair of synthetic checkboard images is chosen as test images (Figure 2a). The template image (right column of Figure 2a) is a distorted sine wave.
In experiment 2, the test images (Figure 2b) are extracted from a computed tomography angiography (CTA) image sequence. Note that the region of high gray-level displays high-intensity major vessels, which are a result of a weakened blood vessel wall.
In experiment 3, we register two MR images (Figure 2c). The template image (right column of Figure 2c) is generated by using a random distortion [29] on the reference image (left column of Figure 2c) acquired from the BrainWeb MRI Simulated Normal Brain Database [30].
In experiment 4, we register two video frames for tracking the moving objects (dragon dancer) recorded in our Chinese spring festival celebration.
The registration results of the four experiments are presented in Figure 3, in which the left column is the registration results, the middle column is the errors D, and the right column is the displacement grid. Parameter settings are presented in Table 1. Concerning the stability of the LB model, the relaxation factor τ should be larger than 0.5 because the diffusion coefficient should be positive, otherwise the diffusion function will be ill-posed. In the LB model, the lattice nodes correspond to the image pixels. Concerning the image coordinates system in which the pixel indices are integer values ranging from 1 to the length of the row or column, the grid spacing of the lattice should be 1.

This means
Parameters v and t σ should satisfy h t v σ σ = , otherwise the particles will stream into the place where the coordinate is not the integer in the next step. Therefore, we need to use interpolation technology to increase grid density. On one hand, the increased grid density can improve the computation accuracy, but on the other hand, the computational complexity is also increased. According to Equations (29) and (30), both parameters c and t σ contribute to the diffusion coefficients. However, the influence of parameter t σ on the diffusion coefficients can be replaced by adjusting parameter c , which is used as a regulator of the diffusion coefficient. For simplicity, parameters v , t σ and h σ are set to be 1, while τ , c and α are chosen empirically.
In the experiments, the values of τ , t σ , h σ , α and c are, respectively, 1, 1, 1, 2.5, and 1.5.  In experiment 1, a pair of synthetic checkboard images is chosen as test images (Figure 2a). The template image (right column of Figure 2a) is a distorted sine wave.
In experiment 2, the test images (Figure 2b) are extracted from a computed tomography angiography (CTA) image sequence. Note that the region of high gray-level displays high-intensity major vessels, which are a result of a weakened blood vessel wall.
In experiment 3, we register two MR images (Figure 2c). The template image (right column of Figure 2c) is generated by using a random distortion [29] on the reference image (left column of Figure  2c) acquired from the BrainWeb MRI Simulated Normal Brain Database [30].
In experiment 4, we register two video frames for tracking the moving objects (dragon dancer) recorded in our Chinese spring festival celebration.
The registration results of the four experiments are presented in Figure 3, in which the left column is the registration results, the middle column is the errors, D , and the right column is the displacement grid.
(a) Experiment 1: image registration of synthetic checkboard images by the LB model.
where Considering the difference between the registration results and reference images, the LB model produces smaller errors D. Notice that the evaluations in Figure 3 are mainly based on the subjective experience of the observer. For objective evaluation, four quantitative metrics can be used to evaluate the performance of the proposed model, namely, mean square error (MSE), relative error (RE), normalized mutual information (NMI) and execution time per iteration (ETPI). MSE is a popular metric to measure the absolute differences between two images. As a measure of precision, RE is the ratio of the errors T(x − X, y − Y) − R before and after image registration. NMI evaluates the correlation between T(x − X, y − Y) and R in information theory. Higher similarity between two images implies smaller MSE and RE, and greater NMI. The formulas are as follows: where In Equation (37),T is the deformed version of template T. In Equation (39), H(R) is the information entropy of image R, while H R,T is the joint information entropy.
The experimental results are shown in Table 2. In Table 2, from the point view of registration accuracy, the LB model gains the smallest MSE and RE and the greatest NMI in experiments 1, 3 and 4. In experiment 2, there is only a tiny gap between the LB model and the optical flow registration model, which has the best registration accuracy. In all the four experiments, the LB model obtains the smallest index values of time cost per iteration. This implies that the LB model is relatively faster than the other three methods. Figure 4 amplifies the displacement (X, Y) given by the LB model in experiment 2.
Information 2019, 10 FOR PEER REVIEW 16  In Table 2, from the point view of registration accuracy, the LB model gains the smallest MSE and RE and the greatest NMI in experiments 1, 3 and 4. In experiment 2, there is only a tiny gap between the LB model and the optical flow registration model, which has the best registration accuracy. In all the four experiments, the LB model obtains the smallest index values of time cost per iteration. This implies that the LB model is relatively faster than the other three methods. Note that the region of high gray-level displays high-intensity major vessels, which are a result of a weakened blood vessel wall. The displacements around the vessel tends to shrink so that the template image T can be aligned with the reference. As shown in Figure 5a-c, the convergence of the LB model is illustrated by indices MSE, RE and NMI, which approach constants with the iterations.  Note that the region of high gray-level displays high-intensity major vessels, which are a result of a weakened blood vessel wall. The displacements around the vessel tends to shrink so that the template image T can be aligned with the reference. As shown in Figure 5a-c, the convergence of the LB model is illustrated by indices MSE, RE and NMI, which approach constants with the iterations.

3D Volume Registration
To test the performance of our LB model for 3D volume registration, we used the BrainWeb MRI Simulated Normal Brain Database [30]. A random distortion was added to the reference data (

3D Volume Registration
To test the performance of our LB model for 3D volume registration, we used the BrainWeb MRI Simulated Normal Brain Database [30]. A random distortion was added to the reference data (181 × 217 × 181) with a maximum of 30 voxels of deformation, as shown in Figure 6a. Figure 6c shows the template data after registration, and Figure 6b, the reference data. Figure 7 gives the slices of original MR volume data, the distorted MR volume data and the registered results. The experiment takes 5132 seconds for 500 iterations. We can notice that there are many 3D LB models besidesD3Q15, e.g., D3Q7, D3Q19 and D3Q27. An LB model with more velocities can achieve higher calculation accuracy but is more time consuming. In the experiment, we used D3Q15 for considering the compromise between the calculation accuracy and time consumption.
Information 2019, 10 FOR PEER REVIEW 18 shows the template data after registration, and Figure 6b, the reference data. Figure 7 gives the slices of original MR volume data, the distorted MR volume data and the registered results. The experiment takes 5132 seconds for 500 iterations. We can notice that there are many 3D LB models besidesD3Q15, e.g., D3Q7, D3Q19 and D3Q27. An LB model with more velocities can achieve higher calculation accuracy but is more time consuming. In the experiment, we used D3Q15 for considering the compromise between the calculation accuracy and time consumption.

Discussion
Parameterτ should be larger than 0.5 so that the diffusion coefficient is constrained to being positive. Otherwise, the LB model will be unstable, because a diffusion equation with a negative diffusion coefficient is ill-posed. Even when the diffusion coefficient 1 2 τ − is positive but very close to 0 (or τ very close to 0.5), the LB model is also unstable. As shown in Figure 8, the registration comes to be unstable when τ approaches 0.503 from 0.505.

Discussion
Parameter τ should be larger than 0.5 so that the diffusion coefficient is constrained to being positive. Otherwise, the LB model will be unstable, because a diffusion equation with a negative diffusion coefficient is ill-posed. Even when the diffusion coefficient τ − 1 2 is positive but very close to 0 (or τ very close to 0.5), the LB model is also unstable. As shown in Figure 8, the registration comes to be unstable when τ approaches 0.503 from 0.505.

Discussion
Parameterτ should be larger than 0.5 so that the diffusion coefficient is constrained to being positive. Otherwise, the LB model will be unstable, because a diffusion equation with a negative diffusion coefficient is ill-posed. Even when the diffusion coefficient 1 2 τ − is positive but very close to 0 (or τ very close to 0.5), the LB model is also unstable. As shown in Figure 8, the registration comes to be unstable when τ approaches 0.503 from 0.505.  In this study,τ = 1 2 , this means that the gradients of X and Y should not be very large.
However, the regular term Div ∇X |∇X| and Div ∇Y |∇Y| determine the smoothness of X and Y respectively, while the weight 2σ h 3cσ t controls the contribution of the regular term in Equations (25) and (26). In other words, a higher value of 2σ h 3cσ t leads to smaller gradients of X and Y.
Parameter α and c substantially affect the performance of our model. Parameter α and 1 c denote the SSD and the regular terms weight, respectively. For higher registration accuracy, we hope for large α and small 1 c values. However, this will result in a decrease in the regularization effect. With regard to the stability, we need relatively smaller α and larger 1 c . Considering Equations (4) and (5), parameter c ∈ [1, +∞), the maximum of 1 c is 1. Parameter α ∈ (0, +∞) because α is positive. In Figure 9, we give the relative errors with different α and c. Large values of α and c can lead to the instability of the LB model, and a sharp increase in the RE value. Smaller values of α and c can ensure the stability, but the accuracy is not the best.

2
X ∇ large. However, the regular terms Equations (4) and (5), parameter because α is positive. In Figure 9, we give the relative errors with different α and c . Large values of α and c can lead to the instability of the LB model, and a sharp increase in the RE value. Smaller values of α and c can ensure the stability, but the accuracy is not the best.  In Figure 10, we present the relative errors RE in domain RE(α, c)|1 ≤ c ≤ 6 and 0 ≤ α ≤ 5 . We find that the larger parameter c and α, the smaller the relative errors, but with a poorer stability. In domain {c, α|c > 6 and α > 5}, the LB model is even unstable. Figure 11 illustrates an unstable example with c = 6.5 and α = 5.5. On the basis of the test, we suggest choosing (c, α) in domain A = {c, α|1 ≤ c ≤ 5.0 and 0 ≤ α ≤ 9.0}. Generally speaking, in domain A, if (c, α) is far away from (5.0, 9.0), the model is more stable, but less accurate, and vice versa.

Conclusions
Image registration is a key pre-procedure for high-level image processing. In this paper, an original parallel LB-based model for image registration is constructed. The main idea of the proposed method consists in simulating the convection diffusion equation by establishing an LB model. Experiments on CTA images and artificial images indicate that our model is much faster than classical methods and achieves accurate registration.
The stability is analyzed in the discussion, mainly focusing on the parameters c and α . High c and α values can help to improve the registration accuracy, but may lead to instability. In this paper, we suggest

Conclusions
Image registration is a key pre-procedure for high-level image processing. In this paper, an original parallel LB-based model for image registration is constructed. The main idea of the proposed method consists in simulating the convection diffusion equation by establishing an LB model. Experiments on CTA images and artificial images indicate that our model is much faster than classical methods and achieves accurate registration.
The stability is analyzed in the discussion, mainly focusing on the parameters c and α . High c and α values can help to improve the registration accuracy, but may lead to instability. In this paper, we suggest

Conclusions
Image registration is a key pre-procedure for high-level image processing. In this paper, an original parallel LB-based model for image registration is constructed. The main idea of the proposed method consists in simulating the convection diffusion equation by establishing an LB model. Experiments on CTA images and artificial images indicate that our model is much faster than classical methods and achieves accurate registration.
The stability is analyzed in the discussion, mainly focusing on the parameter c and α. High and α values can help to improve the registration accuracy, but may lead to instability. In this paper, we suggest c ∈ [1, 5.0] and α = [0, 9.0]. We also extend the LB model to 3D volume registration. Because of the high parallelism and high efficiency for programming and computation, the LB method is faster than the classical methods on a personal computer. Especially, our method can run on massively parallel architectures on large-scale systems such as embedded FPGAs, DSPs or GUPs.