Symmetric Diffeomorphic Image Registration with Multi-Label Segmentation Masks

: Image registration aims to align two images through a spatial transformation. It plays a signiﬁcant role in brain imaging analysis. In this research, we propose a symmetric diffeomorphic image registration model based on multi-label segmentation masks to solve the problems in brain MRI registration. We ﬁrst introduce the similarity metric of the multi-label masks to the energy function, which improves the alignment of the brain region boundaries and the robustness to the noise. Next, we establish the model on the diffeomorphism group through the relaxation method and the inverse consistent constraint. The algorithm is designed through the local linearization and least-squares method. We then give spatially adaptive parameters to coordinate the descent of the energy function in different regions. The results show that our approach, compared with the mainstream methods, has better accuracy and noise resistance, and the transformations are more smooth and more reasonable.


Introduction
Image registration plays a crucial role in biomedical imaging applications, especially brain imaging analysis. It aims to find a spatial transformation to align datasets across subjects, modalities, or times geometrically. A variety of imaging processing approaches require registration as a preprocessing step. For example, a considerable amount of structural or functional information can be obtained from the brain atlases established from images of a large population [1]. However, it is a great challenge to find potential links between the images because they are of different people, ages, or modalities. We can settle these images through image registration into a standard space where shapes and structures are well aligned. Many subsequent analyses, such as the analysis of anatomical and connectivity patterns, can be performed after image registration [2].
Image registration can be divided into linear and non-linear according to the representation of the transformation. Linear registration methods compute affine transformations. Nevertheless, linear registration generally fails to meet the demands of processing. The reason is that the physiological movements of human bodies may lead to the organs' unregulated changes in position, volume, and shape. Therefore, scholars focus their eyes on non-linear registration because affine transformations cannot describe these changes [3]. Non-linear registration methods are classified into two categories: model-driven and data-driven methods [4]. Model-driven means establishing the explicit expressions of optimization models and then obtaining transformations through optimization methods. In contrast, data-driven approaches do not require explicit expressions. Their main task is to compute the mappings from the image pairs to the transformations [5]. We focus on model-driven methods in our study. The reason is that data-driven methods are prone to overfitting due to the high dimensionality of medical images and the small number of training samples. Under model-driven settings, the transformations can be expressed as simple parametric functions, such as B-spline functions [6][7][8], radial basis functions [9][10][11], and thin-plate spline functions [12][13][14]. The registration models are then turned into parametric optimization models by doing this. The non-parametric treatment is also currently popular. Non-parametric methods consider the registration models as variational problems in which Euler-Lagrange equations should be solved. There are several common non-parametric approaches, such as large deformation diffeomorphic metric mapping (LDDMM) [15][16][17][18], elastic registration [19][20][21], fluid registration [22][23][24], and diffusion registration [25][26][27].
Non-parametric methods are more suitable than parametric methods for our research topic on brain magnetic resonance imaging (MRI) registration. We have known a lot of effective methods for brain MRI registration [28][29][30][31], which shows that non-parametric methods perform better than parametric methods. The main reason is that non-parametric methods set independent transformation functions at every pixel. Therefore, they have far higher degrees of freedom than parametric methods and can describe the complex deformation of structures in the brain more easily. Moreover, non-parametric methods have many acceleration techniques [32][33][34], which save the computational time greatly.
However, the existing methods for brain MRI registration have several disadvantages. These methods pay closer attention to the intensity-based local similarity of the images than to the boundary alignment of the brain regions. The complex features of the brain, such as the sulcus gyrus of the cerebral cortex [35], may pull the optimization into local minima. Furthermore, the terminal conditions of these models are based on the average level of the global energy function descent. It easily leads to the situation that some brain regions have not yet been aligned when the iteration stops. Additionally, there is noise in the image acquisition process. These models do not emphasize the robustness to the noise, which can affect the performance in practice.
In this study, we propose a symmetric diffeomorphic image registration method based on multi-label segmentation masks to compensate for the above shortcomings. Firstly, we introduce the similarity metric of the multi-label segmentation masks, i.e., the segmentation result of large regions, into the energy function, which strengthens the alignment of the region boundaries and improves the noise resistance. The acquisition of the masks is not difficult because today's deep learning-based segmentation models [36] offer significant improvements in accuracy and computation time compared to traditional segmentation methods [37][38][39]. Secondly, we give a selection of spatially adaptive parameters based on the masks. It can prioritize the optimization of both the image similarity metric in the aligned regions and the mask similarity metric in the unaligned regions, so it coordinates the decline of the energy function spatially. Thirdly, we design an effective approximation algorithm through model relaxation, least-squares method, etc. We validate the effectiveness of our method in three different experiments. The results show that our method has better accuracy and robustness compared to the mainstream approaches, and meanwhile, the deformation field retains excellent reversibility, smoothness, and reasonableness.

Mathematical Background Knowledge
Let F, M : Ω → [0, 1] denote the fixed image and the moving image, respectively. The domain of the images is Ω ⊂ R n , where n = 2 for 2D images and n = 3 for 3D images. The task of the non-linear image registration is to compute a deformation field (transformation) ϕ : Ω → Ω to make the warped moving image M • ϕ as similar as possible to the fixed image F. In the setting of non-parametric registration, the deformation field is expressed as ϕ = Id − u, where Id is the identity map and u is called the displacement field of ϕ.
The accuracy of the transformation is measured by a similarity metric E 1 , which often takes the form of the sum of squared difference (SSD) In addition, the transformation is required to have the presupposed properties, which are measured by a regularization term E 2 . The most common property is the global smoothness, i.e., L 2 regularization expressed as E 2 (ϕ) = ∇ϕ 2 L 2 . The registration problem is written as where 1/σ 2 is a positive number that controls the balance between E 1 and E 2 . We omit the notation L 2 for simplicity. The solution space of this problem can be restricted to the diffeomorphism group Diff(Ω) = ϕ|ϕ −1 exists and ϕ, ϕ −1 ∈ C ∞ (Ω, Ω) , which is a Lie group. Any tangent vector v ∈ V of ϕ 0 = Id is a vector field on Ω, where V = T Id Diff(Ω) is a Lie algebra. We often call v a velocity field in the research fields of registration [4]. A diffeomorphism can be generated by the exponential map on Diff(Ω), i.e., ϕ = exp(v). It provides the intrinsic update step [40,41] on the diffeomorphism group: where v is the velocity field update, and • is both the function composition and the multiplication on the Lie group.
A practical method to compute the exponential map of vector fields, given by Arsigny et al. [42], is based on the idea of "scaling and squaring" [43]. For any integer N, it holds exp(v) = exp N −1 v N because of the property of the exponential map, Suppose v is a small vector field, it is reasonable to use the first-order approximation of the exponential map. We denote w = exp(v) as the result transformation, and the algorithm of the exponential map is described as follows (Algorithm 1).

Algorithm 1
The first-order algorithm for the exponential map of vector fields Choose a proper N such that 2 −N v is close enough to 0, e.g., max x 2 −N v(x) ≤ 0.5. Implement the first-order integration: w(x) = 2 −N v(x) for every x ∈ Ω. Do N recursive squarings: w ← w • w.
Therefore, in the above setting, the velocity field update of each iteration is selected in V discretely, and the result velocity field is regarded as a constant. We often call it a stationary velocity field (SVF). By contrast, the velocity field of LDDMM [15][16][17][18] varies over time, i.e., v = v t is a continuous curve in V. The SVF methods, such as Diffeomorphic Demons [27] and Log-domain Diffeomorphic Demons [44], do not optimize the global variational problem like LDDMM because they do not update the velocity field in the whole time flow. However, the SVF methods have less computational cost and can thus obtain a diffeomorphism quickly.

The Reassignment of the Segmentation Masks
We can improve the boundary alignment of brain regions and the robustness to the noise through the multi-label segmentation masks of large regions, which are the regions of white matter (WM), grey matter (GM), and cerebrospinal fluid (CSF). The masks are available easily because there is an obvious difference in the intensity values (see Figure 1). We propose reassigning the intensity values on the masks. Suppose there are m images participating in the registration, we set the intensity value of the r-th region to be where B is the segmentation mask, I j is the j-th image, and Ω j r is the r-th region on I j . It means we first average the intensity value on the region of one image and then average among different images. For example, we assume to use only two images I 1 , I 2 , and there are two regions on the images. The average intensity values of the first region of We then obtain the average intensity value on the first region, i.e., B 1 = 1 2 B 1 1 + B 2 1 . Therefore, we calculate the intensity values of the segmentation masks once throughout the process. Significantly, we need histogram matching [45] on the images before the reassignment because we should avoid the gap of the intensity values in the same region among different images.
We add the information of the average intensity values of the regions to the masks through the reassignment. Consequently, the intensity magnitude of the masks coincides with that of the images. The spatial information of the region boundaries is also retained. Furthermore, we can improve the robustness to the noise by the reassignment. The reason is that the intensity values of the contaminated pixels are pulled back towards the average values on the regions when we introduce the similarity metric of the masks to the energy function. Therefore, the influence of the noise is weakened, and we improve the anti-noise ability of the model by doing this.

The Proposed Model
Firstly, we take the SSD as the similarity metric and the L 2 smoothness as the regularization term. In the integral form, the energy function is We restrict the solution space to the diffeomorphism group Diff(Ω). Therefore, we introduce the energy of the inverse transformation to ensure reversibility. The energy function is then extended to a symmetric form with the transformation and the inverse transformation as variables. We denote ψ as the slack variable of ϕ −1 and reformulate the energy function, Equation (4), as Next, we penalize Equation (6) to the energy function Equation (5), i.e., we add the inverse consistent constraint [46,47] We tie ϕ and ψ together in the energy function by doing this.
However, we should retain both of them because there is an error between ψ and ϕ −1 , which will be amplified during calculation. Moreover, keeping E 3 symmetric can reduce the difficulty of solving because we can divide them into two subproblems later.
After that, we introduce the similarity metric of the multi-label segmentation masks. We denote B F , B M : Ω → [0, 1] as the segmentation masks of the fixed image F and the moving image M, respectively. The magnitude of B F and B M is consistent with that of the images because we have carried out the reassignment of the masks. We use the symmetric SSD form, i.e., Minimizing E 4 helps the alignment of the region boundaries. It should be noted that the segmentation masks are different from the images even if they both take the SSD form. The reason is that the intensity value of the segmentation masks is a constant within one region, leading to E 4 = 0. Therefore, there is little force to cause deformation in the overlap of B F and B M when E 1 is not involved in the registration.

Numerical Implementation
Firstly, we split the registration problem Equation (9) into two subproblems so that the alternating iteration strategy can be applied.
where k is the number of external iterations. Next, we consider only the subproblem Equation (11) in the following steps because Equation (10) can be solved in the same way. We introduce a slack variable c to avoid hard point-to-point correspondences following the strategy of Cachier et al. [48]. Consequently, we turn the subproblem Equation (11) into min ϕ,c where 1 σ 2 x controls the spatial correspondence error. We can separate Equation (12) further into two new subproblems: where l is the number of internal iterations. In addition, we will use the approximation of in Equation (13), i.e., applying the fixed image F on it, which changes it to It can match the magnitude of this item with that of the images and the masks. After that, we use the first-order approximation of the intrinsic update step c = ϕ k,l • exp v k,l to simplify Equation (13), where v k,l is the update velocity field. The approximation is reasonable because v k,l is small. Therefore, Equation (13) becomes where ) is a row vector, and We can rewrite Equation (16) as the following least-squares problem: where I is the identity matrix of size n. Finally, we obtain the least-squares solution of Equation (20) through the Sherman-Morrison formula: where we omit the position x for simplicity. Similarly, we can solve the least-squares problem acquired from Equation (10) from the above steps: where s k,l is the update velocity field of ψ k,l , and As for the new subproblem Equation (14), we can obtain the closed-form solution with a Gaussian convolution, i.e., ϕ k,l+1 = K * ϕ k,l • exp v k,l , where K is a Gaussian kernel. This is because of the special form of the regularization term [49]. Consequently, the closed-form solution of the subproblem Equation (10) is also obtained: The algorithm of our proposed method is as follows (Algorithm 2).

Algorithm 2 Symmetric diffeomorphic image registration with multi-label segmentation masks
Initialize ϕ , ψ , v , s. repeat {Update the backward transform ψ} repeat Compute the velocity field s using Equation (22).
Compute the velocity field v using Equation (21).

The Selection of Spatially Adaptive Parameters
In this section, we consider only the parameters in Equation (21) because we can acquire those in Equation (22) similarly.
Firstly, we need to review the traditional selection of parameters based on statistics. We assume that {FM(x)| x ∈ Ω} is independent and identically distributed (IID) in a normal distribution N(0, σ 2 1 ). This setting is reasonable when Ω is huge because of the Lindberg-Lévy central limit theorem. We choose the zero mean because we hope that the warped moving image fully aligns with the fixed image. We write the likelihood function, based on the maximum likelihood estimation (MLE), as The optimal estimation of σ 2 1 is obtained through −∂ ln(W)/∂σ 1 = 0, i.e., where |Ω| is the area of Ω. In particular, we can regard Ω as the neighborhood of a specific position x, denoted as Ω x , where we can change the radius of it arbitrarily. The extreme condition is when the radius is zero, leading to σ 2 1 (x) = FM(x) 2 , which is the selection of Thirion [50]. This parameter not only adjusts the influence between the similarity metric and the regularization term dynamically but also controls the magnitude of E 1 by transferring the distribution of FM(x)/σ 1 to N(0, 1).
Next, we evaluate the misalignment based on the multi-label segmentation masks. We treat B F B M (x) as a continuous random variable. The reason is that the activation functions [51], such as the Sigmoid function and the tanh function, can smooth the gap around the area boundaries, although B F B M (x) can take only some values. We denote 1 as the indicator function, which fulfills 1(x) = 1 if x > 0 and 1(x) = 0 if x = 0. We then introduce the probability of misalignment based on the masks: where N x is the neighborhood of x. The indicator function helps distinguish whether the alignment improves or the low-intensity values occur when B F B M (y) declines. This probability p is especially a widely used overlap metric called the "target overlap" [28] when N x is large enough to cover Ω r After that, we give our selection of spatially adaptive parameters. We think that 1/σ 2 i , i = 1, 2, 3 are connected because the decline of E 1 and E 3 is not meaningful unless the alignment of region boundaries is good. Therefore, the parameters should satisfy two requirements: • When the alignment is poor (p approaches 1), we reduce the effect of E 1 and E 3 and increase that of E 4 . When the alignment is good (p approaches 0), we do the opposite. • The role of E 1 should be stronger than that of E 3 because the decline of E 3 cannot improve registration accuracy.
Therefore, we denote the parameters as functions of p, i.e., 1/σ 2 i = q i (p). Specifically, we define where λ ∈ [0, 1], and c 1 and c 2 are the MLEs of 1/σ 2 1 and 1/σ 2 2 (see Equation (27)), respectively. We denote the radius of N x as R. The parameters, λ and R, are both determined by the user because the influence of E 3 on E 1 and the amount of computation vary in different tasks. It is worth noting that we need to add a small positive number to the denominator of c i when c i is not well defined.
Finally, we validate the effectiveness of the parameters. When 0 < p < 1, we express the ratio of the importance between the similarity metric of the images and that of the masks, i.e., the ratio of E 1 to E 4 , as We know that N x FM(y) 2 dy is constant when x is fixed. Meanwhile, we infer that p and N x B F B M (y) 2 dy are positively related based on Equation (28). We thus conclude from Equation (32) that q 1 /q 2 decreases when p increases, which means E 4 plays a key role; q 1 /q 2 increases when p decreases, which means E 1 plays a key role. Therefore, the parameters vary according to the current position and the decline of the energy function, which coordinates the optimization process spatially.

Results
In this section, we evaluate the performance of our method with three experiments. The experimental images include synthetic 2D images, the OASIS-1 dataset [52], and the IBSR18 dataset [53]. All these experiments are implemented using C++ in a Ubuntu 16 system, with two Intel Xeon Silver 4216 @2.1GHz CPUs and 128GB 2666MHz memory.

Implementation Details and Algorithmic Comparison
Firstly, we regard our algorithm's user-determined parameters. We choose the radius of N x to be R = 1 considering the calculation time. We select λ = 0.5, which means the ratio of E 1 to E 3 is 2 : 1. In addition, the algorithm has two terminal conditions, i.e., the maximum iterations and the minimum magnitude of the update step.
Next, we implement the algorithm based on the open-source library Insight ToolKit (ITK) 5.1 (https://itk.org, accessed on 31 May 2022). We use the combination of two itkPDEDeformableRegistrationFilters to iterate alternatively, as shown in Figure 2a. There are four modules in itkPDEDeformableRegistrationFilter, as shown in Figure 2b. We denote two Gaussian kernels as K fluid and K diff , where K fluid smooths the update field and K diff smooths the deformation field. The exponential map is computed through itkExponentialD-isplacementFieldImageFilter. All modules are set to compute parallelly by the multithreading technique [54]. After that, we use two mainstream approaches for comparison, i.e., Diffeomorphic Demons and SyN. Vercauteren et al. [27], who proposed Diffeomorphic Demons, applied the SVF framework to the classic Demons algorithm [55], enhancing the smoothness and deformability of the transformations. SyN is a symmetric diffeomorphic registration method, which uses the local correlation coefficient (LCC) as the similarity metric. Avants et al. [56] provided the SyN approach based on the LDDMM algorithm [15], strengthening the registration accuracy and reversibility. Diffeomorphic Demons and SyN are both widely used to compare with the brain registration methods [28]. In addtion, K fluid and K diff are also used in these two approaches.
Finally, we introduce the algorithmic comparison and related notations. We denote our method as Ours, our method without the inverse consistent constraint as Ours-NId, and our method without the similarity metric of the masks as Ours-NSeg. We also denote Diffeomorphic Demons [27] as DiffDe and SyN [56] as SyN. In addition, we denote the number of iterations as numIt and denote two Gaussian kernels K fuild and K diff as Kfσ and Kdσ, respectively, where σ is the standard deviation (stDev). All methods are performed only once at the highest resolution of the images for fairness. Moreover, we introduce the following five evaluation metrics considering accuracy, reversibility, reasonability, and smoothness.
• Accuracy: The Dice ratio (DR(Ω)) [28] is defined by where r is the index of the regions, and |·| is the volume. It is a measure of region overlap, which should be as high as possible. • Reversibility: The identity error (IdErr) is defined by 1 |Ω| Id − ϕ • ψ 2 . It reflects the difference between the identity map and the composition of the forward and backward transformation, which should be as low as possible. Furthermore, we denote the masks of WM, GM, and CSF segmentation as WGCS, which are the multi-label segmentation masks B F and B M . We denote the detailed masks of brain regions' segmentation as BRS, which are used for computing the Dice ratio.

The Ablation Experiments on Synthetic 2D Data
In this section, we conduct ablation experiments to verify the effectiveness of E 3 and E 4 , i.e., the inverse consistent constraint and the similarity metric of the masks. The synthetic 2D data, with the size of 100 × 100, is a simple simulation of the brain, as shown in the first row of Figure 3. We generate the fixed image first with three ellipses of the same center. Then, we set the asymptotic intensity values to avoid being the same as the masks, as shown in the second row of Figure 3. We produce the moving image last by changing the long and short axes, which mimics the possible deformation in the brain [58]. We also apply a shear transform towards the x-axis with 84 • to test the ability to recover simple linear transformations.
We will carry out registration with numIt = 300, which makes the optimization process converged. In particular, we set 20 external iterations × 15 internal iterations in our method. Moreover, we select two kinds of smoothing parameters, i.e., Kf2 Kd0.5 and Kf1 Kd1. The reason to select Kf1 Kd1 is that it is the same as the classic DiffDe [27], and many methods verify its effectiveness. By contrast, Kf2 Kd0.5 is a compromise between the standard setting of DiffDe and SyN [59]. Since DiffDe cannot obtain the inverse transformation in one registration, it performs an additional backward registration, i.e., exchanging M and F. The qualitative results are shown in Figure 4, and the quantitative results are shown in Tables 1 and 2. Firstly, we discuss, based on Figure 4, the effect of the smoothing parameters. The standard deviations of Kf and Kd determine the sharpness of the deformation, as shown in the magnitude images of the transformations. It reveals that Kd affects smoothness the most because the higher the stDev of Kd, the smoother the transformation. In addition, the results of Kf2 Kd0.5 are better than those of Kf1 Kd1, as shown in the warped moving images and the residual images. It suggests that low stDevs lead to more accurate results. Therefore, we should choose the smoothing parameters carefully in practice to balance accuracy and smoothness.
Secondly, we discuss, based on Tables 1 and 2, the results of the ablation experiments. Ours-NSeg shows excellent results in the items of IdErr, M(DetJ), and SMErr, but the DR is relatively low. However, the behavior of Ours-NId is just the opposite. We conclude that the similarity metric of the masks improves the accuracy, and the inverse consistent constraint improves the reversibility, reasonability, and smoothness. Therefore, the good results of Ours shows it combines the advantages of both E 3 and E 4 .   In each part, the three rows display the warped moving images, the deformation grids, and the residual images, respectively. Different columns are the results of different methods. The color of the deformation grids, whose scale bar ranges from −4.5 to 5.8 pixels, represents the magnitude of the transformations. The color of the residual images, whose scale bar ranges from −3.0 to 3.2, represents the difference between the intensity values of the fixed image and the warped moving images.
Thirdly, we discuss the comparison among DiffDe, SyN, and Ours. As described in Tables 1 and 2, DiffDe shows poor results in the items of IdErr, M(DetJ), and SMErr, although the warped moving images of DiffDe are very similar to those of Ours. Moreover, SyN achieves the best reversibility. However, the warped moving images of SyN are not satisfactory because the shapes of the white ellipses do not maintain, as displayed in Figure 4. By contrast, the shapes of Ours are good, which means Ours can recover the deformation. Moreover, Ours has the best accuracy and reasonability, as revealed in Tables 1  and 2.

The Anti-Noise Experiments on the OASIS-1 Dataset
In this section, we show the robustness of our method to the noise through the antinoise experiments. The OASIS-1 dataset [52], which we use, contains 416 subjects aged from 18 to 96. This dataset is a part of the OASIS project to make neuroimaging data freely available (https://www.oasis-brains.org, accessed on 31 May 2022). It is used widely in the comparison of registration algorithms [60][61][62]. Each subject is equipped with a T1-weighted skull-stripped image, a BRS of 35 brain regions, and a WGCS, as shown in the first three images of Figure 5. All the masks are verified by experts. The size of the images is 160 × 192 × 224 voxels with a voxel size of 1 × 1× 1 mm 3 . We selected five subjects in the OASIS-1 dataset randomly to avoid the influence of the sort order in the dataset. Next, we designate one subject as the fixed image and leave the other four as moving images. Specifically, the fixed image is #161, and the moving images are #227, #287, #319, and #333, respectively. We then add Gaussian noise of four standard deviations to the images, i.e., σ n = 0, 0.01, 0.025, 0.05, where 0 means no noise. Therefore, each method performs 4 × 4 = 16 registration. The higher the standard deviation, the stronger the noise is, as shown in the last three images of Figure 5. We use additive Gaussian white noise because it is common in nature. After that, we carry out histogram matching to reduce the uncertainty caused by the different distributions of the intensity values in the images. Meanwhile, we reassign the intensity value of the masks. Finally, we conduct registration with numIt = 150. In particular, we set 10 external iterations × 15 internal iterations in our method. The smoothing parameters are chosen to be Kf2 Kd0.5 for all methods. The reason is that Kf2 Kd0.5 can obtain high accuracy while maintaining good smoothness, which is suggested by the quantitative and qualitative results in Section 3.2.
Firstly, we analyze an example in which #227 and #161 are the moving image and the fixed image, respectively. The registration of this example is performed with the noise of σ n = 0.05. We present the warped BRSs in Figure 6, which can show the accuracy is poor if a large difference between the warped BRSs and the fixed BRSs occurs. The results of DiffDe and SyN are not satisfactory, e.g., DiffDe's and SyN's volume of the left and right ventricles is larger than the fixed BRS's. By contrast, Ours has the best results in this example because the volume and position of the brain regions are very close to the fixed BRS. Secondly, we plot how the DR varies with the noise intensity in Figure 7. The high slope of the line segments indicates that the noise can affect the accuracy greatly. The stability of DiffDe is weak because the DR decreases rapidly when the noise is enhanced. By contrast, SyN's performance is better than DiffDe's, e.g., the line segments of 0.01 → 0.025 are almost flat. The results of Ours are similar to those of SyN, but the overall performance of Ours is better. Consequently, Ours has strong robustness to the noise.

The Performance Experiments on the IBSR18 Dataset
In this section, we demonstrate the excellent comprehensive capabilities of our approach through the performance experiments. The Internet Brain Segmentation Repository (IBSR) contains T1-weighted MRI brain images of 18 subjects. The original images were preprocessed by the Center for Morphometric Analysis, Massachusetts General Hospital in Boston, U.S. [28]. We use the IBSR18 v2.0 dataset (https://www.nitrc.org/projects/ibsr/, accessed on 31 May 2022), in which Rohlfing et al. [53] modified the IBSR18 dataset by removing non-brain regions, etc. We can see this dataset is commonly used in the field of brain image registration [28,[63][64][65]. Each subject is equipped with a skull-stripped image, a BRS of 84 brain regions, and a WGCS. The size of the images is 256 × 256 × 128 voxels with a voxel size of (0.837 ∼ 1) × (0.837 ∼ 1) × 1.5 mm 3 .
We first crop the unnecessary area of the images, which changes the size into 166 × 161 × 128. We also unify the voxel size to 0.97 × 0.97 × 1 mm 3 . We then choose the same smoothing parameters as in Section 3.3, i.e., Kf2 Kd0.5. After that, we conduct registration with numIt = 150. In particular, we set 10 external iterations × 15 internal iterations in our method. Finally, we select the first ten subjects in the dataset. We make each subject the fixed image and the other nine subjects the moving images. Therefore, each method performs 10 × 9 = 90 registration.
Firstly, we analyze an example in which #02 and #01 are the moving and fixed images, respectively. We display, in Figure 8, the moving image, the fixed image, and the warped moving images of different methods. We enlarge representative positions to analyze the difference in the results conveniently. For example, the first row of Figure 8 shows the axial sections. The result of Ours is the most similar to the fixed image considering the contour of the left ventricle and the left thalamus. Furthermore, we can also verify the excellent performance of Ours from the second and third rows of Figure 8, representing the coronal and sagittal sections of the images, respectively.

DiffDe
Ours SyN Moving Image Fixed Image Axial Sagittal Coronal Figure 8. The registration results of an example in Section 3.4 are shown. All images are plotted in the axial, coronal, and sagittal sections. The first column displays the moving image #02, and the second column displays the fixed image #01. The third to fifth columns represent the warped moving images of different methods, i.e., DiffDe, SyN, and Ours, respectively.
Secondly, we can also compare the results from the volumetric plots and point cloud maps of the evaluation metrics based on the above example. We show the volumetric plots of the DR for each brain region after stripping the upper part of the brain, as displayed in the first row of Figure 9. We can see that Ours achieves the optimal result because its color is the closest to blue. In the second row of Figure 9, we plot the volume of the SMErr of the transformations over the entire brain. All three methods appear to be smooth overall, but SyN shows more large deformations because there is more volume of red. We last display in the third row of Figure 9 the point cloud maps where the negative Jacobian determinant occurs. In contrast, SyN has the most points, DiffDe has the second most, and Ours has the fewest. Therefore, Ours shows, in this example, wonderful results considering accuracy, smoothness, and reasonability.

SMErr
DetJ < 0 Thirdly, we organize the quantitative results of 90 registration in Table 3, which lists the mean and standard deviation of each evaluation metric. The results of DiffDe are relatively stable because its standard deviations are lower than other methods' in most items. It is worth noting that SyN achieves the best result of the IdErr, but SyN is not remarkable in the remaining items. By contrast, Ours is the best considering the DR, P(DetJ), M(DetJ), and SMErr, which means the results of Ours have excellent accuracy, reasonability, and smoothness.

Discussion
We verify the advantages of our method through the experimental results. The multilabel segmentation masks are added to the model as a priori information. The reassignment ensures the masks contain both the spatial information of the region boundaries and the mean intensity values in the regions. Therefore, the masks improve the alignment of the regions and robustness to the noise. Moreover, the symmetric form of the model strengthens the reversibility, reasonability, and smoothness without losing much accuracy. Furthermore, we need the masks of large regions only, which are accessible easily in practice. The masks can also be changed with different region definitions. Finally, our method has a very small computational cost and can thus obtain transformations quickly because of the SVF framework.
Our method also has some disadvantages. Firstly, the method relies on additional segmentation methods to obtain the masks. Secondly, the accuracy can be reduced if the inaccurate masks of large regions are used. Thirdly, the amount of calculation increases exponentially with the radius of N x , which is (2R + 1) n specifically, where n is the image dimensionality and R is the radius. If the image size and dimensionality are small, enlarging the radius is a good choice, which improves the accuracy of the parameter estimation. However, selecting a large radius is not applicable for high-resolution 3D medical images.
We plan to provide the masks for the model by establishing a relating segmentation method in the future. In addition, we will research to strengthen the convergence of reversibility. We will also apply the method to multi-modality registration.

Conclusions
In this study, we propose a symmetric diffeomorphic image registration model based on the multi-label segmentation masks to solve the problems in brain MRI registration. We tackle the issue that existing methods pay little attention to the alignment of the region boundaries by introducing the similarity metric of the multi-label segmentation masks. It also improves the robustness to the noise. We build the model on the diffeomorphism group, with the relaxation method and the inverse consistent constraint to strengthen the smoothness and reversibility. Moreover, we help the descent of the energy function in different regions through the spatially adaptive parameters. Finally, we verify the effectiveness of our method through three experiments. Compared with mainstream methods, the approach has better accuracy and noise resistance, and the transformations are more smooth and more reasonable.