miRID: Multi-Modal Image Registration Using Modality-Independent and Rotation-Invariant Descriptor

Abstract: Axiomatically, symmetry is a fundamental property of the mathematical functions defining similarity measures, and similarity measures are important tools in many areas of computer science, including machine learning and image processing. In this paper, we investigate a new technique to measure the similarity between two images, a fixed image and a moving image, in multi-modal image registration (MIR). MIR in medical image processing is essential and useful in diagnosis and therapy guidance, but it remains a very challenging task due to the lack of robustness against rotational variance in the image transformation process. Our investigation leads to a novel local self-similarity descriptor, called the modality-independent and rotation-invariant descriptor (miRID). By relying on the mean of the intensity values, an miRID is simply computable and can effectively handle the complicated intensity relationships between multi-modal images. Moreover, it can also overcome the problem of rotational variance by sorting the numerical values, each of which is the absolute difference between a pixel's intensity and the mean of all pixel intensities within a patch of the image. The experimental results show that our method outperforms others in both multi-modal rigid and non-rigid image registration.


Introduction
Over the last decade, medical imaging has been an important tool in clinical practice and many biomedical studies [1]. In image-guided therapy, medical images from various sources provide different information and, therefore, are employed for different purposes. For example, X-ray, ultrasound, computed tomography (CT) and magnetic resonance (MR) imaging provide anatomical structure information, while positron emission tomography (PET) and single-photon emission computed tomography (SPECT) give metabolic and physiological information [2]. Consequently, medical practitioners have to use multi-modal medical images generated at different times with different resolutions in order to make a better diagnosis of the disease [1]. Primarily, multi-modal image registration (MIR) is an underlying mechanism of an image-guided diagnostic tool. The aim of MIR is to align the corresponding features of two images. In addition, it has been enhanced to combine the anatomical and functional structures from two different image modalities. In general, applications of image registration can be classified into two major categories: mono-modal and multi-modal registration. The former involves alignment within the same image modality, and the latter between different image modalities [2][3][4].
In general, CT and MR possess high spatial resolutions, but have limited functional information. On the other hand, PET and SPECT can provide physiological images, but lack anatomical information [3]. In order to obtain more complete medical information, MIR has been proposed to align two images with different modalities: a moving and a fixed one. Particularly, it can provide both anatomical and functional information [5,6]. For example, registration of CT and PET images has been used for positioning esophageal cancer, which is useful in detecting the tumor target and for planning the radiation treatment in clinical therapy [7]. MR-CT registration has been used to delineate the clinical target volume in the image-guided radiotherapy of prostate cancer [8]. In fact, applications of MIR in medical multi-modal images have faced more challenges due to the complicated relationships of the intensities between the two images. A difficult task is to define the similarity of the registered images [9].
During the last few decades, several methods of similarity measures in MIR have been proposed. In order to qualify as a similarity measure, all of them must satisfy at least two fundamental mathematical properties: symmetry and reflexivity [10]. Initially, mutual information (MI) was extensively and successfully used in image processing to measure the intensity relationships [4,6]. Borvornvitchotikarn and Kurutach [11] introduced a taxonomy of MI methods and explained some MI principles and their limitations. The performance of MI was enhanced by many researchers, for example through Tsallis and Renyi's entropy [12] and Jensen-Renyi's entropy [13]. Second-order MI (SMI) [14] incorporated spatial information by extending the co-occurrence of intensity pairs of neighbors in the image. Regional MI (RMI) [15] incorporated regional information, and PCA regional MI (PRMI) enhanced the traditional RMI by using principal component analysis (PCA) [16]. Conditional MI (CMI), proposed by Loeckx et al. [17], used spatial information to improve the calculation of the joint intensity histogram. Moreover, Rivaz et al. [18] presented self-similarity α-MI (SeSaMI), which adds the local structural information in the image into the MI. In addition, they proposed contextual conditioned MI (CoCoMI), which incorporates contextual information into CMI [19]. Although these methods achieved good results in MIR, the computational costs of SeSaMI and CoCoMI are also high.
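As a point of reference for the MI-based family above, mutual information can be estimated directly from the joint intensity histogram of the two images. The sketch below is a minimal NumPy version; the function name and the bin count of 32 are our choices, not taken from the cited papers:

```python
import numpy as np

def mutual_information(fixed, moving, bins=32):
    """Estimate MI between two images from their joint intensity histogram."""
    joint_hist, _, _ = np.histogram2d(fixed.ravel(), moving.ravel(), bins=bins)
    pxy = joint_hist / joint_hist.sum()   # joint probability p(x, y)
    px = pxy.sum(axis=1)                  # marginal for the fixed image
    py = pxy.sum(axis=0)                  # marginal for the moving image
    nonzero = pxy > 0                     # skip log(0) terms
    return np.sum(pxy[nonzero] * np.log(pxy[nonzero] / (px[:, None] * py[None, :])[nonzero]))
```

An image shares maximal information with itself, so its MI with itself exceeds its MI with a spatially scrambled copy of the same intensities.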
Furthermore, Heinrich et al. [20] proposed a method called the modality independent neighbourhood descriptor (MIND) as a local self-similarity measure based on neighborhood information. MIND, which uses the sum of squared differences (SSD) in estimating the similarity, is more accurate than several other methods. However, its evaluation is heavily based on the central pixel and is thus possibly vulnerable to noise. Consequently, Heinrich et al. [21] proposed the so-called self-similarity context (SSC) to improve the noise robustness of MIND. SSC uses the Hamming distance of the binary results instead of the sum of absolute differences (SAD). Moreover, Kasiri et al. [22] proposed a patch-based estimation, which applies MI to measure the similarity of the patches. In another work by Kasiri et al. [23], a measure called sorted self-similarity (Ssesi) was proposed. It sorts the elements within the patch to solve the rotational variance problem of the SSD used for evaluating patch similarity.
The local binary pattern (LBP) is one of the most effective and frequently used methods in texture classification [24]. Recently, Dongsheng et al. [25] introduced a modality-independent local binary pattern (miLBP) descriptor, which can outperform SSC, SeSaMI, CoCoMI and the linear correlation of linear combination (LC2) in terms of robustness against noise and intensity non-uniformity in medical image processing. However, miLBP is still variant under rotational deformations. Borvornvitchotikarn and Kurutach [26] proposed a robust self-similarity descriptor (RSSD), which enhances miLBP in terms of rotational variance and improves the accuracy of MIR. However, both the miLBP and RSSD methods assess similarity based on the central pixel and, therefore, any image artifact in that pixel can affect the performance of the descriptors.
To deal with the above-mentioned problems, we propose a novel robust local descriptor based on miLBP for the registration of different image modalities. Its fundamental technique relies on the mean of all pixels or voxels to estimate the self-similarity of the intensities within the patch. Moreover, it yields a rotation-invariant metric by sorting the relevant values in the region of interest. We call this method the modality-independent and rotation-invariant descriptor (miRID). This paper is organized as follows. Section 2 presents our proposed method. Section 3 shows the experimental results and gives some discussion. Finally, Section 4 concludes this work.

Registration Framework
The goal of MIR is to find the best alignment between a moving image and a fixed image by maximizing the similarity, or minimizing the dissimilarity, between the two registered images. Generally, MIR has three main components: a similarity measure, a transformation and an optimization. The similarity measure estimates the (dis)similarity between the fixed image and the transformed moving image. The transformation model deforms the moving image to fit the fixed image. The optimization method selects the transformation parameters that give the best similarity [27]. In this work, a registration of a moving image I_m onto a fixed image I_f can be formulated as

T* = arg min_T D(I_f, T(I_m)), (1)

where D(I_f, T(I_m)) is a dissimilarity measure between the fixed image and the transformed moving image and T denotes the transformation function, chosen to minimize D(I_f, T(I_m)).
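The formulation above, minimizing D(I_f, T(I_m)) over transformations T, can be sketched as an exhaustive search over a small transformation space. In this toy version, the function name is ours, the search space is restricted to integer translations, and plain SSD stands in for the dissimilarity measure:

```python
import numpy as np

def register_shift(fixed, moving, max_shift=5):
    """Toy MIR skeleton: exhaustively search translations T and keep the
    one that minimizes a dissimilarity D(I_f, T(I_m)) (plain SSD here)."""
    best_shift, best_cost = (0, 0), np.inf
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            shifted = np.roll(np.roll(moving, dy, axis=0), dx, axis=1)  # T(I_m)
            cost = np.sum((fixed - shifted) ** 2)                       # D(I_f, T(I_m))
            if cost < best_cost:
                best_cost, best_shift = cost, (dy, dx)
    return best_shift, best_cost
```

Real registration frameworks replace the brute-force loop with a gradient-based optimizer and a richer transformation model, but the arg-min structure is the same.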

miLBP
miLBP is a technique based on the traditional LBP. However, instead of using a selectively fixed threshold value as in LBP, miLBP introduces an adaptive threshold equal to the standard deviation of all pixel intensities within the patch. The concept of miLBP, illustrated in Figure 1, is defined by Equation (2) [25]:

miLBP(c) = (s(|g_1 − g_c|), . . . , s(|g_P − g_c|)), with s(x) = 1 if x > δ and s(x) = 0 otherwise, (2)

where P denotes the number of neighboring pixels, c denotes the central pixel of the patch, p denotes a neighboring pixel of c (where p is counted as 1 for the pixel at the upper-left corner and increases by 1 as it moves to the next position in a clockwise manner), g_p denotes the intensity value of p, g_c denotes the intensity value of c and δ denotes the standard deviation of all pixel intensities within the patch.
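A minimal sketch of this computation for one square patch follows. The function name is ours, and for brevity the neighbours are taken in row-major order rather than the clockwise order described above, which changes only the bit positions, not their values:

```python
import numpy as np

def milbp(patch):
    """Binary miLBP pattern of a square patch: threshold |g_p - g_c| against
    the adaptive threshold delta (the patch standard deviation)."""
    patch = np.asarray(patch, dtype=float)
    c = patch.shape[0] // 2
    g_c = patch[c, c]                                     # central pixel intensity
    delta = patch.std()                                   # adaptive threshold
    flat = patch.ravel()
    neighbours = np.delete(flat, c * patch.shape[1] + c)  # all g_p (centre removed)
    return (np.abs(neighbours - g_c) > delta).astype(int)
```

For a flat patch with a bright central pixel, every neighbour differs from the centre by more than the patch standard deviation, so all bits are set.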

miRID
This section proposes a novel local self-similarity descriptor which enhances the technique of miLBP. As mentioned previously, the miLBP method estimates the similarity value based on the central pixel. Consequently, image artifacts within the central pixel can affect the performance of the descriptor. Therefore, we need a descriptor for the registration of multi-modal images that avoids the use of the central pixel in evaluating patch relationships. To accomplish this, we adopt the mean of all intensity values within the patch instead of the intensity of the central pixel.
In addition, to make our method robust against rotational deformations, the sorting operation was performed on the absolute difference between each pixel's intensity and the mean of all pixel intensities within a patch of the image I.
The proposed method is defined by Equation (3):

miRID = s(Sort_desc(|g_i − M|)), 1 ≤ i ≤ G, (3)

where g_i denotes the intensity value of the pixel at position i, M denotes the mean intensity of all pixels within the patch, |g_i − M| denotes the absolute difference between g_i and M, Sort_desc denotes the descending-order sorting operation and s denotes the thresholding function described below.

Equation (3) evaluates the dissimilarity measure within an image patch in three steps. Firstly, G denotes the number of pixels in the patch, calculated by G = (2r + 1)^d, where r is the radius and d is the dimension. For example, for a 3 × 3 patch in a 2D image, as shown in Figure 2, G = 9 (r = 1 and d = 2). Our work employs the mean of the pixel intensities within the patch (denoted by M), rather than the central pixel, in calculating the distance. Letting g_i denote the intensity of the pixel at position i in a patch (i.e., g_i ∈ {g_1, g_2, g_3, . . . , g_G}), the mean of the intensity values is defined by M = (1/G) Σ_{i=1}^{G} g_i, and the absolute mean difference of pixel i is |g_i − M|.

Secondly, the absolute mean differences |g_i − M|, 1 ≤ i ≤ G, obtained in the previous step are arranged in descending order. The result of the sorting operation is invariant when the image is rotated; this is the essential step that yields a rotation-invariant self-similarity descriptor. The sorting step can be formally described by Sort_desc(|g_i − M|), 1 ≤ i ≤ G.
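The steps above (mean, absolute differences, descending sort, threshold) can be sketched for a single patch as follows. The function name is ours, and taking δ to be the patch standard deviation, mirroring its role in miLBP, is an assumption of this sketch:

```python
import numpy as np

def mirid(patch):
    """miRID bit pattern of a patch: sort |g_i - M| in descending order and
    threshold against delta (assumed here to be the patch standard deviation)."""
    g = np.asarray(patch, dtype=float).ravel()
    M = g.mean()                          # mean intensity M of the patch
    diffs = np.sort(np.abs(g - M))[::-1]  # Sort_desc(|g_i - M|)
    delta = g.std()                       # adaptive threshold delta (assumption)
    return (diffs > delta).astype(int)    # binary, rotation-invariant pattern
```

Because sorting discards the spatial ordering of the pixels, any rotation of the patch yields the same descriptor.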
Finally, thresholding digitizes the values of Sort_desc(|g_i − M|) in order to label each position i with 0 or 1. δ is the adaptive threshold used to calculate the labels; as in miLBP (Equation (2)), δ can be defined as the standard deviation of all pixel intensities within the patch.

The miRID is used for evaluating the dissimilarity between the fixed image I_f and the deformed moving image I_m;φ. The dissimilarity measure D(I_f, I_m;φ), whose value lies within the range [0, 1], can be evaluated by Equation (4):

D(I_f, I_m;φ) = BitCount(miRID(I_f) ⊕ miRID(I_m;φ)) / |W|, (4)

where D(I_f, I_m;φ) denotes the value of the dissimilarity between the images to be registered; BitCount represents the bit count operation, which counts the number of bits having the value 1; ⊕ is the bitwise exclusive-or (XOR) operation, so that the bit count of the XOR result is the Hamming distance; miRID(I_f) and miRID(I_m;φ) are the sorted binary patterns of I_f and I_m;φ; and |W| denotes the bit length of the miRID results.

Figure 3 shows a block diagram illustrating our registration model. Specifically, miRID(I_f) and miRID(I_m;φ) represent the self-similarity values of the original fixed image and the transformed moving image, respectively; both are evaluated by Equation (3). The dissimilarity between them is calculated by D(I_f, I_m;φ), as defined by Equation (4). For the optimization of D(I_f, I_m;φ), the gradient descent method [28] was used to find the optimal transformation. Adjusting the value of φ_k in the transformation process was performed according to Equation (5), minimizing the dissimilarity value D(I_f, I_m;φ):

φ_{k+1} = φ_k − s_k ∇D(I_f, I_m;φ_k), (5)

where ∇D(I_f, I_m;φ_k) is the derivative of the cost function D(I_f, I_m;φ_k) with respect to φ_k, φ_{k+1} is the next position, φ_k denotes the current position, s_k represents the step size at the k-th iteration and φ is the vector of transformation parameters.
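The dissimilarity of Equation (4) amounts to a normalized Hamming distance between the two bit patterns; a minimal sketch (function name ours):

```python
import numpy as np

def dissimilarity(desc_f, desc_m):
    """Normalized Hamming distance: bit count of the XOR of two equal-length
    binary descriptors, divided by the bit length |W|."""
    desc_f = np.asarray(desc_f, dtype=int)
    desc_m = np.asarray(desc_m, dtype=int)
    return np.count_nonzero(desc_f ^ desc_m) / desc_f.size
```

Identical patterns give 0 and complementary patterns give 1, matching the stated [0, 1] range of the measure.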
The gradient of the cost function D(I_f, I_m;φ) with respect to the transformation parameter φ is shown in Equation (6):

Experimental Results and Discussion
In our experiments, the miRID local self-similarity descriptor was implemented using MATLAB R2019b with a mex file compiled from C. All experiments were run on a computer with an Intel® Core™ i7-7700 CPU at 3.60 GHz and 16.0 GB of RAM. We set 300 iterations as the maximum number of iterations in the stopping condition for the registration experiments (see more details on the settings in the original work of Myronenko et al. [29]).
In order to demonstrate the efficiency of the proposed method, we employed simulated longitudinal relaxation time (T1), transverse relaxation time (T2) and proton density (PD) brain MR images with an image size of 181 × 217 × 181 voxels, 3% noise and 40% intensity non-uniformity from the BrainWeb dataset [30]. We also compared the registration performance of the proposed method with the MI [6], SSC [21], miLBP [25] and Ssesi [23] methods. They were quantitatively assessed by the target registration error (TRE) [20] and the registration time (Time).
To demonstrate rigid image registration, the moving image was rotated within the range of ±20° in the rotational experiments. The rigid transformation T(I_m; φ) can be expressed as

T(a; φ) = Ra,

where R denotes the rotation matrix. All pixels a in the moving image I_m were treated as a rigid body, which could rotate with respect to the Euler angle θ. In a 2D image, φ = θ, where θ represents the rotation around the axis normal to the image [31].

Table 1 shows the comparison of the experimental results (in terms of TRE) of the multi-modal rigid image registrations carried out by the various approaches. It can be seen that the performances of the miRID and Ssesi were slightly different (1.82 and 1.89 mm, respectively), but both were much better than the rest. Moreover, the TRE of Ssesi was lower than the miRID's on the T1-PD image registration. Figure 4 shows the visual assessments of the alignment accuracies.

Free-form deformation (FFD) with B-spline control points was used for non-rigid image registration in this experiment. It can be expressed as

T(a; φ) = a + Σ_{a_φ ∈ N_a} p_j β^D((a − a_φ)/σ),

where a_φ represents the control points, p_j denotes the B-spline coefficient vectors, β^D denotes the cubic D-dimensional B-spline polynomial, N_a denotes the set of B-spline control points in the neighborhood of a pixel a ∈ I_m and σ is the B-spline control point spacing; the control points a_φ form a regular grid overlaid on the fixed image [29].

Table 2 shows the comparison of the experimental results (in terms of TRE) of the non-rigid image registrations carried out by the various approaches. The registration performance of our miRID method was significantly better than that of the others. In fact, our proposed method proved to be the best approximation of the similarity metric for the complicated relationships of the pixel intensities between modalities. However, the efficiency of the SSC method was similar to ours, and it also outperformed the remaining methods in T1-PD image registration. Figures 5-7 show the visual assessments of the alignment accuracies.
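The rigid transform applied in these experiments reduces to multiplying each pixel coordinate by a rotation matrix R; a minimal 2D sketch (function name ours):

```python
import numpy as np

def rotate_points(points, theta_deg):
    """Apply the 2D rotation matrix R for Euler angle theta to an (N, 2)
    array of pixel coordinates, as in the rigid transform T(a; phi) = R a."""
    t = np.deg2rad(theta_deg)
    R = np.array([[np.cos(t), -np.sin(t)],
                  [np.sin(t),  np.cos(t)]])
    return points @ R.T
```

Rotating by θ and then by −θ returns the original coordinates, which is the property the rigid registration experiments exploit when recovering the applied rotation.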
It can be seen from the experiments that our proposed method, the miRID, was able to handle the complexity of the intensity relationships between different modalities and was robust to rotational variance. Its computation of the similarity measure for multi-modal image registration was straightforward. We compared our approach with some well-known ones (MI, SSC, miLBP and Ssesi) in our experiments. For multi-modal rigid image registration, as shown in Table 1, the miRID obtained the highest accuracy in all cases of modalities, except for the case of T1-PD, where Ssesi was slightly better than our method. Therefore, by adding the sorting operation into our method, rotation invariance could be accomplished. For multi-modal non-rigid image registration, as shown in Table 2, the miRID method still achieved the best accuracy in all cases.

In [25,26,32], the pairwise distance computations were defined by the absolute difference between the intensity of the neighboring pixels and the central pixel. This has the weakness that the performance of the self-similarity descriptor is directly affected by image artifacts within the central pixel. In Table 2, the miLBP also performed efficiently for non-rigid transformation. However, in corner and gradient regions of the image patch, g_c may be very dark or very bright while g_p differs significantly, so |g_p − g_c| may be extremely high. Fortunately, our method uses M instead of g_c, which is suitable for estimating the opposite intensities of the pixels in some corresponding regions of multi-modal images. As a result, the registration under the miRID method can transform the distorted images onto the fixed images well. In addition, in the comparison of registration times, as shown in Tables 1 and 2, the miLBP and miRID methods performed at high registration speeds compared with the others.
Figure 7. Visual results of the miRID method on the T2-PD images. (a,e,i,m) are the distorted 20th, 30th, 40th and 50th slices of the original T2 image, respectively; they were used as the moving images in the registrations. (b,f,j,n) are the 20th, 30th, 40th and 50th slices of the original PD image, respectively, and they were the fixed images in the registrations. The results of registering the moving images onto the fixed images are shown in (c,g,k,o). The transformations of these registrations are shown in (d,h,l,p), respectively.

Conclusions
In this paper, we have proposed a novel self-similarity descriptor, called the miRID, for use in medical multi-modal image registration. Unlike MIND, miLBP and RSSD, this descriptor is simpler in terms of computation and can handle multi-modal image registration more efficiently. In particular, it has been experimentally shown that our proposed technique is robust against both rotational variance in the deformation process and image artifacts that may occur. Finally, the experimental results have also shown that our method outperformed the others in terms of alignment accuracy.