2D/3D Multimode Medical Image Alignment Based on Spatial Histograms

: The key to image-guided surgery (IGS) technology is to ﬁnd the transformation relationship between preoperative 3D images and intraoperative 2D images, namely, 2D/3D image registration. A feature-based 2D/3D medical image registration algorithm is investigated in this study. We use a two-dimensional weighted spatial histogram of gradient directions to extract statistical features, overcome the algorithm’s limitations, and expand the applicable scenarios under the premise of ensuring accuracy. The proposed algorithm was tested on CT and synthetic X-ray images, and compared with existing algorithms. The results show that the proposed algorithm can improve accuracy and efﬁciency, and reduce the initial value’s sensitivity.


Introduction
Image-guided surgery, which involves computer vision, biomedicine, imaging, automatic control, and other disciplines, is an interdisciplinary research direction [1]. Through the comprehensive application of a variety of medical image information [2], it carries out the preoperative diagnosis [3][4][5][6][7], disease analysis [8,9], planning of the surgical path, intraoperative localization of the lesion, real-time tracking of surgical instruments [10], and adjustment of the spatial position of surgical instruments to achieve an accurate diagnosisso as to provide groundbreaking and precise treatment [11]. This technology provides many benefits for patients, such as reducing surgical trauma, speeding up recovery, and reducing hospital stay and costs. The accurate image information in image-guided surgery is obtained by integrating preoperative and intraoperative images and navigation technology [12,13]. Usually, high-resolution three-dimensional scanning methods such as MRI, CT, and PET are used to obtain the desired image of the anatomical region of interest [14,15]. This image with high-resolution characteristics and more spatial information can better reflect the human tissue structure and physiological information. However, the imaging time is extended, which is not conducive to the surgical environment. The data used in the operation are two-dimensional ultrasound, X-ray, and optical images [16]. These two-dimensional images have the characteristics of fast imaging and low radiation, adapting to the operating environment. However, their resolution is low, so it is difficult to obtain accurate and complete lesion location and texture information. Therefore, a threedimensional image is needed to display higher-dimensional information. In image-guided surgery, the preoperative and intraoperative images are mapped to the same coordinate system by comparing the corresponding information in the same tissue or organ to keep the anatomical structure consistent [2]. Preoperative and intraoperative data registration and surgical instrument tracking provide surgeons with information about the instrument's current position relative to the planned trajectory, nearby vulnerable structures, and the final target in image-guided minimally invasive surgery. Pre-intervention images with X-ray or ultrasound images in interventional radiology make tools such as catheters and needles visible, significantly improving navigation accuracy. In image-guided endoscopic surgery, 3D virtual images of anatomy and pathology are generated from preoperative images and registered in real-time endoscopic images. Through augmented reality visualization, anatomical structures hidden under tissues can be displayed. In extracorporeal radiotherapy, the registration of planned CT images and daily pre-treatment images can achieve accurate patient positioning, which is essential for accurate targeted dose delivery and avoiding exposure of health-critical tissues.
Image-guided surgery's success largely depends on preoperative and intraoperative image data registration accuracy and 2D/3D registration [17][18][19][20]. Image registration is developed for the integration of multisource image information. Image registration's main purpose is to find the spatial transformation mapping relationship between two or more images in the same location to obtain the maximum image information [11,[21][22][23]. These images can come from different times, different imaging equipment, or different angles. After registration, the spatial pose and texture information is consistent. The images that need registration due to spatial transformation are called moving images [24][25][26]. The standard images without transformation are called reference images. Spatial transformation can be linear or nonlinear.
Researchers have proposed a new method called intensity distance [27,28]. The essence of gray distance information is to add the Euclidean distances of all pixels with the same gray value. Therefore, it contains three kinds of image information: gray information, pixel coordinate information, and pixel number information, increasing the reliability compared with single-gray-level information registration. Others [4] have proposed 2D/3D registration methods combining machine learning and geometric transformation. The projection space based on the traditional projection algorithm contains three translation and rotation parameters [29]. The high complexity of projection space dramatically affects the timeliness of registration. Ghafurian et al. [30] considered that the 2D/3D registration problem needs to search the complex solution space, leading to many calculations, so they proposed a spatial parameter-decoupling method to achieve registration. According to the dimensional differences between the registration images, image registration can be divided into 2D/2D registration, 2D/3D registration, 3D/3D registration, and time-series registration [31]. The uses of these approaches are different. This paper mainly studies 2D/3D registration in image-guided surgery.
This paper mainly proposes a 2D/3D registration algorithm based on a spatial histogram. The algorithm introduces a spatial histogram into the 2D/3D registration problem, and develops a weighted spatial histogram of gradient directions for registration, improving the registration accuracy and convergence range of translation transformation.

Materials and Methods
The algorithms involved in this paper are all based on the open-source software toolkit ITK (Insight Toolkit) [32]. ITK is an open-source toolkit for medical imaging research, mainly used for medical image registration and segmentation. Moreover, ITK includes many image-processing algorithms, such as medical image filtering and image data statistical analysis. When ITK is used to develop medical image registration, it is necessary to build a complex environment, which is not reported here.
The DICOM sequence obtained from the human brain model's CT scan was used as a 3D floating image in the registration experiment. In addition, the digitally reconstructed radiograph (DRR) generated by projection rendering under specific CT parameters was used as a 2D reference image to simulate a real X-ray image. The size of the CT image was 512 × 512 × 283, the voxel spacing was 0.7813 × 0.7813 × 1.0, and the unit was mm. The projected image size was 512 × 512, the pixel spacing was 0.5 × 0.5, and the unit was mm. The 3D screenshot of the CT image is shown in Figure 1, and the 3D model rendered by the CT image is shown in Figure 2.
structed radiograph (DRR) generated by projection rendering under specific CT parameters was used as a 2D reference image to simulate a real X-ray image. The size of the CT image was 512 × 512 × 283, the voxel spacing was 0.7813 × 0.7813 × 1.0, and the unit was mm. The projected image size was 512 × 512, the pixel spacing was 0.5 × 0.5, and the unit was mm. The 3D screenshot of the CT image is shown in Figure 1, and the 3D model rendered by the CT image is shown in Figure 2.  Resampling was carried out on the CT images to facilitate the calculation and reduce the amount of experimental calculation. The sampled image data are shown in Table 1. The experiment's pixel range for all 2D images was linearly mapped to 0-255 through Formula (1): structed radiograph (DRR) generated by projection rendering under specific CT parameters was used as a 2D reference image to simulate a real X-ray image. The size of the CT image was 512 × 512 × 283, the voxel spacing was 0.7813 × 0.7813 × 1.0, and the unit was mm. The projected image size was 512 × 512, the pixel spacing was 0.5 × 0.5, and the unit was mm. The 3D screenshot of the CT image is shown in Figure 1, and the 3D model rendered by the CT image is shown in Figure 2.  Resampling was carried out on the CT images to facilitate the calculation and reduce the amount of experimental calculation. The sampled image data are shown in Table 1. The experiment's pixel range for all 2D images was linearly mapped to 0-255 through Formula (1): Resampling was carried out on the CT images to facilitate the calculation and reduce the amount of experimental calculation. The sampled image data are shown in Table 1. Table 1. Related parameters of image registration.

Data
Size The experiment's pixel range for all 2D images was linearly mapped to 0-255 through Formula (1): A weighted histogram of gradient directions (WHGD) is a simple image histogram that considers only the gradient information in the image [33]. It takes the gradient direction as the histogram bin's division basis, making it sensitive to rotation transformation. Like other image histograms, the weighted histogram of gradient directions completely ignores the image's spatial information, making it very insensitive to translation transformation. To complete the registration, it must be assisted by other registration algorithms. Because of the complexity of parameter decoupling, the accuracy of translation parameters is difficult to guarantee. Spatial information can be added to the gradient direction histogram to compensate for the translation transform's sensitivity, helping to overcome these defects. Therefore, this paper proposes the use of a spatial histogram instead of a simple image histogram to count the gradient feature information of the image.
For a discrete function f : N is a set of nonnegative integers, and h f (v) is the number or frequency of elements x. This describes a simple zero-order moment histogram that discards all information about a domain or space. Stanley et al. [34] proposed the use of high-order moments containing spatial information to make up for the defects of the simple histogram, and named the new histogram a spatial histogram or spatial graph. A spatial histogram can capture the frequency information of function elements and the spatial domain information of function. Each pixel's weight in space is determined by the average value and covariance of the pixel position when the spatial histogram is calculated based on the second moment of the image.
For a 2D image I, its value at the pixel coordinates (x, y) is v, representing the original pixel value or the value after image preprocessing. The pixel value is assumed. The second-order spatial histogram of image I can be expressed by Formula (1) by dividing the histogram interval according to the value v.
where B represents the number of bins in the spatial histogram, n b is the number of b pixels belonging to the bin, and µ b and ∑ b are the coordinate mean value and coordinate variance of b pixels in the bin, respectively. Function (3) is a simple histogram of an image. By comparing Function (2) with Function (3), the spatial histogram extracts the mean and variance of pixel coordinates while extracting frequency information, which is beneficial for calculating the similarity between the two histograms. The similarity between two spatial histograms can be expressed in the form of a weighted sum, as shown in Formula (4): where ρ n is the distance measure, which is used to measure the similarity between the same bin in the histograms h and h , while ψ b is the weighted coefficient. When calculating the similarity between simple histograms, ψ b = 1; ρ(h, h ) degenerates to the common histogram similarity calculation method. For the second-order spatial histogram, ψ b is determined by the product of two probability distributions related to the coordinate mean value and coordinate variance-that is, the Gaussian distribution described by (5): where η is the Gaussian normalization constant, and its definition is shown in Formula (7). The exponential part in Formula (5) is the average of the Mahalanobis distance between µ b and µ b , as well as between µ b and µ b . A spatial histogram can be considered to be a geometric model. It makes up for the gap between the typical histogram and more specific models (such as translation, rotation, affine, Appl. Sci. 2022, 12, 8261 5 of 13 projection, or B-spline). Like simple histograms, spatial histograms are computationally efficient and compared between corresponding image blocks. However, a spatial histogram retains some geometric information between pixels, unlike a simple histogram. More specifically, the spatial histogram captures the global positional information of pixels.
We propose using a spatial histogram instead of the simple histogram to extract the statistical characteristics of image gradient direction information and gradient amplitude information, to solve the registration algorithm's problem based on the gradient direction histogram. Specifically, when constructing the weighted histogram of the reference image and the DRR image, each bin's coordinate mean value and coordinate variance in the weighted histogram of gradient directions should be calculated. The new statistical histogram is called the weighted spatial histogram of gradient directions (WSHGD). The expression is shown in Formula (8): The WSHGD still takes gradient direction as the division basis of the histogram interval. Here, every 1 degree is a bin, for a total of 360 bins. According to Formula (10), every pixel (i, j) of the image is divided into corresponding bins, where µ d and ∑ d are the coordinate mean vector and the coordinate variance matrix of the pixel in the bin, respectively, GM(i, j) represents the gradient amplitude at the pixel (i, j), GD(i, j) represents the gradient direction at the pixel (i, j), and GM(i, j) ∈ [0, 1], while GM(i, j) ∈ 0, 360); n d represents the height of the d bin in the histogram-as shown in Formula (9), this equals the sum of gradient amplitudes corresponding to all pixels in the bin.
When the weighted spatial histogram of gradient directions is used for 2D/3D registration [35], only the WSHGD of the reference and DRR images, respectively, needs to be constructed. Then, the similarity between the two WSHGDs can be calculated by the weighted distance measure. The weighted distance measure is the objective function of the registration process, and its definition is shown in Formula (11): where h R and h D denote the WSHGD of the reference image I R and the DRR image I D , respectively, n Rd and n Dd are the d bin of the corresponding h R and h D , respectively, and µ Rd and µ Dd are the coordinate mean values corresponding to n Rd and n Dd respectively. ∑ Rd and ∑ Dd are the coordinate variances corresponding to n Rd and n Dd , respectively. η is the Gaussian normalization constant, and ρ n is a commonly used distance measure. The coordinate mean µ d and coordinate variance ∑ d contained in the WSHGD are spatial information. When the image is translated, the coordinates of the internal pixels change accordingly so that the sum can reflect the change in the image caused by translation transformation. Moreover, the sensitivity of the histogram to rotation is increased. In 2D/3D registration based on the WSHGD, because the WSHGD is sensitive to translation and rotation transformation, it can optimize the translation parameters and rotation parameters at the same time, without the assistance of other registration algorithms. This makes the registration process more straightforward and the algorithm more robust. Moreover, introducing pixel coordinate information can overcome the limitations of the WSHGD algorithm. The image's foreground no longer needs to be against a larger background, because all of the transformation parameters are optimized together. The 2D/3D registration optimization based on the WSHGD features can be expressed as shown in Function (13). Function (14) represents the mapping from the reference image I R to the WSHGD features, while Function (15) represents the mapping from the DRR image I D to the WSHGD features, which are obtained from the 3D floating image I M through spatial transformation and DRR projection.

Results
Firstly, the CT image is transformed into a rigid body according to the initial space transformation parameters [8,16,36]. Then, the CT image is projected to generate a DRR image. Next, the WSHGDs of the DRR image and the reference image are extracted. Finally, the distance measured between the two WSHGDs is calculated. By taking the distance measure as the objective function, the Powell-Brent optimization algorithm optimizes the space transformation parameters until the optimization algorithm reaches the iteration stop condition. As a result, Manhattan distance and J-divergence distance are better performance distance measures. Manhattan distance is also known as city block distance (Formula (16)), and J-divergence distance can be calculated with Formula (17).
where a d = max{n Rd , n Dd }, b d = min{n Rd , n Dd }. In this experiment, the sum of Manhattan distance and J-divergence distance is used as the distance measure. The definition of the function ρ n (n Rd , n Dd ) is shown in Formula (18): Two statistical histogram models-WSHGD and WHGD-were used as the experimental control group to perform 2D/3D registration between CT and DRR images of the skull model to verify the effectiveness and accuracy of the WSHGD. The Powell algorithm was used for the optimization algorithm, and rigid-body transformation was used for the spatial transformation model. The one-dimensional search accuracy of the Powell optimization algorithm was set to 0.01. The overall iterative accuracy of the algorithm was set to 0.001. The maximum number of iterations was set to 1000. The rigid-body transformation parameters were arranged in the order α, β, θ, t x , t y , t z . The first three parameters were the rotation along the X-, Y-, and Z-axes. Finally, the last three parameters were the translation along the X-, Y-, and Z-axes.
Firstly, three groups of experiments were conducted to perform qualitative analysis. The truth value of the first group of experimental reference images was set to (20,20,20,10,10,10), the second group was set to (10, 10, 10, 10, 10, 10), and the third group was set to (10,10,10,20,20,20). In the experiments, the initial values were optimized as (0, 0, 0, 0, 0, 0), as shown in Figure 3 (the DRR image obtained by CT projection at the initial value). Due to the increased resolution of the DRR image, the skull information can be more clearly displayed. The results of the first group of experiments are shown in Figure 4. The second group of experimental results is shown in Figure 5, while the third is shown in Figure 6. The first column represents the reference image. After registration, the second column represents the DRR image generated by 3D CT projection. The third column represents the difference between the reference and registered DRR images. The first row corresponds to the registration results based on WSHGD features. The second row corresponds to the registration results based on WHGD features.  The results of the first group of experiments are shown in Figure 4. The second group of experimental results is shown in Figure 5, while the third is shown in Figure 6. The first column represents the reference image. After registration, the second column represents the DRR image generated by 3D CT projection. The third column represents the difference between the reference and registered DRR images. The first row corresponds to the registration results based on WSHGD features. The second row corresponds to the registration results based on WHGD features.
Firstly, three groups of experiments were conducted to perform qualitative analysis. The truth value of the first group of experimental reference images was set to (20,20,20,10,10,10), the second group was set to (10,10,10,10,10,10), and the third group was set to (10,10,10,20,20,20). In the experiments, the initial values were optimized as (0,0,0,0,0,0), as shown in Figure 3 (the DRR image obtained by CT projection at the initial value). Due to the increased resolution of the DRR image, the skull information can be more clearly displayed.
The results of the first group of experiments are shown in Figure 4. The second group of experimental results is shown in Figure 5, while the third is shown in Figure 6. The first column represents the reference image. After registration, the second column represents the DRR image generated by 3D CT projection. The third column represents the difference between the reference and registered DRR images. The first row corresponds to the registration results based on WSHGD features. The second row corresponds to the registration results based on WHGD features.   The registration results of the WHGD and WSHGD at different initial points were analyzed quantitatively to verify the WSHGD features' effectiveness in 2D/3D registra- The registration results of the WHGD and WSHGD at different initial points were analyzed quantitatively to verify the WSHGD features' effectiveness in 2D/3D registration. For the true value point (0,0,0,0,0,0) of the rigid-body transformation, the three rotation parameters of the optimized initial value were selected within ±60 degrees. We took 10 degrees as the sampling unit. The three translation parameters of the optimized     The registration results of the WHGD and WSHGD at different initial points were analyzed quantitatively to verify the WSHGD features' effectiveness in 2D/3D registration. For the true value point (0, 0, 0, 0, 0, 0) of the rigid-body transformation, the three rotation parameters of the optimized initial value were selected within ±60 degrees. We took 10 degrees as the sampling unit. The three translation parameters of the optimized initial value were selected within ±40 mm, with 5 mm as the sampling unit. The sampling space ensures that the registration requirements of the WHGD can still be met after the space transformation. This paper selects the differences between the images after registration for qualitative analysis. The smoother the difference image, the smaller the difference between the registration result and the reference image.

Rotation/° Translation/mm
Furthermore, the mean absolute error (MAE), mean error (ME), and standard deviation of the error (SDE) were used as evaluation indices [35,37]. The experimental results are shown in Table 2. To more intuitively reflect the differences between the two methods in terms of mean error and standard deviation of the error, the registration results were visualized, as shown in Figure 7, where blue represents rotational error and red represents translational error.    Table 2. Registration results of the WSHGD and WHGD.

Discussion
Comparing the different images of the three groups of experiments, the difference images of the WHGD in the three experiments were smoother. The difference images of the WSHGD in Experiment 2 and Experiment 3 were smooth, but there was a significant deviation in Experiment 1. Therefore, the WHGD features have better stability in the registration process. In contrast, the WSHGD features cannot guarantee consistent registration accuracy, but the WSHGD features can still achieve successful registration.
As shown in Table 2, compared with the WSHGD proposed in this paper, the parameterdecoupling registration method based on WHGD features performs better in terms of MAE and ME, reflecting that the registration accuracy based on WHGD features is higher. Compared with Figure 7, the standard deviation of the WSHGD method is more significant than that of the WHGD method, indicating that the registration performance based on WHGD is more stable. Sometimes, WSHGD registration may have a more significant deviation, consistent with qualitative analysis. However, from the perspective of MAE, the registration accuracy of the WHGD method and the WSHGD method is not very different. Based on WSHGD features, CT and X-ray image registration can also be achieved within an acceptable accuracy range. When the sampling space of the translation parameters of the initial point of the rigid-body transformation is expanded, the DRR image's foreground will exceed the image's field of view. It is challenging to ensure consistent registration accuracy using the WHGD method, and sometimes it cannot even be registered. In contrast, the WSHGD method can still achieve registration, but its accuracy is reduced; however, it is still within the acceptable range.
Based on the above analysis, the pixel coordinates' mean and variance information can be introduced through a spatial histogram to ensure sensitivity to rotation transformation. This ensures that the synchronous optimization of translation and rotation parameters sacrifices a certain amount of precision to expand the convergence range of translation transformation. However, the weighted histogram of gradient directions is only suitable for the foreground-one of the limitations of small image registration.

Conclusions
This paper introduces two weighted histograms of gradient directions and secondorder spatial histogram concepts, and analyzes the advantages and limitations of weighted histograms of gradient directions in 2D/3D registration. The weighted histogram of gradient directions is sensitive to rotation and scaling, but insensitive to translation. It leads to a complex registration process and a small convergence range of translation parameters, and the algorithm has certain limitations. This paper introduces a spatial histogram into the registration process. It proposes a 2D/3D registration algorithm based on a weighted spatial histogram of gradient directions to solve these problems.
The algorithm uses the spatial weighted histogram of gradient directions to extract the statistical characteristics of the image. The pixels' positional information is added based on retaining the weighted histogram of gradient directions' sensitivity to the rotation. By weighting the distance measure of the weighted histogram of gradient directions with the coordinate mean and square difference, the spatial information and the gradient information are effectively combined.
Our experimental results show that the spatial weighted histogram of gradient directions has a high sensitivity to translation and rotation. As a result, the convergence range of the algorithm is larger. When the organizational structure in the image's foreground exceeds the image's field of view, it can still achieve successful registration.
There are still some aspects that can be improved and expanded. The experimental data used in this paper were two-dimensional simulated X-ray images and three-dimensional CT images. Their imaging principles are similar, but many types of medical images are used in clinical practice. Moreover, there is a significant difference between some types of images in terms of imaging principles [38]. Therefore, the 2D/3D registration method based on DRR is not always applicable.
Deep learning technology is developing rapidly [39], and has strong performance in feature extraction. However, there are few studies on 2D/3D registration based on deep learning. Therefore, combining deep learning technology with 2D/3D registration should be a research focus in the future.