Optical and SAR Image Registration Based on Multi-Scale Orientated Map of Phase Congruency

Optical and Synthetic Aperture Radar (SAR) images are highly complementary, and their registration is a fundamental task for many remote sensing applications. Traditional feature-matching algorithms fail to cope with the significant nonlinear radiation difference (NRD) caused by different sensors. To address this problem, a robust registration algorithm with the multi-scale orientated map of phase congruency (MSPCO) is proposed. First, a nonlinear diffusion scale space is established to obtain the scale invariance of feature points. Compared with the linear Gaussian scale space, the nonlinear diffusion scale space better preserves edge and texture information. Second, to ensure the quantity and repeatability of features, corner points and edge points are detected on the moment maps of phase congruency, which lays the foundation for the subsequent feature matching. Third, the MSPCO descriptor is constructed via the orientation of phase congruency (PCO). PCO is highly robust to NRD, and the different scales of PCOs enhance the robustness of the descriptor. Finally, a feature-matching strategy based on an effective scale ratio is proposed, which reduces the number of comparisons among features and improves computational efficiency. The experimental results show that the proposed method is better than the existing feature-based methods in terms of the number of correct matches and registration accuracy. The registration accuracy is inferior only to that of the most advanced template matching method, and the accuracy difference is within 0.3 pixels, which demonstrates the robustness and accuracy of the proposed method in optical and SAR image registration.


Introduction
With its high efficiency and wide coverage, remote sensing has become one of the effective means for people to observe, describe, and analyze the Earth's surface. The maturity of sensor technology provides a variety of ways to observe the Earth, and the types of remote sensing images that people can obtain are increasingly diversified. Compared with a single sensor, multiple sensors can observe different features of the Earth's surface. Combining them for analysis helps to obtain a more accurate understanding of the ground scene [1]. Among the various types of remote sensing images, optical and SAR images are evidently complementary. The optical imaging system was the earliest to be developed, and the technology is relatively mature. Optical images conform to people's observation habits and have good interpretability. However, optical images are easily affected by bad weather, and accurate object information cannot be obtained in conditions such as clouds, rain, fog, and haze. SAR is an active microwave imaging system that can observe the Earth in all weather conditions, day and night, and can penetrate clouds. It is sensitive to man-made targets, especially metal targets, but its imaging quality is poor, and it suffers from serious speckle noise. For this reason, it is meaningful to use optical and SAR images jointly in image mosaicking, image fusion [2], change detection [3], 3D reconstruction [4], navigation [5], and other applications.
Image registration is the process of overlaying two or more images of the same scene taken at different times, from different viewpoints, and/or by different sensors [6]. It is a necessary step for many remote sensing analysis tasks. Owing to the large differences between optical and SAR images, their registration must solve the following problems: (1) Geometric differences: The imaging mode of the optical sensor is center projection, while that of SAR is side-looking. Different imaging modes produce large geometric differences, and topographic relief leads to serious geometric distortion between optical and SAR images, especially when the image contains mountains; (2) Radiation differences: An optical sensor passively collects the sunlight reflected from ground objects, while a SAR sensor actively transmits a microwave with a longer wavelength and collects the backscattered energy from ground objects, which is greatly affected by the surface's roughness, slope, and complex dielectric constant; (3) Speckle noise: As a coherent imaging system, the SAR system inevitably produces speckle noise, which changes the real gray information of the image and leads to errors in gray statistics.
To address the above issues, scholars have conducted a lot of research on the registration of optical and SAR images. Concretely, the registration approaches can be classified into two categories, namely, intensity based and feature based [7]. Intensity-based methods include mutual-information-based methods [8], correlation-based methods [9], and frequency-domain-based methods [10]. Mutual-information-based methods cannot register images from different sources [11]. They are sensitive to radiation differences. Since the image intensity and gradient information are sensitive to NRD, some scholars have studied the structural features of images to enhance the robustness of registration [12]. Some of them used local self-similarity (LSS) [13] to obtain structural features. Ye et al. [14] proposed a feature descriptor of dense local self-similarity (DLSS), defined a new similarity measure, and achieved a better registration effect. Xiong et al. [15] proposed rank-based local self-similarity (RLSS) features on the basis of DLSS. In addition, some researchers used phase congruency to obtain the structural features. Ye et al. [12] built a histogram of oriented phase congruency (HOPC) by replacing gradient information with phase congruency information, which improved the registration accuracy of multimodal images. Ye et al. [16] designed a robust feature descriptor based on channel features of oriented gradient (CFOG) and provided a general framework for template matching. Xiang et al. [17] mapped the image features of a 3D space and used the phase correlation method to register image pairs. Fan et al. [18] designed the phase congruency structural descriptor (PCSD) by combining LSS and phase congruency and achieved good registration results. Xiang et al. [19] improved the phase congruency model. In conclusion, the application of structural features improves the registration accuracy of template-matching methods. However, these methods are sensitive to rotation and scale changes [20].
Feature-based methods are more suitable for dealing with the local geometric distortion and radiation differences between multimodal images. First, feature-based methods extract the salient features in the image, including points, lines, and regions. Then, reliable matching pairs are established according to the distance between feature descriptors. Finally, the parameters of the transformation model are calculated. Scale-invariant feature transform (SIFT) [21] is the most classical method and has scale and rotation invariance. However, it cannot register images with great differences, because it cannot handle radiation differences. SIFT has been improved by numerous scholars. Fan et al. [22] extracted feature points from the second layer to prevent the influence of speckle noise, which improved the registration effect of SIFT. Xu et al. [23] proposed the ILS-SIFT algorithm, which introduced an iterative idea into the SIFT algorithm to realize image registration. Inspired by the SIFT algorithm, Ma et al. [24] designed the PSO-SIFT algorithm, which addressed the radiation differences through a new gradient definition. Zhang et al. [25] used the Canny algorithm to extract image edges and remove edge points from candidate key points. According to the characteristics of the image pairs, Xiang et al. [26] proposed the OS-SIFT algorithm, which used different operators to calculate the image gradient. Based on OS-SIFT, Zhang et al. [27] improved the ratio of exponential weighted average operator and adopted a new matching algorithm with cascaded sample consensus. These improved SIFT algorithms bring only a limited improvement in registration because they all build feature descriptors with gradient information, and gradient information is very susceptible to radiation differences [16].
In recent years, feature-based methods have used structural features rather than gradient information, because structural features are more robust to radiation differences. Paul [28] proposed a new registration algorithm based on structure descriptors. Based on the maximum index map of phase congruency, Li et al. [29] designed the radiation-variation insensitive feature transform (RIFT) algorithm. Wang et al. [30] constructed descriptors by combining the orientation information of phase congruency with the map of maximum index, which improved the robustness of descriptors. Based on the improved maximal self-similarity (IMSD) feature detector and the oriented self-similarity (OSS) feature descriptor, Xiong et al. [31] registered image pairs successfully. Then, an optimized offset mean filtering (OMF) method was proposed to extract adjacent self-similarity (ASS) features, which improved registration efficiency and robustness [32]. The above methods have obtained satisfactory results for NRD, but they are affected by speckle noise to varying degrees.
To address the above issues, we propose a novel registration algorithm based on nonlinear diffusion scale space (NDSS) and MSPCO. First, NDSS is established to obtain the scale invariance of features. Compared with the linear Gaussian scale space (GSS), NDSS can preserve the texture information, suppress the speckle noise, and provide more accurate positioning of features. Second, the MSPCO descriptors are constructed based on PCO with different scales. The descriptor consists of four parts, which are obtained from four different scales' PCOs. PCO with a smaller scale describes the texture details of the image but is easily influenced by noise, while the PCO with a larger scale is just the opposite. The combination of the two effectively improves the registration robustness. Finally, a feature-matching strategy based on effective scale ratio (ESR) is proposed, which reduces the number of comparisons between features and improves computational efficiency.
The main contributions of this paper are as follows: (1) A novel MSPCO descriptor is constructed based on PCO with different scales. The descriptor contains four different scales of phase congruency information. PCO with a smaller scale ensures the accuracy of registration, and PCO with a larger scale can effectively suppress the speckle noise. The MSPCO descriptor improves the stability of registration; (2) The concept of ESR is proposed between two NDSSs. The application of the ESR matching strategy reduces the repeated and invalid comparisons of features and improves feature matching efficiency; (3) The MSPCO descriptor is robust to scale, rotation, and radiation differences. In theory, it can register any two images without limitations.
We organize this paper in the following parts: Section 2 introduces the proposed methods, including the establishment of the MSPCO descriptor and the detailed description of matching strategies based on ESR. The experimental verification of the MSPCO method and its comparison with other algorithms are given in Section 3. Section 4 is the conclusion.

Methodology
In this section, we propose the MSPCO algorithm, which is robust to rotation, scale, and NRD between images. First, NDSSs are established for image pairs to detect feature points at each scale. Next, by performing FAST feature detection on the maximum and minimum moment maps of phase congruency, respectively, edge and corner features that are stable, reliable, and highly repeatable are obtained. Then, based on the different scales of PCOs, a novel MSPCO descriptor is established, which is similar to the descriptor construction method in the SIFT algorithm. Finally, we use ESR to optimize the matching process and obtain correct matches by performing the nearest neighbor distance ratio (NNDR) matching strategy and the fast sample consensus (FSC) algorithm [32]. The flow chart of the MSPCO algorithm is shown in Figure 1.

Establishment of NDSS
The classical SIFT algorithm uses the Gaussian kernel to establish linear scale space, but the image becomes blurred after Gaussian filtering, thus resulting in inaccurate feature location and feature descriptors [33]. To make up for the deficiencies of GSS, we established NDSS. The nonlinear diffusion equation can be defined as [34]:

$$\frac{\partial L}{\partial t} = \mathrm{div}\left(c(x, y, t) \cdot \nabla L\right) \quad (1)$$

$$c(x, y, t) = g\left(\left|\nabla L(x, y, t)\right|\right)$$

where $L$ is the input image, $t$ represents the scale parameter, $\nabla L$ is the gradient of the image, $c(x, y, t)$ represents the diffusion function, $g$ is the conductivity function, and $k$, which enters through $g$, represents a constant related to the diffusion speed. The smaller $k$ is, the more edge information is retained.
Since the partial differential equation in Equation (1) has no analytical solution, it must be solved numerically, and the traditional explicit difference scheme converges slowly. To improve the convergence speed, the KAZE algorithm uses Additive Operator Splitting (AOS) [35] to solve the equation. Thus, the solution of the nonlinear diffusion equation is:

$$L^{i+1} = \left(I - \tau \sum_{l} A_l\left(L^{i}\right)\right)^{-1} L^{i}$$

where $\tau$ is the time step, $I$ denotes the identity matrix, $l$ represents the direction, and $A_l$ is the derivative along the $l$th direction.
Different from the construction of GSS in SIFT, this paper only takes the first group, and there is no downsampling, i.e., all images in this group have the same size. The scale of each layer can be calculated from the initial scale $\sigma_0$ and the scale ratio $k$:

$$\sigma_i = \sigma_0 k^{i}, \quad i = 0, 1, \ldots, N - 1$$

where $N$ is the total number of scale layers. We set $\sigma_0 = 1.6$, $N = 6$, and $k = 2^{1/3}$. Let $I_o(x, y)$ and $I_s(x, y)$ represent an optical image and a SAR image, respectively. Their NDSSs are constructed in the same way, with that of the optical image denoted $\mathrm{NDSS}_{opt}$.
As we can see in Figure 2d,e, the difference between NDSS and GSS is apparent. With the increase in $\sigma_i$, GSS becomes more blurred, and detailed information is seriously lost. For NDSS, with the increase in $\sigma_i$, the image has a relatively low degree of blur, and the image edges and details are well preserved. The experimental results verify the advantages of NDSS.
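The scale schedule and the diffusion equation above can be illustrated with a short sketch. It is not the AOS solver used in the paper: it takes a single explicit step of Equation (1), and the conductivity `g = 1 / (1 + |∇L|² / k²)` is an assumed Perona-Malik-type choice, since the explicit form of $g$ is not given in the text. The function names and the time step `tau` are likewise hypothetical.

```python
import numpy as np

def scale_levels(sigma0=1.6, n_layers=6, ratio=2 ** (1 / 3)):
    """Scale of each layer: sigma_i = sigma0 * ratio**i, i = 0..n_layers-1."""
    return [sigma0 * ratio ** i for i in range(n_layers)]

def conductivity(grad_mag, k):
    """ASSUMED conductivity g = 1 / (1 + |grad L|^2 / k^2) (Perona-Malik type)."""
    return 1.0 / (1.0 + (grad_mag / k) ** 2)

def explicit_diffusion_step(L, k, tau=0.2):
    """One explicit step of dL/dt = div(c * grad L); illustrative, not AOS."""
    gy, gx = np.gradient(L)                      # derivatives along rows, columns
    c = conductivity(np.hypot(gx, gy), k)        # small c at strong edges
    # Divergence of the flux c * grad L
    div = np.gradient(c * gx, axis=1) + np.gradient(c * gy, axis=0)
    return L + tau * div
```

With the paper's settings, `scale_levels()` returns six scales starting at 1.6, and every third layer doubles the scale because the ratio is $2^{1/3}$. A smaller `k` makes the conductivity drop faster at strong gradients, which is consistent with the statement that a smaller $k$ retains more edge information.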

Feature Detection
Feature detection is an important step in image registration. The repeatability of features directly determines the number of correct matches. However, due to the serious NRD between optical and SAR images and the noise in SAR images, the effect of directly extracting features from SAR images is poor. It is difficult to extract the correspondence directly. Although the NRD between optical and SAR images is evident, their structural features have high similarity, and higher repeatability can be obtained by extracting corner points from the structural features. The phase information can preserve the structural features of the image and is invariant to image contrast, illuminance, scale, and rotation changes.
Oppenheim et al. [36] found that the phase information of the image can retain the main contour features. Then, Morrone and Owens [37] pioneered the phase congruency theory, pointing out that human perception of image information relies mainly on phase information and that features appear where the Fourier components are maximally in phase. Different from gradient-based feature detection, phase congruency is a frequency-domain feature detection method. Kovesi [38] extended the phase congruency theory and provided the Log Gabor Filter (LGF). According to RIFT [29], we can obtain the maximum and minimum moment maps, denoted $M_\psi$ and $m_\psi$, which reflect the edge and corner features of the image, respectively.
We extract edge features and corner features on $M_\psi$ and $m_\psi$, respectively, through the FAST algorithm [29]. The process of feature detection is shown in Figure 3. The features in Figure 3 are evenly distributed, where the yellow points are the edge points detected on $M_\psi$, the red points are the corner points detected on $m_\psi$, and the green points represent the points common to both.

Feature Description
It is necessary to design a descriptor for the detected features. Descriptor construction follows two principles: The first is robustness, i.e., the descriptor can overcome the influence of noise, local geometric difference, and NRD; the second is uniqueness, which reduces the mismatching rate of features. Due to the NRD between image pairs, the gradient information at the same location differs significantly. Therefore, gradient-based descriptors are often unreliable and have poor registration performance for multimodal images. In this paper, we construct feature descriptors based on multi-scale PCOs. The MSPCO descriptor is robust to the NRD.

PCO
Phase congruency has good robustness to NRD and is widely used for image registration. However, in previous algorithms, the amplitude of phase congruency is sensitive to speckle noise, resulting in large differences in local descriptors between image pairs, which degrades the accuracy of image registration. Compared with the amplitude, the orientation information represents the direction with the most drastic feature changes, which has stronger stability and is more suitable for descriptor construction.
We calculate the orientation of phase congruency from the odd-symmetric LGF responses $\mathrm{LGF}^{odd}_{s,o}(x, y)$, and then the PCO of the image is obtained. PCO describes the directional changes of features at each pixel and can effectively resist NRD between images. To improve the robustness of registration, this paper constructs the multi-scale PCOs, as shown in Figure 4. Figure 5 is a schematic diagram of PCO construction at a single scale.
The odd-symmetric responses are combined over the filter directions, where $\theta_o$ represents the angle of the filter, and $o_{s,o}(\theta_o)$ is the convolution result of the odd filter at the direction $o$.
Then, the orientation of phase congruency $O_{pc}$ is obtained by the arctangent function. Due to the phenomenon of gradient inversion, the value of $O_{pc}$ must then be constrained.
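As a rough illustration of the PCO computation at a single scale, the sketch below projects the odd-filter responses onto the horizontal and vertical axes, takes the arctangent, and folds opposite directions together to handle gradient inversion. This follows the approach commonly used with log-Gabor responses and is an assumption, since the exact equations are not reproduced here. The function name, the array layout, and the choice of $[0, \pi)$ as the constrained range are hypothetical.

```python
import numpy as np

def phase_congruency_orientation(odd_responses, angles):
    """Sketch of a single-scale PCO map.

    odd_responses: array (n_orient, H, W), odd-symmetric filter outputs o_{s,o}
    angles:        array (n_orient,), filter angles theta_o in radians
    """
    odd = np.asarray(odd_responses, dtype=float)
    theta = np.asarray(angles, dtype=float)[:, None, None]
    # ASSUMPTION: project responses on the x and y axes and sum over directions
    x = np.sum(odd * np.cos(theta), axis=0)
    y = np.sum(odd * np.sin(theta), axis=0)
    o_pc = np.arctan2(y, x)              # orientation in (-pi, pi]
    # ASSUMPTION: fold directions that differ by pi onto each other, so a
    # contrast (gradient) inversion does not change the orientation value
    return np.mod(o_pc, np.pi)
```

Under this folding, a response and its sign-inverted counterpart map to the same orientation, which is the property needed when the same edge appears with opposite contrast in the optical and SAR images.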

MSPCO Feature Descriptor
The MSPCO descriptor is constructed based on the PCO. Figure 6 shows the construction process of the MSPCO descriptor.

Figure 6. Main processing chain of the proposed MSPCO descriptor.
From Figure 6, we can see that MSPCO is calculated based on a grid of sub-regions. First, we select a local image region with $m \times m$ pixels centered at each feature point. We divide the local region into $n_p \times n_p$ sub-regions. In this paper, we set $m = 96$ (for $\sigma = \sigma_0$) and $n_p = 4$. Next, for each sub-region, the PCO histogram is calculated separately.
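The grid-of-histograms construction can be sketched as follows. The number of histogram bins (`n_bins=8`), the unweighted voting, and the way the four scales are combined (simple concatenation followed by L2 normalization) are assumptions made for illustration; only $m = 96$, $n_p = 4$, and the use of four PCO scales come from the text. The function names are hypothetical.

```python
import numpy as np

def pco_grid_histogram(patch, n_p=4, n_bins=8):
    """Histogram of PCO values in each of the n_p x n_p sub-regions of a patch.

    patch: (m, m) array of orientations, assumed to lie in [0, pi).
    """
    patch = np.asarray(patch, dtype=float)
    step = patch.shape[0] // n_p                 # side length of one sub-region
    hists = []
    for r in range(n_p):
        for c in range(n_p):
            cell = patch[r * step:(r + 1) * step, c * step:(c + 1) * step]
            h, _ = np.histogram(cell, bins=n_bins, range=(0.0, np.pi))
            hists.append(h)
    return np.concatenate(hists).astype(float)

def mspco_like_descriptor(patches, n_p=4, n_bins=8):
    """ASSUMED combination: concatenate per-scale histograms, then L2-normalize."""
    vec = np.concatenate([pco_grid_histogram(p, n_p, n_bins) for p in patches])
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec
```

With four 96 × 96 patches (one per PCO scale), 4 × 4 sub-regions, and 8 bins, this sketch yields a 512-dimensional unit vector. The true dimensionality of MSPCO depends on the bin count used by the authors.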

Feature Matching
After obtaining the features and their descriptors, the traditional registration method matches the features from every pair of scales one by one, which requires a vast number of calculations. An accurate and effective correspondence can be obtained only when the two image scales are at the same level or close to each other. Therefore, this paper proposes a feature-matching strategy based on ESR. The effective scale combination is the scale ratio that can be covered by the scale space. Previous algorithms compare features from different scales one by one, i.e., $N \times N = 36$ comparisons are required. This paper proposes the concept of effective scale combination and reduces the number of comparisons to $2N - 1 = 11$. Specifically, we extract feature points from two groups of NDSSs, each containing six layers. Assuming that 500 feature points are extracted from each layer, 3000 feature points are extracted from each image. With the traditional feature-matching algorithm, $9 \times 10^6$ ($3000 \times 3000$) Euclidean distance calculations are required. The ESR matching strategy reduces this number to $2.75 \times 10^6$ ($11 \times 500 \times 500$), a reduction of 69%.
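The arithmetic behind these counts can be checked with a few lines. The second function reflects one possible reading of why $2N - 1$ combinations suffice: with layer scales of the form $\sigma_0 k^{i}$, the ratio between layer $i$ of one image and layer $j$ of the other depends only on $i - j$, which takes $2N - 1$ distinct values. This reading, and both function names, are assumptions rather than a statement of the authors' exact procedure.

```python
def comparison_counts(n_layers=6, points_per_layer=500):
    """Distance computations for exhaustive vs. ESR-style layer matching."""
    per_layer_pair = points_per_layer ** 2
    exhaustive = (n_layers * n_layers) * per_layer_pair   # N x N layer pairs
    esr = (2 * n_layers - 1) * per_layer_pair             # 2N - 1 layer pairs
    return exhaustive, esr, 1 - esr / exhaustive

def distinct_scale_exponents(n_layers=6):
    """Distinct values of i - j, i.e. distinct ratios k**(i - j) between layers."""
    return sorted({i - j for i in range(n_layers) for j in range(n_layers)})
```

For $N = 6$ and 500 points per layer, the counts are 9,000,000 and 2,750,000, a saving of about 69%, and there are 11 distinct exponents from $-5$ to $5$.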
Finally, to gain illumination invariance, we normalize the feature vector. Initial matches are obtained through the NNDR matching strategy, and correct matches are then selected by the FSC algorithm.
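A minimal sketch of NNDR matching on normalized descriptors is given below. The ratio threshold of 0.9 is an assumed value, not one stated in the paper, and the FSC outlier-removal step is not reproduced. The function name is hypothetical, and the second descriptor set is assumed to contain at least two descriptors.

```python
import numpy as np

def nndr_match(desc_a, desc_b, ratio=0.9):
    """Nearest neighbor distance ratio matching (ratio value is an assumption).

    desc_a: (n_a, d) descriptors; desc_b: (n_b, d) descriptors with n_b >= 2.
    Returns (i, j) pairs where b[j] is clearly the closest descriptor to a[i].
    """
    a = np.asarray(desc_a, dtype=float)
    b = np.asarray(desc_b, dtype=float)
    # Pairwise Euclidean distances, shape (n_a, n_b)
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)
    order = np.argsort(d, axis=1)
    matches = []
    for i in range(a.shape[0]):
        nearest, second = order[i, 0], order[i, 1]
        # Accept only if the best match is clearly better than the runner-up
        if d[i, nearest] < ratio * d[i, second]:
            matches.append((i, int(nearest)))
    return matches
```

A descriptor with two equally close candidates is rejected, which is how the ratio test suppresses ambiguous matches before a consensus step such as FSC is applied.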

Experimental Results
In this section, we demonstrate the effectiveness of the MSPCO algorithm in optical and SAR image registration through verification experiments and contrast experiments. First, the experimental data and evaluation criteria used in this paper are given. The experimental data include synthetic image data and real image data, and we use the number of correct matches (NCM) and root mean square error (RMSE) as the evaluation metrics. Then, through validation experiments, the robustness of the descriptor and the ESR strategy are verified by using synthetic image data. Finally, we compare the MSPCO algorithm with five state-of-the-art algorithms, i.e., SIFT, SAR-SIFT, OS-SIFT, HOPC, and RIFT.

Datasets and Evaluation Criteria
The verification experiment was conducted on synthetic image pairs, which were generated from multispectral data. In general, different bands of a multispectral image have evident radiation differences. We regarded the R-band as the optical image and added multiplicative speckle to the B-band to generate the SAR image. The bands of the multispectral image are strictly aligned in position, which is very useful for experimental evaluation. The synthetic image pairs are shown in Figure 7.
Six image pairs were obtained to perform the contrast experiment. Optical images mainly came from Google Earth, the Landsat satellite, and Google Maps. SAR images were mainly provided by the GF-3 and TerraSAR-X satellites. The images cover different scenes, including urban, rural, and suburban areas. Table 1 describes the datasets that are shown in Figure 8. The registration performance of the MSPCO algorithm is evaluated quantitatively by two metrics: NCM and RMSE.
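For reference, RMSE over a set of matched points can be computed as in the sketch below. The affine transformation model and the least-squares fit are assumptions made for illustration, since the transformation model is not specified in this section. The function names are hypothetical.

```python
import numpy as np

def fit_affine(src, dst):
    """Least-squares affine fit mapping src points to dst points (assumed model)."""
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    A = np.hstack([src, np.ones((len(src), 1))])         # homogeneous coordinates
    params, *_ = np.linalg.lstsq(A, dst, rcond=None)     # shape (3, 2)
    return params

def rmse(src, dst, params):
    """Root mean square error of the transformed src points against dst."""
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    pred = np.hstack([src, np.ones((len(src), 1))]) @ params
    return float(np.sqrt(np.mean(np.sum((pred - dst) ** 2, axis=1))))
```

NCM is then simply the number of matches that survive the consensus step, while RMSE measures how well the estimated transformation aligns those matches.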

Robustness and Uniqueness Analysis of MSPCO
In this section, a synthetic image pair is used to verify the robustness and uniqueness of the MSPCO. As shown in Figure 7, there is a large radiation difference and even a gradient inversion phenomenon between image pairs. Moreover, there is evident multiplicative speckle noise in SAR images. All of these differences create significant challenges for the robustness of image registration. We compared SIFT, SAR-SIFT [39], OS-SIFT, HOPC, RIFT, and MSPCO and used Euclidean distance to quantify the differences between descriptors. Ten pairs of matching points were randomly selected, and the Euclidean distance $d_c$ between matching points was calculated for different descriptors to verify the robustness. The average Euclidean distance $d_n$ between one point and the other nine non-corresponding points was calculated to verify the uniqueness. The experiment was repeated 10 times to increase its reliability.
After normalization, the range of the Euclidean distance between descriptors is $[0, \sqrt{2}]$. The larger the Euclidean distance, the lower the descriptor similarity; the smaller the distance, the higher the descriptor similarity. For a pair of correct matches, a small Euclidean distance indicates that the descriptor is robust. For two mismatched points, a large Euclidean distance indicates that the descriptor is more unique. It can be seen from Figure 9a that the MSPCO descriptor obtains the minimum Euclidean distance in the robustness analysis. SIFT and SAR-SIFT build feature descriptors based on gradient information, which is sensitive to radiation differences. HOPC, RIFT, and MSPCO are all established based on phase congruency and obtain a smaller $d_c$, which indicates that phase congruency is robust to NRD. Among them, HOPC is established based on the amplitude of phase congruency. Li et al. [29] showed that phase congruency is not suitable for direct use in feature description; thus, the robustness of HOPC is slightly worse than that of RIFT and MSPCO. The RIFT algorithm builds the maximum index map and achieves good radiation invariance. We built the MSPCO descriptor based on the multi-scale PCOs. Since the orientation information is more robust than the amplitude information, the best radiation invariance was obtained by the proposed method. As can be seen from Figure 9b, all descriptors achieve good uniqueness.
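The two statistics can be computed as in the following sketch. It assumes that row $i$ of the first descriptor set corresponds to row $i$ of the second and that descriptors are L2-normalized with non-negative entries (as for histograms). Under those assumptions $\|a - b\|^2 = 2 - 2\,a \cdot b \le 2$, which gives the $[0, \sqrt{2}]$ range. The function name is hypothetical.

```python
import numpy as np

def robustness_uniqueness(desc_a, desc_b):
    """Mean distance of corresponding (d_c) and non-corresponding (d_n) pairs.

    desc_a, desc_b: (n, d) arrays; row i of desc_a matches row i of desc_b.
    """
    a = np.asarray(desc_a, dtype=float)
    b = np.asarray(desc_b, dtype=float)
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)   # (n, n)
    n = len(a)
    d_c = float(np.mean(np.diag(d)))                  # matching pairs
    d_n = float(np.mean(d[~np.eye(n, dtype=bool)]))   # all non-matching pairs
    return d_c, d_n
```

A robust descriptor gives a small `d_c`, and a unique one gives a `d_n` close to the upper bound.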

Validation of ESR
Building a scale space is a common way to achieve scale invariance, but it brings a heavy computational cost. Previous algorithms compare descriptors from different scales one by one, which is not only inefficient but also prone to mismatches that degrade the registration results. In this paper, the effective scale ratio (ESR) strategy is proposed to improve the registration speed and reduce the mismatching rate.
In the process of feature point description, the selection of the local support region is very important: identical local support regions increase the similarity of descriptors. In the SIFT algorithm, the radius of the local support region is determined by $\sigma_{oct}$, the scale of the scale space where the feature point is located (with $D = 4$). The local support region therefore becomes larger as $\sigma_{oct}$ increases. In fact, images with larger scales usually have lower resolution, and each pixel covers a larger true ground distance. Therefore, in this paper, the local support region decreases as $\sigma_{oct}$ increases.
Synthetic image pairs of size $300 \times 300$ were selected to verify the ESR matching strategy. The synthetic SAR image was upsampled to $600 \times 600$, and the images were registered according to the ESR strategy described in Section 2.4. The real transformation was used to identify the correct matches with a threshold of two pixels [40].
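For reference, the SIFT support radius discussed above is commonly written in the form below. This is the expression found in widely used SIFT implementations rather than one recovered from this paper; the symbol $r$ and the reading of $D$ as the width of the descriptor grid are assumptions.

```latex
r = \frac{3\,\sigma_{oct}\,\sqrt{2}\,(D + 1)}{2}, \qquad D = 4
```

With $D = 4$ this gives $r \approx 10.6\,\sigma_{oct}$, so under this form the radius grows linearly with $\sigma_{oct}$.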
As the scale $\sigma$ changed, the local support regions of the features also changed. Only when the local support regions of two features overlapped sufficiently could their descriptors achieve high similarity. As shown in Figure 10, the most matches were obtained when $\mathrm{ESR} = k^3 = 2$. This is in line with our expectations because the ratio of the ground sampling distance between the optical and SAR images was two. This experiment shows that correct matches can only be obtained when the local support regions of features are the same or close. The ESR registration strategy proposed in this paper can efficiently find the true scale ratio of the input image pair, which not only avoids meaningless repeated comparisons between descriptors but also reduces the cross-scale mismatching rate between features to a certain extent.
To quantitatively analyze the impact of the ESR strategy on feature matching efficiency, we measured the number of Euclidean distance calculations and the processing time as functions of the number of feature points. Figure 11 shows the number of calculations with respect to the number of points: as the number of feature points increases, using ESR effectively reduces the number of Euclidean distance calculations. Figure 12 shows the processing time with respect to the number of points. Similarly, as the number of feature points increases, using ESR effectively reduces the processing time.
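The saving in distance calculations can be illustrated with a small counting sketch. It is not the procedure of Section 2.4, whose details are not reproduced here. It assumes that every feature carries the index of the scale-space layer it was detected in, that layer scales grow geometrically by a factor $k$ (so a layer offset of $n$ corresponds to a scale ratio of $k^n$), and that the scale ratio between the two images is already known. The function name `count_comparisons` is hypothetical.

```python
import numpy as np

def count_comparisons(layers_a, layers_b, offset=None):
    """Count descriptor-distance evaluations between two feature sets.

    layers_a, layers_b: non-negative integer layer index of each feature.
    offset: None for brute-force all-pairs matching; otherwise only pairs
            with (layer in B) - (layer in A) == offset are compared.
    """
    layers_a = np.asarray(layers_a)
    layers_b = np.asarray(layers_b)
    if offset is None:
        return int(layers_a.size * layers_b.size)

    n_layers = int(max(layers_a.max(), layers_b.max())) + 1
    counts_a = np.bincount(layers_a, minlength=n_layers)
    counts_b = np.bincount(layers_b, minlength=n_layers)

    total = 0
    for i in range(n_layers):
        j = i + offset  # layer in image B paired with layer i in image A
        if 0 <= j < n_layers:
            total += int(counts_a[i]) * int(counts_b[j])
    return total
```

Summed over every possible offset, the restricted counts add up to the brute-force count, so under these assumptions the saving comes from comparing only at the selected scale ratio.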

Contrast Experiments
In this section, we use real image pairs to evaluate the performance of the proposed algorithm, both qualitatively and quantitatively. To better analyze the advantages of the MSPCO algorithm, it is compared with SIFT, SAR-SIFT, OS-SIFT, HOPC, and RIFT. Figures 13-18 show the comparison results on six pairs of real optical and SAR images.
Since the SIFT descriptor uses gradient information, its registration results depend heavily on the similarity of the gradient maps of the two images. However, the gradient is very sensitive to NRD, thus SIFT performs poorly on multimodal image registration. All image pairs failed to be matched, indicating that SIFT can only register images of the same modality. Similar to SIFT, the SAR-SIFT descriptor is also built on gradient information. Unlike SIFT, to cope with the serious speckle noise of SAR images, SAR-SIFT redefines the gradient by replacing the traditional difference gradient operator with a ratio gradient operator, which reduces the influence of speckle noise to a certain extent and improves the registration between SAR images. However, the redefined gradient does not weaken the influence of NRD, thus its registration of multimodal images is poor. In all six registration experiments, no successful matching was achieved.
The OS-SIFT algorithm is a registration algorithm designed specifically for optical and SAR images. It selects different gradient calculation forms according to the different characteristics of optical and SAR images to obtain better registration results. In this paper, OS-SIFT completed the registration of all image pairs except the second and the sixth. The OS-SIFT descriptor was also constructed from gradient information, which was robust to NRD to some extent.
The second pair of images had a significant NRD, which directly led to the failure of the registration. For the sixth pair of images, there were more dark areas in the SAR image: the overall intensity of the image was weak, and there was little edge and texture information. Additionally, there were few non-zero values in the gradient map of the SAR image, and most of the non-zero values were noise. The above three algorithms are all gradient-based methods, among which OS-SIFT achieved the best registration effect.
HOPC is a template matching method that uses phase congruency to construct descriptors. HOPC has illumination and contrast invariance and is robust to NRD, but it does not have scale and rotation invariance. Because of the large scale difference, HOPC fails to match the first image pair. RIFT is a feature-based image registration method that proposes the concept of a maximum index map, which further improves the descriptor similarity. Similar to HOPC, RIFT has no scale invariance. The proposed algorithm constructs the MSPCO descriptor, which not only preserves image details but also reduces the effect of image noise. The proposed method achieved good matching performance on all six pairs of images.
Table 2 shows the registration results of SIFT, SAR-SIFT, OS-SIFT, HOPC, RIFT, and the proposed method on six pairs of real optical and SAR images. Among them, the OS-SIFT algorithm has the smallest NCM and the largest RMSE. There are two main reasons: one is the inherent disadvantage of gradient-based descriptors, and the other is that the Harris feature detection algorithm obtains fewer feature points, which inevitably leads to fewer matches. The NCM of RIFT is 2-5 times larger than that of OS-SIFT, and its RMSE is about 0.2 pixels lower than that of OS-SIFT. The proposed method can register all optical and SAR image pairs, achieving the largest NCM and the smallest RMSE compared with OS-SIFT and RIFT. HOPC is one of the best template matching methods, with the highest registration accuracy. Compared with HOPC, the maximum difference of RMSE is less than 0.3 pixels, which demonstrates the high registration accuracy of the proposed method.
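The two metrics reported in Table 2 can be computed along the following lines. This is a sketch under stated assumptions rather than the authors' evaluation code: the function name `ncm_and_rmse` is hypothetical, the ground-truth transform is assumed to be a 3 × 3 homogeneous matrix, a match is counted as correct when its residual is at most the threshold, and the RMSE is taken over the correct matches only.

```python
import numpy as np

def ncm_and_rmse(pts_src, pts_dst, H, threshold=2.0):
    """Number of correct matches (NCM) and their RMSE in pixels.

    pts_src, pts_dst: (N, 2) arrays of matched (x, y) coordinates.
    H: 3x3 ground-truth transform mapping source to destination
       coordinates (assumed form).
    """
    pts_src = np.asarray(pts_src, dtype=float)
    pts_dst = np.asarray(pts_dst, dtype=float)

    # Project source points with the ground-truth transform.
    homog = np.hstack([pts_src, np.ones((len(pts_src), 1))])
    proj = homog @ np.asarray(H, dtype=float).T
    proj = proj[:, :2] / proj[:, 2:3]

    # Residual of every match against the ground truth.
    residual = np.linalg.norm(proj - pts_dst, axis=1)
    correct = residual <= threshold

    ncm = int(correct.sum())
    rmse = float(np.sqrt(np.mean(residual[correct] ** 2))) if ncm else float("nan")
    return ncm, rmse
```

The default threshold of two pixels follows the value used in the ESR validation experiment.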

Conclusions
In this paper, a new optical and SAR image registration method was proposed based on multi-scale PCOs. To ensure the number and repeatability of features, corner and edge points were detected on the maximum and minimum moment map of phase congruency, respectively. The MSPCO descriptor was constructed, which was highly robust to NRD. In addition, we proposed a new feature-matching strategy based on ESR, which reduced the number of comparisons between features and improved computational efficiency. The experimental results showed that the proposed method was superior to the existing feature-based methods. The registration accuracy was only inferior to the current most advanced template matching method, and the accuracy difference was within 0.3 pixels, which fully demonstrated the robustness and accuracy of the proposed method in optical and SAR image registration. It should be noted that the rotation invariance of the MSPCO descriptor was implemented in the same way as that of RIFT, using a circular convolution sequence. In the future, we will continue to study the rotation invariance of phase congruency.

Data Availability Statement: All implementation details, sources, and data are available upon request from the corresponding author.