A Robust Algorithm Based on Phase Congruency for Optical and SAR Image Registration in Suburban Areas

: Automatic registration of optical and synthetic aperture radar (SAR) images is a challenging task due to the inﬂuence of SAR speckle noise and nonlinear radiometric di ﬀ erences. This study proposes a robust algorithm based on phase congruency to register optical and SAR images (ROS-PC). It consists of a uniform Harris feature detection method based on multi-moment of the phase congruency map (UMPC-Harris) and a local feature descriptor based on the histogram of phase congruency orientation on multi-scale max amplitude index maps (HOSMI). The UMPC-Harris detects corners and edge points based on a voting strategy, the multi-moment of phase congruency maps, and an overlapping block strategy, which is used to detect stable and uniformly distributed keypoints. Subsequently, HOSMI is derived for a keypoint by utilizing the histogram of phase congruency orientation on multi-scale max amplitude index maps, which e ﬀ ectively increases the discriminability and robustness of the ﬁnal descriptor. Finally, experimental results obtained using simulated images show that the UMPC-Harris detector has a superior repeatability rate. The image registration results obtained on test images show that the ROS-PC is robust against SAR speckle noise and nonlinear radiometric di ﬀ erences. The ROS-PC can tolerate some rotational and scale changes. memory. 3.1. Performance Experiments of Proposed UMPC-Harris Detector We test the performance of the proposed UMPC-Harris detector on simulated images with di ﬀ erent noise levels and radiometric (non-uniform intensity) changes. The UMPC-Harris is compared to three other We designed a novel feature detector named UMPC-Harris, comprising an overlapping block strategy, Harris on multi-moment of the PCM, and the vote strategy to obtain uniformly distributed keypoints and increase the repeatability rate of the keypoints. The experimental results on simulated images demonstrated that the proposed UMPC-Harris method achieved a good performance in keypoints detection. The proposed HOSMI descriptor is constructed using the histograms of the PC orientation on the multi-scale MIM. The image registration experiments prove that the ROS-PC is robust to nonlinear radiation variations of optical and SAR images and it can tolerate a small amount of rotation and scale changes.


Introduction
The rapid development of sensor technology provided multiple remote sensing images for the observation of the Earth. Optical images ensure facilitated interpretation and are similar to human vision; however, they are affected easily by the weather. The synthetic aperture radar (SAR) is an active microwave imaging system that effectively compensates for the shortcomings of optical imaging systems and operates irrespective of the time of day and weather conditions. Optical and SAR images can be used together to form complementary information, which has important application value, such as image fusion [1,2], pattern recognition [3], and change detection [4,5]. Image registration is a preliminary work of these applications. It refers to aligning two or more images of the same scene acquired by different times, viewpoints, or sensors. Registration accuracy seriously affects these applications. Optical and SAR registration is still a challenging task owing to the speckle noise of SAR and the large radiation differences between optical and SAR images [6,7]. for remote sensing image matching, which consists of a feature detector called minimum moment of PC (MMPC)-Lap and a feature descriptor called the local HOPC (LHOPC) [34]. Similar to gradients, PC also reflects the significance of the features of local image regions. Chen et al. proposed an optical and SAR image registration method by combining a new Gaussian-Gamma-shaped bi-windows-based gradient operator and the histogram of oriented gradient pattern [35]. To address large geometric differences and speckle noise in SAR images, a novel optical-to-SAR image registration algorithm was proposed using a new structural descriptor [36]. A dense descriptor named the histograms of oriented magnitude and phase congruency was proposed to register multi-sensor images. It is based on the combination of the magnitude and PC information of local regions, and successfully captures the common features of images with nonlinear radiation changes [37]. A novel image registration method, which combines nonlinear diffusion and PC structural descriptors, has been proposed for the registration of SAR and optical images [38]. To overcome nonlinear radiation distortions, Li et al. [39] proposed a radiation invariant feature transform (RIFT) algorithm to register multi-sensor images, including optical and SAR images. The RIFT uses PC instead of image intensity for feature point detection and it proposes a maximum index map (MIM) for feature description. Further, the RIFT not only largely improves the stability of feature detection but also overcomes the limitation of gradient information for feature description.
Although a number of PC-based image registration methods have been proposed in the past few years, there are limitations that cannot be ignored when these methods are applied to optical and SAR image registration with large radiation differences. These limitations are listed below.
• Several methods detect keypoints directly from a PC map (PCM) or the moment of the PCM for feature matching. However, because of SAR speckle noise, some unreasonable points are detected in the SAR image; further, because of significant nonlinear intensity differences, the feature detection result of one image generally has no corresponding feature points in the other image. Several classical methods are tested using a pair of optical and SAR images, as shown in Figure 1. These limitations lead to the low repeatability of the feature point, which is not conducive to feature matching.

•
The extracted points are not uniformly distributed. When calculating the PC of a whole image, the noise threshold T is estimated using the Rayleigh distribution mode, which is a fixed value. This leads to a noise threshold larger than the actual noise in the dark region. The feature information is drowned by the noise. As shown in Figure 1, the features are always concentrated in the bright region, especially in the SAR image. The nonuniform distribution of feature points leads to limited registration accuracy on large images or high-contrast images.

•
Because of the different imaging mechanisms for optical and SAR sensors, the acquired images have different expressions for the same objects, thereby resulting in large radiation differences between image pairs. Such nonlinear radiation differences reduce the correlation between corresponding points, which often leads to difficulties in feature description.
In this paper, we address the above limitations by developing a robust optical and SAR image registration method based on PC (ROS-PC). The proposed method mainly contains the following two works.
First, a uniform Harris feature detection method based on multi-moment of the PCM (UMPC-Harris) is proposed. In the UMPC-Harris, we take the corners and edge points as keypoints. The edge structure feature has a high similarity and better resistance to radiation difference between optical and SAR images [30,36,39], thus, extracting feature points on the edge can ensure enough number of features and robustness to radiation difference. Besides, corner features can increase the number of homologous points. Therefore, the multi-moment of the PCM is constructed by using maximum and minimum moment maps. Harris operator is used on the multi-moment to detect corners and edge points. Finally, the overlapping block and voting strategy are introduced to detect uniformly distributed and reliable keypoints. Second, since PC is not suitable for constructing descriptors directly, the feature descriptor is derived for a keypoint by utilizing the histogram of phase congruency orientation on multi-scale max amplitude index maps (HOSMI). The proposed HOSMI descriptor is utilizing the MIM instead of the PCM because it is more robust to intensity radiation distortions than the PCM [39]. Furthermore, in remote sensing images, many salient features usually appear in different scales [38]. Therefore, we construct the phase congruency orientation maps and max amplitude index maps, respectively. In the local region of each keypoint, the histograms of phase congruency orientation on multi-scale max amplitude index maps are calculated. Finally, the descriptor is constructed by combining the feature vectors of all patched in order. Compared with state-of-the-art, the main contribution of this study can be summarized as follows: • The UMPC-Harris feature detection method is proposed based on the multi-moment of the PCM, a voting strategy, and an overlapping block strategy. The detector can obtain enough reliable and uniformly distributed feature points.

•
The HOSMI feature description method is proposed based on the histograms of phase congruency orientation on multi-scale max amplitude index maps. The descriptor is more robust against nonlinear radiation variation and speckle noise.
The rest of this paper is organized as follows: Section 2 starts with a review of PC theory, and followingly introduces the ROS-PC in detail, including the UMPC-Harris feature detector and HOSMI feature descriptor. In Section 3, through several experiments, the repeatability rate of keypoints by UMPC-Harris, the robustness of ROS-PC, and the sensitivity of ROS-PC to scale and rotation changes are evaluated and discussed. Finally, the conclusions are provided in Section 4.

Methodology
The PC has been confirmed to be robust to nonlinear radiometric differences, which can capture the common features between multi-sensor images [37,39,40]. The ROS-PC method is based on PC. This section first reviews the PC theory briefly and then presents the design processing of the UMPC-Harris detector and HOSMI descriptor.

Review of PC Theory
According to Kovesi's approach, PC can be computed by convolving an image with a log-Gabor filter (LGF) to extract local phase information. The LGF is efficient for detecting features over multiple scales and orientations. In the frequency domain, LGF is defined as: where ω 0 is the central frequency of the filter, κ is the related-width parameter of the filter that varies with ω 0 , which ensures that κ/ω 0 is a constant. The filter is transformed from the frequency to the spatial domain using an inverse Fourier transform. In the spatial domain, the 2-D LGF is represented as: LGF(x, y) = LGF even Considering the coordinates of an input image I(x, y), the convolution responses e s,o (x, y) and o s,o (x, y) at scale s and orientation o are obtained, and then, the convolution results of even and odd symmetric wavelets form the response arrays as follows: where LGF even s,o and LGF odd s,o refer to the even-symmetric (cosine) and odd-symmetric (sine) wavelets of the LGF at scale s and orientation o, respectively. Further, e s,o (x, y) and o s,o (x, y) are the convolution responses of LGF even s,o and LGF odd s,o at scale s and orientation o, respectively. The corresponding amplitude A s,o (x, y) and phase ϕ s,o (x, y) at scale s and orientation o are given by: Considering the negative effect of image noise, the improved PC (called PC 2 ) and the phase deviation function are, respectively, defined as [41]: where W o (x, y) is the weighting function, T is the estimated noise threshold, ε is a small constant to prevent division by zero, and ϕ s,o (x, y) is the mean phase angle. The function · denotes that the enclosed quantity is equal to itself when its value is positive, and zero otherwise. PC 2 denotes the PC magnitude map of the input image.
Further, to obtain the information of PC varying with orientation o in the image, phase congruency is calculated independently in each orientation. Thus, serval PCMs according to the orientation angle are obtained [42].
where θ o denotes the angle corresponding to orientation o, and PC(θ o ) represents a PCM at orientation angle θ o . The moment of PC is calculated using these intermediate quantities as: Remote Sens. 2020, 12, 3339 The maximum moment max ψ and the minimum moment min ψ of PC are defined as: The maximum and minimum moments of the PCM represent the edge and corner strength map, respectively.

The Proposed UMPC-Harris Feature Detector
Keypoints with high repeatability and uniform distribution can obtain sufficient matches, thus improving the image registration accuracy [38]. The subsection presents a novel feature detector UMPC-Harris, which is based on voting strategy, Harris on the multi-moment of PCMs, and overlapping block strategy for the detection of corners and edge points. The purpose of this UMPC-Harris detector is to detect sufficient, reliable, and well-distributed keypoints in optical and SAR images. Figure 2 presents the main process of the UMPC-Harris detector, which contains three steps.
The maximum moment max ψ and the minimum moment min ψ of PC are defined as: The maximum and minimum moments of the PCM represent the edge and corner strength map, respectively.

The Proposed UMPC-Harris Feature Detector
Keypoints with high repeatability and uniform distribution can obtain sufficient matches, thus improving the image registration accuracy [38]. The subsection presents a novel feature detector UMPC-Harris, which is based on voting strategy, Harris on the multi-moment of PCMs, and overlapping block strategy for the detection of corners and edge points. The purpose of this UMPC-Harris detector is to detect sufficient, reliable, and well-distributed keypoints in optical and SAR images. Figure 2 presents the main process of the UMPC-Harris detector, which contains three steps.
Overlapping block strategy   First, the input image is divided into S n × S m blocks. Further, to avoid missing feature information on the block boundary, an overlap region with n op pixels is added between adjacent blocks. The choice of parameters S n and S m is a tradeoff between the amount of computation and the uniform distribution of the keypoints. When more blocks are divided, the keypoints will become more uniform, while increasing the number of calculations. The size and local complexity of the image should be considered for the selection of S n and S m .
Second, according to the description in Figure 2, we take block (1,2) as an example to illustrate the construction of multi-moment of the PCM. According to the definition of the maximum and minimum moments, the moment of the PCM M k is defined as: where k t is a variable between −1 and 1. The moment map contains the maximum and minimum moment map, and we can use max ψ and min ψ to describe the above equation as: where M k represents the moment of the PCM with parameter k t , and it is obvious that if k t is set to −1, M k is the minimum moment map M k = min ψ , and if k t is set to 1, M k is the maximum moment map M k = max ψ . The number of moments is n, and the step h is 2 n−1 . Third, the points detected by Harris on the maximum and minimum moments of the PCM represent edge points and corners, respectively. Because the edge feature has a high similarity and better resistance to radiation difference between optical and SAR images, thus, extracting feature points on the edge can ensure enough number of features and robustness to radiation difference. Besides, corner features can increase the number of homologous points. Thus, we combine the corners and edge points as keypoints. However, corner features are sensitive to SAR speckle noise and the repeatability rate of the edge points is poor, and therefore, if all of them are considered as keypoints, there could be some unreasonable keypoints. Therefore, we extract Harris corners on the multi-moment of the PCMs, respectively, and we consider the points appearing many times as the final keypoints. Stable and reliable keypoints are found based on the voting strategy.

Feature Description
After a set of keypoints are detected by the UMPC-Harris method, feature descriptors need to be designed for each keypoint to achieve image registration. The orientation and maximum amplitude index of PC have been proved suitable for describing the similar local features of multi-sensor images [34,37]. In this subsection, we first introduce the construction process of the multi-scale max index maps and the orientation map of phase congruency. Finally, the HOSMI descriptor is established to increase the distinction of the features.

Multi-Scale Max Index Maps
Max index map is more suitable for multimodal image registration and is more robust to intensity radiation distortions compared to the gradient amplitude map. Furthermore, in remote sensing images, many salient features appear in different scales [38]. Inspired by this, multi-scale MIMs are formed by calculating the index of the maximum amplitude at each scale to improve the significance of descriptors. Figure 3 shows the construction process of the four-scale MIMs.
First, the input image I(x, y) is convoluted with LGF to obtain s × o PC amplitude maps in four scales and six orientations. Second, in the six amplitude maps of the same scale, we can find the maximum amplitude max A s,o (i, j) o 1 and the corresponding orientation o, where the superscripts and subscripts indicate orientations ranging from 1 to o for a pixel p with coordinates A s,o (i, j). Third, the coordinate of the pixel p in MIM is represented by the corresponding orientation o. Thus, the MIM is an image with all elements from 1 to o. Finally, multi-scale MIMs are constructed by calculating the index of the maximum amplitude on four scales. Therefore, the information in the fine scale and coarse scale of the input image can be obtained, which can effectively enhance the saliency of features in the image.

Orientation of Phase Congruency
The orientation of PC represents the important directions of feature variation, and it has been proved robust to nonlinear radiation distortions [32]. Therefore, similar to the gradient and gradient orientations in the SIFT algorithm, we need to find orientation information in addition to multiscale MIMs.
The PC is calculated by the odd and even symmetric wavelets of LGF, wherein the odd-symmetric wavelet is a smooth derivative filter, which can compute the image derivative in a single direction [34]. For the calculation of PC, the convolution results of the odd-symmetric are obtained according to six orientations. The six convolution results are projected into x and y axes, respectively, and the projections of the x and y axes are obtained. The orientation of the PC can be calculated by the arctangent functions defined as:  Figure 4 shows the calculation process of the PC orientation.

The Proposed HOSMI Feature Descriptor
The proposed HOSMI descriptor is constructed using the histograms of the PC orientation on the multi-scale MIM. Figure 5 presents the main processing chain of the proposed HOSMI descriptor.  1.
Apply the LGF to the local region Lx of each keypoint, and then, calculate the odd and even convolution results of four scales and six orientations.

2.
Calculate the amplitude map over four scales and six orientations. In each scale, the corresponding orientation to the maximum amplitude forms the multi-scale MIMs; the detailed calculation process is shown in Figure 3.

3.
Obtain the PC orientation using the odd convolution results; the detailed calculation process is shown in Figure 4. The PC orientation is restricted to an interval 0 • , 180 • , which can handle gradient inversion in optical and SAR images, and large intensity differences between the optical and SAR images can be reduced.

4.
Divide the PC orientation map and the multi-scale MIMs of each keypoint into n p × n p patches. If the local region Lx is selected with a size of m × m pixels, the size of the patch is (m/n p ) × (m/n p ) pixels. The feature vector of each patch is calculated in order, and then a descriptor is constructed by combining the feature vectors of all patches.
• To calculate the feature vector of a patch, PC orientation is formed using n o bins covering the 180 degrees range of orientations. The sample added to the histogram is the element of the corresponding location on the MIM. To interpolate the peak position for better accuracy, a parabola is fitted to the three histogram values closest to each peak. The feature vector of patch P is calculated on four scales; therefore, the dimension of the feature vector of a patch is s × n o . In Figure 5, we take the first patch P 1 as an example. The scale used in the PC method is set to 4, and the feature vector of the patch is constructed as where H 1~H4 are the histograms of the four scales.

•
To obtain the feature descriptor of a keypoint, the feature vectors of all patches are combined into one feature vector. The feature descriptor is normalized by the L2 norm to achieve better invariance to illumination and shadowing. The dimension of the feature descriptor of a keypoint is s × n o × n p × n p . As shown in Figure 5, if the local region of a keypoint is divided into 4 × 4 patches, the feature descriptor is constructed by the 16 patches, as in HOSMI = [P 1 , P 2 , · · ·, P 16 ].

5.
Construct a local feature descriptor HOSMI for optical and SAR image registration.
A pair of corresponding points in the optical and SAR images are selected to construct descriptors. To verify the similarity of descriptors, we draw descriptors into stem images in Figure 6. keypoint is divided into 4 × 4 patches, the feature descriptor is constructed by the 16 patches, as in , , , 5. Construct a local feature descriptor HOSMI for optical and SAR image registration.
A pair of corresponding points in the optical and SAR images are selected to construct descriptors. To verify the similarity of descriptors, we draw descriptors into stem images in Figure 6.  Figure 6 shows the HOSMI descriptors of a pair of keypoints between the optical and SAR images. This pair of optical and SAR images has a strong difference in intensity and in gradient inversion, and there is obvious scattering in the SAR image. These differences introduce great challenges to the robustness of the descriptors. The square region represents the local region (96 × 96 pixels) around the keypoint, which is used for computing the feature descriptor. As shown in Figure  6, the similarity of the feature vector is high and radiation changes have a low effect on the proposed descriptor.  Figure 6 shows the HOSMI descriptors of a pair of keypoints between the optical and SAR images. This pair of optical and SAR images has a strong difference in intensity and in gradient inversion, and there is obvious scattering in the SAR image. These differences introduce great challenges to the robustness of the descriptors. The square region represents the local region (96 × 96 pixels) around the keypoint, which is used for computing the feature descriptor. As shown in Figure 6, the similarity of the feature vector is high and radiation changes have a low effect on the proposed descriptor.

Experimental Results and Discussion
In this section, we evaluate the performance of the feature detector on simulated images with different SAR noise levels and radiometric (non-uniform intensity) changes. Then, eight pairs of optical and SAR images are used to test the ROS-PC and analyze the experimental results. The registration performances are evaluated via objective and subjective approaches. One approach is to use the evaluation criteria, and the other is to use a chessboard mosaic image and enlarged submaps. Finally, experiments are conducted to evaluate the tolerance of rotation and scale changes from the ROS-PC method. All experiments were conducted with the MATLAB R2017b software on a computer with an Intel Core i5-7200U CPU and 16.0 GB memory.

Performance Experiments of Proposed UMPC-Harris Detector
We test the performance of the proposed UMPC-Harris detector on simulated images with different noise levels and radiometric (non-uniform intensity) changes. The UMPC-Harris is compared to three other state-of-the-art detectors, Harris, SAR-Harris, and m+M-Harris.

Evaluation Criteria of Feature Detector
Repeatability rate: Given a pair of simulated optical and SAR images to be registered, the keypoints are detected on the two images. Further, two points are regarded as a pair of corresponding keypoints, only if their coordinates are satisfied: where p so (x, y) and p ss (x, y) denote the coordinates of the corresponding keypoints in the simulated optical and SAR images, respectively. The function · 2 denotes the Euclidean distance between points p so (x, y) to p ss (x, y). T is the threshold of the Euclidean distance, which is set to 2 pixels in this experiment. The repeatability rate is defined as: where N cor is the number of pairs of the corresponding keypoints, and n so and n ss are the number of keypoints in the simulated optical and SAR images, respectively. The repeatability rate is a number between 0 and 1. The larger the repeatability rate, the better is the robustness of the detector.

Experimental Data and Parameter Settings of Feature Detector a. Experimental Data
We used the high-resolution (HR) optical images from the official website of Changguang Satellite Technology Company as the experimental data. The resolution of these images is better than 1 m/pixel. We selected three images with 1000 × 1000 pixels, captured at the Kabul International Airport, Afghanistan, in June 2018; these images are named as Group 1 to 3, as shown in Figure 7.

b. Parameter Settings
For the Harris detector, the threshold of the Harris operator is normalized between 0 and 1, and it is set to 0.1 in the following experiments. For the SAR-Harris detector, the first scale is set as σ = 2, the constant between two adjacent scales is set as k = 2 1/3 , the number of the scale layer is set to 8, and the arbitrary parameter is set as d = 0.04. Based on our previous experience, the threshold in the keypoint detection of the simulated optical image is set between 1 and 5, and that of the SAR image is set between 5 and 10. Other parameters used in the experiments follow the parameter settings suggested in Reference [25]. The m+M-Harris and UMPC-Harris are both based on PC. For fairness, the parameters of the PC method are tuned to the same value at each noise level. The PC is calculated in four scales and six orientations. The wavelength of the smallest filters is set from 3 to 5 pixels, according to different images. The scaling factor between successive filters is set to 1.6. In the experiment, the parameters are selected as S n = 4, S m = 5, n op = 20, and the number of moments of the PCM is selected as n = 5 and h = 0.5. The array of parameters k t is [−1, −0.5, 0, 0.5, 1].

Influence of Noise Level on Proposed UMPC-Harris Detector
To assess the robustness of the feature detector to noise, the HR optical images are utilized to generate simulated optical and SAR images by adding Gaussian noise and speckle noise, respectively. The simulated optical image is obtained by adding Gaussian white noise with 0 mean and 0.01 variance. In the simulated SAR images, the noise level is defined to describe the degree of multiplicative noise. The multiplicative noise with a different number of looks is simulated from one-look to nine-look in the simulated SAR image. It decreases with an increase in the SAR number of looks. A high number of looks refers to a small noise level in SAR images. The simulated images are shown in Figure 8. With the SAR noise level ranging from 1 to 9, the repeatability rate of the UMPC_Harris detector is compared to three other detectors, Harris, Sar-Harris, and m+M-Harris. For fairness, each detector extracts approximately 600 pairs of points by adjusting the threshold, and the average value of ten calculations is taken as the experimental result. The curves of the repeatability rate with the SAR noise level in the three groups are shown in Figure 9. The repeatability rate of the UMPC-Harris is the highest among the four detectors in the three groups, and it is more robust to noise than the other detectors. SAR-Harris and m+M-Harris have similar repeatability rates as that of the SAR noise level. Their repeatability rates are between 0.5 and 0.3. When the SAR noise level is high, the repeatability rate of the UMPC-Harris is still higher than 0.4. When the SAR noise level is small, the differences between the two simulated images caused by noise are small, and hence, the three methods show good performance, except for the Harris detector. The repeatability rate of the Harris detector is lower than 0.4, and it decreases rapidly with an increase in the SAR noise level. It is difficult for the Harris detector to deal with multiplicative noise in SAR images directly.

Influence of Radiometric Changes on Proposed UMPC-Harris Detector
To assess the robustness of the feature detector to nonlinear radiometric differences, HR optical images are utilized to generate an image with non-uniform radiometric differences. This is achieved by multiplying the HR optical image by a variable coefficient according to the change of the image column. The results are shown in Figure 10. The four detectors are tested on these images. Each detector extracts approximately 600 pairs of keypoints by adjusting the threshold properly, and the experiment results are shown in Figure 11. Furthermore, the repeatability rate of the four detectors are presented in Table 1.  The repeatability rate of the UMPC-Harris is the highest among the four detectors. One can see that the keypoints detected by UMPC-Harris are distributed more uniformly over the image than other detectors, which illustrates that the UMPC-Harris is more robust to radiometric variation and it further indicates that the UMPC-Harris can be applied to keypoint detection for images with radiometric differences.  The comparison of the repeatability rates of the four detectors indicates that the proposed UMPC-Harris detector has the highest repeatability rate. It is more robust to noise and it has better resistance to radiation differences. The reasons are listed below.

•
The UMPC-Harris aims to extract feature points on the multi-moment of the PCM. Stable and valuable keypoints are selected by voting on the Harris corner, which appears repeatedly on the PCM. The combination of the effective corners and edge points not only ensures the high repeatability of the features, but also a large number of features, which lays a foundation for subsequent feature matching.

•
Keypoints are well-distributed in the entire image, further points with obvious local features in the dark regions can be detected. This ensures that the keypoints are not limited to the bright region, thereby improving the accuracy of image registration.

Performance Experiments of Proposed ROS-PC Registration Algorithm
To evaluate the performance of the ROS-PC, it is compared with two other state-of-the-art algorithms, namely OS-SIFT [28] and RIFT [39]. The OS-SIFT is a feature-based method, and ROEWA and Sobel operators are used to calculate consistent gradients for optical and SAR images. The RIFT is a PC-based method that detects corners and edge points on the PC map, and it proposes a MIM for feature description. Both the OS-SIFT and RIFT exhibit good performances on multi-sensor image registration. The comparative programs are obtained through their respective academic home pages. Subjective and objective criteria are used to evaluate the performance of the registration algorithm.

Evaluation Criteria of the Registration Algorithm
The checkboard mosaic image and enlarged sub-images are displayed to observe the effect and details of image registration. For each test image pair, each algorithm is executed ten times, and the average of the ten results is computed as the final result. The following evaluation criteria are used to analyze the performance of the algorithm objectively and quantitatively.
Root mean square error (RMSE): This criterion is used to measure the accuracy of the image registration algorithm and is computed by the following method.
First, approximately 20 pairs of corresponding points are manually selected from the optical and SAR images to estimate the affine transformation matrix H. The coordinates of ith correctly matched keypoints are (x o i , y o i ), (x s i , y s i ) . The RMSE is computed as [26]: where N cor is the number of correctly matched keypoints after the fast sample consensus (FSC) [43], ((x s i ) , (y s i ) ) and it denotes the transformed coordinates of (x s i , y s i ) by the estimated transformation matrix H.

Number of correct matches (NCM):
The NCM is the number of correctly matched keypoints after the FSC. If the NCM of an image pair is less than four, the matching is considered to have failed.
A small RMSE indicates that the accuracy of optical and SAR image registration is high. A large NCM indicates that there exist more correctly matched keypoints, thereby resulting in a more accurate transformation matrix H.

Datasets and Parameter Settings of the Registration Algorithm a. Datasets
In our experiment, eight pairs of optical and SAR images are used to test the ROS-PC; these pairs are referred to as Pairs A-H. Table 2 lists the information for the test images. Numerous factors are considered in the selection of test images, including different SAR sensors, date, resolution, and size. The optical images of the eight pairs are obtained from Google Earth, and the SAR image contains a satellite SAR image and seven airborne SAR images. To verify the robustness of the ROS-PC, the image contains different features, as shown in Figure 13.
Pair A includes images of an airport in Tucson, AZ, USA; in this pair, there exists a slight rotation and translation difference. Pair B is obtained in Zhengzhou, Henan, China, and it includes images of a small village. There is a slight rotation and translation in this pair, and because of the arc-shaped roof, there is a large radiation difference over the houses. Some houses even are difficult to recognize on SAR images. Pair C is also obtained in Zhengzhou, Henan, China, and it includes images of a field, and the features of the field vary significantly with the date. Some obvious features exist in the SAR image but not in the optical image. The remaining five pairs of images are obtained in Weinan, Shaanxi, China. Pair D includes images of a large scene, that includes a river, small buildings, multiple fields, and several roads. There are no scale and rotation changes in this pair, and the features exhibit a little temporal difference. This pair of images is used for the rotation and scale variation experiments. Pair E mainly includes images of a lake. There are certain scale variations and time differences. Pair F includes images of a terrace. The fields and terraces in the image are divided into two parts by a road. The feature intensity of the terraces is stronger than that of the fields. Pair G includes images of a scene including a river and some fields, there exists a large time span between the two images. Pair H includes images of a complex scene with some different structure buildings.

b. Parameter Settings
The parameter settings of the UMPC-Harris detector are described in Section 3.1.2. The proposed ROS-PC method contains three parameters n o , n p and m, respectively. Parameter n o and n p are related to the dimensions of the descriptor, so they should not be too large. Parameter m is the size of the local region used for feature description. If the local region is too small, it contains less information, which does not reflect the difference of features. On the contrary, if the local region is too large, not only the amount of calculation will increase but also the effect of geometric distortion will be received. In the feature descriptor, the parameters are set as n o = 6, n p = 4, and m = 96. Therefore, the dimension of the feature descriptor of a keypoint is 384. The parameter settings of comparative algorithms OS-SIFT and RIFT follow the References [28,39]. For a fair, the thresholds of keypoints detection are properly adjusted to obtain similar numbers of keypoints (approximately 1000~1200).
For the feature matching, the sum of squared differences (SSD) is selected for the feature matching metric. If the distance between the two feature vectors is less than the threshold, a pair of keypoints is considered as a potential match, and the threshold is set to 3 pixels. Generally, the matching pairs contain many false matches. The FSC algorithm is used to remove false matches.

Comparison of Experimental Results and Discussion
To evaluate the optical and SAR image registration performance of the ROS-PC, the algorithm is compared with OS-SIFT and RIFT. The OS-SIFT utilizes two different operators to calculate the gradients for SAR and optical images. Multiple image patches are aggregated to construct a gradient location orientation histogram-like descriptor. It is an advanced gradient-based method. The RIFT is a radiation-insensitive feature matching method based on PC and MIM, which is considerably more robust to nonlinear radiation distortions than traditional gradient maps. The registration results of eight pairs of optical and SAR images are shown in Figures 14-21.       Eight groups of images with different features are selected to verify the robustness of ROS-PC algorithm. It can be found that the ROS-PC has the best performance among the three methods, owing to the advantages of the proposed UMPC-Harris detector and HOSMI descriptor. For images with date and season differences, shown in Figures 14 and 16, the ROS-PC shows better robustness and obtains some matched keypoints with the time difference. For images with large radiation differences, shown in Figures 15 and 21, ROS-PC can still obtain some correctly matched keypoints, which are well-distributed in the image. For images with multiple objects, shown in Figures 17-20, ROS-PC can extract matching keypoints from each object, and they are uniformly distributed, which ensures the accuracy of registration. To sum up, the ROS-PC is a robust algorithm, which is suitable for optical and SAR image registration.
To further observe the registration accuracy of the ROS-PC, the checkboard mosaic images and enlarged sub-images of each pair are displayed in Figure 22. The sub-image is an enlarged view of the intersection of the checkboard mosaic images, where the common features in the optical and SAR images are displayed clearly. In each pair, three sub-images with different features are selected.
Comparisons of RMSE, NCM, and the running time of the eight pairs are presented in Table 3. In the eight pairs of test images, most features of suburban areas, such as airports, houses, fields, terraces, roads, rivers, and lakes are included. The optical and SAR images exhibit nonlinear radiation distortion, which leads to intensity differences or gradient inversion. Next, we analyze and discuss the registration results of each pair of images.
For Pair A, all three methods can successfully achieve optical and SAR image registration because of the HR and less noise. The ROS-PC performs best on the RMSE and NCM, benefiting from the high repeatability rate of keypoints and the robustness of the feature description method. For Pair B, the RIFT fails to register the two images correctly because of the serious nonlinear radiometric difference. This is because the houses in the image have arc-shaped roofs, which induce the optical and SAR images with intensity differences. Although several correctly matched keypoints are detected by the OS-SIFT, the number and accuracy are significantly lower than the ROS-PC. For Pair C, the OS-SIFT fails to register the two images because of the gradient difference and obvious scattering in the SAR image. However, the ROS-PC is suitable for describing similar local information based on the PC orientation and the multi-scale MIMs of the keypoints. For Pair D, the image contains many features, and there are little scale, rotation, and date difference. The ROS-PC remains superior to the other two methods, as it is robust to nonlinear radiation differences and noise. For pair E and F, the image contains two types of features. ROS-PC can obtain correctly matching keypoints from each object, and they are well-distributed. For pair G, owing to the time difference, there are many unmatched keypoints of the river in the image, which causes extra difficulties in feature matching. However, the ROS-PC still successfully completes more NCM and achieves higher accuracy. For pair H, the radiation difference between optical and SAR images is large, because there are many buildings in the image, which leads to the failure of the other two algorithms. To sum up, the ROS-PC has the most uniform distribution, the largest NCM, and the best RMSE among the three algorithms. Therefore, the ROS-PC is more robust to noise and scattering in SAR images and the radiation difference between optical and SAR images.
Gradient-based descriptors such as OS-SIFT are more sensitive to nonlinear radiation differences because the gradient-based descriptors rely on a linear relationship between images, and therefore, they are not appropriate for significant nonlinear intensity differences caused by radiation distortion. The speckle noise and scattering in SAR images pose significant challenges in image registration. The RIFT performs better than the gradient-based descriptors because it uses PC to capture MIM. The RIFT uses the MIM to express the shape and structure information of objects and, therefore, it is robust to nonlinear radiation distortion. However, the repeatability rate of the corner detector in RIFT is not as good as that of UMPC-Harris, and the descriptor in RIFT has limited significance and robustness to noise and scattering in SAR images. Therefore, our ROS-PC yields the smallest RMSE and the largest NCM among all eight pairs because of two reasons, which are listed below.

•
The UMPC-Harris can obtain a higher repeatability rate of keypoints than SAR-Harris and m + M-Harris between SAR and optical images.

•
The HOSMI descriptor uses four-scale and six-orientation LGFs to capture the multi-scale max index and orientation feature information of PC, which is robust to nonlinear radiation variations of optical and SAR images. Further, it can effectively overcome the noise and scattering of SAR images.
After comparisons of the running time in Table 3, it can be found that the ROS-PC is the most time-consuming. The reason is that the algorithm is based on the principle of PC, which is slow by nature. Second, in the process of feature detection, the overlapping block and voting strategies need to additional calculation than the other methods. Third, the descriptor is constructed over the four scales and the dimension is larger. This paper only focuses on a robust registration method for optical and SAR images, and the running time is not the focus. Therefore, reducing the computation time and improving the efficiency of the algorithm is a problem we need to study in the future. Moreover, computational efficiency can be further improved by optimizing the algorithm and implementing the ROS-PC in C/C++.

Influence of Rotation and Scale Variations on the Proposed ROS-PC
The previous experimental results show that the algorithm is robust to the radiation distortion between optical and SAR images; however, the ROS-PC is not designed for scale and rotation deformations. The large-angle rotation between remote sensing images can be corrected using sensor geographic information. Further, by employing remote sensing image ground resolution information, remote sensing images can be assigned to the same scale by resampling. Then, the ROS-PC could be used for fine matching, which can handle slight rotation and scale differences between optical and SAR images. In this subsection, the influence of rotation and scale variation on our algorithm is evaluated based on the NCM for Pair D.

Rotation Experiments of the Proposed ROS-PC
We tested the effect of rotation changes on the ROS-PC. Assuming the optical image remains unchanged, the SAR image is rotated from −12 • to 16 • . The optical and SAR image registration results of the rotation variation are shown in Figure 23. The relationship between the NCM and the rotation angle is listed in Table 4.  Figure 23 and Table 4 indicate that the ROS-PC can tolerate rotations between optical and SAR images below 9 • , which is sufficient for images that have been corrected by sensor geographic information.

Scale Experiments of the Proposed ROS-PC
We test the robustness of the ROS-PC to scale changes. The optical image in Pair-D remains unchanged, and the SAR image is resized from 0.6 to 1.4 with an interval of 0.1. The optical and SAR image registration results of the scale variation are shown in Figure 24. The relationship between the NCM and the scale factor is listed in Table 5.    Figure 24 and Table 5 indicates that the ROS-PC can tolerate the scale difference between optical and SAR images in the range of 0.7-1.2, which is sufficient for images that have been assigned a similar scale by resampling.

Conclusions
In this study, a PC-based optical and SAR image registration algorithm ROS-PC is proposed to address the matching difficulties caused by complex nonlinear radiation differences and speckle noise.
We designed a novel feature detector named UMPC-Harris, comprising an overlapping block strategy, Harris on multi-moment of the PCM, and the vote strategy to obtain uniformly distributed keypoints and increase the repeatability rate of the keypoints. The experimental results on simulated images demonstrated that the proposed UMPC-Harris method achieved a good performance in keypoints detection. The proposed HOSMI descriptor is constructed using the histograms of the PC orientation on the multi-scale MIM. The image registration experiments prove that the ROS-PC is robust to nonlinear radiation variations of optical and SAR images and it can tolerate a small amount of rotation and scale changes.