1. Introduction
With the continuous development of synthetic aperture radar (SAR) imaging technology towards high resolution and wide swath, the application potential of SAR image data in many fields has gradually become apparent. Registering SAR images against high-precision heterogeneous reference images is an active research topic in the field of remote sensing image matching. Image matching is the geometric alignment of two or more images of the same area; usually, these images are obtained under different imaging conditions (different sensors, different imaging times, etc.). Matching methods are mainly divided into two types: intensity-based methods and feature-based methods [
1]. Intensity-based methods usually adopt the idea of template matching. This involves selecting a template window on the reference image, and then searching within a certain search area on the image to be matched. The similarity measure (such as cross-correlation, mutual information, sum of squared differences, etc.) is calculated between the template window and the pixel intensity corresponding to the search area window on the image to be matched to obtain the peak value of the similarity measure function, and finally the matching position is determined. Feature-based methods consist of three steps: feature extraction, feature description, and feature matching. In the feature extraction stage, the common features between the two images should be detected as much as possible, and the types of these common features include point features, line features, and area features. Feature description is to generate feature descriptors (that is, feature vectors with a certain length) for the extracted features. The final feature matching is to find the same features by calculating the similarity measure (such as Euclidean distance) between the descriptors based on the feature descriptors, and estimate the image transformation matrix.
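The template-matching idea described above can be sketched in a few lines. This is a minimal illustration, not the procedure of any cited work: it assumes normalized cross-correlation as the similarity measure, an exhaustive search over the whole image, and small images stored as lists of rows. The function names `ncc` and `match_template` are hypothetical.

```python
import math

def ncc(a, b):
    """Normalized cross-correlation of two equally sized 2-D patches (lists of rows)."""
    xs = [v for row in a for v in row]
    ys = [v for row in b for v in row]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = math.sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))
    # A constant patch has zero variance; treat it as uncorrelated
    return num / den if den > 0 else 0.0

def match_template(image, template):
    """Slide the template over the image and return (row, col, score) of the NCC peak."""
    th, tw = len(template), len(template[0])
    best = (0, 0, -2.0)
    for r in range(len(image) - th + 1):
        for c in range(len(image[0]) - tw + 1):
            window = [row[c:c + tw] for row in image[r:r + th]]
            score = ncc(window, template)
            if score > best[2]:
                best = (r, c, score)
    return best

# Toy search image and a template cut from a (hypothetical) reference image
image = [
    [1, 1, 1, 1, 1],
    [1, 2, 9, 1, 1],
    [1, 7, 3, 1, 1],
    [1, 1, 1, 1, 1],
]
template = [[2, 9], [7, 3]]
row, col, score = match_template(image, template)
```

The nested loops make the cost of the pixel-by-pixel evaluation explicit: every candidate position requires a full pass over the template window.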
The advantage of the intensity-based method is that only a suitable similarity measure needs to be selected; no additional feature detection or description is required. However, its disadvantage is also obvious: it must evaluate the similarity measure over all pixels in the template window, which is very time-consuming. Moreover, the strong nonlinear intensity difference between heterologous images makes the search surface non-smooth, so the search easily falls into a local optimum. The method is therefore not well suited to matching heterologous images. In contrast to this pixel-by-pixel calculation, the feature-based method only needs to detect some salient features in the image, so it is more efficient, and it can readily achieve scale and rotation invariance. Its disadvantage is that the choice of feature extraction method strongly affects the final matching result, so choosing an appropriate feature extraction method for a specific application is particularly important. Among feature-based methods, point features are easier to locate accurately on the image than line and area features, and their detection complexity is also lower. Therefore, the point-based matching method is currently the most widely used [
1,
2]. Point features include corner points and blob points, which correspond to different interest point detectors. In recent years, quite a few scholars have evaluated the performance of interest point detectors in different application fields.
In the field of computer vision, Zukal, M. et al., explored how illumination and histogram equalization affect the Harris, Hessian and DoG interest point detectors [
3]. Cordes, K. et al., analyzed the localization accuracy of alternative interest point detector (ALP) and DoG interest point detector, and the results show that ALP is better than DoG [
4]. To verify the combined performance of different interest point detectors, Ehsan, S. et al., proposed an evaluation metric based on the spatial distribution of interest points, and extended it to provide a measure of complementarity of pairwise detectors. Experimental results show that the scale-invariant feature detectors dominate whether used alone or in combination with other detectors [
5]. Ruby, P.D. et al., used the survival rate as the evaluation indicator to explore the detection performance of four classical interest point detectors under a series of image sequences such as rotation, scaling and blurring [
6]. In addition to the repeatability, Barandiarán, I. et al., introduced two indicators, the number of interest points and the detection efficiency, and conducted a comprehensive evaluation of nine classical interest point detectors [
7]. For specific applications, in order to explore the performance of interest point detectors in real-time visual tracking, Steffen, G. et al., designed a video sequence dataset containing various geometric changes, lighting changes and motion blur. Taking this dataset as a benchmark, they comprehensively analyzed the relevant factors affecting real-time visual tracking by combining interest point detectors and feature descriptors [
8]. In a simultaneous localization and mapping (SLAM) application, Gil, A. et al., evaluated the repeatability of interest point detectors and the invariance and saliency of descriptors on different 3D scene image sequences [
9]. Přibyl et al., studied the performance of interest point detectors on high dynamic range (HDR) images, using interest point distribution and repeatability as indicators, and the results showed that current interest point detectors cannot handle HDR images well [
10]. For HDR image processing, Melo, W.A. et al., proposed an improved interest point detector and tested with a large number of HDR images, finally verifying the effectiveness of the proposed algorithm [
11]. Gunashekhar, P.K. et al., applied the contrast stretching technique to the multi-scale Harris and multi-scale Hessian detectors, and proved through experiments that the improved detectors have improved repeatability in terms of illumination, viewing angle, and scale [
12]. Kazmi et al., conducted an experimental study on the performance of interest point detectors applied to image retrieval and classification. By combining detectors with descriptors, it was concluded that interest point detectors can achieve high accuracy in building recognition [
13]. Molina et al., studied the performance of interest point detectors applied to infrared image and visible image face analysis. The experimental results show that compared with SIFT, ORB and BRISK, SURF shows better performance in terms of repeatability and accuracy [
14].
Although a large number of researchers in the field of computer vision have evaluated the performance of many mainstream interest point detectors, the data used in their research are mainly close-range images or ordinary digital camera images, and the main influencing factors are scale, viewing angle, lighting, and additive noise. In contrast to close-range images, heterologous remote sensing images are usually taken at high altitude, and the types of ground objects they cover are more complex. Due to the different imaging modes and imaging conditions of heterologous images, there are obvious geometric distortions and grayscale differences between images. This places higher demands on the performance of interest point detectors, which must reflect the common features between images to the greatest possible extent. In the field of remote sensing image matching, Wang et al., used SSIM (Structural Similarity Index Measure) and PSNR (Peak Signal-to-Noise Ratio) as image quality metrics to study the relationship between image quality and the repeatability of interest point detectors applied to remote sensing images. The results show that the repeatability decreases as the image quality decreases, and that under certain conditions this relationship can be modeled with simple functions [
15]. The following year, Ye et al., used remote sensing images as experimental data to evaluate the performance of mainstream interest point detectors in the computer vision field [
16].
To the author’s knowledge, comparative research on the performance of interest point detectors used in heterogeneous remote sensing image matching is relatively limited. The existing studies do not cover the most recent detectors, and each relies on a narrow set of evaluation indicators.
In this paper, optical images and SAR images are used as experimental data to evaluate the performance of interest point detectors in five aspects: scale difference adaptability, nonlinear intensity difference adaptability, distribution uniformity, image registration alignment performance and detection efficiency. Finally, we conduct comprehensive image registration experiments to further validate our evaluation results. Considering that SAR-Harris, UND-Harris and Har-DoG show good detection performance in the field of remote sensing image matching, and that Harris-Laplace and DoG are widely used detectors in the field of computer vision [17], this paper evaluates the performance of these five detectors.
3. Experiments
3.1. Scale Difference Adaptation
This group of experiments tests the adaptability of the interest point detectors to differences in image scale. Airborne high-resolution SAR images and spaceborne medium-resolution SAR images were selected. The images of both resolutions cover an airport area; the high-resolution image has a resolution of 0.2 m and a size of 1300 × 1300 pixels, and the medium-resolution image has a resolution of 10 m and a size of 600 × 600 pixels. Each set of data contains five images: the original image and four versions obtained by manual scaling, with scale differences of 1.5, 2, 2.5, and 3 times. The experimental data are shown in Figure 7.
Figure 8 shows the repeatability of each interest point detector on the above data. As the scale difference increases, the repeatability of all detectors decreases monotonically. For images of both resolutions, SAR-Harris has the highest repeatability among the five detectors when the scale difference is between 1.5 and 2.5, and UND-Harris has the lowest. DoG has the second-highest repeatability, followed by Har-DoG and Har-Lap.
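To make the repeatability measure concrete, the sketch below computes one common form of it for a pair of images related by a known scale factor. The definition used in this paper is not restated in this section, so the formula here is an assumption: a reference point counts as repeated if, after projection by the scale factor, an unmatched sensed point lies within a tolerance `tol` (a hypothetical value in pixels), and the count is normalized by the smaller of the two point sets.

```python
import math

def repeatability(ref_pts, sensed_pts, scale, tol=1.5):
    """Fraction of interest points that reappear after a known scale change.

    ref_pts, sensed_pts: lists of (x, y). `scale` maps reference coordinates
    to sensed coordinates. `tol` is an assumed localization tolerance (pixels).
    """
    if not ref_pts or not sensed_pts:
        return 0.0
    unused = list(sensed_pts)
    repeated = 0
    for x, y in ref_pts:
        px, py = x * scale, y * scale
        # Nearest still-unmatched sensed point to the projected location
        nearest = min(unused, key=lambda q: math.hypot(q[0] - px, q[1] - py))
        if math.hypot(nearest[0] - px, nearest[1] - py) <= tol:
            repeated += 1
            unused.remove(nearest)  # each sensed point may be matched only once
            if not unused:
                break
    return repeated / min(len(ref_pts), len(sensed_pts))

# Toy example: the sensed image is the reference image shrunk by a factor of 2
ref = [(10, 10), (40, 20), (60, 80), (90, 30)]
sensed = [(5.2, 5.1), (20.0, 10.5), (70.0, 70.0)]
r = repeatability(ref, sensed, scale=0.5)
```

The greedy nearest-neighbour assignment is a simplification; a one-to-one assignment that minimizes total distance would be more rigorous but is not needed to illustrate the metric.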
3.2. Nonlinear Intensity Difference Adaptation
Due to the different imaging modes, there is a large nonlinear intensity difference between heterogeneous images. This group of experiments studies the adaptability of the interest point detector to this difference. The selected image data information is shown in
Table 1. GSD in
Table 1 stands for ground sampling distance, and there are no scale or rotation differences between the two images of each pair. All images are shown in
Figure 9. The experimental results of the three sets of data are shown in
Figure 10.
For each detector, we count the average of its repeatability across 12 image pairs obtained from four different regions and three different sensors as the final measurement indicator. The statistical results are shown in
Table 2.
It can be seen from
Table 2 that the average repeatability of SAR-Harris is the highest, followed by UND-Harris and Har-Lap. DoG is the lowest, and Har-DoG is between Har-Lap and DoG. From the results of the three sets of experimental data in
Figure 10 alone, the five detectors have higher repeatability on area 2 and lower repeatability on areas 3 and 4.
The three types of images in area 2 have prominent geometric structures and clear edges, which is conducive to the extraction of interest points. In areas 3 and 4, there is a temporal difference between the images, and the texture information is weaker than in the other two areas. SAR-Harris has better repeatability than the other detectors on the airborne SAR and spaceborne SAR image data (SAR-Harris was proposed to solve the problem of heterogeneous SAR image matching), while on the other two sets of data its repeatability is comparable to that of UND-Harris. Since DoG detects blob points, and speckle noise on SAR images is easily mistaken for blob points, the repeatability of DoG is the lowest among the five detectors. The repeatability of Har-DoG on each pair of images lies between those of Har-Lap and DoG, because Har-DoG is essentially a combination of the two.
3.3. Distribution Uniformity
For image matching applications, the uniformity of the spatial distribution of interest points affects the final matching accuracy. The interest point detector is therefore required to detect relatively uniformly distributed interest points on both the reference image and the sensed image. This group of experiments uses the three sets of image data from the nonlinear intensity difference adaptability experiment. For each sensor and each area, the distribution uniformity N_std is calculated for the reference image and the sensed image separately, and the average of the two N_std values is used as the final distribution uniformity indicator for the image pair. The experimental results of the three sets of data are shown in
Figure 11. Taking the spaceborne SAR image as an example, the detection results of the five detectors are shown in
Figure 12.
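As an illustration of how a distribution uniformity indicator can be computed and averaged over an image pair, the sketch below uses a simple grid-based measure. This is an assumption, not the paper's definition, which is not restated in this section: the image is split into a regular grid, points are counted per cell, and the standard deviation of the counts is divided by the mean count. In this sketch a lower value means a more even spread. The names `grid_count_std` and `pair_uniformity` are hypothetical.

```python
import math

def grid_count_std(points, width, height, rows=4, cols=4):
    """Standard deviation of per-cell point counts, normalized by the mean count."""
    counts = [0] * (rows * cols)
    for x, y in points:
        # Clamp so that points on the right or bottom border fall in the last cell
        c = min(int(x * cols / width), cols - 1)
        r = min(int(y * rows / height), rows - 1)
        counts[r * cols + c] += 1
    mean = len(points) / (rows * cols)
    if mean == 0:
        return 0.0
    var = sum((n - mean) ** 2 for n in counts) / len(counts)
    return math.sqrt(var) / mean

def pair_uniformity(ref_pts, sensed_pts, ref_size, sensed_size):
    """Average the indicator over the reference and sensed images of one pair."""
    a = grid_count_std(ref_pts, *ref_size)
    b = grid_count_std(sensed_pts, *sensed_size)
    return (a + b) / 2

# One point at the centre of every cell versus sixteen points in one corner
even = [(12.5 + 25 * i, 12.5 + 25 * j) for i in range(4) for j in range(4)]
clustered = [(5, 5)] * 16
u_even = grid_count_std(even, 100, 100)
u_pair = pair_uniformity(even, clustered, (100, 100), (100, 100))
```

A detector whose points cluster on bright scatterers would score like the `clustered` case, while a block-based detector would score closer to the `even` case.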
In general, UND-Harris has the best distribution uniformity, followed by DoG and SAR-Harris, and Har-Lap is the worst. Har-DoG lies between Har-Lap and DoG.
As noted in Section 3.1, although UND-Harris performs poorly under scale differences, its use of a block strategy and feature scale to constrain the spatial position of interest points makes the distribution uniformity of its interest points clearly better than that of the other four detectors. Har-Lap and SAR-Harris extract corners by thresholding the response function, so when they are applied to SAR images, corners are mostly concentrated in areas with strong scattering targets (corresponding to very bright areas on the image, as shown in
Figure 12a,b). Therefore, the distribution uniformity of both is not good. Among them, SAR-Harris uses ROEWA to suppress speckle noise, so that it can detect some corners with obvious geometric structures but weaker intensity than speckle noise. Therefore, the distribution uniformity of SAR-Harris is better than Har-Lap.
Since speckle noise is distributed evenly on SAR images, it is easily mistaken by DoG as Blob points (as shown in
Figure 12c). The distribution uniformity of DoG is second only to UND-Harris in
Figure 11a,c containing spaceborne SAR images (the signal-to-noise ratio of the spaceborne SAR images is lower than that of the airborne SAR images). Since Har-DoG integrates DoG and Har-Lap, its distribution uniformity also lies between those of the two detectors.
3.4. Image Alignment Performance Evaluation
Based on the experiments in
Section 3.2, we use five different detectors for the image data in
Table 1, and combine them with a feature descriptor for image registration, then display the resulting mosaic maps. Since the SIFT feature descriptor performs poorly on heterogeneous images, and phase congruency (PC) is more robust to nonlinear intensity differences between images, we adopt the PCSD feature descriptor in [
23] for feature description. Owing to space limitations, we select one pair of images from each of the three data sets (spaceborne SAR-to-airborne SAR, optical-to-airborne SAR and optical-to-spaceborne SAR). The mosaic map is displayed after registration is completed. The experimental results are shown in
Figure 13.
We scored the five detectors according to the alignment effect of the area marked by the red circle in the figure, and the obtained results are shown in
Table 3 below.
In
Figure 13, the misalignment at the image junction corresponding to the mosaic image from the far left to the far right is increasing, which indicates that the quality of image alignment is deteriorating. As can be seen from
Table 3, among the three types of image matching, SAR-Harris and UND-Harris achieve the best alignment. SAR-Harris is comparable to UND-Harris on the optical-to-airborne SAR data, and on the other two types of images it is better than UND-Harris. Unexpectedly, compared with the experimental results in Section 3.2, Har-DoG is better than Har-Lap and DoG in image alignment quality, although its repeatability lies between those of Har-Lap and DoG.
In the image alignment performance evaluation experiments, except for Har-DoG, the rest of the detectors’ performance rankings are consistent with the experimental results in
Section 3.2. Because blob points are seriously disturbed by speckle noise in SAR images, their repeatability is lower than that of corner points, and the repeatability of Har-DoG, which combines the two, is inevitably lower than that of Har-Lap. However, for image matching applications, the addition of blob point detection introduces more information to some extent (although it further reduces computational efficiency). Therefore, the performance comparison of interest point detectors cannot be based on repeatability alone.
3.5. Detection Efficiency
In this section, we study the feature detection efficiency of five detectors, select the airborne SAR and optical image sequences in
Table 1, and manually adjust the detection threshold of each detector to control the detection of 1000 interest points on the reference image and the sensed image. The time consumption of each detector is shown in
Figure 14.
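The experiment fixes the number of detected points by adjusting each detector's threshold by hand. As a hedged illustration of the same idea, the sketch below derives the threshold automatically from a list of candidate response values and times a detector call. This is a swapped-in convenience, not the procedure used in the paper, and `threshold_for_count` and `timed` are hypothetical helper names.

```python
import time

def threshold_for_count(responses, target):
    """Return the response threshold that keeps the `target` strongest candidates.

    `target` is assumed to be at least 1.
    """
    ranked = sorted(responses, reverse=True)
    if target >= len(ranked):
        return ranked[-1] if ranked else 0.0
    return ranked[target - 1]

def timed(detector, image):
    """Run a detector callable once and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = detector(image)
    return result, time.perf_counter() - start

# Toy example: keep the 3 strongest of 5 candidate responses
responses = [0.9, 0.1, 0.5, 0.7, 0.3]
t = threshold_for_count(responses, 3)
kept = [r for r in responses if r >= t]

# Any callable can stand in for a detector; here a trivial placeholder
result, elapsed = timed(len, [1, 2, 3])
```

If several candidates share the threshold value, slightly more than `target` points are kept; for a timing comparison at a nominal 1000 points this difference is usually negligible.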
As can be seen from the figure, the differences in time consumption among the five detectors grow as the image size increases. DoG has the highest detection efficiency, followed by SAR-Harris, and Har-DoG has the lowest. When the image size is small, Har-Lap outperforms UND-Harris, but when the image size is large, the order is reversed.
Although DoG adapts poorly to nonlinear intensity differences, it is significantly better than the other four detectors in terms of detection efficiency. All five detectors need to establish a scale space. Since DoG down-samples the image when constructing the scale space and uses DoG instead of LoG for scale-space localization, its detection time is shorter than that of the other detectors. Like Har-Lap, SAR-Harris needs to construct a scale space (the SAR-Harris scale space), but it obtains the scale attribute of the interest points while completing the extremum detection, so its detection efficiency is better than that of Har-Lap. Because the number of layers in the scale space constructed by Har-Lap is positively correlated with the image size, Har-Lap is faster than UND-Harris on small images but slower on large ones. Har-DoG needs to detect both corner points and blob points, so its detection time is the longest and its efficiency the lowest.
3.6. Comprehensive Evaluation
In order to more intuitively reflect the performance of the five detectors in above experiments, the performance evaluation table shown in
Table 4 is compiled. As can be seen from the table, SAR-Harris shows the best repeatability under scale changes and nonlinear intensity differences between images. Although it was originally proposed to handle the multiplicative speckle noise in SAR image matching, it still performs well when applied to matching optical and SAR images. This is also reflected in its image alignment performance score. UND-Harris was proposed for matching optical images and SAR images. Although its repeatability is comparable to that of SAR-Harris on images with nonlinear intensity differences, the detector is sensitive to scale changes. The distribution uniformity of the interest points detected by UND-Harris is the highest among the five detectors, so when there is large geometric distortion between heterologous images, this detector can be used to improve the final image matching quality.
Har-DoG combines Har-Lap and DoG to detect complementary corner points and blob points in an image, so it improves the density of interest points. Moreover, Har-DoG is better than Har-Lap and DoG in image alignment performance. However, the nonlinear intensity difference and speckle noise between optical image and SAR image are not fully considered in Har-DoG. Although DoG has poor adaptability to nonlinear intensity difference of heterologous images, it has high detection efficiency, good scale difference adaptability and distribution uniformity. Therefore, DoG is suitable for the coarse matching step in the two-stage matching strategy.
4. Discussion
Since the ground responses from different sensors and at different times are different, we show the performances of the co-registration of images with different scales and different sensors. In this section, we also combine PCSD feature descriptor and interest point detectors to register images and compare the impact of five different detectors on the final registration accuracy. The experiment uses images from different sensors, time and scales, and the image data information used is shown in
Table 5.
The registration results of the three pairs of images are shown in
Figure 15 below. In order to quantitatively compare the final performance of the five detectors, we calculated the root-mean-square error (RMSE) and the number of correct matching points (NCM) of the registration, respectively. The results are shown in
Table 6.
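The two registration metrics can be illustrated with a short sketch. It assumes that a ground-truth affine transform between the reference and sensed images is available (for example, from manually selected check points), that a match counts as correct when its residual is within a tolerance `tol`, and that RMSE is computed over the correct matches only. The tolerance value and the function names are assumptions for illustration, not values taken from the paper.

```python
import math

def apply_affine(T, pt):
    """Map (x, y) with a 2x3 affine matrix T = [[a, b, tx], [c, d, ty]]."""
    x, y = pt
    return (T[0][0] * x + T[0][1] * y + T[0][2],
            T[1][0] * x + T[1][1] * y + T[1][2])

def ncm_and_rmse(matches, T_true, tol=3.0):
    """Count correct matches (NCM) and compute their RMSE in pixels.

    matches: list of ((x_ref, y_ref), (x_sensed, y_sensed)) pairs.
    tol: assumed residual threshold (pixels) for a match to count as correct.
    """
    sq_errors = []
    for ref_pt, sensed_pt in matches:
        px, py = apply_affine(T_true, ref_pt)
        err = math.hypot(px - sensed_pt[0], py - sensed_pt[1])
        if err <= tol:
            sq_errors.append(err ** 2)
    ncm = len(sq_errors)
    rmse = math.sqrt(sum(sq_errors) / ncm) if ncm else float("nan")
    return ncm, rmse

# Toy example: a pure translation by (10, -5) and four putative matches
T_true = [[1, 0, 10], [0, 1, -5]]
matches = [
    ((0, 0), (10, -5)),    # exact
    ((10, 10), (21, 5)),   # 1 pixel off
    ((20, 0), (30, -3)),   # 2 pixels off
    ((5, 5), (60, 60)),    # outlier, rejected by the tolerance
]
ncm, rmse = ncm_and_rmse(matches, T_true)
```

Reporting NCM alongside RMSE matters because a detector can reach a low RMSE with very few surviving matches, which gives a less reliable transform estimate.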
In
Figure 15, it can be seen that all detectors can finally complete the image registration, and the matching points detected by UND-Harris and DoG are more uniform. As can be seen from
Table 6, the RMSE of both UND-Harris and SAR-Harris is better than that of the other three detectors on the three pairs of images. Furthermore, in test 1, the RMSE of SAR-Harris is better than that of UND-Harris, while in tests 2 and 3 the accuracy of the two is comparable. Among the remaining three detectors, Har-DoG has the highest accuracy and DoG the lowest, and the NCM of Har-DoG is larger than that of the other two.
Compared with the previous experiments, the final registration results are consistent with the experimental results in Section 3.4. For Har-DoG, although its repeatability lies between those of Har-Lap and DoG, both its RMSE and NCM are better than those of the other two in the registration experiments. This further shows that using both corner points and blob points for detection can indeed improve the final registration accuracy and the density of matching point pairs. The comprehensive registration experiments further demonstrate that repeatability alone is not sufficient as a criterion for evaluating detector performance.
5. Conclusions
Interest points, as key features on images, have been widely used in image matching. In order to select the most suitable interest point detector for specific applications, this paper starts from the application requirements of heterogeneous image matching and integrates five factors (scale difference adaptability, nonlinear intensity difference adaptability, interest point distribution uniformity, image registration alignment performance and detection efficiency) to evaluate the performance of five detectors: Har-Lap, DoG, Har-DoG, SAR-Harris, and UND-Harris. Experimental results show that the performance of interest point detectors varies across evaluation aspects. In terms of scale difference and nonlinear intensity difference adaptability, SAR-Harris outperforms the other detectors; DoG is second in scale difference adaptability, and UND-Harris is the weakest. UND-Harris is second only to SAR-Harris in adaptability to nonlinear intensity differences. In terms of distribution uniformity, UND-Harris shows the best performance, followed by DoG and SAR-Harris, and Har-Lap is the weakest. In terms of feature detection efficiency, DoG is the highest, followed by SAR-Harris. Har-DoG has the lowest efficiency because it detects two types of interest points. In terms of image alignment performance, Har-DoG is better than Har-Lap and DoG. In the other three aspects, the performance of Har-DoG lies between Har-Lap and DoG. In the image alignment performance and the final comprehensive registration results, only Har-DoG ranks differently than it does under the repeatability metric. Overall, SAR-Harris is optimal when the five aspects of scale difference adaptability, nonlinear intensity difference adaptability, distribution uniformity, image registration alignment performance and detection efficiency are considered together.
Although UND-Harris is weaker than SAR-Harris in detection efficiency and scale difference adaptability, its uniformity of interest point distribution makes it suitable for heterogeneous images with large local geometric distortion. In addition, when the sensed image contains a large number of textureless areas (such as water surfaces), the effect of using UND-Harris is not as good as SAR-Harris (UND-Harris will extract a large number of unreliable interest points in these areas of the image). Therefore, SAR-Harris is suitable for images with fewer effective texture areas, while UND-Harris is suitable for images with many effective texture areas. The excellent detection efficiency of DoG makes it more suitable for the coarse matching process in the two-stage matching method. Har-DoG is suitable for complementary detection when there are few corner points or blob points in the image to improve the density of interest points.
Choosing an optimal detector to carry out one’s own research cannot just rely on a single criterion. It is difficult to use a quantitative index to evaluate all detectors. Therefore, in actual research, we need to combine several evaluation indexes to select detectors according to our actual research needs. This paper also provides a reference for the selection of interest point detectors in the process of heterologous image matching.