1. Introduction
With the rapid development of remote sensing technology, the acquisition and comprehensive analysis of multimodal remote sensing images have become a reality, covering modalities such as visible light, infrared, lidar, and SAR. Among them, SAR, as an active sensor, offers the unique advantage of all-day, all-weather imaging, while optical sensors are technologically mature and provide high-resolution images that align with human visual perception. The registration of SAR and optical images is therefore an essential step for harnessing the complementary strengths of the two modalities, and it is crucial in remote sensing applications such as image fusion and visual navigation [
1]. However, this process remains challenging: it demands accurate alignment of the reference and sensed images while coping with nonlinear radiation distortion (NRD) and complex geometric transformations such as scaling, rotation, and perspective [
2].
The current image registration methods can be mainly divided into two categories: manually designed methods and deep learning-based methods [
3]. Manually designed methods can be further divided into region-based methods and feature-based methods.
Region-based methods, also called intensity-based methods, usually use the grayscale information of the image for template matching: a template area is first selected on the reference image, and a sliding search is then performed within the search area of the sensed image, with a specific similarity measure used to judge image similarity and achieve accurate registration. Depending on the similarity measure, representative methods include normalized cross-correlation (NCC [
4]), mutual information (MI [
5]), etc.
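To make the template matching idea above concrete, the following is a minimal sketch of intensity-based matching with normalized cross-correlation using OpenCV; the image names, template size, and search margin are illustrative assumptions, not values from the cited works.

```python
# Minimal sketch of region-based (intensity-based) template matching with
# normalized cross-correlation; `reference` and `sensed` are assumed to be
# single-channel images already loaded by the caller.
import cv2
import numpy as np

def ncc_match(reference, sensed, top_left, size=64, search_margin=32):
    """Match one template cut from `reference` inside a search window of `sensed`."""
    y, x = top_left
    template = reference[y:y + size, x:x + size].astype(np.float32)
    ys, xs = max(y - search_margin, 0), max(x - search_margin, 0)
    search = sensed[ys:ys + size + 2 * search_margin,
                    xs:xs + size + 2 * search_margin].astype(np.float32)
    # Zero-mean normalized cross-correlation surface over all template placements.
    score = cv2.matchTemplate(search, template, cv2.TM_CCOEFF_NORMED)
    _, _, _, max_loc = cv2.minMaxLoc(score)           # location of the best match
    return (xs + max_loc[0], ys + max_loc[1]), score  # matched corner and similarity map
```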
Feature-based methods achieve image registration by extracting common features between input images and evaluating the similarity of these features. This process is divided into three main steps: feature extraction, feature description, and feature matching [
6]. Feature extraction is mainly divided into two categories: global features and local invariant features [
Global features, such as line features and surface features, have a certain degree of invariance, but not all images contain them, so their scope of application is limited. In contrast, local invariant feature methods mainly extract point features and area features, of which point features have become the mainstream choice in contemporary state-of-the-art methods owing to their ubiquity and ease of localization. Point feature extraction methods used in current advanced registration methods mainly include Harris [
8], ORB (Oriented FAST and Rotated BRIEF) [
9], and AKAZE (Accelerated KAZE) [
10], while the feature description methods are mainly categorized into constructing descriptors based on the gradient information of the image (e.g., SIFT [
11]) and based on phase congruency (PC) (e.g., RIFT [
12], MSPC [
13]).
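As a hedged illustration of the detect-describe-match pipeline described above, the sketch below uses OpenCV's ORB as one representative detector/descriptor; SIFT or AKAZE could be substituted via cv2.SIFT_create() or cv2.AKAZE_create(). The ratio test shown is a common heuristic, not a step prescribed by the cited works.

```python
# Sketch of the classical feature-based pipeline: extraction -> description -> matching.
import cv2

def feature_match(img1, img2, n_features=2000, ratio=0.8):
    orb = cv2.ORB_create(nfeatures=n_features)
    kp1, des1 = orb.detectAndCompute(img1, None)   # feature extraction + description
    kp2, des2 = orb.detectAndCompute(img2, None)
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING)      # Hamming distance for binary descriptors
    knn = matcher.knnMatch(des1, des2, k=2)
    # Lowe-style ratio test keeps only distinctive correspondences.
    good = [m for m, n in knn if m.distance < ratio * n.distance]
    pts1 = [kp1[m.queryIdx].pt for m in good]
    pts2 = [kp2[m.trainIdx].pt for m in good]
    return pts1, pts2
```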
Current remote sensing systems can perform system-level geometric correction of images using platform positioning parameters, effectively eliminating rotational and scale differences between SAR and optical images. Cutting-edge methods therefore tend to combine region-based and feature-based approaches into feature-based template matching methods. This class of methods usually exploits similar local structural features between heterologous images and aims to extract a set of matching point pairs for constructing a geometric transformation model. Studies have shown that combining template-region-based and feature-based methods achieves higher accuracy in SAR-optical heterologous image registration than purely region-based methods. Such methods usually include three main steps: feature point extraction, region feature matching, and matching point screening. Depending on the feature descriptors used, common classical methods include: Channel Features of Oriented Gradients (CFOG [
14]) proposed by Ye et al. and Angle-Weighted Oriented Gradients (AWOG [
15]), which improves gradient calculation accuracy on this basis; LNIFT (Locally Normalized Image Feature Transform) [
16], which combines ORB keypoints with a HOG-like descriptor; SRIF (Scale and Rotation Invariant Feature Transform) [
17], which adds scale invariance on top of LNIFT; and LPHOG [
18], which combines line and point features and uses HOG-like (Histogram of Oriented Gradients) descriptors. SIFT-based methods for SAR and optical image registration include PSO-SIFT [
19] (Position-Scale-Orientation SIFT), OS-SIFT [
20] (Optical-SAR SIFT), RTV-SIFT [
21] (Relative Total Variation). The main registration methods that use phase congruency include HOPC [
22] (Histogram of Phase Congruency), FED-HOPC [
23], etc.
Moreover, deep learning-based multimodal image registration has made significant advances in recent years, including the use of deep networks for feature extraction [
24], modal transformation [
25] (GAN), and end-to-end registration [
26]. Deep learning methods often show superior performance. However, variations in acquisition conditions such as resolution, imaging geometry, and polarization pose significant challenges, and existing methods suffer from a lack of diverse multimodal remote sensing training data and generally exhibit limited generalization ability. Traditional methods for SAR-optical image registration therefore remain in widespread use.
In heterologous image registration, feature point extraction and matching point screening are critical steps: only by effectively eliminating outliers among the matching point pairs can a correct geometric transformation model be constructed. The prevalent technique for feature point extraction in contemporary heterologous image registration is the Block-Harris method [
14], which efficiently extracts a large number of feature points. Non-Maximum Suppression (NMS) is then employed to filter the matching points [
27]. Finally, RANSAC [
28] is utilized in a hypothesize-and-verify manner to further eliminate outliers among the matching point pairs [
29]. RANSAC performs robustly when correct point pairs form the majority, but its performance degrades significantly as the proportion of outliers increases; in particular, when randomly distributed outliers cause incorrect correspondences to be fitted, the computational burden grows and an incorrect geometric transformation model may be produced. Fast Sample Consensus (FSC) [
30] effectively improves the computational efficiency of RANSAC by optimizing sampling technology. In the past two years, some scholars have also proposed new methods to improve the accuracy of matching point pairs, such as One-Class SVM [
31], and CSC [
32] (Cascaded Sample Consensus).
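For illustration, the sketch below performs outlier rejection with RANSAC as implemented in OpenCV; FSC and CSC refine the sampling strategy but follow the same hypothesize-and-verify idea. The variable names and the reprojection threshold are ours, not values from the cited works.

```python
# Hypothesize-and-verify outlier rejection with OpenCV's RANSAC.
# `pts_ref` / `pts_sen` are Nx2 lists or arrays of putative matches.
import cv2
import numpy as np

def ransac_filter(pts_ref, pts_sen, reproj_thresh=3.0):
    src = np.asarray(pts_ref, dtype=np.float32).reshape(-1, 1, 2)
    dst = np.asarray(pts_sen, dtype=np.float32).reshape(-1, 1, 2)
    H, mask = cv2.findHomography(src, dst, cv2.RANSAC, reproj_thresh)
    inliers = mask.ravel().astype(bool)   # True for matches consistent with the fitted model
    return H, inliers
```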
In scenarios where the ground objects in the observation area are rich and exhibit significant structural and textural features, methods such as CFOG can achieve precise registration of SAR and optical heterologous images [
14]. In practical applications, however, it is often necessary to process images containing weak feature regions, such as land-water interface areas dominated by extensive water bodies and deserts with sparse structural features. In such cases, the Block-Harris method widely employed in heterologous image registration focuses primarily on structural features and ignores regional variations among weakly featured objects, so a number of feature points are also extracted on water surfaces and deserts. These feature points are usually randomly distributed, and even if most of them are eliminated by region feature matching and traditional NMS-based matching point screening, a few false matching points often remain in the weak feature regions. Their large geometric errors can easily reduce the accuracy of the geometric transformation model, leading to a significant drop in the performance of traditional methods.
Therefore, a region-adaptive processing mechanism and a more comprehensive matching point screening criterion must be introduced into the registration method to determine whether a local region is a weak feature region, such as water or desert, and thus control the number of feature points and matching point pairs in such regions. Meanwhile, islands, sporadic villages, and roads are often scattered across large weak-feature areas, and these are usually the effective feature regions from which highly reliable feature points and matching points can be obtained; a crude region determination criterion is therefore likely to discard some of these effective feature regions, leading to an insufficient number of matching point pairs, reduced precision of the geometric transformation model, or even image mismatch.
To significantly improve the robustness of registration in weak feature regions, this paper focuses on two aspects, namely feature point extraction and matching point screening, and proposes a registration method based on region-adaptive keypoint selection that achieves high-precision registration of SAR and optical images containing weak feature regions. First, the method adopts a dual-threshold criterion based on regional information entropy and the variance product to effectively identify weak feature regions; this criterion avoids extracting feature points in the weak feature regions while protecting, as far as possible, the feature points present in the small number of effective feature regions. Furthermore, based on the similarity maps generated with the MOGF (Multi-scale Oriented Gradient Fusion) [
33] regional feature descriptor, a matching point screening method combining similarity map skewness and NMS is proposed, which not only effectively avoids generating false matching points in weak feature regions such as water and desert, but also preserves as many correct matching points as possible in effective feature regions such as islands, sporadic villages, and roads scattered across large weak-feature areas.
The remainder of this paper is organized as follows: in
Section 2, we elaborate on the framework of the registration method and the specific implementation of each processing step; in
Section 3, we conduct experiments on different data and verify the effectiveness of the registration method based on region-adaptive keypoint selection by using subjective and objective evaluation criteria; in
Section 4, we analyze the impact of key parameters on the registration performance and summarize the limitations of the method; finally, we present conclusions.
2. Method
Figure 1 illustrates the framework of the registration method proposed in this paper, which consists of three main parts: (i) region-adaptive feature point extraction; (ii) region feature matching (to obtain the similarity map); (iii) matching point screening. For a pair of SAR-optical heterologous images, we first extract uniformly distributed feature points using the proposed VE-FAST (Variance and Entropy-Features from Accelerated Segment Test) feature point detection method. Following this, the template area and search area are determined, centered on the extracted feature points, and feature descriptors are constructed to generate a similarity map. Finally, the proposed SNMS (combination of Skewness and NMS) matching point screening method is used to filter out false matching point pairs and build a correct geometric transformation model. Each step is described in detail below.
2.1. Region-Adaptive Feature Point Extraction Based on VE-FAST
Image structural features mainly include edge features, corner features, and contour features, while texture features mainly refer to statistical, frequency-domain, and model-based features. As mentioned earlier, existing methods mostly rely on the structural features (e.g., edges and corners) of an image for registration. However, when an image contains a large number of weak feature regions, the registration results of traditional methods tend to deviate from expectations. Given that weak feature regions tend to exhibit uniform texture, this paper combines texture features such as regional information entropy and variance and introduces a region-adaptive feature point extraction method capable of distinguishing weak feature regions (such as water, desert, and other natural regions) from effective feature regions (mainly man-made regions such as cities and farmland).
In the field of computer vision, there are several mature methods for recognizing weak feature regions, such as using common image segmentation methods to directly distinguish weak from effective feature regions. Although these methods can realize region adaptation in SAR-optical registration, their computational cost is comparable to that of the registration itself, which makes them unsuitable for efficient applications. Since region-adaptive feature point extraction for SAR-optical registration only needs to extract uniformly distributed feature points in the effective feature regions, without considering rotational and scale differences between heterologous remote sensing images, this paper performs feature point detection on the optical image to circumvent the interference of the multiplicative speckle noise of the SAR image with the feature point operator. Commonly used corner detection methods include the Harris operator, the DOG operator in SIFT, and the FAST algorithm in ORB. Among them, the FAST algorithm is fast and has high localization accuracy, and it can be applied to feature matching of heterologous images; however, it is prone to clustering. This paper therefore introduces a blocking strategy and combines it with texture features, namely information entropy and variance, to propose the VE-FAST method, which offers high computational efficiency and registration performance while maintaining a uniform spatial distribution of feature points. The method flow chart is shown in
Figure 2.
The method first introduces the blocking strategy of UND-Harris [
34] operator, which divides the image into n × n image blocks, detects FAST features individually in each block, records all FAST point scores, and sorts them in descending order. Following information theory, let an image have L gray levels, where the probability of occurrence of the i-th (i = 1, 2, ..., L) gray level is p_i; each pixel then carries an amount of information −log2(p_i), and the information entropy H of each image block can be calculated according to Formula (1):

H = −Σ_{i=1}^{L} p_i log2(p_i).      (1)

The magnitude of the information entropy reflects the texture variation of each image block, and the information entropy of a weak feature region is usually small. For each image block whose information entropy exceeds the set threshold T_E, the first N_1 points are selected as candidate points; to avoid neglecting effective feature regions of small area, a smaller number N_2 of candidate points are still extracted from the image blocks that do not satisfy the entropy threshold. The total number of candidate points therefore ranges from n²N_2 to n²N_1.
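The following is a minimal sketch of the entropy-gated, block-wise FAST candidate selection described above, assuming a grayscale uint8 optical image; the grid size, per-block counts N1/N2, and the entropy threshold are illustrative parameter values, not the paper's settings.

```python
# Block-wise FAST detection gated by Shannon entropy (Formula (1)).
import cv2
import numpy as np

def block_entropy(block):
    hist, _ = np.histogram(block, bins=256, range=(0, 256))
    p = hist / max(hist.sum(), 1)
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())          # Shannon entropy of the block

def entropy_gated_fast(optical, grid=8, n1=8, n2=2, t_entropy=4.0):
    fast = cv2.FastFeatureDetector_create()
    h, w = optical.shape[:2]
    bh, bw = h // grid, w // grid
    candidates = []
    for i in range(grid):
        for j in range(grid):
            block = optical[i * bh:(i + 1) * bh, j * bw:(j + 1) * bw]
            kps = sorted(fast.detect(block, None), key=lambda k: -k.response)
            # Rich blocks keep n1 candidates, weak (low-entropy) blocks only n2.
            keep = n1 if block_entropy(block) > t_entropy else n2
            for k in kps[:keep]:
                candidates.append((k.pt[0] + j * bw, k.pt[1] + i * bh))
    return candidates
```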
Next, to address the problem that a small number of candidate points still fall in weak feature regions, the simple texture feature of image variance is introduced. For a region of size M × N, the variance is computed by Formula (2):

σ² = (1 / (MN)) Σ_{x=1}^{M} Σ_{y=1}^{N} (I(x, y) − μ)²,      (2)

where μ is the pixel average of the region, given by (3):

μ = (1 / (MN)) Σ_{x=1}^{M} Σ_{y=1}^{N} I(x, y).      (3)

Here, we directly use the template area and search area determined by each candidate point, calculate the variance of the two areas corresponding to the candidate point, and then construct the variance product V for the candidate point from Formula (4):

V = ((σ_T − σ_T,min) / (σ_T,max − σ_T,min)) · ((σ_S − σ_S,min) / (σ_S,max − σ_S,min)),      (4)

where σ_T is the variance of the template area, σ_T,min and σ_T,max are the minimum and maximum values of the template-area variances, σ_S is the variance of the search area, and σ_S,min and σ_S,max are the minimum and maximum values of the search-area variances. If the variances of both the optical-image area and the SAR-image area where a candidate point is located are small, the variance product does not satisfy the threshold T_V, which means the grayscale variation of the image around the candidate point is not obvious; the quality of this candidate point is therefore considered poor, and it is not suitable to be used as a feature point for subsequent descriptor construction.
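A minimal sketch of the variance-product gate is given below, assuming the min-max-normalized form suggested by the variable definitions above; it is our reconstruction and Formula (4) in the original paper should be consulted for the exact definition.

```python
# Variance product over all candidate points; small values indicate weak feature regions.
import numpy as np

def variance_product(var_template, var_search):
    """var_template / var_search: one region variance per candidate point."""
    vt = np.asarray(var_template, dtype=np.float64)
    vs = np.asarray(var_search, dtype=np.float64)
    norm_t = (vt - vt.min()) / (vt.max() - vt.min() + 1e-12)   # normalized optical-template variance
    norm_s = (vs - vs.min()) / (vs.max() - vs.min() + 1e-12)   # normalized SAR-search-area variance
    return norm_t * norm_s

# Candidate points whose product falls below the threshold T_V are discarded.
```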
The result of VE-FAST using the variance product to exclude the weak feature region and outputting the final feature points is shown in
Figure 3. It can be seen that the blue candidate points located in the weak feature regions have lower variance product values and are basically eliminated, thereby realizing region-adaptive feature point extraction; however, a few false feature points, marked in yellow in Figure 3, may still remain owing to the threshold setting, and these will be screened out by the SNMS method described later.
In this paper, three sets of images are used for experiments and compared with FAST, Block-FAST, and Block-Harris methods. The final feature point extraction results obtained are shown in
Figure 4. Block-Harris performs well when the texture information of the whole image is rich, and it is used by many heterologous image registration methods; however, as shown in
Figure 4, when feature points are extracted from images containing large weak feature regions, many of the points extracted by Block-Harris fall in those regions. FAST is a popular feature detection operator that is fast and has high localization accuracy, but it is prone to the clustering phenomenon shown in
Figure 4, where the feature point distribution is overly concentrated, which affects the accuracy of the subsequent geometric model estimation; Block-FAST, which introduces the blocking strategy, suffers from the same problem as Block-Harris. In contrast, the VE-FAST method proposed in this paper ensures that feature points are uniformly distributed in the effective feature regions, while almost none are extracted in the weak feature regions.
This paper quantitatively analyzes the extraction performance of the four feature extraction methods on the above three sets of images using three criteria: the number of effective feature points (NEFP), the total number of feature points (TNFP), and the effective ratio (ER). The results are shown in
Table 1, which shows that the feature points extracted by VE-FAST are basically all effective feature points.
Figure 5 shows the run time of the four feature point extraction methods for a range of image sizes. All experiments are conducted on a PC with an Intel i9-13900HX CPU (2200 MHz) and 32 GB of RAM. It can be seen that VE-FAST outperforms Block-Harris, is only slightly slower than FAST and Block-FAST, and offers high computational efficiency.
2.2. Similarity Map Generation Based on Region Feature Matching
After the feature points are extracted, the template matching method constructs feature descriptors for the template area of the reference optical image and for the search area of the SAR image to be matched. Considering that the scale and rotation differences between the images have been largely eliminated using the geometric parameters of the system, the single-scale MOGF descriptor is selected in this paper for its good computational efficiency and registration performance.
The gradients g_x and g_y of the image in the x and y directions are first obtained using filters. Considering the robustness of the method, the Sobel operator is chosen to extract the gradient information of the optical image, and the ROEWA operator proposed in the literature [
35] is chosen for SAR images due to the effect of multiplicative speckle noise.
The gradient magnitude G and gradient orientation θ of the image can then be calculated using Formulas (5) and (6):

G = sqrt(g_x² + g_y²),      (5)

θ = arctan(g_y / g_x).      (6)
Then the weighted gradient magnitudes G_1 and G_2 are obtained using Formula (7), in which θ_r is the feature orientation to the right of the image pixel's gradient orientation; its definition is given by Formula (8), where ⌊ ⌋ denotes the downward rounding (floor) operator. The gradient orientation range is divided into m equal parts to obtain the angular interval Δθ.
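The sketch below gives our simplified reading of Formulas (5)-(8): gradients are computed, the orientation is quantized into m channels, and each pixel's magnitude is distributed between the two neighbouring feature orientations with angle-based weights. The Sobel operator is shown for the optical image; the SAR branch would use a ROEWA-style gradient instead, which is not implemented here.

```python
# Angle-weighted orientation channels from image gradients (simplified sketch).
import cv2
import numpy as np

def oriented_gradient_channels(img, n_orient=8):
    # `img` is assumed to be a single-channel (grayscale) image.
    gx = cv2.Sobel(img, cv2.CV_32F, 1, 0, ksize=3)
    gy = cv2.Sobel(img, cv2.CV_32F, 0, 1, ksize=3)
    mag = np.sqrt(gx ** 2 + gy ** 2)                  # gradient magnitude, Formula (5)
    theta = np.mod(np.arctan2(gy, gx), np.pi)         # gradient orientation, Formula (6), in [0, pi)
    d_theta = np.pi / n_orient                        # angular interval
    idx_left = np.floor(theta / d_theta).astype(int) % n_orient   # left (lower) feature orientation
    idx_right = (idx_left + 1) % n_orient                          # right feature orientation
    w_right = (theta - idx_left * d_theta) / d_theta   # angle-based weight toward the right bin
    channels = np.zeros((n_orient,) + img.shape, dtype=np.float32)
    rows, cols = np.indices(img.shape)
    channels[idx_left, rows, cols] += mag * (1.0 - w_right)
    channels[idx_right, rows, cols] += mag * w_right
    return channels    # stacked channels form the per-pixel 3D dense descriptor
```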
A statistical window is set on the weighted gradient magnitudes to form a feature vector for each image pixel, and these vectors are stacked along the z-axis to form a 3D dense structural feature descriptor. The Sum of Squared Differences (SSD) between the descriptors is then chosen as the similarity measure. The formula of SSD is as follows:

SSD(v) = Σ (D_O − D_S(v))²,

where D_O denotes the MOGF descriptor of the template area of the reference optical image, D_S denotes the MOGF descriptor of the search area of the SAR image to be matched, and v denotes the offset vector relative to the center pixel of the search area. The smallest SSD corresponds to the best matching position; thus, the search for matching points is converted into computing the offset vector v that minimizes SSD(v). Expanding the square gives Σ D_O² − 2 Σ D_O · D_S(v) + Σ D_S(v)². Since the first term Σ D_O² is a constant, it can be neglected in the calculation, and the latter two terms are converted to the frequency domain using the FFT. In Formula (11), the first term on the right side of the equation represents the autocorrelation of the SAR image to be matched; for the normalized descriptor, this term is close to a constant and has little effect on the subsequent calculation, so the expression finally simplifies to the cross-correlation between the two descriptors, given by Formula (12). Formula (12) generates a similarity map, and matching points can be identified by locating the position of the maximum value in the similarity map.
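As an illustration of turning the simplified expression into a similarity map via FFT-based cross-correlation, the sketch below assumes descriptor stacks `tmpl` of shape (n_orient, H_t, W_t) and `search` of shape (n_orient, H_s, W_s), for example produced by the sketch above; the function names are ours.

```python
# FFT-based cross-correlation of descriptor stacks -> similarity map.
import numpy as np
from scipy.signal import fftconvolve

def similarity_map(tmpl, search):
    sim = None
    for c in range(tmpl.shape[0]):
        # Correlation = convolution with a spatially flipped kernel.
        corr = fftconvolve(search[c], tmpl[c, ::-1, ::-1], mode="valid")
        sim = corr if sim is None else sim + corr
    return sim                                   # maximum location -> best match offset

def best_offset(sim):
    y, x = np.unravel_index(np.argmax(sim), sim.shape)
    return x, y
```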
2.3. SNMS-Assisted Matching Point Screening
Matching point screening is crucial to the construction of the geometric transformation model, since outliers may reduce registration accuracy or even cause mismatches. After the similarity map is obtained, NMS is performed, and high-quality matching points can be effectively extracted in feature-rich regions. However, when a small number of false feature points fall in weak feature regions, as shown in
Figure 6e–h, the similarity map usually exhibits multiple peaks and random clutter; its histogram is roughly symmetric, with the data concentrated around the mean, and such points cannot be effectively removed by traditional NMS alone.
The conventional method uses NMS to solve the multi-peak problem of the similarity maps [
6]; however, in weak feature regions the ratio of the primary to the secondary peak can still exceed the set threshold while the two peaks lie far apart, so such matching points cannot be effectively eliminated by NMS alone, and a set of false matching points is incorrectly obtained. On the basis of NMS, we therefore further analyze the histogram of the similarity map, calculate its skewness, and propose the SNMS matching point screening strategy, which combines the skewness of the similarity map with NMS. The method achieves fast screening of matching points by taking feature consistency as the criterion and finally uses FSC to reject outliers that are geometrically inconsistent. FSC stably extracts correct matching point pairs with fewer iterations and higher computational efficiency than RANSAC. The overall processing flowchart is shown in
Figure 7.
The traditional NMS needs to find the primary and secondary peaks through iteration, which is time-consuming, so this paper adopts a simplified NMS. First, the similarity map S is normalized by Formula (13) and converted to a 0–1 grayscale map S_n:

S_n = (S − S_min) / (S_max − S_min),      (13)

where S_min is the minimum value of the similarity map and S_max is its maximum value. Then the maximum of the normalized similarity map is found and taken as the primary peak P_1, and a non-searching window of width w is set centered on this peak. After that, the maximum of the similarity map outside the non-searching window is found and taken as the secondary peak P_2, as shown in
Figure 8. Finally, primary and secondary peak windows of width w are set centered on the primary and secondary peaks, respectively, and the Intersection over Union (IoU) of the two windows is calculated.
When the ratio of the primary peak to the secondary peak is small, the two peaks are poorly differentiated and the similarity map is less reliable; when the IoU is small, the secondary peak lies farther from the primary peak and the similarity map is more reliable. Therefore, the primary-to-secondary peak ratio R is redefined here by Formula (14), where ε is the machine epsilon, a number close to but greater than 0, introduced to prevent the denominator from being 0. When R is larger than the set threshold T_R, the similarity map is considered reliable, and the corresponding matching point is output.
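The sketch below illustrates the simplified NMS step: normalize the similarity map, locate the primary and secondary peaks, and compute the IoU of their windows. The window width and the exact combination of the peak ratio with the IoU are our reading of the description above (the exact form of Formula (14) is not reproduced in this excerpt), so both should be treated as illustrative.

```python
# Simplified NMS reliability check on a similarity map.
import numpy as np

def nms_reliability(sim, w=11, eps=1e-12):
    s = (sim - sim.min()) / (sim.max() - sim.min() + eps)      # 0-1 normalization, Formula (13)
    p1_pos = np.unravel_index(np.argmax(s), s.shape)
    p1 = s[p1_pos]
    masked = s.copy()
    y0, x0 = p1_pos
    masked[max(y0 - w, 0):y0 + w + 1, max(x0 - w, 0):x0 + w + 1] = -np.inf  # non-searching window
    p2_pos = np.unravel_index(np.argmax(masked), masked.shape)
    p2 = s[p2_pos]
    # IoU of two axis-aligned windows of side 2w+1 centred on the two peaks.
    dy = min(y0 + w, p2_pos[0] + w) - max(y0 - w, p2_pos[0] - w) + 1
    dx = min(x0 + w, p2_pos[1] + w) - max(x0 - w, p2_pos[1] - w) + 1
    inter = max(dy, 0) * max(dx, 0)
    area = (2 * w + 1) ** 2
    iou = inter / (2 * area - inter)
    ratio = (p1 / (p2 + eps)) / (iou + eps)   # large peak ratio and small IoU -> reliable
    return p1_pos, ratio
```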
However, for false matching points distributed in weak feature regions, such as the case of
Figure 8b, they cannot be screened out by the above NMS alone. Therefore, drawing on processing ideas from medical imaging, we start from the histogram of the similarity map and further eliminate false matching points by calculating its skewness. The results in
Figure 6 show that the histogram of a correct matching point's similarity map has a longer tail on the right and a shorter tail on the left, with most of the data concentrated on the left, i.e., an overall right skewness; in contrast, the histogram of a false matching point's similarity map is basically symmetric (near-zero skewness) or even left-skewed, with a longer tail on the left and a shorter tail on the right.
The skewness γ of the similarity map can be calculated through Formula (15):

γ = [(1/n) Σ_{i=1}^{n} (x_i − μ)³] / [(1/n) Σ_{i=1}^{n} (x_i − μ)²]^{3/2},      (15)

where μ is the mean, x_i is the i-th element, and n is the number of samples.
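A short sketch of the skewness computation per Formula (15) follows; scipy.stats.skew yields the same statistic with its default bias=True and could be used directly.

```python
# Sample skewness of the similarity-map values (third standardized moment).
import numpy as np

def skewness(sim):
    x = np.asarray(sim, dtype=np.float64).ravel()
    mu = x.mean()
    m2 = ((x - mu) ** 2).mean()                 # second central moment
    m3 = ((x - mu) ** 3).mean()                 # third central moment
    return m3 / (m2 ** 1.5 + 1e-12)

# Matching points whose similarity-map skewness falls below the threshold are rejected.
```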
The corresponding skewness of the eight histograms in
Figure 6 is shown in
Table 2 below:
The difference in similarity map skewness between correct and false matching points is significant: when the skewness γ of a similarity map is lower than the skewness threshold T_γ, the corresponding matching point is considered false and is not output to the final matching point pairs. Thus, the false matching points in
Figure 3 can be screened out. The final result is shown in
Figure 9, which shows that the points in the weak feature region are eliminated.
4. Discussion
In the preceding section, registration experiments on multiple sets of real SAR and optical images clearly illustrate that the proposed method offers considerable advantages in both registration precision and efficiency. Based on the experimental images, we now evaluate how the number of candidate points
affects registration performance.
Figure 17 shows the changes in CMR, RMSE, and run time with the number of candidate points, which provides an important reference for further optimizing the registration performance.
In the previous experiments, the experimental image is divided into n × n blocks; N_1 = 8 candidate points are extracted from each image block whose information entropy satisfies the threshold, and N_2 candidate points from each block that does not. For the same set of experimental images, the block division is kept fixed while the per-block candidate counts are varied, so that the maximum number of candidate points changes accordingly.
Figure 17 uses the maximum number of candidate points as the horizontal axis. One setting of this parameter yields the best CMR and RMSE results as well as faster registration. However, this does not mean that the same parameter value should be used for all experimental images: when the image size is large, the number of candidate points can be increased appropriately, although this reduces registration efficiency, so a balance between registration accuracy and efficiency must be struck when choosing this parameter.
Regarding the limitations and applicability of the method, it shows good registration results on the images with weak feature regions currently available to us. However, since the method does not possess rotation and scale invariance, it may fail to obtain correct matching point pairs when the optical and SAR images exhibit large rotation and scale differences, which can lead to registration failure. Moreover, beyond the two keypoint selection aspects of feature point extraction and matching point screening, general-purpose region feature matching modules do not show robust performance in weak feature regions. To date, no authoritative work dedicated to matching modules for weak feature regions has been published, and addressing these limitations will be a priority of our future research.