1. Introduction
Multimodal remote sensing image registration serves as a fundamental prerequisite and critical step for numerous multimodal collaborative processing tasks and integrated remote sensing applications. The registration between optical images and SAR images is one of the typical multimodal image registration tasks and plays a crucial role in various remote sensing applications, including image fusion, change detection, and target recognition [
1]. The key to the registration of optical images and SAR images is to identify as many correct corresponding point pairs as possible that are uniformly distributed on both types of remote sensing images to obtain the correct geometric transformation model between the two images [
2]. However, owing to the different imaging principles and imaging platforms of optical images and SAR images, there are significant nonlinear radiation differences and geometric distortions between the images [
3], as shown in
Figure 1. To address these challenges, current research primarily employs traditional manual registration methods and deep learning-based registration methods. Among these, traditional manual registration methods can be further categorized into feature-based registration methods and area-based registration methods [
4].
Feature-based registration methods generally consist of four steps: extracting features (such as corner points and edges), constructing feature descriptors, matching features, and calculating a transformation model, thus achieving high-precision image registration [
5]. The Scale-Invariant Feature Transform (SIFT) algorithm is the most widely used image registration algorithm in computer vision [
6]. However, the speckle noise in SAR images causes the feature points detected by the traditional SIFT algorithm to be highly unstable, resulting in numerous incorrect matches. Therefore, it is difficult to apply the SIFT method directly to SAR image registration. Many researchers have proposed improved algorithms to address this issue. Dellinger et al. [
7] proposed the SAR-SIFT method, which calculates the ratio gradient through the ROEWA operator [
8] to enhance the resistance to speckle noise in SAR images. Xiang et al. [
9] proposed the OS-SIFT method, which uses multi-scale Sobel and ROEWA operators to design consistent gradients for optical and SAR images, integrating multi-scale information to enhance the robustness of matching. Li et al. [
10] proposed the RIFT method, which used the phase congruence (PC) [
11] for feature point detection and constructed a maximum index map based on the log-Gabor convolutional sequence for feature description, thereby enhancing feature detection stability and radiometric invariance. However, it relies on a ring feature calculation strategy to overcome rotation differences, which is computationally inefficient; the authors therefore proposed a new rotation strategy to improve efficiency [
12].
Area-based registration methods focus on the overall properties and similarities of whole or partial image regions. Such methods commonly utilize attributes including gray value, texture, and shape to identify matching regions by computing similarity measures between designated regions [13]. Commonly used similarity measures in template matching include the sum of squared differences (SSD), normalized cross-correlation (NCC) [
14], and mutual information (MI) [
15]. In recent years, many scholars have conducted in-depth research on this method. Ye et al. [
16] proposed a novel structural feature descriptor, the Histogram of Orientated Phase Congruency (HOPC), which replaces traditional Histogram of Oriented Gradient (HOG) [
17] with PC features. The method enhances registration precision but suffers from computational inefficiency that needs to be addressed. To address this issue, a fast and robust pixel-wise template matching framework was developed using Channel Features of Oriented Gradients (CFOG) [
18], where the SSD similarity measure was optimized in the frequency domain via Fast Fourier Transform (FFT) to accelerate matching performance. Fan et al. [
19] improved CFOG using the Angle-Weighted Oriented Gradients (AWOG) strategy, distributing the gradient values to the two most relevant directions to enhance registration accuracy. Tong et al. [
20] proposed the Multi-scale Orientated Gradient Fusion (MOGF) operator based on AWOG, which employs multi-scale gradient quantization decomposition for feature description, along with convolutional merging and feature dimensionality reduction to enhance the accuracy and efficiency of image registration. Ye et al. [
21] proposed the FED-HOPC registration method, which distinguishes texture regions through the Weighted Edge Density (WED) map to improve the registration accuracy of the HOPC descriptor and used FFT to accelerate the matching calculation, thereby enhancing the robustness of the registration algorithm.
Recently, deep learning has been applied to cross-source image registration because of its strong feature extraction capability. These methods improve the precision and robustness of registration by learning deep representations of images from different modalities [
22]. Current research in the domain of image registration primarily focuses on two approaches: feature descriptor learning and style transfer. In feature descriptor learning, deep neural networks have effectively supplanted conventional feature extraction and description processes in image registration. By leveraging hierarchical feature representations learned from data, these deep learning-based methods significantly enhance both registration accuracy and robustness [
23,
24,
25]. In the field of style transfer, advanced models such as Generative Adversarial Networks (GANs) have been successfully employed to first achieve modality unification through sophisticated style transfer algorithms. This preprocessing step enables subsequent registration of the transformed images using either traditional area-based or feature-based registration methods [
26,
27,
28]. However, the development of deep learning-based registration methods currently faces substantial data-related constraints. The inherent complexity, extensive spatial coverage, and heterogeneous noise characteristics of multimodal remote sensing imagery make it challenging to construct sufficiently large and representative training datasets. This data scarcity severely constrains the development potential of deep learning-based registration methods in terms of feature extraction, model generalization, and performance optimization. Therefore, traditional manual registration methods remain widely employed in current applications.
In this paper, we propose a novel optical–SAR image registration method named GDWF, which addresses the limitations of traditional approaches that rely solely on single gradient features (e.g., CFOG, AWOG) in scenarios with significant regional differences. The key innovation of GDWF lies in its dual-feature adaptive fusion strategy, which employs an intelligent weighting scheme to automatically balance feature contributions. This approach achieves both precise registration in structured environments and robust performance in texture-deficient areas. Specifically, in high-texture regions (e.g., urban buildings, farmlands, and other man-made structures), the method preserves the discriminative capability of gradient features, while in low-texture areas (e.g., oceans, forests, and natural landscapes), it compensates for gradient feature limitations by incorporating local standard deviation (LSD) features that demonstrate superior performance in repetitive textures. This balanced approach enables GDWF to achieve superior performance across diverse and challenging registration scenarios. The main contributions of this paper are summarized as follows:
A block-local standard deviation (Block-LSD) feature point extraction operator is proposed, which can accurately identify the effective texture regions in the image, avoiding the problem that feature points in weak texture regions may degrade the matching accuracy. It achieves precise and efficient region-adaptive feature point extraction.
A dual-modal feature description operator is proposed that integrates local standard deviation texture features with traditional gradient features. By combining the geometric sensitivity of gradients with the statistical robustness of local standard deviation features, the proposed operator significantly enhances the robustness of feature descriptions. In optical–SAR image registration tasks, it effectively improves both fault tolerance and generalization performance.
A confidence-weighted dual-modal fusion feature matching strategy is proposed. By establishing a confidence evaluation model based on the peak side lobe ratio and the integral side lobe ratio of the similarity measurement map, the contribution weights of gradient features and local standard deviation features can be adaptively optimized, enhancing the reliability of the similarity measurement map and significantly improving the accuracy of image registration.
The remainder of this article is organized as follows.
Section 2 introduces the registration framework of the GDWF algorithm proposed in this paper.
Section 3 details the datasets used in our experiments, presents the experimental results, and provides analysis to validate the effectiveness of the proposed method.
Section 4 analyzes the impact of template size on registration performance and conducts ablation studies to demonstrate the necessity and advantages of each component in the proposed method. Finally, the conclusion is presented in
Section 5.
2. Methods
This section presents the proposed robust optical–SAR image registration method.
Figure 2 illustrates the flowchart of the proposed algorithm, which consists of four main components: (1) Feature Extraction: The Block-LSD operator is employed to detect reliable and uniformly distributed feature points in the optical image. For each feature point, a template region is established, and a corresponding search region is defined in the SAR image. (2) Feature Description: Gradient-based and LSD-based feature descriptors are constructed for both the template regions (optical image) and search regions (SAR image) to enhance discriminative matching. (3) Adaptive Feature Fusion: The similarity measurement maps of the two feature types are adaptively weighted and fused based on their confidence scores, ensuring robust matching performance. (4) Outlier Elimination: The initial correspondence set obtained from template matching undergoes geometric verification via the Fast Sample Consensus (FSC) algorithm [
29] to discard spatial outliers.
2.1. Block-LSD Operator
In the optical–SAR image registration task, extracting high-quality feature points serves as the core step to achieve high-precision registration. To construct a reliable geometric transformation model, it is essential to extract as many feature points as possible from remote sensing images. These feature points should be evenly distributed and robust, and their surrounding areas should be effectively represented through local invariant feature descriptors.
Although traditional feature point detection methods based on image intensity or gradient information, such as the traditional Harris [
30] and FAST [
31] operators, are widely used in computer vision, they exhibit significant limitations when applied to remote sensing images with substantial regional differences. In weak texture regions, the detection of feature points is susceptible to misjudgment due to weak gradient features, uniform gray-level distribution, and local noise interference, which significantly reduces the matching accuracy. To address these issues, Ye et al. [
21] proposed the WED operator, which effectively distinguishes weak texture regions from effective texture regions by fusing PC and edge density features. However, due to the frequency domain calculations required for PC, this method exhibits high computational complexity, leading to an efficiency bottleneck in practical applications. Therefore, this study focuses on developing an efficient feature extraction operator capable of accurately identifying effective texture regions in heterogeneous scenes while ensuring a sufficient number of feature points, even distribution, and strong robustness.
Texture constitutes a fundamental visual characteristic that captures homogeneous spatial patterns within an image, representing intrinsic surface properties of objects. This perceptual feature encodes critical structural information about both the spatial organization of surfaces and their contextual relationships with surrounding elements. Image regions demonstrating slow spatial variation or periodic intensity patterns in their local statistics are typically classified as textured areas [
32,
33].
Inspired by the research on texture segmentation in medical images [
34], the LSD emerges as an efficient and computationally lightweight approach for texture characterization. This statistical measure quantifies intensity variations within local image neighborhoods, demonstrating distinct discriminative capabilities: smooth texture regions exhibit minimal intensity fluctuations (yielding low LSD values), while complex textures produce significant variations (resulting in high LSD values). The LSD’s ability to capture these differential patterns enables robust texture classification, forming a critical foundation for advanced image processing tasks including feature extraction, object detection, and segmentation.
For a given image $I$, the LSD value $\mathrm{LSD}(x,y)$ within the local neighborhood ($n \times n$ window) of a pixel point $(x,y)$ is defined as follows:

$$\mathrm{LSD}(x,y)=\sqrt{\frac{1}{n^{2}}\sum_{i=-\lfloor n/2\rfloor}^{\lfloor n/2\rfloor}\sum_{j=-\lfloor n/2\rfloor}^{\lfloor n/2\rfloor}\bigl[I(x+i,\,y+j)-\mu(x,y)\bigr]^{2}} \qquad (1)$$

$$\mu(x,y)=\frac{1}{n^{2}}\sum_{i=-\lfloor n/2\rfloor}^{\lfloor n/2\rfloor}\sum_{j=-\lfloor n/2\rfloor}^{\lfloor n/2\rfloor}I(x+i,\,y+j) \qquad (2)$$

where $\lfloor\cdot\rfloor$ denotes the floor operation, which extracts the largest integer not greater than the value inside the parentheses, $n$ is an odd number representing the size of the sliding window, $\mu(x,y)$ denotes the local mean within the $n \times n$ window, and $I(x,y)$ denotes the gray-level intensity of the image at coordinate $(x,y)$.
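Formula (1) can be evaluated efficiently with two box filters, since the windowed variance equals the windowed mean of squares minus the squared windowed mean. A minimal Python sketch (the function name lsd_map is illustrative, not the authors' implementation):

```python
import numpy as np
from scipy.ndimage import uniform_filter

def lsd_map(image: np.ndarray, n: int = 9) -> np.ndarray:
    """Local standard deviation (LSD) map with an n x n sliding window (n odd)."""
    img = image.astype(np.float64)
    mean = uniform_filter(img, size=n)           # local mean, Formula (2)
    mean_sq = uniform_filter(img * img, size=n)  # local mean of squares
    var = np.clip(mean_sq - mean * mean, 0.0, None)
    return np.sqrt(var)                          # Formula (1)
```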
After constructing the LSD texture intensity distribution map for the entire image, the maximum and minimum values are first normalized. Subsequently, a grid-based block partitioning strategy is employed to uniformly divide the image into blocks. Through the dual control of the initial threshold and the dynamic optimization mechanism, a two-stage screening process is performed in each sub-region. First, low-value regions of local standard deviation (e.g., water surfaces and other weak texture regions) are filtered out based on the preset threshold. Then, candidate points that pass the test are sorted in descending order of standard deviation, with the top k maximum value points retained as feature points. This strategy ensures a high capture rate in effective texture regions while suppressing the interference of pseudo-features in weak texture regions, thereby enhancing the quality and robustness of feature points. By enforcing uniform distribution of feature points through a grid-based block approach, it mitigates matching deviations caused by spatial aggregation, ensuring global spatial consistency and registration accuracy, and substantially improves image registration performance in heterogeneous scenes.
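A sketch of the two-stage block screening described above, assuming a square grid and illustrative parameter names (grid_size, tau for the initial threshold, k for the number of points retained per block):

```python
import numpy as np

def block_lsd_points(lsd: np.ndarray, grid_size: int = 64, tau: float = 0.1, k: int = 2):
    """Region-adaptive feature point extraction on a min-max normalized LSD map."""
    norm = (lsd - lsd.min()) / (lsd.max() - lsd.min() + 1e-12)  # min-max normalization
    points = []
    h, w = norm.shape
    for r0 in range(0, h, grid_size):
        for c0 in range(0, w, grid_size):
            block = norm[r0:r0 + grid_size, c0:c0 + grid_size]
            rows, cols = np.where(block > tau)     # stage 1: discard weak-texture pixels
            if rows.size == 0:
                continue
            vals = block[rows, cols]
            order = np.argsort(vals)[::-1][:k]     # stage 2: keep top-k LSD responses
            points.extend((r0 + rows[i], c0 + cols[i]) for i in order)
    return points
```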
Figure 3 illustrates the optical image and its corresponding LSD intensity map. As can be observed from the figure, the prominent regions of the LSD intensity map are predominantly concentrated in the effective texture regions, whereas the response in weak texture regions is relatively subdued. This demonstrates that the LSD feature effectively differentiates between weak texture regions and effective texture regions, thereby providing a robust foundation for subsequent feature point extraction.
To verify the effectiveness of the Block-LSD operator, a comparative experiment was conducted with the Block-Harris operator, Block-FAST operator, and Block-WED operator after introducing the block strategy. The results are presented in
Figure 4, while the efficiency comparison is summarized in
Table 1.
The experimental results demonstrate that, compared with the Block-Harris and Block-FAST operators, the feature points extracted by the Block-LSD operator are more uniformly distributed in effective texture regions, effectively avoiding false detection in weak texture regions. Moreover, although the Block-WED operator can also accurately locate feature points in effective texture regions, the Block-LSD operator exhibits significantly higher computational efficiency than the Block-WED operator, thereby showcasing superior performance in both effectiveness and efficiency for feature point extraction.
2.2. Dual-Modal Feature Description
The significant nonlinear radiation differences between optical and SAR images lead to inconsistent gray-level representations of identical ground objects, which may induce gradient inversion phenomena [
35]. In this context, LSD features demonstrate unique advantages due to their statistical nature. As a texture descriptor based on local statistical distributions, LSD achieves gray-value invariance by quantifying pixel intensity dispersion within local regions. This fundamental property provides exceptional robustness against nonlinear radiometric variations between optical and SAR modalities. The descriptor maintains stable statistical characteristics even under contrast stretching or nonlinear transformations. Crucially, by operating independently of gradient direction information, LSD effectively eliminates matching errors caused by gradient inversion, offering a reliable solution for cross-modal image registration challenges.
Although weak-texture regions are deliberately excluded during feature point selection, the template centered on a feature point inevitably covers portions of these areas. This partial inclusion of low-texture regions makes it difficult to fully eliminate their interference with the feature descriptors. Importantly, this coverage constraint also establishes a natural complementarity between gradient and LSD features in multimodal registration, as each feature type compensates for the other’s weaknesses across different texture environments.
As shown in
Figure 5, gradient features, owing to their direction sensitivity and precise geometric edge capture capability, exhibit obvious advantages over LSD features when handling scenes rich in structural features, such as urban building outlines and road networks, and can extract more detailed information. However, when the template includes weak texture regions, factors such as repetitive texture patterns, uniform gray-level distribution, and local noise interference may cause gradient features to produce highly similar responses in these areas, thereby resulting in false matches of corresponding points. In contrast, although LSD features are less effective than gradient features in depicting complex structural features such as urban building contours and road networks, when the template includes weak texture regions, the intensity of LSD features in these regions remains relatively low in both optical and SAR images, thereby effectively avoiding matching interference caused by weak texture regions. This characteristic not only substantially diminishes the risk of false matching but also significantly enhances the reliability of matching by emphasizing the correspondence between effective textures within the region. This complementarity offers theoretical support for constructing a robust multimodal registration framework. By integrating the geometric sensitivity of gradient features with the statistical robustness of LSD features, the fault tolerance and generalization capability of multimodal image registration can be effectively improved.
In the previous section, we introduced the LSD feature using Formula (1). The following section outlines the process of constructing the gradient feature description in the proposed algorithm. For a given optical image and SAR image, the Sobel operator and ROEWA operator are first employed to compute the directional gradients in the $x$-direction and $y$-direction for the optical image and SAR image, respectively. Subsequently, the gradient magnitude $G$ and gradient direction $\theta$ are calculated through Formulas (3) and (4):

$$G=\sqrt{G_{x}^{2}+G_{y}^{2}} \qquad (3)$$

$$\theta=\arctan\!\left(\frac{G_{y}}{G_{x}}\right) \qquad (4)$$

where $G_{x}$ and $G_{y}$ respectively denote the directional gradients in the $x$-direction and the $y$-direction. To mitigate the impact caused by the nonlinear radiation differences between optical images and SAR images, the values in the gradient direction $\theta$ that fall within the $[180^{\circ},360^{\circ})$ range are uniformly reduced by $180^{\circ}$, adjusting the value range of $\theta$ to $[0^{\circ},180^{\circ})$. Next, it is necessary to calculate the angles $\alpha_{l}$ and $\alpha_{r}$ between the gradient direction $\theta$ and the left and right feature directions, respectively. The calculation formulas are as follows:

$$\alpha_{l}=\theta-\left\lfloor\frac{\theta}{\Delta\theta}\right\rfloor\Delta\theta, \qquad \alpha_{r}=\Delta\theta-\alpha_{l}, \qquad idx_{r}=\left\lfloor\frac{\theta}{\Delta\theta}\right\rfloor+1$$

where $\Delta\theta$ denotes the angle interval, $N$ denotes the number of gradient direction channels, and $idx_{r}$ denotes the index of the right feature direction for the image pixel, which is utilized for subsequent accelerated matching. Subsequently, the weighted gradient magnitudes $G_{l}$ in the left direction and $G_{r}$ in the right direction are calculated separately. The corresponding calculation formulas are as follows:

$$G_{l}=\frac{\alpha_{r}}{\Delta\theta}\,G, \qquad G_{r}=\frac{\alpha_{l}}{\Delta\theta}\,G$$

Finally, statistical windows are defined over $G_{l}$ and $G_{r}$ to construct feature vectors. These feature vectors are then aligned along the z-axis pixel by pixel and subsequently filtered and normalized via convolution, yielding the gradient feature description $F_{\mathrm{grad}}$ of the image.
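A hedged sketch of the per-pixel orientation-weighted gradient decomposition described above, using Sobel gradients only (the ROEWA branch for SAR, the statistical-window convolution, and normalization are omitted; awog_channels and n_dirs are illustrative names):

```python
import numpy as np
from scipy.ndimage import sobel

def awog_channels(image: np.ndarray, n_dirs: int = 9) -> np.ndarray:
    """Distribute each pixel's gradient magnitude to its two nearest orientation channels."""
    img = image.astype(np.float64)
    gx, gy = sobel(img, axis=1), sobel(img, axis=0)
    mag = np.hypot(gx, gy)                             # gradient magnitude, Formula (3)
    theta = np.degrees(np.arctan2(gy, gx)) % 180.0     # direction folded into [0, 180)
    delta = 180.0 / n_dirs                             # angle interval
    idx_l = np.floor(theta / delta).astype(int)        # left feature direction index
    idx_r = (idx_l + 1) % n_dirs                       # right feature direction index
    alpha_l = theta - idx_l * delta                    # angle to the left direction
    channels = np.zeros(img.shape + (n_dirs,))
    rows, cols = np.indices(img.shape)
    channels[rows, cols, idx_l] = mag * (delta - alpha_l) / delta  # weighted magnitude G_l
    channels[rows, cols, idx_r] = mag * alpha_l / delta            # weighted magnitude G_r
    return channels
```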
The LSD feature description $F_{\mathrm{LSD}}$ is computed using Formula (1) and subsequently normalized. By concatenating the two features along the $z$-axis, the final feature descriptor $F$ can be obtained:

$$F=F_{\mathrm{grad}}\oplus F_{\mathrm{LSD}}$$

where $\oplus$ denotes the concatenation operator.
2.3. Weighted Normalized Cross-Correlation
Following feature descriptor construction for both template and search regions, the matching process proceeds by evaluating similarity measures between corresponding feature vectors. The matching points in the SAR image are determined through extremum detection on the resulting similarity map. To enhance matching reliability, we propose a confidence evaluation mechanism for assessing extremum point reliability along with a weighted normalized cross-correlation (WNCC) method that incorporates confidence weights into the similarity analysis. This combined approach effectively improves the accuracy of corresponding point identification between optical and SAR imagery. The detailed implementation of the feature matching method is illustrated in
Figure 6.
The calculation formula of the traditional NCC method is presented as follows:

$$\mathrm{NCC}(u,v)=\frac{\sum_{x,y}\bigl[f_{o}(x,y)-\mu_{o}\bigr]\bigl[f_{s}(x+u,\,y+v)-\mu_{s}\bigr]}{\sqrt{\sum_{x,y}\bigl[f_{o}(x,y)-\mu_{o}\bigr]^{2}\,\sum_{x,y}\bigl[f_{s}(x+u,\,y+v)-\mu_{s}\bigr]^{2}}}$$

where $f_{o}$ and $f_{s}$ respectively denote the single-layer feature descriptors of the optical image and SAR image, and $\mu_{o}$ and $\mu_{s}$ respectively denote the mean values of $f_{o}$ and $f_{s}$. Since the Fast Fourier Transform (FFT) technology can significantly improve computational efficiency in the frequency domain [36], the cross-correlation calculation is therefore implemented in the frequency domain:

$$\sum_{x,y}f_{o}(x,y)\,f_{s}(x+u,\,y+v)=F^{-1}\bigl[F^{*}(f_{o})\odot F(f_{s})\bigr]$$

where $F$ denotes the Fourier forward transform, $F^{-1}$ denotes the Fourier inverse transform, and $F^{*}$ denotes the complex conjugate of $F$. By identifying the position of the maximum value in the NCC similarity map, the corresponding matching points can be determined. Since the denominator is a constant, the offset $v$ between the two images can be calculated by the following formula:

$$v=\arg\max_{(u,v)}\;F^{-1}\bigl[F^{*}(f_{o}-\mu_{o})\odot F(f_{s}-\mu_{s})\bigr]$$
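A minimal frequency-domain correlation sketch consistent with the formulas above (the function name fft_offset is illustrative; normalization by the denominator and border handling are omitted):

```python
import numpy as np

def fft_offset(template: np.ndarray, search: np.ndarray):
    """Estimate the translation of `template` inside `search` via FFT cross-correlation."""
    t = template - template.mean()
    s = search - search.mean()
    t_pad = np.zeros_like(s)
    t_pad[:t.shape[0], :t.shape[1]] = t               # zero-pad template to search size
    corr = np.fft.ifft2(np.conj(np.fft.fft2(t_pad)) * np.fft.fft2(s)).real
    return np.unravel_index(np.argmax(corr), corr.shape), corr
```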
However, determining the offset solely based on the position of the maximum point has inherent limitations. As illustrated in
Figure 7, the focusing effect of the similarity map considerably influences the accuracy of the matching results. When the energy concentration of the similarity map is low, its peak distribution becomes more dispersed, which may cause the maximum point to deviate from the true matching point and consequently reduce the registration accuracy. Conversely, a similarity map with superior focusing performance enables the maximum point to more reliably converge toward the true matching point, thereby improving both the reliability and precision of feature matching.
For this purpose, this paper proposes a confidence evaluation model for similarity maps based on the peak side lobe ratio (PSLR) and the integrated side lobe ratio (ISLR). This model draws inspiration from the PSLR and ISLR metrics in the radar point target imaging quality evaluation system [
37], while making adaptive adjustments tailored to the feature matching problem. Specifically, the neighborhood with a fixed radius around the maximum value point in the similarity map is defined as the main lobe region, while the remaining area is considered as the side lobe region. PSLR can effectively characterize whether the similarity map exhibits a significant multi-peak distribution phenomenon by calculating the ratio of the main lobe peak to the side lobe peak. ISLR can reflect the focusing performance of the similarity map through calculating the energy ratio of the main lobe region to the side lobe region. Compared with traditional methods that solely depend on a single peak position, this evaluation system comprehensively takes into account the global characteristics of energy distribution and can more accurately reflect the reliability of the similarity map. The calculation formulas for PSLR, ISLR, and confidence are presented as follows:
$$\mathrm{PSLR}=\frac{p_{m}}{p_{s}}, \qquad \mathrm{ISLR}=\frac{\sum_{(x,y)\in M}C(x,y)^{2}}{\sum_{(x,y)\in S}C(x,y)^{2}}, \qquad w=\mathrm{PSLR}\times\mathrm{ISLR}$$

where $p_{m}$ denotes the peak value of the main lobe, $p_{s}$ denotes the peak value of the side lobe, $M$ denotes the main lobe region, $S$ denotes the side lobe region, and $C(x,y)$ denotes the value corresponding to coordinate point $(x,y)$ on the similarity map. The product formulation enforces a stringent ‘AND’ condition, requiring simultaneously high PSLR and ISLR values to achieve high confidence. This mathematical relationship naturally embodies the dual requirement that a reliable similarity map must exhibit both localized peak sharpness and global energy concentration, ensuring comprehensive quality assessment. Considering the requirements for numerical stability in energy calculation, the similarity map should be preprocessed with min–max normalization to map its dynamic range to the interval $[0,1]$. Subsequently, the formula for calculating WNCC can be expressed as follows:

$$\mathrm{WNCC}=w\cdot\mathrm{NCC}$$
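A sketch of the confidence evaluation, assuming a square main-lobe neighborhood of fixed radius around the peak of a min–max normalized similarity map (the radius value and the function name are illustrative):

```python
import numpy as np

def similarity_confidence(sim: np.ndarray, radius: int = 5) -> float:
    """Confidence of a similarity map as PSLR * ISLR around its peak."""
    sim = (sim - sim.min()) / (sim.max() - sim.min() + 1e-12)   # min-max normalization
    peak_r, peak_c = np.unravel_index(np.argmax(sim), sim.shape)
    rows, cols = np.indices(sim.shape)
    main = (np.abs(rows - peak_r) <= radius) & (np.abs(cols - peak_c) <= radius)
    pslr = sim[peak_r, peak_c] / (sim[~main].max() + 1e-12)             # peak side lobe ratio
    islr = (sim[main] ** 2).sum() / ((sim[~main] ** 2).sum() + 1e-12)   # integrated side lobe ratio
    return pslr * islr
```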
Considering that the gradient feature description exhibits a multi-level structure, when calculating its similarity measure, it is essential to extend WNCC into three-dimensional space and perform integrated normalization of confidence values across all levels. This approach ensures both the accuracy and comprehensiveness of the measurement results. The calculation formula for 3D-WNCC based on gradient feature description is presented as follows:
$$\mathrm{3D\text{-}WNCC}=\frac{\sum_{i=1}^{N}w_{i}\cdot\mathrm{NCC}_{i}}{\sum_{i=1}^{N}w_{i}}$$

where $\mathrm{NCC}_{i}$ denotes the NCC coefficient of the $i$-th layer of the gradient feature description, and $w_{i}$ denotes the confidence value associated with that layer. Subsequently, the similarity maps derived from the gradient feature description and those obtained from the LSD feature description are fused with confidence weighting, thereby obtaining a comprehensive weighted fusion feature similarity map. The calculation formula is presented as follows:

$$C_{\mathrm{fused}}=\frac{w_{\mathrm{grad}}\,C_{\mathrm{grad}}+w_{\mathrm{LSD}}\,C_{\mathrm{LSD}}}{w_{\mathrm{grad}}+w_{\mathrm{LSD}}}$$
Finally, the maximum value point is located on the similarity map $C_{\mathrm{fused}}$, and the coordinates of the matching points are subsequently determined. This feature fusion strategy, which integrates confidence weighting, not only fully exploits the complementary advantages of gradient features and LSD features but also achieves optimal integration of feature responses through a dynamic weight allocation mechanism, thereby substantially enhancing the accuracy and robustness of image registration.
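The fusion step can then be sketched as a confidence-weighted average of the two similarity maps, where the weights w_grad and w_lsd may be obtained, for example, from the illustrative similarity_confidence sketch above:

```python
import numpy as np

def fuse_similarity(sim_grad: np.ndarray, sim_lsd: np.ndarray, w_grad: float, w_lsd: float):
    """Confidence-weighted fusion of gradient- and LSD-based similarity maps."""
    fused = (w_grad * sim_grad + w_lsd * sim_lsd) / (w_grad + w_lsd + 1e-12)
    row, col = np.unravel_index(np.argmax(fused), fused.shape)  # matching point location
    return fused, (row, col)
```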
Figure 8 illustrates the comparison of similarity maps generated by various feature description algorithms using the same feature point extraction operator. Through a systematic comparative analysis of these algorithms, it is evident that among the investigated methods, the GDWF algorithm demonstrates superior matching performance. More specifically, compared with other algorithms, the GDWF algorithm achieves the smallest positional deviation for corresponding homonymous points, thereby substantially enhancing the accuracy and reliability of the registration results.
2.4. Outlier Elimination
Following template matching, the corresponding coordinate pairs of homonymous points between the optical image and the SAR image are output. However, some incorrect matches may still exist among these point pairs and must therefore be eliminated. In this work, the FSC algorithm is utilized to remove the incorrect matching point pairs. The FSC algorithm is an extension of the RANSAC algorithm [
38], and by integrating subset sampling, it not only decreases computational time but also improves robustness.
After eliminating the abnormal points via the FSC algorithm, reliable correct matching point pairs can be obtained. Subsequently, based on these matching point pairs, an accurate geometric transformation model is established using the least squares method, thereby achieving a high-precision image registration result.
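The FSC algorithm itself follows [29]; as an illustration of the final least-squares step, a geometric transformation can be fitted to the retained inlier pairs as follows (a minimal sketch assuming, for illustration, an affine model, not the authors' implementation):

```python
import numpy as np

def fit_affine(src: np.ndarray, dst: np.ndarray) -> np.ndarray:
    """Least-squares affine model mapping src (N,2) inlier points to dst (N,2) points."""
    ones = np.ones((src.shape[0], 1))
    A = np.hstack([src, ones])                  # (N, 3) design matrix [x, y, 1]
    params, *_ = np.linalg.lstsq(A, dst, rcond=None)
    H = np.eye(3)
    H[:2, :] = params.T                         # 2x3 affine parameters
    return H
```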
3. Results
In this section, we assess the performance of the proposed optical–SAR image registration algorithm. First, we provide detailed information on the datasets utilized in the comparative experiments. Second, we elaborate on the specific performance evaluation metrics adopted and the algorithm parameter configurations selected for this study. Subsequently, we compare and analyze the algorithm’s performance across different ground object scenes and large sample datasets. Finally, we discuss the influence of template size on the algorithm’s performance.
3.1. Datasets
This experiment adopts the SOPatch dataset [
39] as the core experimental data. This dataset is constructed by integrating three sub-datasets: OS-dataset [
40], SEN1-2 [
41], and WHU-SEN-City [
42]. It is characterized by multi-scale coverage and scene diversity, providing a standardized benchmark testing environment for optical–SAR image registration research.
The OS-dataset comprises 2673 pairs of high-resolution optical and SAR images, each with a size of 512 × 512 pixels. These images encompass multiple cities and their surrounding regions worldwide, featuring diverse objects such as buildings, vehicles, and various types of ships. This diversity provides a robust experimental environment for SAR and optical image registration.
The SEN1-2 dataset comprises image data acquired from the Sentinel-1 and Sentinel-2 satellites, including a total of 5428 optical–SAR image pairs. Sentinel-1 is a multi-polarization C-band SAR satellite, while Sentinel-2 is a multispectral optical remote sensing satellite. This dataset features an image resolution of 10 m and covers diverse terrains and seasonal characteristics globally, offering comprehensive data support for investigating optical–SAR image registration across various environments and conditions.
The WHU-SEN-City dataset is also based on the image data from Sentinel-1 and Sentinel-2 satellites but focuses on Chinese cities and their surrounding areas. This dataset encompasses a variety of land cover types, including mountains, lakes, forests, urban buildings, and farmlands, and consists of 1380 optical–SAR image pairs.
To construct a unified and standardized experimental dataset, the images from the aforementioned three sub-datasets were processed consistently in SOPatch. First, alignment operations were performed to resample and uniformly resize the images from different sources. Subsequently, the overlapping regions of these images were cropped to a size of 512 × 512 pixels, enabling standardized processing in subsequent experiments. To address potential misalignment between image pairs, the RIFT and CFOG algorithms were employed for additional registration of the image pairs.
3.2. Evaluation Criteria and Parameter Setting
In this experiment, the performance of the proposed registration method is evaluated using four indicators: correct matching rate (CMR), root mean square error (RMSE), failure rate (FR), and algorithm time consumption (TIME).
CMR is defined as the ratio of the number of correctly matched points to the total number of matched points. In this experiment, matching points with a Euclidean distance of less than 3 pixels are considered correctly matched points [
43]. That is,

$$\mathrm{CMR}=\frac{N_{c}}{N}$$

where $N_{c}$ denotes the number of correctly matched points, and $N$ denotes the total number of matching points.
RMSE is an indicator used to measure the accuracy of the algorithm, and its value represents the square root of the average of the transformation errors across all matching point pairs. The calculation formula is presented as follows:

$$\mathrm{RMSE}=\sqrt{\frac{1}{N}\sum_{i=1}^{N}\Bigl[\bigl(x_{i}-x_{i}'\bigr)^{2}+\bigl(y_{i}-y_{i}'\bigr)^{2}\Bigr]}$$

where $(x_{i},y_{i})$ represents the coordinates of the matching points on the optical image, and $(x_{i}',y_{i}')$ denotes the position coordinates of the matching points on the SAR image after being transformed onto the optical image by the geometric transformation model $H$.
FR serves as an indicator for evaluating the robustness of the algorithm. Specifically, it is calculated as the ratio of the number of image pairs that failed to achieve successful matching to the total number of image pairs considered. In this experiment, registration results with an RMSE value greater than or equal to 10 are defined as matching failures, which can be expressed as follows:

$$\mathrm{FR}=\frac{1}{M}\sum_{j=1}^{M}\mathbb{1}\bigl(\mathrm{RMSE}_{j}\geq 10\bigr)$$

where $\mathbb{1}(\cdot)$ is the indicator function, which returns 1 when the condition is true and 0 otherwise, and $M$ denotes the total number of image pairs for matching. A lower FR value indicates a higher level of robustness in the algorithm.
The running time of the algorithm serves as a critical metric for evaluating its efficiency. A shorter running time generally implies superior algorithmic performance in terms of computational efficiency.
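For reference, the three accuracy metrics defined above can be computed from per-point residuals and per-pair RMSE values as follows (the array names are hypothetical; errors_px holds per-point Euclidean transformation residuals in pixels, and rmse_per_pair holds per image pair RMSE values):

```python
import numpy as np

def evaluate(errors_px: np.ndarray, rmse_per_pair: np.ndarray):
    """Compute CMR, RMSE, and FR from matching residuals."""
    cmr = np.mean(errors_px < 3.0)              # correct matching rate (3-pixel rule)
    rmse = np.sqrt(np.mean(errors_px ** 2))     # root mean square error over all points
    fr = np.mean(rmse_per_pair >= 10.0)         # failure rate over image pairs
    return cmr, rmse, fr
```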
In the experiments of this paper, the number of feature points extracted is set to 200, the size of the local standard deviation calculation window is set to , and the feature point screening threshold is set to 0.1; the angle interval of the gradient feature is set to , and the number of gradient directions N is set to 9; the template window size is set to , and the search window size is set to . The main lobe area is defined as a neighborhood range near the maximum value. Additionally, comparison methods such as MOGF, CFOG, and FED-HOPC all adopt their default optimal parameter configurations. All experiments were conducted on a Lenovo Legion laptop (Lenovo Group Ltd., Beijing, China) equipped with 64GB DDR5 RAM and an Intel Core i7-14700K processor (Intel Corporation, Santa Clara, CA, USA) operating at 3.40 GHz.
3.3. Experimental Results of Different Ground Object Scenes
In this section, six groups of optical–SAR image pairs covering diverse scene types were selected from multiple datasets. Through these experiments, the algorithm proposed in this paper was systematically compared with several state-of-the-art algorithms, including MOGF, CFOG, and FED-HOPC. The detailed information of the selected images is summarized in
Table 2, while the corresponding optical–SAR images are illustrated in
Figure 9. The registration results for these images are presented in
Figure 10, and a comprehensive comparison of various performance metrics is provided in
Figure 11.
As shown in
Figure 10, the GDWF algorithm proposed in this study exhibits superior registration performance when handling typical optical and synthetic aperture radar (SAR) heterogeneous image registration scenarios, such as urban building clusters, farmlands, and port areas. Through visual analysis of the checkerboard mosaic images, it is evident that in all scenarios, the boundaries of ground objects remain highly continuous, and the texture transitions are smooth and natural, with no noticeable misalignment or breakage.
From a quantitative analysis perspective, as shown in
Figure 11, the experimental results clearly demonstrate that the GDWF algorithm achieves the highest registration accuracy across six typical scenarios of optical and SAR registration tasks. Specifically, in port and coastal scenes, this algorithm attains sub-pixel registration accuracy (with RMSE values of 0.69 and 0.87, respectively), representing an improvement of up to 62.3% compared to the second-best algorithm. In high-complexity scenes such as urban areas and building clusters, the GDWF algorithm maintains a significant performance advantage, with RMSE improvements ranging from 7.0% to 28.6%. Additionally, the GDWF algorithm achieves the highest CMR metric values across all scenarios, with an average CMR of 86.8%, representing a 9.7% to 25.3% improvement compared to competing algorithms. Notably, it achieves perfect matching (CMR = 100%) in farmland and port scenes. Although its computational time (on average 1.59 s) is slightly longer than that of the MOGF and CFOG algorithms, it is 72.7% faster than the FED-HOPC algorithm. By integrating an innovative feature weighting mechanism, the GDWF algorithm achieves superior registration performance across these scenarios, thereby providing stable and reliable technical support for multi-scenario remote sensing applications.
3.4. Performance Evaluation Based on Large Sample Data
In this section, to comprehensively and thoroughly evaluate the robustness and registration accuracy of the proposed method, a multidimensional comparative experimental scheme is designed. The specific experimental settings are as follows: 2000 pairs of optical–SAR images are randomly selected from the OS-dataset; a similar number of samples are selected from the SEN1-2 dataset at the same scale; and 1000 pairs of images are chosen from the WHU-SEN-CITY dataset to evaluate performance in complex scenarios. On the basis of constructing a complete test set, the proposed method in this paper is systematically compared and analyzed with the current mainstream advanced algorithms such as MOGF, CFOG, and FED-HOPC. The experiment adopts a multidimensional evaluation system, measures the robustness of the algorithm through the FR index, and statistically calculates the three key indicators of RMSE, CMR, and TIME in the case of successful matching. The final comparison results are detailed in
Table 3.
As shown in
Table 3, this study systematically demonstrated the comprehensive performance advantages of the GDWF algorithm in terms of robustness, matching accuracy, and operational efficiency via multidimensional comparative experiments. Specifically, in terms of robustness, the GDWF algorithm achieved FR values of 1.2% and 1.5% on the OS-dataset and SEN1-2 datasets, respectively, which were 21.1–69.6% and 30–60% lower than those of the comparison algorithms. On the WHU-SEN-City dataset, its FR value was 2.5%, matching the performance of the best comparison algorithm, thereby fully demonstrating its strong adaptability to complex scenarios. In terms of matching accuracy, the GDWF algorithm demonstrated significant dual-index synergy advantages: on the OS-dataset, its RMSE value was 2.557, which was 8.9%, 16.6%, and 34.4% lower than those of MOGF, CFOG, and FED-HOPC, respectively; simultaneously, its CMR value reached 73.3%, significantly exceeding other algorithms by 4.5–20.7 percentage points, thereby achieving a dual breakthrough in error control and the number of effective matches. On the SEN1-2 dataset, the GDWF algorithm achieved a sub-pixel RMSE value of 0.752 (9.0% better than the second-best MOGF) and a near-saturation CMR value of 97.9%, further validating its high-precision characteristics in remote sensing image matching. On the WHU-SEN-City dataset, the GDWF algorithm maintained its superiority, with an RMSE value of 3.025, which was 2.9–8.6% lower than that of the comparison algorithms, and a CMR value of 57.5%, which was 0.8–4.1% higher than that of the comparison algorithms. In terms of algorithm efficiency, although the GDWF algorithm consumed slightly more time than MOGF and CFOG, it still maintained a speed advantage of more than three times over FED-HOPC, thereby demonstrating its excellent balance between accuracy and efficiency. Overall, the GDWF algorithm not only achieves high matching accuracy and efficient operation but also exhibits strong robustness in complex environments, making it the algorithm with the best comprehensive performance.
5. Conclusions
This paper proposes a robust optical–SAR image registration algorithm named GDWF, which addresses the limitations of traditional gradient feature methods in heterogeneous image matching via a dual-feature adaptive fusion strategy. The algorithm combines the strengths of gradient features and local standard deviation features, dynamically optimizing their contribution weights through a confidence evaluation model of the similarity measurement map. This significantly enhances both the robustness and the accuracy of registration.
In the experiments, the Block-LSD feature point extraction operator proposed in this paper efficiently extracts feature points in effective texture regions, demonstrating superior performance in both the effectiveness and computational efficiency of feature point extraction. Meanwhile, the proposed registration algorithm exhibits the best robustness and registration accuracy across tests on different ground object scene types (including urban building clusters, farmlands, ports, etc.) and multiple public datasets.
Although the experimental results demonstrate that the algorithm proposed in this paper outperforms current mainstream methods in terms of performance, several limitations remain that require further investigation. First, as an area-based image registration method, this algorithm, similar to others in its category, lacks scale and rotation invariance. This limitation will be addressed in subsequent research by incorporating multi-scale feature pyramids and rotation-invariant feature descriptors. Second, in mountainous terrain with significant elevation changes, the geometric distortion caused by terrain undulations may exceed the compensation capability of the current algorithm, resulting in registration failure. To overcome this challenge, we plan to integrate Digital Elevation Model (DEM) data in future work and develop a three-dimensional geometric correction model to mitigate registration errors induced by elevation differences. These improvement directions will constitute the focus of future research to enhance the adaptability and robustness of the algorithm in complex scenarios.