Article

Robust Multi-Resolution Satellite Image Registration Using Deep Feature Matching and Super Resolution Techniques

1 Geomatics Research Institute, Pukyong National University, Busan 48513, Republic of Korea
2 Major of Geomatics Engineering, Division of Earth Environmental System Sciences, Pukyong National University, Busan 48513, Republic of Korea
* Author to whom correspondence should be addressed.
Appl. Sci. 2026, 16(2), 1113; https://doi.org/10.3390/app16021113
Submission received: 1 January 2026 / Revised: 18 January 2026 / Accepted: 19 January 2026 / Published: 21 January 2026
(This article belongs to the Special Issue Applications of Deep and Machine Learning in Remote Sensing)

Featured Application

This study offers an autonomous multi-satellite image registration pipeline that overcomes the limitations of traditional handcrafted and sparse matching methods when dealing with heterogeneous sensor data. It can be effectively applied to the integrated management of multi-source satellite constellations, enabling consistent and accurate geographic alignment for large-scale urban monitoring even under challenging seasonal conditions. Furthermore, the framework’s ability to maintain a stable registration error of less than 0.5 pixels makes it a robust preprocessing tool for high-precision environmental analysis, such as time-series change detection and disaster surveillance.

Abstract

This study evaluates the effectiveness of integrating a Residual Shifting (ResShift)-based deep learning super-resolution (SR) technique with the Robust Dense Feature Matching (RoMa) algorithm for high-precision inter-satellite image registration. The key findings of this research are as follows: (1) Enhancement of Structural Details: Quadrupling image resolution via the ResShift SR model significantly improved the distinctness of edges and corners, leading to superior feature matching performance compared to original-resolution data. (2) Superiority of Dense Matching: The RoMa model consistently delivered decisively superior results, maintaining a minimum of 2300 correct matches (NCM) across all datasets and substantially outperforming existing sparse matching models such as SuperPoint + LightGlue (SPLG) (minimum 177 NCM) and SuperPoint + SuperGlue (SPSG). (3) Seasonal Robustness: The proposed framework demonstrated exceptional stability, maintaining registration errors below 0.5 pixels even in challenging summer–winter image pairs affected by cloud cover and spectral variations. (4) Geospatial Reliability: Integration of SR-derived homography with RoMa achieved a significant reduction in geographic distance errors, confirming the robustness of the dense matching paradigm for multi-sensor and multi-temporal satellite data fusion. These findings validate that the synergy between diffusion-based SR and dense feature matching provides a robust technological foundation for autonomous, high-precision satellite image registration.

1. Introduction

Remote sensing technology, which relies on satellite imagery, is essential for a diverse range of applications, including multi-temporal change detection, environmental and disaster monitoring, and the generation of high-precision geospatial information. Consequently, a vast volume of satellite imagery covering the same regions is being accumulated from various platforms at different times, driving active research into multi-sensor image fusion for integrated utilization.
However, these images inherently possess variations in spatial resolution, sensor characteristics, acquisition times, and incidence angles, which inevitably lead to geometric discrepancies. Furthermore, this issue extends even to images captured by the same satellite; varying environmental conditions at different acquisition times—such as cloud cover, solar elevation, and incidence angles—can introduce significant geometric distortions. This results in a critical problem where individual images do not precisely align with their actual geographic coordinates. Such discrepancies hinder the precise analysis and fusion of multi-source imagery, as the quality of high-resolution satellite products depends heavily on their geometric accuracy [1]. Therefore, precise geometric alignment, known as image registration, is a mandatory preprocessing step and a prerequisite for the accurate analysis of a target area [2,3].
A key factor in registering multi-sensor satellite images is the precise acquisition of Ground Control Points (GCPs). High-precision registration requires that corresponding points between images be accurately matched, necessitating a sufficient number of uniformly distributed GCPs [4,5]. However, GCPs are typically collected through manual field surveys—a process that is time-consuming, costly, and difficult to scale over large areas [6,7]. To overcome these limitations, automatic GCP extraction and automated registration techniques have been proposed [8,9]. Consequently, there is an active development of technologies aimed at simultaneously improving registration accuracy and processing speed while minimizing human intervention [10,11,12,13,14,15,16,17].
Image registration techniques are generally categorized into two approaches: area-based (intensity-based) and feature-based [18,19]. Area-based methods register images by comparing the similarity of color or texture information. However, for large or complex images, these methods can be computationally expensive and may underperform in high-precision tasks [20]. In contrast, feature-based methods extract local features—such as corners, edges, and contours—for matching. They are particularly effective for high-resolution satellite imagery due to their computational efficiency and robustness to transformations like rotation and scale. Many studies have sought to improve registration performance by constructing robust feature descriptors [21,22], removing mismatches [23], and integrating multiple matching strategies [24]. As the demand for combining images from diverse sensors with varying spectral and spatial resolutions grows, multi-source image registration has emerged as a critical research direction [25].
Traditional research has centered on the design and improvement of handcrafted (HC) descriptors, such as Scale Invariant Feature Transform (SIFT), Speeded Up Robust Features (SURF), Features from Accelerated Segment Test (FAST), Binary Robust Independent Elementary Features (BRIEF), and Oriented FAST and Rotated BRIEF (ORB) [26]. While these algorithms have been widely adopted for remote sensing [27,28,29,30,31], they suffer a sharp decline in keypoint matching accuracy when applied to images with high noise, complex terrain, or significant resolution differences. They also exhibit lower generalization performance and greater vulnerability to complex transformations compared to deep learning-based models. To address these limitations, studies have proposed deep learning techniques utilizing Convolutional Neural Networks (CNNs) for feature extraction and matching [32,33,34,35]. Among these, several seminal models have been explored, including SuperPoint [36], which leverages a CNN to detect keypoints and descriptors; SuperGlue [37], a Graph Neural Network (GNN)-based matcher that marked a paradigm shift by incorporating spatial information; and LightGlue [38], a lightweight successor to the computationally intensive SuperGlue. Their applications to remote sensing have been extensively explored [39,40,41,42,43]. More recently, the Transformer-based Robust Dense Feature Matching (RoMa) algorithm has emerged, focusing on dense matching to achieve more robust results and demonstrating high performance in remote sensing applications [44].
Despite these advancements, existing techniques still face limitations. A persistent challenge arises in urban environments with highly repetitive structures, where the reliability of keypoint matching is often compromised by recurring building patterns, varying illumination, and shadows [45,46,47]. Furthermore, scale mismatch in multi-sensor images with different resolutions poses a significant challenge, altering the location and representation of keypoints. Securing reliable correspondence is particularly difficult in high-to-low resolution image pairs [48,49,50]. This degradation in matching accuracy ultimately stems from limitations in image sharpness and structural representation. To compensate, deep learning-based super-resolution (SR) has emerged as a promising technology. High-resolution images provide more structural detail, enhancing the representation of keypoints like corners and contours, which in turn improves registration accuracy. In particular, diffusion-based SR models have demonstrated more realistic restoration than conventional CNN or Generative Adversarial Network (GAN)-based models [51,52,53], showing significant potential for remote sensing. However, despite the focus on SR-based resolution enhancement, studies that quantitatively and qualitatively analyze the impact of these techniques on satellite image registration accuracy remain scarce.
To bridge this research gap, this study systematically evaluates various image registration methods and assesses the impact of diffusion-based SR on the performance of multi-sensor satellite image registration. Specifically, we employ Residual Shifting (ResShift), a state-of-the-art (SOTA) diffusion model, to enhance the structural details of low-resolution imagery from Korean satellites, namely Compact Advanced Satellite 500-1 (CAS500-1), Korea Multi-Purpose Satellite 3 (KOMPSAT-3), and KOMPSAT-3A. The core of our investigation lies in a comprehensive comparative analysis between traditional algorithms (SIFT and ORB) and cutting-edge deep learning models (SuperPoint, SuperGlue, LightGlue, and RoMa). The distinct contributions of this work are three-fold:
  • Evaluation of SOTA Image Registration Models: We provide one of the first rigorous assessments of the RoMa algorithm in the context of multi-sensor satellite registration, exploring its potential to outperform established feature-based matchers.
  • Cross-Resolution and Seasonal Robustness: Unlike previous studies, we conduct experiments across all possible satellite pairs and account for extreme seasonal variations (Summer vs. Winter) to evaluate registration robustness under significant spectral and temporal shifts.
  • Quantitative and Qualitative Synergy: Beyond simple pixel-wise errors, we analyze geospatial reliability through EPSG-based coordinate errors and spatial distribution metrics, providing a practical framework for high-precision geospatial fusion.
Ultimately, this research establishes a robust technological foundation for integrating SR with deep learning-based matching to achieve autonomous, high-precision satellite image registration.

2. Materials and Methods

The overall research methodology of this study is illustrated in Figure 1. The process is divided into two primary stages: satellite image-pair preprocessing and image registration with feature matching algorithms.
First, multi-sensor imagery was collected from three distinct platforms: CAS500-1 (C1), KOMPSAT-3 (K3), and KOMPSAT-3A (K3A). These images underwent atmospheric correction, coordinate reference system (CRS) unification, and target area extraction. To evaluate the effectiveness of super-resolution (SR), we employed the ResShift model to generate a dataset with a four-fold resolution enhancement, following histogram equalization and normalization.
The registration experiments were conducted across three pairing scenarios: C1–K3A, C1–K3, and K3A–K3. In each pair, the higher-resolution image served as the reference image, while the lower-resolution (or SR-enhanced) image was designated as the source image. We then performed feature extraction and matching using both traditional methods (SIFT, ORB) and SOTA deep learning-based frameworks, specifically SuperPoint + SuperGlue (SPSG), SuperPoint + LightGlue (SPLG), and RoMa.
Finally, the matched keypoint coordinates were used to estimate a homography matrix, which geometrically transformed the source image to align with the reference image. The performance of each method was rigorously assessed through a comparative quantitative and qualitative analysis of the final registration results.

2.1. Dataset

2.1.1. Description of Satellite Data and Study Area

This study utilized satellite Red–Green–Blue (RGB) images acquired from three South Korean optical satellites: CAS500-1 (C1), KOMPSAT-3A (K3A), and KOMPSAT-3 (K3). All images were captured over the same geographical region. The K3 satellite, launched in 2012, provides a multispectral spatial resolution of approximately 2.8 m. K3A, launched in 2015, offers a resolution of about 2.2 m. Both platforms are extensively used in domestic and international remote sensing research and high-precision cartography [54,55]. The C1 satellite, which began operations in October 2021, provides the highest spatial resolution among the three at approximately 2.0 m. Designed for land observation, topographical change analysis, and urban monitoring, C1 is an increasingly vital asset in various applied research fields. Since all three satellites provide RGB optical imagery, they are ideal for comparing registration performance across varying resolutions and acquisition characteristics. C1 was designated as the reference image due to its superior spatial resolution.
To rigorously evaluate the robustness of the algorithms under challenging real-world conditions, we intentionally selected a densely built-up urban area in Cheonghak-dong, Osan, Gyeonggi Province, South Korea, for which imagery from all three satellites was available (Figure 2). This site features a complex integration of road networks, commercial structures, residential zones, and green spaces, providing a demanding environment for image registration. While urban areas may appear conducive to matching due to an abundance of features, they often contain factors that degrade matching accuracy, such as significant shadows, structural repetitiveness, elevation-induced displacements, and irregular building layouts. By deliberately avoiding “easier” terrain—such as flatlands or uniform agricultural fields—this study focuses on evaluating algorithm performance in high-difficulty environments. This experimental design is intended to reflect real-world application scenarios, such as change detection and object recognition, and to identify which techniques—traditional or deep learning-based—remain robust under these rigorous conditions.
To evaluate the impact of seasonal variations, we acquired imagery for both winter and summer. For K3A and K3, Level 1G (L1G) winter images were obtained prior to atmospheric correction. Although the K3 image was captured on 3 March, it was categorized as a winter scene because it represents the pre-vegetation period, with a color palette closely matching that of the K3A winter imagery. For C1, Level 2G (L2G) images for both summer and winter were acquired as initial data prior to the application of atmospheric correction. The detailed specifications of the satellite data used in this study are summarized in Table 1.

2.1.2. Data Preprocessing

The dataset construction process was executed in the following sequence. First, we identified and collected overlapping scenes from the C1, K3A, and K3 satellites. Since each satellite provides individual R, G, and B bands, we stacked the corresponding layers into a single 3-band composite image. This was a foundational step to ensure uniformity in model inputs for subsequent processing.
Subsequently, we checked the product levels of the imagery to determine the necessity of atmospheric correction. As the acquired images had not undergone this process, we performed corrections using the ENVI software (v6.0; NV5 Geospatial Solutions, Inc., Broomfield, CO, USA). For the KOMPSAT series (K3A and K3), we utilized the Fast Line-of-sight Atmospheric Analysis of Spectral Hypercubes (FLAASH) algorithm. For C1, we employed the Quick Atmospheric Correction (QUAC) algorithm. Atmospheric correction is essential for ensuring radiometric consistency and improving registration accuracy by mitigating influences such as atmospheric scattering and color distortion.
Once corrected, the images were aligned to achieve spatial co-location. The experimental extent was defined using the C1 image as the reference, given its superior spatial resolution. The K3A and K3 images were reprojected to the CRS of C1 and cropped to the same spatial extent, resulting in three registration pairs: C1–K3A, C1–K3, and K3A–K3. Through this procedure, we established two preliminary datasets:
  • Original Dataset (OD): Cropped images prior to atmospheric correction.
  • Original Dataset with Atmospheric Correction (ODA): Images after atmospheric correction.
Visual confirmation of the differences between the OD and ODA datasets is a key component of this study. Since feature matching and SR models are typically optimized for 8-bit formats (e.g., PNG or JPEG) rather than the Geographic Tagged Image File Format (GeoTIFF), the cropped GeoTIFFs were converted to PNG. During this conversion, color and brightness distributions were stabilized using Min–Max normalization and histogram equalization. Min–Max normalization was implemented to map the radiometric range onto a consistent 8-bit scale, thereby mitigating sensor-dependent value discrepancies following preprocessing. Subsequently, histogram equalization was applied to stabilize global contrast and accentuate structural edges for improved feature matching. This standardized procedure was maintained across all datasets to ensure a rigorous and fair comparison.
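As a minimal sketch of this conversion step (assuming rasterio for GeoTIFF input; the file names and helper function are hypothetical), the per-band Min–Max scaling and histogram equalization could be implemented as follows:

```python
import cv2
import numpy as np
import rasterio  # assumed here for GeoTIFF I/O; GDAL would serve equally well

def geotiff_to_png(tif_path: str, png_path: str) -> None:
    """Convert a 3-band GeoTIFF to an 8-bit PNG using Min-Max
    normalization followed by per-channel histogram equalization."""
    with rasterio.open(tif_path) as src:
        bands = src.read().astype(np.float32)  # shape: (3, rows, cols)

    channels = []
    for band in bands:
        # Min-Max normalization onto a consistent 8-bit scale
        lo, hi = band.min(), band.max()
        scaled = (band - lo) / (hi - lo + 1e-8) * 255.0
        # Histogram equalization to stabilize global contrast
        channels.append(cv2.equalizeHist(scaled.astype(np.uint8)))

    # Bands are stacked R, G, B; OpenCV writes images in B, G, R order
    cv2.imwrite(png_path, cv2.merge(channels[::-1]))

geotiff_to_png("c1_oda.tif", "c1_oda.png")  # hypothetical file names
```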
Finally, we applied SR to test the hypothesis that enhanced structural details would improve registration accuracy. To this end, we used ResShift, a diffusion-based deep learning model, on the ODA dataset. The resulting images, restored to a four-fold higher resolution, were organized into a new dataset named ODA-SR. The final OD, ODA, and ODA-SR datasets were configured with the same spatial extent, serving as the basis for comparing registration performance across experimental conditions.

2.1.3. Experiment Dataset

The detailed specifications of the C1, K3A, and K3 satellite images used in the final experiments, along with the naming conventions and key characteristics for each registration pair, are summarized in Table 2. To evaluate the proposed framework, we conducted three primary experiments:
  • Algorithm Comparison: A comparative analysis of registration accuracy between classical and deep learning-based algorithms using atmospherically corrected imagery (ODA).
  • Impact of Super-Resolution: An assessment of SR’s effectiveness by comparing registration performance on datasets before (ODA) and after (ODA-SR) applying the SR model.
  • Impact of Seasonal Variations: An analysis of how seasonal changes affect registration performance by comparing image pairs acquired in different seasons (Summer vs. Winter).
The dataset configurations for each experiment are: (1) ODA, (2) ODA vs. ODA-SR, and (3) ODA vs. ODA-SR (seasonal variants). For Experiment 3, identifiers were appended to the dataset names to distinguish seasonal conditions: ‘W’ for winter-acquired C1 images and ‘S’ for summer-acquired C1 images. In Experiments 1 and 2, where seasonal comparison was not the focus, only winter imagery was used, and the identifiers were omitted for brevity.
The visual transitions and improvements in visibility across the datasets—OD, ODA, and ODA-SR—for each satellite are illustrated in Figure 3, Figure 4, Figure 5 and Figure 6.
Table 2. Summary of experiment dataset combinations, including sensor pairs, image dimensions, atmospheric correction methods, and seasonal acquisition details.
Dataset Name | Satellite | Image Size | Atmospheric Correction Method | Season
OD | C1 / K3A | 308 × 308 / 280 × 280 | NA / NA | Winter / Winter
OD | C1 / K3A | 308 × 308 / 280 × 280 | NA / NA | Summer / Winter
OD | C1 / K3 | 308 × 308 / 220 × 220 | NA / NA | Winter / Winter
OD | K3A / K3 | 280 × 280 / 220 × 220 | NA / NA | Winter / Winter
ODA | C1 / K3A | 308 × 308 / 280 × 280 | QUAC / FLAASH | Winter / Winter
ODA | C1 / K3A | 308 × 308 / 280 × 280 | QUAC / FLAASH | Summer / Winter
ODA | C1 / K3 | 308 × 308 / 220 × 220 | QUAC / FLAASH | Winter / Winter
ODA | K3A / K3 | 280 × 280 / 220 × 220 | FLAASH / FLAASH | Winter / Winter
ODA-SR | C1 / K3A | 1232 × 1232 / 1120 × 1120 | QUAC / FLAASH | Winter / Winter
ODA-SR | C1 / K3A | 1232 × 1232 / 1120 × 1120 | QUAC / FLAASH | Summer / Winter
ODA-SR | C1 / K3 | 1232 × 1232 / 880 × 880 | QUAC / FLAASH | Winter / Winter
ODA-SR | K3A / K3 | 1120 × 1120 / 880 × 880 | FLAASH / FLAASH | Winter / Winter

2.2. Method

The core methodology of this study is structured into a systematic pipeline. First, reference and source image pairs were established for the three defined scenarios involving C1, K3, and K3A imagery. Keypoint detection and matching were then performed on these pairs using a suite of models: HC algorithms (SIFT and ORB) and deep learning-based frameworks, specifically SuperPoint + SuperGlue (SPSG), SuperPoint + LightGlue (SPLG), and RoMa. All experiments were performed in a unified framework based on Python 3.8.2, OpenCV 4.11.0, and PyTorch 2.1.2 to minimize environmental discrepancies.
From the resulting correspondences, a homography matrix was computed to define the geometric transformation required to align the source image with the reference image. The source image was then transformed using this matrix, and the final output was compared against the reference image for rigorous quantitative and qualitative analysis.
This process was systematically applied across three distinct experimental setups to address the study’s objectives:
  • Algorithm Comparison: Evaluation of registration accuracy between HC and deep learning models using the ODA dataset.
  • Impact of Super-Resolution (SR): Comparative analysis of registration results before and after the application of SR, using both ODA and ODA-SR datasets.
  • Impact of Seasonal Change: Assessment of registration accuracy under significant spectral and temporal shifts by utilizing ODA and ODA-SR datasets that include C1-W (Winter) and C1-S (Summer) images.

2.2.1. Super Resolution

A critical factor in image registration is the spatial resolution of the imagery. Registration accuracy fluctuates based on the density of pixel information; higher-resolution images, characterized by sharper textures and well-defined edges, typically allow for the detection of more robust feature points [56]. However, due to hardware constraints and orbital characteristics, satellite imagery often presents greater complexity and lower precision compared to standard optical imagery [57]. Prior research, such as the work by Xiangchun et al. [58] on non-uniform solar image matching, has demonstrated that deep learning-based SR can enhance image quality and positively influence data alignment. These findings suggest that SR technology can effectively transcend sensor limitations to improve registration outcomes [59].
Early SR approaches relied on traditional interpolation methods, such as bilinear, bicubic, and Lanczos. While these methods have been used for decades, they are consistently outperformed by data-driven deep learning models that excel at reconstructing the fine details necessary for high-precision tasks [60]. The evolution of SR has progressed from CNN-based models (e.g., SR-CNN) to GANs, which enabled more realistic visual results. More recently, diffusion-based approaches have emerged, demonstrating superior performance in generating high-quality textures and restoring natural details compared to GAN-based models [61,62]. Within the remote sensing domain, super-resolution (SR) research has expanded beyond RGB imagery to include hyperspectral image super-resolution, encompassing unsupervised approaches based on enhanced Deep Image Prior formulations [63]. While diffusion-based SR models have shown great potential in a wide range of image enhancement tasks, their application to satellite image registration remains relatively underexplored. Therefore, this study employs ResShift, a SOTA diffusion-based SR model, to generate high-resolution images for a comprehensive comparative analysis.
ResShift [51], a diffusion-based SR model published in 2024, addresses the slow inference speeds typical of conventional diffusion models through a residual shifting technique. This model defines the discrepancy between low- and high-resolution images as a “residual,” which is iteratively corrected to restore detailed structures and textures. By hierarchically applying a multi-scale diffusion process, ResShift achieves both computational efficiency and high performance across various scale factors. Notably, it has demonstrated superior quantitative performance even with limited computational resources. In this study, the official ResShift implementation was utilized with the following configuration: task = realsr, scale = 4, and version = v3. A scale factor of ×4 was selected as it represents the maximum upscaling capability provided by the official pretrained weights, thereby maximizing the recovery of fine structural details essential for robust feature matching. To establish a rigorous classical baseline for visual assessment, the results of ×4 Lanczos upsampling—widely recognized for its superior sharpness among interpolation methods—are presented alongside the ResShift reconstruction in Figure 7.
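For reference, the ×4 Lanczos baseline shown in Figure 7 can be reproduced with standard OpenCV resampling; the commented ResShift invocation mirrors the configuration reported above, although the exact script and flag names depend on the repository version and should be treated as assumptions:

```python
import cv2

# ResShift itself is run from its official repository, roughly along the
# lines of (script and flag names are assumptions; see the repository README):
#   python inference_resshift.py -i input_dir -o output_dir \
#          --task realsr --scale 4 --version v3

# Classical x4 Lanczos baseline for visual comparison with ResShift output
img = cv2.imread("k3_oda.png")  # hypothetical file name
lanczos_x4 = cv2.resize(img, None, fx=4, fy=4,
                        interpolation=cv2.INTER_LANCZOS4)
cv2.imwrite("k3_oda_lanczos_x4.png", lanczos_x4)
```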

2.2.2. Handcrafted Feature Matching Algorithms

HC models, also referred to as classical techniques, were employed to provide a comparative baseline against deep learning-based feature-matching frameworks. As the term implies, these features are engineered based on human domain knowledge, utilizing mathematically defined algorithms to identify distinct image characteristics such as corners, edges, and blobs. Because they rely on manually designed rules, the feature extraction process is transparent and deterministic, consistently producing the same output for a given input.
Among representative HC algorithms—such as SIFT, SURF, and ORB—SIFT is particularly notable for its robust performance in detecting keypoints within complex imagery due to its scale and rotation invariance [64]. While SURF offers higher processing speeds, it is generally less accurate than SIFT [65]. ORB is an even more lightweight algorithm, providing speeds suitable for real-time applications [66]. Despite the rise of deep learning, HC algorithms remain widely used in satellite imagery research. For instance, Khalili et al. [67] reported that SIFT achieved approximately 1.7 times higher accuracy than SURF for both optical and Synthetic Aperture Radar (SAR) registration. Similarly, Fatiha et al. [64] found that SIFT yielded the highest accuracy while ORB maintained the fastest processing speed. Given that prior research has primarily focused on single-satellite contexts, we determined that a direct comparative evaluation across multi-sensor datasets was necessary. Consequently, we selected SIFT for its benchmark accuracy and ORB for its superior computational efficiency.
  • SIFT
Proposed in 2004, SIFT [68] is a landmark algorithm designed to extract features robust to image scale and rotation. It identifies key local regions across resolutions by performing scale-space extrema detection using the Difference of Gaussians (DoG) function. The process involves four stages: (1) scale-space extrema detection, (2) keypoint localization and contrast-based filtering, (3) orientation assignment based on local gradients, and (4) the calculation of a 128-dimensional descriptor. For our experiments, SIFT was implemented in Python with parameters aligned with Lowe’s original recommendations: nOctaveLayers was set to 3, and contrastThreshold was set to 0.09 to eliminate unstable features in low-contrast regions.
  • ORB
ORB [66], introduced in 2011, combines the FAST [69] keypoint detector with the BRIEF [70] descriptor. It is renowned for its computational efficiency, outperforming SIFT in speed while remaining open source. To overcome the limitations of the original FAST—which lacks orientation and scale data—ORB utilizes Oriented-FAST, incorporating an image pyramid and the intensity centroid method. Furthermore, it enhances the BRIEF descriptor through Rotated-BRIEF, creating a rotation-robust binary descriptor. These descriptors are matched using Hamming distance, providing a significant speed advantage over SIFT’s floating-point-based matching. In this study, ORB was implemented in Python using the default parameters specified in the OpenCV documentation.
  • Brute-Force Matcher
Based on the feature descriptors extracted via SIFT and ORB, matching was performed using a Brute-Force (BF) Matcher. The BF Matcher is a straightforward and intuitive method typically employed for matching HC descriptors. The algorithm exhaustively computes the distances between all descriptors in Image A and those in Image B to identify the closest correspondences.
Since SIFT generates real-valued vector descriptors, the Euclidean distance (d) between two descriptors, p and q, is calculated as:
$$d(p, q) = \sqrt{\sum_{i=1}^{n} (p_i - q_i)^2}$$
For ORB, which utilizes binary descriptors, the Hamming distance (d) is used instead:
$$d(p, q) = \sum_{i=1}^{n} (p_i \oplus q_i)$$

where $\oplus$ denotes the bitwise XOR operation.
In our experiments, K-Nearest Neighbor (KNN) matching was implemented to return the K most likely candidates for each feature point. To refine these matches, we applied the Ratio Test and the Scale-Orientation Restriction method proposed by Lowe [68]. For the Ratio Test, K was set to 2. A match was accepted only if the distance ratio between the closest and second-closest neighbors was below the predefined threshold of 0.8, as suggested by Lowe [68].
Subsequently, a scale-orientation restriction filter was applied to eliminate inaccurate correspondences arising from significant disparities in keypoint scales or orientations. The scale similarity was evaluated by calculating the ratio between the scales of the two keypoints; a value closer to 1 indicated higher similarity. Similarly, the angular difference was compared to ensure directional consistency. Following these verification stages, the refined keypoint pairs were utilized for the final image transformation.
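A minimal sketch of this HC matching stage is given below; the SIFT parameters follow the settings reported above, while the scale-ratio and orientation thresholds are illustrative assumptions rather than the exact values used in the experiments:

```python
import cv2

img_ref = cv2.imread("reference.png", cv2.IMREAD_GRAYSCALE)  # hypothetical names
img_src = cv2.imread("source.png", cv2.IMREAD_GRAYSCALE)

# SIFT with the parameters reported above; L2 distance for real-valued
# descriptors (use cv2.ORB_create() and cv2.NORM_HAMMING for ORB)
sift = cv2.SIFT_create(nOctaveLayers=3, contrastThreshold=0.09)
kp1, des1 = sift.detectAndCompute(img_ref, None)
kp2, des2 = sift.detectAndCompute(img_src, None)
matcher = cv2.BFMatcher(cv2.NORM_L2)

good = []
for m, n in matcher.knnMatch(des1, des2, k=2):
    if m.distance < 0.8 * n.distance:  # Lowe's ratio test, threshold 0.8
        k1, k2 = kp1[m.queryIdx], kp2[m.trainIdx]
        # Scale-orientation restriction; the 0.5-2.0 scale window and
        # 30-degree angular tolerance are illustrative assumptions
        scale_ratio = k1.size / k2.size
        angle_diff = abs(k1.angle - k2.angle) % 360
        angle_diff = min(angle_diff, 360 - angle_diff)
        if 0.5 < scale_ratio < 2.0 and angle_diff < 30:
            good.append(m)
```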

2.2.3. Deep Learning-Based Feature Matching Algorithms

Recently, deep learning-based methods have demonstrated superior performance over HC methods. This shift is primarily driven by the ability of deep learning models to adapt flexibly to diverse imagery through self-learning mechanisms. Unlike classical methods, these algorithms automatically extract features from large-scale datasets, enabling effective keypoint detection even in complex scenes characterized by significant noise or geometric distortions. In this study, we utilized SuperPoint for feature detection and descriptor extraction, while SuperGlue and LightGlue were employed for keypoint matching. Additionally, the RoMa model, which integrates detection and matching, was included to evaluate technological advancements and compare accuracy against traditional baselines.
Previous studies have explored the efficacy of these models in remote sensing. For instance, Luo et al. [71] compared SIFT, SPSG, and SPLG across various environmental conditions, reporting that both deep learning frameworks outperformed SIFT, with SPLG demonstrating more stable keypoint distributions in complex imagery. Similarly, Song et al. [72] found that SuperGlue and LightGlue achieved a 13% higher success rate than SIFT in satellite image matching. Furthermore, Berton et al. [44] evaluated SIFT, ORB, SuperGlue, LightGlue, and RoMa using space station imagery, concluding that RoMa achieved significantly higher matching accuracy.
Despite these findings, these studies primarily focused on resolution differences without performing mutual registration across heterogeneous satellite sensors, conducting detailed geographical coordinate error analyses, or assessing the synergy between deep learning-based SR and matching algorithms. Given that performance can vary significantly across specific datasets, our approach—incorporating SR-enhanced data alongside a comparative evaluation of SuperPoint, SuperGlue, LightGlue, and RoMa—provides a rigorous assessment of their effectiveness across diverse and challenging scenarios.
  • SuperPoint
SuperPoint [36] is a CNN-based feature detector and descriptor model proposed in 2018. It replaces traditional HC methods with a self-supervised learning approach. The model is initially pre-trained on simple geometric patterns using the MagicPoint architecture. It then utilizes a technique known as homography adaptation to facilitate robust feature learning on real-world imagery. This method applies various geometric transformations to unlabeled images to automatically generate correspondences, enabling the model to be trained on complex real-world data without the need for manual labeling.
SuperPoint comprises two primary CNN modules: a detector head and a descriptor head, which identify keypoint locations and generate their respective descriptors. These components are jointly trained in an end-to-end manner, allowing the model to learn feature representations that remain robust against variations in illumination, scale, and rotation. In this study, we employed SuperPoint as a local feature extractor to generate keypoint coordinates and descriptors. These outputs were subsequently integrated with global matching modules, specifically SuperGlue and LightGlue, for our multi-sensor satellite image registration experiments.
  • SuperGlue
SuperGlue [37] is a GNN-based feature matching algorithm proposed in 2020. It introduces a more precise, learning-based matching approach compared to conventional Brute-Force (BF) methods. The model takes keypoints and descriptors extracted by a front-end (e.g., SuperPoint) as input and performs matching in a middle-end stage by simultaneously considering both self-attention (global context within an image) and cross-attention (local relationships between images). Furthermore, it formulates the assignment as an optimal transport problem and ensures high reliability through a cross-check strategy that validates bidirectional matches.
SuperGlue has been extensively utilized in satellite and remote sensing research. Prior studies have demonstrated its superior performance in diverse applications, including unmanned aerial vehicle (UAV)-to-satellite orthoimage registration and the precise matching of low-texture landscapes [73,74]. In our experiments, we paired SuperPoint with SuperGlue to perform keypoint registration between multi-sensor satellite images. Considering the outdoor nature of the dataset, pre-trained outdoor weights were utilized, and the match_threshold was set to 0.5. This conservative confidence cutoff was implemented to suppress low-confidence correspondences, which are particularly prone to becoming outliers during cross-sensor registration. Additionally, to accommodate hardware memory constraints, the max_keypoints was capped at 6400, and internal image resizing was disabled to maintain original resolution.
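A configuration sketch of this SPSG pairing is shown below, assuming the interface of the public SuperGluePretrainedNetwork repository (the module path and configuration keys follow that codebase and should be treated as assumptions):

```python
import torch
from models.matching import Matching  # assumed SuperGluePretrainedNetwork path

device = "cuda" if torch.cuda.is_available() else "cpu"
config = {
    "superpoint": {
        "max_keypoints": 6400,    # capped to fit GPU memory, as noted above
    },
    "superglue": {
        "weights": "outdoor",     # pre-trained outdoor weights
        "match_threshold": 0.5,   # conservative confidence cutoff
    },
}
matching = Matching(config).eval().to(device)
# Inputs are normalized grayscale tensors of shape (1, 1, H, W); no
# internal resizing is applied, so the original resolution is preserved.
```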
  • LightGlue
LightGlue [38] is a lightweight successor to SuperGlue, proposed in 2023, designed to enhance computational efficiency and memory utilization while maintaining high matching accuracy. A primary limitation of the original SuperGlue was its high computational overhead due to a fixed architectural depth, which often led to inefficiencies when matching high-resolution imagery. In contrast, LightGlue introduces a dynamically adjustable depth, allowing the model to adapt its computational effort based on the matching complexity of each image pair.
After each computational block, LightGlue predicts the current set of correspondences and determines whether additional processing is required. This early stopping mechanism enables the model to resolve “easy” matches quickly while allocating deeper processing resources only to more complex pairs. Consequently, this approach significantly accelerates processing speeds and reduces redundant computations, all while achieving an accuracy level comparable to that of SuperGlue.
In our experiments, we performed satellite image registration using the SuperPoint and LightGlue pipeline. Internal image resizing was disabled to preserve the original spatial information. To ensure a rigorous and fair comparison with SuperGlue, all matching parameters were configured to be identical, thereby maintaining consistent experimental conditions across the pipeline.
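For SPLG, a corresponding sketch under the public LightGlue package interface (package and function names follow the authors' repository and are assumptions) might look as follows:

```python
import torch
from lightglue import LightGlue, SuperPoint  # assumed package interface
from lightglue.utils import load_image, rbd

device = "cuda" if torch.cuda.is_available() else "cpu"
extractor = SuperPoint(max_num_keypoints=6400).eval().to(device)
matcher = LightGlue(features="superpoint").eval().to(device)

image0 = load_image("reference.png").to(device)  # hypothetical file names
image1 = load_image("source.png").to(device)
feats0 = extractor.extract(image0, resize=None)  # resize=None keeps full size
feats1 = extractor.extract(image1, resize=None)
out = matcher({"image0": feats0, "image1": feats1})
feats0, feats1, out = (rbd(x) for x in (feats0, feats1, out))

matches = out["matches"]                       # index pairs into keypoint sets
pts_ref = feats0["keypoints"][matches[:, 0]]   # matched reference keypoints
pts_src = feats1["keypoints"][matches[:, 1]]   # matched source keypoints
```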
  • RoMa
RoMa [75] is a high-performance deep learning-based matching algorithm proposed in 2024, designed to achieve superior precision and robustness through a dense matching and coarse-to-fine strategy. RoMa leverages a large-scale self-supervised model, Distillation of Knowledge with No Labels Version 2 (DINOv2), to extract global, coarse-level features. It then performs stable, position-agnostic coarse matching by predicting anchor probabilities via a Transformer-based decoder. Subsequently, a fine-feature encoder extracts local details, which are integrated with the coarse features to derive precise, pixel-level correspondences.
Notably, RoMa models multi-modal distributions in global matching using a regression-by-classification loss function and enhances local precision through a robust regression loss. This dual-stage architecture enables RoMa to maintain high performance even in the presence of significant geometric distortions or viewpoint changes. In our experiments, we utilized RoMa’s publicly available base architecture. To maintain a fair and consistent comparative framework with SuperGlue and LightGlue, identical values were assigned to the parameters shared across these models.
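A usage sketch of RoMa's dense pipeline is given below, assuming the authors' public romatch package (function names follow its README and are assumptions); the image sizes correspond to the C1–K3A ODA pair in Table 2:

```python
import torch
from romatch import roma_outdoor  # assumed public package interface

device = "cuda" if torch.cuda.is_available() else "cpu"
roma_model = roma_outdoor(device=device)

# Dense warp and per-pixel certainty between the image pair
warp, certainty = roma_model.match("reference.png", "source.png", device=device)

# Sample sparse correspondences from the dense field, then convert the
# normalized coordinates to pixel coordinates in each image
matches, certainty = roma_model.sample(warp, certainty)
H_ref, W_ref = 308, 308   # C1 reference image size (Table 2)
H_src, W_src = 280, 280   # K3A source image size (Table 2)
kpts_ref, kpts_src = roma_model.to_pixel_coordinates(
    matches, H_ref, W_ref, H_src, W_src)
```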

2.2.4. Homography-Based Image Transformation

Homography-based image transformation aligns or warps a source image to match the perspective of a reference image through a projective transformation. To achieve this, it is essential to compute the homography matrix (H), which defines the geometric mapping between corresponding points in two image planes. The pixel coordinates of matched keypoint pairs—extracted via SIFT, ORB, SPSG, SPLG, and RoMa—are first filtered for outliers using the Random Sample Consensus (RANSAC) algorithm.
The RANSAC algorithm distinguishes between inliers and outliers through an iterative process to ensure the robustness of the homography matrix. Initially, a random subset of minimal matching pairs is selected to compute a tentative homography matrix. The positions of the remaining keypoints are then predicted using this matrix, and the reprojection error—the distance between the predicted and actual matched positions—is calculated. Keypoints exceeding a specific error threshold are classified as outliers, while those within the threshold are considered inliers. This process is repeated for a set number of iterations to extract an optimal set of inliers, ultimately yielding a robust homography matrix H.
For a reference image A and a source image B to be warped, the point coordinates (x, y) in image B are transformed into the coordinates (x′, y′) in image A. This relationship is defined in homogeneous coordinates as follows:
$$\begin{bmatrix} x' \\ y' \\ 1 \end{bmatrix} = H \begin{bmatrix} x \\ y \\ 1 \end{bmatrix}, \qquad \text{where} \qquad H = \begin{bmatrix} h_{11} & h_{12} & h_{13} \\ h_{21} & h_{22} & h_{23} \\ h_{31} & h_{32} & h_{33} \end{bmatrix}$$
The final homography matrix (H) is applied to transform the pixel coordinates of source image B to align with those of reference image A through a matrix multiplication operation. The resulting transformed image is then compared with the reference image to evaluate the precision of the feature matching and to determine the overall registration accuracy, with the ultimate goal of minimizing geometric discrepancies. This comparative analysis serves as the basis for assessing the performance of each algorithm under varying resolution and seasonal conditions.
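A minimal sketch of this estimation and warping step, using the 3-pixel RANSAC reprojection threshold described in Section 2.2.5:

```python
import cv2
import numpy as np

def register(pts_src: np.ndarray, pts_ref: np.ndarray,
             src_img: np.ndarray, ref_shape: tuple):
    """Estimate H with RANSAC and warp the source image into the
    reference frame; pts_* are (N, 2) matched keypoint coordinates."""
    H, inlier_mask = cv2.findHomography(
        pts_src, pts_ref, cv2.RANSAC, ransacReprojThreshold=3.0)
    h, w = ref_shape[:2]
    warped = cv2.warpPerspective(src_img, H, (w, h))
    return H, warped, inlier_mask.ravel().astype(bool)
```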

2.2.5. Model Performance Evaluation

For the final performance evaluation, the homography matrix derived from the experiments was applied to the original GeoTIFF imagery (ODA) to calculate registration errors. This procedure ensured that the spatial metadata of the original imagery, specifically its coordinate and location data, was preserved throughout the assessment. In the evaluation stage, keypoint coordinates from the higher-resolution reference image served as the ground truth. Performance was measured based on the distance between these ground-truth points and their corresponding points in the transformed source image. This setup utilizes the precise details of high-resolution imagery as a benchmark to enhance the reliability of the measurements.
The initial matching pairs were filtered using the RANSAC algorithm, where a keypoint was classified as an inlier if the Euclidean distance between the projected and actual keypoints was less than 3 pixels. For a more rigorous assessment, we introduced a stricter metric termed the Number of Correct Matches (NCM). An NCM is defined as a final matching pair that satisfies the highly conservative condition of falling within the same pixel (error < 1 pixel). Achieving an NCM indicates that the transformation model was calculated with sufficient precision to overcome even resampling errors. Consequently, this highly stringent method selects only the most reliable, highest-quality matches, serving as a metric to measure an algorithm’s maximum precision.
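A sketch of the NCM criterion under these definitions is shown below; the ground sampling distance used for the meter conversion is a simplifying assumption for illustration:

```python
import cv2
import numpy as np

def evaluate_ncm(H: np.ndarray, pts_src: np.ndarray, pts_ref: np.ndarray,
                 gsd_m: float = 2.2):
    """Count matches with post-transformation error below 1 pixel and
    report the mean pixel and geographic distance errors over them."""
    projected = cv2.perspectiveTransform(
        pts_src.reshape(-1, 1, 2).astype(np.float32), H).reshape(-1, 2)
    errors = np.linalg.norm(projected - pts_ref, axis=1)
    ncm_mask = errors < 1.0              # strict sub-pixel NCM condition
    pixel_err = float(errors[ncm_mask].mean())
    geo_err = pixel_err * gsd_m          # meters, assuming a uniform GSD
    return int(ncm_mask.sum()), pixel_err, geo_err
```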
Based on these NCM results, we measured two types of distance errors to evaluate registration precision:
  • Pixel Distance Error: For each NCM pair, we calculated the Euclidean distance between the keypoint coordinates in the reference image (x′, y′) and the corresponding transformed coordinates in the source image (x, y).
  • Geographic Distance Error: Pixel coordinates were converted into geographic coordinates to calculate the spatial error in meters. All satellite images used in this study share the same coordinate reference system: Korea 2000/Unified CS (a plane rectangular coordinate system based on the GRS80 ellipsoid and the UTM-K projection).
The quantitative evaluation of image registration was conducted using four primary indicators:
  • NCM: The total count of correctly matched pairs satisfying the 1-pixel tolerance.
  • Pixel Distance: The average pixel error calculated before and after registration to assess the magnitude of geometric correction.
  • Geographic Distance: The geographic coordinate error, measured in meters, to validate geospatial accuracy.
  • Processing Time: The total duration required to register a single reference–source image pair, encompassing both feature detection and matching stages, measured in seconds.
For the qualitative evaluation, a mosaic visualization technique was employed. This method involves arranging alternating patches from the reference and source images in a repeating 3 × 3 grid to visually inspect the continuity of registered features. To facilitate visual analysis, specific areas showing significant structural alignment are highlighted with green circles (Figure 8).
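A sketch of this mosaic construction, assuming the reference and warped source images are co-registered and share the same dimensions:

```python
import numpy as np

def checkerboard_mosaic(ref: np.ndarray, warped: np.ndarray,
                        grid: int = 3) -> np.ndarray:
    """Alternate patches from the reference and registered source images
    in a repeating grid to visualize structural continuity."""
    h, w = ref.shape[:2]
    ph, pw = h // grid, w // grid
    mosaic = ref.copy()
    for i in range(grid):
        for j in range(grid):
            if (i + j) % 2 == 1:  # alternating cells take the warped source
                mosaic[i * ph:(i + 1) * ph, j * pw:(j + 1) * pw] = \
                    warped[i * ph:(i + 1) * ph, j * pw:(j + 1) * pw]
    return mosaic
```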

3. Results and Discussion

3.1. Performance Comparison Between Models

The results of the image registration experiments using the ODA dataset are summarized in Table 3 and illustrated in Figure 9 and Figure 10. Regarding the NCM, deep learning-based models significantly outperformed classical HC methods.
For the high-resolution satellite pair (C1–K3A), SIFT and ORB yielded 12 and 4 NCM, respectively. In contrast, the deep learning models—SPSG, SPLG, and RoMa—produced substantially higher counts of 22, 208, and 6525 NCM, respectively. For the cross-resolution pair between the highest-resolution (C1) and lowest-resolution (K3) satellites, HC models yielded only 4 and 3 NCM, while deep learning models achieved 5, 177, and 6359 NCM. Similarly, for the K3A–K3 pair, HC models exhibited a sharp decline to 6 and 1 NCM, whereas the deep learning models maintained robust performance with 6, 185, and 4295 NCM, respectively.
While SPSG performed similarly to SIFT in cross-resolution cases, its NCM count nearly doubled that of SIFT in the high-resolution C1–K3A matching, suggesting that deep learning frameworks are more effective for large-scale feature matching. Notably, SPLG consistently produced high NCM counts—an approximately 17-fold increase over SIFT in the C1–K3A case—while RoMa delivered the highest counts by a wide margin. These results demonstrate the superior ability of deep learning models to establish correct correspondences in complex urban environments.
Regarding registration accuracy (distance error), the high-resolution C1–K3A case began with an initial distance error of 4.6 pixels (10.2 m). HC models reduced this error to 0.2 pixels (SIFT) and 0.36 pixels (ORB). The deep learning models recorded errors of 0.43 (SPSG), 0.46 (SPLG), and 0.25 pixels (RoMa), achieving a reduction of at least 4.1 pixels. In the C1–K3 pair, which had an initial displacement of 14.9 pixels (41.7 m), SIFT and RoMa yielded errors of 0.25 and 0.26 pixels, respectively, while ORB achieved 0.31 pixels.
Although reducing distance error is critical, stable image registration (warping) also requires an even spatial distribution of matching points to ensure a globally consistent transformation. From this perspective, RoMa is a more suitable model than SIFT or ORB. While HC models occasionally showed slightly smaller distance errors, RoMa provided a significantly higher density of correct matches across the entire scene while maintaining a meaningful reduction in distance error. Furthermore, SPSG and SPLG demonstrated reliable performance, maintaining pixel distance errors below 0.5 pixels in all cases.
Analysis of computational efficiency showed that all models completed matching in under 3 s. Among HC models, ORB was approximately 46% faster than SIFT across all scenarios. Among deep learning models, SPSG was the fastest, consistently matching within the 0.3-s range, followed by RoMa and SPLG.
In summary, while HC models demonstrated competitive speed and distance accuracy in specific scenarios, deep learning models provided a far superior balance of NCM density, distance reduction, and robustness. These findings indicate that deep learning-based frameworks are more effective for multi-sensor satellite image registration than traditional HC methods.

3.2. Impact of Super-Resolution on Registration Performance

An experiment was conducted to verify the hypothesis that SR improves image registration accuracy compared to the original imagery. Based on the findings from the previous section that deep learning models outperform HC methods, this comparative evaluation was restricted to DL frameworks, specifically SPSG, SPLG, and RoMa. The performance was compared using the ODA and ODA-SR datasets, as summarized in Table 4 and illustrated in Figure 11 and Figure 12.
Regarding the NCM, both SPSG and SPLG exhibited a substantial increase following the application of SR. For the C1–K3A pair, SPSG’s NCM count rose from 22 to 670, and SPLG increased from 208 to 629. In the C1–K3 case, where resolution disparities were more pronounced, SPSG increased from 5 to 283, and SPLG rose from 177 to 462. Similarly, for the K3A–K3 pair, SPSG and SPLG showed increases to 445 and 489 NCM, respectively. These results demonstrate that SR techniques meaningfully enhance feature density by restoring structural details.
Conversely, RoMa showed a slight decrease in NCM across all cases; for example, dropping from 6525 to 4196 in the C1–K3A pair. This reduction is attributed to the internal image resizing mechanisms within the RoMa architecture, which may not fully leverage the expanded pixel space of the SR-enhanced images. RoMa conducts inference through a coarse-to-fine dense matching pipeline at fixed internal resolutions, necessitating the resampling of inputs and the scaling of coordinates across stages. Under our stringent NCM definition (error < 1 pixel), such internal resampling and coordinate scaling may slightly diminish the total count of correspondences classified as NCM. Nevertheless, this process does not preclude improvements in post-registration distance error metrics, as the underlying geometric alignment remains highly accurate.
The experimental results indicate a unique phenomenon in the RoMa model: while the NCM slightly decreased following SR, the overall registration precision, in terms of pixel and geographic distance, significantly improved. This outcome can be justified through the following logical perspectives:
  • Shift from Quantity to Quality of Features: The application of ResShift-based SR enhances the distinctness of structural details, such as edges and corners, by quadrupling the image resolution. While this process may reduce the total number of candidate matches by filtering out ambiguous or low-quality features, the remaining keypoints are localized with much higher sub-pixel precision. Consequently, the homography matrix derived from these high-quality correspondences is more geometrically accurate, leading to a reduction in final registration errors.
  • Internal Resolution and Coordinate Scaling: RoMa operates on a coarse-to-fine dense matching pipeline at fixed internal working resolutions. When processing SR-enhanced images with larger pixel spaces, the model requires internal resampling and coordinate scaling. Under the strict NCM definition used in this study (error < 1 pixel), even minute sub-pixel shifts during this scaling process can cause some correspondences to technically fall outside the NCM threshold.
  • Robustness of Dense Matching Density: Despite the numerical decrease, RoMa consistently maintained over 2300 NCMs across all datasets, which is at least 1800 matches more than the next-best model, SPLG. This overwhelming density of correct matches ensures that the transformation model remains statistically robust. The enhanced structural representation provided by SR allows the dense matching paradigm to achieve a more globally consistent alignment, effectively prioritizing spatial reliability over the raw count of matches.
In terms of distance error, SR demonstrated a positive impact on both SPLG and RoMa. In the C1–K3A pair, SPLG’s error decreased from 0.46 to 0.40 pixels and from 1.01 m to 0.88 m (~12.9%), while RoMa improved from 0.25 to 0.20 pixels and from 0.55 m to 0.44 m (20%). Consistent error reductions were observed in other combinations, such as the K3A–K3 pair, where RoMa achieved an 18.4% improvement in geographic distance. These findings suggest that the integration of diffusion-based SR significantly enhances the accuracy and reliability of multi-sensor registration when coupled with advanced deep learning models. To assess registration stability, we report the variability of post-registration residuals as the mean ± standard deviation calculated over NCM correspondences. Notably, RoMa identifies thousands of NCMs while maintaining low dispersion across all tested pairs, indicating a more uniform spatial alignment.
To clarify the total efficiency of the SR-assisted pipeline, an end-to-end runtime estimate was derived by combining ResShift inference time with the matching duration. Utilizing an NVIDIA GeForce RTX 4090 for the cropped area of interest (AOI), the end-to-end runtime for the ResShift + RoMa pipeline under the ODA-SR configuration ranged from 4.3 to 6.2 s per pair (6.2 s for C1–K3A, 5.1 s for C1–K3, and 4.3 s for K3A–K3). Although SR introduces additional computational overhead compared to matching alone, the total cost remains within several seconds for our experimental framework while yielding consistent accuracy gains. Furthermore, when SR outputs are precomputed offline, the online runtime is further reduced to the core registration-stage processing time.

3.3. Performance Comparison Across Seasons

It is well-established that image registration typically achieves higher accuracy when using image pairs acquired on the same date or within the same season. However, many remote sensing applications require the registration of multi-temporal satellite images from different seasons, making seasonal robustness a critical factor. Therefore, this experiment was conducted to evaluate whether deep learning-based registration models and super-resolution (SR) techniques can maintain satisfactory performance when registering cross-seasonal satellite image pairs. For this analysis, the C1 dataset was categorized by acquisition month: August images were labeled as C1-S (Summer) and December images as C1-W (Winter). Both sets were subsequently registered with the corresponding K3A (Winter) images.
Initial registration using the SPSG model on ODA yielded the lowest NCMs (12 for summer and 22 for winter). Similarly, SPLG showed relatively low NCMs of 159 and 208 for summer and winter, respectively. In contrast, RoMa achieved significantly higher NCMs for both seasons (6026 for summer and 6525 for winter). The performance gap between seasons for RoMa was approximately 8%, suggesting that the model is less sensitive to seasonal variations. This demonstrates that deep learning models, particularly RoMa, can produce robust results even when matching ODA images from different seasons. Furthermore, RoMa’s distance error remained extremely stable, with a negligible difference of only 0.01 pixels between the two seasons.
When applying the SR technique (ODA-SR), NCMs improved substantially. For summer matches, SPSG, SPLG, and RoMa achieved NCMs of 441, 511, and 4099, respectively (Table 5). For winter matches, the NCMs were 670 (SPSG), 692 (SPLG), and 4196 (RoMa). These figures are significantly higher than those obtained from ODA, highlighting the effectiveness of SR in enhancing registration across different seasons. In a seasonal comparison of ODA-SR performance, the NCMs for summer reached 65%, 73%, and 97% of the winter values for SPSG, SPLG, and RoMa, respectively. While summer images are generally more susceptible to cloud-related noise—typically leading to fewer matches than winter images—RoMa maintained nearly identical performance across both seasons. This highlights the model’s robustness. In terms of registration accuracy, RoMa delivered highly stable results, with distance errors ranging from 0.20 to 0.23 pixels. Overall, the integration of RoMa with SR techniques provided consistently satisfactory and stable performance regardless of seasonal changes. While the winter-to-winter pairs (C1-W and K3A) exhibited superior performance compared to the summer–winter combination (C1-S and K3A), the spatial alignment and structural continuity were markedly improved through image registration using RoMa and SR, even when addressing the challenges of cross-seasonal imagery (Figure 13).
Regarding ODA-SR, the NCMs for summer matches were lower than those for winter. This is primarily because high-quality summer images are difficult to obtain due to frequent cloud cover, which hinders the extraction of clear feature points. Consequently, while winter–winter registration pairs generally yield better numerical values, summer–winter pairs face challenges due to significant seasonal variations in vegetation and lighting. Despite these difficulties, RoMa showed almost the same performance for both seasons, with summer NCMs reaching 97% of the winter values. This indicates the exceptional robustness of the model against seasonal noise and spectral differences.
The superior performance of the winter–winter combination is likely attributable to the spectral coherence in color distribution between the two images. To verify this, we analyzed the RGB histograms as shown in Figure 14.
Since image matching algorithms rely on comparing pixel intensity variations and local patterns around feature points, a higher similarity in histogram distribution—reflecting analogous data characteristics—generally leads to a higher matching success rate. In the case of the winter–winter pair (C1-W vs. K3A), the intensity histograms in column (b) exhibit highly similar profiles; both images show a prominent peak concentrated in the darker regions (values 0–50), followed by a gradual decline toward the higher values. Furthermore, the RGB histograms in column (a) for both images display closely overlapping red, green, and blue curves, indicating a consistent spectral pattern without significant color bias.
Conversely, for the summer–winter pair (C1-S vs. K3A), the summer image (C1-S) presents a distinct second peak near the intensity value of 100, resulting in a bimodal distribution. This phenomenon is attributed to the high reflectance of vegetation and intense solar radiation during the summer season. In contrast, the K3A (winter) image shows relatively low intensity in this range. Such discrepancies in the overall brightness distribution increase the likelihood that the matching algorithm will misidentify identical locations as different features due to their divergent radiometric characteristics.
These differences are further elucidated by variations in vegetation and surface reflectance. In the RGB histogram of C1-S (Summer), the green channel maintains higher values over a broader range, reflecting the abundance of foliage and grass during the summer. In winter images, however, the loss of foliage exposes the ground surface, making building shadows and road outlines more pronounced. Since both C1-W and K3A share these seasonal geometric characteristics, they yield more consistent results during feature point extraction.
Solar-induced brightness and contrast also exhibit clear seasonal trends. C1-W and K3A share the low solar elevation characteristic of winter, resulting in elongated shadows and an overall left-skewed intensity distribution (lower brightness). In contrast, C1-S experiences higher solar angles and intense direct sunlight, leading to shorter shadows and a higher proportion of bright areas, which creates a significant shift in illumination geometry. For these reasons, while the summer–winter combination slightly underperformed compared to the winter–winter pair, it still achieved robust image registration performance through the application of high-performance algorithms such as RoMa and SR.

3.4. Limitations and Generalizability

This study intentionally focused on a densely built-up urban AOI to stress-test multi-sensor registration under challenging conditions, such as repetitive structural patterns, significant shadows, and relief-induced displacements. All imagery was acquired under near-nadir viewing geometries (incidence angles ≤ 11.88°, as shown in Table 1), which minimizes extreme building-lean and parallax while still reflecting realistic urban relief effects. However, strongly oblique acquisitions over high-rise districts may exhibit larger parallax that cannot be fully captured by a single global homography. Such cases may benefit from extended geometric models, such as Digital Surface Model (DSM)-assisted correction or locally adaptive warping, which will be explored in future work.
Furthermore, the generalizability of the observed performance gains to low-texture or highly homogeneous land covers—such as croplands, forests, and water bodies—cannot be directly confirmed. These regions often contain sparse distinctive keypoints and, particularly in vegetated landscapes, undergo substantial radiometric and textural changes driven by seasonal phenology. In such scenarios, while SR may enhance local contrast, it also risks amplifying repetitive or vegetation-driven patterns; thus, its net impact on geometric registration accuracy may differ from that in urban environments.
Consequently, we expect the proposed SR + matching pipeline to be most reliable in areas with abundant man-made structures or mixed textures. Conversely, texture-poor regions may require complementary strategies, including multi-scale matching, radiometrically invariant preprocessing, or land-cover-aware masking. Future research will extend this evaluation to geographically diverse sites and multiple land-cover classes to quantify generalization and identify the specific conditions under which SR provides consistent registration benefits.

4. Conclusions

This study demonstrated that high-accuracy registration of multi-sensor satellite images can be achieved by integrating SR techniques with deep learning-based feature detection and matching algorithms. Using a dataset from the C1, K3A, and K3 satellites covering Cheonghak-dong, Osan, we addressed the inherently limited sharpness of high-resolution satellite imagery (2.0–2.8 m GSD). By employing the ResShift SR model to quadruple the resolution, we significantly enhanced the distinctness of edges and corners, facilitating more accurate feature matching.
Comparative analysis showed that the deep learning-based models (SPSG, SPLG, and RoMa) consistently outperformed traditional methods on the ODA dataset. Furthermore, applying SR (ODA-SR) yielded the highest NCM and the lowest distance errors, confirming that SR-enhanced data substantially bolsters registration precision. Among the evaluated models, RoMa demonstrated the best overall performance in terms of NCM, distance error, and processing efficiency, maintaining a minimum of 2300 NCM across the entire dataset and far surpassing the next-best model, SPLG (minimum 177 NCM).
This superior performance is directly attributed to RoMa’s dense matching architecture. Unlike the sparse feature-based strategies of SPSG and SPLG, RoMa’s dense approach proved highly robust in identifying structural correspondences across multi-sensor and multi-temporal images, even under varying seasonal spectral conditions. These findings validate dense matching as a superior paradigm for complex satellite image registration tasks. Although RoMa showed a slight decrease in NCM when processing larger SR-enhanced images—likely due to its internal architecture and fixed parameters under current hardware constraints—its overall matching capability remains exceptionally effective. This study contributes to the advancement of satellite image registration technology, and future research should focus on optimizing computational efficiency and model parameters to develop even more robust multi-satellite registration frameworks.
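As a minimal sketch of the registration stage shared by all evaluated matchers, the following Python snippet estimates a global homography from matched keypoints with RANSAC and warps the source image onto the reference grid. The synthetic point arrays, image size, and the 3.0-pixel reprojection threshold are illustrative assumptions rather than the exact settings used in this study.

```python
import cv2
import numpy as np

# (N, 2) float32 arrays of matched pixel coordinates from any matcher
# (RoMa, SPLG, or SPSG); synthetic values stand in for real correspondences.
src_pts = (np.random.rand(500, 2) * 1000).astype(np.float32)
ref_pts = src_pts + 2.0  # pretend the source image is offset by ~2 px

# Robust global homography; the 3.0 px RANSAC threshold is an assumed value.
H, inlier_mask = cv2.findHomography(src_pts, ref_pts, cv2.RANSAC, 3.0)

# Resample the source image onto the reference grid (size is illustrative).
src_img = np.zeros((1024, 1024, 3), np.uint8)
registered = cv2.warpPerspective(src_img, H, (1024, 1024))

# Matches consistent with the model play the role of correct matches (NCM).
print("inliers:", int(inlier_mask.sum()))
```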

Author Contributions

Conceptualization, Y.I. and Y.L.; methodology, Y.I. and Y.L.; formal analysis, Y.I.; data curation, Y.I.; writing—original draft preparation, Y.I.; writing—review and editing, Y.L.; supervision, Y.L. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Korea Environment Industry & Technology Institute (KEITI) through the Project for developing an observation-based GHG emissions geo-spatial information map, funded by the Korea Ministry of Climate, Energy and Environment (MCEE) (RS-2025-00232066).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

Figure 1. Overall workflow of the proposed multi-sensor satellite image registration framework incorporating diffusion-based super-resolution (ResShift) and deep learning-based feature matching.
Figure 2. Overview of the study area and representative satellite imagery: (a) Geographic location of the study site in South Korea (red circle); (b) spatial footprints of the acquired satellite scenes (C1, K3A, and K3) relative to the Area of Interest (AOI); (c–f) sample patches of the dataset showing (c) C1 winter, (d) C1 summer, (e) K3A winter, and (f) K3 winter.
Figure 3. CAS500-1 original images in winter: (a) without atmospheric correction (OD), (b) with atmospheric correction (ODA), and (c) with atmospheric correction and SR (ODA-SR).
Figure 4. CAS500-1 original images in summer: (a) without atmospheric correction (OD), (b) with atmospheric correction (ODA), and (c) with atmospheric correction and SR (ODA-SR).
Figure 5. KOMPSAT-3A original images in winter: (a) without atmospheric correction (OD), (b) with atmospheric correction (ODA), and (c) with atmospheric correction and SR (ODA-SR).
Figure 6. KOMPSAT-3 original images in winter: (a) without atmospheric correction (OD), (b) with atmospheric correction (ODA), and (c) with atmospheric correction and SR (ODA-SR).
Figure 7. Visual comparison of ×4 upsampling results on a representative urban patch from the C1-S (Summer) scene: (a) Original ODA image, (b) ×4 Lanczos interpolation (classical baseline), and (c) ×4 ResShift super-resolution (ODA-SR). While Lanczos interpolation introduces slight blurring around complex structures, the ResShift model significantly enhances edge sharpness and recovers fine structural details, such as roof boundaries and road markings, which are critical for high-precision feature matching.
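For reproducibility, the classical baseline in panel (b) amounts to a single OpenCV call; the sketch below covers only the ×4 Lanczos interpolation, not the ResShift model, and the file paths are placeholders.

```python
import cv2

img = cv2.imread("oda_patch.png")  # placeholder path to an ODA patch
# x4 Lanczos upsampling: the classical baseline compared against ResShift SR
up4 = cv2.resize(img, None, fx=4, fy=4, interpolation=cv2.INTER_LANCZOS4)
cv2.imwrite("oda_patch_lanczos_x4.png", up4)
```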
Figure 8. Checkerboard mosaic visualization for qualitative registration assessment: Alternating patches from the reference and source images are arranged in a 3 × 3 grid to facilitate the visual inspection of feature continuity and alignment accuracy. This method allows for a direct comparison of structural consistency (e.g., roads and building boundaries) between the registered imagery pairs.
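The checkerboard mosaic of Figure 8 is simple to construct once the two images share a grid; a minimal NumPy sketch, assuming two co-registered images of identical shape (function and variable names are our own), follows.

```python
import numpy as np

def checkerboard_mosaic(ref: np.ndarray, src: np.ndarray, grid: int = 3) -> np.ndarray:
    """Alternate grid x grid patches of two co-registered, same-shape images."""
    assert ref.shape == src.shape, "images must be co-registered and same size"
    out = ref.copy()
    h, w = ref.shape[:2]
    ys = np.linspace(0, h, grid + 1, dtype=int)  # row boundaries of the grid
    xs = np.linspace(0, w, grid + 1, dtype=int)  # column boundaries of the grid
    for i in range(grid):
        for j in range(grid):
            if (i + j) % 2 == 1:  # swap every other cell to the source image
                out[ys[i]:ys[i + 1], xs[j]:xs[j + 1]] = \
                    src[ys[i]:ys[i + 1], xs[j]:xs[j + 1]]
    return out
```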
Figure 9. Visualization of NCM from HC models: (a) C1–K3A, (b) C1–K3, and (c) K3A–K3. The green lines indicate the correct matches between the image pairs.
Figure 10. Visualization of NCM from deep learning models: (a) C1–K3A, (b) C1–K3, and (c) K3A–K3. For a better visual assessment of RoMa (NCM = 6525), only 200 randomly sampled keypoints were displayed, similar to the SPLG (NCM = 208) visualization. The green lines indicate the correct matches between the image pairs.
Figure 11. Comparison of NCM before and after the application of SR: (a) ODA and (b) ODA-SR. For a better visual assessment, only 300 randomly sampled keypoints were displayed. The green lines indicate the correct matches between the image pairs.
Figure 12. Mosaic images of C1–K3A registration results. The comparison highlights the registration precision using the SPSG model across different stages: (a) before ODA registration, (b) after ODA registration without SR, and (c) after ODA registration with SR-derived homography. The green circles indicate regions where the super-resolution (SR) approach achieved the most significant improvement in Number of Correct Matches (NCM) and spatial alignment.
Figure 13. Visual comparison of the mosaic of C1-S (Summer) and K3A (Winter) images before and after registration. Mosaic results illustrate: (a) the initial alignment before registration, and (b) the registration result achieved using the RoMa model with the ODA-SR dataset. The green circles highlight areas where precise spatial alignment and structural continuity were significantly improved after registration, despite the challenges of summer vs. winter images.
Figure 14. Spectral characteristics analysis of the experimental datasets: (a) RGB channel histograms and (b) pixel intensity distributions for C1-S (Summer), C1-W (Winter), and K3A (Winter) images. The high similarity in histogram profiles between C1-W and K3A (winter–winter pair) facilitates higher matching success, whereas the bimodal distribution in C1-S due to high vegetation reflectance creates radiometric discrepancies for cross-seasonal registration.
Table 1. Detailed specifications and acquisition parameters of the CAS500-1, KOMPSAT-3A, and KOMPSAT-3 satellite imagery utilized in this study.

| Satellite | CAS500-1 | CAS500-1 | KOMPSAT-3A | KOMPSAT-3 |
|---|---|---|---|---|
| Product Level | Level2G | Level2G | Level1G | Level1G |
| Season of image | Winter | Summer | Winter | Winter |
| Acquisition date | 8 December 2023 | 8 August 2023 | 29 January 2019 | 3 March 2014 |
| GSD * (Column × Row) | 2.0 m (2.03 × 2.03) | 2.0 m (2.05 × 2.03) | 2.2 m (2.17 × 2.16) | 2.8 m (2.80 × 2.80) |
| Size of scene (Column × Row) | 7353 × 7229 | 7438 × 7253 | 6980 × 7030 | 7345 × 6824 |
| Incidence angle | 2.92° | 7.42° | 3.24° | 11.88° |

* GSD: Ground Sample Distance.
Table 3. Quantitative performance comparison between handcrafted (HC) and deep learning-based models for multi-sensor satellite image registration using the ODA dataset. The evaluation metrics include the Number of Correct Matches (NCM), pixel-level distance error, geographic distance error (m), and processing time (s).

| Satellites | Model | NCM | Pixel distance before (px) | Pixel distance after (px) | Geographic distance before (m) | Geographic distance after (m) | Processing time (s) |
|---|---|---|---|---|---|---|---|
| C1–K3A | SIFT | 12 | 4.60 | 0.20 | 10.20 | 0.44 | 0.28 |
| C1–K3A | ORB | 2 | 4.60 | 0.36 | 10.20 | 0.79 | 0.15 |
| C1–K3A | SPSG | 22 | 4.60 | 0.43 | 10.20 | 0.94 | 0.34 |
| C1–K3A | SPLG | 208 | 4.60 | 0.46 | 10.20 | 1.01 | 2.12 |
| C1–K3A | RoMa | 6525 | 4.60 | 0.25 | 10.20 | 0.55 | 1.21 |
| C1–K3 | SIFT | 4 | 14.90 | 0.25 | 41.70 | 0.70 | 0.28 |
| C1–K3 | ORB | 3 | 14.90 | 0.14 | 41.70 | 0.39 | 0.15 |
| C1–K3 | SPSG | 5 | 14.90 | 0.31 | 41.70 | 0.86 | 0.34 |
| C1–K3 | SPLG | 177 | 14.90 | 0.40 | 41.70 | 1.12 | 2.22 |
| C1–K3 | RoMa | 6359 | 14.90 | 0.26 | 41.70 | 0.72 | 1.19 |
| K3A–K3 | SIFT | 6 | 12.00 | 0.33 | 33.60 | 0.92 | 0.24 |
| K3A–K3 | ORB | 1 | 12.00 | 0.18 | 33.60 | 0.50 | 0.14 |
| K3A–K3 | SPSG | 6 | 12.00 | 0.51 | 33.60 | 1.42 | 0.33 |
| K3A–K3 | SPLG | 185 | 12.00 | 0.42 | 33.60 | 1.17 | 2.24 |
| K3A–K3 | RoMa | 4295 | 12.00 | 0.37 | 33.60 | 1.03 | 1.17 |
Table 4. Quantitative performance analysis of deep learning models for multi-sensor satellite image registration: a comparative study of ODA and ODA-SR datasets. This table highlights the impact of super-resolution (SR) on the Number of Correct Matches (NCM), pixel-level distance error, geographic coordinate error (m), and computational efficiency (s) across various sensor pairs. After-registration distance errors are presented as mean ± standard deviation calculated over NCM correspondences to reflect alignment stability.

| Dataset | Satellites | Model | NCM | Pixel distance before (px) | Pixel distance after (px) | Geographic distance before (m) | Geographic distance after (m) | Processing time (s) |
|---|---|---|---|---|---|---|---|---|
| ODA | C1–K3A | SPSG | 22 | 4.60 | 0.43 ± 0.22 | 10.20 | 0.94 ± 0.47 | 0.34 |
| ODA | C1–K3A | SPLG | 208 | 4.60 | 0.46 ± 0.22 | 10.20 | 1.01 ± 0.48 | 2.12 |
| ODA | C1–K3A | RoMa | 6525 | 4.60 | 0.25 ± 0.15 | 10.20 | 0.55 ± 0.33 | 1.21 |
| ODA | C1–K3 | SPSG | 5 | 14.90 | 0.31 ± 0.05 | 41.70 | 0.86 ± 0.19 | 0.34 |
| ODA | C1–K3 | SPLG | 177 | 14.90 | 0.40 ± 0.21 | 41.70 | 1.12 ± 0.58 | 2.22 |
| ODA | C1–K3 | RoMa | 6359 | 14.90 | 0.26 ± 0.14 | 41.70 | 0.72 ± 0.42 | 1.19 |
| ODA | K3A–K3 | SPSG | 6 | 12.00 | 0.51 ± 0.15 | 33.60 | 1.42 ± 0.43 | 0.33 |
| ODA | K3A–K3 | SPLG | 185 | 12.00 | 0.42 ± 0.22 | 33.60 | 1.17 ± 0.61 | 2.24 |
| ODA | K3A–K3 | RoMa | 4295 | 12.00 | 0.37 ± 0.18 | 33.60 | 1.03 ± 0.51 | 1.17 |
| ODA-SR | C1–K3A | SPSG | 670 | 4.30 | 0.45 ± 0.22 | 10.20 | 0.99 ± 0.49 | 11.78 |
| ODA-SR | C1–K3A | SPLG | 692 | 4.30 | 0.40 ± 0.22 | 10.20 | 0.88 ± 0.49 | 2.63 |
| ODA-SR | C1–K3A | RoMa | 4196 | 4.30 | 0.20 ± 0.15 | 10.20 | 0.44 ± 0.34 | 1.89 |
| ODA-SR | C1–K3 | SPSG | 283 | 14.70 | 0.47 ± 0.23 | 41.70 | 1.31 ± 0.64 | 2.32 |
| ODA-SR | C1–K3 | SPLG | 462 | 14.70 | 0.40 ± 0.21 | 41.70 | 1.12 ± 0.60 | 2.67 |
| ODA-SR | C1–K3 | RoMa | 4000 | 14.70 | 0.22 ± 0.14 | 41.70 | 0.61 ± 0.42 | 1.31 |
| ODA-SR | K3A–K3 | SPSG | 445 | 12.00 | 0.48 ± 0.23 | 33.60 | 1.34 ± 0.65 | 2.91 |
| ODA-SR | K3A–K3 | SPLG | 489 | 12.00 | 0.40 ± 0.21 | 33.60 | 1.12 ± 0.59 | 2.65 |
| ODA-SR | K3A–K3 | RoMa | 2325 | 12.00 | 0.30 ± 0.19 | 33.60 | 0.84 ± 0.53 | 1.30 |
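The after-registration mean ± standard deviation in Table 4 can be reproduced from any correspondence set as in the sketch below; the (N, 2) coordinate arrays are synthetic, and the 2.0 m GSD (one of the Table 1 values) is only an illustrative pixel-to-metre conversion.

```python
import numpy as np

def registration_error(ref_pts, warped_pts, gsd_m=2.0):
    """Mean +/- std of Euclidean distances over NCM correspondences (px and m)."""
    d_px = np.linalg.norm(ref_pts - warped_pts, axis=1)  # one distance per match
    return (d_px.mean(), d_px.std()), (d_px.mean() * gsd_m, d_px.std() * gsd_m)

# Synthetic residuals of roughly 0.25 px, on the order of RoMa's ODA-SR errors.
rng = np.random.default_rng(0)
ref = rng.random((4196, 2)) * 1000
warped = ref + rng.normal(0.0, 0.2, size=(4196, 2))
(px_mu, px_sd), (m_mu, m_sd) = registration_error(ref, warped)
print(f"{px_mu:.2f} ± {px_sd:.2f} px, {m_mu:.2f} ± {m_sd:.2f} m")
```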
Table 5. Performance comparison of deep learning models for seasonal satellite imagery using the ODA and ODA-SR datasets for C1-S (Summer), C1-W (Winter), and K3A (Winter).

| Dataset | Satellites | Model | NCM | Pixel distance before (px) | Pixel distance after (px) | Geographic distance before (m) | Geographic distance after (m) | Processing time (s) |
|---|---|---|---|---|---|---|---|---|
| ODA | C1-S vs. K3A | SPSG | 12 | 4.30 | 0.47 | 9.40 | 1.03 | 0.36 |
| ODA | C1-S vs. K3A | SPLG | 159 | 4.30 | 0.46 | 9.40 | 1.01 | 2.16 |
| ODA | C1-S vs. K3A | RoMa | 6026 | 4.30 | 0.26 | 9.40 | 0.57 | 1.19 |
| ODA | C1-W vs. K3A | SPSG | 22 | 4.60 | 0.43 | 10.10 | 0.94 | 0.34 |
| ODA | C1-W vs. K3A | SPLG | 208 | 4.60 | 0.46 | 10.10 | 1.01 | 2.12 |
| ODA | C1-W vs. K3A | RoMa | 6525 | 4.60 | 0.25 | 10.10 | 0.55 | 1.21 |
| ODA-SR | C1-S vs. K3A | SPSG | 441 | 4.30 | 0.47 | 9.40 | 1.03 | 7.55 |
| ODA-SR | C1-S vs. K3A | SPLG | 511 | 4.30 | 0.40 | 9.40 | 0.88 | 2.63 |
| ODA-SR | C1-S vs. K3A | RoMa | 4099 | 4.30 | 0.23 | 9.40 | 0.50 | 1.33 |
| ODA-SR | C1-W vs. K3A | SPSG | 670 | 4.60 | 0.45 | 10.10 | 0.99 | 11.78 |
| ODA-SR | C1-W vs. K3A | SPLG | 692 | 4.60 | 0.40 | 10.10 | 0.88 | 2.63 |
| ODA-SR | C1-W vs. K3A | RoMa | 4196 | 4.60 | 0.20 | 10.10 | 0.44 | 1.89 |
