Search Results (246)

Search Parameters:
Keywords = local image registration

23 pages, 24301 KiB  
Article
Robust Optical and SAR Image Registration Using Weighted Feature Fusion
by Ao Luo, Anxi Yu, Yongsheng Zhang, Wenhao Tong and Huatao Yu
Remote Sens. 2025, 17(15), 2544; https://doi.org/10.3390/rs17152544 - 22 Jul 2025
Viewed by 302
Abstract
Image registration constitutes the fundamental basis for the joint interpretation of synthetic aperture radar (SAR) and optical images. However, robust image registration remains challenging due to significant regional heterogeneity in remote sensing scenes (e.g., co-existing urban and marine areas within a single image). To overcome this challenge, this article proposes a novel optical–SAR image registration method named Gradient and Standard Deviation Feature Weighted Fusion (GDWF). First, a Block-local standard deviation (Block-LSD) operator is proposed to extract block-based feature points with regional adaptability. Subsequently, a dual-modal feature description is developed, constructing both gradient-based descriptors and local standard deviation (LSD) descriptors for the neighborhoods surrounding the detected feature points. To further enhance matching robustness, a confidence-weighted feature fusion strategy is proposed. By establishing a reliability evaluation model for similarity measurement maps, the contribution weights of gradient features and LSD features are dynamically optimized, ensuring adaptive performance under varying conditions. To verify the effectiveness of the method, different optical and SAR datasets are used to compare it with the state-of-the-art algorithms MOGF, CFOG, and FED-HOPC. The experimental results demonstrate that the proposed GDWF algorithm achieves the best performance in terms of registration accuracy and robustness among all compared methods, effectively handling optical–SAR image pairs with significant regional heterogeneity.
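
As a rough illustration of the confidence-weighted fusion step, the Python sketch below weights two precomputed similarity maps by a peak-sharpness confidence proxy. The proxy and all names are assumptions for illustration; the paper's actual reliability evaluation model is not reproduced here.

```python
import numpy as np

def peak_sharpness(sim):
    """Confidence proxy: how far the best score stands out from the map's mean.
    (Assumed reliability measure; GDWF's model may differ.)"""
    return (sim.max() - sim.mean()) / (sim.std() + 1e-8)

def fuse_similarity(sim_grad, sim_lsd):
    """Fuse gradient-based and LSD-based similarity maps with dynamic weights."""
    c_g, c_l = peak_sharpness(sim_grad), peak_sharpness(sim_lsd)
    w_g = c_g / (c_g + c_l + 1e-8)
    fused = w_g * sim_grad + (1.0 - w_g) * sim_lsd
    return fused, np.unravel_index(int(fused.argmax()), fused.shape)

# Usage: similarity maps from matching a template over a search window
rng = np.random.default_rng(0)
fused, best_offset = fuse_similarity(rng.random((64, 64)), rng.random((64, 64)))
```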

26 pages, 6798 KiB  
Article
Robust Optical and SAR Image Matching via Attention-Guided Structural Encoding and Confidence-Aware Filtering
by Qi Kang, Jixian Zhang, Guoman Huang and Fei Liu
Remote Sens. 2025, 17(14), 2501; https://doi.org/10.3390/rs17142501 - 18 Jul 2025
Viewed by 391
Abstract
Accurate feature matching between optical and synthetic aperture radar (SAR) images remains a significant challenge in remote sensing due to substantial modality discrepancies in texture, intensity, and geometric structure. In this study, we propose an attention-context-aware deep learning framework (ACAMatch) for robust and efficient optical–SAR image registration. The proposed method integrates a structure-enhanced feature extractor, RS2FNet, which combines dual-stage Res2Net modules with a bi-level routing attention mechanism to capture multi-scale local textures and global structural semantics. A context-aware matching module refines correspondences through self- and cross-attention, coupled with a confidence-driven early-exit pruning strategy to reduce computational cost while maintaining accuracy. Additionally, a match-aware multi-task loss function jointly enforces spatial consistency, affine invariance, and structural coherence for end-to-end optimization. Experiments on public datasets (SEN1-2 and WHU-OPT-SAR) and a self-collected Gaofen (GF) dataset demonstrate that ACAMatch significantly outperforms existing state-of-the-art methods in terms of the number of correct matches, matching accuracy, and inference speed, especially under challenging conditions such as resolution differences and severe structural distortions. These results indicate the effectiveness and generalizability of the proposed approach for multimodal image registration, making ACAMatch a promising solution for remote sensing applications such as change detection and multi-sensor data fusion.
(This article belongs to the Special Issue Advancements of Vision-Language Models (VLMs) in Remote Sensing)
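
A minimal sketch of what confidence-aware filtering with an early-exit test might look like, assuming per-correspondence confidence scores are already available; the thresholds and exit rule are illustrative assumptions, not ACAMatch's actual design.

```python
import numpy as np

def confidence_filter(matches, scores, tau=0.9):
    """Drop putative correspondences whose confidence falls below tau."""
    keep = scores >= tau
    return matches[keep], scores[keep]

def can_exit_early(scores, tau=0.9, frac=0.8):
    """Assumed pruning rule: stop refining once most candidates are confident."""
    return float((scores >= tau).mean()) >= frac

# matches: (N, 4) array of (x1, y1, x2, y2); scores: (N,) confidences in [0, 1]
matches, scores = np.random.rand(200, 4), np.random.rand(200)
good, good_scores = confidence_filter(matches, scores, tau=0.8)
```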

27 pages, 86462 KiB  
Article
SAR Image Registration Based on SAR-SIFT and Template Matching
by Shichong Liu, Xiaobo Deng, Chun Liu and Yongchao Cheng
Remote Sens. 2025, 17(13), 2216; https://doi.org/10.3390/rs17132216 - 27 Jun 2025
Viewed by 367
Abstract
Accurate image registration is essential for synthetic aperture radar (SAR) applications such as change detection, image fusion, and deformation monitoring. However, SAR image registration faces challenges including speckle noise, low-texture regions, and the geometric transformations caused by topographic relief under side-looking radar imaging. To address these issues, this paper proposes a novel two-stage registration method consisting of pre-registration and fine registration. In the pre-registration stage, the scale-invariant feature transform for synthetic aperture radar (SAR-SIFT) algorithm is integrated into an iterative optimization framework to eliminate large-scale geometric discrepancies, ensuring a coarse but reliable initial alignment. In the fine registration stage, a novel similarity measure is introduced by combining frequency-domain phase congruency and spatial-domain gradient features, which enhances the robustness and accuracy of template matching, especially in edge-rich regions. To handle topographic relief in the SAR images, an adaptive local stretching transformation strategy is proposed to correct undulating areas. Experiments on five pairs of SAR images containing flat and undulating regions show that the proposed method achieves initial alignment errors below 10 pixels and final registration errors below 1 pixel. Compared with other methods, our approach obtains more correct matching pairs (up to 100+ per image pair), higher registration precision, and improved robustness under complex terrains. These results validate the accuracy and effectiveness of the proposed registration framework.
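
The fine stage combines phase congruency with spatial gradients; the hedged sketch below shows only the spatial-gradient half, using plain NCC template matching on gradient-magnitude images with OpenCV. The phase-congruency term is omitted, so this is a simplified stand-in rather than the paper's similarity measure.

```python
import cv2

def gradient_magnitude(img):
    """Per-pixel gradient magnitude via Sobel derivatives (8-bit grayscale input)."""
    gx = cv2.Sobel(img, cv2.CV_32F, 1, 0, ksize=3)
    gy = cv2.Sobel(img, cv2.CV_32F, 0, 1, ksize=3)
    return cv2.magnitude(gx, gy)

def match_by_gradient(reference, template):
    """NCC template matching on gradient-magnitude images."""
    score = cv2.matchTemplate(gradient_magnitude(reference),
                              gradient_magnitude(template),
                              cv2.TM_CCOEFF_NORMED)
    _, best, _, loc = cv2.minMaxLoc(score)
    return loc, best   # top-left corner of the best match and its score
```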

28 pages, 11793 KiB  
Article
Unsupervised Multimodal UAV Image Registration via Style Transfer and Cascade Network
by Xiaoye Bi, Rongkai Qie, Chengyang Tao, Zhaoxiang Zhang and Yuelei Xu
Remote Sens. 2025, 17(13), 2160; https://doi.org/10.3390/rs17132160 - 24 Jun 2025
Cited by 1 | Viewed by 398
Abstract
Cross-modal image registration for unmanned aerial vehicle (UAV) platforms presents significant challenges due to large-scale deformations, distinct imaging mechanisms, and pronounced modality discrepancies. This paper proposes a novel multi-scale cascaded registration network based on style transfer that achieves superior performance: up to 67% reduction in mean squared error (from 0.0106 to 0.0068), 9.27% enhancement in normalized cross-correlation, 26% improvement in local normalized cross-correlation, and 8% increase in mutual information compared to state-of-the-art methods. The architecture integrates a cross-modal style transfer network (CSTNet) that transforms visible images into pseudo-infrared representations to unify modality characteristics, and a multi-scale cascaded registration network (MCRNet) that performs progressive spatial alignment across multiple resolution scales using diffeomorphic deformation modeling to ensure smooth and invertible transformations. A self-supervised learning paradigm based on image reconstruction eliminates reliance on manually annotated data while maintaining registration accuracy through synthetic deformation generation. Extensive experiments on the LLVIP dataset demonstrate the method’s robustness under challenging conditions involving large-scale transformations, with ablation studies confirming that style transfer contributes 28% MSE improvement and diffeomorphic registration prevents 10.6% performance degradation. The proposed approach provides a robust solution for cross-modal image registration in dynamic UAV environments, offering significant implications for downstream applications such as target detection, tracking, and surveillance.
(This article belongs to the Special Issue Advances in Deep Learning Approaches: UAV Data Analysis)
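
For readers wanting to check metrics like those reported above, here is a minimal sketch of the MSE and (global) normalized cross-correlation measures on a registered/reference image pair; it is a generic formulation, not the paper's evaluation code.

```python
import numpy as np

def mse(a, b):
    """Mean squared error between two images of equal shape."""
    return float(np.mean((a.astype(np.float64) - b.astype(np.float64)) ** 2))

def ncc(a, b):
    """Zero-mean normalized cross-correlation in [-1, 1]."""
    a = a - a.mean(); b = b - b.mean()
    return float((a * b).sum() / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))
```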

28 pages, 3438 KiB  
Article
Optimizing Remote Sensing Image Retrieval Through a Hybrid Methodology
by Sujata Alegavi and Raghvendra Sedamkar
J. Imaging 2025, 11(6), 179; https://doi.org/10.3390/jimaging11060179 - 28 May 2025
Viewed by 568
Abstract
The contemporary challenge in remote sensing lies in the precise retrieval of increasingly abundant and high-resolution remotely sensed images (RS images) stored in expansive data warehouses. The heightened spatial and spectral resolutions, coupled with accelerated image acquisition rates, necessitate advanced tools for effective data management, retrieval, and exploitation. The classification of large-sized images at the pixel level generates substantial data, escalating the workload and search space for similarity measurement. Semantic-based image retrieval remains an open problem due to limitations in current artificial intelligence techniques. Furthermore, on-board storage constraints compel the application of numerous compression algorithms to reduce storage space, intensifying the difficulty of retrieving substantial, sensitive, and target-specific data. This research proposes an innovative hybrid approach to enhance the retrieval of remotely sensed images. The approach leverages multilevel classification and multiscale feature extraction strategies to enhance performance. The retrieval system comprises two primary phases: database building and retrieval. Initially, the proposed Multiscale Multiangle Mean-shift with Breaking Ties (MSMA-MSBT) algorithm selects informative unlabeled samples for hyperspectral and synthetic aperture radar images through an active learning strategy. Addressing the scaling and rotation variations in image capture, a flexible and dynamic algorithm, modified Deep Image Registration using Dynamic Inlier (IRDI), is introduced for image registration. Given the complexity of remote sensing images, feature extraction occurs at two levels. Low-level features are extracted using the modified Multiscale Multiangle Completed Local Binary Pattern (MSMA-CLBP) algorithm to capture local texture features, while high-level features are obtained through a hybrid CNN structure combining pretrained networks (Alexnet, Caffenet, VGG-S, VGG-M, VGG-F, VGG-VDD-16, VGG-VDD-19) and a fully connected dense network. Fusion of low- and high-level features facilitates final class distinction, with soft thresholding mitigating misclassification issues. A region-based similarity measurement enhances matching percentages. Results, evaluated on high-resolution remote sensing datasets, demonstrate the effectiveness of the proposed method, outperforming traditional algorithms with an average accuracy of 86.66%. The hybrid retrieval system exhibits substantial improvements in classification accuracy, similarity measurement, and computational efficiency compared to state-of-the-art scene classification and retrieval methods.
(This article belongs to the Topic Computational Intelligence in Remote Sensing: 2nd Edition)
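
A minimal sketch of the final fusion-and-retrieval step under simple assumptions: low- and high-level descriptors are concatenated, L2-normalized, and ranked by cosine similarity. The paper's region-based similarity measurement and soft thresholding are not reproduced.

```python
import numpy as np

def fuse(low, high):
    """Concatenate low-level (e.g., CLBP) and high-level (CNN) descriptors, L2-normalized."""
    v = np.concatenate([low, high])
    return v / (np.linalg.norm(v) + 1e-12)

def retrieve(query_vec, db_vecs, k=10):
    """Rank database rows (each built with fuse(), so pre-normalized) by cosine similarity."""
    return np.argsort(-(db_vecs @ query_vec))[:k]
```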

27 pages, 9977 KiB  
Article
Mergeable Probabilistic Voxel Mapping for LiDAR–Inertial–Visual Odometry
by Balong Wang, Nassim Bessaad, Huiying Xu, Xinzhong Zhu and Hongbo Li
Electronics 2025, 14(11), 2142; https://doi.org/10.3390/electronics14112142 - 24 May 2025
Cited by 1 | Viewed by 813
Abstract
To address the limitations of existing LiDAR–visual fusion methods in adequately accounting for map uncertainties induced by LiDAR measurement noise, this paper introduces a LiDAR–inertial–visual odometry framework leveraging mergeable probabilistic voxel mapping. The method innovatively employs probabilistic voxel models to characterize uncertainties in environmental geometric plane features and optimizes computational efficiency through a voxel merging strategy. Additionally, it integrates color information from cameras to further enhance localization accuracy. Specifically, in the LiDAR–inertial odometry (LIO) subsystem, a probabilistic voxel plane model is constructed for LiDAR point clouds to explicitly represent measurement noise uncertainty, thereby improving the accuracy and robustness of point cloud registration. A voxel merging strategy based on the union-find algorithm is introduced to merge coplanar voxel planes, reducing computational load. In the visual–inertial odometry (VIO) subsystem, image tracking points are generated through a global map projection, and outlier points are eliminated using a random sample consensus algorithm based on a dynamic Bayesian network. Finally, state estimation accuracy is enhanced by jointly optimizing frame-to-frame reprojection errors and frame-to-map RGB color errors. Experimental results demonstrate that the proposed method achieves root mean square errors (RMSEs) of absolute trajectory error at 0.478 m and 0.185 m on the M2DGR and NTU-VIRAL datasets, respectively, while attaining real-time performance with an average processing time of 39.19 ms per frame on the NTU-VIRAL dataset. Compared to state-of-the-art approaches, our method exhibits significant improvements in both accuracy and computational efficiency.
(This article belongs to the Special Issue Advancements in Robotics: Perception, Manipulation, and Interaction)
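
A minimal sketch of the union-find voxel-merging idea, assuming each voxel already carries a fitted plane (unit normal plus offset); the coplanarity test and tolerances are illustrative assumptions, not the paper's probabilistic criterion.

```python
import numpy as np

class UnionFind:
    """Disjoint-set structure used to group voxels into merged plane clusters."""
    def __init__(self, n):
        self.parent = list(range(n))
    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x
    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra

def merge_coplanar(normals, dists, adjacency, angle_tol=0.05, dist_tol=0.1):
    """Union adjacent voxels whose fitted planes nearly coincide.
    normals: (n, 3) unit normals; dists: (n,) plane offsets; adjacency: (i, j) pairs."""
    uf = UnionFind(len(normals))
    for i, j in adjacency:
        if (1.0 - abs(float(np.dot(normals[i], normals[j]))) < angle_tol
                and abs(dists[i] - dists[j]) < dist_tol):
            uf.union(i, j)
    return [uf.find(i) for i in range(len(normals))]  # cluster id per voxel
```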

26 pages, 9328 KiB  
Article
Global Optical and SAR Image Registration Method Based on Local Distortion Division
by Bangjie Li, Dongdong Guan, Yuzhen Xie, Xiaolong Zheng, Zhengsheng Chen, Lefei Pan, Weiheng Zhao and Deliang Xiang
Remote Sens. 2025, 17(9), 1642; https://doi.org/10.3390/rs17091642 - 6 May 2025
Viewed by 596
Abstract
Variations in terrain elevation cause images acquired under different imaging modalities to deviate from a linear mapping relationship. This effect is particularly pronounced between optical and SAR images, where the range-based imaging mechanism of SAR sensors leads to significant local geometric distortions, such as perspective shrinkage and occlusion. As a result, it becomes difficult to represent the spatial correspondence between optical and SAR images using a single geometric model. To address this challenge, we propose a global optical-SAR image registration method that leverages local distortion characteristics. Specifically, we introduce a Superpixel-based Local Distortion Division (SLDD) method, which defines superpixel region features and segments the image into local distortion and normal regions by computing the Mahalanobis distance between superpixel features. We further design a Multi-Feature Fusion Capsule Network (MFFCN) that integrates shallow salient features with deep structural details, reconstructing the dimensions of digital capsules to generate feature descriptors encompassing texture, phase, structure, and amplitude information. This design effectively mitigates the information loss and feature degradation problems caused by pooling operations in conventional convolutional neural networks (CNNs). Additionally, a hard negative mining loss is incorporated to further enhance feature discriminability. Feature descriptors are extracted separately from regions with different distortion levels, and corresponding transformation models are built for local registration. Finally, the local registration results are fused to generate a globally aligned image. Experimental results on public datasets demonstrate that the proposed method achieves superior performance over state-of-the-art (SOTA) approaches in terms of Root Mean Squared Error (RMSE), Correct Match Number (CMN), Distribution of Matched Points (Scat), Edge Fidelity (EF), and overall visual quality.
(This article belongs to the Special Issue Temporal and Spatial Analysis of Multi-Source Remote Sensing Images)
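
A hedged sketch of the SLDD-style region split: superpixels whose feature vectors lie far from the global feature distribution (by Mahalanobis distance) are flagged as local distortion regions. The feature definition and threshold here are assumptions for illustration.

```python
import numpy as np

def split_regions(features, threshold=3.0):
    """features: (n_superpixels, d) array of per-superpixel descriptors.
    Returns a boolean mask, True where a superpixel is flagged as locally distorted."""
    mean = features.mean(axis=0)
    cov = np.cov(features, rowvar=False) + 1e-6 * np.eye(features.shape[1])
    inv = np.linalg.inv(cov)
    d = features - mean
    dists = np.sqrt(np.einsum("ij,jk,ik->i", d, inv, d))  # Mahalanobis distances
    return dists > threshold
```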

19 pages, 12128 KiB  
Article
Marker-Less Navigation System for Anterior Cruciate Ligament Reconstruction with 3D Femoral Analysis and Arthroscopic Guidance
by Shuo Wang, Weili Shi, Shuai Yang, Jiahao Cui and Qinwei Guo
Bioengineering 2025, 12(5), 464; https://doi.org/10.3390/bioengineering12050464 - 27 Apr 2025
Viewed by 541
Abstract
Accurate femoral tunnel positioning is crucial for successful anterior cruciate ligament reconstruction (ACLR), yet traditional arthroscopic techniques face significant challenges in spatial orientation and precise anatomical localization. This study presents a novel marker-less computer-assisted navigation system that integrates three-dimensional femoral modeling with real-time arthroscopic guidance. The system employs advanced image processing techniques for accurate condyle segmentation and implements the Bernard and Hertel (BH) grid system for standardized positioning. A curvature-based feature extraction approach precisely identifies the capsular line reference (CLR) on the lateral condyle surface, forming the foundation for establishing the BH reference grid. The system’s two-stage registration framework, combining SIFT-ICP algorithms, achieves accurate alignment between preoperative models and arthroscopic views. Validation results from expert surgeons demonstrated high precision, with 71.5% of test groups achieving acceptable or excellent performance standards (mean deviation distances: 1.12–1.86 mm). Unlike existing navigation solutions, our system maintains standard surgical workflow without requiring additional surgical instruments or markers, offering an efficient and minimally invasive approach to enhance ACLR precision. This innovation bridges the gap between preoperative planning and intraoperative execution, potentially improving surgical outcomes through standardized tunnel positioning.
(This article belongs to the Special Issue Advances in Medical 3D Vision: Voxels and Beyond)
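
The ICP half of a SIFT-ICP pipeline reduces to repeated rigid fits between corresponding point sets; below is a minimal point-to-point ICP sketch using the Kabsch (SVD) solution. It is purely illustrative, not the system's implementation.

```python
import numpy as np

def kabsch(P, Q):
    """Best-fit rotation R and translation t mapping points P onto Q (SVD solution)."""
    cP, cQ = P.mean(0), Q.mean(0)
    U, _, Vt = np.linalg.svd((P - cP).T @ (Q - cQ))
    d = np.sign(np.linalg.det(Vt.T @ U.T))       # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return R, cQ - R @ cP

def icp(P, Q, iters=20):
    """Minimal point-to-point ICP with brute-force nearest neighbours.
    P, Q: (n, 3) and (m, 3) point clouds; returns P aligned onto Q."""
    for _ in range(iters):
        nn = ((P[:, None, :] - Q[None, :, :]) ** 2).sum(-1).argmin(1)
        R, t = kabsch(P, Q[nn])
        P = P @ R.T + t
    return P
```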

22 pages, 2872 KiB  
Article
Wavelet-Guided Multi-Scale ConvNeXt for Unsupervised Medical Image Registration
by Xuejun Zhang, Aobo Xu, Ganxin Ouyang, Zhengrong Xu, Shaofei Shen, Wenkang Chen, Mingxian Liang, Guiqi Zhang, Jiashun Wei, Xiangrong Zhou and Dongbo Wu
Bioengineering 2025, 12(4), 406; https://doi.org/10.3390/bioengineering12040406 - 11 Apr 2025
Cited by 2 | Viewed by 974
Abstract
Medical image registration is essential in clinical practices such as surgical navigation and image-guided diagnosis. The Transformer architecture of TransMorph demonstrates better accuracy in non-rigid registration tasks. However, its weaker spatial locality priors necessitate large-scale training datasets and a large number of parameters, which conflict with the limited annotated data and real-time demands of clinical workflows. Moreover, traditional downsampling and upsampling always degrade high-frequency anatomical features such as tissue boundaries or small lesions. We propose WaveMorph, a wavelet-guided multi-scale ConvNeXt method for unsupervised medical image registration. A novel multi-scale wavelet feature fusion downsampling module is proposed by integrating the ConvNeXt architecture with Haar wavelet lossless decomposition to extract and fuse features from eight frequency sub-images using multi-scale convolution kernels. Additionally, a lightweight dynamic upsampling module is introduced in the decoder to reconstruct fine-grained anatomical structures. WaveMorph integrates the inductive bias of CNNs with the advantages of Transformers, effectively mitigating topological distortions caused by spatial information loss while supporting real-time inference. In both atlas-to-patient (IXI) and inter-patient (OASIS) registration tasks, WaveMorph demonstrates state-of-the-art performance, achieving Dice scores of 0.779 ± 0.015 and 0.824 ± 0.021, respectively, and real-time inference (0.072 s/image), validating the effectiveness of our model in medical image registration.
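
A minimal sketch of the one-level 3D Haar split that yields the eight frequency sub-images the module fuses: averaging/differencing along each axis in turn is the standard lossless Haar step, though WaveMorph's fusion layers are not shown here.

```python
import numpy as np

def haar3d(x):
    """One-level 3D Haar decomposition: average/difference along each axis in turn,
    yielding eight half-resolution sub-volumes (LLL, LLH, ..., HHH)."""
    def split(a, axis):
        even = np.take(a, range(0, a.shape[axis], 2), axis=axis)
        odd = np.take(a, range(1, a.shape[axis], 2), axis=axis)
        return (even + odd) / np.sqrt(2), (even - odd) / np.sqrt(2)
    subs = [x]
    for axis in range(3):
        subs = [band for a in subs for band in split(a, axis)]
    return subs  # 8 sub-volumes, e.g., stacked as channels for an encoder

vol = np.random.rand(32, 32, 32)
bands = haar3d(vol)            # each band has shape (16, 16, 16)
```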

26 pages, 8883 KiB  
Article
Enhancing Machine Learning Techniques in VSLAM for Robust Autonomous Unmanned Aerial Vehicle Navigation
by Hussam Rostum and József Vásárhelyi
Electronics 2025, 14(7), 1440; https://doi.org/10.3390/electronics14071440 - 2 Apr 2025
Viewed by 663
Abstract
This study introduces a real-time visual SLAM system designed for small indoor environments. The system demonstrates resilience against significant motion clutter and supports wide-baseline loop closing, re-localization, and automatic initialization. Leveraging state-of-the-art algorithms, the approach presented in this article utilizes adapted Oriented FAST and Rotated BRIEF (ORB) features for tracking, mapping, re-localization, and loop closing. In addition, the research uses an adaptive threshold to find putative feature matches, which provides efficient map initialization and accurate tracking. The task is to process visual information from the camera of a DJI Tello drone to construct an indoor map and estimate the camera trajectory. In a ’survival of the fittest’ style, the algorithms selectively pick adaptive points and keyframes for reconstruction. This leads to robustness and a concise, traceable map that develops as scene content emerges, making lifelong operation possible. The results show an improved RMSE (3.280) for the adaptive ORB algorithm with the adaptive threshold, whereas the standard ORB algorithm failed to complete the mapping process.
(This article belongs to the Special Issue Development and Advances in Autonomous Driving Technology)
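
A hedged sketch of adaptive-threshold ORB matching with OpenCV, where the Lowe ratio test loosens until enough putative matches survive; the adaptation schedule and the minimum-match count are assumptions, not the paper's exact scheme.

```python
import cv2

orb = cv2.ORB_create(nfeatures=2000)

def adaptive_orb_matches(img1, img2, min_matches=50):
    """Match ORB features between two grayscale frames, loosening the ratio
    test until enough putative matches survive (assumed adaptation rule)."""
    k1, d1 = orb.detectAndCompute(img1, None)
    k2, d2 = orb.detectAndCompute(img2, None)
    pairs = cv2.BFMatcher(cv2.NORM_HAMMING).knnMatch(d1, d2, k=2)
    pairs = [p for p in pairs if len(p) == 2]
    good = []
    for ratio in (0.7, 0.8, 0.9):          # progressively looser thresholds
        good = [m for m, n in pairs if m.distance < ratio * n.distance]
        if len(good) >= min_matches:
            break
    return k1, k2, good
```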

18 pages, 10219 KiB  
Article
Automatic Registration of Remote Sensing High-Resolution Hyperspectral Images Based on Global and Local Features
by Xiaorong Zhang, Siyuan Li, Zhongyang Xing, Binliang Hu and Xi Zheng
Remote Sens. 2025, 17(6), 1011; https://doi.org/10.3390/rs17061011 - 13 Mar 2025
Cited by 1 | Viewed by 705
Abstract
Automatic registration of remote sensing images is an important task, which requires the establishment of appropriate correspondence between the sensed image and the reference image. Satellite remote sensing is now trending towards high-resolution hyperspectral imaging. Ever more frequent revisits and higher image resolutions demand greater accuracy and real-time performance from automatic registration. The push-broom payload is affected by the push-broom stability of the satellite platform and by elevation changes of ground objects, so the acquired hyperspectral image may exhibit distortions such as stretching or shrinking in different parts of the image. To solve this problem, a new automatic registration strategy for remote sensing hyperspectral images was established that combines global and local image features, with registration carried out at two granularities: coarse-grained matching and fine-grained matching. The high-resolution spatial features are first employed for detecting scale-invariant features, while the spectral information is used for matching; the idea of image stitching is then employed to fuse the image after fine registration to obtain high-precision registration results. To verify the proposed algorithm, a simulated on-orbit push-broom imaging experiment was carried out to obtain hyperspectral images with local complex distortions under different lighting conditions. The simulation results show that the proposed remote sensing hyperspectral image registration algorithm is superior to existing automatic registration algorithms. Its advantages in registration accuracy and real-time performance give it broad prospects for application in satellite ground application systems.
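
A minimal sketch of the coarse stage under stated assumptions: SIFT keypoints are detected on single 8-bit bands, and putative matches are verified by correlating the full spectral signatures at the matched pixels before fitting an affine model. The spectral-consistency check and its threshold are illustrative, not the paper's exact matching rule.

```python
import cv2
import numpy as np

def coarse_register(ref_band, sen_band, ref_cube, sen_cube, corr_min=0.95):
    """ref_band/sen_band: 8-bit grayscale bands; ref_cube/sen_cube: (H, W, B) cubes.
    Returns a 2x3 affine matrix mapping reference coordinates to sensed coordinates."""
    sift = cv2.SIFT_create()
    k1, d1 = sift.detectAndCompute(ref_band, None)
    k2, d2 = sift.detectAndCompute(sen_band, None)
    matches = cv2.BFMatcher().match(d1, d2)
    src, dst = [], []
    for m in matches:
        (x1, y1), (x2, y2) = k1[m.queryIdx].pt, k2[m.trainIdx].pt
        s1 = ref_cube[int(y1), int(x1)]
        s2 = sen_cube[int(y2), int(x2)]
        if np.corrcoef(s1, s2)[0, 1] > corr_min:   # spectral consistency check
            src.append((x1, y1)); dst.append((x2, y2))
    M, _ = cv2.estimateAffinePartial2D(np.float32(src), np.float32(dst))
    return M
```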

21 pages, 16064 KiB  
Article
A Novel 3D Magnetic Resonance Imaging Registration Framework Based on the Swin-Transformer UNet+ Model with 3D Dynamic Snake Convolution Scheme
by Yaolong Han, Lei Wang, Zizhen Huang, Yukun Zhang and Xiao Zheng
J. Imaging 2025, 11(2), 54; https://doi.org/10.3390/jimaging11020054 - 11 Feb 2025
Viewed by 1500
Abstract
Transformer-based image registration methods have achieved notable success, but they still face challenges, such as difficulties in representing both global and local features, the inability of standard convolution operations to focus on key regions, and inefficiencies in restoring global context using the decoder. To address these issues, we extended the Swin-UNet architecture and incorporated dynamic snake convolution (DSConv) into the model, expanding it into three dimensions. This improvement enables the model to better capture spatial information at different scales, enhancing its adaptability to complex anatomical structures and their intricate components. Additionally, multi-scale dense skip connections were introduced to mitigate the spatial information loss caused by downsampling, enhancing the model’s ability to capture both global and local features. We also introduced a novel optimization-based weakly supervised strategy, which iteratively refines the deformation field generated during registration, enabling the model to produce more accurate registered images. Building on these innovations, we proposed OSS DSC-STUNet+ (Swin-UNet+ with 3D dynamic snake convolution). Experimental results on the IXI, OASIS, and LPBA40 brain MRI datasets demonstrated up to a 16.3% improvement in Dice coefficient compared to five classical methods. The model exhibits outstanding performance in terms of registration accuracy, efficiency, and feature preservation.
(This article belongs to the Section Image and Video Processing)
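
Since results here are reported as Dice coefficients, the standard per-label Dice computation for two segmentation maps is sketched below; this is the generic formulation, not the paper's evaluation code.

```python
import numpy as np

def dice(seg_a, seg_b, label):
    """Dice overlap for one anatomical label in two integer segmentation maps."""
    a, b = seg_a == label, seg_b == label
    return float(2.0 * np.logical_and(a, b).sum() / (a.sum() + b.sum() + 1e-8))

def mean_dice(seg_a, seg_b, labels):
    """Average Dice over a set of labels, as typically reported for brain MRI."""
    return float(np.mean([dice(seg_a, seg_b, l) for l in labels]))
```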

25 pages, 13698 KiB  
Article
Self-Supervised Foundation Model for Template Matching
by Anton Hristov, Dimo Dimov and Maria Nisheva-Pavlova
Big Data Cogn. Comput. 2025, 9(2), 38; https://doi.org/10.3390/bdcc9020038 - 11 Feb 2025
Viewed by 1583
Abstract
Finding a template location in a query image is a fundamental problem in many computer vision applications, such as localization of known objects, image registration, image matching, and object tracking. Currently available methods fail when training data are insufficient or when the images exhibit large texture variations, differing modalities, or weak visual features, which limits their application to real-world tasks. We introduce the Self-Supervised Foundation Model for Template Matching (Self-TM), a novel end-to-end approach to self-supervised learning of template matching. The idea behind Self-TM is to learn hierarchical features incorporating localization properties from images without any annotations. Going deeper into the convolutional neural network (CNN), the filters react to increasingly complex structures and their receptive fields grow, which loses the localization information preserved in the early layers. Hierarchically propagating the last layers back to the first layer therefore yields precise template localization. Due to its zero-shot generalization capabilities on tasks such as image retrieval, dense template matching, and sparse image matching, our pre-trained model can be classified as a foundation model.
(This article belongs to the Special Issue Perception and Detection of Intelligent Vision)
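
Self-TM propagates learned hierarchical features; the sketch below mirrors only the coarse-to-fine localization idea with raw-intensity NCC in OpenCV (locate at low resolution, refine in a small full-resolution window). It is an illustrative stand-in, not the model itself.

```python
import cv2

def coarse_to_fine(query, template, scale=4, win=16):
    """Locate template coarsely at 1/scale resolution, then refine the peak in a
    small full-resolution window around the upscaled coarse estimate."""
    q_s = cv2.resize(query, None, fx=1 / scale, fy=1 / scale)
    t_s = cv2.resize(template, None, fx=1 / scale, fy=1 / scale)
    _, _, _, loc = cv2.minMaxLoc(cv2.matchTemplate(q_s, t_s, cv2.TM_CCOEFF_NORMED))
    x0, y0 = loc[0] * scale, loc[1] * scale
    h, w = template.shape[:2]
    y1, y2 = max(0, y0 - win), min(query.shape[0], y0 + h + win)
    x1, x2 = max(0, x0 - win), min(query.shape[1], x0 + w + win)
    _, _, _, fine = cv2.minMaxLoc(cv2.matchTemplate(query[y1:y2, x1:x2],
                                                    template, cv2.TM_CCOEFF_NORMED))
    return x1 + fine[0], y1 + fine[1]   # top-left corner at full resolution
```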

16 pages, 4878 KiB  
Technical Note
A Robust Digital Elevation Model-Based Registration Method for Mini-RF/Mini-SAR Images
by Zihan Xu, Fei Zhao, Pingping Lu, Yao Gao, Tingyu Meng, Yanan Dang, Mofei Li and Robert Wang
Remote Sens. 2025, 17(4), 613; https://doi.org/10.3390/rs17040613 - 11 Feb 2025
Viewed by 778
Abstract
SAR data from the Lunar Reconnaissance Orbiter’s (LRO) Mini-RF and Chandrayaan-1’s Mini-SAR provide valuable insights into the properties of the lunar surface. However, public lunar SAR data products are not properly registered and are limited by localization issues. Existing registration methods for Earth SAR have proven inadequately robust for lunar data registration, and current research on lunar SAR methods has not yet focused on producing globally registered datasets. To solve these problems, this article introduces a robust automatic registration method tailored for S-band Level-1 Mini-RF and Mini-SAR data with the assistance of a lunar DEM. A simulated SAR image based on real lunar DEM data is first generated to assist the registration work; an offset calculation approach based on normalized cross-correlation (NCC) and specific processing, including background removal, is then proposed to achieve registration between the simulated image and the real image. When applied to Mini-RF and Mini-SAR images, the method exhibits high robustness and good accuracy, producing fully registered datasets. After processing with the proposed method, the average error between Mini-RF images and DEM references was reduced from approximately 3000 m to about 100 m. To further explore the improvements enabled by the proposed method, the registered lunar SAR datasets are used for further analysis, including a review of the circular polarization ratio (CPR) characteristics of anomalous craters.
(This article belongs to the Section Engineering Remote Sensing)
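
A minimal brute-force sketch of NCC-based offset estimation between a simulated and a real image; the background-removal and other specific processing steps the paper describes are omitted, and the exhaustive shift search is for clarity rather than speed.

```python
import numpy as np

def ncc_offset(real, sim, max_shift=50):
    """Search integer (dy, dx) shifts and return the one maximizing the zero-mean
    normalized cross-correlation of the overlapping regions."""
    def ncc(a, b):
        a = a - a.mean(); b = b - b.mean()
        return float((a * b).sum() /
                     (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))
    h, w = real.shape
    best_score, best_shift = -2.0, (0, 0)
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            a = real[max(0, dy):h + min(0, dy), max(0, dx):w + min(0, dx)]
            b = sim[max(0, -dy):h + min(0, -dy), max(0, -dx):w + min(0, -dx)]
            score = ncc(a, b)
            if score > best_score:
                best_score, best_shift = score, (dy, dx)
    return best_shift, best_score
```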

22 pages, 4780 KiB  
Article
A Robust Method for Real Time Intraoperative 2D and Preoperative 3D X-Ray Image Registration Based on an Enhanced Swin Transformer Framework
by Wentao Ye, Jianghong Wu, Wei Zhang, Liyang Sun, Xue Dong and Shuogui Xu
Bioengineering 2025, 12(2), 114; https://doi.org/10.3390/bioengineering12020114 - 26 Jan 2025
Viewed by 1203
Abstract
In image-guided surgery (IGS) practice, combining intraoperative 2D X-ray images with preoperative 3D X-ray images from computed tomography (CT) enables the rapid and accurate localization of lesions, which allows for more minimally invasive and efficient surgery and reduces the risk of secondary injuries to nerves and vessels. Conventional optimization-based methods for 2D X-ray and 3D CT matching are limited in speed and precision due to non-convex optimization spaces and a constrained searching range. Recently, deep learning (DL) approaches have demonstrated remarkable proficiency in solving complex nonlinear 2D–3D registration. In this paper, a fast and robust DL-based registration method is proposed that takes an intraoperative 2D X-ray image as input, compares it with the preoperative 3D CT, and outputs their relative pose in x, y, z and pitch, yaw, roll. The method employs a dual-channel Swin transformer feature extractor equipped with attention mechanisms and a feature pyramid to facilitate the correlation between features of the 2D X-ray and the anatomical pose of the CT. Tests on three different regions of interest acquired from open-source datasets show that our method can achieve high pose estimation accuracy (mean rotation and translation errors of 0.142° and 0.362 mm, respectively) in a short time (0.02 s). Robustness tests indicate that the proposed method maintains zero registration failures across varying levels of noise. This generalizable learning-based 2D (X-ray) and 3D (CT) registration algorithm has promising applications in surgical navigation, targeted radiotherapy, and other clinical operations, with substantial potential for enhancing the accuracy and efficiency of image-guided surgery.
(This article belongs to the Section Biosignal Processing)
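
A small sketch of how rotation/translation errors like those reported above can be computed from predicted and ground-truth (x, y, z, pitch, yaw, roll) poses using SciPy; the Euler-angle convention is an assumption.

```python
import numpy as np
from scipy.spatial.transform import Rotation as R

def pose_errors(pred, gt):
    """pred, gt: (x, y, z, pitch, yaw, roll) with angles in degrees.
    Returns (translation error in the same units as x/y/z,
             geodesic rotation error in degrees)."""
    t_err = float(np.linalg.norm(np.asarray(pred[:3]) - np.asarray(gt[:3])))
    r_pred = R.from_euler("xyz", pred[3:], degrees=True)   # convention assumed
    r_gt = R.from_euler("xyz", gt[3:], degrees=True)
    r_err = float(np.degrees((r_pred.inv() * r_gt).magnitude()))
    return t_err, r_err

print(pose_errors((0, 0, 0, 10, 0, 0), (0.3, 0, 0.2, 10.1, 0, 0)))
```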
