Search Results (246)

Search Parameters:
Keywords = local image registration

23 pages, 24301 KiB  
Article
Robust Optical and SAR Image Registration Using Weighted Feature Fusion
by Ao Luo, Anxi Yu, Yongsheng Zhang, Wenhao Tong and Huatao Yu
Remote Sens. 2025, 17(15), 2544; https://doi.org/10.3390/rs17152544 - 22 Jul 2025
Viewed by 302
Abstract
Image registration constitutes the fundamental basis for the joint interpretation of synthetic aperture radar (SAR) and optical images. However, robust image registration remains challenging due to significant regional heterogeneity in remote sensing scenes (e.g., co-existing urban and marine areas within a single image). To overcome this challenge, this article proposes a novel optical–SAR image registration method named Gradient and Standard Deviation Feature Weighted Fusion (GDWF). First, a Block-local standard deviation (Block-LSD) operator is proposed to extract block-based feature points with regional adaptability. Subsequently, a dual-modal feature description is developed, constructing both gradient-based descriptors and local standard deviation (LSD) descriptors for the neighborhoods surrounding the detected feature points. To further enhance matching robustness, a confidence-weighted feature fusion strategy is proposed. By establishing a reliability evaluation model for similarity measurement maps, the contribution weights of gradient features and LSD features are dynamically optimized, ensuring adaptive performance under varying conditions. To verify the effectiveness of the method, different optical and SAR datasets are used to compare it with the state-of-the-art algorithms MOGF, CFOG, and FED-HOPC. The experimental results demonstrate that the proposed GDWF algorithm achieves the best performance in terms of registration accuracy and robustness among all compared methods, effectively handling optical–SAR image pairs with significant regional heterogeneity.
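
As a rough illustration of the confidence-weighted fusion step, the Python sketch below weights two precomputed similarity maps by a peak-sharpness confidence proxy. The proxy and all names are assumptions for illustration; the paper's actual reliability evaluation model is not reproduced here.

```python
import numpy as np

def peak_sharpness(sim):
    """Confidence proxy: how far the best score stands out from the map's mean.
    (Assumed reliability measure; GDWF's model may differ.)"""
    return (sim.max() - sim.mean()) / (sim.std() + 1e-8)

def fuse_similarity(sim_grad, sim_lsd):
    """Fuse gradient-based and LSD-based similarity maps with dynamic weights."""
    c_g, c_l = peak_sharpness(sim_grad), peak_sharpness(sim_lsd)
    w_g = c_g / (c_g + c_l + 1e-8)
    fused = w_g * sim_grad + (1.0 - w_g) * sim_lsd
    return fused, np.unravel_index(int(fused.argmax()), fused.shape)

# Usage: similarity maps from matching a template over a search window
rng = np.random.default_rng(0)
fused, best_offset = fuse_similarity(rng.random((64, 64)), rng.random((64, 64)))
```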

26 pages, 6798 KiB  
Article
Robust Optical and SAR Image Matching via Attention-Guided Structural Encoding and Confidence-Aware Filtering
by Qi Kang, Jixian Zhang, Guoman Huang and Fei Liu
Remote Sens. 2025, 17(14), 2501; https://doi.org/10.3390/rs17142501 - 18 Jul 2025
Viewed by 391
Abstract
Accurate feature matching between optical and synthetic aperture radar (SAR) images remains a significant challenge in remote sensing due to substantial modality discrepancies in texture, intensity, and geometric structure. In this study, we propose an attention-context-aware deep learning framework (ACAMatch) for robust and efficient optical–SAR image registration. The proposed method integrates a structure-enhanced feature extractor, RS2FNet, which combines dual-stage Res2Net modules with a bi-level routing attention mechanism to capture multi-scale local textures and global structural semantics. A context-aware matching module refines correspondences through self- and cross-attention, coupled with a confidence-driven early-exit pruning strategy to reduce computational cost while maintaining accuracy. Additionally, a match-aware multi-task loss function jointly enforces spatial consistency, affine invariance, and structural coherence for end-to-end optimization. Experiments on public datasets (SEN1-2 and WHU-OPT-SAR) and a self-collected Gaofen (GF) dataset demonstrate that ACAMatch significantly outperforms existing state-of-the-art methods in terms of the number of correct matches, matching accuracy, and inference speed, especially under challenging conditions such as resolution differences and severe structural distortions. These results indicate the effectiveness and generalizability of the proposed approach for multimodal image registration, making ACAMatch a promising solution for remote sensing applications such as change detection and multi-sensor data fusion.
(This article belongs to the Special Issue Advancements of Vision-Language Models (VLMs) in Remote Sensing)
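
A minimal sketch of what confidence-aware filtering with an early-exit test might look like, assuming per-correspondence confidence scores are already available; the thresholds and exit rule are illustrative assumptions, not ACAMatch's actual design.

```python
import numpy as np

def confidence_filter(matches, scores, tau=0.9):
    """Drop putative correspondences whose confidence falls below tau."""
    keep = scores >= tau
    return matches[keep], scores[keep]

def can_exit_early(scores, tau=0.9, frac=0.8):
    """Assumed pruning rule: stop refining once most candidates are confident."""
    return float((scores >= tau).mean()) >= frac

# matches: (N, 4) array of (x1, y1, x2, y2); scores: (N,) confidences in [0, 1]
matches, scores = np.random.rand(200, 4), np.random.rand(200)
good, good_scores = confidence_filter(matches, scores, tau=0.8)
```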

27 pages, 86462 KiB  
Article
SAR Image Registration Based on SAR-SIFT and Template Matching
by Shichong Liu, Xiaobo Deng, Chun Liu and Yongchao Cheng
Remote Sens. 2025, 17(13), 2216; https://doi.org/10.3390/rs17132216 - 27 Jun 2025
Viewed by 367
Abstract
Accurate image registration is essential for synthetic aperture radar (SAR) applications such as change detection, image fusion, and deformation monitoring. However, SAR image registration faces challenges including speckle noise, low-texture regions, and the geometric transformations caused by topographic relief under side-looking radar imaging. To address these issues, this paper proposes a novel two-stage registration method consisting of pre-registration and fine registration. In the pre-registration stage, the scale-invariant feature transform for synthetic aperture radar (SAR-SIFT) algorithm is integrated into an iterative optimization framework to eliminate large-scale geometric discrepancies, ensuring a coarse but reliable initial alignment. In the fine registration stage, a novel similarity measure is introduced by combining frequency-domain phase congruency and spatial-domain gradient features, which enhances the robustness and accuracy of template matching, especially in edge-rich regions. To handle topographic relief in the SAR images, an adaptive local stretching transformation strategy is proposed to correct undulating areas. Experiments on five pairs of SAR images containing flat and undulating regions show that the proposed method achieves initial alignment errors below 10 pixels and final registration errors below 1 pixel. Compared with other methods, our approach obtains more correct matching pairs (up to 100+ per image pair), higher registration precision, and improved robustness under complex terrains. These results validate the accuracy and effectiveness of the proposed registration framework.
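
The fine stage combines phase congruency with spatial gradients; the hedged sketch below shows only the spatial-gradient half, using plain NCC template matching on gradient-magnitude images with OpenCV. The phase-congruency term is omitted, so this is a simplified stand-in rather than the paper's similarity measure.

```python
import cv2

def gradient_magnitude(img):
    """Per-pixel gradient magnitude via Sobel derivatives (8-bit grayscale input)."""
    gx = cv2.Sobel(img, cv2.CV_32F, 1, 0, ksize=3)
    gy = cv2.Sobel(img, cv2.CV_32F, 0, 1, ksize=3)
    return cv2.magnitude(gx, gy)

def match_by_gradient(reference, template):
    """NCC template matching on gradient-magnitude images."""
    score = cv2.matchTemplate(gradient_magnitude(reference),
                              gradient_magnitude(template),
                              cv2.TM_CCOEFF_NORMED)
    _, best, _, loc = cv2.minMaxLoc(score)
    return loc, best   # top-left corner of the best match and its score
```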

28 pages, 11793 KiB  
Article
Unsupervised Multimodal UAV Image Registration via Style Transfer and Cascade Network
by Xiaoye Bi, Rongkai Qie, Chengyang Tao, Zhaoxiang Zhang and Yuelei Xu
Remote Sens. 2025, 17(13), 2160; https://doi.org/10.3390/rs17132160 - 24 Jun 2025
Cited by 1 | Viewed by 398
Abstract
Cross-modal image registration for unmanned aerial vehicle (UAV) platforms presents significant challenges due to large-scale deformations, distinct imaging mechanisms, and pronounced modality discrepancies. This paper proposes a novel multi-scale cascaded registration network based on style transfer that achieves superior performance: up to 67% reduction in mean squared error (from 0.0106 to 0.0068), 9.27% enhancement in normalized cross-correlation, 26% improvement in local normalized cross-correlation, and 8% increase in mutual information compared to state-of-the-art methods. The architecture integrates a cross-modal style transfer network (CSTNet) that transforms visible images into pseudo-infrared representations to unify modality characteristics, and a multi-scale cascaded registration network (MCRNet) that performs progressive spatial alignment across multiple resolution scales using diffeomorphic deformation modeling to ensure smooth and invertible transformations. A self-supervised learning paradigm based on image reconstruction eliminates reliance on manually annotated data while maintaining registration accuracy through synthetic deformation generation. Extensive experiments on the LLVIP dataset demonstrate the method’s robustness under challenging conditions involving large-scale transformations, with ablation studies confirming that style transfer contributes 28% MSE improvement and diffeomorphic registration prevents 10.6% performance degradation. The proposed approach provides a robust solution for cross-modal image registration in dynamic UAV environments, offering significant implications for downstream applications such as target detection, tracking, and surveillance.
(This article belongs to the Special Issue Advances in Deep Learning Approaches: UAV Data Analysis)
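
For readers wanting to check metrics like those reported above, here is a minimal sketch of the MSE and (global) normalized cross-correlation measures on a registered/reference image pair; it is a generic formulation, not the paper's evaluation code.

```python
import numpy as np

def mse(a, b):
    """Mean squared error between two images of equal shape."""
    return float(np.mean((a.astype(np.float64) - b.astype(np.float64)) ** 2))

def ncc(a, b):
    """Zero-mean normalized cross-correlation in [-1, 1]."""
    a = a - a.mean(); b = b - b.mean()
    return float((a * b).sum() / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))
```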

28 pages, 3438 KiB  
Article
Optimizing Remote Sensing Image Retrieval Through a Hybrid Methodology
by Sujata Alegavi and Raghvendra Sedamkar
J. Imaging 2025, 11(6), 179; https://doi.org/10.3390/jimaging11060179 - 28 May 2025
Viewed by 568
Abstract
The contemporary challenge in remote sensing lies in the precise retrieval of increasingly abundant and high-resolution remotely sensed images (RS images) stored in expansive data warehouses. The heightened spatial and spectral resolutions, coupled with accelerated image acquisition rates, necessitate advanced tools for effective data management, retrieval, and exploitation. The classification of large-sized images at the pixel level generates substantial data, escalating the workload and search space for similarity measurement. Semantic-based image retrieval remains an open problem due to limitations in current artificial intelligence techniques. Furthermore, on-board storage constraints compel the application of numerous compression algorithms to reduce storage space, intensifying the difficulty of retrieving substantial, sensitive, and target-specific data. This research proposes an innovative hybrid approach to enhance the retrieval of remotely sensed images. The approach leverages multilevel classification and multiscale feature extraction strategies to enhance performance. The retrieval system comprises two primary phases: database building and retrieval. Initially, the proposed Multiscale Multiangle Mean-shift with Breaking Ties (MSMA-MSBT) algorithm selects informative unlabeled samples for hyperspectral and synthetic aperture radar images through an active learning strategy. Addressing the scaling and rotation variations in image capture, a flexible and dynamic algorithm, modified Deep Image Registration using Dynamic Inlier (IRDI), is introduced for image registration. Given the complexity of remote sensing images, feature extraction occurs at two levels. Low-level features are extracted using the modified Multiscale Multiangle Completed Local Binary Pattern (MSMA-CLBP) algorithm to capture local texture features, while high-level features are obtained through a hybrid CNN structure combining pretrained networks (Alexnet, Caffenet, VGG-S, VGG-M, VGG-F, VGG-VDD-16, VGG-VDD-19) and a fully connected dense network. Fusion of low- and high-level features facilitates final class distinction, with soft thresholding mitigating misclassification issues. A region-based similarity measurement enhances matching percentages. Results, evaluated on high-resolution remote sensing datasets, demonstrate the effectiveness of the proposed method, outperforming traditional algorithms with an average accuracy of 86.66%. The hybrid retrieval system exhibits substantial improvements in classification accuracy, similarity measurement, and computational efficiency compared to state-of-the-art scene classification and retrieval methods.
(This article belongs to the Topic Computational Intelligence in Remote Sensing: 2nd Edition)
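
A minimal sketch of the final fusion-and-retrieval step under simple assumptions: low- and high-level descriptors are concatenated, L2-normalized, and ranked by cosine similarity. The paper's region-based similarity measurement and soft thresholding are not reproduced.

```python
import numpy as np

def fuse(low, high):
    """Concatenate low-level (e.g., CLBP) and high-level (CNN) descriptors, L2-normalized."""
    v = np.concatenate([low, high])
    return v / (np.linalg.norm(v) + 1e-12)

def retrieve(query_vec, db_vecs, k=10):
    """Rank database rows (each built with fuse(), so pre-normalized) by cosine similarity."""
    return np.argsort(-(db_vecs @ query_vec))[:k]
```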

27 pages, 9977 KiB  
Article
Mergeable Probabilistic Voxel Mapping for LiDAR–Inertial–Visual Odometry
by Balong Wang, Nassim Bessaad, Huiying Xu, Xinzhong Zhu and Hongbo Li
Electronics 2025, 14(11), 2142; https://doi.org/10.3390/electronics14112142 - 24 May 2025
Cited by 1 | Viewed by 813
Abstract
To address the limitations of existing LiDAR–visual fusion methods in adequately accounting for map uncertainties induced by LiDAR measurement noise, this paper introduces a LiDAR–inertial–visual odometry framework leveraging mergeable probabilistic voxel mapping. The method innovatively employs probabilistic voxel models to characterize uncertainties in environmental geometric plane features and optimizes computational efficiency through a voxel merging strategy. Additionally, it integrates color information from cameras to further enhance localization accuracy. Specifically, in the LiDAR–inertial odometry (LIO) subsystem, a probabilistic voxel plane model is constructed for LiDAR point clouds to explicitly represent measurement noise uncertainty, thereby improving the accuracy and robustness of point cloud registration. A voxel merging strategy based on the union-find algorithm is introduced to merge coplanar voxel planes, reducing computational load. In the visual–inertial odometry (VIO) subsystem, image tracking points are generated through a global map projection, and outlier points are eliminated using a random sample consensus algorithm based on a dynamic Bayesian network. Finally, state estimation accuracy is enhanced by jointly optimizing frame-to-frame reprojection errors and frame-to-map RGB color errors. Experimental results demonstrate that the proposed method achieves root mean square errors (RMSEs) of absolute trajectory error at 0.478 m and 0.185 m on the M2DGR and NTU-VIRAL datasets, respectively, while attaining real-time performance with an average processing time of 39.19 ms per frame on the NTU-VIRAL dataset. Compared to state-of-the-art approaches, our method exhibits significant improvements in both accuracy and computational efficiency.
(This article belongs to the Special Issue Advancements in Robotics: Perception, Manipulation, and Interaction)
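
A minimal sketch of the union-find voxel-merging idea, assuming each voxel already carries a fitted plane (unit normal plus offset); the coplanarity test and tolerances are illustrative assumptions, not the paper's probabilistic criterion.

```python
import numpy as np

class UnionFind:
    """Disjoint-set structure used to group voxels into merged plane clusters."""
    def __init__(self, n):
        self.parent = list(range(n))
    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x
    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra

def merge_coplanar(normals, dists, adjacency, angle_tol=0.05, dist_tol=0.1):
    """Union adjacent voxels whose fitted planes nearly coincide.
    normals: (n, 3) unit normals; dists: (n,) plane offsets; adjacency: (i, j) pairs."""
    uf = UnionFind(len(normals))
    for i, j in adjacency:
        if (1.0 - abs(float(np.dot(normals[i], normals[j]))) < angle_tol
                and abs(dists[i] - dists[j]) < dist_tol):
            uf.union(i, j)
    return [uf.find(i) for i in range(len(normals))]  # cluster id per voxel
```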

26 pages, 9328 KiB  
Article
Global Optical and SAR Image Registration Method Based on Local Distortion Division
by Bangjie Li, Dongdong Guan, Yuzhen Xie, Xiaolong Zheng, Zhengsheng Chen, Lefei Pan, Weiheng Zhao and Deliang Xiang
Remote Sens. 2025, 17(9), 1642; https://doi.org/10.3390/rs17091642 - 6 May 2025
Viewed by 596
Abstract
Variations in terrain elevation cause images acquired under different imaging modalities to deviate from a linear mapping relationship. This effect is particularly pronounced between optical and SAR images, where the range-based imaging mechanism of SAR sensors leads to significant local geometric distortions, such as perspective shrinkage and occlusion. As a result, it becomes difficult to represent the spatial correspondence between optical and SAR images using a single geometric model. To address this challenge, we propose a global optical-SAR image registration method that leverages local distortion characteristics. Specifically, we introduce a Superpixel-based Local Distortion Division (SLDD) method, which defines superpixel region features and segments the image into local distortion and normal regions by computing the Mahalanobis distance between superpixel features. We further design a Multi-Feature Fusion Capsule Network (MFFCN) that integrates shallow salient features with deep structural details, reconstructing the dimensions of digital capsules to generate feature descriptors encompassing texture, phase, structure, and amplitude information. This design effectively mitigates the information loss and feature degradation problems caused by pooling operations in conventional convolutional neural networks (CNNs). Additionally, a hard negative mining loss is incorporated to further enhance feature discriminability. Feature descriptors are extracted separately from regions with different distortion levels, and corresponding transformation models are built for local registration. Finally, the local registration results are fused to generate a globally aligned image. Experimental results on public datasets demonstrate that the proposed method achieves superior performance over state-of-the-art (SOTA) approaches in terms of Root Mean Squared Error (RMSE), Correct Match Number (CMN), Distribution of Matched Points (Scat), Edge Fidelity (EF), and overall visual quality.
(This article belongs to the Special Issue Temporal and Spatial Analysis of Multi-Source Remote Sensing Images)
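
A hedged sketch of the SLDD-style region split: superpixels whose feature vectors lie far from the global feature distribution (by Mahalanobis distance) are flagged as local distortion regions. The feature definition and threshold here are assumptions for illustration.

```python
import numpy as np

def split_regions(features, threshold=3.0):
    """features: (n_superpixels, d) array of per-superpixel descriptors.
    Returns a boolean mask, True where a superpixel is flagged as locally distorted."""
    mean = features.mean(axis=0)
    cov = np.cov(features, rowvar=False) + 1e-6 * np.eye(features.shape[1])
    inv = np.linalg.inv(cov)
    d = features - mean
    dists = np.sqrt(np.einsum("ij,jk,ik->i", d, inv, d))  # Mahalanobis distances
    return dists > threshold
```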

19 pages, 12128 KiB  
Article
Marker-Less Navigation System for Anterior Cruciate Ligament Reconstruction with 3D Femoral Analysis and Arthroscopic Guidance
by Shuo Wang, Weili Shi, Shuai Yang, Jiahao Cui and Qinwei Guo
Bioengineering 2025, 12(5), 464; https://doi.org/10.3390/bioengineering12050464 - 27 Apr 2025
Viewed by 541
Abstract
Accurate femoral tunnel positioning is crucial for successful anterior cruciate ligament reconstruction (ACLR), yet traditional arthroscopic techniques face significant challenges in spatial orientation and precise anatomical localization. This study presents a novel marker-less computer-assisted navigation system that integrates three-dimensional femoral modeling with real-time arthroscopic guidance. The system employs advanced image processing techniques for accurate condyle segmentation and implements the Bernard and Hertel (BH) grid system for standardized positioning. A curvature-based feature extraction approach precisely identifies the capsular line reference (CLR) on the lateral condyle surface, forming the foundation for establishing the BH reference grid. The system’s two-stage registration framework, combining SIFT-ICP algorithms, achieves accurate alignment between preoperative models and arthroscopic views. Validation results from expert surgeons demonstrated high precision, with 71.5% of test groups achieving acceptable or excellent performance standards (mean deviation distances: 1.12–1.86 mm). Unlike existing navigation solutions, our system maintains standard surgical workflow without requiring additional surgical instruments or markers, offering an efficient and minimally invasive approach to enhance ACLR precision. This innovation bridges the gap between preoperative planning and intraoperative execution, potentially improving surgical outcomes through standardized tunnel positioning.
(This article belongs to the Special Issue Advances in Medical 3D Vision: Voxels and Beyond)
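
The ICP half of a SIFT-ICP pipeline reduces to repeated rigid fits between corresponding point sets; below is a minimal point-to-point ICP sketch using the Kabsch (SVD) solution. It is purely illustrative, not the system's implementation.

```python
import numpy as np

def kabsch(P, Q):
    """Best-fit rotation R and translation t mapping points P onto Q (SVD solution)."""
    cP, cQ = P.mean(0), Q.mean(0)
    U, _, Vt = np.linalg.svd((P - cP).T @ (Q - cQ))
    d = np.sign(np.linalg.det(Vt.T @ U.T))       # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return R, cQ - R @ cP

def icp(P, Q, iters=20):
    """Minimal point-to-point ICP with brute-force nearest neighbours.
    P, Q: (n, 3) and (m, 3) point clouds; returns P aligned onto Q."""
    for _ in range(iters):
        nn = ((P[:, None, :] - Q[None, :, :]) ** 2).sum(-1).argmin(1)
        R, t = kabsch(P, Q[nn])
        P = P @ R.T + t
    return P
```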

22 pages, 2872 KiB  
Article
Wavelet-Guided Multi-Scale ConvNeXt for Unsupervised Medical Image Registration
by Xuejun Zhang, Aobo Xu, Ganxin Ouyang, Zhengrong Xu, Shaofei Shen, Wenkang Chen, Mingxian Liang, Guiqi Zhang, Jiashun Wei, Xiangrong Zhou and Dongbo Wu
Bioengineering 2025, 12(4), 406; https://doi.org/10.3390/bioengineering12040406 - 11 Apr 2025
Cited by 2 | Viewed by 974
Abstract
Medical image registration is essential in clinical practices such as surgical navigation and image-guided diagnosis. The Transformer architecture of TransMorph demonstrates better accuracy in non-rigid registration tasks. However, its weaker spatial locality priors necessitate large-scale training datasets and a large number of parameters, which conflict with the limited annotated data and real-time demands of clinical workflows. Moreover, traditional downsampling and upsampling always degrade high-frequency anatomical features such as tissue boundaries or small lesions. We propose WaveMorph, a wavelet-guided multi-scale ConvNeXt method for unsupervised medical image registration. A novel multi-scale wavelet feature fusion downsampling module is proposed by integrating the ConvNeXt architecture with Haar wavelet lossless decomposition to extract and fuse features from eight frequency sub-images using multi-scale convolution kernels. Additionally, a lightweight dynamic upsampling module is introduced in the decoder to reconstruct fine-grained anatomical structures. WaveMorph integrates the inductive bias of CNNs with the advantages of Transformers, effectively mitigating topological distortions caused by spatial information loss while supporting real-time inference. In both atlas-to-patient (IXI) and inter-patient (OASIS) registration tasks, WaveMorph demonstrates state-of-the-art performance, achieving Dice scores of 0.779 ± 0.015 and 0.824 ± 0.021, respectively, and real-time inference (0.072 s/image), validating the effectiveness of our model in medical image registration.
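
A minimal sketch of the one-level 3D Haar split that yields the eight frequency sub-images the module fuses: averaging/differencing along each axis in turn is the standard lossless Haar step, though WaveMorph's fusion layers are not shown here.

```python
import numpy as np

def haar3d(x):
    """One-level 3D Haar decomposition: average/difference along each axis in turn,
    yielding eight half-resolution sub-volumes (LLL, LLH, ..., HHH)."""
    def split(a, axis):
        even = np.take(a, range(0, a.shape[axis], 2), axis=axis)
        odd = np.take(a, range(1, a.shape[axis], 2), axis=axis)
        return (even + odd) / np.sqrt(2), (even - odd) / np.sqrt(2)
    subs = [x]
    for axis in range(3):
        subs = [band for a in subs for band in split(a, axis)]
    return subs  # 8 sub-volumes, e.g., stacked as channels for an encoder

vol = np.random.rand(32, 32, 32)
bands = haar3d(vol)            # each band has shape (16, 16, 16)
```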

26 pages, 8883 KiB  
Article
Enhancing Machine Learning Techniques in VSLAM for Robust Autonomous Unmanned Aerial Vehicle Navigation
by Hussam Rostum and József Vásárhelyi
Electronics 2025, 14(7), 1440; https://doi.org/10.3390/electronics14071440 - 2 Apr 2025
Viewed by 663
Abstract
This study introduces a real-time visual SLAM system designed for small indoor environments. The system demonstrates resilience against significant motion clutter and supports wide-baseline loop closing, re-localization, and automatic initialization. Leveraging state-of-the-art algorithms, the approach presented in this article utilizes adapted Oriented FAST and Rotated BRIEF (ORB) features for tracking, mapping, re-localization, and loop closing. In addition, the research uses an adaptive threshold to find putative feature matches, which provides efficient map initialization and accurate tracking. The task is to process visual information from the camera of a DJI Tello drone to construct an indoor map and estimate the camera trajectory. In a ’survival of the fittest’ style, the algorithms selectively pick adaptive points and keyframes for reconstruction. This leads to robustness and a concise, traceable map that develops as scene content emerges, making lifelong operation possible. The results show an improved RMSE (3.280) for the adaptive ORB algorithm with the adaptive threshold, whereas the standard ORB algorithm failed to complete the mapping process.
(This article belongs to the Special Issue Development and Advances in Autonomous Driving Technology)
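
A hedged sketch of adaptive-threshold ORB matching with OpenCV, where the Lowe ratio test loosens until enough putative matches survive; the adaptation schedule and the minimum-match count are assumptions, not the paper's exact scheme.

```python
import cv2

orb = cv2.ORB_create(nfeatures=2000)

def adaptive_orb_matches(img1, img2, min_matches=50):
    """Match ORB features between two grayscale frames, loosening the ratio
    test until enough putative matches survive (assumed adaptation rule)."""
    k1, d1 = orb.detectAndCompute(img1, None)
    k2, d2 = orb.detectAndCompute(img2, None)
    pairs = cv2.BFMatcher(cv2.NORM_HAMMING).knnMatch(d1, d2, k=2)
    pairs = [p for p in pairs if len(p) == 2]
    good = []
    for ratio in (0.7, 0.8, 0.9):          # progressively looser thresholds
        good = [m for m, n in pairs if m.distance < ratio * n.distance]
        if len(good) >= min_matches:
            break
    return k1, k2, good
```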

18 pages, 10219 KiB  
Article
Automatic Registration of Remote Sensing High-Resolution Hyperspectral Images Based on Global and Local Features
by Xiaorong Zhang, Siyuan Li, Zhongyang Xing, Binliang Hu and Xi Zheng
Remote Sens. 2025, 17(6), 1011; https://doi.org/10.3390/rs17061011 - 13 Mar 2025
Cited by 1 | Viewed by 705
Abstract
Automatic registration of remote sensing images is an important task, which requires the establishment of appropriate correspondence between the sensed image and the reference image. Satellite remote sensing is now trending towards high-resolution hyperspectral imaging. Ever more frequent revisits and higher image resolutions demand greater accuracy and real-time performance from automatic registration. The push-broom payload is affected by the push-broom stability of the satellite platform and by elevation changes of ground objects, so the acquired hyperspectral image may exhibit distortions such as stretching or shrinking in different parts of the image. To solve this problem, a new automatic registration strategy for remote sensing hyperspectral images was established that combines global and local image features, with registration carried out at two granularities: coarse-grained matching and fine-grained matching. The high-resolution spatial features are first employed for detecting scale-invariant features, while the spectral information is used for matching; the idea of image stitching is then employed to fuse the image after fine registration to obtain high-precision registration results. To verify the proposed algorithm, a simulated on-orbit push-broom imaging experiment was carried out to obtain hyperspectral images with local complex distortions under different lighting conditions. The simulation results show that the proposed remote sensing hyperspectral image registration algorithm is superior to existing automatic registration algorithms. Its advantages in registration accuracy and real-time performance give it broad prospects for application in satellite ground application systems.
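
A minimal sketch of the coarse stage under stated assumptions: SIFT keypoints are detected on single 8-bit bands, and putative matches are verified by correlating the full spectral signatures at the matched pixels before fitting an affine model. The spectral-consistency check and its threshold are illustrative, not the paper's exact matching rule.

```python
import cv2
import numpy as np

def coarse_register(ref_band, sen_band, ref_cube, sen_cube, corr_min=0.95):
    """ref_band/sen_band: 8-bit grayscale bands; ref_cube/sen_cube: (H, W, B) cubes.
    Returns a 2x3 affine matrix mapping reference coordinates to sensed coordinates."""
    sift = cv2.SIFT_create()
    k1, d1 = sift.detectAndCompute(ref_band, None)
    k2, d2 = sift.detectAndCompute(sen_band, None)
    matches = cv2.BFMatcher().match(d1, d2)
    src, dst = [], []
    for m in matches:
        (x1, y1), (x2, y2) = k1[m.queryIdx].pt, k2[m.trainIdx].pt
        s1 = ref_cube[int(y1), int(x1)]
        s2 = sen_cube[int(y2), int(x2)]
        if np.corrcoef(s1, s2)[0, 1] > corr_min:   # spectral consistency check
            src.append((x1, y1)); dst.append((x2, y2))
    M, _ = cv2.estimateAffinePartial2D(np.float32(src), np.float32(dst))
    return M
```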

21 pages, 16064 KiB  
Article
A Novel 3D Magnetic Resonance Imaging Registration Framework Based on the Swin-Transformer UNet+ Model with 3D Dynamic Snake Convolution Scheme
by Yaolong Han, Lei Wang, Zizhen Huang, Yukun Zhang and Xiao Zheng
J. Imaging 2025, 11(2), 54; https://doi.org/10.3390/jimaging11020054 - 11 Feb 2025
Viewed by 1500
Abstract
Transformer-based image registration methods have achieved notable success, but they still face challenges, such as difficulties in representing both global and local features, the inability of standard convolution operations to focus on key regions, and inefficiencies in restoring global context using the decoder. To address these issues, we extended the Swin-UNet architecture and incorporated dynamic snake convolution (DSConv) into the model, expanding it into three dimensions. This improvement enables the model to better capture spatial information at different scales, enhancing its adaptability to complex anatomical structures and their intricate components. Additionally, multi-scale dense skip connections were introduced to mitigate the spatial information loss caused by downsampling, enhancing the model’s ability to capture both global and local features. We also introduced a novel optimization-based weakly supervised strategy, which iteratively refines the deformation field generated during registration, enabling the model to produce more accurate registered images. Building on these innovations, we proposed OSS DSC-STUNet+ (Swin-UNet+ with 3D dynamic snake convolution). Experimental results on the IXI, OASIS, and LPBA40 brain MRI datasets demonstrated up to a 16.3% improvement in Dice coefficient compared to five classical methods. The model exhibits outstanding performance in terms of registration accuracy, efficiency, and feature preservation.
(This article belongs to the Section Image and Video Processing)
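
Since results here are reported as Dice coefficients, the standard per-label Dice computation for two segmentation maps is sketched below; this is the generic formulation, not the paper's evaluation code.

```python
import numpy as np

def dice(seg_a, seg_b, label):
    """Dice overlap for one anatomical label in two integer segmentation maps."""
    a, b = seg_a == label, seg_b == label
    return float(2.0 * np.logical_and(a, b).sum() / (a.sum() + b.sum() + 1e-8))

def mean_dice(seg_a, seg_b, labels):
    """Average Dice over a set of labels, as typically reported for brain MRI."""
    return float(np.mean([dice(seg_a, seg_b, l) for l in labels]))
```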

25 pages, 13698 KiB  
Article
Self-Supervised Foundation Model for Template Matching
by Anton Hristov, Dimo Dimov and Maria Nisheva-Pavlova
Big Data Cogn. Comput. 2025, 9(2), 38; https://doi.org/10.3390/bdcc9020038 - 11 Feb 2025
Viewed by 1583
Abstract
Finding a template location in a query image is a fundamental problem in many computer vision applications, such as localization of known objects, image registration, image matching, and object tracking. Currently available methods fail when training data are insufficient or when the images exhibit large texture variations, differing modalities, or weak visual features, which limits their application to real-world tasks. We introduce the Self-Supervised Foundation Model for Template Matching (Self-TM), a novel end-to-end approach to self-supervised learning of template matching. The idea behind Self-TM is to learn hierarchical features incorporating localization properties from images without any annotations. Going deeper into the convolutional neural network (CNN), the filters react to increasingly complex structures and their receptive fields grow, which loses the localization information preserved in the early layers. Hierarchically propagating the last layers back to the first layer therefore yields precise template localization. Due to its zero-shot generalization capabilities on tasks such as image retrieval, dense template matching, and sparse image matching, our pre-trained model can be classified as a foundation model.
(This article belongs to the Special Issue Perception and Detection of Intelligent Vision)
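
Self-TM propagates learned hierarchical features; the sketch below mirrors only the coarse-to-fine localization idea with raw-intensity NCC in OpenCV (locate at low resolution, refine in a small full-resolution window). It is an illustrative stand-in, not the model itself.

```python
import cv2

def coarse_to_fine(query, template, scale=4, win=16):
    """Locate template coarsely at 1/scale resolution, then refine the peak in a
    small full-resolution window around the upscaled coarse estimate."""
    q_s = cv2.resize(query, None, fx=1 / scale, fy=1 / scale)
    t_s = cv2.resize(template, None, fx=1 / scale, fy=1 / scale)
    _, _, _, loc = cv2.minMaxLoc(cv2.matchTemplate(q_s, t_s, cv2.TM_CCOEFF_NORMED))
    x0, y0 = loc[0] * scale, loc[1] * scale
    h, w = template.shape[:2]
    y1, y2 = max(0, y0 - win), min(query.shape[0], y0 + h + win)
    x1, x2 = max(0, x0 - win), min(query.shape[1], x0 + w + win)
    _, _, _, fine = cv2.minMaxLoc(cv2.matchTemplate(query[y1:y2, x1:x2],
                                                    template, cv2.TM_CCOEFF_NORMED))
    return x1 + fine[0], y1 + fine[1]   # top-left corner at full resolution
```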

16 pages, 4878 KiB  
Technical Note
A Robust Digital Elevation Model-Based Registration Method for Mini-RF/Mini-SAR Images
by Zihan Xu, Fei Zhao, Pingping Lu, Yao Gao, Tingyu Meng, Yanan Dang, Mofei Li and Robert Wang
Remote Sens. 2025, 17(4), 613; https://doi.org/10.3390/rs17040613 - 11 Feb 2025
Viewed by 778
Abstract
SAR data from the Lunar Reconnaissance Orbiter’s (LRO) Mini-RF and Chandrayaan-1’s Mini-SAR provide valuable insights into the properties of the lunar surface. However, public lunar SAR data products are not properly registered and are limited by localization issues. Existing registration methods for Earth SAR have proven inadequately robust for lunar data registration, and current research on lunar SAR methods has not yet focused on producing globally registered datasets. To solve these problems, this article introduces a robust automatic registration method tailored for S-band Level-1 Mini-RF and Mini-SAR data with the assistance of a lunar DEM. A simulated SAR image based on real lunar DEM data is first generated to assist the registration work; an offset calculation approach based on normalized cross-correlation (NCC) and specific processing, including background removal, is then proposed to achieve registration between the simulated image and the real image. When applied to Mini-RF and Mini-SAR images, the method exhibits high robustness and good accuracy, producing fully registered datasets. After processing with the proposed method, the average error between Mini-RF images and DEM references was reduced from approximately 3000 m to about 100 m. To further explore the improvements enabled by the proposed method, the registered lunar SAR datasets are used for further analysis, including a review of the circular polarization ratio (CPR) characteristics of anomalous craters.
(This article belongs to the Section Engineering Remote Sensing)
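
A minimal brute-force sketch of NCC-based offset estimation between a simulated and a real image; the background-removal and other specific processing steps the paper describes are omitted, and the exhaustive shift search is for clarity rather than speed.

```python
import numpy as np

def ncc_offset(real, sim, max_shift=50):
    """Search integer (dy, dx) shifts and return the one maximizing the zero-mean
    normalized cross-correlation of the overlapping regions."""
    def ncc(a, b):
        a = a - a.mean(); b = b - b.mean()
        return float((a * b).sum() /
                     (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))
    h, w = real.shape
    best_score, best_shift = -2.0, (0, 0)
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            a = real[max(0, dy):h + min(0, dy), max(0, dx):w + min(0, dx)]
            b = sim[max(0, -dy):h + min(0, -dy), max(0, -dx):w + min(0, -dx)]
            score = ncc(a, b)
            if score > best_score:
                best_score, best_shift = score, (dy, dx)
    return best_shift, best_score
```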

22 pages, 4780 KiB  
Article
A Robust Method for Real Time Intraoperative 2D and Preoperative 3D X-Ray Image Registration Based on an Enhanced Swin Transformer Framework
by Wentao Ye, Jianghong Wu, Wei Zhang, Liyang Sun, Xue Dong and Shuogui Xu
Bioengineering 2025, 12(2), 114; https://doi.org/10.3390/bioengineering12020114 - 26 Jan 2025
Viewed by 1203
Abstract
In image-guided surgery (IGS) practice, combining intraoperative 2D X-ray images with preoperative 3D X-ray images from computed tomography (CT) enables the rapid and accurate localization of lesions, which allows for more minimally invasive and efficient surgery and reduces the risk of secondary injuries to nerves and vessels. Conventional optimization-based methods for 2D X-ray and 3D CT matching are limited in speed and precision due to non-convex optimization spaces and a constrained searching range. Recently, deep learning (DL) approaches have demonstrated remarkable proficiency in solving complex nonlinear 2D–3D registration. In this paper, a fast and robust DL-based registration method is proposed that takes an intraoperative 2D X-ray image as input, compares it with the preoperative 3D CT, and outputs their relative pose in x, y, z and pitch, yaw, roll. The method employs a dual-channel Swin transformer feature extractor equipped with attention mechanisms and a feature pyramid to facilitate the correlation between features of the 2D X-ray and the anatomical pose of the CT. Tests on three different regions of interest acquired from open-source datasets show that our method can achieve high pose estimation accuracy (mean rotation and translation errors of 0.142° and 0.362 mm, respectively) in a short time (0.02 s). Robustness tests indicate that the proposed method maintains zero registration failures across varying levels of noise. This generalizable learning-based 2D (X-ray) and 3D (CT) registration algorithm has promising applications in surgical navigation, targeted radiotherapy, and other clinical operations, with substantial potential for enhancing the accuracy and efficiency of image-guided surgery.
(This article belongs to the Section Biosignal Processing)
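
A small sketch of how rotation/translation errors like those reported above can be computed from predicted and ground-truth (x, y, z, pitch, yaw, roll) poses using SciPy; the Euler-angle convention is an assumption.

```python
import numpy as np
from scipy.spatial.transform import Rotation as R

def pose_errors(pred, gt):
    """pred, gt: (x, y, z, pitch, yaw, roll) with angles in degrees.
    Returns (translation error in the same units as x/y/z,
             geodesic rotation error in degrees)."""
    t_err = float(np.linalg.norm(np.asarray(pred[:3]) - np.asarray(gt[:3])))
    r_pred = R.from_euler("xyz", pred[3:], degrees=True)   # convention assumed
    r_gt = R.from_euler("xyz", gt[3:], degrees=True)
    r_err = float(np.degrees((r_pred.inv() * r_gt).magnitude()))
    return t_err, r_err

print(pose_errors((0, 0, 0, 10, 0, 0), (0.3, 0, 0.2, 10.1, 0, 0)))
```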
