Search Results (5,026)

Search Parameters:
Keywords = image matching

22 pages, 3386 KB  
Article
UAV Visual Localization via Multimodal Fusion and Multi-Scale Attention Enhancement
by Yiheng Wang, Yushuai Zhang, Zhenyu Wang, Jianxin Guo, Feng Wang, Rui Zhu and Dejing Lin
Sustainability 2026, 18(9), 4277; https://doi.org/10.3390/su18094277 (registering DOI) - 25 Apr 2026
Abstract
For power-grid applications such as transmission corridor inspection, substation asset inspection, and post-disaster emergency repair, reliable UAV self-localization under GNSS-degraded or GNSS-denied conditions is critical to ensuring operational safety and accurate defect geotagging. Due to substantial discrepancies in viewpoint, scale, and geometric structure between oblique UAV images and nadir satellite images, conventional RGB-based cross-view retrieval methods often suffer from unstable alignment and insufficient geometric modeling, particularly in scenarios with repetitive textures and partial overlap. To address these challenges, we propose a cross-view visual geo-localization model that integrates RGBD multimodal inputs with multi-scale attention enhancement. Specifically, MiDaS is used to estimate relative depth from UAV imagery, which is concatenated with RGB to form a four-channel input, while satellite images are padded with an additional zero channel to maintain dimensional consistency. A shared-weight ViTAdapter is adopted to learn joint semantic–geometric representations, and a lightweight Efficient Multi-scale Attention (EMA) module is adopted on spatial feature maps to strengthen multi-scale spatial consistency. In addition, an IoU-weighted InfoNCE loss is employed to accommodate partial matching during training, thereby improving the robustness of feature alignment. Experiments on the GTA-UAV dataset under the cross-area protocol show stable performance across both retrieval and localization metrics. Specifically, Recall@1, Recall@5, and Recall@10 reach 18.12%, 38.83%, and 49.47%, respectively; AP is 28.01 and SDM@3 is 0.53; meanwhile, the top-1 geodesic distance error Dis@1 is 1052.73 m. These results indicate that explicit geometric priors combined with multi-scale spatial enhancement can effectively improve cross-view feature alignment, leading to enhanced robustness and accuracy for localization in challenging power inspection scenarios. 
Full article
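The IoU-weighted InfoNCE loss mentioned in the abstract can be illustrated with a small sketch. This is not the paper's implementation: the similarity matrix, temperature, and the exact weighting scheme (per-pair ground-overlap IoU scaling the positive term) are assumptions for illustration; the real model computes similarities from learned RGBD embeddings.

```python
import math

def iou_weighted_infonce(sim, iou, tau=0.1):
    """Toy IoU-weighted InfoNCE over a precomputed similarity matrix.

    sim[i][j]: similarity between UAV query i and satellite candidate j,
    where j == i is the paired (partially overlapping) reference tile.
    iou[i]: ground-overlap IoU of pair (i, i), used here to down-weight
    pairs with little true overlap (illustrative weighting scheme).
    """
    n = len(sim)
    total, weight_sum = 0.0, 0.0
    for i in range(n):
        logits = [s / tau for s in sim[i]]
        m = max(logits)  # subtract max for numerical stability
        denom = sum(math.exp(l - m) for l in logits)
        log_prob = (logits[i] - m) - math.log(denom)
        total += -iou[i] * log_prob
        weight_sum += iou[i]
    return total / weight_sum  # IoU-normalized mean loss
```

Raising the similarity of the true pairs drives the loss toward zero, while low-IoU pairs contribute less to the gradient signal.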

28 pages, 33073 KB  
Article
Pedestrian Localization Using Smartphone LiDAR in Indoor Environments
by Jaehun Kim and Kwangjae Sung
Electronics 2026, 15(9), 1810; https://doi.org/10.3390/electronics15091810 - 24 Apr 2026
Abstract
Many place recognition approaches, which identify previously visited places or locations by matching current sensory data, such as 2D RGB images and 3D point clouds, have been proposed to achieve accurate and robust localization and loop closure detection in global positioning system (GPS)-denied environments. Since visual place recognition (VPR) methods that rely on images captured by camera sensors are highly sensitive to variations in appearance, including changes in lighting, surface color, and shadows, they can lead to poor place recognition accuracy. In contrast, light detection and ranging (LiDAR)-based place recognition (LPR) approaches based on 3D point cloud data that captures the shape and geometric structure of the environment are robust to changes in place appearance and can therefore provide more reliable place recognition results than VPR methods. This work presents an indoor LPR method called PointNetVLAD-based indoor pedestrian localization (PIPL). PIPL is a deep network model that uses PointNetVLAD to learn to extract global descriptors from 3D LiDAR point cloud data. PIPL can recognize places previously visited by a pedestrian using point clouds captured by a low-cost LiDAR sensor on a smartphone in small-scale indoor environments, while PointNetVLAD performs place recognition for vehicles using high-cost LiDAR, GPS, and inertial measurement unit (IMU) sensors in large-scale outdoor areas. For place recognition on 3D point cloud reference maps generated from LiDAR scans, PointNetVLAD exploits the universal transverse mercator (UTM) coordinate system based on GPS and IMU measurements, whereas PIPL uses a virtual coordinate system designed in this study due to the unavailability of GPS indoors. In experiments conducted in campus buildings, PIPL shows significant advantages over NetVLAD (known as a convolutional neural network (CNN)-based VPR method). 
Particularly in indoor environments with repetitive scenes where geometric structures are preserved and image-based appearance features are sparse or unclear, PIPL achieved 39% higher top-1 accuracy and 10% higher top-3 accuracy compared to NetVLAD. Furthermore, PIPL achieved place recognition accuracy comparable to NetVLAD even with a small number of points in a 3D point cloud and outperformed NetVLAD even with a smaller model training dataset. The experimental results also indicate that PIPL requires over 76% less place retrieval time than NetVLAD while maintaining robust place classification performance. Full article
(This article belongs to the Special Issue Advanced Indoor Localization Technologies: From Theory to Application)
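Once PIPL (or NetVLAD) has produced a global descriptor per place, recognition reduces to nearest-neighbor retrieval over the reference map. A minimal sketch of top-k retrieval by cosine similarity, with made-up place ids and toy 3-dimensional descriptors (real descriptors are high-dimensional network outputs):

```python
import math

def cosine(a, b):
    """Cosine similarity between two descriptor vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def top_k_places(query, reference_map, k=3):
    """Rank reference places by descriptor similarity to the query.

    reference_map: dict place_id -> global descriptor (list of floats).
    Returns the k most similar place ids, best first.
    """
    scored = sorted(reference_map.items(),
                    key=lambda kv: cosine(query, kv[1]),
                    reverse=True)
    return [pid for pid, _ in scored[:k]]
```

Top-1 and top-3 accuracy, as reported in the abstract, are then just the fraction of queries whose true place appears at rank 1 or within the first 3 ranks.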
14 pages, 1169 KB  
Article
Assessing the Relationship Between Volumetric Changes and Functional Connectivity in Patients with Mild Cognitive Impairment
by Weronika Machaj, Przemyslaw Podgorski, Julian Maciaszek, Dorota Szczesniak, Joanna Rymaszewska, Patryk Piotrowski and Anna Zimny
J. Clin. Med. 2026, 15(9), 3229; https://doi.org/10.3390/jcm15093229 - 23 Apr 2026
Abstract
Background: Amnestic mild cognitive impairment (aMCI) is considered a transitional state between normal aging and dementia, often without visible abnormalities on standard brain magnetic resonance (MR) images. The aim of the study was to analyze both microstructural and functional brain abnormalities using advanced MR techniques. Methods: The study included 27 patients with aMCI and an age-matched control group (CG) of 25 healthy subjects. All MR studies were performed on a 3T MR scanner (Philips, Ingenia) with a 32-channel head and neck coil using volumetric 3D T1 sequences, followed by a resting-state functional MRI (rs-fMRI) sequence. Volumetric analysis was performed using the Destrieux atlas to assess potential structural differences between groups. Seed-to-voxel functional connectivity analyses were conducted using the bilateral hippocampi and both anterior and posterior divisions of the parahippocampal gyri as seed regions. Results: Compared to healthy controls, reduced cortical thickness was observed in aMCI subjects in the temporal regions, frontal and orbitofrontal areas, limbic areas, parietal and sensorimotor cortices, as well as occipito-temporal regions. Additionally, significantly increased functional connectivity was observed between bilateral medial temporal lobe (MTL) regions and the right thalamus. Conclusions: Cortical thinning in various brain regions along with the increased functional connectivity between the MTL regions and the right thalamus may reflect potential compensatory mechanisms in response to initial subtle degenerative changes, emphasizing the importance of using both functional and structural imaging to detect early changes in aMCI patients. Full article

26 pages, 1490 KB  
Systematic Review
Object Detection in Optical Remote Sensing Images: A Systematic Review of Methods, Benchmarks, and Operational Applications
by Neus Fontanet Garcia and Piero Boccardo
Remote Sens. 2026, 18(9), 1289; https://doi.org/10.3390/rs18091289 - 23 Apr 2026
Abstract
Object detection in optical remote sensing imagery has emerged as a crucial task in computer vision, with applications ranging from environmental monitoring to disaster management, precision agriculture, and urban planning. This review systematically examines current methodologies, categorising them into four principal approaches: (1) template matching-based methods, which leverage predefined patterns for object identification; (2) knowledge-based methods, which incorporate geometric and contextual information to enhance detection accuracy; (3) object-based image analysis (OBIA), which segments images into meaningful objects using spectral and spatial properties; and (4) machine learning-based methods, particularly deep convolutional neural networks (CNNs), which have revolutionised the field through automatic feature learning. Each methodology’s performance characteristics, computational requirements, and suitability for different remote sensing applications are analysed. Our systematic review, following PRISMA guidelines, analysed 189 studies published from 2010 to 2025, of which 73 provided quantitative results on standard benchmarks. The three most critical challenges identified are as follows: (1) the annotation bottleneck, as dense bounding box labelling of remote sensing imagery remains highly labour-intensive for deep learning approaches; (2) extreme scale variation spanning 2–3 orders of magnitude within single scenes; and (3) domain adaptation failures when models encounter new geographic regions or sensor characteristics. This review identifies critical research gaps and proposes prioritised future directions, emphasising foundation models for zero-shot detection, efficient architectures for resource-constrained deployment, and standardised benchmarks with size-specific metrics.
The analysis provides practitioners with evidence-based decision frameworks for method selection and researchers with a roadmap for advancing object detection in remote sensing applications. Full article
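Of the four method families surveyed, template matching is the simplest to illustrate. A 1-D zero-normalized cross-correlation sketch (a real detector slides a 2-D template over the image; the score below is the same quantity in one dimension, with made-up signals):

```python
import math

def ncc(window, template):
    """Zero-normalized cross-correlation of two equal-length sequences."""
    n = len(template)
    mw = sum(window) / n
    mt = sum(template) / n
    num = sum((w - mw) * (t - mt) for w, t in zip(window, template))
    den = math.sqrt(sum((w - mw) ** 2 for w in window)
                    * sum((t - mt) ** 2 for t in template))
    return num / den if den else 0.0

def best_match(signal, template):
    """Slide the template over the signal; return offset of the best NCC score."""
    scores = [ncc(signal[i:i + len(template)], template)
              for i in range(len(signal) - len(template) + 1)]
    return max(range(len(scores)), key=lambda i: scores[i])
```

Because the score is normalized by local mean and variance, it tolerates brightness and contrast shifts, which is precisely why template matching breaks down under the scale variation and viewpoint changes the review highlights.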
25 pages, 1701 KB  
Article
Concrete Crack Detection in Extremely Dark Environments Based on Infrared-Visible Multi-Level Registration Fusion and Frequency Decoupling
by Zixiang Li, Weishuai Xie and Bingquan Xiang
Sensors 2026, 26(9), 2612; https://doi.org/10.3390/s26092612 - 23 Apr 2026
Abstract
To address the issues of difficult heterogeneous image registration and low segmentation accuracy caused by the severe lack of illumination and significant modal differences in concrete cracks in extremely dark environments, this paper proposes a two-stage processing framework of registration–fusion first, and decoupling–segmentation later. In the registration and fusion stage, a registration algorithm based on morphological priors and multi-level quadtree spatial constraints is designed. This approach transforms the problem from pixel grayscale matching to spatial topological matching, achieving a feature fusion of high infrared saliency and high visible light sharpness. In the segmentation stage, a Latent Frequency-Decoupled Topological Network (LFDT-Net) is proposed. It utilizes Discrete Wavelet Transform (DWT) to achieve high-fidelity frequency decoupling of the low-frequency infrared backbone and the high-frequency visible light edges. Furthermore, a Cross-Frequency Guidance Module is utilized to eliminate double-edged artifacts, and a skeleton-aware topological loss function is introduced to constrain the topological integrity of the cracks. Experimental results on a self-built heterogeneous multi-modal crack dataset demonstrate that the proposed method significantly outperforms existing mainstream methods in registration accuracy, fusion quality, and segmentation accuracy. Achieving a mean Intersection over Union (mIoU) of 81.7%, the method effectively suppresses background noise in dark environments and precisely restores the microscopic edges and continuous topological structures of faint cracks. Full article
(This article belongs to the Special Issue AI-Based Visual Sensing for Object Detection)
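The frequency decoupling in LFDT-Net rests on the discrete wavelet transform. A single-level 1-D Haar DWT sketch conveys the underlying idea (the paper applies a 2-D DWT to images; this simplified version is for illustration only):

```python
import math

def haar_dwt_1d(signal):
    """One level of the 1-D Haar discrete wavelet transform.

    Returns (approximation, detail): low-frequency averages and
    high-frequency differences, each half the input length.
    Input length must be even.
    """
    s = math.sqrt(2.0)
    approx = [(signal[2 * i] + signal[2 * i + 1]) / s
              for i in range(len(signal) // 2)]
    detail = [(signal[2 * i] - signal[2 * i + 1]) / s
              for i in range(len(signal) // 2)]
    return approx, detail
```

The low-frequency band (analogous to the infrared backbone in the paper) carries coarse structure, while the high-frequency band (analogous to visible-light edges) carries sharp transitions; the transform is orthonormal, so the signal energy is preserved across the two bands.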
22 pages, 6548 KB  
Article
A Hybrid Lung and Colon Histopathological Image Classification Framework Using MobileNetV3-Small Deep Features and Differential Evolution Optimization
by Muhammad Usama Naveed, Sohail Jabbar, Muhammad Munwar Iqbal, Awais Ahmad, Ibrahim S. Alkhazi and Mansoor Alghamdi
Diagnostics 2026, 16(9), 1256; https://doi.org/10.3390/diagnostics16091256 - 22 Apr 2026
Abstract
Background/Objectives: Cancer remains one of the leading causes of mortality worldwide, with lung and colon cancers among the most prevalent. Conventional histopathological diagnosis is time-consuming, requires expert pathologists, and is susceptible to human error. Methods: To address these limitations, this study proposes an automated classification framework for lung and colon cancer using histopathological images. The proposed method employs a lightweight pretrained deep learning model, MobileNetV3-Small, through transfer learning. Training is performed on an enhanced version of the LC25000 dataset, in which redundant image patches are removed to improve robustness and clinical generalizability. The images were initially available in multiple resolutions and were resized to 224 × 224 × 3 to match the canonical input size of MobileNetV3-Small. Deep features are extracted from the dropout layer, as it provides a regularized representation of high-level features by reducing overfitting (dimension N × 1024); these features are then optimized using a differential evolution algorithm, reducing the feature space to N × 60. The optimized features are evaluated using multiple classifiers. Results: Experimental results demonstrate a maximum classification accuracy of 98.14% using a Quadratic Support Vector Machine (SVM) and a 21.3× speed-up achieved with bagged trees, outperforming several state-of-the-art approaches and representing a 3.34% improvement over the baseline study on the enhanced dataset. Conclusions: The results confirm that the proposed framework effectively balances high accuracy with computational efficiency. The use of a lightweight deep model combined with feature optimization makes the approach well-suited for practical clinical environments. Full article
(This article belongs to the Section Machine Learning and Artificial Intelligence in Diagnostics)
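The differential evolution step can be sketched as the classic DE/rand/1/bin loop. This is a generic minimizer under assumed hyperparameters (F, CR, population size); the paper's actual fitness function over feature subsets is not reproduced here:

```python
import random

def differential_evolution(fitness, dim, bounds=(0.0, 1.0),
                           pop_size=20, F=0.8, CR=0.9,
                           generations=100, seed=0):
    """Minimize `fitness` with the classic DE/rand/1/bin scheme."""
    rng = random.Random(seed)
    lo, hi = bounds
    pop = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(pop_size)]
    scores = [fitness(x) for x in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # mutate: combine three distinct individuals other than i
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(dim)  # guarantee at least one crossover
            trial = []
            for j in range(dim):
                if rng.random() < CR or j == j_rand:
                    v = pop[a][j] + F * (pop[b][j] - pop[c][j])
                    trial.append(min(hi, max(lo, v)))  # clip to bounds
                else:
                    trial.append(pop[i][j])
            s = fitness(trial)
            if s <= scores[i]:  # greedy one-to-one selection
                pop[i], scores[i] = trial, s
    best = min(range(pop_size), key=lambda i: scores[i])
    return pop[best], scores[best]
```

In a feature-selection setting, the candidate vector would encode feature weights or a soft mask over the N × 1024 features, with classifier validation error as the fitness.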

60 pages, 7000 KB  
Article
Biometric Embedded Non-Blind Color Image Watermarking with Geometric Tamper Resistance via SIFT-ORB Keypoint Matching
by Swapnaneel Dhar, Riyanka Manna, Khaldi Amine and Aditya Kumar Sahu
Computers 2026, 15(5), 264; https://doi.org/10.3390/computers15050264 - 22 Apr 2026
Abstract
This work introduces a non-blind watermarking framework for color images to address tamper detection, particularly under geometric transformations. The proposed scheme fuses two watermarks, a personal signature and a biometric fingerprint, into a unified composite watermark embedded into the chrominance component of the cover image using a multi-level transform domain approach, discrete wavelet transforms (DWTs), discrete cosine transforms (DCTs), and singular value decomposition (SVD). By leveraging the rotation-invariant properties of scale-invariant feature transform (SIFT) and oriented FAST and rotated BRIEF (ORB) descriptors, the framework ensures robust tamper detection without requiring alignment, thus mitigating the limitations of conventional detection techniques vulnerable to transformation-induced tamper obfuscation (TITO). Extensive experimentation demonstrates that the method maintains high perceptual fidelity, achieving PSNR values ranging from 50 to 55 dB for embedding strength factor μ (0.01–0.04) and SSIM indices near 1 across multiple benchmark images. Furthermore, the scheme exhibits notable resilience to a range of image processing attacks and geometric distortion. Comparative evaluation reveals its superiority over existing grayscale, color, SIFT-based and DWT-DCT-SVD-based watermarking techniques, affirming its applicability in scenarios demanding secure, imperceptible, and transformation-invariant image watermarking. Full article
25 pages, 19124 KB  
Article
Multi-Scale Fractional-Order Image Fusion Algorithm Based on Polarization Spectral Images
by Zhenduo Zhang, Xueying Cao and Zhen Wang
Appl. Sci. 2026, 16(9), 4087; https://doi.org/10.3390/app16094087 - 22 Apr 2026
Abstract
With the continuous advancement of polarization spectral sensing technology, multi-band polarization image fusion has emerged as a novel approach to image fusion. By integrating spectral and polarization information, this method overcomes the limitations of relying on a single information source and significantly improves overall image quality. Building on this approach, this paper proposes a new polarization spectral fusion algorithm. First, feature matching is employed to achieve pixel-level spatial alignment of multi-band polarization images. Then, a fusion strategy based on multi-scale decomposition and singular value decomposition is adopted to preserve structural information and fine details. Subsequently, fractional-order processing and guided filtering are applied to enhance details and suppress noise. Finally, a progressive reconstruction from low to high scales is performed to ensure hierarchical consistency and information integrity throughout the fusion process. In addition, spectral information is utilized for color restoration, enabling the final image to achieve high spatial resolution while maintaining natural and rich color representation. Experimental results demonstrate that the proposed method effectively integrates features from different spectral bands and polarization information while preserving maximum similarity, leading to significant improvements in both image quality and detail representation. Full article
17 pages, 5236 KB  
Article
Two Non-Learning Filters for the Enhancement of Images Obtained from a Fluorescence Imaging System, a Near-Infrared Camera, and Low-Light Condition
by Jun Hong, Xi He, Haoru Ning, Zhonghuan Su, Ling Zhang, Yingcheng Lin and Ye Wu
Electronics 2026, 15(9), 1777; https://doi.org/10.3390/electronics15091777 - 22 Apr 2026
Abstract
Images obtained from imaging instruments can suffer from issues such as severe degradation, color distortion, and weak brightness. Effective systems for enhancing these images are critically required. To improve image quality, herein, we propose two filters based on simple functions, including cosine, sine, hyperbolic secant, and the inverse of hyperbolic cosecant. These filters are used to enhance images obtained from a fluorescence imaging system, a near-infrared camera, and under low-light conditions. The contrast is increased while the image quality is improved. They perform better than a matched filter. Moreover, the combination of our filters with the filter based on the watershed algorithm or the matched filter can be used to extract marginal features from images captured in underwater environments. Furthermore, their application in image fusion is explored. Our designed filters may potentially be used in future applications such as target identification and tracking. Full article
21 pages, 8256 KB  
Article
SemGeoFrame: A Visual Matching Framework for Aircraft Based on Surface Semantic Information
by Zhaoyun Luo, Yanfei Liu, Chen Liu, Min Kong, Dongfang Yang, Maoan Zhou and Cong An
Remote Sens. 2026, 18(9), 1267; https://doi.org/10.3390/rs18091267 - 22 Apr 2026
Abstract
In GNSS-denied environments, UAV visual positioning faces the critical bottleneck of low matching accuracy between heterogeneous images. To address this, we propose SemGeoFrame, a visual matching framework that leverages surface semantic information to enhance robustness. The key innovations are threefold: First, we construct a semantic prior from the probability distributions of image semantic segmentation and design a consistency screening mechanism based on Jensen–Shannon divergence to eliminate false matches by leveraging pixel-level semantic consistency for cross-view image matching. Second, a confidence-guided partition sampling strategy ensures balanced distribution of matches in both spatial and semantic categories, overcoming the limitations of conventional spatial-only sampling. Third, geometric, semantic, and confidence constraints are jointly optimized to achieve robust homography estimation. SemGeoFrame adopts a plug-and-play design and consistently improves the performance of mainstream matching algorithms (e.g., ORB, SuperPoint, LoFTR) on multiple heterogeneous datasets. The experimental results demonstrate that our framework significantly enhances matching accuracy and robustness across diverse scenarios. Full article
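The Jensen–Shannon consistency screening in SemGeoFrame can be illustrated with a small helper: compare the class-probability vectors of matched pixels from the two views and reject correspondences whose divergence is too high. The threshold and example distributions below are assumptions for illustration, not the paper's values:

```python
import math

def js_divergence(p, q):
    """Jensen-Shannon divergence between two discrete distributions (base 2)."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    def kl(a, b):
        return sum(ai * math.log2(ai / bi) for ai, bi in zip(a, b) if ai > 0)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def screen_matches(matches, threshold=0.2):
    """Keep only correspondences whose semantic distributions agree.

    matches: list of (p_uav, p_sat) class-probability vector pairs.
    Returns the indices of matches that pass the screening.
    """
    return [i for i, (p, q) in enumerate(matches)
            if js_divergence(p, q) <= threshold]
```

Unlike raw KL divergence, the JS divergence is symmetric and bounded in [0, 1] (base 2), which makes a fixed rejection threshold meaningful across match pairs.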
12 pages, 2265 KB  
Article
Optimizing Reconstruction Parameters for Detecting Peripheral In-Stent Restenosis with Photon-Counting Detector CT: A Phantom Study
by Yiheng Tan, Joost F. Hop, Magdalena Dobrolinska, Xinlin Zheng, Evie J. I. Hoeijmakers, Jean-Paul P. M. de Vries, Marcel J. W. Greuter and Reinoud P. H. Bokkers
Diagnostics 2026, 16(9), 1253; https://doi.org/10.3390/diagnostics16091253 - 22 Apr 2026
Abstract
Background/Objectives: To determine the optimal reconstruction parameters for accurate visualization of peripheral in-stent restenosis using photon-counting detector CT (PCD-CT), and to evaluate its potential advantages over energy-integrated detector CT (EID-CT). Methods: Endovascular peripheral stents with varying degrees of in-stent restenosis were scanned in a custom-made phantom using EID-CT (Somatom Force) and PCD-CT (Naeotom Alpha) under clinical acquisition protocols. EID-CT images were reconstructed with Bv40 and Bv59 kernels at 512 matrices. PCD-CT data were acquired in standard-resolution (SR) and ultra-high-resolution (UHR) modes. In both modes, images were reconstructed with multiple kernels (Bv40, Bv56 and Bv72) and matrix sizes (512 and 1024 matrix). In SR mode, additional virtual monoenergetic images (40–100 keV) were generated, while UHR mode included only polychromatic reconstructions. Quantitative image quality (noise, contrast, contrast-to-noise ratio [CNR]) was measured, and two blinded readers performed qualitative assessments of restenosis visualization. Results: PCD-CT with SR mode at VMI 40 keV achieved the highest image contrast and CNR, significantly outperforming EID-CT and PCD-CTUHR under matched conditions (all p < 0.05). The sharper reconstruction kernel further enhanced the image contrast and improved subjective visualization despite increased image noise. Both readers ranked PCD-CTSR-Bv72-40keV at 1024 matrix highest for detecting all degrees of restenosis, with excellent inter-reader agreement (ρ > 0.80). Conclusions: PCD-CT in SR mode at VMI 40 keV, specifically using the Bv72 kernel with a 1024 matrix, optimizes the visualization of peripheral in-stent restenosis. Compared to EID-CT, PCD-CT provides superior image quality and detectability of restenosis. Full article
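The contrast-to-noise ratio used in the quantitative assessment follows the standard definition. A minimal sketch, assuming CNR = |mean(lumen ROI) − mean(background ROI)| / std(background ROI), which is one common convention (the study's exact ROI placement and noise definition may differ):

```python
import math

def cnr(roi_lumen, roi_background):
    """Contrast-to-noise ratio between two regions of interest (HU samples).

    CNR = |mean(lumen) - mean(background)| / std(background).
    """
    def mean(xs):
        return sum(xs) / len(xs)
    mu_l, mu_b = mean(roi_lumen), mean(roi_background)
    var_b = sum((x - mu_b) ** 2 for x in roi_background) / len(roi_background)
    return abs(mu_l - mu_b) / math.sqrt(var_b)
```

This makes the trade-off in the abstract concrete: a sharper kernel raises both contrast (numerator) and noise (denominator), so CNR only improves when contrast grows faster than noise, as reported for the 40 keV VMI reconstructions.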

12 pages, 4476 KB  
Article
Broadband Polarization-Insensitive Tunable Terahertz Metamaterial Absorber Based on an Asymmetric Graphene Structure
by Ahmed Ali, Sulaiman Al-Sowayan, Waleed Shihzad, Asrafali Barkathulla, Zaid Ahmed Shamsan, Majeed A. S. Alkanhal and Yosef T. Aladadi
Nanomaterials 2026, 16(9), 502; https://doi.org/10.3390/nano16090502 - 22 Apr 2026
Abstract
A graphene-based tunable broad-band terahertz (THz) metamaterial absorber is presented, exhibiting strong and stable absorption across a wide frequency range. The device employs an ultra-thin three-layer structure consisting of a metallic reflector, a dielectric spacer, and a patterned graphene metasurface with an asymmetric geometry. Through optimized structural parameters, the absorber achieves broad-band absorption exceeding 90% between 2.45 THz and 6.11 THz with a bandwidth of 3.66 THz, featuring three distinct resonant frequencies at 2.764 THz, 3.534 THz, and 5.41 THz, corresponding to peak absorption efficiencies of 97.26%, 96.96%, and 99.90%, respectively. Impedance matching and electric field analyses confirm that the enhanced absorption arises from the strong coupling of electric and magnetic resonances within the multilayer structure. Moreover, the absorber exhibits polarization-insensitive behavior under varying polarization angles and maintains high absorption stability for both TE and TM modes up to an incident angle of 60°, as verified by simulation results, and allows dynamic tunability through Fermi-level modulation. These characteristics highlight the absorber’s potential for advanced THz imaging, sensing, and stealth applications. Full article

19 pages, 378 KB  
Article
Mislabel Detection in Multi-Label Chest X-Rays via Prototype-Weighted Neighborhood Consistency in CoAtNet Embedding Space
by Ariel Gamboa, Mauricio Araya and Camilo Sotomayor
Appl. Sci. 2026, 16(9), 4067; https://doi.org/10.3390/app16094067 - 22 Apr 2026
Abstract
Large-scale chest X-ray (CXR) datasets often rely on report-derived or weak labels, introducing missing and incorrect annotations that can degrade downstream models and limit trust. We study training-free mislabel detection in multi-label CXRs by scoring neighborhood label consistency in a fixed embedding space. Using the NIH Chest X-ray Kaggle sample (5606 CXRs), we extract intermediate CoAtNet features and obtain 64-dimensional embeddings with a frozen CoAtNet backbone and a lightweight refinement head. On top of these embeddings, we compare kNN consistency baselines with distance weighting and label-set similarity against LPV-DW-CS, clustered prototype voting weighted by distance and cluster support. We evaluate three synthetic label-noise regimes with review budgets matched to the corruption rate: random single-label (5% and 20%), boundary-noise (20% corruption within the lowest-density 20% subset), and disjoint-label replacement (20% within that subset). LPV-DW-CS yields the highest downstream macro-AUROC after filtering top-ranked samples (up to 0.8860), while kNN variants achieve higher Recall@budget at the same review rates (up to 99.44%). An image-only expert Likert review of top-ranked real samples finds substantial label-set inconsistencies (54.1% for LPV-DW-CS-280-A; 60.5% for KNN-DW-LSS), supporting neighborhood-consistency ranking as a practical, training-free tool for targeted dataset auditing. Full article
(This article belongs to the Special Issue Computer-Vision-Based Biomedical Image Processing)
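The kNN consistency scoring with distance weighting and label-set similarity can be sketched as follows. Jaccard similarity over label sets and inverse-distance weighting are assumptions here; the paper's exact similarity and weighting functions may differ:

```python
import math

def knn_consistency(query_labels, query_emb, neighbors, k=3, eps=1e-6):
    """Score how consistent a sample's label set is with its k nearest neighbors.

    neighbors: list of (embedding, label_set) pairs.
    Returns a value in [0, 1]; low scores flag likely mislabeled samples.
    """
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    def jaccard(a, b):
        if not a and not b:
            return 1.0
        return len(a & b) / len(a | b)
    ranked = sorted(neighbors, key=lambda nb: dist(query_emb, nb[0]))[:k]
    weights = [1.0 / (dist(query_emb, e) + eps) for e, _ in ranked]
    sims = [jaccard(query_labels, ls) for _, ls in ranked]
    return sum(w * s for w, s in zip(weights, sims)) / sum(weights)
```

Ranking the dataset by ascending score and reviewing the lowest-scoring samples up to the budget is the training-free auditing loop the abstract describes; the prototype-voting variant replaces individual neighbors with cluster prototypes.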
23 pages, 5106 KB  
Article
A Multidimensional Framework for Analyzing Image–Text Consistency in Social Media
by Hongqi Xia, Zhijie Zhao, Binbin Zhao, Hong Lan, Han Wu, Xujing Jing and Yanrong Zhang
Appl. Sci. 2026, 16(8), 4044; https://doi.org/10.3390/app16084044 - 21 Apr 2026
Abstract
As image–text posts have become a dominant form of social media communication, understanding how the two modalities jointly convey meaning remains a key challenge in multimodal analysis. This study aims to examine whether image–text consistency is inherently multidimensional rather than reducible to a single similarity metric. Existing studies often reduce consistency to a single relevance score, which cannot capture semantic, emotional, and functional interactions. We construct a dataset of 28,650 multimodal posts and model image–text relationships along three dimensions: semantic consistency (CSC), emotional consistency (CEC), and informational matching consistency (IMC). Semantic and emotional alignment are measured using cross-modal representation and similarity computation, while IMC is defined through rule-based classification of informational roles. Results show that emotional consistency (CEC = 0.621) is higher than semantic consistency (CSC = 0.549, p < 0.001), while 61.0% of posts maintain consistent informational orientation. These findings demonstrate that image–text consistency exhibits distinct cross-dimensional patterns that cannot be captured by single-metric approaches.
(This article belongs to the Section Computing and Artificial Intelligence)
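The three-dimensional consistency profile described above can be sketched as: cosine similarity between cross-modal embeddings for the semantic and emotional dimensions, and a rule-based role match for the informational dimension. This is a hedged sketch, not the authors' pipeline; the function names, the specific embedding inputs, and the role-comparison rule are illustrative assumptions.

```python
import numpy as np

def cosine(u, v):
    # Cosine similarity between two embedding vectors.
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def consistency_profile(img_sem, txt_sem, img_emo, txt_emo, img_role, txt_role):
    # Hypothetical per-post profile over the three consistency dimensions.
    return {
        "CSC": cosine(img_sem, txt_sem),  # semantic consistency
        "CEC": cosine(img_emo, txt_emo),  # emotional consistency
        "IMC": img_role == txt_role,      # rule-based informational match
    }
```

Averaging CSC and CEC over a corpus, and taking the fraction of posts with IMC true, would yield corpus-level statistics of the kind reported in the abstract.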
22 pages, 45694 KB  
Article
Visual Localization for Deep-Sea Mining Vehicles During Operation
by Yangrui Cheng, Bingkun Wang, Xiaojun Zhuo, Kai Liu and Yingjie Guan
J. Mar. Sci. Eng. 2026, 14(8), 759; https://doi.org/10.3390/jmse14080759 - 21 Apr 2026
Abstract
Deep-sea mining operations demand continuous, drift-free positioning over multi-day missions—a requirement that traditional acoustic dead-reckoning systems struggle to meet due to error accumulation and frequent DVL bottom-lock loss in sediment plume environments. Inspired by Google Cartographer’s 2D grid mapping paradigm, we present a prior map-based visual localization framework that decouples offline mapping from real-time localization, fundamentally eliminating drift through absolute image registration against pre-built seabed mosaics. By integrating adaptive keyframe selection, Multi-Scale Retinex (MSR) enhancement, and the AD-LG deep feature matching architecture, our system constructs globally consistent seabed maps for absolute positioning. The framework leverages deformable convolutions and LightGlue to effectively mitigate challenges such as low texture and non-rigid distortion. Quantitative validation on tank simulation datasets demonstrates significant superiority over IMU-only and standard fusion schemes; qualitative deployment on real Pacific CCZ imagery confirms near-real-time operational feasibility on an embedded Jetson Orin NX platform. This system establishes visual navigation as a viable backup to acoustic systems, addressing a critical gap in deep-sea mining vehicle autonomy.
(This article belongs to the Special Issue Advances in Underwater Positioning and Navigation Technology)
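The Multi-Scale Retinex (MSR) enhancement step mentioned above can be sketched as: subtracting the log of a Gaussian-smoothed illumination estimate from the log image at several scales, then averaging, which suppresses non-uniform seabed lighting before feature matching. This is a minimal sketch of standard MSR, not the paper's exact implementation; the sigma values and the final min-max rescaling are illustrative assumptions.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def multi_scale_retinex(img, sigmas=(15, 80, 250), eps=1e-6):
    # img: 2-D float array in [0, 1]; returns an illumination-normalized image.
    img = img.astype(np.float64) + eps
    log_i = np.log(img)
    msr = np.zeros_like(img)
    for s in sigmas:
        # Gaussian blur at scale s approximates the illumination component.
        msr += log_i - np.log(gaussian_filter(img, sigma=s) + eps)
    msr /= len(sigmas)
    # Rescale to [0, 1] for display and downstream feature extraction.
    return (msr - msr.min()) / (msr.max() - msr.min() + eps)
```

In a pipeline like the one described, the enhanced frames would then feed the deep feature matcher for registration against the pre-built mosaic.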