Search Results (611)

Search Parameters:
Keywords = stereo cameras

32 pages, 1435 KiB  
Review
Smart Safety Helmets with Integrated Vision Systems for Industrial Infrastructure Inspection: A Comprehensive Review of VSLAM-Enabled Technologies
by Emmanuel A. Merchán-Cruz, Samuel Moveh, Oleksandr Pasha, Reinis Tocelovskis, Alexander Grakovski, Alexander Krainyukov, Nikita Ostrovenecs, Ivans Gercevs and Vladimirs Petrovs
Sensors 2025, 25(15), 4834; https://doi.org/10.3390/s25154834 - 6 Aug 2025
Abstract
Smart safety helmets equipped with vision systems are emerging as powerful tools for industrial infrastructure inspection. This paper presents a comprehensive state-of-the-art review of such VSLAM-enabled (Visual Simultaneous Localization and Mapping) helmets. We surveyed the evolution from basic helmet cameras to intelligent, sensor-fused inspection platforms, highlighting how modern helmets leverage real-time visual SLAM algorithms to map environments and assist inspectors. A systematic literature search was conducted targeting high-impact journals, patents, and industry reports. We classify helmet-integrated camera systems into monocular, stereo, and omnidirectional types and compare their capabilities for infrastructure inspection. We examine core VSLAM algorithms (feature-based, direct, hybrid, and deep-learning-enhanced) and discuss their adaptation to wearable platforms. Multi-sensor fusion approaches integrating inertial, LiDAR, and GNSS data are reviewed, along with edge/cloud processing architectures enabling real-time performance. This paper compiles numerous industrial use cases, from bridges and tunnels to plants and power facilities, demonstrating significant improvements in inspection efficiency, data quality, and worker safety. Key challenges are analyzed, including technical hurdles (battery life, processing limits, and harsh environments), human factors (ergonomics, training, and cognitive load), and regulatory issues (safety certification and data privacy). We also identify emerging trends, such as semantic SLAM, AI-driven defect recognition, hardware miniaturization, and collaborative multi-helmet systems. This review finds that VSLAM-equipped smart helmets offer a transformative approach to infrastructure inspection, enabling real-time mapping, augmented awareness, and safer workflows. We conclude by highlighting current research gaps, notably in standardizing systems and integrating with asset management, and provide recommendations for industry adoption and future research directions. Full article

21 pages, 4909 KiB  
Article
Rapid 3D Camera Calibration for Large-Scale Structural Monitoring
by Fabio Bottalico, Nicholas A. Valente, Christopher Niezrecki, Kshitij Jerath, Yan Luo and Alessandro Sabato
Remote Sens. 2025, 17(15), 2720; https://doi.org/10.3390/rs17152720 - 6 Aug 2025
Abstract
Computer vision techniques such as three-dimensional digital image correlation (3D-DIC) and three-dimensional point tracking (3D-PT) have demonstrated broad applicability for monitoring the conditions of large-scale engineering systems by reconstructing and tracking dynamic point clouds corresponding to the surface of a structure. Accurate stereophotogrammetry measurements require the stereo cameras to be calibrated to determine their intrinsic and extrinsic parameters by capturing multiple images of a calibration object. This image-based approach becomes cumbersome and time-consuming as the size of the tested object increases. To streamline the calibration and make it scale-insensitive, a multi-sensor system embedding inertial measurement units and a laser sensor is developed to compute the extrinsic parameters of the stereo cameras. In this research, the accuracy of the proposed sensor-based calibration method in performing stereophotogrammetry is validated experimentally and compared with traditional approaches. Tests conducted at various scales reveal that the proposed sensor-based calibration enables reconstructing both static and dynamic point clouds, measuring displacements with an accuracy higher than 95% compared to image-based traditional calibration, while being up to an order of magnitude faster and easier to deploy. The novel approach has broad applications for making static, dynamic, and deformation measurements to transform how large-scale structural health monitoring can be performed. Full article
(This article belongs to the Special Issue New Perspectives on 3D Point Cloud (Third Edition))
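The sensor-based calibration idea can be pictured as assembling the stereo extrinsics directly from IMU orientations and a laser-measured baseline instead of imaging a calibration target. Below is a minimal sketch under assumed conventions (ZYX Euler angles, camera-to-world rotations, laser axis aligned with the world x-axis); the readings and the exact frame definitions are assumptions, not the authors' implementation.

```python
import numpy as np

def rotation_from_rpy(roll, pitch, yaw):
    """Rotation matrix (camera-to-world) from roll/pitch/yaw in radians, ZYX order."""
    cr, sr = np.cos(roll), np.sin(roll)
    cp, sp = np.cos(pitch), np.sin(pitch)
    cy, sy = np.cos(yaw), np.sin(yaw)
    Rx = np.array([[1, 0, 0], [0, cr, -sr], [0, sr, cr]])
    Ry = np.array([[cp, 0, sp], [0, 1, 0], [-sp, 0, cp]])
    Rz = np.array([[cy, -sy, 0], [sy, cy, 0], [0, 0, 1]])
    return Rz @ Ry @ Rx

# Hypothetical sensor readings: one IMU per camera gives its orientation in a shared
# frame, and the laser gives the distance between the two camera centers.
R_left = rotation_from_rpy(0.010, -0.004, 0.000)
R_right = rotation_from_rpy(0.012, -0.002, 0.001)
t_left = np.zeros(3)
t_right = np.array([1.25, 0.0, 0.0])   # laser-measured baseline along the world x-axis (m)

# Relative pose of the left camera expressed in the right camera's frame.
R_rel = R_right.T @ R_left
t_rel = R_right.T @ (t_left - t_right)
print(R_rel, t_rel)
```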

16 pages, 4587 KiB  
Article
FAMNet: A Lightweight Stereo Matching Network for Real-Time Depth Estimation in Autonomous Driving
by Jingyuan Zhang, Qiang Tong, Na Yan and Xiulei Liu
Symmetry 2025, 17(8), 1214; https://doi.org/10.3390/sym17081214 - 1 Aug 2025
Viewed by 236
Abstract
Accurate and efficient stereo matching is fundamental to real-time depth estimation from symmetric stereo cameras in autonomous driving systems. However, existing high-accuracy stereo matching networks typically rely on computationally expensive 3D convolutions, which limit their practicality in real-world environments. In contrast, real-time methods often sacrifice accuracy or generalization capability. To address these challenges, we propose FAMNet (Fusion Attention Multi-Scale Network), a lightweight and generalizable stereo matching framework tailored for real-time depth estimation in autonomous driving applications. FAMNet consists of two novel modules: Fusion Attention-based Cost Volume (FACV) and Multi-scale Attention Aggregation (MAA). FACV constructs a compact yet expressive cost volume by integrating multi-scale correlation, attention-guided feature fusion, and channel reweighting, thereby reducing reliance on heavy 3D convolutions. MAA further enhances disparity estimation by fusing multi-scale contextual cues through pyramid-based aggregation and dual-path attention mechanisms. Extensive experiments on the KITTI 2012 and KITTI 2015 benchmarks demonstrate that FAMNet achieves a favorable trade-off between accuracy, efficiency, and generalization. On KITTI 2015, with the incorporation of FACV and MAA, the prediction accuracy of the baseline model is improved by 37% and 38%, respectively, and a total improvement of 42% is achieved by our final model. These results highlight FAMNet’s potential for practical deployment in resource-constrained autonomous driving systems requiring real-time and reliable depth perception. Full article
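As a point of reference for the kind of cost volume FACV refines, the sketch below builds a plain correlation-based cost volume in PyTorch, which can be processed with 2D convolutions instead of the expensive 3D convolutions applied to concatenation volumes. It is an illustrative baseline only, not the FACV module, and the tensor sizes are arbitrary.

```python
import torch

def correlation_cost_volume(feat_left, feat_right, max_disp):
    """Correlation cost volume: for each candidate disparity d, shift the right
    feature map by d pixels and take the channel-wise dot product with the left map."""
    b, c, h, w = feat_left.shape
    volume = feat_left.new_zeros(b, max_disp, h, w)
    for d in range(max_disp):
        if d == 0:
            volume[:, d] = (feat_left * feat_right).mean(dim=1)
        else:
            volume[:, d, :, d:] = (feat_left[..., d:] * feat_right[..., :-d]).mean(dim=1)
    return volume  # (B, max_disp, H, W), suitable for 2D aggregation convolutions

left = torch.randn(1, 32, 64, 128)
right = torch.randn(1, 32, 64, 128)
cost = correlation_cost_volume(left, right, max_disp=48)
```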

18 pages, 12946 KiB  
Article
High-Resolution 3D Reconstruction of Individual Rice Tillers for Genetic Studies
by Jiexiong Xu, Jiyoung Lee, Gang Jiang and Xiangchao Gan
Agronomy 2025, 15(8), 1803; https://doi.org/10.3390/agronomy15081803 - 25 Jul 2025
Viewed by 208
Abstract
The architecture of rice tillers plays a pivotal role in yield potential, yet conventional phenotyping methods have struggled to capture these intricate three-dimensional (3D) structures with high fidelity. In this study, a 3D model reconstruction method was developed specifically for rice tillers to overcome the challenges posed by their slender, feature-poor morphology in multi-view stereo-based 3D reconstruction. By applying strategically designed colorful reference markers, high-resolution 3D tiller models of 231 rice landraces were reconstructed. Accurate phenotyping was achieved by introducing ScaleCalculator, a software tool that integrated depth images from a depth camera to calibrate the physical sizes of the 3D models. The high efficiency of the 3D model-based phenotyping pipeline was demonstrated by extracting the following seven key agronomic traits: flag leaf length, panicle length, first internode length below the panicle, stem length, flag leaf angle, second leaf angle from the panicle, and third leaf angle. Genome-wide association studies (GWAS) performed with these 3D traits identified numerous candidate genes, nine of which had been previously confirmed in the literature. This work provides a 3D phenomics solution tailored for slender organs and offers novel insights into the genetic regulation of complex morphological traits in rice. Full article
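The scale-calibration step can be illustrated with a toy computation: a multi-view-stereo model is reconstructed only up to an unknown global scale, and a single true metric distance from the depth camera fixes that scale. The function below is a hypothetical sketch of the idea, not the ScaleCalculator implementation, and the numbers are invented.

```python
import numpy as np

def model_scale_factor(model_points, real_distance_m):
    """Scale factor mapping an arbitrarily scaled 3D model to metric units, given the
    true distance (e.g. measured by a depth camera) between two reference points."""
    p1, p2 = np.asarray(model_points[0]), np.asarray(model_points[1])
    model_distance = np.linalg.norm(p1 - p2)
    return real_distance_m / model_distance

scale = model_scale_factor([(0.0, 0.0, 0.0), (0.0, 3.2, 0.1)], real_distance_m=0.45)
trait_length_m = scale * 2.7   # convert a length measured on the model into meters
```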

18 pages, 2592 KiB  
Article
A Minimal Solution for Binocular Camera Relative Pose Estimation Based on the Gravity Prior
by Dezhong Chen, Kang Yan, Hongping Zhang and Zhenbao Yu
Remote Sens. 2025, 17(15), 2560; https://doi.org/10.3390/rs17152560 - 23 Jul 2025
Viewed by 184
Abstract
High-precision positioning is the foundation for the functionality of various intelligent agents. In complex environments, such as urban canyons, relative pose estimation using cameras is a crucial step in high-precision positioning. To take advantage of the ability of an inertial measurement unit (IMU) to provide relatively accurate gravity prior information over a short period, we propose a minimal solution method for the relative pose estimation of a stereo camera system assisted by the IMU. We rigidly connect the IMU to the camera system and use it to obtain the rotation matrices in the roll and pitch directions for the entire system, thereby reducing the minimum number of corresponding points required for relative pose estimation. In contrast to classic pose-estimation algorithms, our method can also calculate the camera focal length, which greatly expands its applicability. We constructed a simulated dataset and used it to compare and analyze the numerical stability of the proposed method and the impact of different levels of noise on algorithm performance. We also collected real-scene data using a drone and validated the proposed algorithm. The results on real data reveal that our method exhibits smaller errors in calculating the relative pose of the camera system compared with two classic reference algorithms. It achieves higher precision and stability and can provide a comparatively accurate camera focal length. Full article
(This article belongs to the Section Urban Remote Sensing)
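The way a gravity prior removes two rotational degrees of freedom can be illustrated by recovering roll and pitch from a static accelerometer sample. The sketch below uses one common axis convention; the exact signs depend on the IMU frame, and it is not the authors' minimal solver.

```python
import numpy as np

def roll_pitch_from_gravity(g_imu):
    """Roll and pitch angles that align the measured gravity direction with the
    world vertical; with these fixed, only yaw and translation remain unknown."""
    g = np.asarray(g_imu, dtype=float)
    g = g / np.linalg.norm(g)
    roll = np.arctan2(g[1], g[2])
    pitch = np.arctan2(-g[0], np.sqrt(g[1] ** 2 + g[2] ** 2))
    return roll, pitch

# Static accelerometer sample (m/s^2); sign conventions vary between IMUs.
roll, pitch = roll_pitch_from_gravity([0.05, -0.02, 9.79])
```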

37 pages, 55522 KiB  
Article
EPCNet: Implementing an ‘Artificial Fovea’ for More Efficient Monitoring Using the Sensor Fusion of an Event-Based and a Frame-Based Camera
by Orla Sealy Phelan, Dara Molloy, Roshan George, Edward Jones, Martin Glavin and Brian Deegan
Sensors 2025, 25(15), 4540; https://doi.org/10.3390/s25154540 - 22 Jul 2025
Viewed by 236
Abstract
Efficient object detection is crucial to real-time monitoring applications such as autonomous driving or security systems. Modern RGB cameras can produce high-resolution images for accurate object detection. However, increased resolution results in increased network latency and power consumption. To minimise this latency, Convolutional Neural Networks (CNNs) often have a resolution limitation, requiring images to be down-sampled before inference, causing significant information loss. Event-based cameras are neuromorphic vision sensors with high temporal resolution, low power consumption, and high dynamic range, making them preferable to regular RGB cameras in many situations. This project proposes the fusion of an event-based camera with an RGB camera to mitigate the trade-off between temporal resolution and accuracy, while minimising power consumption. The cameras are calibrated to create a multi-modal stereo vision system where pixel coordinates can be projected between the event and RGB camera image planes. This calibration is used to project bounding boxes detected by clustering of events into the RGB image plane, thereby cropping each RGB frame instead of down-sampling to meet the requirements of the CNN. Using the Common Objects in Context (COCO) dataset evaluator, the average precision (AP) for the bicycle class in RGB scenes improved from 21.08 to 57.38. Additionally, AP increased across all classes from 37.93 to 46.89. To reduce system latency, a novel object detection approach is proposed where the event camera acts as a region proposal network, and a classification algorithm is run on the proposed regions. This achieved a 78% improvement over baseline. Full article
(This article belongs to the Section Sensing and Imaging)
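One simple way to project detections between the two image planes, assuming the scene is distant or roughly planar so a homography from the calibration is adequate, is sketched below. The matrix H, the box coordinates, and the planarity assumption are all illustrative; the paper's calibration may use a different projection model.

```python
import numpy as np
import cv2

# Hypothetical homography mapping event-camera pixels to RGB pixels,
# e.g. estimated from matched calibration-target corners with cv2.findHomography.
H = np.array([[1.02, 0.00, 35.0],
              [0.01, 1.01, 12.0],
              [0.00, 0.00, 1.0]])

def project_box(box_xyxy, H):
    """Map a bounding box from the event image plane into the RGB image plane."""
    x1, y1, x2, y2 = box_xyxy
    corners = np.float32([[x1, y1], [x2, y1], [x2, y2], [x1, y2]]).reshape(-1, 1, 2)
    warped = cv2.perspectiveTransform(corners, H).reshape(-1, 2)
    xs, ys = warped[:, 0], warped[:, 1]
    return float(xs.min()), float(ys.min()), float(xs.max()), float(ys.max())

roi = project_box((120, 80, 200, 180), H)  # crop this RGB region for the CNN instead of down-sampling
```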

21 pages, 2941 KiB  
Article
Dynamic Proxemic Model for Human–Robot Interactions Using the Golden Ratio
by Tomáš Spurný, Ján Babjak, Zdenko Bobovský and Aleš Vysocký
Appl. Sci. 2025, 15(15), 8130; https://doi.org/10.3390/app15158130 - 22 Jul 2025
Viewed by 260
Abstract
This paper presents a novel approach to determining dynamic safety and comfort zones in human–robot interactions (HRIs), with a focus on service robots operating in dynamic environments with people. The proposed proxemic model leverages a golden ratio-based comfort zone distribution and ISO safety standards to define adaptive proxemic boundaries for robots around humans. Unlike traditional fixed-threshold approaches, this method applies a gradual and context-sensitive modulation of robot behaviour based on human position, orientation, and relative velocity. The system was implemented on an NVIDIA Jetson Xavier NX platform using a ZED 2i stereo depth camera (Stereolabs, New York, USA) and tested on two mobile robotic platforms: the quadruped Go1 (Unitree, Hangzhou, China) and the wheeled Scout Mini (Agilex, Dongguan, China). Initial verification of the proposed proxemic model was conducted through experimental comfort validation in two simple interaction scenarios, with subjective feedback collected from participants using a modified Godspeed Questionnaire Series. The results show that the participants felt comfortable during the experiments with the robots, an acceptance that provides an initial basis for further research on the methodology. The proposed solution also facilitates integration into existing navigation frameworks and opens pathways towards socially aware robotic systems. Full article
(This article belongs to the Special Issue Intelligent Robotics: Design and Applications)
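How a golden ratio-based zone distribution might look can be pictured with a toy computation in which successive comfort boundaries grow by the golden ratio outward from a minimum safety radius. The radius, number of zones, and growth rule below are illustrative assumptions, not the paper's exact model.

```python
PHI = (1 + 5 ** 0.5) / 2  # golden ratio

def zone_boundaries(safety_radius_m, n_zones=3):
    """Illustrative comfort-zone boundaries expanding by the golden ratio
    outward from an ISO-style minimum safety radius around a person."""
    radii = [safety_radius_m]
    for _ in range(n_zones - 1):
        radii.append(radii[-1] * PHI)
    return radii

print(zone_boundaries(0.5))  # e.g. [0.5, 0.809, 1.309] meters
```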

15 pages, 1991 KiB  
Article
Hybrid Deep–Geometric Approach for Efficient Consistency Assessment of Stereo Images
by Michał Kowalczyk, Piotr Napieralski and Dominik Szajerman
Sensors 2025, 25(14), 4507; https://doi.org/10.3390/s25144507 - 20 Jul 2025
Viewed by 453
Abstract
We present HGC-Net, a hybrid pipeline for assessing geometric consistency between stereo image pairs. Our method integrates classical epipolar geometry with deep learning components to compute an interpretable scalar score A, reflecting the degree of alignment. Unlike traditional techniques, which may overlook subtle miscalibrations, HGC-Net reliably detects both severe and mild geometric distortions, such as sub-degree tilts and pixel-level shifts. We evaluate the method on the Middlebury 2014 stereo dataset, using synthetically distorted variants to simulate misalignments. Experimental results show that our score degrades smoothly with increasing geometric error and achieves high detection rates even at minimal distortion levels, outperforming baseline approaches based on disparity or calibration checks. The method operates in real time (12.5 fps on 1080p input) and does not require access to internal camera parameters, making it suitable for embedded stereo systems and quality monitoring in robotic and AR/VR applications. The approach also supports explainability via confidence maps and anomaly heatmaps, aiding human operators in identifying problematic regions. Full article
(This article belongs to the Special Issue Feature Papers in Physical Sensors 2025)
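A purely classical stand-in for the geometric part of such a consistency score is the mean vertical disparity of feature matches: in a well-rectified pair, corresponding points share the same image row, so residual vertical offsets indicate miscalibration. The OpenCV sketch below is a crude baseline under that assumption, not HGC-Net, and the detector and match counts are arbitrary choices.

```python
import numpy as np
import cv2

def vertical_disparity_score(img_left, img_right, max_features=500):
    """Mean vertical offset of ORB matches between a rectified stereo pair;
    larger values suggest geometric misalignment."""
    orb = cv2.ORB_create(nfeatures=max_features)
    k1, d1 = orb.detectAndCompute(img_left, None)
    k2, d2 = orb.detectAndCompute(img_right, None)
    if d1 is None or d2 is None:
        return None
    matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(d1, d2)
    dy = [abs(k1[m.queryIdx].pt[1] - k2[m.trainIdx].pt[1]) for m in matches]
    return float(np.mean(dy)) if dy else None
```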

32 pages, 2740 KiB  
Article
Vision-Based Navigation and Perception for Autonomous Robots: Sensors, SLAM, Control Strategies, and Cross-Domain Applications—A Review
by Eder A. Rodríguez-Martínez, Wendy Flores-Fuentes, Farouk Achakir, Oleg Sergiyenko and Fabian N. Murrieta-Rico
Eng 2025, 6(7), 153; https://doi.org/10.3390/eng6070153 - 7 Jul 2025
Viewed by 1351
Abstract
Camera-centric perception has matured into a cornerstone of modern autonomy, from self-driving cars and factory cobots to underwater and planetary exploration. This review synthesizes more than a decade of progress in vision-based robotic navigation through an engineering lens, charting the full pipeline from sensing to deployment. We first examine the expanding sensor palette—monocular and multi-camera rigs, stereo and RGB-D devices, LiDAR–camera hybrids, event cameras, and infrared systems—highlighting the complementary operating envelopes and the rise of learning-based depth inference. The advances in visual localization and mapping are then analyzed, contrasting sparse and dense SLAM approaches, as well as monocular, stereo, and visual–inertial formulations. Additional topics include loop closure, semantic mapping, and LiDAR–visual–inertial fusion, which enables drift-free operation in dynamic environments. Building on these foundations, we review the navigation and control strategies, spanning classical planning, reinforcement and imitation learning, hybrid topological–metric memories, and emerging visual language guidance. Application case studies—autonomous driving, industrial manipulation, autonomous underwater vehicles, planetary rovers, aerial drones, and humanoids—demonstrate how tailored sensor suites and algorithms meet domain-specific constraints. Finally, the future research trajectories are distilled: generative AI for synthetic training data and scene completion; high-density 3D perception with solid-state LiDAR and neural implicit representations; event-based vision for ultra-fast control; and human-centric autonomy in next-generation robots. By providing a unified taxonomy, a comparative analysis, and engineering guidelines, this review aims to inform researchers and practitioners designing robust, scalable, vision-driven robotic systems. Full article
(This article belongs to the Special Issue Interdisciplinary Insights in Engineering Research)

21 pages, 33500 KiB  
Article
Location Research and Picking Experiment of an Apple-Picking Robot Based on Improved Mask R-CNN and Binocular Vision
by Tianzhong Fang, Wei Chen and Lu Han
Horticulturae 2025, 11(7), 801; https://doi.org/10.3390/horticulturae11070801 - 6 Jul 2025
Viewed by 446
Abstract
With the advancement of agricultural automation technologies, apple-harvesting robots have gradually become a focus of research. As their “perceptual core,” machine vision systems directly determine picking success rates and operational efficiency. However, existing vision systems still exhibit significant shortcomings in target detection and positioning accuracy in complex orchard environments (e.g., uneven illumination, foliage occlusion, and fruit overlap), which hinders practical applications. This study proposes a visual system for apple-harvesting robots based on improved Mask R-CNN and binocular vision to achieve more precise fruit positioning. The binocular camera (ZED2i) carried by the robot acquires dual-channel apple images. An improved Mask R-CNN is employed to implement instance segmentation of apple targets in binocular images, followed by a template-matching algorithm with parallel epipolar constraints for stereo matching. Four pairs of feature points from corresponding apples in binocular images are selected to calculate disparity and depth. Experimental results demonstrate average coefficients of variation and positioning accuracy of 5.09% and 99.61%, respectively, in binocular positioning. During harvesting operations with a self-designed apple-picking robot, the single-image processing time was 0.36 s, the average single harvesting cycle duration reached 7.7 s, and the comprehensive harvesting success rate achieved 94.3%. This work presents a novel high-precision visual positioning method for apple-harvesting robots. Full article
(This article belongs to the Section Fruit Production Systems)
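The depth computation from the matched feature points follows the standard rectified-stereo relation Z = f·B/d. A minimal sketch with hypothetical numbers is given below; the focal length and baseline are assumptions for illustration, not the ZED2i's exact parameters.

```python
def depth_from_disparity(x_left_px, x_right_px, focal_px, baseline_m):
    """Pinhole stereo depth for a rectified pair: Z = f * B / d."""
    disparity = x_left_px - x_right_px
    if disparity <= 0:
        raise ValueError("non-positive disparity")
    return focal_px * baseline_m / disparity

# Hypothetical values: 700 px focal length, 12 cm baseline, 44 px disparity.
z = depth_from_disparity(642.0, 598.0, focal_px=700.0, baseline_m=0.12)  # ≈ 1.9 m
```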

18 pages, 4774 KiB  
Article
InfraredStereo3D: Breaking Night Vision Limits with Perspective Projection Positional Encoding and Groundbreaking Infrared Dataset
by Yuandong Niu, Limin Liu, Fuyu Huang, Juntao Ma, Chaowen Zheng, Yunfeng Jiang, Ting An, Zhongchen Zhao and Shuangyou Chen
Remote Sens. 2025, 17(12), 2035; https://doi.org/10.3390/rs17122035 - 13 Jun 2025
Viewed by 459
Abstract
In fields such as military reconnaissance, forest fire prevention, and autonomous driving at night, there is an urgent need for high-precision three-dimensional reconstruction in low-light or night environments. RGB cameras rely on external light to acquire remote sensing data, so image quality declines significantly and the task requirements are difficult to meet. Lidar-based methods perform poorly in rainy and foggy weather, in close-range scenes, and in scenarios requiring thermal imaging data. In contrast, infrared cameras can effectively overcome these challenges because their imaging mechanism differs from those of RGB cameras and lidar. However, research on three-dimensional scene reconstruction from infrared images is relatively immature, especially in the field of infrared binocular stereo matching. This situation presents two main challenges: first, there is no dataset specifically for infrared binocular stereo matching; second, the lack of texture information in infrared images limits the extension of RGB-based methods to the infrared reconstruction problem. To solve these problems, this study first constructs an infrared binocular stereo matching dataset and then proposes an innovative transformer method based on perspective projection positional encoding to complete the infrared binocular stereo matching task. A stereo matching network combining a transformer with a cost volume is constructed. Existing work on transformer positional encoding usually adopts a parallel projection model to simplify the calculation. Our method is instead based on the actual perspective projection model, so that each pixel is associated with a different projection ray. This effectively addresses the feature extraction and matching difficulties caused by insufficient texture information in infrared images and significantly improves matching accuracy. Experiments on the infrared binocular stereo matching dataset proposed in this paper demonstrate the effectiveness of the proposed method. Full article
(This article belongs to the Collection Visible Infrared Imaging Radiometers and Applications)
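The core idea of associating each pixel with its own projection ray, rather than assuming parallel rays, can be sketched by back-projecting the pixel grid through the camera intrinsics and normalizing. The intrinsics below are assumed values and the function is an illustration of the geometry, not the paper's exact positional encoding.

```python
import torch

def pixel_ray_directions(height, width, fx, fy, cx, cy):
    """Per-pixel viewing rays under a perspective model: back-project each pixel
    through the intrinsics and normalize; the resulting (H, W, 3) directions can
    serve as a geometry-aware positional encoding."""
    v, u = torch.meshgrid(torch.arange(height, dtype=torch.float32),
                          torch.arange(width, dtype=torch.float32), indexing="ij")
    x = (u - cx) / fx
    y = (v - cy) / fy
    z = torch.ones_like(x)
    rays = torch.stack((x, y, z), dim=-1)
    return rays / rays.norm(dim=-1, keepdim=True)

rays = pixel_ray_directions(480, 640, fx=500.0, fy=500.0, cx=320.0, cy=240.0)
```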

31 pages, 99149 KiB  
Article
Optimizing Camera Settings and Unmanned Aerial Vehicle Flight Methods for Imagery-Based 3D Reconstruction: Applications in Outcrop and Underground Rock Faces
by Junsu Leem, Seyedahmad Mehrishal, Il-Seok Kang, Dong-Ho Yoon, Yulong Shao, Jae-Joon Song and Jinha Jung
Remote Sens. 2025, 17(11), 1877; https://doi.org/10.3390/rs17111877 - 28 May 2025
Viewed by 689
Abstract
The structure from motion (SfM) and multiview stereo (MVS) techniques have proven effective in generating high-quality 3D point clouds, particularly when integrated with unmanned aerial vehicles (UAVs). However, the impact of image quality—a critical factor for SfM–MVS techniques—has received limited attention. This study proposes a method for optimizing camera settings and UAV flight methods to minimize point cloud errors under illumination and time constraints. The effectiveness of the optimized settings was validated by comparing point clouds generated under these conditions with those obtained using arbitrary settings. The evaluation involved measuring point-to-point error levels for an indoor target and analyzing the standard deviation of cloud-to-mesh (C2M) and multiscale model-to-model cloud comparison (M3C2) distances across six joint planes of a rock mass outcrop in Seoul, Republic of Korea. The results showed that optimal settings improved accuracy without requiring additional lighting or extended survey time. Furthermore, we assessed the performance of SfM–MVS under optimized settings in an underground tunnel in Yeoju-si, Republic of Korea, comparing the resulting 3D models with those generated using Light Detection and Ranging (LiDAR). Despite challenging lighting conditions and time constraints, the results suggest that SfM–MVS with optimized settings has the potential to produce 3D models with higher accuracy and resolution at a lower cost than LiDAR in such environments. Full article

17 pages, 1922 KiB  
Article
Enhancing Visual–Inertial Odometry Robustness and Accuracy in Challenging Environments
by Alessandro Minervini, Adrian Carrio and Giorgio Guglieri
Robotics 2025, 14(6), 71; https://doi.org/10.3390/robotics14060071 - 27 May 2025
Viewed by 1660
Abstract
Visual–Inertial Odometry (VIO) algorithms are widely adopted for autonomous drone navigation in GNSS-denied environments. However, conventional monocular and stereo VIO setups often lack robustness under challenging environmental conditions or during aggressive maneuvers, due to the sensitivity of visual information to lighting, texture, and motion blur. In this work, we enhance an existing open-source VIO algorithm to improve both the robustness and accuracy of the pose estimation. First, we integrate an IMU-based motion prediction module to improve feature tracking across frames, particularly during high-speed movements. Second, we extend the algorithm to support a multi-camera setup, which significantly improves tracking performance in low-texture environments. Finally, to reduce the computational complexity, we introduce an adaptive feature selection strategy that dynamically adjusts the detection thresholds according to the number of detected features. Experimental results validate the proposed approaches, demonstrating notable improvements in both accuracy and robustness across a range of challenging scenarios. Full article
(This article belongs to the Section Sensors and Control in Robotics)
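The adaptive feature-selection strategy can be pictured as a feedback loop on the detector threshold: raise it when too many features are returned, lower it when too few survive. The target count, gain, and bounds in the sketch below are invented, and it shows the general mechanism rather than the authors' exact rule.

```python
def update_detector_threshold(threshold, n_detected, target=150, gain=0.05,
                              lo=5.0, hi=80.0):
    """Adjust the feature-detector threshold toward a target feature count."""
    error = (n_detected - target) / target
    threshold *= (1.0 + gain * error)
    return min(max(threshold, lo), hi)

th = 20.0
for n in (400, 260, 170, 120):   # detected feature counts over successive frames
    th = update_detector_threshold(th, n)
```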

14 pages, 3918 KiB  
Article
Transforming Monochromatic Images into 3D Holographic Stereograms Through Depth-Map Extraction
by Oybek Mirzaevich Narzulloev, Jinwon Choi, Jumamurod Farhod Ugli Aralov, Leehwan Hwang, Philippe Gentet and Seunghyun Lee
Appl. Sci. 2025, 15(10), 5699; https://doi.org/10.3390/app15105699 - 20 May 2025
Viewed by 522
Abstract
Traditional holographic printing techniques prove inadequate when only limited input data are available. Therefore, this paper proposes a new artificial-intelligence-based process for generating digital holographic stereograms from a single black-and-white photograph. This method eliminates the need for stereo cameras, photogrammetry, or 3D models. In this approach, a convolutional neural network and a deep convolutional neural field model are used for image colorization and depth-map estimation, respectively. Subsequently, the colored image and depth map are used to generate the multiview images required for creating holographic stereograms. This method efficiently preserves the visual characteristics of the original black-and-white images in the final digital holographic portraits. It provides a new and accessible method for holographic reconstruction from limited data, enabling the generation of 3D holographic content from existing images. Experiments were conducted using black-and-white photographs of two historical figures, and highly realistic holograms were obtained. This study has significant implications for cultural preservation, personal archiving, and the generation of lifelike holographic images with minimal input data. By bridging the gap between historical photographic sources and modern holographic techniques, our approach opens up new possibilities for memory preservation and visual storytelling. Full article
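Generating multiview images from one colored image plus a depth map amounts to depth-image-based rendering. The deliberately naive sketch below shifts pixels horizontally in proportion to inverse depth to fake a new viewpoint; the shift model is an assumption, and hole filling and the hologram printing stage are omitted entirely.

```python
import numpy as np

def synthesize_view(rgb, depth, shift_px):
    """Naive depth-image-based rendering: shift each pixel horizontally in
    proportion to its normalized inverse depth (holes are left black)."""
    h, w, _ = rgb.shape
    inv = 1.0 / np.clip(depth, 1e-3, None)
    inv = (inv - inv.min()) / (inv.max() - inv.min() + 1e-9)
    out = np.zeros_like(rgb)
    for y in range(h):
        xs = np.clip(np.arange(w) + (shift_px * inv[y]).astype(int), 0, w - 1)
        out[y, xs] = rgb[y]
    return out

rgb = np.random.rand(120, 160, 3)
depth = np.random.rand(120, 160) + 0.5
view = synthesize_view(rgb, depth, shift_px=8)
```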

17 pages, 5356 KiB  
Article
A Study on the Features for Multi-Target Dual-Camera Tracking and Re-Identification in a Comparatively Small Environment
by Jong-Chen Chen, Po-Sheng Chang and Yu-Ming Huang
Electronics 2025, 14(10), 1984; https://doi.org/10.3390/electronics14101984 - 13 May 2025
Viewed by 544
Abstract
Tracking across multiple cameras is a complex problem in computer vision. Its main challenges include camera calibration, occlusion handling, camera overlap and field of view, person re-identification, and data association. In this study, we designed a laboratory as a research environment that facilitates the exploration of some of these challenging issues. The study uses stereo camera calibration and keypoint detection to reconstruct the three-dimensional keypoints of the person being tracked and thereby perform person-tracking tasks. The results show that dual-camera 3D spatial tracking provides more continuous monitoring than a single camera alone. The study adopts four measures of person similarity, which effectively reduce the generation of unnecessary duplicate identities. However, depending on people's activities, using all four measures simultaneously may not produce better results than a single, well-chosen measure. Full article
(This article belongs to the Collection Computer Vision and Pattern Recognition Techniques)
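Reconstructing a 3D body keypoint from its pixel positions in the two calibrated cameras is a triangulation step. The OpenCV sketch below uses made-up projection matrices (the intrinsics and 0.3 m baseline are assumptions), not the study's actual calibration.

```python
import numpy as np
import cv2

# Hypothetical calibration: shared intrinsics K and a 0.3 m horizontal baseline.
K = np.array([[700.0, 0.0, 320.0],
              [0.0, 700.0, 240.0],
              [0.0, 0.0, 1.0]])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([np.eye(3), np.array([[-0.3], [0.0], [0.0]])])

def triangulate_keypoint(pt_cam1, pt_cam2):
    """Reconstruct a 3D keypoint from its pixel coordinates in both cameras."""
    x = cv2.triangulatePoints(P1, P2,
                              np.array(pt_cam1, dtype=float).reshape(2, 1),
                              np.array(pt_cam2, dtype=float).reshape(2, 1))
    return (x[:3] / x[3]).ravel()   # homogeneous -> Euclidean

xyz = triangulate_keypoint((410.0, 260.0), (395.0, 261.0))
```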