Search Results (468)

Search Parameters:
Keywords = monocular cameras

40 pages, 1010 KiB  
Review
A Survey of Deep Learning-Based 3D Object Detection Methods for Autonomous Driving Across Different Sensor Modalities
by Miguel Valverde, Alexandra Moutinho and João-Vitor Zacchi
Sensors 2025, 25(17), 5264; https://doi.org/10.3390/s25175264 - 24 Aug 2025
Abstract
This paper presents a comprehensive survey of deep learning-based methods for 3D object detection in autonomous driving, focusing on their use of diverse sensor modalities, including monocular cameras, stereo vision, LiDAR, radar, and multi-modal fusion. To systematically organize the literature, a structured taxonomy is proposed that categorizes methods by input modality. The review also outlines the chronological evolution of these approaches, highlighting major architectural developments and paradigm shifts. Furthermore, the surveyed methods are quantitatively compared using standard evaluation metrics across benchmark datasets in autonomous driving scenarios. Overall, this work provides a detailed and modality-agnostic overview of the current landscape of deep learning approaches for 3D object detection in autonomous driving. Results of this work are available in an open GitHub repository.
(This article belongs to the Special Issue Sensors and Sensor Fusion Technology in Autonomous Vehicles)
16 pages, 1632 KiB  
Article
Toward an Augmented Reality Representation of Collision Risks in Harbors
by Mario Miličević, Igor Vujović, Miro Petković and Ana Kuzmanić Skelin
Appl. Sci. 2025, 15(17), 9260; https://doi.org/10.3390/app15179260 - 22 Aug 2025
Abstract
In ports with a significant density of non-AIS vessels, there is an increased risk of collisions. This is because physical limitations restrict the maneuverability of AIS vessels, while small vessels that do not have AIS are unpredictable. To help with collision prevention, we propose an augmented reality system that detects vessels from a video stream and estimates their speed with a single side-mounted camera. The goal is to visualize a cone for risk assessment. Speed is estimated from the geometric relations between the camera and the ship, which are used to compute the distances between points over a known time interval. The most important part of the proposal is vessel speed estimation by a monocular camera, validated against a laser speed measurement. This will help port authorities to manage risks. This system differs from similar trials in that it uses a single stationary camera linked to the authorities rather than to the bridge crew.
(This article belongs to the Section Marine Science and Engineering)
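As an illustration of the geometric idea behind this abstract, the sketch below back-projects a tracked image point onto a flat water plane for a pinhole camera at a known height and tilt, and converts the displacement between two observations into a speed. The camera model, function names, and numbers are assumptions for illustration; the paper's own calibration and tracking pipeline is not reproduced.

```python
import numpy as np

def pixel_to_ground(u, v, fx, fy, cx, cy, cam_height, tilt_rad):
    """Back-project a pixel onto the flat water plane (z = 0), assuming a
    pinhole camera at a known height, tilted down by tilt_rad about its
    x-axis. Illustrative geometry only."""
    d_cam = np.array([(u - cx) / fx, (v - cy) / fy, 1.0])   # ray in camera frame
    s, c = np.sin(tilt_rad), np.cos(tilt_rad)
    # Camera axes expressed in a world frame with z up and y pointing away from the quay.
    R_wc = np.array([[1.0, 0.0, 0.0],
                     [0.0, -s,   c ],
                     [0.0, -c,  -s ]])
    d_w = R_wc @ d_cam
    t = cam_height / -d_w[2]                 # scale at which the ray meets z = 0
    ground = np.array([0.0, 0.0, cam_height]) + t * d_w
    return ground[:2]                        # (x, y) on the water plane, metres

def vessel_speed(px_t0, px_t1, dt, **cam):
    """Speed from two tracked image positions of the same vessel point, dt seconds apart."""
    p0 = pixel_to_ground(*px_t0, **cam)
    p1 = pixel_to_ground(*px_t1, **cam)
    return np.linalg.norm(p1 - p0) / dt      # m/s

# Hypothetical setup: a 1080p camera 20 m above the water, tilted 25 degrees down.
cam = dict(fx=1400.0, fy=1400.0, cx=960.0, cy=540.0,
           cam_height=20.0, tilt_rad=np.radians(25.0))
print(vessel_speed((900, 700), (980, 705), dt=1.0, **cam))
```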
23 pages, 13423 KiB  
Article
A Lightweight LiDAR–Visual Odometry Based on Centroid Distance in a Similar Indoor Environment
by Zongkun Zhou, Weiping Jiang, Chi Guo, Yibo Liu and Xingyu Zhou
Remote Sens. 2025, 17(16), 2850; https://doi.org/10.3390/rs17162850 - 16 Aug 2025
Abstract
Simultaneous Localization and Mapping (SLAM) is a critical technology for robot intelligence. Compared to cameras, Light Detection and Ranging (LiDAR) sensors achieve higher accuracy and stability in indoor environments. However, LiDAR can only capture the geometric structure of the environment, and LiDAR-based SLAM often fails in scenarios with insufficient geometric features or highly similar structures. Furthermore, low-cost mechanical LiDARs, constrained by sparse point cloud density, are particularly prone to odometry drift along the Z-axis, especially in environments such as tunnels or long corridors. To address the localization issues in such scenarios, we propose a forward-enhanced SLAM algorithm. Utilizing a 16-line LiDAR and a monocular camera, we construct a dense colored point cloud input and apply an efficient multi-modal feature extraction algorithm based on centroid distance to extract a set of feature points with significant geometric and color features. These points are then optimized in the back end based on constraints from points, lines, and planes. We compare our method with several classic SLAM algorithms in terms of feature extraction, localization, and elevation constraint. Experimental results demonstrate that our method achieves high-precision real-time operation and exhibits excellent adaptability to indoor environments with similar structures.
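A minimal sketch of what centroid-distance-based feature selection can look like: each point of the colored cloud is scored by its offset from the centroid of its k nearest neighbours (geometric saliency) plus its colour deviation from the local mean, and the highest-scoring points are kept. This is an illustrative interpretation of the idea named in the abstract, not the authors' algorithm; the SciPy helper and parameter values are assumptions.

```python
import numpy as np
from scipy.spatial import cKDTree

def centroid_distance_features(points_xyz, points_rgb, k=20, n_keep=500):
    """Score each point by distance to the centroid of its k nearest neighbours
    plus colour deviation from the local mean, then keep the top scorers.
    A sketch of the general idea, not the paper's exact algorithm."""
    tree = cKDTree(points_xyz)
    _, idx = tree.query(points_xyz, k=k)          # (N, k) neighbour indices
    neigh_xyz = points_xyz[idx]                   # (N, k, 3)
    neigh_rgb = points_rgb[idx]                   # (N, k, 3)
    geom = np.linalg.norm(points_xyz - neigh_xyz.mean(axis=1), axis=1)
    col = np.linalg.norm(points_rgb - neigh_rgb.mean(axis=1), axis=1)
    score = geom / (geom.std() + 1e-9) + col / (col.std() + 1e-9)
    return np.argsort(score)[-n_keep:]            # indices of the selected feature points

# Hypothetical dense colored cloud (e.g. a 16-line LiDAR scan colourised by the camera).
pts = np.random.rand(5000, 3) * 10
rgb = np.random.rand(5000, 3)
keep = centroid_distance_features(pts, rgb)
```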
20 pages, 27328 KiB  
Article
GDVI-Fusion: Enhancing Accuracy with Optimal Geometry Matching and Deep Nearest Neighbor Optimization
by Jincheng Peng, Xiaoli Zhang, Kefei Yuan, Xiafu Peng and Gongliu Yang
Appl. Sci. 2025, 15(16), 8875; https://doi.org/10.3390/app15168875 - 12 Aug 2025
Abstract
Visual–inertial odometry (VIO) systems are not robust enough during long-term operation. In particular, coupled visual–inertial and Global Navigation Satellite System (GNSS) systems are prone to divergence of the position estimate when visual or GNSS information fails. To address these problems, this paper proposes GDVI-Fusion, a tightly coupled, nonlinearly optimized localization system that fuses RGB-D vision, an inertial measurement unit (IMU), and global position measurements, improving the robustness and accuracy of carrier position estimation in environments where visual or GNSS information fails. Preprocessing of depth information during initialization is proposed to mitigate the influence of lighting and physical structure on the RGB-D camera and to improve the accuracy of the depth assigned to image feature points, thereby improving the robustness of the localization system. A K-Nearest-Neighbors (KNN)-based procedure constructs geometric constraints between matched feature points and eliminates matches whose connecting lines have an abnormal length or slope, which improves the speed and accuracy of feature matching and, in turn, the system's localization accuracy. The lightweight monocular GDVI-Fusion system proposed in this paper achieves a 54.2% improvement in operational efficiency and a 37.1% improvement in positioning accuracy compared with the GVINS system. We have verified the system's operational efficiency and positioning accuracy on a public dataset and on a prototype.
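The match-line filtering idea can be sketched as follows: matches whose connecting lines deviate strongly from the median length or median direction are discarded. This is a simplified stand-in for the KNN-based constraint construction in GDVI-Fusion, with thresholds and names chosen for illustration.

```python
import numpy as np

def filter_matches_by_line_geometry(pts_prev, pts_curr, n_sigma=2.0):
    """Reject feature matches whose connecting line has an abnormal length or
    slope relative to the bulk of the matches. A simplified sketch of the
    geometric-constraint idea, not the GDVI-Fusion implementation."""
    d = pts_curr - pts_prev                        # (N, 2) match displacement vectors
    length = np.linalg.norm(d, axis=1)
    angle = np.arctan2(d[:, 1], d[:, 0])
    # Keep matches close to the median length and median direction.
    ok_len = np.abs(length - np.median(length)) < n_sigma * (length.std() + 1e-9)
    ang_dev = np.angle(np.exp(1j * (angle - np.median(angle))))  # wrap to [-pi, pi]
    ok_ang = np.abs(ang_dev) < n_sigma * (np.abs(ang_dev).std() + 1e-9)
    return ok_len & ok_ang                         # boolean inlier mask

# Hypothetical matched keypoint coordinates from two consecutive frames.
prev = np.random.rand(300, 2) * 640
curr = prev + np.array([5.0, 1.0]) + np.random.randn(300, 2) * 0.5
mask = filter_matches_by_line_geometry(prev, curr)
```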
19 pages, 4206 KiB  
Article
A Hybrid UNet with Attention and a Perceptual Loss Function for Monocular Depth Estimation
by Hamidullah Turkmen and Devrim Akgun
Mathematics 2025, 13(16), 2567; https://doi.org/10.3390/math13162567 - 11 Aug 2025
Abstract
Monocular depth estimation is a crucial technique in computer vision that determines the depth or distance of objects in a scene using a single 2D image captured by a camera. UNet-based models are a fundamental architecture for monocular depth estimation due to their effective encoder–decoder structure. This study presents an effective depth estimation model based on a hybrid UNet architecture that incorporates ensemble features. The new model integrates Transformer-based attention blocks to capture global context and an encoder built on ResNet18 to extract spatial features. Additionally, a novel Boundary-Aware Depth Consistency Loss (BADCL) function has been introduced to enhance accuracy. This function features dynamic scaling, smoothness regularization, and boundary-aware weighting, which provides sharper edges, smoother depth transitions, and scale-consistent predictions. The proposed model has been evaluated on the NYU Depth V2 dataset, achieving a Structural Similarity Index Measure (SSIM) of 99.8%. The performance of the proposed model indicates increased depth accuracy compared to state-of-the-art methods.
(This article belongs to the Special Issue Artificial Intelligence and Algorithms with Their Applications)
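The abstract does not give the exact form of BADCL, so the sketch below shows a generic boundary-aware depth loss in the same spirit: an L1 depth term plus an edge-aware smoothness term that relaxes the smoothness penalty where the RGB image has strong gradients. Weights, names, and shapes are assumptions, not the published formulation.

```python
import torch

def boundary_aware_depth_loss(pred, target, image, alpha=0.85, beta=0.1):
    """Generic boundary-aware depth loss: L1 depth error plus an edge-aware
    smoothness term that down-weights smoothness at strong image gradients
    (likely object boundaries). Mirrors the spirit of BADCL, not its published form."""
    l1 = torch.abs(pred - target).mean()
    # Image and depth gradients along x and y.
    dI_dx = torch.mean(torch.abs(image[..., :, 1:] - image[..., :, :-1]), dim=1, keepdim=True)
    dI_dy = torch.mean(torch.abs(image[..., 1:, :] - image[..., :-1, :]), dim=1, keepdim=True)
    dD_dx = torch.abs(pred[..., :, 1:] - pred[..., :, :-1])
    dD_dy = torch.abs(pred[..., 1:, :] - pred[..., :-1, :])
    smooth = (dD_dx * torch.exp(-dI_dx)).mean() + (dD_dy * torch.exp(-dI_dy)).mean()
    return alpha * l1 + beta * smooth

# Shapes: pred/target are (B, 1, H, W) depth maps, image is (B, 3, H, W).
pred = torch.rand(2, 1, 64, 64, requires_grad=True)
target = torch.rand(2, 1, 64, 64)
image = torch.rand(2, 3, 64, 64)
boundary_aware_depth_loss(pred, target, image).backward()
```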
19 pages, 3382 KiB  
Article
LiDAR as a Geometric Prior: Enhancing Camera Pose Tracking Through High-Fidelity View Synthesis
by Rafael Muñoz-Salinas, Jianheng Liu, Francisco J. Romero-Ramirez, Manuel J. Marín-Jiménez and Fu Zhang
Appl. Sci. 2025, 15(15), 8743; https://doi.org/10.3390/app15158743 - 7 Aug 2025
Abstract
This paper presents a robust framework for monocular camera pose estimation by leveraging high-fidelity, pre-built 3D LiDAR maps. The core of our approach is a render-and-match pipeline that synthesizes photorealistic views from a dense LiDAR point cloud. By detecting and matching keypoints between these synthetic images and the live camera feed, we establish reliable 3D–2D correspondences for accurate pose estimation. We evaluate two distinct strategies: an Online Rendering and Tracking method that renders views on the fly, and an Offline Keypoint-Map Tracking method that precomputes a keypoint map for known trajectories, optimizing for computational efficiency. Comprehensive experiments demonstrate that our framework significantly outperforms several state-of-the-art visual SLAM systems in both accuracy and tracking consistency. By anchoring localization to the stable geometric information from the LiDAR map, our method overcomes the reliance on photometric consistency that often causes failures in purely image-based systems, proving particularly effective in challenging real-world environments.
(This article belongs to the Special Issue Image Processing and Computer Vision Applications)
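Once keypoints in the live frame are matched against map points with known 3D coordinates, the pose step reduces to PnP with RANSAC. A minimal OpenCV sketch follows; the points, intrinsics, and threshold are placeholders rather than values from the paper.

```python
import numpy as np
import cv2

# Placeholder 3D map points and their matched 2D detections in the live frame.
object_points = (np.random.rand(50, 3) * 10).astype(np.float32)   # metres, from the LiDAR map
image_points = (np.random.rand(50, 2) * 640).astype(np.float32)   # pixels, from the camera
K = np.array([[700.0, 0.0, 320.0],
              [0.0, 700.0, 240.0],
              [0.0,   0.0,   1.0]])                                # assumed intrinsics
dist = np.zeros(5)                                                 # assume no lens distortion

ok, rvec, tvec, inliers = cv2.solvePnPRansac(object_points, image_points, K, dist,
                                             reprojectionError=3.0)
if ok:
    R, _ = cv2.Rodrigues(rvec)      # camera pose in the map frame: rotation R, translation tvec
```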
32 pages, 1435 KiB  
Review
Smart Safety Helmets with Integrated Vision Systems for Industrial Infrastructure Inspection: A Comprehensive Review of VSLAM-Enabled Technologies
by Emmanuel A. Merchán-Cruz, Samuel Moveh, Oleksandr Pasha, Reinis Tocelovskis, Alexander Grakovski, Alexander Krainyukov, Nikita Ostrovenecs, Ivans Gercevs and Vladimirs Petrovs
Sensors 2025, 25(15), 4834; https://doi.org/10.3390/s25154834 - 6 Aug 2025
Abstract
Smart safety helmets equipped with vision systems are emerging as powerful tools for industrial infrastructure inspection. This paper presents a comprehensive state-of-the-art review of such VSLAM-enabled (Visual Simultaneous Localization and Mapping) helmets. We surveyed the evolution from basic helmet cameras to intelligent, sensor-fused inspection platforms, highlighting how modern helmets leverage real-time visual SLAM algorithms to map environments and assist inspectors. A systematic literature search was conducted targeting high-impact journals, patents, and industry reports. We classify helmet-integrated camera systems into monocular, stereo, and omnidirectional types and compare their capabilities for infrastructure inspection. We examine core VSLAM algorithms (feature-based, direct, hybrid, and deep-learning-enhanced) and discuss their adaptation to wearable platforms. Multi-sensor fusion approaches integrating inertial, LiDAR, and GNSS data are reviewed, along with edge/cloud processing architectures enabling real-time performance. This paper compiles numerous industrial use cases, from bridges and tunnels to plants and power facilities, demonstrating significant improvements in inspection efficiency, data quality, and worker safety. Key challenges are analyzed, including technical hurdles (battery life, processing limits, and harsh environments), human factors (ergonomics, training, and cognitive load), and regulatory issues (safety certification and data privacy). We also identify emerging trends, such as semantic SLAM, AI-driven defect recognition, hardware miniaturization, and collaborative multi-helmet systems. This review finds that VSLAM-equipped smart helmets offer a transformative approach to infrastructure inspection, enabling real-time mapping, augmented awareness, and safer workflows. We conclude by highlighting current research gaps, notably in standardizing systems and integrating with asset management, and provide recommendations for industry adoption and future research directions.
17 pages, 6351 KiB  
Article
Vision-Ray-Calibration-Based Monocular Deflectometry by Poses Estimation from Reflections
by Cheng Liu, Jianhua Liu, Yanming Xing, Xiaohui Ao, Wang Zhang and Chunguang Yang
Sensors 2025, 25(15), 4778; https://doi.org/10.3390/s25154778 - 3 Aug 2025
Abstract
A monocular deflectometric system comprises a camera and a screen that collaboratively facilitate the reconstruction of a specular surface under test (SUT). This paper presents a methodology for solving the slope distribution of the SUT utilizing pose estimation derived from reflections, based on vision ray calibration (VRC). First, an auxiliary flat mirror, placed in different postures, reflects the patterns displayed by a screen maintained in a constant posture, and these reflections are recorded by the camera. The system undergoes a VRC-based calibration to ascertain the vision ray distribution of the camera and the spatial relationship between the camera and the screen. Subsequently, the camera records the patterns reflected by the SUT, which remains in a constant posture while the screen is adjusted to multiple postures. Utilizing the VRC, the vision ray distribution across the several postures of the screen and the SUT is calibrated. Following this, an iterative integrated calibration is performed, employing the results of the preceding separate calibrations as initial parameters. The integrated calibration combines the cost functions of the separate calibrations with a constraint on the intersection of lines in Plücker space. Ultimately, the results of the integrated calibration yield the slope distribution of the SUT, enabling an integral reconstruction. In both numerical simulations and actual measurements, the integrated calibration significantly enhances the accuracy of the reconstructions compared to those obtained with the separate calibrations.
(This article belongs to the Section Optical Sensors)
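For readers unfamiliar with the Plücker-space intersection constraint mentioned in the abstract, the short sketch below builds Plücker coordinates for two 3D lines and evaluates their reciprocal product, which vanishes exactly when the lines are coplanar (i.e., intersect or are parallel). It illustrates the constraint only, not the paper's full cost function.

```python
import numpy as np

def plucker(p, q):
    """Plücker coordinates (direction, moment) of the 3D line through points p and q."""
    return q - p, np.cross(p, q)

def reciprocal_product(l1, l2):
    """Zero exactly when the two lines are coplanar; its magnitude can serve as a
    line-intersection residual. Illustrative only."""
    d1, m1 = l1
    d2, m2 = l2
    return np.dot(d1, m2) + np.dot(d2, m1)

# Two lines that meet at the origin: residual should be ~0.
l_a = plucker(np.array([0.0, 0.0, 0.0]), np.array([1.0, 0.0, 0.0]))
l_b = plucker(np.array([0.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0]))
print(reciprocal_product(l_a, l_b))   # 0.0
```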
27 pages, 6578 KiB  
Article
Evaluating Neural Radiance Fields for ADA-Compliant Sidewalk Assessments: A Comparative Study with LiDAR and Manual Methods
by Hang Du, Shuaizhou Wang, Linlin Zhang, Mark Amo-Boateng and Yaw Adu-Gyamfi
Infrastructures 2025, 10(8), 191; https://doi.org/10.3390/infrastructures10080191 - 22 Jul 2025
Abstract
An accurate assessment of sidewalk conditions is critical for ensuring compliance with the Americans with Disabilities Act (ADA), particularly to safeguard mobility for wheelchair users. This paper presents a novel 3D reconstruction framework based on neural radiance fields (NeRF), which utilizes monocular video input from consumer-grade cameras to generate high-fidelity 3D models of sidewalk environments. The framework enables automatic extraction of ADA-relevant geometric features, including the running slope, the cross slope, and vertical displacements, facilitating an efficient and scalable compliance assessment process. A comparative study is conducted across three surveying methods—manual measurements, LiDAR scanning, and the proposed NeRF-based approach—evaluated on four sidewalks and one curb ramp. Each method was assessed based on accuracy, cost, time, level of automation, and scalability. The NeRF-based approach achieved high agreement with LiDAR-derived ground truth, delivering an F1 score of 96.52%, a precision of 96.74%, and a recall of 96.34% for ADA compliance classification. These results underscore the potential of NeRF to serve as a cost-effective, automated alternative to traditional and LiDAR-based methods, with sufficient precision for widespread deployment in municipal sidewalk audits.
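As a rough illustration of how ADA-relevant slopes can be read off a reconstructed 3D model, the sketch below fits a least-squares plane to a sidewalk patch and reports running and cross slopes as percentages. The function, data, and travel direction are hypothetical; the paper's extraction pipeline is more involved.

```python
import numpy as np

def fit_plane_slopes(points, travel_dir):
    """Fit z = ax + by + c to a sidewalk patch and report running slope (along
    travel_dir) and cross slope (perpendicular to it) as percentages. Sketch only."""
    A = np.c_[points[:, 0], points[:, 1], np.ones(len(points))]
    (a, b, _), *_ = np.linalg.lstsq(A, points[:, 2], rcond=None)
    grad = np.array([a, b])                      # slope vector of the fitted plane
    t = travel_dir / np.linalg.norm(travel_dir)  # unit direction of travel in (x, y)
    n = np.array([-t[1], t[0]])                  # perpendicular direction in the ground plane
    return 100.0 * abs(grad @ t), 100.0 * abs(grad @ n)   # running %, cross %

# Hypothetical patch: 2% running slope along x, 1% cross slope along y, small noise.
xy = np.random.rand(2000, 2) * 3.0
z = 0.02 * xy[:, 0] + 0.01 * xy[:, 1] + 0.001 * np.random.randn(2000)
running, cross = fit_plane_slopes(np.c_[xy, z], travel_dir=np.array([1.0, 0.0]))
print(running, cross)   # approximately 2.0, 1.0
```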
18 pages, 3225 KiB  
Article
Autonomous Tracking of Steel Lazy Wave Risers Using a Hybrid Vision–Acoustic AUV Framework
by Ali Ghasemi and Hodjat Shiri
J. Mar. Sci. Eng. 2025, 13(7), 1347; https://doi.org/10.3390/jmse13071347 - 15 Jul 2025
Abstract
Steel lazy wave risers (SLWRs) are critical in offshore hydrocarbon transport for linking subsea wells to floating production facilities in deep-water environments. The incorporation of buoyancy modules reduces curvature-induced stress concentrations in the touchdown zone (TDZ); however, extended operational exposure under cyclic environmental and operational loads results in repeated seabed contact. This repeated interaction modifies the seabed soil over time, gradually forming a trench and altering the riser configuration, which significantly impacts stress patterns and contributes to fatigue degradation. Accurately reconstructing the riser's evolving profile in the TDZ is essential for reliable fatigue life estimation and structural integrity evaluation. This study proposes a simulation-based framework for the autonomous tracking of SLWRs using a fin-actuated autonomous underwater vehicle (AUV) equipped with a monocular camera and a multibeam echosounder. By fusing visual and acoustic data, the system continuously estimates the AUV's position relative to the riser. A dedicated image processing pipeline, comprising bilateral filtering, edge detection, the Hough transform, and K-means clustering, facilitates the extraction of the riser's centerline and measures its displacement from nearby objects and seabed variations. The framework was developed and validated in the Unmanned Underwater Vehicle (UUV) Simulator, a high-fidelity underwater robotics and pipeline inspection environment. Simulated scenarios included the riser's dynamic lateral and vertical oscillations, in which the system demonstrated robust performance in capturing complex three-dimensional trajectories. The resulting riser profiles can be integrated into numerical models incorporating riser–soil interaction and non-linear hysteretic behavior, ultimately enhancing fatigue prediction accuracy and informing long-term infrastructure maintenance strategies.
(This article belongs to the Section Ocean Engineering)
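A condensed sketch of the image processing pipeline named in the abstract (bilateral filtering, edge detection, probabilistic Hough transform, K-means clustering) using OpenCV; all parameter values are placeholders, not the tuned values used in the study.

```python
import cv2
import numpy as np

def riser_centerline(frame_bgr):
    """Bilateral filter -> Canny edges -> probabilistic Hough lines -> K-means on
    line midpoints to localise candidate centreline points. Illustrative parameters."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    smooth = cv2.bilateralFilter(gray, d=9, sigmaColor=75, sigmaSpace=75)
    edges = cv2.Canny(smooth, 50, 150)
    lines = cv2.HoughLinesP(edges, rho=1, theta=np.pi / 180, threshold=60,
                            minLineLength=40, maxLineGap=10)
    if lines is None:
        return None
    mids = np.float32([[(x1 + x2) / 2, (y1 + y2) / 2] for x1, y1, x2, y2 in lines[:, 0]])
    criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 20, 1.0)
    k = min(2, len(mids))
    _, _, centers = cv2.kmeans(mids, k, None, criteria, 5, cv2.KMEANS_PP_CENTERS)
    return centers      # candidate centreline points in pixel coordinates

# Usage (hypothetical frame source): centers = riser_centerline(cv2.imread("frame.png"))
```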
32 pages, 2740 KiB  
Article
Vision-Based Navigation and Perception for Autonomous Robots: Sensors, SLAM, Control Strategies, and Cross-Domain Applications—A Review
by Eder A. Rodríguez-Martínez, Wendy Flores-Fuentes, Farouk Achakir, Oleg Sergiyenko and Fabian N. Murrieta-Rico
Eng 2025, 6(7), 153; https://doi.org/10.3390/eng6070153 - 7 Jul 2025
Abstract
Camera-centric perception has matured into a cornerstone of modern autonomy, from self-driving cars and factory cobots to underwater and planetary exploration. This review synthesizes more than a decade of progress in vision-based robotic navigation through an engineering lens, charting the full pipeline from sensing to deployment. We first examine the expanding sensor palette—monocular and multi-camera rigs, stereo and RGB-D devices, LiDAR–camera hybrids, event cameras, and infrared systems—highlighting the complementary operating envelopes and the rise of learning-based depth inference. The advances in visual localization and mapping are then analyzed, contrasting sparse and dense SLAM approaches, as well as monocular, stereo, and visual–inertial formulations. Additional topics include loop closure, semantic mapping, and LiDAR–visual–inertial fusion, which enables drift-free operation in dynamic environments. Building on these foundations, we review the navigation and control strategies, spanning classical planning, reinforcement and imitation learning, hybrid topological–metric memories, and emerging visual language guidance. Application case studies—autonomous driving, industrial manipulation, autonomous underwater vehicles, planetary rovers, aerial drones, and humanoids—demonstrate how tailored sensor suites and algorithms meet domain-specific constraints. Finally, the future research trajectories are distilled: generative AI for synthetic training data and scene completion; high-density 3D perception with solid-state LiDAR and neural implicit representations; event-based vision for ultra-fast control; and human-centric autonomy in next-generation robots. By providing a unified taxonomy, a comparative analysis, and engineering guidelines, this review aims to inform researchers and practitioners designing robust, scalable, vision-driven robotic systems.
(This article belongs to the Special Issue Interdisciplinary Insights in Engineering Research)
16 pages, 3055 KiB  
Article
LET-SE2-VINS: A Hybrid Optical Flow Framework for Robust Visual–Inertial SLAM
by Wei Zhao, Hongyang Sun, Songsong Ma and Haitao Wang
Sensors 2025, 25(13), 3837; https://doi.org/10.3390/s25133837 - 20 Jun 2025
Abstract
This paper presents SE2-LET-VINS, an enhanced Visual–Inertial Simultaneous Localization and Mapping (VI-SLAM) system built upon the classic Visual–Inertial Navigation System for Monocular Cameras (VINS-Mono) framework, designed to improve localization accuracy and robustness in complex environments. By integrating a Lightweight Neural Network (LET-NET) for high-quality feature extraction and Special Euclidean Group in 2D (SE2) optical flow tracking, the system achieves superior performance in challenging scenarios such as low lighting and rapid motion. The proposed method processes Inertial Measurement Unit (IMU) data and camera data, utilizing pre-integration and RANdom SAmple Consensus (RANSAC) for precise feature matching. Experimental results on the European Robotics Challenge (EuRoC) dataset demonstrate that the proposed hybrid method improves localization accuracy by up to 43.89% compared to the classic VINS-Mono model in sequences with loop closure detection. In no-loop scenarios, the method also achieves error reductions of 29.7%, 21.8%, and 24.1% on the MH_04, MH_05, and V2_03 sequences, respectively. Trajectory visualization and Gaussian fitting analysis further confirm the system's robustness and accuracy. SE2-LET-VINS offers a robust solution for visual–inertial navigation, particularly in demanding environments, and paves the way for future real-time applications and extended capabilities.
(This article belongs to the Section Navigation and Positioning)
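The RANSAC stage mentioned in the abstract is commonly realized as fundamental-matrix estimation over the putative correspondences; the sketch below shows that step with OpenCV on synthetic placeholder points. It illustrates the general technique, not the exact configuration of SE2-LET-VINS.

```python
import cv2
import numpy as np

# Synthetic placeholder correspondences standing in for optical-flow-tracked features.
pts_prev = (np.random.rand(200, 2) * 640).astype(np.float32)
pts_curr = pts_prev + np.float32([4.0, 2.0]) + np.random.randn(200, 2).astype(np.float32)

# RANSAC rejects correspondences inconsistent with the estimated epipolar geometry.
F, inlier_mask = cv2.findFundamentalMat(pts_prev, pts_curr, cv2.FM_RANSAC,
                                        ransacReprojThreshold=1.0, confidence=0.99)
if F is not None and inlier_mask is not None:
    good_prev = pts_prev[inlier_mask.ravel() == 1]
    good_curr = pts_curr[inlier_mask.ravel() == 1]
```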
28 pages, 12681 KiB  
Article
MM-VSM: Multi-Modal Vehicle Semantic Mesh and Trajectory Reconstruction for Image-Based Cooperative Perception
by Márton Cserni, András Rövid and Zsolt Szalay
Appl. Sci. 2025, 15(12), 6930; https://doi.org/10.3390/app15126930 - 19 Jun 2025
Abstract
Recent advancements in cooperative 3D object detection have demonstrated significant potential for enhancing autonomous driving by integrating roadside infrastructure data. However, deploying comprehensive LiDAR-based cooperative perception systems remains prohibitively expensive and requires precisely annotated 3D data to function robustly. This paper proposes an improved multi-modal method that integrates LiDAR-based shape references into a previously mono-camera-based semantic vertex reconstruction framework to enable robust and cost-effective monocular and cooperative pose estimation after the reconstruction. A novel camera–LiDAR loss function is proposed that combines re-projection loss from a multi-view camera system with LiDAR shape constraints. Experimental evaluations conducted on the Argoverse dataset and real-world experiments demonstrate significantly improved shape reconstruction robustness and accuracy, thereby improving pose estimation performance. The effectiveness of the algorithm is proven through a real-world smart valet parking application, evaluated in our university parking area with real vehicles. Our approach allows accurate 6DOF pose estimation using an inexpensive IP camera without requiring context-specific training, thereby advancing the state of the art in monocular and cooperative image-based vehicle localization.
(This article belongs to the Special Issue Advances in Autonomous Driving and Smart Transportation)
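A toy version of a combined camera–LiDAR objective in the spirit of the one described above: per-view re-projection error of the semantic vertices plus a one-sided Chamfer term pulling the vertices towards the LiDAR shape reference. The weighting, names, and tensor shapes are assumptions for illustration, not the paper's loss.

```python
import torch

def camera_lidar_loss(verts_3d, proj_mats, obs_2d, lidar_pts, w_shape=0.5):
    """Mean squared re-projection error over the views plus a one-sided Chamfer
    distance from mesh vertices to LiDAR points. Illustrative sketch only."""
    reproj = 0.0
    for P, uv in zip(proj_mats, obs_2d):                         # per calibrated view
        homo = torch.cat([verts_3d, torch.ones(len(verts_3d), 1)], dim=1)
        proj = homo @ P.T                                        # (N, 3) homogeneous pixels
        proj = proj[:, :2] / proj[:, 2:3]
        reproj = reproj + torch.mean((proj - uv) ** 2)
    d = torch.cdist(verts_3d, lidar_pts)                         # (N, M) vertex-to-point distances
    shape = d.min(dim=1).values.mean()                           # one-sided Chamfer term
    return reproj / len(proj_mats) + w_shape * shape

# Hypothetical shapes: 12 semantic vertices, two views, 500 LiDAR points.
verts = torch.rand(12, 3, requires_grad=True)
Ps = [torch.rand(3, 4), torch.rand(3, 4)]
obs = [torch.rand(12, 2) * 640, torch.rand(12, 2) * 640]
lidar = torch.rand(500, 3)
camera_lidar_loss(verts, Ps, obs, lidar).backward()
```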
22 pages, 22557 KiB  
Article
Depth from 2D Images: Development and Metrological Evaluation of System Uncertainty Applied to Agricultural Scenarios
by Bernardo Lanza, Cristina Nuzzi and Simone Pasinetti
Sensors 2025, 25(12), 3790; https://doi.org/10.3390/s25123790 - 17 Jun 2025
Abstract
This article describes the development, experimental validation, and uncertainty analysis of a simple-to-use model for monocular depth estimation based on optical flow. The idea is deeply rooted in the agricultural scenario, for which vehicles that move around the field are equipped with low-cost cameras. In the experiment, the camera was mounted on a robot moving linearly at five different constant speeds looking at the target measurands (ArUco markers) positioned at different depths. The acquired data was processed and filtered with a moving average window-based filter to reduce noise in the estimated apparent depths of the ArUco markers and in the estimated optical flow image speeds. Two methods are proposed for model validation: a generalized approach and a complete approach that separates the input data according to their image speed to account for the exponential nature of the proposed model. The practical result obtained by the two analyses is that, to reduce the impact of uncertainty on depth estimates, it is best to have image speeds higher than 500–800 px/s. This is obtained by either moving the camera faster or by increasing the camera's frame rate. The best-case scenario is achieved when the camera moves at 0.50–0.75 m/s and the frame rate is set to 60 fps (effectively reduced to 20 fps after filtering). As a further contribution, two practical examples are provided to offer guidance for untrained personnel in selecting the camera's speed and camera characteristics. The developed code is made publicly available on GitHub.
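The underlying monocular relation is that, for a camera translating parallel to the image plane at speed v, a static point at depth Z produces an image speed u = f·v/Z pixels per second, so Z = f·v/u. A minimal sketch under that assumption (ignoring rotation and off-axis motion components) follows, with hypothetical numbers in the recommended operating range of image speeds above 500–800 px/s.

```python
import numpy as np

def depth_from_flow(image_speed_px, camera_speed_m_s, fx_px):
    """Depth of a static point from its optical-flow image speed, assuming pure
    camera translation parallel to the image plane: Z = fx * v / u.
    A minimal version of the optical-flow depth model; not the paper's full
    uncertainty-aware formulation."""
    return fx_px * camera_speed_m_s / image_speed_px

# Hypothetical numbers: camera at 0.75 m/s, fx = 1000 px, measured image speed 600 px/s.
print(depth_from_flow(600.0, 0.75, 1000.0))   # ~1.25 m
```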
18 pages, 12661 KiB  
Article
Regression-Based Docking System for Autonomous Mobile Robots Using a Monocular Camera and ArUco Markers
by Jun Seok Oh and Min Young Kim
Sensors 2025, 25(12), 3742; https://doi.org/10.3390/s25123742 - 15 Jun 2025
Abstract
This paper introduces a cost-effective autonomous charging docking system that utilizes a monocular camera and ArUco markers. Traditional monocular vision-based approaches, such as SolvePnP, are sensitive to viewing angles, lighting conditions, and camera calibration errors, limiting the accuracy of spatial estimation. To address these challenges, we propose a regression-based method that learns geometric features from variations in marker size and shape to estimate distance and orientation accurately. The proposed model is trained using ground-truth data collected from a LiDAR sensor, while real-time operation is performed using only monocular input. Experimental results show that the proposed system achieves a mean distance error of 1.18 cm and a mean orientation error of 3.11°, significantly outperforming SolvePnP, which exhibits errors of 58.54 cm and 6.64°, respectively. In real-world docking tests, the system achieves a final average docking position error of 2 cm and an orientation error of 3.07°, demonstrating that reliable and accurate performance can be attained using low-cost, vision-only hardware. This system offers a practical and scalable solution for industrial applications.
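To illustrate the regression idea (as opposed to SolvePnP), the sketch below detects an ArUco marker with OpenCV, forms a simple shape feature (inverse mean side length, which scales roughly with 1/distance), and fits a linear regressor against hypothetical LiDAR-measured distances. The feature choice, training data, and file name are assumptions, not the paper's trained model.

```python
import cv2
import numpy as np
from sklearn.linear_model import LinearRegression

def marker_feature(corners):
    """Inverse of the marker's mean apparent side length (1/px); apparent size shrinks
    roughly as 1/distance, so a linear model on this feature is a reasonable toy stand-in."""
    c = corners.reshape(4, 2)
    sides = [np.linalg.norm(c[i] - c[(i + 1) % 4]) for i in range(4)]
    return [1.0 / np.mean(sides)]

# Training pairs: camera-derived features vs. hypothetical LiDAR-measured distances (m).
X_train = np.array([[1 / 120.0], [1 / 60.0], [1 / 40.0]])
y_train = np.array([0.5, 1.0, 1.5])
reg = LinearRegression().fit(X_train, y_train)

# Inference: detect the marker in a camera frame and regress its distance.
detector = cv2.aruco.ArucoDetector(cv2.aruco.getPredefinedDictionary(cv2.aruco.DICT_4X4_50))
frame = cv2.imread("dock_view.png", cv2.IMREAD_GRAYSCALE)   # placeholder image path
if frame is not None:
    corners, ids, _ = detector.detectMarkers(frame)
    if ids is not None:
        dist_m = reg.predict([marker_feature(corners[0])])[0]
```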