Search Results (96)

Search Parameters:
Keywords = 6-DoF pose estimation

28 pages, 12681 KiB  
Article
MM-VSM: Multi-Modal Vehicle Semantic Mesh and Trajectory Reconstruction for Image-Based Cooperative Perception
by Márton Cserni, András Rövid and Zsolt Szalay
Appl. Sci. 2025, 15(12), 6930; https://doi.org/10.3390/app15126930 - 19 Jun 2025
Viewed by 392
Abstract
Recent advancements in cooperative 3D object detection have demonstrated significant potential for enhancing autonomous driving by integrating roadside infrastructure data. However, deploying comprehensive LiDAR-based cooperative perception systems remains prohibitively expensive and requires precisely annotated 3D data to function robustly. This paper proposes an improved multi-modal method that integrates LiDAR-based shape references into a previously mono-camera-based semantic vertex reconstruction framework, enabling robust and cost-effective monocular and cooperative pose estimation after the reconstruction. A novel camera–LiDAR loss function is proposed that combines re-projection loss from a multi-view camera system with LiDAR shape constraints. Experimental evaluations on the Argoverse dataset and real-world experiments demonstrate significantly improved shape reconstruction robustness and accuracy, thereby improving pose estimation performance. The effectiveness of the algorithm is proven through a real-world smart valet parking application, evaluated in our university parking area with real vehicles. Our approach allows accurate 6-DoF pose estimation using an inexpensive IP camera without requiring context-specific training, thereby advancing the state of the art in monocular and cooperative image-based vehicle localization.
(This article belongs to the Special Issue Advances in Autonomous Driving and Smart Transportation)
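At the heart of the method is a joint camera–LiDAR objective. The sketch below illustrates, in plain numpy, the two kinds of terms such an objective combines: a multi-view re-projection term and a LiDAR shape-constraint term. The nearest-vertex form of the shape term, the function names, and the weighting are illustrative assumptions, not the authors' exact formulation.

```python
import numpy as np

def project(K, R, t, X):
    """Project 3D points X (N, 3) into an image with intrinsics K and pose (R, t)."""
    Xc = X @ R.T + t                      # points in the camera frame
    uv = Xc @ K.T                         # homogeneous pixel coordinates
    return uv[:, :2] / uv[:, 2:3]

def reprojection_loss(views, X, observations):
    """Sum of squared pixel errors of the mesh vertices X over all camera views.
    `views` is a list of (K, R, t); `observations` the matching 2D vertex tracks."""
    return sum(np.sum((project(K, R, t, X) - obs) ** 2)
               for (K, R, t), obs in zip(views, observations))

def lidar_shape_loss(X, lidar_pts):
    """Illustrative LiDAR shape constraint: squared distance from each LiDAR
    point to its nearest mesh vertex (a stand-in for the paper's term)."""
    d = np.linalg.norm(lidar_pts[:, None, :] - X[None, :, :], axis=-1)
    return np.sum(d.min(axis=1) ** 2)

def total_loss(views, X, observations, lidar_pts, lam=1.0):
    # lam balances the shape term against re-projection; the weighting is assumed
    return reprojection_loss(views, X, observations) + lam * lidar_shape_loss(X, lidar_pts)
```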

23 pages, 12272 KiB  
Article
Optimized Design and Deep Vision-Based Operation Control of a Multi-Functional Robotic Gripper for an Automatic Loading System
by Yaohui Wang, Sheng Guo, Jinliang Zhang, Hongbo Ding, Bo Zhang, Ao Cao, Xiaohu Sun, Guangxin Zhang, Shihe Tian, Yongxu Chen, Jixuan Ma and Guangrong Chen
Actuators 2025, 14(6), 259; https://doi.org/10.3390/act14060259 - 23 May 2025
Viewed by 446
Abstract
This study presents an optimized design and vision-guided control strategy for a multi-functional robotic gripper integrated into an automatic loading system for warehouse environments. The system adopts a modular architecture, including standardized platforms, transport containers, four collaborative 6-DOF robotic arms, and a multi-sensor vision module. Methodologically, we first developed three gripper prototypes, selecting the optimal design (a 30° angle between the gripper and the container side) through workspace and interference analysis. A deep vision-based recognition system, enhanced by an improved YOLOv5 algorithm and multi-feature fusion, was employed for real-time object detection and pose estimation. Kinematic modeling and seventh-order polynomial trajectory planning ensured smooth and precise robotic arm movements. Key results from simulations and experiments demonstrated a 95.72% success rate in twist-lock operations, with a positioning accuracy of 1.2 mm. The system achieved a control cycle of 35 ms, more efficient than non-vision-based methods. Practical implications include enabling fully autonomous container handling in logistics, reducing labor costs, and enhancing operational safety. Limitations include dependency on fixed camera setups and sensitivity to extreme lighting conditions.
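Seventh-order polynomial trajectory planning, as used above, fixes eight boundary conditions (position, velocity, acceleration, and jerk at both endpoints), which determine the eight polynomial coefficients through one linear solve. A minimal sketch under that standard interpretation; the function name and the rest-to-rest defaults are assumptions:

```python
import numpy as np

def septic_coeffs(q0, qT, T, v0=0.0, vT=0.0, a0=0.0, aT=0.0, j0=0.0, jT=0.0):
    """Coefficients c[0..7] of q(t) = sum_k c_k t^k matching position, velocity,
    acceleration, and jerk at t = 0 and t = T (8 conditions, 8 unknowns)."""
    def rows(t):
        pos = [t**k for k in range(8)]
        vel = [k * t**(k - 1) if k >= 1 else 0.0 for k in range(8)]
        acc = [k * (k - 1) * t**(k - 2) if k >= 2 else 0.0 for k in range(8)]
        jrk = [k * (k - 1) * (k - 2) * t**(k - 3) if k >= 3 else 0.0 for k in range(8)]
        return [pos, vel, acc, jrk]
    A = np.array(rows(0.0) + rows(float(T)))
    b = np.array([q0, v0, a0, j0, qT, vT, aT, jT], dtype=float)
    return np.linalg.solve(A, b)

# Rest-to-rest joint motion from 0 to 1 rad in 2 s: velocity, acceleration,
# and jerk all vanish at both endpoints, which is what makes the motion smooth.
c = septic_coeffs(0.0, 1.0, 2.0)
q_mid = np.polyval(c[::-1], 1.0)   # position at t = 1 s (0.5 by symmetry)
```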

34 pages, 9482 KiB  
Article
A Novel Feedforward Youla Parameterization Method for Avoiding Local Minima in Stereo Image-Based Visual Servoing Control
by Rongfei Li and Francis Assadian
Appl. Sci. 2025, 15(9), 4991; https://doi.org/10.3390/app15094991 - 30 Apr 2025
Viewed by 344
Abstract
In robot navigation and manipulation, accurately determining the camera’s pose relative to the environment is crucial for effective task execution. In this paper, we systematically prove that this problem corresponds to the Perspective-3-Point (P3P) formulation, where exactly three known 3D points and their corresponding 2D image projections are used to estimate the pose of a stereo camera. In image-based visual servoing (IBVS) control, the system becomes overdetermined, as the six degrees of freedom (DoF) of the stereo camera must align with nine observed 2D features in the scene. When more constraints are imposed than available DoFs, global stability cannot be guaranteed, as the camera may become trapped in a local minimum far from the desired configuration during servoing. To address this issue, we propose a novel control strategy for accurately positioning a calibrated stereo camera. Our approach integrates a feedforward controller with a Youla parameterization-based feedback controller, ensuring robust servoing performance. Through simulations, we demonstrate that our method effectively avoids local minima and enables the camera to reach the desired pose accurately and efficiently.
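For context, the textbook IBVS feedback law computes the camera twist from the stacked image Jacobian (interaction matrix); with more feature coordinates than the six camera DoF, the pseudo-inverse yields a least-squares velocity, which can vanish at a spurious local minimum even though the feature error is nonzero. A minimal monocular sketch of that classical law; the paper's stereo interaction matrix and its Youla-parameterized feedforward/feedback structure are not reproduced here:

```python
import numpy as np

def interaction_matrix(pts, Z):
    """Stacked interaction matrix for normalized image points (x, y) at depths Z;
    each point contributes two rows mapping the camera twist to feature velocity."""
    L = []
    for (x, y), z in zip(pts, Z):
        L.append([-1/z, 0, x/z, x*y, -(1 + x*x), y])
        L.append([0, -1/z, y/z, 1 + y*y, -x*y, -x])
    return np.array(L)

def ibvs_velocity(pts, pts_des, Z, lam=0.5):
    """Classical IBVS law v = -lambda * L^+ (s - s*). In the overdetermined
    case, v can vanish while the residual e is nonzero: a local minimum."""
    e = (np.asarray(pts) - np.asarray(pts_des)).ravel()
    L = interaction_matrix(pts, Z)
    return -lam * np.linalg.pinv(L) @ e
```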

27 pages, 11200 KiB  
Article
An Automatic Registration System Based on Augmented Reality to Enhance Civil Infrastructure Inspections
by Leonardo Binni, Massimo Vaccarini, Francesco Spegni, Leonardo Messi and Berardo Naticchia
Buildings 2025, 15(7), 1146; https://doi.org/10.3390/buildings15071146 - 31 Mar 2025
Cited by 1 | Viewed by 623
Abstract
Manual geometric and semantic alignment of inspection data with existing digital models (field-to-model data registration) and on-site access to relevant information (model-to-field data registration) represent cumbersome procedures that cause significant loss of information and fragmentation, hindering the efficiency of civil infrastructure inspections. To address the bidirectional registration challenge, this study introduces a high-accuracy automatic registration method and system based on Augmented Reality (AR) that streamlines data exchange between the field and a knowledge graph-based Digital Twin (DT) platform for infrastructure management, and vice versa. A centimeter-level 6-DoF pose estimation of the AR device in large-scale, open unprepared environments is achieved by implementing a hybrid approach based on Real-Time Kinematic and Visual Inertial Odometry to cope with urban-canyon scenarios. For this purpose, a low-cost and non-invasive RTK receiver was prototyped and firmly attached to an AR device (i.e., Microsoft HoloLens 2). Multiple filters and latency compensation techniques were implemented to enhance registration accuracy. The system was tested in a real-world scenario involving the inspection of a highway viaduct. Throughout the use case inspection, the system seamlessly and automatically provided field operators with on-field access to existing DT information (i.e., open BIM models) such as georeferenced holograms and facilitated the enrichment of the asset’s DT through the automatic registration of inspection data (i.e., images) with the open BIM models included in the DT. This study contributes to DT-based civil infrastructure management by establishing a bidirectional and seamless integration between virtual and physical entities.

19 pages, 4427 KiB  
Article
Robust MPS-INS UKF Integration and SIR-Based Hyperparameter Estimation in a 3D Flight Environment
by Juyoung Seo, Dongha Kwon, Byungjin Lee and Sangkyung Sung
Aerospace 2025, 12(3), 228; https://doi.org/10.3390/aerospace12030228 - 11 Mar 2025
Viewed by 606
Abstract
This study introduces a pose estimation algorithm integrating an Inertial Navigation System (INS) with an Alternating Current (AC) magnetic field-based navigation system, referred to as the Magnetic Positioning System (MPS), evaluated using a 6 Degrees of Freedom (DoF) drone. The study addresses significant challenges such as magnetic vector distortions and model uncertainties caused by motor noise, which degrade attitude estimation and limit the effectiveness of traditional Extended Kalman Filter (EKF)-based fusion methods. To mitigate these issues, a Tightly Coupled Unscented Kalman Filter (TC UKF) was developed to enhance robustness and navigation accuracy in dynamic environments. The proposed Unscented Kalman Filter (UKF) demonstrated superior attitude estimation performance within a 6 m coil spacing area, outperforming both the MPS 3D LS (Least Squares) and EKF-based approaches. Furthermore, the hyperparameters alpha, beta, and kappa were optimized using the Sequential Importance Resampling (SIR) process of the Particle Filter. This adaptive hyperparameter adjustment achieved improved navigation results compared to the default UKF settings, particularly in environments with high model uncertainty.
(This article belongs to the Special Issue Advanced GNC Solutions for VTOL Systems)
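The alpha, beta, and kappa hyperparameters tuned by the SIR process enter the UKF through the scaled sigma-point construction. A minimal sketch of the standard (van der Merwe) scaled sigma points and weights; the paper's tightly coupled filter and its SIR tuning loop are not reproduced:

```python
import numpy as np

def sigma_points(x, P, alpha=1e-3, beta=2.0, kappa=0.0):
    """Scaled sigma points and weights for a UKF with state mean x (n,) and
    covariance P (n, n); alpha, beta, kappa are the tuned hyperparameters."""
    n = x.size
    lam = alpha**2 * (n + kappa) - n
    S = np.linalg.cholesky((n + lam) * P)    # matrix square root of (n+lam)P
    X = np.vstack([x, x + S.T, x - S.T])     # 2n + 1 sigma points
    Wm = np.full(2 * n + 1, 1.0 / (2 * (n + lam)))   # mean weights
    Wc = Wm.copy()                                    # covariance weights
    Wm[0] = lam / (n + lam)
    Wc[0] = lam / (n + lam) + (1 - alpha**2 + beta)
    return X, Wm, Wc
```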

21 pages, 35742 KiB  
Article
LandNet: Combine CNN and Transformer to Learn Absolute Camera Pose for the Fixed-Wing Aircraft Approach and Landing
by Siyuan Shen, Guanfeng Yu, Lei Zhang, Youyu Yan and Zhengjun Zhai
Remote Sens. 2025, 17(4), 653; https://doi.org/10.3390/rs17040653 - 14 Feb 2025
Viewed by 805
Abstract
Camera localization approaches often degrade in challenging environments characterized by illumination variations and significant viewpoint changes, presenting critical limitations for fixed-wing aircraft landing applications. To address these challenges, we propose LandNet, a novel absolute camera pose estimation network specifically designed for airborne scenarios. Our framework processes images from forward-looking aircraft cameras to directly predict 6-DoF camera poses, subsequently enabling aircraft pose determination through rigid transformation. As a first step, we design two encoders, one Transformer-based and one CNN-based, to capture complementary spatial–temporal features. Furthermore, a novel Feature Interactive Block (FIB) is employed to fully exploit spatial cues from the CNN encoder and temporal cues from the Transformer encoder. We also introduce a novel Attentional Convtrans Fusion Block (ACFB) to fuse the feature maps from the two encoders, enhancing the image representations and thereby the accuracy of the camera pose. Finally, two Multi-Layer Perceptron (MLP) heads estimate the camera position and orientation, respectively. The camera pose estimated by LandNet can then be used to recover the position and orientation of the aircraft through the rigid connection between the airborne camera and the aircraft. Experimental results from simulation and real flight data demonstrate the effectiveness of our proposed method.

21 pages, 5326 KiB  
Article
6-DoF Pose Estimation from Single RGB Image and CAD Model Retrieval Using Feature Similarity Measurement
by Sieun Park, Won-Je Jeong, Mayura Manawadu and Soon-Yong Park
Appl. Sci. 2025, 15(3), 1501; https://doi.org/10.3390/app15031501 - 1 Feb 2025
Cited by 1 | Viewed by 1295
Abstract
This study presents six degrees of freedom (6-DoF) pose estimation of an object from a single RGB image and retrieval of the matching CAD model by measuring the similarity between the RGB image and CAD rendering images. The 6-DoF pose estimation of an RGB object is one of the important techniques in 3D computer vision. However, in addition to 6-DoF pose estimation, retrieval and alignment of the matching CAD model with the RGB object must be performed for various industrial applications such as eXtended Reality (XR), Augmented Reality (AR), and robotic pick-and-place. This paper addresses the 6-DoF pose estimation and CAD model retrieval problems simultaneously and quantitatively analyzes how much 6-DoF pose estimation affects CAD model retrieval performance. This study consists of two main steps. The first step is 6-DoF pose estimation based on the PoseContrast network. We enhance the structure of PoseContrast by adding variance uncertainty weight and feature attention modules. The second step is retrieval of the matching CAD model by an image similarity measurement between the CAD rendering and the RGB object. In our experiments, we used 2000 RGB images collected from the Google and Bing search engines and 100 CAD models from ShapeNetCore. The Pascal3D+ dataset is used to train the pose estimation network, and DELF features are used for the similarity measurement. Comprehensive ablation studies of the proposed network show the quantitative performance analysis with respect to the baseline model. Experimental results show that pose estimation performance has a positive correlation with CAD retrieval performance.
(This article belongs to the Special Issue New Insights into Computer Vision and Graphics)

18 pages, 4340 KiB  
Article
GFA-Net: Geometry-Focused Attention Network for Six Degrees of Freedom Object Pose Estimation
by Shuai Lin, Junhui Yu, Peng Su, Weitao Xue, Yang Qin, Lina Fu, Jing Wen and Hong Huang
Sensors 2025, 25(1), 168; https://doi.org/10.3390/s25010168 - 31 Dec 2024
Viewed by 891
Abstract
Six degrees of freedom (6-DoF) object pose estimation is essential for robotic grasping and autonomous driving. While estimating pose from a single RGB image is highly desirable for real-world applications, it presents significant challenges. Many approaches incorporate supplementary information, such as depth data, to derive valuable geometric characteristics. However, deep neural networks still struggle to adequately extract features from object regions in RGB images. To overcome these limitations, we introduce the Geometry-Focused Attention Network (GFA-Net), a novel framework designed for more comprehensive feature extraction by analyzing critical geometric and textural object characteristics. GFA-Net leverages Point-wise Feature Attention (PFA) to capture subtle pose differences, guiding the network to localize object regions and identify point-wise discrepancies as pose shifts. In addition, a Geometry Feature Aggregation Module (GFAM) integrates multi-scale geometric feature maps to distill crucial geometric features. The resulting dense 2D–3D correspondences are then passed to a Perspective-n-Point (PnP) module for 6-DoF pose computation. Experimental results on the LINEMOD and Occlusion LINEMOD datasets indicate that our proposed method is highly competitive with state-of-the-art approaches, achieving 96.54% and 49.35% accuracy, respectively, under the ADD-S metric with a 0.10d threshold.
(This article belongs to the Section Sensors and Robotics)
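The final stage named above, recovering the 6-DoF pose from dense 2D–3D correspondences with a PnP module, is commonly implemented with a robust solver. A minimal OpenCV sketch under that standard reading; the solver flags and threshold are placeholders, not GFA-Net's actual configuration:

```python
import cv2
import numpy as np

def pose_from_correspondences(pts3d, pts2d, K):
    """Recover a 6-DoF pose from 2D-3D correspondences with RANSAC PnP.
    pts3d: (N, 3) model points, pts2d: (N, 2) image points, K: (3, 3) intrinsics."""
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        pts3d.astype(np.float32), pts2d.astype(np.float32), K, None,
        flags=cv2.SOLVEPNP_EPNP, reprojectionError=3.0)   # threshold assumed
    if not ok:
        raise RuntimeError("PnP failed to find a pose")
    R, _ = cv2.Rodrigues(rvec)   # rotation vector -> 3x3 rotation matrix
    return R, tvec, inliers
```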

31 pages, 8127 KiB  
Article
Data-Driven Kinematic Model for the End-Effector Pose Control of a Manipulator Robot
by Josué Goméz-Casas, Carlos A. Toro-Arcila, Nelly Abigaíl Rodríguez-Rosales, Jonathan Obregón-Flores, Daniela E. Ortíz-Ramos, Jesús Fernando Martínez-Villafañe and Oziel Gómez-Casas
Processes 2024, 12(12), 2831; https://doi.org/10.3390/pr12122831 - 10 Dec 2024
Cited by 1 | Viewed by 1452
Abstract
This paper presents a data-driven kinematic model for end-effector pose control applied to a variety of manipulator robots, focusing on the end-effector's full pose (position and orientation). The measured signals of the full pose and their computed derivatives, along with a linear combination of an estimated Jacobian matrix and a vector of joint velocities, generate a model estimation error. The Jacobian matrix is estimated using the Pseudo Jacobian Matrix (PJM) algorithm, which requires tuning only the step and weight parameters that scale the convergence of the model estimation error. The proposed control law is derived in two stages: the first is part of an objective function minimization, and the second is a constraint in a quasi-Lagrangian function. The control design parameters guarantee control error convergence in a closed-loop configuration with adaptive behavior in terms of the dynamics of the estimated Jacobian matrix. The novelty of the approach lies in its ability to achieve superior tracking performance across different manipulator robots, validated through simulations. Quantitative results show that, compared to a classical inverse-kinematics approach, the proposed method achieves rapid convergence of performance indices (e.g., Root Mean Square Error (RMSE) reduced to near zero in two cycles vs. a steady-state RMSE of 20 in the classical approach). Additionally, the proposed method minimizes joint drift, maintaining an RMSE of approximately 0.3 compared to 1.5 under the classical scheme. The control was validated by means of simulations featuring a UR5e manipulator with six Degrees of Freedom (DOF), a KUKA Youbot with eight DOF, and a KUKA Youbot Dual with thirteen DOF. The stability analysis of the closed-loop controller is demonstrated by means of the Lyapunov stability conditions.
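The PJM recursion itself is not spelled out in the abstract; as a hedged illustration of data-driven Jacobian estimation driven by a model estimation error with a step and a weight parameter, a Broyden-style update can serve as a stand-in:

```python
import numpy as np

def update_jacobian(J, dq, dx, step=0.1, weight=1.0):
    """One data-driven Jacobian refinement step (a Broyden-style rule used here
    as an illustrative stand-in for the paper's PJM algorithm, not its recursion).
    J: current (6, n) Jacobian estimate, dq: joint displacement (n,),
    dx: measured end-effector pose displacement (6,)."""
    err = dx - J @ dq                                   # model estimation error
    J = J + step * np.outer(err, dq) / (dq @ dq + weight)
    return J
```

With each measured motion pair (dq, dx), the update shrinks the model estimation error along the direction just excited, which is the same role the step and weight parameters play in scaling convergence in the abstract's description.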

20 pages, 3018 KiB  
Article
Global Semantic Localization from Abstract Ellipse-Ellipsoid Model and Object-Level Instance Topology
by Heng Wu, Yanjie Liu, Chao Wang and Yanlong Wei
Remote Sens. 2024, 16(22), 4187; https://doi.org/10.3390/rs16224187 - 10 Nov 2024
Viewed by 1170
Abstract
Robust and highly accurate localization using a camera is a challenging task when appearance varies significantly. In indoor environments, changes in illumination and object occlusion can have a significant impact on visual localization. In this paper, we propose a visual localization method based on an ellipse-ellipsoid model, combined with object-level instance topology and alignment. First, we develop a Convolutional Neural Network (CNN)-based ellipse prediction network, DEllipse-Net, which integrates depth information with RGB data to estimate the projection of ellipsoids onto images. Second, we model environments using three-dimensional (3D) ellipsoids, instance topology, and ellipsoid descriptors. Finally, the detected ellipses are aligned with the ellipsoids in the environment through semantic object association, and 6-DoF (Degree of Freedom) pose estimation is performed using the ellipse-ellipsoid model. In the bounding-box noise experiment, DEllipse-Net demonstrates higher robustness than other methods, achieving the highest prediction accuracy for 11 of 23 objects in ellipse prediction. In the localization test with 15 pixels of noise, we achieve an Absolute Translation Error (ATE) of 0.077 m and an Absolute Rotation Error (ARE) of 2.70° in the fr2_desk sequence. Additionally, DEllipse-Net is lightweight and highly portable, with a model size of only 18.6 MB, and a single model can handle all objects. In the object-level instance topology and alignment experiment, our topology and alignment methods significantly enhance the global localization accuracy of the ellipse-ellipsoid model. In experiments involving lighting changes and occlusions, our method achieves more robust global localization than the classical bag-of-words-based localization method and other ellipse-ellipsoid localization methods.

23 pages, 8425 KiB  
Article
Enhancing Inter-AUV Perception: Adaptive 6-DOF Pose Estimation with Synthetic Images for AUV Swarm Sensing
by Qingbo Wei, Yi Yang, Xingqun Zhou, Zhiqiang Hu, Yan Li, Chuanzhi Fan, Quan Zheng and Zhichao Wang
Drones 2024, 8(9), 486; https://doi.org/10.3390/drones8090486 - 14 Sep 2024
Cited by 3 | Viewed by 1310
Abstract
The capabilities of AUV mutual perception and localization are crucial for the development of AUV swarm systems. We propose the AUV6D model, a synthetic image-based approach to enhance inter-AUV perception through 6D pose estimation. Due to the challenge of acquiring accurate 6D pose data, a dataset of simulated underwater images with precise pose labels was generated using Unity3D. Mask-CycleGAN technology was introduced to transform these simulated images into realistic synthetic images, addressing the scarcity of available underwater data. Furthermore, the Color Intermediate Domain Mapping strategy is proposed to ensure alignment across different image styles at pixel and feature levels, enhancing the adaptability of the pose estimation model. Additionally, the Salient Keypoint Vector Voting Mechanism was developed to improve the accuracy and robustness of underwater pose estimation, enabling precise localization even in the presence of occlusions. The experimental results demonstrated that our AUV6D model achieved millimeter-level localization precision and pose estimation errors within five degrees, showing exceptional performance in complex underwater environments. Navigation experiments with two AUVs further verified the model’s reliability for mutual 6D pose estimation. This research provides substantial technical support for more complex and precise collaborative operations for AUV swarms in the future.

24 pages, 5021 KiB  
Article
A Robust Tri-Electromagnet-Based 6-DoF Pose Tracking System Using an Error-State Kalman Filter
by Shuda Dong and Heng Wang
Sensors 2024, 24(18), 5956; https://doi.org/10.3390/s24185956 - 13 Sep 2024
Cited by 1 | Viewed by 1441
Abstract
Magnetic pose tracking is a non-contact, accurate, and occlusion-free method that has been increasingly employed to track intra-corporeal medical devices such as endoscopes in computer-assisted medical interventions. In magnetic pose-tracking systems, a nonlinear estimation algorithm is needed to recover the pose information from magnetic measurements. In existing pose estimation algorithms such as the extended Kalman filter (EKF), the 3-DoF orientation in the S³ manifold is normally parametrized as unit quaternions and simply treated as a vector in the Euclidean space, which causes a violation of the unity constraint of quaternions and reduces pose tracking accuracy. In this paper, a pose estimation algorithm based on the error-state Kalman filter (ESKF) is proposed to improve the accuracy and robustness of electromagnetic tracking systems. The proposed system consists of three electromagnetic coils for magnetic field generation and a tri-axial magnetic sensor attached to the target object for field measurement. A strategy of sequential coil excitation is developed to separate the magnetic fields from different coils and reject magnetic disturbances. Simulation and experiments are conducted to evaluate the pose tracking performance of the proposed ESKF algorithm, which is also compared with standard EKF and constrained EKF. It is shown that the ESKF can effectively maintain the quaternion unity and thus achieve a better tracking accuracy, i.e., a Euclidean position error of 2.23 mm and an average orientation angle error of 0.45°.
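The quaternion-unity point motivating the ESKF is that the filter's error state is a 3-DoF rotation vector rather than the quaternion itself, so the nominal quaternion stays on the unit sphere by construction. A minimal sketch of the orientation injection step; the Hamilton, local-error convention used here is an assumption, since the paper's exact convention is not stated in the abstract:

```python
import numpy as np

def quat_mul(q, p):
    """Hamilton product of quaternions in [w, x, y, z] order."""
    w1, x1, y1, z1 = q
    w2, x2, y2, z2 = p
    return np.array([
        w1*w2 - x1*x2 - y1*y2 - z1*z2,
        w1*x2 + x1*w2 + y1*z2 - z1*y2,
        w1*y2 - x1*z2 + y1*w2 + z1*x2,
        w1*z2 + x1*y2 - y1*x2 + z1*w2])

def inject_error(q_nominal, dtheta):
    """ESKF orientation update: the filter estimates a small rotation error
    dtheta (3,) in the tangent space and injects it into the nominal quaternion,
    so the unity constraint is never violated by the Kalman update itself."""
    dq = np.concatenate([[1.0], 0.5 * dtheta])   # first-order error quaternion
    q = quat_mul(q_nominal, dq)
    return q / np.linalg.norm(q)                 # renormalize residual drift
```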

25 pages, 19272 KiB  
Article
6DoF Object Pose and Focal Length Estimation from Single RGB Images in Uncontrolled Environments
by Mayura Manawadu and Soon-Yong Park
Sensors 2024, 24(17), 5474; https://doi.org/10.3390/s24175474 - 23 Aug 2024
Cited by 1 | Viewed by 2349
Abstract
Accurate 6DoF (degrees of freedom) pose and focal length estimation is important in extended reality (XR) applications, enabling precise object alignment and projection scaling, thereby enhancing user experiences. This study focuses on improving 6DoF pose estimation from single RGB images with unknown camera metadata. Estimating the 6DoF pose and focal length from an uncontrolled RGB image obtained from the internet is challenging because such images often lack crucial metadata. Existing methods such as FocalPose and FocalPose++ have made progress in this domain but still face challenges due to the projection scale ambiguity between the translation of an object along the z-axis (tz) and the camera’s focal length. To overcome this, we propose a two-stage strategy that decouples the projection scaling ambiguity in the estimation of z-axis translation and focal length. In the first stage, tz is set arbitrarily, and we predict all the other pose parameters and the focal length relative to the fixed tz. In the second stage, we predict the true value of tz while scaling the focal length based on the tz update. The proposed two-stage method reduces projection scale ambiguity in RGB images and improves pose estimation accuracy. Iterative update rules constrained to the first stage and tailored loss functions, including the Huber loss in the second stage, enhance the accuracy of both 6DoF pose and focal length estimation. Experimental results on benchmark datasets show significant improvements in median rotation and translation errors, as well as better projection accuracy, compared to existing state-of-the-art methods. In an evaluation across the Pix3D datasets (chair, sofa, table, and bed), the proposed two-stage method improves projection accuracy by approximately 7.19%. Additionally, the incorporation of the Huber loss reduced translation and focal length errors by 20.27% and 6.65%, respectively, in comparison to the FocalPose++ method.
(This article belongs to the Special Issue Computer Vision and Virtual Reality: Technologies and Applications)
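The projection scale ambiguity being decoupled follows directly from the pinhole model: for an object whose depth extent is small relative to tz, scaling the focal length and tz by the same factor leaves the image nearly unchanged. A small numerical illustration with arbitrary values:

```python
import numpy as np

def project(f, tz, X):
    """Pinhole projection of object points X (N, 3) translated by tz along z."""
    Z = X[:, 2] + tz
    return f * X[:, :2] / Z[:, None]

# Scaling f and tz together leaves the projection almost unchanged when the
# object's depth extent is small relative to tz, which is why the method fixes
# tz in stage one and resolves the true tz (rescaling f) in stage two.
rng = np.random.default_rng(0)
X = rng.standard_normal((10, 3)) * 0.1      # small object (assumed units: m)
p1 = project(f=500.0, tz=5.0, X=X)
p2 = project(f=1000.0, tz=10.0, X=X)
print(np.abs(p1 - p2).max())                # ~0.1 px: nearly identical images
```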

18 pages, 4498 KiB  
Article
Selective Grasping for Complex-Shaped Parts Using Topological Skeleton Extraction
by Andrea Pennisi, Monica Sileo, Domenico Daniele Bloisi and Francesco Pierri
Electronics 2024, 13(15), 3021; https://doi.org/10.3390/electronics13153021 - 31 Jul 2024
Cited by 1 | Viewed by 961
Abstract
To enhance the autonomy and flexibility of robotic systems, a crucial role is played by the capacity to perceive and grasp objects. More specifically, robot manipulators must detect the presence of objects within their workspace, identify the grasping point, and compute a trajectory for approaching the objects with an end-effector pose suitable for performing the task. These can be challenging tasks in the presence of complex geometries, where multiple grasping-point candidates can be detected. In this paper, we present a novel approach for dealing with complex-shaped automotive parts, consisting of a deep-learning-based method for topological skeleton extraction and an active grasping pose selection mechanism. In particular, we use a modified version of the well-known Lightweight OpenPose algorithm to estimate the topological skeleton of real-world automotive parts. The estimated skeleton is used to select the best grasping pose for the object at hand. Our approach is designed to be more computationally efficient than other existing grasping pose detection methods. Quantitative experiments conducted with a 7-DoF manipulator on different real-world automotive components demonstrate the effectiveness of the proposed approach, with a success rate of 87.04%.
(This article belongs to the Special Issue Applications of Machine Vision in Robotics)

27 pages, 3382 KiB  
Article
DOT-SLAM: A Stereo Visual Simultaneous Localization and Mapping (SLAM) System with Dynamic Object Tracking Based on Graph Optimization
by Yuan Zhu, Hao An, Huaide Wang, Ruidong Xu, Zhipeng Sun and Ke Lu
Sensors 2024, 24(14), 4676; https://doi.org/10.3390/s24144676 - 18 Jul 2024
Cited by 5 | Viewed by 2451
Abstract
Most visual simultaneous localization and mapping (SLAM) systems are based on the assumption of a static environment in autonomous vehicles. However, when dynamic objects, particularly vehicles, occupy a large portion of the image, the localization accuracy of the system decreases significantly. To mitigate this challenge, this paper unveils DOT-SLAM, a novel stereo visual SLAM system that integrates dynamic object tracking through graph optimization. By integrating dynamic object pose estimation into the SLAM system, the system can effectively utilize both foreground and background points for ego-vehicle localization and obtain a static feature-point map. To rectify inaccuracies in depth estimation from stereo disparity on the foreground points of dynamic objects, which arise from their self-similarity characteristics, a coarse-to-fine depth estimation method based on camera–road plane geometry is presented. This method uses rough depth to guide fine stereo matching, thereby obtaining the three-dimensional (3D) spatial positions of feature points on dynamic objects. Subsequently, constraints on the dynamic object's pose are established using the road plane and the non-holonomic constraints (NHCs) of the vehicle, reducing the initial pose uncertainty of dynamic objects and leading to more accurate dynamic object initialization. Finally, by considering foreground points, background points, the local road plane, the ego-vehicle pose, and dynamic object poses as optimization nodes, and jointly optimizing a nonlinear model based on graph optimization, accurate six degrees of freedom (DoF) pose estimates are obtained for both the ego vehicle and dynamic objects. Experimental validation on the KITTI-360 dataset demonstrates that DOT-SLAM effectively utilizes features from the background and dynamic objects in the environment, resulting in more accurate vehicle trajectory estimation and a static environment map. Results obtained from a real-world dataset test reinforce its effectiveness.
