Sensors for Object Detection, Pose Estimation, and 3D Reconstruction

A special issue of Sensors (ISSN 1424-8220). This special issue belongs to the section "Sensing and Imaging".

Deadline for manuscript submissions: 30 June 2026 | Viewed by 17022

Editors


Guest Editor
College of Mechanical and Electrical Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China
Interests: machine vision; 3D optical inspection; industrial augmented reality

Guest Editor
College of Mechanical and Electrical Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China
Interests: computer vision; 3D reconstruction

Guest Editor
College of Mechanical and Electrical Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China
Interests: machine vision; photogrammetry; non-contact strain measurement

Special Issue Information

Dear Colleagues,

This Special Issue covers sensing technologies that gather data from the environment to identify objects, estimate their spatial orientations, and construct detailed 3D representations. These sensors are pivotal across numerous domains, including robotics, augmented reality, autonomous vehicles, and computer vision.

Object detection sensors utilize various technologies such as cameras, LiDAR, RADAR, or depth sensors to perceive objects within their surroundings. These technologies are crucial for applications like autonomous vehicles and surveillance systems. 

In the realm of 3D reconstruction, sensors collect data points from multiple vantage points and use algorithms to generate intricate 3D models of either the entire scene or specific objects within it. This process often involves methodologies like point cloud registration, surface reconstruction, and texture mapping. 

Overall, the integration of various sensors for object detection, pose estimation, and 3D reconstruction enables advanced perception capabilities in robotics, computer vision applications, aviation, aerospace, automotive, and beyond.

Prof. Dr. Liyan Zhang
Dr. Shenglan Liu
Dr. Nan Ye
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles as well as short communications are invited. For planned papers, a title and short abstract (about 250 words) can be sent to the Editorial Office for assessment.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-anonymized peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Sensors is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • machine vision
  • 3D optical inspection
  • industrial augmented reality
  • computer vision
  • 3D reconstruction
  • photogrammetry
  • non-contact strain measurement
  • image processing

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • Reprint: MDPI Books provides the opportunity to republish successful Special Issues in book format, both online and in print.

Further information on MDPI's Special Issue policies can be found here.

Published Papers (9 papers)


Research

22 pages, 2732 KB  
Article
Automated Single-Sensor 3D Scanning and Modular Benchmark Objects for Human-Scale 3D Reconstruction
by Kartik Choudhary, Mats Isaksson, Gavin W. Lambert and Tony Dicker
Sensors 2026, 26(4), 1331; https://doi.org/10.3390/s26041331 - 19 Feb 2026
Viewed by 755
Abstract
High-fidelity 3D reconstruction of human-sized objects typically requires multi-sensor scanning systems that are expensive, complex, and rely on proprietary hardware configurations. Existing low-cost approaches often rely on handheld scanning, which is inherently unstructured and operator-dependent, leading to inconsistent coverage and variable reconstruction quality. This limitation creates a need for a controlled, repeatable, and affordable scanning method that can generate high-quality data without requiring multi-sensor hardware or external tracking markers. This study presents a marker-less scanning platform designed for human-scale reconstruction. The system consists of a single structured-light sensor mounted on a vertical linear actuator, synchronised with a motorised turntable that rotates the subject. This constrained kinematic setup ensures a repeatable cylindrical acquisition trajectory. To address the geometric ambiguity often found in vertical translational symmetry (i.e., where distinct elevation steps appear identical), the system employs a sensor-assisted initialisation strategy, where feedback from the rotary encoder and linear drive serves as constraints for the registration pipeline. The captured frames are reconstructed into a complete model through a two-step Iterative Closest Point (ICP) procedure that eliminates the vertical drift and model collapse (often referred to as “telescoping”) common in unconstrained scanning. To evaluate system performance, a modular anthropometric benchmark object representing a human-sized target (1.6 m) was scanned. The reconstructed model was assessed in terms of surface coverage and volumetric fidelity relative to a CAD reference. The results demonstrate high sampling stability, achieving a mean surface density of 0.760 points/mm² on front-facing surfaces. Geometric deviation analysis revealed a mean signed error of −1.54 mm (σ = 2.27 mm), corresponding to a relative volumetric error of approximately 0.096% over the full vertical span. These findings confirm that a single-sensor system, when guided by precise kinematics, can mitigate the non-linear bending and drift artefacts of handheld acquisition, providing an accessible yet rigorously accurate alternative to industrial multi-sensor systems.
(This article belongs to the Special Issue Sensors for Object Detection, Pose Estimation, and 3D Reconstruction)
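The sensor-assisted initialisation described in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the turntable rotates about the vertical z axis, the actuator moves the sensor along z, and the sensor axes are otherwise aligned with the world axes. The function names `kinematic_initial_guess` and `apply_transform` are hypothetical.

```python
import math


def kinematic_initial_guess(turntable_deg, actuator_height_mm):
    """Build a 4x4 rigid transform (nested lists) that maps points from a
    frame captured at the given turntable angle and sensor height into a
    common object frame.

    Assumptions (illustrative only): rotation is about the z axis and the
    actuator translates the sensor purely along z.
    """
    theta = math.radians(turntable_deg)
    c, s = math.cos(theta), math.sin(theta)
    # Undo the turntable rotation (rotate by -theta) and add the sensor height.
    return [
        [c, s, 0.0, 0.0],
        [-s, c, 0.0, 0.0],
        [0.0, 0.0, 1.0, actuator_height_mm],
        [0.0, 0.0, 0.0, 1.0],
    ]


def apply_transform(transform, point):
    """Apply a 4x4 rigid transform to a 3D point given as (x, y, z)."""
    x, y, z = point
    return tuple(
        row[0] * x + row[1] * y + row[2] * z + row[3] for row in transform[:3]
    )


# A point seen by the sensor after the subject has turned 90 degrees,
# with the sensor raised 100 mm.
T = kinematic_initial_guess(90.0, 100.0)
p = apply_transform(T, (1.0, 0.0, 0.0))
```

In a pipeline of this kind, a transform like `T` would typically serve as the starting pose for an ICP refinement, so that the optimiser only has to correct small residual errors rather than resolve the vertical ambiguity on its own.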

22 pages, 3790 KB  
Article
Smartphone-Based Automated Photogrammetry for Reconstruction of Residual Limb Models in Prosthetic Design
by Lander De Waele, Jolien Gooijers and Dante Mantini
Sensors 2026, 26(4), 1251; https://doi.org/10.3390/s26041251 - 14 Feb 2026
Viewed by 837
Abstract
Accurate modeling of residual limb geometry is essential for prosthetic socket design, yet current scanning techniques can be costly, operator-dependent, or impractical for repeated clinical use. This study presents a fully automated, low-cost photogrammetry workflow capable of generating metrically accurate 3D models of lower-limb residual limbs using video and still images acquired with a standard smartphone or a full-frame digital camera. The pipeline integrates adaptive frame selection, deep learning-based background removal, robust metric scaling via ArUco markers, and open-source Structure-from-Motion and Multi-View Stereo reconstruction, requiring no manual post-processing or proprietary software. Accuracy and repeatability were evaluated using four 3D-printed limb phantoms and high-resolution CT-derived meshes as ground truth. Smartphone video and full-frame camera acquisitions achieved sub-millimeter surface accuracy, volume and perimeter errors within ±1%, and high inter-session repeatability, all within clinically accepted thresholds for prosthetic socket fabrication. In contrast, smartphone still-photo reconstructions showed larger deviations and reduced stability. Acquisition time was under five minutes, and complete reconstruction required approximately 1 h and 30 min. These results demonstrate that smartphone video-based photogrammetry provides a practical, scalable, and clinically viable alternative for residual limb modeling, particularly in resource-constrained or remote care settings.
(This article belongs to the Special Issue Sensors for Object Detection, Pose Estimation, and 3D Reconstruction)

30 pages, 15680 KB  
Article
Quantifying the Measurement Precision of a Commercial Ultrasonic Real-Time Location System for Camera Pose Estimation in Indoor Photogrammetry
by Faith Nayko and Derek D. Lichti
Sensors 2026, 26(1), 319; https://doi.org/10.3390/s26010319 - 3 Jan 2026
Cited by 2 | Viewed by 834
Abstract
Photogrammetric reconstruction from indoor imagery requires either labor-intensive ground control points (GCPs) or positioning sensor integration. While global navigation satellite system technology revolutionized aerial photogrammetry by enabling direct georeferencing through integrated sensor orientation (ISO), indoor environments lack an equivalent positioning solution. Before indoor positioning systems can be adopted for photogrammetric applications, their fundamental measurement precision must be established. This study characterizes the repeatability and temporal stability of the ZeroKey Quantum real-time location system (RTLS) as a prerequisite to testing reconstruction accuracy when RTLS measurements provide camera pose constraints in photogrammetric bundle adjustment. Through systematic tripod-mounted observations across 30 test locations in a controlled laboratory environment, optimal data collection protocols were determined, temporal stability was investigated, and measurement precision was quantified. An automated position-based stationary detection algorithm using a 20 mm threshold successfully identified all 30 stationary periods for durations of 30 s or less. Optimal duration analysis revealed that 1 s observation windows achieve 3 mm position precision and 1° orientation precision after brief settling, enabling practical workflows with worst-case total collection time of 2.5 s per station. Per-axis uncertainties were quantified as 1.6 mm, 1.7 mm, and 1.1 mm root mean square (RMS) for position and 0.08°, 0.09°, and 0.07° RMS for orientation. These findings demonstrate that ultrasonic RTLS achieves millimeter-level position repeatability and sub-degree orientation repeatability, establishing the measurement precision necessary to justify subsequent accuracy testing through photogrammetric bundle adjustment.
(This article belongs to the Special Issue Sensors for Object Detection, Pose Estimation, and 3D Reconstruction)
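The abstract above mentions a position-based stationary detection algorithm with a 20 mm threshold but does not spell out its logic. The following is one plausible sketch under stated assumptions, not the authors' algorithm: a run counts as stationary while every sample stays within the threshold of the run's first sample, and runs shorter than a hypothetical `min_samples` are discarded. The function name and the sample data are illustrative.

```python
import math


def detect_stationary_periods(positions_mm, threshold_mm=20.0, min_samples=3):
    """Return (start, end) index pairs, end exclusive, for runs in which every
    sample lies within threshold_mm of the first sample of the run.

    positions_mm is a sequence of (x, y, z) tuples in millimetres.
    min_samples is an assumed parameter that filters out transit samples.
    """
    periods = []
    start = 0
    n = len(positions_mm)
    for i in range(1, n + 1):
        # A run ends at the end of the data or when the tag leaves the
        # threshold sphere around the run's first sample.
        moved = (
            i == n
            or math.dist(positions_mm[i], positions_mm[start]) > threshold_mm
        )
        if moved:
            if i - start >= min_samples:
                periods.append((start, i))
            start = i
    return periods


# Two stations about 1 m apart, separated by a single transit sample.
track = [
    (0, 0, 0), (1, 0, 0), (2, 1, 0),
    (500, 0, 0),
    (1000, 0, 0), (1001, 0, 0), (1002, 0, 0), (1001, 1, 0),
]
print(detect_stationary_periods(track))  # [(0, 3), (4, 8)]
```

Anchoring each run to its first sample keeps the sketch simple; a running mean or median of the run would be less sensitive to a noisy first reading.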

21 pages, 3681 KB  
Article
E-Sem3DGS: Monocular Human and Scene Reconstruction via Event-Aided Semantic 3DGS
by Xiaoting Yin, Hao Shi, Kailun Yang, Jiajun Zhai, Shangwei Guo and Kaiwei Wang
Sensors 2026, 26(1), 188; https://doi.org/10.3390/s26010188 - 27 Dec 2025
Viewed by 1397
Abstract
Reconstructing animatable humans, together with their surrounding static environments, from monocular, motion-blurred videos is still challenging for current neural rendering methods. Existing monocular human reconstruction approaches achieve impressive quality and efficiency, but they are designed for clean intensity inputs and mainly focus on the foreground human, leading to degraded performance under motion blur and incomplete scene modeling. Event cameras provide high temporal resolution and robustness to motion blur, making them a natural complement to standard video sensors. We present E-Sem3DGS, a semantically augmented 3D Gaussian Splatting framework that leverages hybrid event-intensity streams to jointly reconstruct explicit 3D volumetric representations of human avatars and static scenes. E-Sem3DGS maintains a single set of 3D Gaussians in Euclidean space, each endowed with a learnable semantic attribute that softly separates dynamic human and static scene content. We initialize human Gaussians from Skinned Multi-Person Linear (SMPL) model priors with semantic values set to 1 and scene Gaussians by sampling a surrounding cube with semantic values set to 0, then jointly optimize geometry, appearance, and semantics. To mitigate motion blur, we derive optical flow from events and use it to supervise image-based optical flow between rendered frames, enforcing temporal coherence in high-motion regions and sharpening both humans and backgrounds. On the motion-blurred ZJU-MoCap-Blur dataset, E-Sem3DGS improves the average full-frame PSNR from 21.75 to 32.56 (+49.7%) over previous methods. On MMHPSD-Blur, our method improves PSNR from 25.23 to 28.63 (+13.48%).
(This article belongs to the Special Issue Sensors for Object Detection, Pose Estimation, and 3D Reconstruction)
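The percentage gains quoted in the abstract above can be reproduced with a few lines of arithmetic. The sketch below is only a check of the reported numbers; the helper names are hypothetical. Because PSNR is a logarithmic quantity, PSNR = 10 log10(MAX² / MSE), the second helper converts a gain in decibels into the corresponding reduction factor in mean squared error, which is an alternative way to read the same result.

```python
def relative_gain_percent(baseline, improved):
    """Relative improvement of `improved` over `baseline`, in percent."""
    return 100.0 * (improved - baseline) / baseline


def mse_reduction_factor(psnr_gain_db):
    """Factor by which MSE shrinks for a given PSNR gain in dB,
    following PSNR = 10 * log10(MAX^2 / MSE)."""
    return 10.0 ** (psnr_gain_db / 10.0)


print(round(relative_gain_percent(21.75, 32.56), 1))  # 49.7
print(round(relative_gain_percent(25.23, 28.63), 2))  # 13.48
```

Under this reading, the 10.81 dB gain on ZJU-MoCap-Blur corresponds to roughly a twelvefold reduction in mean squared error.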

24 pages, 7286 KB  
Article
Efficient Synthetic Defect on 3D Object Reconstruction and Generation Pipeline for Digital Twins Smart Factory
by Viet-Hoan Nguyen, Thi-Ngot Pham, Jun-Ho Huh, Pil-Joo Choi, Young-Bong Kim, Oh-Heum Kwon and Ki-Ryong Kwon
Sensors 2025, 25(22), 6908; https://doi.org/10.3390/s25226908 - 12 Nov 2025
Cited by 1 | Viewed by 1935
Abstract
High-quality 3D objects play a crucial role in digital twins, while synthetic data generated from these objects have become essential in deep learning-based computer vision applications. Collecting and labeling real defects on industrial object surfaces is challenging and labor-intensive, whereas synthetic data generation can feasibly produce large amounts of labeled data. However, synthetic datasets lack realism in their rendered images. To overcome this issue, this paper introduces a single framework for 3D industrial object reconstruction and synthetic defect generation for digital twin smart factory applications. In detail, NeRF is applied to reconstruct our custom industrial 3D objects through videos collected by a smartphone camera. Several NeRF-based models (i.e., Instant-NGP, Nerfacto, Volinga, and TensoRF) are compared to choose the best outcome for the next step of defect generation on the 3D object surface. For a fair comparison, we train the four models using the Nerfstudio framework with our three custom datasets of two objects. In the experiments, Instant-NGP and Nerfacto achieve the best outcomes, outperforming all other methods significantly. The exported meshes of 3D objects are refined using Blender before loading into NVIDIA Omniverse Code to generate defects on the surface with the Replicator. To evaluate the object detection performance and to verify the benefits of synthetic defect data, we conducted experiments with YOLO-based models on our synthetic and real-plus-synthetic defects. The results show that the synthetic defect data improve the generalization capability of YOLO models, with the highest and lowest mAP@0.5 gains of 18.8 and 1.5 percent on YOLOv6n and YOLOv8s, respectively.
(This article belongs to the Special Issue Sensors for Object Detection, Pose Estimation, and 3D Reconstruction)

23 pages, 8966 KB  
Article
Object-Specific Multiview Classification Through View-Compatible Feature Fusion
by Javier Perez Soler, Jose-Luis Guardiola, Nicolás García Sastre, Pau Garrigues Carbó, Miguel Sanchis Hernández and Juan-Carlos Perez-Cortes
Sensors 2025, 25(13), 4127; https://doi.org/10.3390/s25134127 - 2 Jul 2025
Cited by 1 | Viewed by 1299
Abstract
Multi-view classification (MVC) typically focuses on categorizing objects into distinct classes by employing multiple perspectives of the same objects. However, in numerous real-world applications, such as industrial inspection and quality control, there is an increasing need to distinguish particular objects from a pool of similar ones while simultaneously disregarding unknown objects. In these scenarios, relying on a single image may not provide sufficient information to effectively identify the scrutinized object, as different perspectives may reveal distinct characteristics that are essential for accurate classification. Most existing approaches operate within closed-set environments and are focused on generalization, which makes them less effective in distinguishing individual objects from others. These limitations are particularly problematic in industrial quality assessment, where distinguishing between specific objects and discarding unknowns is crucial. To address this challenge, we introduce a View-Compatible Feature Fusion (VCFF) method that utilizes images from predetermined positions as an accurate solution for multi-view classification of specific objects. Unlike other approaches, VCFF explicitly integrates pose information during the fusion process. It does not merely use pose as auxiliary data but employs it to align and selectively fuse features from different views. This mathematically explicit fusion of rotations, based on relative poses, allows VCFF to effectively combine multi-view information, enhancing classification accuracy. Through experimental evaluations, we demonstrate that the proposed VCFF method outperforms state-of-the-art MVC algorithms, especially in open-set scenarios, where the set of possible objects is not fully known in advance. Remarkably, VCFF achieves an average precision of 1.0 using only 8 cameras, whereas existing methods require 20 cameras to reach a maximum of 0.95. In terms of AUC-ROC under the constraint of fewer than 3σ false positives—a critical metric in industrial inspection—current state-of-the-art methods achieve up to 0.72, while VCFF attains a perfect score of 1.0 with just eight cameras. Furthermore, our approach delivers highly accurate rotation estimation, maintaining an error margin slightly above 2° when sampling at 4° intervals.
(This article belongs to the Special Issue Sensors for Object Detection, Pose Estimation, and 3D Reconstruction)

16 pages, 3014 KB  
Article
Cross-Modal Interaction Between Perception and Vision of Grasping a Slanted Handrail to Reproduce the Sensation of Walking on a Slope in Virtual Reality
by Yuto Ohashi, Monica Perusquía-Hernández, Kiyoshi Kiyokawa and Nobuchika Sakata
Sensors 2025, 25(3), 938; https://doi.org/10.3390/s25030938 - 4 Feb 2025
Cited by 3 | Viewed by 2073
Abstract
Numerous studies have previously explored the perception of horizontal movements. This includes research on Redirected Walking (RDW). However, the challenge of replicating the sensation of vertical movement has remained a recurring theme. Many conventional methods rely on physically mimicking steps or slopes, which can be hazardous and induce fear. This is especially true when head-mounted displays (HMDs) obstruct the user’s field of vision. Our primary objective was to reproduce the sensation of ascending a slope while traversing a flat surface. This effect is achieved by giving the users the haptic sensation of gripping a tilted handrail similar to those commonly found on ramps or escalators. To achieve this, we developed a walker-type handrail device capable of tilting across a wide range of angles. We induced a cross-modal effect to enhance the perception of walking up a slope. This was achieved by combining haptic feedback from the hardware with an HMD-driven visual simulation of an upward-sloping scene. The results indicated that the condition with tactile presentation significantly alleviated fear and enhanced the sensation of walking uphill compared to the condition without tactile presentation.
(This article belongs to the Special Issue Sensors for Object Detection, Pose Estimation, and 3D Reconstruction)

26 pages, 6416 KB  
Article
Advanced Monocular Outdoor Pose Estimation in Autonomous Systems: Leveraging Optical Flow, Depth Estimation, and Semantic Segmentation with Dynamic Object Removal
by Alireza Ghasemieh and Rasha Kashef
Sensors 2024, 24(24), 8040; https://doi.org/10.3390/s24248040 - 17 Dec 2024
Cited by 4 | Viewed by 3394
Abstract
Autonomous technologies have revolutionized transportation, military operations, and space exploration, necessitating precise localization in environments where traditional GPS-based systems are unreliable or unavailable. While widespread for outdoor localization, GPS systems face limitations in obstructed environments such as dense urban areas, forests, and indoor spaces. Moreover, GPS reliance introduces vulnerabilities to signal disruptions, which can lead to significant operational failures. Hence, developing alternative localization techniques that do not depend on external signals is essential, showing a critical need for robust, GPS-independent localization solutions adaptable to different applications, ranging from Earth-based autonomous vehicles to robotic missions on Mars. This paper addresses these challenges using visual odometry (VO) to estimate a camera’s pose by analyzing captured image sequences in GPS-denied areas tailored for autonomous vehicles (AVs), where safety and real-time decision-making are paramount. Extensive research has been dedicated to pose estimation using LiDAR or stereo cameras, which, despite their accuracy, are constrained by weight, cost, and complexity. In contrast, monocular vision is practical and cost-effective, making it a popular choice for drones, cars, and autonomous vehicles. However, robust and reliable monocular pose estimation models remain underexplored. This research aims to fill this gap by developing a novel adaptive framework for outdoor pose estimation and safe navigation using enhanced visual odometry systems with monocular cameras, especially for applications where deploying additional sensors is not feasible due to cost or physical constraints. This framework is designed to be adaptable across different vehicles and platforms, ensuring accurate and reliable pose estimation. We integrate advanced control theory to provide safety guarantees for motion control, ensuring that the AV can react safely to the imminent hazards and unknown trajectories of nearby traffic agents. The focus is on creating AI-driven models that meet the performance standards of multi-sensor systems while leveraging the inherent advantages of monocular vision. This research uses state-of-the-art machine learning techniques to advance visual odometry’s technical capabilities and ensure its adaptability across different platforms, cameras, and environments. By merging cutting-edge visual odometry techniques with robust control theory, our approach enhances both the safety and performance of AVs in complex traffic situations, directly addressing the challenge of safe and adaptive navigation. Experimental results on the KITTI odometry dataset demonstrate a significant improvement in pose estimation accuracy, offering a cost-effective and robust solution for real-world applications.
(This article belongs to the Special Issue Sensors for Object Detection, Pose Estimation, and 3D Reconstruction)

16 pages, 8801 KB  
Article
Noise-Robust 3D Pose Estimation Using Appearance Similarity Based on the Distributed Multiple Views
by Taemin Hwang and Minjoon Kim
Sensors 2024, 24(17), 5645; https://doi.org/10.3390/s24175645 - 30 Aug 2024
Cited by 3 | Viewed by 3036
Abstract
In this paper, we present a noise-robust approach for the 3D pose estimation of multiple people using appearance similarity. The common methods identify the cross-view correspondences between the detected keypoints and determine their association with a specific person by measuring the distances between the epipolar lines and the joint locations of the 2D keypoints across all the views. Although existing methods achieve remarkable accuracy, they are still sensitive to camera calibration, making them unsuitable for noisy environments where any of the cameras slightly change angle or position. To address these limitations and fix camera calibration error in real time, we propose a framework for 3D pose estimation which uses appearance similarity. In the proposed framework, we detect the 2D keypoints, extract the appearance feature, and transfer it to the central server. The central server uses geometrical affinity and appearance similarity to match the detected 2D human poses to each person. Then, it compares these two groups to identify calibration errors. If a camera with the wrong calibration is identified, the central server fixes the calibration error, ensuring accuracy in the 3D reconstruction of skeletons. In the experimental environment, we verified that the proposed algorithm is robust against false geometrical errors. It achieves around 11.5% and 8% improvement in the accuracy of 3D pose estimation on the Campus and Shelf datasets, respectively.
(This article belongs to the Special Issue Sensors for Object Detection, Pose Estimation, and 3D Reconstruction)
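The geometric affinity mentioned in the abstract above rests on the distance between a keypoint in one view and the epipolar line induced by a keypoint in another view. The sketch below shows that standard computation, assuming the convention x2ᵀ F x1 = 0 for the fundamental matrix F; the function name and the example matrix are illustrative and not taken from the paper.

```python
import math


def epipolar_distance(F, x1, x2):
    """Distance in pixels from point x2 (view 2) to the epipolar line F @ x1.

    F is a 3x3 fundamental matrix as nested lists, with the convention
    x2^T F x1 = 0. x1 and x2 are (u, v) pixel coordinates.
    """
    u, v = x1
    # Epipolar line in view 2: a*u2 + b*v2 + c = 0.
    a, b, c = (row[0] * u + row[1] * v + row[2] for row in F)
    u2, v2 = x2
    return abs(a * u2 + b * v2 + c) / math.hypot(a, b)


# Illustrative F for a pure sideways camera shift with identity intrinsics:
# epipolar lines are horizontal, so the distance is the row difference.
F_example = [
    [0.0, 0.0, 0.0],
    [0.0, 0.0, -1.0],
    [0.0, 1.0, 0.0],
]
print(epipolar_distance(F_example, (10.0, 5.0), (3.0, 8.0)))  # 3.0
```

When a camera's calibration drifts, F no longer matches the true geometry and these distances grow even for correct matches, which is the sensitivity the abstract refers to.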
