
Computer Vision for 3D Perception and Applications

A special issue of Sensors (ISSN 1424-8220). This special issue belongs to the section "Sensor Networks".

Deadline for manuscript submissions: closed (1 February 2021) | Viewed by 43088

Special Issue Editors


Dr. Matteo Poggi
Guest Editor
Department of Computer Science and Engineering (DISI), University of Bologna, 40136 Bologna, Italy
Interests: computer vision; deep learning; depth perception; embedded systems

Prof. Dr. Thomas Moeslund
Guest Editor
Visual Analysis of People (VAP) Lab, Aalborg University, Rendsburggade 14, 9000 Aalborg, Denmark
Interests: computer vision; image processing; machine vision; pattern recognition; visual analysis of people's whereabouts; surveillance; traffic monitoring

Special Issue Information

Dear Colleagues,

Effective 3D perception greatly enriches the knowledge of the surrounding environment and is crucial for developing high-level applications for various purposes. Pivotal to 3D perception is the acquisition/estimation of reliable depth information—a task for which several technologies exist, ranging from active sensors (e.g., Time-of-Flight devices) to passive cameras, coupled with a variety of techniques for estimating depth from images (stereo matching, structure from motion, and more). From an accurate reconstruction of the surrounding 3D scene, several complex problems can be addressed, such as autonomous navigation and localization, tracking, surveillance, robotics, interaction with other agents, and manipulation of the sensed environment. Recent advances in deep learning have rapidly found their place in this field as well.

The aim of this Special Issue is to present both techniques for reliably acquiring 3D data and methods that exploit this information to tackle computer vision tasks, exploring novel solutions for perception as well as for applications.

This Special Issue invites contributions on, but not limited to, the following topics:

  • Depth from images;
  • Binocular and multi-view stereo;
  • Active depth sensing;
  • Single image depth estimation;
  • 3D reconstruction and scene understanding;
  • RGB-D computer vision;
  • 3D pose estimation, tracking and recognition;
  • 3D motion estimation;
  • 3D semantic segmentation;
  • Applications of 3D vision (e.g., robotics, augmented reality).

Dr. Matteo Poggi
Prof. Dr. Thomas Moeslund
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to the website. Once registered, authors can proceed to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Sensors is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • 3D perception
  • binocular-multiview stereo
  • active 3D sensors
  • RGB-D computer vision
  • 3D detection
  • 3D tracking
  • 3D vision applications

Published Papers (12 papers)


Editorial


3 pages, 164 KiB  
Editorial
Computer Vision for 3D Perception and Applications
by Matteo Poggi and Thomas B. Moeslund
Sensors 2021, 21(12), 3944; https://doi.org/10.3390/s21123944 - 08 Jun 2021
Cited by 1 | Viewed by 2167
Abstract
Effective 3D perception of an observed scene greatly enriches the knowledge about the surrounding environment and is crucial to effectively develop high-level applications for various purposes [...] Full article

Research


21 pages, 60445 KiB  
Article
PPTFH: Robust Local Descriptor Based on Point-Pair Transformation Features for 3D Surface Matching
by Lang Wu, Kai Zhong, Zhongwei Li, Ming Zhou, Hongbin Hu, Congjun Wang and Yusheng Shi
Sensors 2021, 21(9), 3229; https://doi.org/10.3390/s21093229 - 07 May 2021
Cited by 7 | Viewed by 2812
Abstract
Three-dimensional feature description for a local surface is a core technology in 3D computer vision. Existing descriptors perform poorly in terms of distinctiveness and robustness owing to noise, mesh decimation, clutter, and occlusion in real scenes. In this paper, we propose a 3D local surface descriptor using point-pair transformation feature histograms (PPTFHs) to address these challenges. The generation process of the PPTFH descriptor consists of three steps. First, a simple but efficient strategy is introduced to partition the point-pair sets on the local surface into four subsets. Then, three feature histograms corresponding to each point-pair subset are generated by the point-pair transformation features, which are computed using the proposed Darboux frame. Finally, all the feature histograms of the four subsets are concatenated into a vector to generate the overall PPTFH descriptor. The performance of the PPTFH descriptor is evaluated on several popular benchmark datasets, and the results demonstrate that the PPTFH descriptor achieves superior performance in terms of descriptiveness and robustness compared with state-of-the-art algorithms. The benefits of the PPTFH descriptor for 3D surface matching are demonstrated by the results obtained from five benchmark datasets. Full article
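
The descriptor itself is specific to the paper, but its core ingredient, point-pair features expressed in a Darboux frame, is well established. Below is a minimal Python sketch of such features for one pair of oriented points, in the spirit of the PFH family rather than the exact PPTFH formulation; the function name and the toy inputs are illustrative assumptions.

    import numpy as np

    def pair_features(p1, n1, p2, n2):
        """Darboux-frame point-pair features for two oriented points
        (PFH-style sketch, not the exact PPTFH formulation)."""
        d = p2 - p1
        dist = np.linalg.norm(d)
        d_unit = d / dist
        u = n1                                  # first frame axis: source normal
        v = np.cross(u, d_unit)                 # second axis (degenerate if u is parallel to d)
        v /= np.linalg.norm(v)
        w = np.cross(u, v)                      # third axis completes the frame
        alpha = float(np.dot(v, n2))            # relates the target normal to v
        phi = float(np.dot(u, d_unit))          # relates the source normal to the baseline
        theta = float(np.arctan2(np.dot(w, n2), np.dot(u, n2)))
        return dist, alpha, phi, theta

    # toy example: two oriented points
    f = pair_features(np.array([0.0, 0.0, 0.0]), np.array([0.0, 0.0, 1.0]),
                      np.array([0.1, 0.0, 0.0]), np.array([0.0, 1.0, 0.0]))

Histograms of such features, accumulated over subsets of point pairs on the local surface and concatenated, give the general structure of descriptors in this family.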

25 pages, 3440 KiB  
Article
Simulation Study of a Frame-Based Motion Correction Algorithm for Positron Emission Imaging
by Héctor Espinós-Morató, David Cascales-Picó, Marina Vergara, Ángel Hernández-Martínez, José María Benlloch Baviera and María José Rodríguez-Álvarez
Sensors 2021, 21(8), 2608; https://doi.org/10.3390/s21082608 - 08 Apr 2021
Cited by 4 | Viewed by 2092
Abstract
Positron emission tomography (PET) is a functional non-invasive imaging modality that uses radioactive substances (radiotracers) to measure changes in metabolic processes. Advances in scanner technology and data acquisition in the last decade have led to the development of more sophisticated PET devices with good spatial resolution (1–3 mm of full width at half maximum (FWHM)). However, involuntary motions produced by the patient inside the scanner lead to image degradation and potentially to a misdiagnosis. The adverse effect of motion on the reconstructed image increases as the spatial resolution of current scanners continues to improve. To correct this effect, motion correction techniques are becoming increasingly popular and widely studied. This work presents a simulation study of image motion correction using a frame-based algorithm. The method is able to cut the data acquired from the scanner into frames, taking into account the size of the object of study. This approach allows working with low statistical information without losing image quality. The frames are later registered using a spatio-temporal registration developed in a multi-level way. To validate these results, several performance tests are applied to a set of simulated moving phantoms. The results show that the method minimizes intra-frame motion, improves the signal intensity over the background in comparison with other methods in the literature, produces excellent similarity values with respect to the ground-truth (static) image, and is able to find a limit on the patient-injected dose when some prior knowledge of the lesion is present. Full article
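
The registration in the paper is multi-level and spatio-temporal; as a much cruder illustration of the frame-based idea, the sketch below aligns a short sequence of 2D frames to the first frame by phase correlation (translation only) and averages the result. The synthetic data and the translation-only motion model are assumptions made purely to keep the toy runnable.

    import numpy as np
    from scipy.ndimage import shift as nd_shift
    from skimage.registration import phase_cross_correlation

    def frame_based_correction(frames):
        """Align every frame to the first one (translation only) and average them.
        A toy stand-in for the paper's multi-level spatio-temporal registration."""
        reference = frames[0]
        aligned = [reference]
        for frame in frames[1:]:
            offset, _, _ = phase_cross_correlation(reference, frame)
            aligned.append(nd_shift(frame, offset))
        return np.mean(aligned, axis=0)

    # toy data: a bright blob drifting across a sequence of 2D "frames"
    frames = []
    for t in range(5):
        img = np.zeros((64, 64))
        img[20 + t, 30 + t] = 1.0
        frames.append(img)
    corrected = frame_based_correction(frames)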

20 pages, 8201 KiB  
Article
Deep Learning for Transient Image Reconstruction from ToF Data
by Enrico Buratto, Adriano Simonetto, Gianluca Agresti, Henrik Schäfer and Pietro Zanuttigh
Sensors 2021, 21(6), 1962; https://doi.org/10.3390/s21061962 - 11 Mar 2021
Cited by 12 | Viewed by 3640
Abstract
In this work, we propose a novel approach for correcting multi-path interference (MPI) in Time-of-Flight (ToF) cameras by estimating the direct and global components of the incoming light. MPI is an error source linked to the multiple reflections of light inside a scene; each sensor pixel receives information coming from different light paths which generally leads to an overestimation of the depth. We introduce a novel deep learning approach, which estimates the structure of the time-dependent scene impulse response and from it recovers a depth image with a reduced amount of MPI. The model consists of two main blocks: a predictive model that learns a compact encoded representation of the backscattering vector from the noisy input data and a fixed backscattering model which translates the encoded representation into the high dimensional light response. Experimental results on real data show the effectiveness of the proposed approach, which reaches state-of-the-art performances. Full article
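
A conceptual sketch of the two-block structure described above follows: a small learned encoder maps noisy ToF samples to a compact code, and a fixed model expands the code into a time-dependent backscattering response. The Gaussian-mixture form of the fixed block, the layer sizes, and the input dimensionality are illustrative assumptions, not the authors' architecture.

    import torch
    import torch.nn as nn

    class EncoderToBackscattering(nn.Module):
        """Learned encoder plus fixed backscattering model (illustrative sketch only)."""
        def __init__(self, n_inputs=8, n_bins=256, n_components=3):
            super().__init__()
            self.n_bins = n_bins
            self.n_components = n_components
            # predictive block: noisy ToF samples -> compact code (weight, position, width per component)
            self.encoder = nn.Sequential(nn.Linear(n_inputs, 64), nn.ReLU(),
                                         nn.Linear(64, 3 * n_components))

        def forward(self, x):
            code = self.encoder(x).view(-1, self.n_components, 3)
            w = torch.softmax(code[..., 0], dim=-1)                   # component weights
            mu = torch.sigmoid(code[..., 1]) * self.n_bins            # peak positions (time bins)
            sigma = nn.functional.softplus(code[..., 2]) + 1.0        # peak widths
            # fixed backscattering block: mixture of Gaussians over time bins
            t = torch.arange(self.n_bins, dtype=torch.float32)
            response = (w.unsqueeze(-1) *
                        torch.exp(-0.5 * ((t - mu.unsqueeze(-1)) / sigma.unsqueeze(-1)) ** 2)).sum(dim=1)
            return response                                           # (batch, n_bins) impulse response

    model = EncoderToBackscattering()
    fake_samples = torch.randn(4, 8)          # e.g., correlation samples at several modulation frequencies
    impulse = model(fake_samples)             # the earliest strong peak approximates the direct component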

17 pages, 2216 KiB  
Article
Combining Augmented Reality and 3D Printing to Improve Surgical Workflows in Orthopedic Oncology: Smartphone Application and Clinical Evaluation
by Rafael Moreta-Martinez, Alicia Pose-Díez-de-la-Lastra, José Antonio Calvo-Haro, Lydia Mediavilla-Santos, Rubén Pérez-Mañanes and Javier Pascau
Sensors 2021, 21(4), 1370; https://doi.org/10.3390/s21041370 - 15 Feb 2021
Cited by 27 | Viewed by 5245
Abstract
During the last decade, orthopedic oncology has experienced the benefits of computerized medical imaging to reduce human dependency, improving accuracy and clinical outcomes. However, traditional surgical navigation systems do not always adapt properly to this kind of intervention. Augmented reality (AR) and three-dimensional (3D) printing are technologies recently introduced in the surgical environment with promising results. Here we present an innovative solution combining 3D printing and AR in orthopedic oncological surgery. A new surgical workflow is proposed, including 3D printed models and a novel AR-based smartphone application (app). This app can display the patient's anatomy and the tumor's location. A 3D-printed reference marker, designed to fit in a unique position of the affected bone tissue, enables automatic registration. The system has been evaluated in terms of visualization accuracy and usability during the whole surgical workflow. Experiments on six realistic phantoms provided a visualization error below 3 mm. The AR system was tested in two clinical cases during surgical planning, patient communication, and surgical intervention. These results and the positive feedback obtained from surgeons and patients suggest that the combination of AR and 3D printing can improve efficacy, accuracy, and patients' experience. Full article
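
The automatic registration enabled by the 3D-printed reference marker ultimately comes down to estimating a rigid transform between corresponding 3D points (marker geometry in the surgical plan versus its detected pose). A minimal Kabsch/SVD sketch with made-up corresponding points, leaving out all of the app's detection and visualization logic:

    import numpy as np

    def rigid_transform(src, dst):
        """Least-squares rigid transform (R, t) mapping src points onto dst (Kabsch/SVD)."""
        src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
        H = (src - src_c).T @ (dst - dst_c)
        U, _, Vt = np.linalg.svd(H)
        R = Vt.T @ U.T
        if np.linalg.det(R) < 0:                         # guard against reflections
            Vt[-1, :] *= -1
            R = Vt.T @ U.T
        t = dst_c - R @ src_c
        return R, t

    # hypothetical marker points in the planning model vs. their detected positions (mm)
    model_pts = np.array([[0.0, 0.0, 0.0], [30.0, 0.0, 0.0], [0.0, 30.0, 0.0], [0.0, 0.0, 10.0]])
    angle = np.deg2rad(20.0)
    R_true = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                       [np.sin(angle),  np.cos(angle), 0.0],
                       [0.0, 0.0, 1.0]])
    detected = model_pts @ R_true.T + np.array([5.0, -2.0, 40.0])
    R, t = rigid_transform(model_pts, detected)          # recovers the rotation and the offset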

17 pages, 3441 KiB  
Article
HRDepthNet: Depth Image-Based Marker-Less Tracking of Body Joints
by Linda Christin Büker, Finnja Zuber, Andreas Hein and Sebastian Fudickar
Sensors 2021, 21(4), 1356; https://doi.org/10.3390/s21041356 - 14 Feb 2021
Cited by 6 | Viewed by 3059
Abstract
While approaches for the detection of joint positions in color images, such as HRNet and OpenPose, are available, corresponding approaches for depth images have received limited consideration, even though depth images have several advantages over color images, such as robustness to light variation and invariance to color and texture. Correspondingly, we introduce High-Resolution Depth Net (HRDepthNet)—a machine learning driven approach to detect human joints (body, head, and upper and lower extremities) in purely depth images. HRDepthNet retrains the original HRNet for depth images. For this purpose, a dataset was created holding depth (and RGB) images recorded with subjects conducting the timed up and go test—an established geriatric assessment. The joint positions were manually annotated on the RGB images. The training and evaluation were conducted with this dataset. For accuracy evaluation, the detection of body joints was evaluated via COCO's evaluation metrics, indicating that the resulting depth image-based model achieved better results than HRNet trained and applied on the corresponding RGB images. An additional evaluation of the position errors showed a median deviation of 1.619 cm (x-axis), 2.342 cm (y-axis) and 2.4 cm (z-axis). Full article
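
The per-axis error figures quoted above reduce to a median of absolute coordinate deviations between predicted and annotated joints; a tiny sketch with hypothetical arrays:

    import numpy as np

    # hypothetical predicted vs. annotated joint positions, shape (n_joints, 3), in cm
    pred = np.array([[10.2, 55.1, 200.4], [32.8, 57.0, 198.9], [11.0, 90.3, 201.7]])
    gt   = np.array([[ 9.0, 53.0, 198.0], [34.5, 59.5, 196.5], [12.6, 92.6, 204.1]])

    median_dev = np.median(np.abs(pred - gt), axis=0)    # per-axis median deviation (x, y, z)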

16 pages, 5932 KiB  
Article
Surface Reconstruction from Structured Light Images Using Differentiable Rendering
by Janus Nørtoft Jensen, Morten Hannemose, J. Andreas Bærentzen, Jakob Wilm, Jeppe Revall Frisvad and Anders Bjorholm Dahl
Sensors 2021, 21(4), 1068; https://doi.org/10.3390/s21041068 - 04 Feb 2021
Cited by 5 | Viewed by 2649
Abstract
When 3D scanning objects, the objective is usually to obtain a continuous surface. However, most surface scanning methods, such as structured light scanning, yield a point cloud. Obtaining a continuous surface from a point cloud requires a subsequent surface reconstruction step, which is directly affected by any error from the computation of the point cloud. In this work, we propose a one-step approach in which we compute the surface directly from structured light images. Our method minimizes the least-squares error between photographs and renderings of a triangle mesh, where the vertex positions of the mesh are the parameters of the minimization problem. To ensure fast iterations during optimization, we use differentiable rendering, which computes images and gradients in a single pass. We present simulation experiments demonstrating that our method for computing a triangle mesh has several advantages over approaches that rely on an intermediate point cloud. Our method can produce accurate reconstructions when initializing the optimization from a sphere. We also show that our method is good at reconstructing sharp edges and that it is robust with respect to image noise. In addition, our method can improve the output from other reconstruction algorithms if we use these for initialization. Full article
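
The optimization pattern (treating vertex positions as parameters and minimizing a photometric least-squares loss through a differentiable renderer) can be sketched with automatic differentiation. In the toy below the "renderer" is a deliberately crude soft point-splatting function standing in for a proper differentiable triangle-mesh renderer, so only the structure of the loop carries over to the paper's method; all names and values are assumptions.

    import torch

    def soft_splat(vertices, size=32, sigma=1.5):
        """Crude differentiable 'renderer': splat 2D vertices as Gaussians into an image.
        A toy stand-in for a real differentiable triangle-mesh renderer."""
        xs = torch.arange(size, dtype=torch.float32).view(1, -1)   # column coordinates
        ys = torch.arange(size, dtype=torch.float32).view(-1, 1)   # row coordinates
        img = torch.zeros(size, size)
        for v in vertices:
            img = img + torch.exp(-((xs - v[0]) ** 2 + (ys - v[1]) ** 2) / (2 * sigma ** 2))
        return img

    # "captured" image produced by an unknown target vertex configuration
    target_vertices = torch.tensor([[10.0, 12.0], [20.0, 8.0], [16.0, 22.0]])
    captured = soft_splat(target_vertices).detach()

    # vertex positions are the parameters of the least-squares problem
    vertices = torch.tensor([[8.0, 8.0], [22.0, 12.0], [12.0, 20.0]], requires_grad=True)
    opt = torch.optim.Adam([vertices], lr=0.5)
    for _ in range(200):
        opt.zero_grad()
        loss = ((soft_splat(vertices) - captured) ** 2).sum()      # photometric least squares
        loss.backward()
        opt.step()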

18 pages, 1480 KiB  
Article
A Systematic Comparison of Depth Map Representations for Face Recognition
by Stefano Pini, Guido Borghi, Roberto Vezzani, Davide Maltoni and Rita Cucchiara
Sensors 2021, 21(3), 944; https://doi.org/10.3390/s21030944 - 31 Jan 2021
Cited by 15 | Viewed by 3142
Abstract
Nowadays, we are witnessing the wide diffusion of active depth sensors. However, the generalization capabilities and performance of the deep face recognition approaches that are based on depth data are hindered by the different sensor technologies and the currently available depth-based datasets, which are limited in size and acquired through the same device. In this paper, we present an analysis on the use of depth maps, as obtained by active depth sensors and deep neural architectures for the face recognition task. We compare different depth data representations (depth and normal images, voxels, point clouds), deep models (two-dimensional and three-dimensional Convolutional Neural Networks, PointNet-based networks), and pre-processing and normalization techniques in order to determine the configuration that maximizes the recognition accuracy and is capable of generalizing better on unseen data and novel acquisition settings. Extensive intra- and cross-dataset experiments, which were performed on four public databases, suggest that representations and methods that are based on normal images and point clouds perform and generalize better than other 2D and 3D alternatives. Moreover, we propose a novel challenging dataset, namely MultiSFace, in order to specifically analyze the influence of the depth map quality and the acquisition distance on the face recognition accuracy. Full article
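
One of the compared representations, normal images, can be derived from a depth map by finite differences; a minimal sketch that ignores camera intrinsics and smoothing, with a synthetic depth map standing in for sensor data:

    import numpy as np

    def depth_to_normals(depth):
        """Per-pixel surface normals from a depth map via finite differences
        (camera intrinsics ignored; a rough sketch of the normal-image representation)."""
        dz_dy, dz_dx = np.gradient(depth.astype(np.float32))
        normals = np.dstack((-dz_dx, -dz_dy, np.ones_like(dz_dx)))
        normals /= np.linalg.norm(normals, axis=2, keepdims=True)
        return normals                                   # unit normals, shape (H, W, 3)

    depth = np.fromfunction(lambda y, x: 1000.0 + 0.5 * x + 0.2 * y, (120, 160))
    normal_image = (depth_to_normals(depth) + 1.0) / 2.0  # mapped to [0, 1] for display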

18 pages, 4215 KiB  
Article
Self-Supervised Point Set Local Descriptors for Point Cloud Registration
by Yijun Yuan, Dorit Borrmann, Jiawei Hou, Yuexin Ma, Andreas Nüchter and Sören Schwertfeger
Sensors 2021, 21(2), 486; https://doi.org/10.3390/s21020486 - 12 Jan 2021
Cited by 19 | Viewed by 3092
Abstract
Descriptors play an important role in point cloud registration. The current state-of-the-art resorts to the high regression capability of deep learning. However, recent deep learning-based descriptors require different levels of annotation and selection of patches, which make the model hard to migrate to new scenarios. In this work, we learn local registration descriptors for point clouds in a self-supervised manner. In each iteration of the training, the input of the network is merely one unlabeled point cloud. Thus, the whole training requires no manual annotation and manual selection of patches. In addition, we propose to involve keypoint sampling into the pipeline, which further improves the performance of our model. Our experiments demonstrate the capability of our self-supervised local descriptor to achieve even better performance than the supervised model, while being easier to train and requiring no data labeling. Full article
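
A heavily simplified sketch of the self-supervised pattern: one unlabeled point cloud per iteration, two randomly transformed copies, and a contrastive loss pulling together descriptors of corresponding points. The per-point MLP, the augmentation, and the temperature below are illustrative stand-ins, not the authors' network or training scheme.

    import math
    import random

    import torch
    import torch.nn as nn

    descriptor = nn.Sequential(nn.Linear(3, 64), nn.ReLU(),
                               nn.Linear(64, 32))        # toy per-point descriptor network
    opt = torch.optim.Adam(descriptor.parameters(), lr=1e-3)

    def random_view(points):
        """Random rotation about z plus small noise (toy augmentation)."""
        a = random.uniform(0.0, 2.0 * math.pi)
        rot = torch.tensor([[math.cos(a), -math.sin(a), 0.0],
                            [math.sin(a),  math.cos(a), 0.0],
                            [0.0, 0.0, 1.0]])
        return points @ rot.T + 0.005 * torch.randn_like(points)

    cloud = torch.rand(1024, 3)                          # one unlabeled point cloud
    for _ in range(100):
        view_a, view_b = random_view(cloud), random_view(cloud)
        desc_a = nn.functional.normalize(descriptor(view_a), dim=1)
        desc_b = nn.functional.normalize(descriptor(view_b), dim=1)
        sim = desc_a @ desc_b.T / 0.07                   # all-pairs descriptor similarity
        target = torch.arange(cloud.shape[0])            # point i in view A matches point i in view B
        loss = nn.functional.cross_entropy(sim, target)  # contrastive matching loss
        opt.zero_grad()
        loss.backward()
        opt.step()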

17 pages, 9366 KiB  
Article
Real-Time Single Image Depth Perception in the Wild with Handheld Devices
by Filippo Aleotti, Giulio Zaccaroni, Luca Bartolomei, Matteo Poggi, Fabio Tosi and Stefano Mattoccia
Sensors 2021, 21(1), 15; https://doi.org/10.3390/s21010015 - 22 Dec 2020
Cited by 22 | Viewed by 4827
Abstract
Depth perception is paramount for tackling real-world problems, ranging from autonomous driving to consumer applications. For the latter, depth estimation from a single image would represent the most versatile solution since a standard camera is available on almost any handheld device. Nonetheless, two main issues limit the practical deployment of monocular depth estimation methods on such devices: (i) the low reliability when deployed in the wild and (ii) the resources needed to achieve real-time performance, often not compatible with low-power embedded systems. Therefore, in this paper, we deeply investigate all these issues, showing how they are both addressable by adopting appropriate network design and training strategies. Moreover, we also outline how to map the resulting networks on handheld devices to achieve real-time performance. Our thorough evaluation highlights the ability of such fast networks to generalize well to new environments, a crucial feature required to tackle the extremely varied contexts faced in real applications. Indeed, to further support this evidence, we report experimental results concerning real-time, depth-aware augmented reality and image blurring with smartphones in the wild. Full article
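
One of the reported applications, depth-aware image blurring, reduces in its simplest form to blending a blurred copy of the image according to the estimated depth; a crude sketch that assumes the depth map has already been estimated and normalized (the kernel size, tolerance, and placeholder data are assumptions):

    import cv2
    import numpy as np

    def depth_aware_blur(image, depth, focus_depth, tolerance=0.15):
        """Blend in a blurred copy of the image, weighted by the distance of each
        pixel's depth from the focus depth (a crude 'portrait mode' sketch)."""
        blurred = cv2.GaussianBlur(image, (21, 21), 0)
        weight = np.clip(np.abs(depth - focus_depth) / tolerance, 0.0, 1.0)[..., None]
        return (1.0 - weight) * image + weight * blurred

    image = np.random.rand(240, 320, 3).astype(np.float32)                    # placeholder photo
    depth = np.tile(np.linspace(0.0, 1.0, 320, dtype=np.float32), (240, 1))   # normalized depth map
    result = depth_aware_blur(image, depth, focus_depth=0.3)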

21 pages, 21766 KiB  
Article
Semantic Extraction of Permanent Structures for the Reconstruction of Building Interiors from Point Clouds
by Inge Coudron, Steven Puttemans, Toon Goedemé and Patrick Vandewalle
Sensors 2020, 20(23), 6916; https://doi.org/10.3390/s20236916 - 03 Dec 2020
Cited by 11 | Viewed by 2434
Abstract
The extraction of permanent structures (such as walls, floors, and ceilings) is an important step in the reconstruction of building interiors from point clouds. These permanent structures are, in general, assumed to be planar. However, point clouds from building interiors often also contain clutter with planar surfaces such as furniture, cabinets, etc. Hence, not all planar surfaces that are extracted belong to permanent structures. This is undesirable as it can result in geometric errors in the reconstruction. Therefore, it is important that reconstruction methods can correctly detect and extract all permanent structures even in the presence of such clutter. We propose to perform semantic scene completion using deep learning prior to the extraction of permanent structures to improve the reconstruction results. For this, we started from the ScanComplete network proposed by Dai et al. We adapted the network to use a different input representation to eliminate the need for scanning trajectory information as this is not always available. Furthermore, we optimized the architecture to make inference and training significantly faster. To further improve the results of the network, we created a more realistic dataset based on real-life scans from building interiors. The experimental results show that our approach significantly improves the extraction of the permanent structures from both synthetically generated and real-life point clouds, thereby improving the overall reconstruction results. Full article
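
Downstream of the semantic completion step, extracting permanent structures typically means fitting planes only to points labeled as wall, floor, or ceiling. A bare-bones RANSAC plane fit over such a semantically filtered set (illustrative only; the labels, thresholds, and random data are assumptions, not the authors' pipeline):

    import numpy as np

    def ransac_plane(points, n_iters=500, threshold=0.02, seed=0):
        """Fit one dominant plane with RANSAC; returns (normal, d, inlier mask)."""
        rng = np.random.default_rng(seed)
        best_inliers = np.zeros(len(points), dtype=bool)
        best_plane = (np.array([0.0, 0.0, 1.0]), 0.0)
        for _ in range(n_iters):
            sample = points[rng.choice(len(points), 3, replace=False)]
            normal = np.cross(sample[1] - sample[0], sample[2] - sample[0])
            if np.linalg.norm(normal) < 1e-9:            # degenerate (collinear) sample
                continue
            normal /= np.linalg.norm(normal)
            d = -normal @ sample[0]
            inliers = np.abs(points @ normal + d) < threshold
            if inliers.sum() > best_inliers.sum():
                best_inliers, best_plane = inliers, (normal, d)
        return best_plane[0], best_plane[1], best_inliers

    # hypothetical labeled point cloud: keep only points tagged as permanent structure
    points = np.random.rand(5000, 3)
    labels = np.random.choice(["wall", "floor", "furniture"], size=5000)
    structural = points[np.isin(labels, ["wall", "floor"])]
    normal, d, inliers = ransac_plane(structural)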

Review


27 pages, 18630 KiB  
Review
3D Sensors for Sewer Inspection: A Quantitative Review and Analysis
by Chris H. Bahnsen, Anders S. Johansen, Mark P. Philipsen, Jesper W. Henriksen, Kamal Nasrollahi and Thomas B. Moeslund
Sensors 2021, 21(7), 2553; https://doi.org/10.3390/s21072553 - 06 Apr 2021
Cited by 21 | Viewed by 5227
Abstract
Automating inspection of critical infrastructure such as sewer systems will help utilities optimize maintenance and replacement schedules. The current inspection process consists of manual reviews of video as an operator controls a sewer inspection vehicle remotely. The process is slow, labor-intensive, and expensive, and it presents a huge potential for automation. With this work, we address a central component of the next generation of robotic inspection of sewers, namely the choice of 3D sensing technology. We investigate three prominent techniques for 3D vision: passive stereo, active stereo, and time-of-flight (ToF). The RealSense D435 camera is chosen as the representative of the first two techniques, whereas the PMD CamBoard pico flexx represents ToF. The 3D reconstruction performance of the sensors is assessed both in a laboratory setup and in an outdoor above-ground setup. The acquired point clouds from the sensors are compared with reference 3D models using the cloud-to-mesh metric. The reconstruction performance of the sensors is tested with respect to different illuminance levels and different levels of water in the pipes. The results of the tests show that the ToF-based point cloud from the pico flexx is superior to the output of the active and passive stereo cameras. Full article
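
The cloud-to-mesh metric used in the evaluation can be approximated by sampling the reference mesh surface densely and taking nearest-neighbour distances from the acquired points; the sketch below uses that approximation (an exact implementation would compute point-to-triangle distances), with a toy planar "mesh" and a noisy synthetic scan as assumptions:

    import numpy as np
    from scipy.spatial import cKDTree

    def cloud_to_mesh_approx(points, mesh_samples):
        """Approximate cloud-to-mesh distances: nearest neighbour from each acquired
        point to a dense sampling of the reference mesh surface."""
        distances, _ = cKDTree(mesh_samples).query(points)
        return distances

    # toy reference "mesh": dense samples of the plane z = 0; noisy synthetic scan
    xx, yy = np.meshgrid(np.linspace(0, 1, 200), np.linspace(0, 1, 200))
    mesh_samples = np.column_stack((xx.ravel(), yy.ravel(), np.zeros(xx.size)))
    scan = np.column_stack((np.random.rand(1000, 2), 0.003 * np.random.randn(1000)))
    c2m = cloud_to_mesh_approx(scan, mesh_samples)
    mean_c2m = c2m.mean()                     # summary statistic per sensor and setup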
