
Computer Vision for 3D Perception and Applications

A special issue of Sensors (ISSN 1424-8220). This special issue belongs to the section "Sensor Networks".

Deadline for manuscript submissions: closed (1 February 2021) | Viewed by 43088

Special Issue Editors


Dr. Matteo Poggi
Guest Editor
Department of Computer Science and Engineering (DISI), University of Bologna, 40136 Bologna, Italy
Interests: computer vision; deep learning; depth perception; embedded systems

Prof. Dr. Thomas Moeslund
Guest Editor
Visual Analysis of People (VAP) Lab, Aalborg University, Rendsburggade 14, 9000 Aalborg, Denmark
Interests: computer vision; image processing; machine vision; pattern recognition; visual analysis of people's whereabouts; surveillance; traffic monitoring

Special Issue Information

Dear Colleagues,

Effective 3D perception greatly enriches the knowledge of the surrounding environment and is crucial for developing high-level applications for various purposes. Pivotal to 3D perception is the acquisition/estimation of reliable depth information—a task for which several technologies exist, ranging from active sensors (e.g., Time-of-Flight devices) to passive cameras, coupled with a variety of techniques for estimating depth from images (stereo matching, structure from motion, and more). From an accurate reconstruction of the surrounding 3D scene, several complex problems can be addressed, such as autonomous navigation and localization, tracking, surveillance, robotics, interaction with other agents, and manipulation of the sensed environment. Recent advances in deep learning have rapidly found their place in this field as well.

The aim of this Special Issue is to present both techniques for reliably acquiring 3D data and methods that exploit this information to tackle computer vision tasks, exploring novel solutions for perception as well as for applications.

This Special Issue invites contributions on, but not limited to, the following topics:

  • Depth from images;
  • Binocular and multi-view stereo;
  • Active depth sensing;
  • Single image depth estimation;
  • 3D reconstruction and scene understanding;
  • RGB-D computer vision;
  • 3D pose estimation, tracking and recognition;
  • 3D motion estimation;
  • 3D semantic segmentation;
  • Applications of 3D vision (e.g., robotics, augmented reality).

Dr. Matteo Poggi
Prof. Dr. Thomas Moeslund
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to the website. Once registered, authors can proceed to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Sensors is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • 3D perception
  • binocular-multiview stereo
  • active 3D sensors
  • RGB-D computer vision
  • 3D detection
  • 3D tracking
  • 3D vision applications

Published Papers (12 papers)


Editorial


3 pages, 164 KiB  
Editorial
Computer Vision for 3D Perception and Applications
by Matteo Poggi and Thomas B. Moeslund
Sensors 2021, 21(12), 3944; https://doi.org/10.3390/s21123944 - 08 Jun 2021
Cited by 1 | Viewed by 2167
Abstract
Effective 3D perception of an observed scene greatly enriches the knowledge about the surrounding environment and is crucial to effectively develop high-level applications for various purposes [...] Full article

Research


21 pages, 60445 KiB  
Article
PPTFH: Robust Local Descriptor Based on Point-Pair Transformation Features for 3D Surface Matching
by Lang Wu, Kai Zhong, Zhongwei Li, Ming Zhou, Hongbin Hu, Congjun Wang and Yusheng Shi
Sensors 2021, 21(9), 3229; https://doi.org/10.3390/s21093229 - 07 May 2021
Cited by 7 | Viewed by 2812
Abstract
Three-dimensional feature description for a local surface is a core technology in 3D computer vision. Existing descriptors perform poorly in terms of distinctiveness and robustness owing to noise, mesh decimation, clutter, and occlusion in real scenes. In this paper, we propose a 3D local surface descriptor using point-pair transformation feature histograms (PPTFHs) to address these challenges. The generation process of the PPTFH descriptor consists of three steps. First, a simple but efficient strategy is introduced to partition the point-pair sets on the local surface into four subsets. Then, three feature histograms corresponding to each point-pair subset are generated by the point-pair transformation features, which are computed using the proposed Darboux frame. Finally, all the feature histograms of the four subsets are concatenated into a vector to generate the overall PPTFH descriptor. The performance of the PPTFH descriptor is evaluated on several popular benchmark datasets, and the results demonstrate that the PPTFH descriptor achieves superior performance in terms of descriptiveness and robustness compared with state-of-the-art algorithms. The benefits of the PPTFH descriptor for 3D surface matching are demonstrated by the results obtained from five benchmark datasets. Full article
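
The descriptor itself is specific to the paper, but its core ingredient, point-pair features expressed in a Darboux frame, is well established. Below is a minimal Python sketch of such features for one pair of oriented points, in the spirit of the PFH family rather than the exact PPTFH formulation; the function name and the toy inputs are illustrative assumptions.

    import numpy as np

    def pair_features(p1, n1, p2, n2):
        """Darboux-frame point-pair features for two oriented points
        (PFH-style sketch, not the exact PPTFH formulation)."""
        d = p2 - p1
        dist = np.linalg.norm(d)
        d_unit = d / dist
        u = n1                                  # first frame axis: source normal
        v = np.cross(u, d_unit)                 # second axis (degenerate if u is parallel to d)
        v /= np.linalg.norm(v)
        w = np.cross(u, v)                      # third axis completes the frame
        alpha = float(np.dot(v, n2))            # relates the target normal to v
        phi = float(np.dot(u, d_unit))          # relates the source normal to the baseline
        theta = float(np.arctan2(np.dot(w, n2), np.dot(u, n2)))
        return dist, alpha, phi, theta

    # toy example: two oriented points
    f = pair_features(np.array([0.0, 0.0, 0.0]), np.array([0.0, 0.0, 1.0]),
                      np.array([0.1, 0.0, 0.0]), np.array([0.0, 1.0, 0.0]))

Histograms of such features, accumulated over subsets of point pairs on the local surface and concatenated, give the general structure of descriptors in this family.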

25 pages, 3440 KiB  
Article
Simulation Study of a Frame-Based Motion Correction Algorithm for Positron Emission Imaging
by Héctor Espinós-Morató, David Cascales-Picó, Marina Vergara, Ángel Hernández-Martínez, José María Benlloch Baviera and María José Rodríguez-Álvarez
Sensors 2021, 21(8), 2608; https://doi.org/10.3390/s21082608 - 08 Apr 2021
Cited by 4 | Viewed by 2092
Abstract
Positron emission tomography (PET) is a functional non-invasive imaging modality that uses radioactive substances (radiotracers) to measure changes in metabolic processes. Advances in scanner technology and data acquisition in the last decade have led to the development of more sophisticated PET devices with good spatial resolution (1–3 mm of full width at half maximum (FWHM)). However, involuntary motions produced by the patient inside the scanner lead to image degradation and potentially to a misdiagnosis. The adverse effect of motion on the reconstructed image increases as the spatial resolution of current scanners continues to improve. To correct this effect, motion correction techniques are becoming increasingly popular and widely studied. This work presents a simulation study of image motion correction using a frame-based algorithm. The method is able to cut the data acquired from the scanner into frames, taking into account the size of the object of study. This approach allows working with low statistical information without losing image quality. The frames are later registered using a spatio-temporal registration developed in a multi-level way. To validate these results, several performance tests are applied to a set of simulated moving phantoms. The results show that the method minimizes intra-frame motion, improves the signal intensity over the background in comparison with other methods in the literature, produces excellent similarity values with respect to the ground-truth (static) image, and is able to find a limit on the patient-injected dose when some prior knowledge of the lesion is present. Full article
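
The registration in the paper is multi-level and spatio-temporal; as a much cruder illustration of the frame-based idea, the sketch below aligns a short sequence of 2D frames to the first frame by phase correlation (translation only) and averages the result. The synthetic data and the translation-only motion model are assumptions made purely to keep the toy runnable.

    import numpy as np
    from scipy.ndimage import shift as nd_shift
    from skimage.registration import phase_cross_correlation

    def frame_based_correction(frames):
        """Align every frame to the first one (translation only) and average them.
        A toy stand-in for the paper's multi-level spatio-temporal registration."""
        reference = frames[0]
        aligned = [reference]
        for frame in frames[1:]:
            offset, _, _ = phase_cross_correlation(reference, frame)
            aligned.append(nd_shift(frame, offset))
        return np.mean(aligned, axis=0)

    # toy data: a bright blob drifting across a sequence of 2D "frames"
    frames = []
    for t in range(5):
        img = np.zeros((64, 64))
        img[20 + t, 30 + t] = 1.0
        frames.append(img)
    corrected = frame_based_correction(frames)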

20 pages, 8201 KiB  
Article
Deep Learning for Transient Image Reconstruction from ToF Data
by Enrico Buratto, Adriano Simonetto, Gianluca Agresti, Henrik Schäfer and Pietro Zanuttigh
Sensors 2021, 21(6), 1962; https://doi.org/10.3390/s21061962 - 11 Mar 2021
Cited by 12 | Viewed by 3640
Abstract
In this work, we propose a novel approach for correcting multi-path interference (MPI) in Time-of-Flight (ToF) cameras by estimating the direct and global components of the incoming light. MPI is an error source linked to the multiple reflections of light inside a scene; each sensor pixel receives information coming from different light paths which generally leads to an overestimation of the depth. We introduce a novel deep learning approach, which estimates the structure of the time-dependent scene impulse response and from it recovers a depth image with a reduced amount of MPI. The model consists of two main blocks: a predictive model that learns a compact encoded representation of the backscattering vector from the noisy input data and a fixed backscattering model which translates the encoded representation into the high dimensional light response. Experimental results on real data show the effectiveness of the proposed approach, which reaches state-of-the-art performances. Full article
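
A conceptual sketch of the two-block structure described above follows: a small learned encoder maps noisy ToF samples to a compact code, and a fixed model expands the code into a time-dependent backscattering response. The Gaussian-mixture form of the fixed block, the layer sizes, and the input dimensionality are illustrative assumptions, not the authors' architecture.

    import torch
    import torch.nn as nn

    class EncoderToBackscattering(nn.Module):
        """Learned encoder plus fixed backscattering model (illustrative sketch only)."""
        def __init__(self, n_inputs=8, n_bins=256, n_components=3):
            super().__init__()
            self.n_bins = n_bins
            self.n_components = n_components
            # predictive block: noisy ToF samples -> compact code (weight, position, width per component)
            self.encoder = nn.Sequential(nn.Linear(n_inputs, 64), nn.ReLU(),
                                         nn.Linear(64, 3 * n_components))

        def forward(self, x):
            code = self.encoder(x).view(-1, self.n_components, 3)
            w = torch.softmax(code[..., 0], dim=-1)                   # component weights
            mu = torch.sigmoid(code[..., 1]) * self.n_bins            # peak positions (time bins)
            sigma = nn.functional.softplus(code[..., 2]) + 1.0        # peak widths
            # fixed backscattering block: mixture of Gaussians over time bins
            t = torch.arange(self.n_bins, dtype=torch.float32)
            response = (w.unsqueeze(-1) *
                        torch.exp(-0.5 * ((t - mu.unsqueeze(-1)) / sigma.unsqueeze(-1)) ** 2)).sum(dim=1)
            return response                                           # (batch, n_bins) impulse response

    model = EncoderToBackscattering()
    fake_samples = torch.randn(4, 8)          # e.g., correlation samples at several modulation frequencies
    impulse = model(fake_samples)             # the earliest strong peak approximates the direct component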

17 pages, 2216 KiB  
Article
Combining Augmented Reality and 3D Printing to Improve Surgical Workflows in Orthopedic Oncology: Smartphone Application and Clinical Evaluation
by Rafael Moreta-Martinez, Alicia Pose-Díez-de-la-Lastra, José Antonio Calvo-Haro, Lydia Mediavilla-Santos, Rubén Pérez-Mañanes and Javier Pascau
Sensors 2021, 21(4), 1370; https://doi.org/10.3390/s21041370 - 15 Feb 2021
Cited by 27 | Viewed by 5245
Abstract
During the last decade, orthopedic oncology has experienced the benefits of computerized medical imaging to reduce human dependency, improving accuracy and clinical outcomes. However, traditional surgical navigation systems do not always adapt properly to this kind of intervention. Augmented reality (AR) and three-dimensional (3D) printing are technologies recently introduced in the surgical environment with promising results. Here we present an innovative solution combining 3D printing and AR in orthopedic oncological surgery. A new surgical workflow is proposed, including 3D printed models and a novel AR-based smartphone application (app). This app can display the patient's anatomy and the tumor's location. A 3D-printed reference marker, designed to fit in a unique position of the affected bone tissue, enables automatic registration. The system has been evaluated in terms of visualization accuracy and usability during the whole surgical workflow. Experiments on six realistic phantoms provided a visualization error below 3 mm. The AR system was tested in two clinical cases during surgical planning, patient communication, and surgical intervention. These results and the positive feedback obtained from surgeons and patients suggest that the combination of AR and 3D printing can improve efficacy, accuracy, and patients' experience. Full article
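
The automatic registration enabled by the 3D-printed reference marker ultimately comes down to estimating a rigid transform between corresponding 3D points (marker geometry in the surgical plan versus its detected pose). A minimal Kabsch/SVD sketch with made-up corresponding points, leaving out all of the app's detection and visualization logic:

    import numpy as np

    def rigid_transform(src, dst):
        """Least-squares rigid transform (R, t) mapping src points onto dst (Kabsch/SVD)."""
        src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
        H = (src - src_c).T @ (dst - dst_c)
        U, _, Vt = np.linalg.svd(H)
        R = Vt.T @ U.T
        if np.linalg.det(R) < 0:                         # guard against reflections
            Vt[-1, :] *= -1
            R = Vt.T @ U.T
        t = dst_c - R @ src_c
        return R, t

    # hypothetical marker points in the planning model vs. their detected positions (mm)
    model_pts = np.array([[0.0, 0.0, 0.0], [30.0, 0.0, 0.0], [0.0, 30.0, 0.0], [0.0, 0.0, 10.0]])
    angle = np.deg2rad(20.0)
    R_true = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                       [np.sin(angle),  np.cos(angle), 0.0],
                       [0.0, 0.0, 1.0]])
    detected = model_pts @ R_true.T + np.array([5.0, -2.0, 40.0])
    R, t = rigid_transform(model_pts, detected)          # recovers the rotation and the offset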

17 pages, 3441 KiB  
Article
HRDepthNet: Depth Image-Based Marker-Less Tracking of Body Joints
by Linda Christin Büker, Finnja Zuber, Andreas Hein and Sebastian Fudickar
Sensors 2021, 21(4), 1356; https://doi.org/10.3390/s21041356 - 14 Feb 2021
Cited by 6 | Viewed by 3059
Abstract
While approaches for the detection of joint positions in color images, such as HRNet and OpenPose, are available, corresponding approaches for depth images have received limited consideration, even though depth images have several advantages over color images, such as robustness to light variation and invariance to color and texture. Correspondingly, we introduce High-Resolution Depth Net (HRDepthNet)—a machine learning driven approach to detect human joints (body, head, and upper and lower extremities) in purely depth images. HRDepthNet retrains the original HRNet for depth images. For this purpose, a dataset was created holding depth (and RGB) images recorded with subjects conducting the timed up and go test—an established geriatric assessment. The joint positions were manually annotated on the RGB images. The training and evaluation were conducted with this dataset. For accuracy evaluation, the detection of body joints was evaluated via COCO's evaluation metrics, indicating that the resulting depth image-based model achieved better results than HRNet trained and applied on the corresponding RGB images. An additional evaluation of the position errors showed a median deviation of 1.619 cm (x-axis), 2.342 cm (y-axis) and 2.4 cm (z-axis). Full article
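
The per-axis error figures quoted above reduce to a median of absolute coordinate deviations between predicted and annotated joints; a tiny sketch with hypothetical arrays:

    import numpy as np

    # hypothetical predicted vs. annotated joint positions, shape (n_joints, 3), in cm
    pred = np.array([[10.2, 55.1, 200.4], [32.8, 57.0, 198.9], [11.0, 90.3, 201.7]])
    gt   = np.array([[ 9.0, 53.0, 198.0], [34.5, 59.5, 196.5], [12.6, 92.6, 204.1]])

    median_dev = np.median(np.abs(pred - gt), axis=0)    # per-axis median deviation (x, y, z)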

16 pages, 5932 KiB  
Article
Surface Reconstruction from Structured Light Images Using Differentiable Rendering
by Janus Nørtoft Jensen, Morten Hannemose, J. Andreas Bærentzen, Jakob Wilm, Jeppe Revall Frisvad and Anders Bjorholm Dahl
Sensors 2021, 21(4), 1068; https://doi.org/10.3390/s21041068 - 04 Feb 2021
Cited by 5 | Viewed by 2649
Abstract
When 3D scanning objects, the objective is usually to obtain a continuous surface. However, most surface scanning methods, such as structured light scanning, yield a point cloud. Obtaining a continuous surface from a point cloud requires a subsequent surface reconstruction step, which is directly affected by any error from the computation of the point cloud. In this work, we propose a one-step approach in which we compute the surface directly from structured light images. Our method minimizes the least-squares error between photographs and renderings of a triangle mesh, where the vertex positions of the mesh are the parameters of the minimization problem. To ensure fast iterations during optimization, we use differentiable rendering, which computes images and gradients in a single pass. We present simulation experiments demonstrating that our method for computing a triangle mesh has several advantages over approaches that rely on an intermediate point cloud. Our method can produce accurate reconstructions when initializing the optimization from a sphere. We also show that our method is good at reconstructing sharp edges and that it is robust with respect to image noise. In addition, our method can improve the output from other reconstruction algorithms if we use these for initialization. Full article
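
The optimization pattern (treating vertex positions as parameters and minimizing a photometric least-squares loss through a differentiable renderer) can be sketched with automatic differentiation. In the toy below the "renderer" is a deliberately crude soft point-splatting function standing in for a proper differentiable triangle-mesh renderer, so only the structure of the loop carries over to the paper's method; all names and values are assumptions.

    import torch

    def soft_splat(vertices, size=32, sigma=1.5):
        """Crude differentiable 'renderer': splat 2D vertices as Gaussians into an image.
        A toy stand-in for a real differentiable triangle-mesh renderer."""
        xs = torch.arange(size, dtype=torch.float32).view(1, -1)   # column coordinates
        ys = torch.arange(size, dtype=torch.float32).view(-1, 1)   # row coordinates
        img = torch.zeros(size, size)
        for v in vertices:
            img = img + torch.exp(-((xs - v[0]) ** 2 + (ys - v[1]) ** 2) / (2 * sigma ** 2))
        return img

    # "captured" image produced by an unknown target vertex configuration
    target_vertices = torch.tensor([[10.0, 12.0], [20.0, 8.0], [16.0, 22.0]])
    captured = soft_splat(target_vertices).detach()

    # vertex positions are the parameters of the least-squares problem
    vertices = torch.tensor([[8.0, 8.0], [22.0, 12.0], [12.0, 20.0]], requires_grad=True)
    opt = torch.optim.Adam([vertices], lr=0.5)
    for _ in range(200):
        opt.zero_grad()
        loss = ((soft_splat(vertices) - captured) ** 2).sum()      # photometric least squares
        loss.backward()
        opt.step()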

18 pages, 1480 KiB  
Article
A Systematic Comparison of Depth Map Representations for Face Recognition
by Stefano Pini, Guido Borghi, Roberto Vezzani, Davide Maltoni and Rita Cucchiara
Sensors 2021, 21(3), 944; https://doi.org/10.3390/s21030944 - 31 Jan 2021
Cited by 15 | Viewed by 3142
Abstract
Nowadays, we are witnessing the wide diffusion of active depth sensors. However, the generalization capabilities and performance of the deep face recognition approaches that are based on depth data are hindered by the different sensor technologies and the currently available depth-based datasets, which are limited in size and acquired through the same device. In this paper, we present an analysis on the use of depth maps, as obtained by active depth sensors and deep neural architectures for the face recognition task. We compare different depth data representations (depth and normal images, voxels, point clouds), deep models (two-dimensional and three-dimensional Convolutional Neural Networks, PointNet-based networks), and pre-processing and normalization techniques in order to determine the configuration that maximizes the recognition accuracy and is capable of generalizing better on unseen data and novel acquisition settings. Extensive intra- and cross-dataset experiments, which were performed on four public databases, suggest that representations and methods that are based on normal images and point clouds perform and generalize better than other 2D and 3D alternatives. Moreover, we propose a novel challenging dataset, namely MultiSFace, in order to specifically analyze the influence of the depth map quality and the acquisition distance on the face recognition accuracy. Full article
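
One of the compared representations, normal images, can be derived from a depth map by finite differences; a minimal sketch that ignores camera intrinsics and smoothing, with a synthetic depth map standing in for sensor data:

    import numpy as np

    def depth_to_normals(depth):
        """Per-pixel surface normals from a depth map via finite differences
        (camera intrinsics ignored; a rough sketch of the normal-image representation)."""
        dz_dy, dz_dx = np.gradient(depth.astype(np.float32))
        normals = np.dstack((-dz_dx, -dz_dy, np.ones_like(dz_dx)))
        normals /= np.linalg.norm(normals, axis=2, keepdims=True)
        return normals                                   # unit normals, shape (H, W, 3)

    depth = np.fromfunction(lambda y, x: 1000.0 + 0.5 * x + 0.2 * y, (120, 160))
    normal_image = (depth_to_normals(depth) + 1.0) / 2.0  # mapped to [0, 1] for display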

18 pages, 4215 KiB  
Article
Self-Supervised Point Set Local Descriptors for Point Cloud Registration
by Yijun Yuan, Dorit Borrmann, Jiawei Hou, Yuexin Ma, Andreas Nüchter and Sören Schwertfeger
Sensors 2021, 21(2), 486; https://doi.org/10.3390/s21020486 - 12 Jan 2021
Cited by 19 | Viewed by 3092
Abstract
Descriptors play an important role in point cloud registration. The current state-of-the-art resorts to the high regression capability of deep learning. However, recent deep learning-based descriptors require different levels of annotation and selection of patches, which make the model hard to migrate to new scenarios. In this work, we learn local registration descriptors for point clouds in a self-supervised manner. In each iteration of the training, the input of the network is merely one unlabeled point cloud. Thus, the whole training requires no manual annotation and manual selection of patches. In addition, we propose to involve keypoint sampling into the pipeline, which further improves the performance of our model. Our experiments demonstrate the capability of our self-supervised local descriptor to achieve even better performance than the supervised model, while being easier to train and requiring no data labeling. Full article
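
A heavily simplified sketch of the self-supervised pattern: one unlabeled point cloud per iteration, two randomly transformed copies, and a contrastive loss pulling together descriptors of corresponding points. The per-point MLP, the augmentation, and the temperature below are illustrative stand-ins, not the authors' network or training scheme.

    import math
    import random

    import torch
    import torch.nn as nn

    descriptor = nn.Sequential(nn.Linear(3, 64), nn.ReLU(),
                               nn.Linear(64, 32))        # toy per-point descriptor network
    opt = torch.optim.Adam(descriptor.parameters(), lr=1e-3)

    def random_view(points):
        """Random rotation about z plus small noise (toy augmentation)."""
        a = random.uniform(0.0, 2.0 * math.pi)
        rot = torch.tensor([[math.cos(a), -math.sin(a), 0.0],
                            [math.sin(a),  math.cos(a), 0.0],
                            [0.0, 0.0, 1.0]])
        return points @ rot.T + 0.005 * torch.randn_like(points)

    cloud = torch.rand(1024, 3)                          # one unlabeled point cloud
    for _ in range(100):
        view_a, view_b = random_view(cloud), random_view(cloud)
        desc_a = nn.functional.normalize(descriptor(view_a), dim=1)
        desc_b = nn.functional.normalize(descriptor(view_b), dim=1)
        sim = desc_a @ desc_b.T / 0.07                   # all-pairs descriptor similarity
        target = torch.arange(cloud.shape[0])            # point i in view A matches point i in view B
        loss = nn.functional.cross_entropy(sim, target)  # contrastive matching loss
        opt.zero_grad()
        loss.backward()
        opt.step()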

17 pages, 9366 KiB  
Article
Real-Time Single Image Depth Perception in the Wild with Handheld Devices
by Filippo Aleotti, Giulio Zaccaroni, Luca Bartolomei, Matteo Poggi, Fabio Tosi and Stefano Mattoccia
Sensors 2021, 21(1), 15; https://doi.org/10.3390/s21010015 - 22 Dec 2020
Cited by 22 | Viewed by 4827
Abstract
Depth perception is paramount for tackling real-world problems, ranging from autonomous driving to consumer applications. For the latter, depth estimation from a single image would represent the most versatile solution since a standard camera is available on almost any handheld device. Nonetheless, two main issues limit the practical deployment of monocular depth estimation methods on such devices: (i) the low reliability when deployed in the wild and (ii) the resources needed to achieve real-time performance, often not compatible with low-power embedded systems. Therefore, in this paper, we deeply investigate all these issues, showing how they are both addressable by adopting appropriate network design and training strategies. Moreover, we also outline how to map the resulting networks on handheld devices to achieve real-time performance. Our thorough evaluation highlights the ability of such fast networks to generalize well to new environments, a crucial feature required to tackle the extremely varied contexts faced in real applications. Indeed, to further support this evidence, we report experimental results concerning real-time, depth-aware augmented reality and image blurring with smartphones in the wild. Full article
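
One of the reported applications, depth-aware image blurring, reduces in its simplest form to blending a blurred copy of the image according to the estimated depth; a crude sketch that assumes the depth map has already been estimated and normalized (the kernel size, tolerance, and placeholder data are assumptions):

    import cv2
    import numpy as np

    def depth_aware_blur(image, depth, focus_depth, tolerance=0.15):
        """Blend in a blurred copy of the image, weighted by the distance of each
        pixel's depth from the focus depth (a crude 'portrait mode' sketch)."""
        blurred = cv2.GaussianBlur(image, (21, 21), 0)
        weight = np.clip(np.abs(depth - focus_depth) / tolerance, 0.0, 1.0)[..., None]
        return (1.0 - weight) * image + weight * blurred

    image = np.random.rand(240, 320, 3).astype(np.float32)                    # placeholder photo
    depth = np.tile(np.linspace(0.0, 1.0, 320, dtype=np.float32), (240, 1))   # normalized depth map
    result = depth_aware_blur(image, depth, focus_depth=0.3)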

21 pages, 21766 KiB  
Article
Semantic Extraction of Permanent Structures for the Reconstruction of Building Interiors from Point Clouds
by Inge Coudron, Steven Puttemans, Toon Goedemé and Patrick Vandewalle
Sensors 2020, 20(23), 6916; https://doi.org/10.3390/s20236916 - 03 Dec 2020
Cited by 11 | Viewed by 2434
Abstract
The extraction of permanent structures (such as walls, floors, and ceilings) is an important step in the reconstruction of building interiors from point clouds. These permanent structures are, in general, assumed to be planar. However, point clouds from building interiors often also contain clutter with planar surfaces such as furniture, cabinets, etc. Hence, not all planar surfaces that are extracted belong to permanent structures. This is undesirable as it can result in geometric errors in the reconstruction. Therefore, it is important that reconstruction methods can correctly detect and extract all permanent structures even in the presence of such clutter. We propose to perform semantic scene completion using deep learning prior to the extraction of permanent structures to improve the reconstruction results. For this, we started from the ScanComplete network proposed by Dai et al. We adapted the network to use a different input representation to eliminate the need for scanning trajectory information as this is not always available. Furthermore, we optimized the architecture to make inference and training significantly faster. To further improve the results of the network, we created a more realistic dataset based on real-life scans from building interiors. The experimental results show that our approach significantly improves the extraction of the permanent structures from both synthetically generated and real-life point clouds, thereby improving the overall reconstruction results. Full article
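
Downstream of the semantic completion step, extracting permanent structures typically means fitting planes only to points labeled as wall, floor, or ceiling. A bare-bones RANSAC plane fit over such a semantically filtered set (illustrative only; the labels, thresholds, and random data are assumptions, not the authors' pipeline):

    import numpy as np

    def ransac_plane(points, n_iters=500, threshold=0.02, seed=0):
        """Fit one dominant plane with RANSAC; returns (normal, d, inlier mask)."""
        rng = np.random.default_rng(seed)
        best_inliers = np.zeros(len(points), dtype=bool)
        best_plane = (np.array([0.0, 0.0, 1.0]), 0.0)
        for _ in range(n_iters):
            sample = points[rng.choice(len(points), 3, replace=False)]
            normal = np.cross(sample[1] - sample[0], sample[2] - sample[0])
            if np.linalg.norm(normal) < 1e-9:            # degenerate (collinear) sample
                continue
            normal /= np.linalg.norm(normal)
            d = -normal @ sample[0]
            inliers = np.abs(points @ normal + d) < threshold
            if inliers.sum() > best_inliers.sum():
                best_inliers, best_plane = inliers, (normal, d)
        return best_plane[0], best_plane[1], best_inliers

    # hypothetical labeled point cloud: keep only points tagged as permanent structure
    points = np.random.rand(5000, 3)
    labels = np.random.choice(["wall", "floor", "furniture"], size=5000)
    structural = points[np.isin(labels, ["wall", "floor"])]
    normal, d, inliers = ransac_plane(structural)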

Review


27 pages, 18630 KiB  
Review
3D Sensors for Sewer Inspection: A Quantitative Review and Analysis
by Chris H. Bahnsen, Anders S. Johansen, Mark P. Philipsen, Jesper W. Henriksen, Kamal Nasrollahi and Thomas B. Moeslund
Sensors 2021, 21(7), 2553; https://doi.org/10.3390/s21072553 - 06 Apr 2021
Cited by 21 | Viewed by 5227
Abstract
Automating inspection of critical infrastructure such as sewer systems will help utilities optimize maintenance and replacement schedules. The current inspection process consists of manual reviews of video as an operator controls a sewer inspection vehicle remotely. The process is slow, labor-intensive, and expensive, and it presents a huge potential for automation. With this work, we address a central component of the next generation of robotic inspection of sewers, namely the choice of 3D sensing technology. We investigate three prominent techniques for 3D vision: passive stereo, active stereo, and time-of-flight (ToF). The RealSense D435 camera is chosen as the representative of the first two techniques, whereas the PMD CamBoard pico flexx represents ToF. The 3D reconstruction performance of the sensors is assessed both in a laboratory setup and in an outdoor above-ground setup. The acquired point clouds from the sensors are compared with reference 3D models using the cloud-to-mesh metric. The reconstruction performance of the sensors is tested with respect to different illuminance levels and different levels of water in the pipes. The results of the tests show that the ToF-based point cloud from the pico flexx is superior to the output of the active and passive stereo cameras. Full article
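
The cloud-to-mesh metric used in the evaluation can be approximated by sampling the reference mesh surface densely and taking nearest-neighbour distances from the acquired points; the sketch below uses that approximation (an exact implementation would compute point-to-triangle distances), with a toy planar "mesh" and a noisy synthetic scan as assumptions:

    import numpy as np
    from scipy.spatial import cKDTree

    def cloud_to_mesh_approx(points, mesh_samples):
        """Approximate cloud-to-mesh distances: nearest neighbour from each acquired
        point to a dense sampling of the reference mesh surface."""
        distances, _ = cKDTree(mesh_samples).query(points)
        return distances

    # toy reference "mesh": dense samples of the plane z = 0; noisy synthetic scan
    xx, yy = np.meshgrid(np.linspace(0, 1, 200), np.linspace(0, 1, 200))
    mesh_samples = np.column_stack((xx.ravel(), yy.ravel(), np.zeros(xx.size)))
    scan = np.column_stack((np.random.rand(1000, 2), 0.003 * np.random.randn(1000)))
    c2m = cloud_to_mesh_approx(scan, mesh_samples)
    mean_c2m = c2m.mean()                     # summary statistic per sensor and setup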
