Sensors Fusion for Human-Centric 3D Capturing

A special issue of Sensors (ISSN 1424-8220). This special issue belongs to the section "Physical Sensors".

Deadline for manuscript submissions: closed (31 December 2020) | Viewed by 8793

Special Issue Editors


Dr. Dimitrios Zarpalas
Guest Editor
Information Technologies Institute, Centre for Research and Technology Hellas, 57001 Thessaloniki, Greece
Interests: UAV detection and classification; 3D/4D computer vision; 3D human reconstruction and motion capturing; medical image processing

Dr. Petros Daras
Guest Editor
The Visual Computing Lab, Information Technologies Institute, Centre for Research and Technology Hellas, 6th Km Charilaou-Thermi Road, GR-57001 Thermi-Thessaloniki, Greece
Interests: visual processing, including sign language recognition; object/person detection and tracking; deep learning; federated learning

Special Issue Information

Dear Colleagues,

This Special Issue aims to capture recent advances in 3D capture technology that fuse data from multiple sensors (cameras, inertial, infrared, or depth sensors) to produce high-quality 3D human representations (i.e., 3D motion, shape, appearance, performance, and activity).

Given the ubiquity of consumer-grade capture devices, in conjunction with today's powerful and efficient processing hardware (modern GPUs and, increasingly, dedicated on-device accelerators), 3D human capture has become one of the most promising emerging technologies. The newest generation of commodity low-cost sensing devices typically captures a multitude of modalities (i.e., color, infrared, orientation, and structure, as well as higher-level information such as device pose or human motion).

These technologies will strongly influence a number of industrial sectors, such as gaming, the creative industries (media, marketing, and the wider XR spectrum), and film (VFX), and will enable new forms of human–computer interaction. Perceiving and understanding humans, and representing them accurately, are central objectives for these industries.

With respect to the wider media sector, current immersive experiences allow for three-degrees-of-freedom (3DOF, omnidirectional) viewing, but the development of new presentation means (VR HMDs as well as new 3D TVs/displays) will steer their evolution towards 3DOF+ (allowing limited translation) and 6DOF (unrestricted viewpoint selection) experiences.

From a systemic point of view, the integration of multiple modalities (whether homogeneous or heterogeneous) is a challenging task that needs to address issues such as spatial and/or temporal alignment, multi-modal sensor fusion (either direct or indirect, using a priori knowledge), and real-time operation. Additionally, operating in real-world conditions imposes challenges beyond real-time performance, such as limited power budgets, constrained deployment options, and the need for robustness. These constraints are further accentuated by the drive to commercialize the technology and thus to use low-cost sensors efficiently and effectively. Although recent advances in sensor technology have reduced sensor costs, they cannot overcome the inherent limitations of each modality, so building robust systems still requires sensor fusion. The design of multi-modal and/or multi-sensor systems therefore requires maturing existing technologies and developing new ones.
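
As a small illustration of the temporal-alignment step (a toy sketch; the function name and the synthetic streams are assumptions, not a prescribed method), the Python snippet below estimates the constant time offset between two streams sampled at the same rate, e.g., an IMU acceleration channel and a motion-energy signal derived from video, from the peak of their cross-correlation:

```python
import numpy as np

def estimate_lag(stream_a, stream_b, rate_hz):
    """Estimate how many seconds stream_b lags behind stream_a (both sampled
    at rate_hz) from the peak of their normalized cross-correlation."""
    a = (stream_a - stream_a.mean()) / (stream_a.std() + 1e-9)
    b = (stream_b - stream_b.mean()) / (stream_b.std() + 1e-9)
    corr = np.correlate(b, a, mode="full")        # full cross-correlation
    lag_samples = np.argmax(corr) - (len(a) - 1)  # positive => b is delayed
    return lag_samples / rate_hz

# Synthetic check: both streams see the same sharp event (e.g., a clap or a
# jump), but the video-derived signal arrives 25 samples (0.25 s) later.
rate, n = 100, 1000
imu = 0.05 * np.random.randn(n)
imu[300:310] += 1.0                                      # shared sync event
video = np.roll(imu, 25) + 0.05 * np.random.randn(n)     # delayed, extra noise
print(f"estimated lag: {estimate_lag(imu, video, rate):.2f} s")  # ~0.25
```

Spatial alignment (extrinsic calibration between sensors) and indirect fusion via prior knowledge require analogous, though typically more involved, estimation steps.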

As demonstrated by recent research, sensor information fusion can greatly increase the accuracy and performance of such systems, as the different modalities are usually complementary. Depth sensing provides 3D information, but commercial depth sensors trade away the accuracy and resolution that modern miniaturized color cameras have largely achieved; on the other hand, depth sensing handles appearance variations (textures and lighting changes) better than traditional color cameras. Similarly, while inertial units provide live orientation information, their global spatial alignment is an issue that can be guided by depth sensors, which at the same time helps guarantee drift-free data collection. Furthermore, deep networks have started to show promising performance in domain alignment and transfer learning, paving the way for their exploitation in sensor fusion. Nonetheless, more research is needed to mature their applicability and to allow for multi-sensor training and/or the incorporation of prior modality knowledge when operating in a multi-modal manner.
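
A minimal sketch of this inertial/depth complementarity (illustrative only; the sensor model and filter weights are assumptions, not a method drawn from this Special Issue): a complementary filter lets the smooth but drift-prone gyroscope integration dominate in the short term, while an absolute but noisy yaw estimate, here assumed to come from a depth-based tracker, slowly cancels the accumulated drift.

```python
import numpy as np

def fuse_yaw(gyro_rate, depth_yaw, dt=0.01, alpha=0.98):
    """Complementary filter: integrate the (drift-prone) gyroscope yaw rate
    for smooth short-term motion, and let the absolute-but-noisy yaw derived
    from a depth sensor pull the estimate back, cancelling drift."""
    fused = np.empty(len(gyro_rate))
    yaw = depth_yaw[0]                          # initialize from the absolute sensor
    for k in range(len(gyro_rate)):
        yaw_pred = yaw + gyro_rate[k] * dt      # gyro integration (accumulates bias)
        yaw = alpha * yaw_pred + (1.0 - alpha) * depth_yaw[k]
        fused[k] = yaw
    return fused

# Synthetic check: constant 0.5 rad/s turn, the gyro carries a 0.02 rad/s bias,
# and the depth-derived yaw is unbiased but noisy.
t = np.arange(0.0, 10.0, 0.01)
true_yaw = 0.5 * t
gyro = 0.5 + 0.02 + 0.01 * np.random.randn(len(t))
depth = true_yaw + 0.05 * np.random.randn(len(t))
gyro_only = np.cumsum(gyro) * 0.01
fused = fuse_yaw(gyro, depth)
print(f"gyro-only end error: {abs(gyro_only[-1] - true_yaw[-1]):.3f} rad")  # ~0.2 (drift)
print(f"fused end error:     {abs(fused[-1] - true_yaw[-1]):.3f} rad")      # ~0.01
```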

This Special Issue invites contributions that address multi-sensor and multi-modal information fusion with the aim of capturing humans in 3D. It seeks to reflect the current and emerging state of human-capture technologies such as 3D reconstruction, motion capture, and action recognition. In particular, submitted papers should clearly show novel contributions and innovative applications covering, but not limited to, the following topics around 3D human capturing using multiple sensor modalities:

  • Multi-modal data fusion;
  • Multi-sensor alignment;
  • Sensor data denoising and completion;
  • Multi-modal learning for sensor domain invariant representations;
  • Cross-modality transfer learning;
  • Self-supervised multi-modal learning;
  • Multi-sensor and multi-modal capturing systems;
  • Multi-modal dynamic scene capturing;
  • Open source frameworks and libraries for working with multi-modal sensors;
  • Multi-modal and multi-sensor applications (HCI, 3D capture for XR and/or free-viewpoint video, tele-presence, motion capture, real-time action recognition, simultaneous body, hands and face capture, non-rigid 3D reconstruction of humans, real-time calibration systems, and systems integrating multiple sensor types).

Dr. Dimitrios Zarpalas
Dr. Petros Daras
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles as well as short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Sensors is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • sensor fusion
  • RGB-D
  • multi-modal learning
  • multi-sensor systems
  • multi-view systems
  • 3D reconstruction
  • motion capture
  • real-time
  • 3D vision
  • volumetric capture
  • 3D action recognition
  • wearable sensors
  • body sensor networks
  • infrared vision
  • depth sensing

Published Papers (2 papers)


Research

12 pages, 6673 KiB  
Article
C-MHAD: Continuous Multimodal Human Action Dataset of Simultaneous Video and Inertial Sensing
by Haoran Wei, Pranav Chopada and Nasser Kehtarnavaz
Sensors 2020, 20(10), 2905; https://doi.org/10.3390/s20102905 - 20 May 2020
Cited by 29 | Viewed by 4543
Abstract
Existing public domain multi-modal datasets for human action recognition only include actions of interest that have already been segmented from action streams. These datasets cannot be used to study a more realistic action recognition scenario where actions of interest occur randomly and continuously among actions of non-interest or no actions. It is more challenging to recognize actions of interest in continuous action streams since the starts and ends of these actions are not known and need to be determined in an on-the-fly manner. Furthermore, there exists no public domain multi-modal dataset in which video and inertial data are captured simultaneously for continuous action streams. The main objective of this paper is to describe a dataset that is collected and made publicly available, named Continuous Multimodal Human Action Dataset (C-MHAD), in which video and inertial data streams are captured simultaneously in a continuous way. This dataset is then used in an example recognition technique, and the results obtained indicate that the fusion of these two sensing modalities increases the F1 scores compared to using each sensing modality individually.
(This article belongs to the Special Issue Sensors Fusion for Human-Centric 3D Capturing)
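
For readers new to the continuous setting, the hypothetical sketch below (not the pipeline of the paper above; scores and threshold are invented for illustration) shows how per-window scores from two modalities can be fused at the score level and thresholded so that action starts and ends are determined on the fly:

```python
import numpy as np

def detect_actions(video_scores, inertial_scores, threshold=0.5):
    """Average per-window action scores from two modalities (late fusion),
    threshold them, and turn contiguous runs of positive windows into
    detected (start, end) segments. Scores are assumed to lie in [0, 1]."""
    fused = 0.5 * (np.asarray(video_scores) + np.asarray(inertial_scores))
    active = fused >= threshold
    segments, start = [], None
    for i, flag in enumerate(active):
        if flag and start is None:
            start = i                        # an action of interest begins
        elif not flag and start is not None:
            segments.append((start, i - 1))  # it ended at the previous window
            start = None
    if start is not None:
        segments.append((start, len(active) - 1))
    return segments

# Hypothetical scores for 10 windows: each modality alone is ambiguous around
# windows 3-6; their average crosses the threshold only where both agree.
video    = [0.1, 0.2, 0.3, 0.7, 0.8, 0.6, 0.4, 0.2, 0.1, 0.1]
inertial = [0.2, 0.1, 0.4, 0.6, 0.9, 0.7, 0.3, 0.2, 0.2, 0.1]
print(detect_actions(video, inertial))       # [(3, 5)]
```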

22 pages, 11481 KiB  
Article
Ambiguity-Free Optical–Inertial Tracking for Augmented Reality Headsets
by Fabrizio Cutolo, Virginia Mamone, Nicola Carbonaro, Vincenzo Ferrari and Alessandro Tognetti
Sensors 2020, 20(5), 1444; https://doi.org/10.3390/s20051444 - 6 Mar 2020
Cited by 15 | Viewed by 3618
Abstract
The increasing capability of computing power and mobile graphics has made possible the release of self-contained augmented reality (AR) headsets featuring efficient head-anchored tracking solutions. Ego motion estimation based on well-established infrared tracking of markers ensures sufficient accuracy and robustness. Unfortunately, wearable visible-light stereo cameras with short baseline and operating under uncontrolled lighting conditions suffer from tracking failures and ambiguities in pose estimation. To improve the accuracy of optical self-tracking and its resiliency to marker occlusions, degraded camera calibrations, and inconsistent lighting, in this work we propose a sensor fusion approach based on Kalman filtering that integrates optical tracking data with inertial tracking data when computing motion correlation. In order to measure improvements in AR overlay accuracy, experiments are performed with a custom-made AR headset designed for supporting complex manual tasks performed under direct vision. Experimental results show that the proposed solution improves the head-mounted display (HMD) tracking accuracy by one third and improves the robustness by also capturing the orientation of the target scene when some of the markers are occluded and when the optical tracking yields unstable and/or ambiguous results due to the limitations of using head-anchored stereo tracking cameras under uncontrollable lighting conditions.
(This article belongs to the Special Issue Sensors Fusion for Human-Centric 3D Capturing)
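
As a generic illustration of this class of optical-inertial fusion (a one-dimensional toy, not the filter design of the paper above; the noise parameters and occlusion pattern are assumptions), the sketch below predicts yaw from an inertial rate measurement and corrects it with optical marker-based measurements whenever they are available:

```python
import numpy as np

def kalman_yaw(gyro_rate, optical_yaw, dt=0.01, q=1e-4, r=1e-2):
    """1D Kalman filter over yaw: the process model integrates the inertial
    rate (process noise q); optical marker measurements (noise r) correct the
    state whenever available. None marks frames with occluded markers."""
    x, p = 0.0, 1.0                 # state (yaw) and its variance
    est = []
    for k in range(len(gyro_rate)):
        x = x + gyro_rate[k] * dt   # predict with the inertial measurement
        p = p + q
        z = optical_yaw[k]
        if z is not None:           # optical update, skipped during occlusion
            kg = p / (p + r)        # Kalman gain
            x = x + kg * (z - x)
            p = (1.0 - kg) * p
        est.append(x)
    return np.array(est)

# Toy run: constant turn; optical yaw is noisy and missing for frames 200-399.
t = np.arange(0.0, 10.0, 0.01)
true_yaw = 0.3 * t
gyro = 0.3 + 0.02 * np.random.randn(len(t))
optical = [None if 200 <= k < 400 else true_yaw[k] + 0.05 * np.random.randn()
           for k in range(len(t))]
print(f"end error: {abs(kalman_yaw(gyro, optical)[-1] - true_yaw[-1]):.3f} rad")
```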
