A Unified Deep Framework for Joint 3D Pose Estimation and Action Recognition from a Single RGB Camera

1. Cerema Research Center, 31400 Toulouse, France
2. Informatics Research Institute of Toulouse (IRIT), Université de Toulouse, CNRS, 31062 Toulouse, France
3. Vingroup Big Data Institute (VinBDI), Hanoi 10000, Vietnam
4. Clay AIR, Software Solution, 33000 Bordeaux, France
5. School of Electronic Engineering and Computer Science, Queen Mary University of London, London E1 4NS, UK
6. Zebra Technologies Corp., London SE1 9LQ, UK
7. Department of Computer Science and Engineering, University Carlos III de Madrid, 28270 Colmenarejo, Spain
8. Aparnix, Santiago 7550076, Chile
* Author to whom correspondence should be addressed.
Sensors 2020, 20(7), 1825; https://doi.org/10.3390/s20071825
Received: 27 February 2020 / Revised: 22 March 2020 / Accepted: 23 March 2020 / Published: 25 March 2020
(This article belongs to the Special Issue Camera as a Smart-Sensor (CaaSS))
We present a deep learning-based multitask framework for joint 3D human pose estimation and action recognition from a single RGB camera. The approach proceeds in two stages. In the first stage, a real-time 2D pose detector locates the pixel coordinates of key human body joints, and a two-stream deep neural network is trained to lift the detected 2D keypoints to 3D poses. In the second stage, the Efficient Neural Architecture Search (ENAS) algorithm is deployed to find an optimal network architecture that models the spatio-temporal evolution of the estimated 3D poses through an image-based intermediate representation and performs action recognition. Experiments on the Human3.6M, MSR Action3D, and SBU Kinect Interaction datasets verify the effectiveness of the proposed method on both tasks. Moreover, the method requires a low computational budget for training and inference. In particular, the experimental results show that, using only a monocular RGB sensor, the proposed 3D pose estimation and human action recognition approach reaches performance comparable to that of RGB-depth sensors. This opens up many opportunities for leveraging RGB cameras, which are much cheaper than depth cameras and widely deployed in private and public places, to build intelligent recognition systems.
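To make the two-stage pipeline concrete, below is a minimal PyTorch sketch of the processing flow described above: a two-stream lifter that maps detected 2D keypoints to 3D poses, an image-based encoding of the resulting pose sequence, and a classifier over that encoding. The joint count, layer sizes, the pose-to-image encoding, and the small fixed CNN standing in for the architecture that the paper obtains with ENAS are all illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of the two-stage pipeline described in the abstract.
# All module names, layer sizes, and the encoding scheme are assumptions.
import torch
import torch.nn as nn

NUM_JOINTS = 17    # assumed keypoint count (Human3.6M-style skeleton)
NUM_ACTIONS = 20   # assumed number of action classes (e.g., MSR Action3D)

class TwoStreamLifter(nn.Module):
    """Stage 1: lift detected 2D keypoints (x, y) to 3D poses (x, y, z).

    Two parallel fully connected streams process the same 2D input;
    their features are fused before regressing the 3D coordinates.
    """
    def __init__(self, num_joints=NUM_JOINTS, hidden=1024):
        super().__init__()
        in_dim = num_joints * 2
        def stream():
            return nn.Sequential(
                nn.Linear(in_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden), nn.ReLU(),
            )
        self.stream_a = stream()
        self.stream_b = stream()
        self.head = nn.Linear(2 * hidden, num_joints * 3)

    def forward(self, kp2d):                      # kp2d: (B, num_joints, 2)
        x = kp2d.flatten(1)
        feat = torch.cat([self.stream_a(x), self.stream_b(x)], dim=1)
        return self.head(feat).view(-1, NUM_JOINTS, 3)

def poses_to_image(pose_seq):
    """Encode a 3D pose sequence (T, num_joints, 3) as an image-like tensor.

    A simple stand-in for the paper's image-based intermediate representation:
    joints map to rows, frames to columns, (x, y, z) to the 3 channels,
    min-max normalized to [0, 1].
    """
    img = pose_seq.permute(2, 1, 0)               # (3, num_joints, T)
    lo, hi = img.amin(), img.amax()
    return (img - lo) / (hi - lo + 1e-8)

class ActionRecognizer(nn.Module):
    """Stage 2: classify actions from the pose "image".

    A small fixed CNN stands in for the ENAS-found architecture.
    """
    def __init__(self, num_actions=NUM_ACTIONS):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(64, num_actions)

    def forward(self, pose_img):                  # pose_img: (B, 3, num_joints, T)
        return self.fc(self.backbone(pose_img).flatten(1))

if __name__ == "__main__":
    lifter, recognizer = TwoStreamLifter(), ActionRecognizer()
    kp2d_seq = torch.rand(30, NUM_JOINTS, 2)      # 30 frames of detected 2D keypoints
    with torch.no_grad():
        poses3d = lifter(kp2d_seq)                # (30, num_joints, 3)
        pose_img = poses_to_image(poses3d).unsqueeze(0)
        logits = recognizer(pose_img)
    print(logits.shape)                           # torch.Size([1, 20]) action logits
```

In the paper, the stage-2 architecture is searched with ENAS rather than hand-designed; the fixed CNN above only illustrates where that searched network would sit in the pipeline.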
Keywords: human action recognition; 3D pose estimation; RGB sensors; deep learning

MDPI and ACS Style

Pham, H.H.; Salmane, H.; Khoudour, L.; Crouzil, A.; Velastin, S.A.; Zegers, P. A Unified Deep Framework for Joint 3D Pose Estimation and Action Recognition from a Single RGB Camera. Sensors 2020, 20, 1825.
