Open Access Article
Future Internet 2018, 10(12), 115; https://doi.org/10.3390/fi10120115

Video-Based Human Action Recognition Using Spatial Pyramid Pooling and 3D Densely Convolutional Networks

1 School of Computer Engineering and Science, Shanghai University, Shanghai 200444, China
2 Shanghai Institute for Advanced Communication and Data Science, Shanghai 200444, China
3 The 32nd Research Institute, China Electronics Technology Group Corporation, No. 63 Chengliugong Road, Jiading District, Shanghai 200444, China
* Author to whom correspondence should be addressed.
Received: 22 October 2018 / Revised: 17 November 2018 / Accepted: 20 November 2018 / Published: 22 November 2018
(This article belongs to the Section Techno-Social Smart Systems)

Abstract

In recent years, applying deep neural networks to human behavior recognition has become an active research topic. Although remarkable results have been achieved in image recognition, many problems remain open for video. Convolutional neural networks with fully connected layers typically require a fixed-size image input, which both constrains the network structure and affects recognition accuracy. This problem has been solved for images, but not yet for video. To address the fixed-size input requirement for video frames in video recognition, we propose a three-dimensional (3D) densely connected convolutional network based on spatial pyramid pooling (3D-DenseNet-SPP). As the name implies, the network structure is composed of three main parts: 3DCNN, DenseNet, and SPPNet. Our models were evaluated separately on the KTH and UCF101 datasets. The experimental results showed that our model performs better than existing models in video-based behavior recognition.
Keywords: CNN; action recognition; spatial pyramid pooling; dense connectivity; 3D convolution
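
As an illustration of why spatial pyramid pooling removes the fixed-size input requirement, the following is a minimal sketch in plain Python. It is not the authors' implementation: the function name `spp_max_pool`, the pyramid levels (1, 2, 4), and the use of max pooling on a single 2D feature map are assumptions chosen for illustration. The abstract does not specify the pooling configuration used in 3D-DenseNet-SPP.

```python
def spp_max_pool(feature_map, levels=(1, 2, 4)):
    """Max-pool a 2D feature map into a fixed-length vector.

    Hypothetical helper: for each pyramid level n, the map is split into
    an n x n grid of bins and the maximum of each bin is kept. The output
    length is sum(n * n for n in levels), whatever the input size.
    """
    height = len(feature_map)
    width = len(feature_map[0])
    pooled = []
    for n in levels:
        for i in range(n):
            # Bin edges use floor for the start and ceil for the end,
            # so every bin is non-empty and the bins cover the whole map.
            r0 = (i * height) // n
            r1 = ((i + 1) * height + n - 1) // n
            for j in range(n):
                c0 = (j * width) // n
                c1 = ((j + 1) * width + n - 1) // n
                pooled.append(
                    max(
                        feature_map[r][c]
                        for r in range(r0, r1)
                        for c in range(c0, c1)
                    )
                )
    return pooled


# Two feature maps of different spatial sizes (assumed example data).
small = [[r * 8 + c for c in range(8)] for r in range(6)]    # 6 x 8
large = [[r * 7 + c for c in range(7)] for r in range(11)]   # 11 x 7

# Both produce vectors of length 1 + 4 + 16 = 21.
print(len(spp_max_pool(small)), len(spp_max_pool(large)))
```

Because the number of bins depends only on the pyramid levels, a layer placed after this pooling step always receives a vector of the same length, which is the property that lets frames of different sizes share one network.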
This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited (CC BY 4.0).
MDPI and ACS Style

Yang, W.; Chen, Y.; Huang, C.; Gao, M. Video-Based Human Action Recognition Using Spatial Pyramid Pooling and 3D Densely Convolutional Networks. Future Internet 2018, 10, 115.

Future Internet, EISSN 1999-5903, published by MDPI AG, Basel, Switzerland.