Open Access | Feature Paper | Article

Action Recognition Using Deep 3D CNNs with Sequential Feature Aggregation and Attention

Department of Electronic Engineering, Inha University, Incheon 22212, Korea
* Author to whom correspondence should be addressed.
Electronics 2020, 9(1), 147; https://doi.org/10.3390/electronics9010147
Received: 27 November 2019 / Revised: 4 January 2020 / Accepted: 9 January 2020 / Published: 12 January 2020
(This article belongs to the Section Computer Science & Engineering)
Action recognition is an active research field that aims to recognize human actions and intentions from a series of observations of human behavior and the environment. Image-based action recognition mainly uses two-dimensional (2D) convolutional neural networks (CNNs). Video-based action recognition is harder because a model must characterize both short-term small movements and long-term temporal appearance information. Previous methods analyze video action behavior using only a basic 3D CNN framework. However, these approaches are limited in analyzing fast action movements or abruptly appearing objects because of the limited coverage of convolutional filters. In this paper, we propose aggregating squeeze-and-excitation (SE) and self-attention (SA) modules with a 3D CNN to analyze both short- and long-term temporal action behavior efficiently. We successfully implemented the SE and SA modules to present a novel approach to video action recognition that builds upon current state-of-the-art methods and demonstrates better performance on the UCF-101 and HMDB51 datasets. For example, with the ResNeXt-101 architecture in a 3D CNN, we obtain accuracies of 92.5% (16-frame clips) and 95.6% (64-frame clips) on UCF-101, and 68.1% (16-frame clips) and 74.1% (64-frame clips) on HMDB51.
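The abstract does not specify where the SE and SA modules sit in the network, the SE reduction ratio, or the exact attention formulation. The following is therefore only a minimal sketch under stated assumptions. It uses the standard squeeze-and-excitation design (global average pooling, two fully connected layers, sigmoid gating), extended to a C×T×H×W 3D CNN feature map. This is followed by plain scaled dot-product self-attention across the T time steps, with identity query/key/value projections. The function names `se_block_3d`, `pool_spatial`, and `temporal_self_attention`, the reduction ratio `r = 2`, and the random toy weights are hypothetical illustrations, not the authors' implementation.

```python
import math
import random

# Hypothetical sketch: standard SE gating on a 3D feature map x[c][t][h][w].
# Assumed shapes: w1 is (C // r) x C (reduction FC), w2 is C x (C // r) (expansion FC).
def se_block_3d(x, w1, w2):
    # Squeeze: global average over time, height and width for each channel.
    z = []
    for ch in x:
        vals = [v for frame in ch for row in frame for v in row]
        z.append(sum(vals) / len(vals))
    # Excitation: FC -> ReLU -> FC -> sigmoid yields one weight in (0, 1) per channel.
    hidden = [max(0.0, sum(wi * zi for wi, zi in zip(row, z))) for row in w1]
    s = [1.0 / (1.0 + math.exp(-sum(wi * hi for wi, hi in zip(row, hidden)))) for row in w2]
    # Scale: multiply every element of channel c by s[c].
    y = [[[[v * s[c] for v in row] for row in frame] for frame in ch]
         for c, ch in enumerate(x)]
    return y, s

# Average over H and W, giving T vectors of length C (one descriptor per time step).
def pool_spatial(x):
    C, T = len(x), len(x[0])
    return [[sum(sum(row) for row in x[c][t]) / (len(x[c][t]) * len(x[c][t][0]))
             for c in range(C)] for t in range(T)]

# Assumed SA variant: scaled dot-product attention over time steps with identity
# projections and a residual connection, so each step can attend to every other step.
def temporal_self_attention(feats):
    d = len(feats[0])
    out, attn = [], []
    for q in feats:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in feats]
        peak = max(scores)  # subtract the max for a numerically stable softmax
        exps = [math.exp(sc - peak) for sc in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        attn.append(weights)
        mixed = [sum(wt * v[j] for wt, v in zip(weights, feats)) for j in range(d)]
        out.append([m_j + q_j for m_j, q_j in zip(mixed, q)])
    return out, attn

# Toy usage with random values (hypothetical sizes: 4 channels, 3 frames, 2x2 spatial).
random.seed(0)
C, T, H, W, r = 4, 3, 2, 2, 2
x = [[[[random.uniform(-1, 1) for _ in range(W)] for _ in range(H)]
      for _ in range(T)] for _ in range(C)]
w1 = [[random.uniform(-1, 1) for _ in range(C)] for _ in range(C // r)]
w2 = [[random.uniform(-1, 1) for _ in range(C // r)] for _ in range(C)]
y, s = se_block_3d(x, w1, w2)
out, attn = temporal_self_attention(pool_spatial(y))
```

In this sketch, the SE weights are computed from statistics pooled over the whole clip. Self-attention also lets every time step draw on every other step. Both mechanisms therefore see beyond the local window of a single 3D convolution, which is the limitation the abstract attributes to plain 3D CNNs.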
Keywords: action recognition; 3D CNN; deep feature attention


MDPI and ACS Style

Anvarov, F.; Kim, D.H.; Song, B.C. Action Recognition Using Deep 3D CNNs with Sequential Feature Aggregation and Attention. Electronics 2020, 9, 147.

