3D-CNN-Based Fused Feature Maps with LSTM Applied to Action Recognition
Abstract
Human activity recognition is an active field of research in computer vision with numerous applications. Recently, deep convolutional networks and recurrent neural networks (RNNs) have received increasing attention in multimedia studies and have yielded state-of-the-art results. In this work, we propose a new framework that combines 3D-CNN and long short-term memory (LSTM) networks. First, we integrate discriminative information from a video into a map called a ‘motion map’ using a deep 3-dimensional convolutional network (C3D). A motion map and the next video frame can be integrated into a new motion map, and this technique can be trained by iteratively increasing the training video length; the final network can then generate the motion map of the whole video. Next, a linear weighted fusion scheme fuses the network feature maps into spatio-temporal features. Finally, an LSTM encoder-decoder produces the final predictions. The method is simple to implement and retains both discriminative and dynamic information. Improved results on public benchmark datasets demonstrate the effectiveness and practicality of the proposed method.
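The two core operations sketched in the abstract, iteratively folding each new frame's features into a running motion map and linearly fusing feature maps with learned weights, can be illustrated with a minimal NumPy sketch. This is an assumption-laden toy: the function names (`fuse_feature_maps`, `update_motion_map`), the blending coefficient `alpha`, and the array shapes are illustrative stand-ins, not the paper's actual C3D feature dimensions or learned fusion weights.

```python
import numpy as np

rng = np.random.default_rng(0)

def fuse_feature_maps(maps, weights):
    """Linear weighted fusion: normalized weighted sum of feature maps.

    In the paper the weights would be learned; here they are given.
    """
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()  # normalize so the fused map stays on the same scale
    return sum(wi * m for wi, m in zip(w, maps))

def update_motion_map(motion_map, frame_features, alpha=0.5):
    """Fold the next frame's features into the running motion map.

    `alpha` is a hypothetical blending coefficient; the actual integration
    in the paper is performed by the trained C3D network.
    """
    return alpha * motion_map + (1.0 - alpha) * frame_features

# Toy example: two C3D-style feature maps of shape (channels, H, W).
a = rng.standard_normal((4, 7, 7))
b = rng.standard_normal((4, 7, 7))
fused = fuse_feature_maps([a, b], weights=[0.6, 0.4])

# Iterative motion-map construction over a toy clip of 5 frames,
# mimicking the "motion map + next frame -> new motion map" recursion.
motion = np.zeros((4, 7, 7))
for _ in range(5):
    frame = rng.standard_normal((4, 7, 7))
    motion = update_motion_map(motion, frame)

print(fused.shape, motion.shape)  # both (4, 7, 7)
```

The fused spatio-temporal features would then be flattened per time step and fed to an LSTM encoder-decoder for classification; that stage is omitted here to keep the sketch dependency-free.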
Cite This Article
Arif, S.; Wang, J.; Ul Hassan, T.; Fei, Z. 3D-CNN-Based Fused Feature Maps with LSTM Applied to Action Recognition. Future Internet 2019, 11, 42.