Open Access Article

Context Aware Video Caption Generation with Consecutive Differentiable Neural Computer

School of Electronics Engineering, College of IT Engineering, Kyungpook National University, 80 Daehakro, Bukgu, Daegu 41566, Korea
*
Author to whom correspondence should be addressed.
Electronics 2020, 9(7), 1162; https://doi.org/10.3390/electronics9071162
Received: 24 June 2020 / Revised: 14 July 2020 / Accepted: 15 July 2020 / Published: 17 July 2020
(This article belongs to the Special Issue Computer Vision and Machine Learning in Human-Computer Interaction)
Recent video captioning models aim to describe all events in a long video. However, their event descriptions do not fully exploit the contextual information contained in a video because they lack the ability to remember how information changes over time. To address this problem, we propose a novel context-aware video captioning model that generates natural language descriptions based on improved video context understanding. We introduce an external memory, the differentiable neural computer (DNC), to improve video context understanding. The DNC naturally learns to use its internal memory for context understanding and also exposes the contents of its memory as an output for additional connections. By sequentially connecting DNC-based caption models (DNC-augmented LSTMs) through this memory information, our consecutively connected DNC architecture can understand the context in a video without explicitly searching for event-wise correlations. Our consecutive DNC is sequentially trained with its language model (LSTM) for each video clip to generate context-aware captions of superior quality. In experiments, we demonstrate that our model produces more natural and coherent captions that reflect previous contextual information. Our model also shows superior quantitative performance on video captioning in terms of BLEU@4 (4.37), METEOR (9.57), and CIDEr-D (28.08).
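The core idea of the abstract — carrying an external memory forward so that each clip's captioner can read context written by earlier clips — can be illustrated with a toy sketch. This is a minimal, hypothetical illustration only: `ToyDNCMemory` and `caption_clip` are simplified stand-ins invented here (content-based read by cosine similarity, slot-wise write, and a `tanh` update in place of the paper's actual DNC controller and LSTM language model).

```python
import numpy as np

class ToyDNCMemory:
    """Greatly simplified external memory: content-based read, slot-wise write."""

    def __init__(self, slots=8, width=16):
        self.M = np.zeros((slots, width))  # memory matrix
        self.write_ptr = 0                 # next slot to write (circular)

    def write(self, vec):
        # Write a vector into the next slot (circular buffer).
        self.M[self.write_ptr % len(self.M)] = vec
        self.write_ptr += 1

    def read(self, key):
        # Content-based addressing: softmax over cosine similarity to the key.
        norms = np.linalg.norm(self.M, axis=1) * np.linalg.norm(key) + 1e-8
        scores = self.M @ key / norms
        weights = np.exp(scores) / np.exp(scores).sum()
        return weights @ self.M  # weighted read vector

def caption_clip(features, memory):
    # Hypothetical per-clip captioner step: read accumulated context,
    # combine it with the clip features, and write the new state back
    # so that the next clip's captioner can condition on it.
    context = memory.read(features)
    state = np.tanh(features + context)  # stand-in for the LSTM update
    memory.write(state)
    return state

# Consecutive clips share one memory, so later captions see earlier context.
memory = ToyDNCMemory()
clips = [np.ones(16) * i for i in range(1, 4)]  # fake clip feature vectors
states = [caption_clip(f, memory) for f in clips]
```

The design point this sketch mirrors is that no explicit event-wise correlation search is needed: context flows implicitly through the shared memory handed from one clip's model to the next.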
Keywords: deep neural network; deep learning; context understanding; recurrent neural network; action recognition; memory
MDPI and ACS Style

Kim, J.; Choi, I.; Lee, M. Context Aware Video Caption Generation with Consecutive Differentiable Neural Computer. Electronics 2020, 9, 1162. https://doi.org/10.3390/electronics9071162

AMA Style

Kim J, Choi I, Lee M. Context Aware Video Caption Generation with Consecutive Differentiable Neural Computer. Electronics. 2020; 9(7):1162. https://doi.org/10.3390/electronics9071162

Chicago/Turabian Style

Kim, Jonghong, Inchul Choi, and Minho Lee. 2020. "Context Aware Video Caption Generation with Consecutive Differentiable Neural Computer" Electronics 9, no. 7: 1162. https://doi.org/10.3390/electronics9071162
