Article

Image Captioning Using Motion-CNN with Object Detection

1 Department of Precision Engineering, The University of Tokyo, 7-3-1, Hongo, Bunkyo-ku, Tokyo 113-8656, Japan
2 RITECS Inc., 3-5-11-403, Shibasakicho, Tachikawa-shi, Tokyo 190-0023, Japan
* Author to whom correspondence should be addressed.
Academic Editor: Cosimo Distante
Sensors 2021, 21(4), 1270; https://doi.org/10.3390/s21041270
Received: 7 January 2021 / Revised: 31 January 2021 / Accepted: 5 February 2021 / Published: 10 February 2021
(This article belongs to the Section Intelligent Sensors)
Automatic image captioning has many important applications, such as describing visual content for visually impaired people or indexing images on the internet. Recently, deep learning-based image captioning models have been researched extensively. To generate captions, they learn the relation between image features and the words in the captions. However, image features might not be relevant for certain words, such as verbs. Our earlier reported method therefore used motion features alongside image features to generate captions containing verbs. However, it used all available motion features, and because not all of them contribute positively to captioning, the unnecessary motion features decreased captioning accuracy. Herein, we analyze through experiments why motion features cause this decline in accuracy, and we propose a novel, end-to-end trainable method for image caption generation that alleviates it. Our proposed model was evaluated on three datasets: MSR-VTT2016-Image, MSCOCO, and several copyright-free images. The results demonstrate that our proposed method improves caption generation performance.
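The abstract describes selecting only the motion features that help captioning, rather than feeding all of them to the decoder. As a minimal sketch of one such strategy, the toy NumPy function below gates per-region motion features with an object-detection mask before concatenating them with CNN image features; the function name, feature dimensions, and gating scheme are illustrative assumptions, not the paper's actual architecture.

```python
import numpy as np

def fuse_features(image_feat, motion_feat, object_mask):
    """Hypothetical detection-gated fusion: keep motion features only
    for regions where an object detector fired, then concatenate them
    with the CNN image features for the caption decoder."""
    # object_mask is 1.0 for detected-object regions, 0.0 elsewhere;
    # broadcasting zeroes out motion features in non-object regions
    gated_motion = motion_feat * object_mask[:, None]
    return np.concatenate([image_feat, gated_motion], axis=1)

# Toy example: 4 image regions, 8-dim image features, 4-dim motion features
rng = np.random.default_rng(0)
image_feat = rng.standard_normal((4, 8))
motion_feat = rng.standard_normal((4, 4))
object_mask = np.array([1.0, 0.0, 1.0, 0.0])  # detector fired on regions 0 and 2

fused = fuse_features(image_feat, motion_feat, object_mask)
print(fused.shape)  # (4, 12): 8 image dims + 4 gated motion dims per region
```

In this sketch, regions without a detected object contribute zero motion signal, which is one plausible way to suppress the "unnecessary motion features" the abstract identifies as harmful to accuracy.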
Keywords: deep learning; image captioning; motion estimation; object detection
Figure 1

MDPI and ACS Style

Iwamura, K.; Louhi Kasahara, J.Y.; Moro, A.; Yamashita, A.; Asama, H. Image Captioning Using Motion-CNN with Object Detection. Sensors 2021, 21, 1270. https://doi.org/10.3390/s21041270

AMA Style

Iwamura K, Louhi Kasahara JY, Moro A, Yamashita A, Asama H. Image Captioning Using Motion-CNN with Object Detection. Sensors. 2021; 21(4):1270. https://doi.org/10.3390/s21041270

Chicago/Turabian Style

Iwamura, Kiyohiko, Jun Y. Louhi Kasahara, Alessandro Moro, Atsushi Yamashita, and Hajime Asama. 2021. "Image Captioning Using Motion-CNN with Object Detection" Sensors 21, no. 4: 1270. https://doi.org/10.3390/s21041270

