Open Access Article

A Modularized Architecture of Multi-Branch Convolutional Neural Network for Image Captioning

by Shan He and Yuanyao Lu *
School of Information Science and Technology, North China University of Technology, Beijing 100144, China
* Author to whom correspondence should be addressed.
Electronics 2019, 8(12), 1417; https://doi.org/10.3390/electronics8121417
Received: 31 October 2019 / Revised: 23 November 2019 / Accepted: 25 November 2019 / Published: 28 November 2019
Image captioning is a comprehensive task that combines computer vision (CV) and natural language processing (NLP). It converts an image into text: the algorithm automatically generates a descriptive sentence for the input image. In this paper, we present an end-to-end model that uses a deep convolutional neural network (CNN) as the encoder and a recurrent neural network (RNN) as the decoder. To extract better image features for captioning, we propose a highly modularized multi-branch CNN, which can increase accuracy while keeping the number of hyper-parameters unchanged. This strategy yields a simply designed network that consists of parallel sub-modules with the same structure. Whereas traditional CNNs go deeper and wider to increase accuracy, our method is more effective with a simple design that is easier to optimize in practical applications. Experiments are conducted on the Flickr8k, Flickr30k and MSCOCO datasets. Results demonstrate that our method achieves state-of-the-art performance in terms of caption quality.
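The abstract does not give the branch configuration of the multi-branch CNN. As a rough illustration of how parallel sub-modules with the same structure can leave the model size roughly unchanged, the sketch below assumes a ResNeXt-style bottleneck block with 256 input and output channels, and compares one branch of width 64 against 32 parallel branches of width 4 whose outputs are summed. The helper names (`bottleneck_params`, `multi_branch_params`) and all numbers are hypothetical and are not taken from the paper; biases are ignored.

```python
# Hypothetical sketch: weight counts for a single-branch bottleneck versus a
# multi-branch block built from identical parallel sub-modules.
# Assumes a ResNeXt-style 1x1 -> kxk -> 1x1 design; NOT the paper's settings.


def bottleneck_params(in_ch: int, width: int, out_ch: int, k: int = 3) -> int:
    """Weights in a 1x1 -> kxk -> 1x1 convolution stack (biases ignored)."""
    reduce = in_ch * width            # 1x1 conv: in_ch -> width
    spatial = k * k * width * width   # kxk conv: width -> width
    expand = width * out_ch           # 1x1 conv: width -> out_ch
    return reduce + spatial + expand


def multi_branch_params(in_ch: int, branches: int, branch_width: int,
                        out_ch: int, k: int = 3) -> int:
    """Weights in `branches` parallel bottlenecks of identical structure
    whose outputs are aggregated (summed) into out_ch channels."""
    return branches * bottleneck_params(in_ch, branch_width, out_ch, k)


single = bottleneck_params(256, 64, 256)
multi = multi_branch_params(256, branches=32, branch_width=4, out_ch=256)
print(single, multi)                             # 69632 70144
print(f"{abs(multi - single) / single:.2%}")     # 0.74%
```

Under these assumptions the two blocks differ by well under 1% in weights, so the extra branches change the block's topology rather than its size; the cardinality (number of branches) becomes the design knob instead of depth or width.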
Keywords: image captioning; convolutional neural network (CNN); multi-branch expansion; long short-term memory (LSTM)

Figure 1

MDPI and ACS Style

He, S.; Lu, Y. A Modularized Architecture of Multi-Branch Convolutional Neural Network for Image Captioning. Electronics 2019, 8, 1417.

Note that from the first issue of 2016, MDPI journals use article numbers instead of page numbers.
