
Dense Model for Automatic Image Description Generation with Game Theoretic Optimization

Department of Computer Science, Cochin University of Science and Technology, Kochi, Kerala 682022, India
* Author to whom correspondence should be addressed.
Information 2019, 10(11), 354; https://doi.org/10.3390/info10110354
Received: 11 October 2019 / Revised: 9 November 2019 / Accepted: 13 November 2019 / Published: 15 November 2019
(This article belongs to the Section Artificial Intelligence)
With the rapid growth of deep learning, automatic image description generation has become an important problem at the intersection of computer vision and natural language generation. It improves access to photo collections on social media and provides guidance for visually impaired people. Deep neural networks currently play a vital role in both computer vision and natural language processing tasks. The main objective of this work is to generate grammatically correct descriptions of images using the semantics of the trained captions. The image description generation task is implemented with an encoder-decoder framework built on deep neural networks: the encoder is an image parsing module, and the decoder is a surface realization module. The framework uses a Densely Connected Convolutional Network (DenseNet) for image encoding and a Bidirectional Long Short-Term Memory (BLSTM) network for language modeling; the encoder outputs are fed to the bidirectional LSTM in the caption generator, which is trained to optimize the log-likelihood of the target description of the image. Most existing image captioning systems use RNNs or LSTMs for language modeling. RNNs are computationally expensive and have limited memory, and a standard LSTM processes its input in only one direction; the BLSTM used here avoids both problems. In this work, the best combination of words during caption generation is selected using either beam search or a game theoretic search, and the results show that the game theoretic search outperforms beam search. The model was evaluated on the standard benchmark dataset Flickr8k, with the Bilingual Evaluation Understudy (BLEU) score as the evaluation measure. A new evaluation measure called GCorrect was used to check the grammatical correctness of the descriptions. The proposed model achieves clear improvements over previous methods on the Flickr8k dataset, producing grammatically correct sentences with a GCorrect of 0.040625 and a BLEU score of 69.96%.
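The beam search step mentioned above can be illustrated with a minimal sketch. This is not the paper's implementation, and the game theoretic variant is not shown; the `step_fn` interface, the toy vocabulary, and all probabilities below are hypothetical, standing in for the language model's next-word distribution:

```python
import heapq
import math

def beam_search(start, step_fn, beam_width=3, max_len=5):
    """Keep the beam_width most probable partial captions at each step.

    step_fn(sequence) -> list of (token, probability) continuations.
    Returns the sequence with the highest cumulative log-probability.
    """
    beams = [(0.0, [start])]  # (cumulative log-prob, token sequence)
    for _ in range(max_len):
        candidates = []
        for score, seq in beams:
            if seq[-1] == "<end>":          # finished hypotheses carry over
                candidates.append((score, seq))
                continue
            for token, prob in step_fn(seq):
                candidates.append((score + math.log(prob), seq + [token]))
        # Prune to the top beam_width hypotheses
        beams = heapq.nlargest(beam_width, candidates, key=lambda c: c[0])
        if all(seq[-1] == "<end>" for _, seq in beams):
            break
    return max(beams, key=lambda b: b[0])[1]

# Toy next-word distribution keyed on the last token (hypothetical values)
vocab = {
    "<start>": [("a", 0.9), ("the", 0.1)],
    "a":       [("dog", 0.6), ("cat", 0.4)],
    "the":     [("dog", 0.5), ("cat", 0.5)],
    "dog":     [("<end>", 1.0)],
    "cat":     [("<end>", 1.0)],
}
caption = beam_search("<start>", lambda seq: vocab[seq[-1]], beam_width=2)
```

With these toy probabilities the search settles on the highest-probability path "a dog". In a real captioning decoder, `step_fn` would query the trained BLSTM conditioned on the image encoding.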
Keywords: image captioning; image description generation; deep learning; Densenet; bidirectional LSTM
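As a reading aid for the evaluation measure, BLEU can be sketched in its simplest form: clipped unigram precision (the BLEU-1 component). This is a simplified illustration, not the scoring script used in the paper; full BLEU combines n-gram precisions up to order 4 with a brevity penalty:

```python
from collections import Counter

def bleu1(candidate, references):
    """Clipped unigram precision: each candidate token counts only up to
    the maximum number of times it appears in any single reference."""
    cand_counts = Counter(candidate)
    max_ref = Counter()
    for ref in references:
        for tok, n in Counter(ref).items():
            max_ref[tok] = max(max_ref[tok], n)
    clipped = sum(min(n, max_ref[tok]) for tok, n in cand_counts.items())
    return clipped / len(candidate)

# Hypothetical generated caption vs. one ground-truth reference
generated = "a dog is running on the the grass".split()
reference = ["a dog runs on the grass".split()]
score = bleu1(generated, reference)
```

Here "is", "running", and the duplicated "the" are not (fully) credited, so the score is 5/8. The clipping step is what prevents a degenerate caption like "the the the ..." from scoring highly.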

Figure 1

S R, S.; Idicula, S.M. Dense Model for Automatic Image Description Generation with Game Theoretic Optimization. Information 2019, 10, 354.
