Open Access Letter

LAM: Remote Sensing Image Captioning with Label-Attention Mechanism

1 Institute of Electronics, Chinese Academy of Sciences, Beijing 100190, China
2 Key Laboratory of Network Information System Technology (NIST), Institute of Electronics, Chinese Academy of Sciences, Beijing 100190, China
3 School of Electronic, Electrical and Communication Engineering, University of Chinese Academy of Sciences, Beijing 100190, China
* Author to whom correspondence should be addressed.
Remote Sens. 2019, 11(20), 2349; https://doi.org/10.3390/rs11202349
Received: 12 September 2019 / Revised: 3 October 2019 / Accepted: 4 October 2019 / Published: 10 October 2019
(This article belongs to the Section Remote Sensing Image Processing)
Significant progress has been made in remote sensing image captioning by encoder-decoder frameworks. The conventional attention mechanism is prevalent in this task but has a drawback: it uses only visual information from the remote sensing images, without exploiting label information to guide the calculation of attention masks. To this end, a novel attention mechanism, the Label-Attention Mechanism (LAM), is proposed in this paper. LAM additionally utilizes the label information of high-resolution remote sensing images to generate natural sentences describing the given images. Notably, instead of high-level image features, the word-embedding vectors of the predicted categories are adopted to guide the calculation of attention masks. Representing image content as word-embedding vectors filters out redundant image features while preserving pure, useful information for generating complete sentences. Experimental results on UCM-Captions, Sydney-Captions and RSICD demonstrate that LAM improves the model's performance in describing high-resolution remote sensing images and obtains better Sm scores than other methods; the Sm score is a hybrid metric derived from the AI Challenge 2017 scoring method. In addition, the validity of LAM is verified by an experiment using true labels.
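The core idea of the abstract can be illustrated with a minimal sketch of additive attention in which the attention mask is guided by a category word embedding rather than a high-level image feature. This is not the paper's implementation; all names (`label_attention`, the weight matrices `Wv`, `Wl`, `Wh`, `w`) and the dimensions are hypothetical, and the exact scoring function in LAM may differ:

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D score vector.
    e = np.exp(x - x.max())
    return e / e.sum()

def label_attention(features, label_emb, hidden, Wv, Wl, Wh, w):
    """Additive attention guided by a label word embedding (sketch).

    features  : (N, d) spatial image-region features from the encoder
    label_emb : (e,)   word embedding of the predicted category
    hidden    : (h,)   decoder hidden state at the current time step
    Wv, Wl, Wh, w : learned projections (here supplied as plain arrays)
    """
    # Each region's score mixes visual, label, and decoder-state information.
    scores = np.tanh(features @ Wv + label_emb @ Wl + hidden @ Wh) @ w
    alpha = softmax(scores)       # attention mask over the N regions
    context = alpha @ features    # label-guided context vector for the decoder
    return context, alpha

# Toy usage with random weights, just to show the shapes involved.
rng = np.random.default_rng(0)
N, d, e, h, k = 4, 8, 5, 6, 7
context, alpha = label_attention(
    rng.standard_normal((N, d)),   # region features
    rng.standard_normal(e),        # category embedding
    rng.standard_normal(h),        # decoder state
    rng.standard_normal((d, k)), rng.standard_normal((e, k)),
    rng.standard_normal((h, k)), rng.standard_normal(k),
)
```

The point of the sketch is the `label_emb @ Wl` term: the compact category embedding, not a redundant high-level feature map, biases which regions the mask emphasizes.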
Keywords: remote sensing image captioning; remote sensing image; image understanding; semantic understanding
MDPI and ACS Style

Zhang, Z.; Diao, W.; Zhang, W.; Yan, M.; Gao, X.; Sun, X. LAM: Remote Sensing Image Captioning with Label-Attention Mechanism. Remote Sens. 2019, 11, 2349.

Note that from the first issue of 2016, MDPI journals use article numbers instead of page numbers.
