Open Access Article

Multi-Scale Remote Sensing Semantic Analysis Based on a Global Perspective

School of Resources and Environmental Engineering, Wuhan University of Technology, Wuhan 430070, China
Author to whom correspondence should be addressed.
ISPRS Int. J. Geo-Inf. 2019, 8(9), 417
Received: 4 July 2019 / Revised: 8 September 2019 / Accepted: 16 September 2019 / Published: 17 September 2019
(This article belongs to the Special Issue Deep Learning and Computer Vision for GeoInformation Sciences)
Remote sensing image captioning involves remote sensing objects and their spatial relationships. However, it remains difficult to determine the spatial extent of a remote sensing object and the appropriate size of a sample patch. If the patch is too large, it includes too many remote sensing objects and complex spatial relationships among them, which increases the computational burden of the image captioning network and reduces its precision. If the patch is too small, it often fails to provide enough environmental and contextual information, which makes the remote sensing object difficult to describe. To address this problem, we propose a multi-scale semantic long short-term memory network (MS-LSTM). The remote sensing images are paired into image patches with different spatial scales. First, we use a Visual Geometry Group (VGG) network to extract features from the large-scale patches and input them into the improved MS-LSTM network as semantic information. These features provide a larger receptive field and more contextual semantic information for small-scale image captioning, acting as a global perspective and thereby enabling the accurate identification of small-scale samples that share the same features. Second, a small-scale patch is used to highlight remote sensing objects and simplify their spatial relations. In addition, the multiple receptive fields provide perspectives from local to global. The experimental results demonstrate that, compared with the original long short-term memory network (LSTM), the Bilingual Evaluation Understudy (BLEU) score of the MS-LSTM increased by 5.6% to 0.859, indicating that the MS-LSTM has a more comprehensive receptive field, which provides richer semantic information and improves the remote sensing image captions.
Keywords: LSTM; multi-scale; remote sensing object; image captioning
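
The pairing of patches at different spatial scales can be illustrated with a minimal sketch. It assumes that a small-scale and a large-scale patch share the same center pixel and that borders are zero-padded; the abstract does not state either detail. The function names `extract_patch` and `paired_patches` and the default patch sizes of 64 and 256 pixels are hypothetical, chosen only for illustration, and the sketch is not the authors' implementation.

```python
import numpy as np


def extract_patch(image, center_row, center_col, size):
    """Crop a size x size patch centered on a pixel of an H x W x C image.

    Zero-padding at the borders is an assumption made for this sketch.
    """
    half = size // 2
    # Pad rows and columns so that patches near the border keep a full size.
    padded = np.pad(image, ((half, half), (half, half), (0, 0)), mode="constant")
    # After padding, original pixel (r, c) sits at (r + half, c + half),
    # so the crop that centers it starts at (r, c) in padded coordinates.
    return padded[center_row:center_row + size, center_col:center_col + size, :]


def paired_patches(image, center_row, center_col, small_size=64, large_size=256):
    """Return (small, large) patches around the same center pixel.

    The small patch isolates the object; the large patch carries the
    surrounding context. Both sizes are hypothetical defaults.
    """
    small = extract_patch(image, center_row, center_col, small_size)
    large = extract_patch(image, center_row, center_col, large_size)
    return small, large


if __name__ == "__main__":
    scene = np.random.rand(512, 512, 3)
    small, large = paired_patches(scene, 100, 200)
    print(small.shape, large.shape)
```

In a pipeline like the one the abstract describes, the large patch would be passed to a feature extractor such as VGG and the resulting features supplied as context when captioning the small patch; that step is omitted here because the abstract does not specify how the features enter the network.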

MDPI and ACS Style

Cui, W.; Zhang, D.; He, X.; Yao, M.; Wang, Z.; Hao, Y.; Li, J.; Wu, W.; Cui, W.; Huang, J. Multi-Scale Remote Sensing Semantic Analysis Based on a Global Perspective. ISPRS Int. J. Geo-Inf. 2019, 8, 417.
