
Advancements of Vision-Language Models (VLMs) in Remote Sensing

A special issue of Remote Sensing (ISSN 2072-4292).

Deadline for manuscript submissions: 15 July 2025

Special Issue Editors


Guest Editor
School of Artificial Intelligence, Xidian University, Xi’an 710071, China
Interests: object information representation and intelligent recognition; hyperspectral image classification

Guest Editor
Department of Electrical and Electronic Engineering, The Hong Kong Polytechnic University, Hong Kong 999077, China
Interests: vision-language models; remote sensing; intelligent transportation; egocentric vision

Guest Editor
Musketeers Foundation Institute of Data Science, The University of Hong Kong, Hong Kong 999077, China
Interests: vision-language models; remote sensing; graph learning; noise-based models

Guest Editor
School of Artificial Intelligence, Xidian University, Xi’an 710071, China
Interests: image processing; machine learning; remote sensing image interpretation

Guest Editor
Department of Land Surveying and Geo-Informatics, The Hong Kong Polytechnic University, Hong Kong 999077, China
Interests: geography; urban climate; urban thermal infrared remote sensing; artificial intelligence; change detection; spatiotemporal big data; climate change and complex extreme weather; environmental exposure and health assessment

Special Issue Information

Dear Colleagues,

Research on vision–language models (VLMs) is advancing remote sensing by bringing strong multimodal (vision and language) data processing capabilities to the field. This technology offers a feasible way to design more effective paradigms for traditional remote sensing tasks (e.g., object detection and road extraction). Furthermore, the strong data processing capability of VLMs makes it possible to accomplish complex multimodal tasks, such as remote sensing visual question answering (VQA), that earlier vision-only models struggled to handle effectively and efficiently. As the global geopolitical landscape grows more complex, efficient remote sensing image processing technologies are increasingly needed, because they are becoming critical in information-driven warfare. VLMs offer a powerful tool for gathering intelligence from remote sensing images, and the technology’s potential in future multimodal remote sensing applications is highly promising and warrants further exploration, especially for ground surveillance and reconnaissance.

This Special Issue aims to explore the advancements of VLMs in remote sensing, including, but not limited to, the improvements VLMs bring to traditional remote sensing tasks, the application of VLMs to multimodal tasks in remote sensing, and discussions of how VLMs may address future tasks that emerge as situational requirements and remote sensing technologies evolve.

This Special Issue seeks to cover a wide range of topics related to the advancements of VLMs in remote sensing, including, but not limited to, the following aspects:

  1. VLMs for multimodal remote sensing data fusion;
  2. Automatic annotation and description of remote sensing images based on VLMs;
  3. VLMs for semantic segmentation and change detection in remote sensing images;
  4. Object detection and tracking in remote sensing images based on VLMs;
  5. Super-resolution reconstruction of remote sensing images using VLMs;
  6. Land cover and scene classification based on VLMs;
  7. VLMs for multimodal remote sensing image alignment;
  8. VLMs for open-world remote sensing tasks;
  9. New multimodal remote sensing benchmark datasets;
  10. Efficient multimodal retrieval of remote sensing images;
  11. Large-scale, generalizable pretraining of VLMs for remote sensing.

Dr. Zhixi Feng
Dr. Chuang Yang
Dr. Hongyuan Zhang
Dr. Chen Yang
Dr. Yue Chang
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once registered, authors can access the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the Special Issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Remote Sensing is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2700 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • vision–language models (VLMs)
  • multimodal data fusion
  • remote sensing VQA
  • automatic annotation
  • semantic segmentation
  • object detection
  • super-resolution reconstruction
  • change detection
  • pretraining models
  • multimodal retrieval

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.

Further information on MDPI's Special Issue policies is available on the MDPI website.

Published Papers (1 paper)


Research

22 pages, 4563 KiB  
Article
Attribute-Based Learning for Remote Sensing Image Captioning in Unseen Scenes
by Zhang Guo, Haomin Liu, Zihao Ren, Licheng Jiao, Shuiping Gou and Ruimin Li
Remote Sens. 2025, 17(7), 1237; https://doi.org/10.3390/rs17071237 - 31 Mar 2025
Abstract
Remote sensing image captioning (RSIC) aims to describe ground objects and scenes within remote sensing images in natural language form. As the complexity and diversity of scenes in remote sensing images increase, existing methods, although effective in specific tasks, are largely trained on particular scene images and corpora. This limits their ability to generate descriptions for scenes not encountered during training. Given the finite resources for data annotation and the expanding range of application scenarios, training data typically cover only a subset of common scenes, leaving many potential scene types unrepresented. Consequently, developing models capable of effectively handling unseen scenes with limited training data is imperative. This study introduces an innovative remote sensing image captioning model based on scene attribute learning, SALCap. The proposed model defines scene attributes and employs a specifically designed global object scene attribute extractor to capture these attributes. It then uses an attribute inference module to predict scene information from the scene attributes, and an additional attribute loss ensures that this scene information is reused in sentence generation. Experiments show that the method not only improves the accuracy of the descriptions but also significantly enhances the model’s adaptability and generalizability to unseen scenes. This advancement expands the practical utility of remote sensing image captioning across diverse scenarios, particularly under the constraints of limited annotations.
(This article belongs to the Special Issue Advancements of Vision-Language Models (VLMs) in Remote Sensing)
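The abstract states that SALCap reuses scene information in sentence generation through an additional attribute loss, but it does not give the form of that loss. As a rough illustration only, the sketch below assumes a common multi-task pattern: a token-level captioning negative log-likelihood plus a weighted multi-label binary cross-entropy over predicted scene attributes. The function names and the weight `attr_weight` are hypothetical; this is not the authors' implementation.

```python
import math
from typing import Sequence


def caption_nll(token_probs: Sequence[Sequence[float]], target_ids: Sequence[int]) -> float:
    """Mean negative log-likelihood of the reference caption tokens.

    token_probs[t] is the decoder's probability distribution over the
    vocabulary at step t; target_ids[t] is the reference token index.
    """
    eps = 1e-12  # guard against log(0)
    nll = [-math.log(max(probs[tid], eps)) for probs, tid in zip(token_probs, target_ids)]
    return sum(nll) / len(nll)


def attribute_bce(attr_probs: Sequence[float], attr_labels: Sequence[int]) -> float:
    """Mean binary cross-entropy over multi-label scene attributes (labels are 0 or 1)."""
    eps = 1e-12
    losses = []
    for p, y in zip(attr_probs, attr_labels):
        p = min(max(p, eps), 1.0 - eps)  # clip so both log terms stay finite
        losses.append(-(y * math.log(p) + (1 - y) * math.log(1.0 - p)))
    return sum(losses) / len(losses)


def joint_loss(token_probs, target_ids, attr_probs, attr_labels, attr_weight: float = 0.5) -> float:
    """Caption loss plus a weighted auxiliary attribute loss.

    attr_weight is a hypothetical hyperparameter, not a value from the paper.
    """
    return caption_nll(token_probs, target_ids) + attr_weight * attribute_bce(attr_probs, attr_labels)


if __name__ == "__main__":
    # Toy example: a 2-token caption over a 2-word vocabulary and 3 scene attributes.
    probs = [[0.5, 0.5], [0.25, 0.75]]
    targets = [0, 1]
    attrs = [0.9, 0.2, 0.5]
    labels = [1, 0, 1]
    print(round(joint_loss(probs, targets, attrs, labels), 4))
```

Keeping the attribute term as a separate, weighted summand is one simple way to make the model predict scene attributes explicitly while still optimizing caption quality. How SALCap actually balances or combines these signals is described in the full article.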