Artificial Intelligence and Machine Learning for Multi-Modal and Multi-Spectral Remote Sensing Image Processing

A special issue of Remote Sensing (ISSN 2072-4292). This special issue belongs to the section "AI Remote Sensing".

Deadline for manuscript submissions: 30 September 2026 | Viewed by 4132

Special Issue Editors


Guest Editor
Shaanxi Joint Laboratory of Artificial Intelligence, Shaanxi University of Science and Technology, Xi’an 710021, China
Interests: image processing; computer vision; deep learning; remote sensing

Guest Editor
School of Data Science and Artificial Intelligence, Chang’an University, Xi’an 710064, China
Interests: image processing; computer vision; machine learning; remote sensing

Guest Editor
Department of Electronic and Electrical Engineering, Brunel University London, Uxbridge, UK
Interests: artificial intelligence; signal processing; biomedical signal processing; data analytics; machine learning; higher order statistics

Special Issue Information

Dear Colleagues,

With the rapid development of remote sensing technology, multimodal and multispectral remote sensing data have become an indispensable source of information for Earth observation, environmental monitoring, resource management, and disaster emergency response. Fusing multimodal remote sensing data (such as optical, radar, hyperspectral, and infrared imagery) with multispectral information provides multidimensional, multi-scale descriptions of surface features and significantly enhances the application value of remote sensing data. However, when faced with massive, heterogeneous, and high-dimensional remote sensing data, traditional image processing methods show clear limitations in feature extraction, data fusion, target recognition, and semantic interpretation: low efficiency, weak generalization, and reliance on manual experience. In recent years, rapid progress in artificial intelligence and machine learning, including convolutional neural networks, Transformers, Mamba, transfer learning, generative adversarial networks (GANs), and diffusion models, has injected new momentum into remote sensing image processing. For example, deep learning-based end-to-end models can automatically extract joint spatial-spectral features from multimodal data, improving the accuracy of land cover classification; transfer learning alleviates the high cost of remote sensing data annotation by transferring knowledge across domains; and diffusion models are used for data augmentation, super-resolution reconstruction, and cross-modal synthesis, improving model robustness. In addition, techniques such as self-supervised learning, graph neural networks (GNNs), and reinforcement learning open further technical paths for the interpretation of and inference from multimodal and multispectral remote sensing images.
This Special Issue focuses on cutting-edge research on artificial intelligence and machine learning in multimodal and multispectral remote sensing image processing, with the aim of promoting the deep integration of theory and application and helping remote sensing technology move towards intelligence and automation.

Special Issue Submission Topics

This Special Issue invites submissions from researchers worldwide on topics including, but not limited to, the following:

  1. Intelligent fusion and feature extraction of multimodal remote sensing data (optical, SAR, hyperspectral, infrared, etc.).
  2. Remote sensing image classification, object detection, and semantic segmentation based on deep learning.
  3. Application of few-shot and weakly supervised learning in remote sensing image interpretation.
  4. Generative models (such as GANs and diffusion models) and remote sensing data augmentation.
  5. Dynamic analysis and prediction of time-series remote sensing data.
  6. Intelligent processing of 3D point clouds and stereoscopic remote sensing images.
  7. Edge computing and real-time remote sensing processing systems.
  8. Explainable AI and the trustworthiness of remote sensing models.
  9. Innovative applications of multimodal remote sensing in fields such as ecology, agriculture, and disaster response.
  10. Construction of open-source remote sensing datasets and algorithm frameworks.

Prof. Dr. Tao Lei
Prof. Dr. Tao Gao
Prof. Dr. Lefei Zhang
Prof. Dr. Asoke K. Nandi
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles as well as short communications are invited. For planned papers, a title and short abstract (about 250 words) can be sent to the Editorial Office for assessment.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Remote Sensing is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2700 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • deep learning
  • transformer
  • diffusion model
  • multi-modal remote sensing data
  • multi-spectral remote sensing data
  • feature representation
  • feature fusion
  • remote sensing interpretation

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • Reprint: MDPI Books provides the opportunity to republish successful Special Issues in book format, both online and in print.

Further information on MDPI's Special Issue policies can be found here.

Published Papers (4 papers)


Research

32 pages, 75104 KB  
Article
A Feature-Optimized Deep Learning Framework for Mapping and Spatial Characterization of Tea Plantations in Complex Mountain Landscapes
by Ruyi Wang, Jixian Zhang, Xiaoping Lu, Qi Kang, Bowen Chi, Junfeng Li, Yahang Li and Zhengfang Lou
Remote Sens. 2026, 18(9), 1281; https://doi.org/10.3390/rs18091281 - 23 Apr 2026
Viewed by 245
Abstract
The unchecked expansion of tea plantations onto steep, forest-adjacent slopes in subtropical mountains engenders a conflict between agricultural productivity and ecosystem integrity, particularly by exacerbating habitat fragmentation and soil erosion. While precise monitoring is essential to navigate this trade-off for sustainable management, accurate inventorying remains a challenge due to the plantations’ strong phenological variability, heterogeneous canopy structures, and high spectral confusion with surrounding vegetation. This study proposes a feature-optimized deep learning framework for mapping and characterizing tea plantations in complex landscapes, using Xinyang City, China, as a study area. The framework integrates multi-temporal Sentinel-1/2 observations with a sequential Jeffries-Matusita (JM)-Pearson feature filtering strategy. This approach effectively condenses a 132-variable high-dimensional pool (including optical spectra, vegetation indices, textures, and SAR polarimetry) into a compact 28-feature subset (a 78.8% reduction), preserving critical phenological and structural cues while minimizing redundancy. These optimized predictors drive a hybrid VGG16–UNet++ segmentation network, which couples transfer-learning-based semantic encoding with detail-preserving dense skip fusion. Extensive experiments across 18 model–feature configurations demonstrate that the optimal setting achieves an Overall Accuracy of 97.82%, an F1-score of 0.9093, and a mean IoU of 0.7968. Notably, the method significantly reduces misclassification in rugged, cloud-prone terrain, yielding a User’s Accuracy of 91.14% for tea. Based on the generated wall-to-wall map, we derived two decision-support indicators: multi-threshold steep-slope exposure and a normalized tea–forest interface density. 
This framework provides actionable, high-precision spatial products to support slope-based zoning, ecological restoration, and sustainable management in fragile mountain agroforestry systems. Full article
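The sequential JM-Pearson filtering described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a two-class setting (tea versus background), a univariate Gaussian form of the Jeffries-Matusita distance, and hypothetical thresholds `jm_min` and `r_max`, none of which are stated in the abstract. The synthetic features are purely illustrative.

```python
import numpy as np


def jm_distance(a, b, eps=1e-12):
    """Jeffries-Matusita distance between two 1-D samples, assuming each is Gaussian."""
    m1, m2 = a.mean(), b.mean()
    v1, v2 = a.var() + eps, b.var() + eps
    # Bhattacharyya distance for two univariate Gaussians
    bhattacharyya = (0.25 * (m1 - m2) ** 2 / (v1 + v2)
                     + 0.5 * np.log((v1 + v2) / (2.0 * np.sqrt(v1 * v2))))
    return 2.0 * (1.0 - np.exp(-bhattacharyya))  # lies in [0, 2]


def jm_pearson_select(X, y, jm_min=1.0, r_max=0.9):
    """Step 1: keep features whose JM separability is at least jm_min.
    Step 2: walk them from most to least separable and drop any feature whose
    absolute Pearson correlation with an already kept feature reaches r_max.
    Both thresholds are hypothetical defaults chosen for this sketch."""
    target, background = X[y == 1], X[y == 0]
    jm = np.array([jm_distance(target[:, j], background[:, j])
                   for j in range(X.shape[1])])
    ranked = [int(j) for j in np.argsort(-jm) if jm[j] >= jm_min]
    corr = np.abs(np.corrcoef(X, rowvar=False))
    kept = []
    for j in ranked:
        if all(corr[j, k] < r_max for k in kept):
            kept.append(j)
    return kept, jm


# Synthetic demo: f1 nearly duplicates f0, f2 is pure noise, f3 is separable
# and only moderately correlated with f0.
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 500)
f0 = 3.0 * y + rng.normal(size=y.size)
f1 = f0 + 0.01 * rng.normal(size=y.size)
f2 = rng.normal(size=y.size)
f3 = 3.0 * y + rng.normal(size=y.size)
X = np.column_stack([f0, f1, f2, f3])

kept, jm = jm_pearson_select(X, y)
print(kept)
```

In this demo the noise feature fails the separability step and only one of the two near-duplicate features survives the correlation step, which is the kind of redundancy removal the abstract reports when going from 132 to 28 features, a reduction of (132 - 28) / 132 ≈ 78.8%.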

34 pages, 19919 KB  
Article
Unsupervised Change Detection in Heterogeneous Remote Sensing Images via Dynamic Mask Guidance
by Paixin Xie, Gao Chen, Qingfeng Zhou, Xiaoyan Li and Jingwen Yan
Remote Sens. 2026, 18(7), 1022; https://doi.org/10.3390/rs18071022 - 29 Mar 2026
Viewed by 517
Abstract
Unsupervised change detection (CD) in heterogeneous remote sensing images is intrinsically difficult due to severe sensor-specific discrepancies. In the absence of ground truth, these discrepancies result in ambiguous optimization objectives that make it difficult for models to distinguish true land-cover changes from modality-driven pseudo-changes. To address these challenges, we propose MaskUCD, a novel unsupervised framework that reformulates heterogeneous CD as a dynamic mask-driven constraint scheduling problem. Fundamentally distinct from conventional strategies that enforce selective feature alignment, MaskUCD employs a spatially adaptive optimization mechanism. Specifically, the iteratively refined mask serves as a geometric reference to guide optimization. It enforces strict feature alignment in mask-unchanged regions to suppress modality-induced discrepancies, while simultaneously promoting feature divergence in mask-changed regions to emphasize semantic inconsistencies. In this way, explicit optimization objectives are established, together with an intrinsic interpretability constraint that guides the CD process. This strategy treats the mask as a structural guide for representation learning rather than a ground-truth reference, thereby avoiding error accumulation caused by directly using inaccurate masks as supervisory signals. To facilitate this optimization, we design a specialized asymmetric autoencoder with a hybrid encoder architecture, utilizing multi-scale frequency analysis and global context modeling to enhance feature representation capabilities. Consequently, this design enables the generation of refined and semantically consistent masks, which provide increasingly precise structural guidance, yielding converged and discriminative difference maps. Extensive experiments demonstrate that MaskUCD achieves state-of-the-art performance and superior robustness compared to existing advanced methods. Full article
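The idea of aligning features where the mask says "unchanged" and pushing them apart where it says "changed" can be sketched with a standard contrastive-style loss. This is a swapped-in, generic formulation and not MaskUCD's actual objective, which the abstract does not specify; the function name, the `margin` parameter, and the array shapes are assumptions made for illustration.

```python
import numpy as np


def mask_guided_loss(feat_a, feat_b, mask, margin=1.0):
    """Contrastive-style loss over two feature maps of shape (H, W, C).

    mask has shape (H, W) with values in [0, 1], where 1 marks a pixel as changed.
    Unchanged pixels are penalised for any feature distance (alignment);
    changed pixels are penalised only if their distance is below `margin`
    (divergence). `margin` is a hypothetical hyperparameter.
    """
    dist = np.linalg.norm(feat_a - feat_b, axis=-1)
    align = (1.0 - mask) * dist ** 2
    diverge = mask * np.maximum(0.0, margin - dist) ** 2
    return float(np.mean(align + diverge))


# Tiny example: the two feature maps differ only at pixel (0, 0).
feat_a = np.zeros((2, 2, 3))
feat_b = np.zeros((2, 2, 3))
feat_b[0, 0] = [3.0, 4.0, 0.0]          # feature distance 5 at pixel (0, 0)

good_mask = np.array([[1.0, 0.0], [0.0, 0.0]])   # marks the differing pixel as changed
bad_mask = 1.0 - good_mask                        # marks everything else as changed

loss_good = mask_guided_loss(feat_a, feat_b, good_mask)
loss_bad = mask_guided_loss(feat_a, feat_b, bad_mask)
print(loss_good, loss_bad)
```

A mask that agrees with the feature differences yields a low loss, while a mask that contradicts them is penalised in both regions. In an iterative scheme of the kind the abstract describes, the mask would be refined between optimization rounds rather than treated as a fixed label.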

20 pages, 28888 KB  
Article
GIMMNet: Geometry-Aware Interactive Multi-Modal Network for Semantic Segmentation of High-Resolution Remote Sensing Imagery
by Qian Weng, Xiansheng Huang, Yifeng Lin, Yu Zhang, Zhaocheng Li, Cairen Jian and Jiawen Lin
Remote Sens. 2026, 18(1), 124; https://doi.org/10.3390/rs18010124 - 29 Dec 2025
Viewed by 762
Abstract
Remote sensing semantic segmentation holds significant application value in urban planning, environmental monitoring, and related fields. In recent years, multimodal approaches that fuse optical imagery with normalized Digital Surface Models (nDSM) have attracted widespread attention due to their superior performance. However, existing methods typically treat nDSM merely as an additional input channel, failing to effectively exploit its inherent 3D geometric priors, which limits segmentation accuracy in complex urban scenes. To address this issue, we propose a Geometry-aware Interactive Multi-Modal Network (GIMMNet), which explicitly models the geometric structure embedded in nDSM to guide the spatial distribution of semantic categories. Specifically, we first design a Geometric Position Prior Module (GPPM) to construct 3D coordinates for each pixel based on nDSM and extract intrinsic geometric priors. Next, a Geometry-Guided Disentangled Fusion Module (GDFM) dynamically adjusts fusion weights according to the differential responses of each modality to the geometric priors, enabling adaptive multimodal feature integration. Finally, during decoding, a Geometry-Attentive Context Module (GACM) explicitly captures the dependencies between land-cover categories and geometric structures, enhancing the model’s spatial awareness and semantic recovery capability. Experimental results on two public remote sensing datasets—Vaihingen and Potsdam—show that the proposed GIMMNet outperforms existing mainstream methods in segmentation performance, demonstrating that enhancing the model’s geometric perception capability effectively improves semantic segmentation accuracy. Notably, our method achieves an mIoU of 85.2% on the Potsdam dataset, surpassing the second-best multimodal approach, PACSCNet, by 2.3%. Full article
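As a rough illustration of what "constructing 3D coordinates for each pixel based on nDSM" could look like, the sketch below pairs each pixel's planar position with its nDSM height. The abstract does not describe how GPPM does this, so the function name, the `gsd` (ground sampling distance) parameter, and the axis convention are all assumptions.

```python
import numpy as np


def pixel_coordinates_3d(ndsm, gsd=1.0):
    """Return an (H, W, 3) array of (x, y, z) per pixel.

    x and y are the column and row indices scaled by the ground sampling
    distance `gsd` (hypothetical parameter, in metres per pixel);
    z is the nDSM height above ground at that pixel.
    """
    h, w = ndsm.shape
    rows, cols = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    return np.stack([cols * gsd, rows * gsd, ndsm], axis=-1)


ndsm = np.array([[0.0, 2.0],
                 [5.0, 1.0]])
coords = pixel_coordinates_3d(ndsm, gsd=0.5)
print(coords[1, 0])  # x = 0.0, y = 0.5, z = 5.0
```

Treating the height as a coordinate, rather than as one more image channel, is what lets later modules reason about geometric relations such as relative elevation between neighbouring pixels.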

30 pages, 23104 KB  
Article
MSAFNet: Multi-Modal Marine Aquaculture Segmentation via Spatial–Frequency Adaptive Fusion
by Guolong Wu and Yimin Lu
Remote Sens. 2025, 17(20), 3425; https://doi.org/10.3390/rs17203425 - 13 Oct 2025
Cited by 2 | Viewed by 1548
Abstract
Accurate mapping of marine aquaculture areas is critical for environmental management and sustainable development for marine ecosystem protection and sustainable resource utilization. However, remote sensing imagery based on single-sensor modalities has inherent limitations when extracting aquaculture zones in complex marine environments. To address this challenge, we constructed a multi-modal dataset from five Chinese coastal regions using cloud detection methods and developed Multi-modal Spatial–Frequency Adaptive Fusion Network (MSAFNet) for optical-radar data fusion. MSAFNet employs a dual-path architecture utilizing a Multi-scale Dual-path Feature Module (MDFM) that combines CNN and Transformer capabilities to extract multi-scale features. Additionally, it implements a Dynamic Frequency Domain Adaptive Fusion Module (DFAFM) to achieve deep integration of multi-modal features in both spatial and frequency domains, effectively leveraging the complementary advantages of different sensor data. Results demonstrate that MSAFNet achieves 76.93% mean intersection over union (mIoU), 86.96% mean F1 score (mF1), and 93.26% mean Kappa coefficient (mKappa) in extracting floating raft aquaculture (FRA) and cage aquaculture (CA), significantly outperforming existing methods. Applied to China’s coastal waters, the model generated 2020 nearshore aquaculture distribution maps, demonstrating its generalization capability and practical value in complex marine environments. This approach provides reliable technical support for marine resource management and ecological monitoring. Full article
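The metrics quoted in these abstracts (mIoU, mean F1, Kappa) are all derived from a confusion matrix. The following sketch shows the standard definitions, assuming rows are reference classes and columns are predicted classes. It computes a single Cohen's kappa over all classes; how the paper averages its "mKappa" is not stated, so that averaging is not reproduced here. The example matrix is made up.

```python
import numpy as np


def segmentation_metrics(confusion):
    """Compute mIoU, mean F1 and Cohen's kappa from a square confusion matrix.

    Assumes rows are reference classes and columns are predicted classes,
    and that every class occurs at least once (otherwise a division by zero occurs).
    """
    cm = np.asarray(confusion, dtype=float)
    tp = np.diag(cm)
    fp = cm.sum(axis=0) - tp      # predicted as the class, but reference differs
    fn = cm.sum(axis=1) - tp      # reference is the class, but prediction differs
    iou = tp / (tp + fp + fn)
    f1 = 2.0 * tp / (2.0 * tp + fp + fn)
    total = cm.sum()
    observed = tp.sum() / total                                   # overall accuracy
    expected = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / total ** 2
    kappa = (observed - expected) / (1.0 - expected)
    return {"mIoU": iou.mean(), "mF1": f1.mean(), "kappa": kappa}


# Made-up two-class example (60 reference pixels of class 0, 40 of class 1).
metrics = segmentation_metrics([[50, 10],
                                [5, 35]])
print({name: round(float(value), 4) for name, value in metrics.items()})
```

Because IoU counts false positives and false negatives in the denominator without doubling the true positives, it is never larger than F1 for the same class, which is consistent with the reported mIoU sitting below the reported mF1.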