MDPI - Publisher of Open Access Journals

20 pages, 30488 KB

Open Access Article

Hierarchical Scale-Adaptive Diffusion Priors for Efficient Remote Sensing Dehazing

by Wei Ju, Zheng Liang, Huan Chen and Jie Shen

Remote Sens. 2026, 18(12), 1907; https://doi.org/10.3390/rs18121907 - 9 Jun 2026

Viewed by 171

Abstract

Remote sensing image dehazing remains a formidable challenge because complex atmospheric scattering and large-scale, spatially varying degradation severely compromise fine-grained surface details. Recent diffusion-based restoration frameworks such as DiffIR achieve remarkable efficiency by injecting compact diffusion priors into deterministic networks, but they typically rely on a monolithic global Image Prior Representation (IPR). Such a global design is suboptimal for dehazing remote sensing imagery, where haze distribution exhibits strong spatial heterogeneity and scale dependency. To address this limitation, this paper presents the Hierarchical and Scale-Adaptive Diffusion Prior (HS-DiffIR) framework. Specifically, the Hierarchical Image Prior Representation decomposes the holistic diffusion latent into multi-scale priors aligned with the hierarchical stages of the restoration network. This design provides fine-grained, scale-aware guidance by projecting the compact global latent into layer-specific representations, thereby bypassing the computational burden of high-dimensional generative modeling. Complementing this, the Scale-Adaptive Injection mechanism uses lightweight learnable coefficients to dynamically modulate the influence of diffusion priors across feature scales, allowing the network to balance global semantic consistency and local detail recovery under dense-haze conditions. Evaluations on remote sensing benchmarks confirm that HS-DiffIR generally outperforms the DiffIR baseline. The method yields superior quantitative metrics (particularly PSNR) at a marginal computational cost while demonstrating robust detail restoration in regions subject to severe, spatially variant haze.
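The scale-adaptive injection idea can be illustrated with a small NumPy sketch. This is not the authors' implementation: the additive, channel-wise form of the injection, the linear projections, and all names and sizes (`project_prior`, `inject`, `alphas`, the 16-dimensional prior) are assumptions made only to show how one compact prior might be projected to several scales and weighted by a per-scale coefficient.

```python
import numpy as np

rng = np.random.default_rng(0)

def project_prior(z, weight):
    """Project the compact global prior z to one stage's channel count."""
    return weight @ z  # shape: (channels,)

def inject(feature, z, weight, alpha):
    """Add an alpha-scaled, channel-wise prior term to a feature map.

    feature: (channels, height, width) feature map at one network stage.
    alpha:   scalar coefficient for this scale (learnable in a real model).
    """
    prior = project_prior(z, weight)
    return feature + alpha * prior[:, None, None]

# Hypothetical sizes: a 16-dim prior and three stages with growing channel counts.
z = rng.normal(size=16)
channels = [8, 16, 32]
sizes = [32, 16, 8]
weights = [rng.normal(size=(c, 16)) * 0.1 for c in channels]
alphas = [0.0, 0.5, 1.0]  # illustrative values only, not taken from the paper

features = [rng.normal(size=(c, s, s)) for c, s in zip(channels, sizes)]
outputs = [inject(f, z, w, a) for f, w, a in zip(features, weights, alphas)]
```

With `alpha = 0.0` a stage ignores the prior entirely, so in this sketch the coefficient acts as a per-scale dial between pure feature processing and prior-guided processing.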

(This article belongs to the Special Issue Hyperspectral Remote Sensing Image Analysis via Advanced Deep Learning and Computer Vision)

35 pages, 1263 KB

Open Access Systematic Review

Advances in Artificial Intelligence-Enabled Crop Pest and Disease Detection: A Systematic Review

by Zhen Ma, Cundeng Wang, Xinzhong Wang and Xuegeng Chen

Agriculture 2026, 16(12), 1262; https://doi.org/10.3390/agriculture16121262 - 7 Jun 2026

Viewed by 444

Abstract

The detection of crop diseases and pests is transitioning from single-sensor monitoring to intelligent perception and multimodal fusion. Following the PRISMA 2020 standard, this paper systematically reviews the relevant core literature. It summarizes the development history of spectral sensing technology and analyzes the physical mechanisms of hyperspectral and multispectral imaging in the early identification of crop diseases. The focus is on the architectural evolution of deep learning models, including lightweight convolutional neural networks (CNNs), vision transformers (ViTs) with long-range dependency modeling capabilities, and the computationally efficient state space model Mamba. In addition, research progress in spatial-spectral joint learning, heterogeneous data fusion, and vision-language models (VLMs) for improving system robustness and interpretability is introduced. By synthesizing the integrated applications of UAV remote sensing, Internet of Things (IoT) edge computing, and intelligent robots in staple and cash crops, this paper summarizes the implementation of integrated perception, decision-making, and execution systems. To address the insufficient cross-domain generalization ability and uneven allocation of computing resources in existing models, this paper offers perspectives on the future development of agricultural artificial intelligence (AI) toward foundation-model-driven, edge-intelligent collaborative, and green, sustainable directions, which can provide a theoretical reference for engineering applications in intelligent plant protection.

(This article belongs to the Section Crop Protection, Diseases, Pests and Weeds)

19 pages, 15016 KB

Open Access Article

Reliability-Weighted Spatial Coverage Sampling (SCS+R) for High-Precision Image Geometric Correction via GCP Selection

by Menghan Wu, Shengbo Chen, Xitong Xu, Yaqi Zhang, Yuqiao Suo, Jiaqi Yang, Jinchen Zhu, Aonan Zhang and Qiqi Li

Appl. Sci. 2026, 16(11), 5422; https://doi.org/10.3390/app16115422 - 29 May 2026

Viewed by 151

Abstract

Ground control point (GCP) selection is a critical step in the automated high-precision geometric correction of remote sensing imagery. Although the quantity, quality, and distribution of GCPs can all affect the accuracy of geometric correction, traditional automated selection methods predominantly optimize spatial distribution and often neglect the quality heterogeneity within matched point sets. This paper proposes a Reliability-weighted Spatial Coverage Sampling (SCS+R) method, which integrates matching reliability into the spatial coverage sampling framework via an adaptive weight factor (α). Experiments using Gaofen-2 (GF-2) imagery demonstrate that, with 58 GCPs selected by SCS+R, the relative geometric consistency with the reference imagery improves to a sub-pixel level (1.55–2.23 m) for multispectral images and to within two pixels (0.99–1.81 m) for panchromatic images. Compared with the standard SCS, Voronoi, and weighted Voronoi methods, SCS+R improves the average accuracy by approximately 25%, 16%, and 8%, respectively. These results verify the enhanced stability and robustness of the proposed method in complex environments. Moreover, the optimal adaptive reliability weight α consistently stabilizes in a low range of 0.1–0.3, quantitatively revealing a key principle for small-sample GCP selection: spatial uniformity provides the foundation, while point reliability is the key to achieving high precision.
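The trade-off between spatial coverage and matching reliability can be sketched as a greedy selection. The abstract does not state how α combines the two terms, so the linear blend below, the farthest-point style coverage term, and the function name `select_gcps` are assumptions for illustration, not the SCS+R algorithm itself. Reliability scores are assumed to lie in [0, 1].

```python
import numpy as np

def select_gcps(points, reliability, k, alpha=0.2):
    """Greedily pick k points, blending spatial spread with reliability.

    points:      (n, 2) candidate GCP coordinates.
    reliability: (n,) matching reliability scores in [0, 1].
    alpha:       weight on reliability; (1 - alpha) goes to spatial spread.
    """
    points = np.asarray(points, dtype=float)
    reliability = np.asarray(reliability, dtype=float)
    if k > len(points):
        raise ValueError("k cannot exceed the number of candidate points")

    # Seed with the most reliable candidate.
    selected = [int(np.argmax(reliability))]
    min_dist = np.linalg.norm(points - points[selected[0]], axis=1)

    while len(selected) < k:
        # Distance to the nearest already-selected point, normalized to [0, 1].
        peak = min_dist.max()
        spread = min_dist / peak if peak > 0 else min_dist
        score = (1.0 - alpha) * spread + alpha * reliability
        score[selected] = -np.inf  # never pick the same point twice
        nxt = int(np.argmax(score))
        selected.append(nxt)
        min_dist = np.minimum(min_dist, np.linalg.norm(points - points[nxt], axis=1))
    return selected

# Four corners of a unit square plus a low-reliability center point.
points = [(0, 0), (1, 0), (0, 1), (1, 1), (0.5, 0.5)]
reliability = [0.9, 0.8, 0.7, 0.6, 0.1]
chosen = select_gcps(points, reliability, k=3, alpha=0.2)
```

In this toy layout a low α keeps the selection spread across the corners and uses reliability mainly to break ties between equally distant candidates, which is in the spirit of the 0.1–0.3 range reported above.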

(This article belongs to the Section Earth Sciences)

25 pages, 26048 KB

Open Access Article

MACER-UNet: A Connected Rural Road Extraction Model Integrating Multi-Scale Perception and Edge Enhancement

by Shaoshuai Tang, Sijia Li, Xingming Zheng and Jianhua Ren

Remote Sens. 2026, 18(11), 1724; https://doi.org/10.3390/rs18111724 - 27 May 2026

Viewed by 199

Abstract

Extracting rural road networks from remote sensing images is crucial for data-driven precision agriculture planning. However, traditional semantic segmentation methods often struggle to achieve both high-precision boundary delineation and topological integrity, especially in heterogeneous rural landscapes. To address these issues, this study proposes MACER-UNet, a novel connectivity-aware road extraction model that integrates multi-scale perception and edge enhancement capabilities. Specifically, MACER-UNet employs ResNet-50 as the backbone network to extract robust deep semantic features. Within the encoder–decoder framework, an atrous spatial pyramid pooling (ASPP) module is embedded to capture rich multi-scale context cues, thereby enhancing robustness to varying road widths and inconsistent imaging conditions. During decoding, the convolutional block attention module (CBAM) recalibrates features to reduce noise from the agricultural background, and the edge enhancement module (EEM) extracts high-frequency gradient cues for geometric correction and boundary sharpening. This architecture combines spatial attention and edge constraints to balance recognition accuracy and topological connectivity. On the public WHU-CR dataset, MACER-UNet achieved an intersection over union (IoU) of 50.37% and an F1 score of 67.02%, outperforming U-Net (44.27%), DeepLabv3+ (49.43%), and D-LinkNet (49.54%), and its connectivity was comparable to recent state-of-the-art road extraction methods such as C²Net (49.37%) and CGCNet (50.34%). On a self-built dataset with a 3 m resolution in Suihua, the model achieved an IoU of 42.56% and an F1 score of 59.71%. The evaluation results confirm that MACER-UNet provides a road network with geometric consistency and topological integrity for spatial analysis in rural environments.
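For readers comparing the two reported metrics, the sketch below computes IoU and F1 for a binary road mask from the same confusion counts and shows the identity F1 = 2·IoU / (1 + IoU) that follows from them. The tiny masks and the function names are illustrative; how the paper aggregates metrics across images is not stated in the abstract, so this is only a consistency check, not a reproduction of the evaluation protocol.

```python
import numpy as np

def iou_and_f1(pred, truth):
    """Binary IoU and F1 from one pair of masks (assumes at least one positive)."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    tp = np.sum(pred & truth)    # road predicted and present
    fp = np.sum(pred & ~truth)   # road predicted but absent
    fn = np.sum(~pred & truth)   # road missed
    iou = tp / (tp + fp + fn)
    f1 = 2 * tp / (2 * tp + fp + fn)
    return float(iou), float(f1)

def f1_from_iou(iou):
    """F1 implied by IoU when both come from the same confusion counts."""
    return 2 * iou / (1 + iou)

pred = [[1, 1, 0], [0, 1, 0]]
truth = [[1, 0, 0], [0, 1, 1]]
iou, f1 = iou_and_f1(pred, truth)
```

Applied to the Suihua figures, `f1_from_iou(0.4256)` gives about 0.5971, which agrees with the reported 59.71%. For WHU-CR, `f1_from_iou(0.5037)` gives about 0.670, close to but not identical with the reported 67.02%; one possible explanation, offered here only as a guess, is per-image averaging or rounding.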

(This article belongs to the Special Issue Road Extraction and Distress Assessment by Spaceborne, Airborne and Terrestrial Platforms (Second Edition))

29 pages, 38227 KB

Open Access Article

Progressive Deep Learning for Accurate Winter Rapeseed Mapping in Complex Terrain: A Case Study of Hanzhong Basin, China

by Fang Yin, Xinjie Yu, Yao Wang and Lei Liu

Remote Sens. 2026, 18(11), 1706; https://doi.org/10.3390/rs18111706 - 25 May 2026

Viewed by 201

Abstract

Accurate mapping of winter rapeseed cultivation areas is crucial for food security assessment and agricultural resource management, yet it remains a persistent challenge in mountainous regions characterized by complex topography and highly fragmented field parcels. To address these challenges, this study develops a progressive deep learning framework using single growing-season data from the Hanzhong Basin. We conducted a structured comparison of remote sensing indices, machine learning, and deep learning approaches for rapeseed identification in heterogeneous landscapes. First, a sensitivity analysis of the Flowering Index for Rapeseed was performed to identify the optimal parameterization, yielding high inter-class separability (ND = 0.959) during peak flowering and a threshold-based overall accuracy (OA) of 94.41%. Second, a multidimensional feature space was constructed by integrating Sentinel-2 spectral bands, image texture metrics, and topographic variables; Random Forest-based feature importance selection subsequently enhanced Support Vector Machine classification performance to an OA of 90.70%. Third, we proposed a three-stage progressive UNet++ architecture: Stage 1 focuses on binary rapeseed/non-rapeseed classification to establish spatial priors; Stage 2 refines discrimination among spectrally similar vegetation classes (rapeseed and other vegetation); and Stage 3 achieves comprehensive seven-class semantic segmentation. A weighted focal loss function combined with a weight inheritance mechanism was employed to mitigate class imbalance and facilitate inter-stage knowledge transfer. The final model attained an OA of 98.65% and a mean intersection over union of 95.29%, while effectively suppressing salt-and-pepper noise artifacts in geometrically fragmented parcels. Our findings demonstrate the substantial advantages of progressive deep learning strategies for crop monitoring in topographically constrained environments.
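A weighted focal loss of the kind mentioned above can be sketched in NumPy. The abstract gives neither the exact formulation nor the hyperparameters, so the code uses the standard focal loss form, −w_c · (1 − p_t)^γ · log(p_t), with illustrative class weights and γ; these values and the function name are assumptions.

```python
import numpy as np

def weighted_focal_loss(probs, labels, class_weights, gamma=2.0, eps=1e-12):
    """Mean class-weighted focal loss.

    probs:         (n, classes) predicted class probabilities per sample.
    labels:        (n,) integer class labels.
    class_weights: (classes,) per-class weights, larger for rarer classes.
    gamma:         focusing parameter; gamma = 0 reduces to weighted cross-entropy.
    """
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels, dtype=int)
    class_weights = np.asarray(class_weights, dtype=float)

    p_t = probs[np.arange(labels.size), labels]  # probability of the true class
    w_t = class_weights[labels]                  # weight of the true class
    loss = -w_t * (1.0 - p_t) ** gamma * np.log(p_t + eps)
    return float(loss.mean())

# Two samples, two classes; both predictions are fairly confident and correct.
probs = np.array([[0.9, 0.1], [0.2, 0.8]])
labels = [0, 1]
cross_entropy = weighted_focal_loss(probs, labels, [1.0, 1.0], gamma=0.0)
focal = weighted_focal_loss(probs, labels, [1.0, 1.0], gamma=2.0)
```

Because the factor (1 − p_t)^γ shrinks the contribution of well-classified pixels, `focal` is smaller than `cross_entropy` here, leaving more of the gradient budget for hard or minority-class pixels.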

(This article belongs to the Special Issue The Emerging Trends and Applications of Big Data and Machine Learning/Artificial Intelligence (AI) in Remote Sensing II)

26 pages, 2914 KB

Open Access Review

A Review of Multimodal Image Feature Fusion Technology and Application

by Pingping Cao, Yuting Zhao, Tao Duan, Linguo Li, Chaole Xian and Shujing Li

Appl. Sci. 2026, 16(11), 5290; https://doi.org/10.3390/app16115290 - 25 May 2026

Viewed by 170

Abstract

Multimodal image fusion has emerged as a core technology for complex perception systems, such as autonomous driving, remote sensing monitoring, and medical diagnosis, by integrating complementary information from heterogeneous sensors. Given the rapid technological evolution within this field, particularly driven by the emergence of Mamba architectures, Generative Diffusion Models, and Vision Foundation Models (VFMs), traditional classification methods no longer fully encompass the ongoing paradigm shifts. Following the PRISMA guidelines to ensure the objectivity and reproducibility of the findings, this paper conducts a systematic literature review with data extraction for multimodal image feature fusion. Under this standardized framework, a five-dimensional decoupling classification architecture is proposed to deconstruct models across fusion hierarchy, backbone architecture, fusion operator, supervision paradigm, and deployment constraints. Specifically, the analysis highlights the linear computational efficiency of Mamba in long-sequence modeling, the high-fidelity reconstruction capabilities of diffusion models via generative priors, and the universal semantic alignment achieved by VFMs. Furthermore, this study summarizes qualitative and quantitative evaluation metrics alongside cross-domain public datasets for performance benchmarking, and discusses critical future directions, including cross-modal alignment in complex environments, parameter-efficient fine-tuning of large models, and real-time inference at the edge.

24 pages, 20331 KB

Open Access Article

Fine-Grained Perception and Spatial Heterogeneity Analysis of Streetscapes Within Beijing’s 5th Ring Road Based on a Multi-Task Fine-Tuning Framework

by Yuhe Hu, Haiming Qin, Nan Chen, Linhe Song, Shuo Wang and Weiqi Zhou

Sustainability 2026, 18(11), 5256; https://doi.org/10.3390/su18115256 - 23 May 2026

Viewed by 309

Abstract

Deep learning-powered Street View Imagery (SVI) analytics provides a critical mechanism for smart city perception within the framework of Sustainable Development Goal 11 (SDG 11), effectively bridging the gap left by traditional remote sensing in fine-grained street-level observation. In recent years, deep learning-based semantic segmentation of urban streetscapes has become the dominant paradigm. However, when scaling to megacity measurements, current research faces the dual bottlenecks of “computational redundancy” and the “geographical domain shift” caused by the blind application of models pre-trained on Western datasets. To address these challenges, this study is the first to systematically quantify the performance trade-off between Multi-Task Learning (MTL) and Single-Task Learning (STL) in megacity scenarios. Using this as a baseline, we constructed and validated a “low-computation, high-robustness” framework for streetscape semantic perception and spatial measurement. Relying on an integrated ResNeXt101-FPN MTL architecture and an ultra-low-cost fine-tuning strategy to overcome geographical domain shift, we extracted and analyzed the spatial heterogeneity of five core semantic elements (vegetation, sky, building, road, and vehicle) across the road network within Beijing’s 5th Ring Road. The results indicate the following: (1) We explicitly defined the computation-accuracy trade-off of MTL and STL in megacity perception. While using only one-fifth of the parameters of STL, the MTL framework achieved a 5.34-fold increase in inference speed with a negligible 0.1% loss in overall mean Intersection over Union (mIoU); however, a 27.13% decrease in boundary segmentation accuracy was observed. (2) We established a low-cost, localized correction paradigm to overcome domain shift. Fine-tuning at a minimal annotation cost (only 200 local images) improved cross-domain adaptability, boosting the overall mIoU by 8.92% and substantially mitigating the geographical domain shift problem. (3) Multi-dimensional measurement and spatial analysis revealed a significant spatial decoupling pattern in Beijing’s streetscapes. The visual proportion of vegetation exhibited a pronounced “north-high, south-low” spatial differentiation, whereas built environment elements (e.g., building and road) displayed a typical “center-periphery” concentric gradient. This objectively reflects the spatial inequality of urban street greenery resources and the monocentric development characteristics of the built environment. The proposed framework therefore serves as a low-cost, AI-driven computational paradigm for smart city perception in resource-constrained regions. Furthermore, the revealed spatial heterogeneity offers data-driven insights for formulating sustainable urban renewal policies aligned with SDG 11.

(This article belongs to the Special Issue Leveraging AI and Deep Learning for Smart Cities: Challenges, Opportunities, and Applications to Sustainable Development)

24 pages, 62422 KB

Open Access Article

GDBNet: A Three-Branch Semantic Segmentation Network Integrating CNN and Transformer for Land Cover Classification in Ski Resorts

by Zhiwei Yi, Lingjia Gu, Ruifei Zhu, Junwei Tian and He Mi

Remote Sens. 2026, 18(10), 1666; https://doi.org/10.3390/rs18101666 - 21 May 2026

Viewed by 193

Abstract

Land cover classification for ski resorts, a critical component of ice-snow tourism, is crucial to ice-snow resource management. However, datasets and methods capable of high-precision mapping for such fine-grained scenarios are currently scarce. Although Transformers with long-sequence interactions and convolutional neural networks (CNNs) have emerged as mainstream solutions, their performance remains limited on high-resolution remote sensing data characterized by small datasets and high heterogeneity. Targeting land cover classification in ski resort areas, this study proposes GDBNet, a triple-branch segmentation framework that integrates CNNs and Transformers to extract global, detail, and boundary features, and constructs LULC_SKI, the first high-resolution ski resort land cover dataset, with a resolution of 0.75 m, using the JiLin-1 satellite constellation. The framework employs a backbone combining SegFormer with dual CNN branches. SegFormer captures global semantic context, while dual ResNet-18 branches extract local semantics and edge details, respectively. The neck integrates two specialized feature interaction modules, the proposed Pixel-Guided Feature Attention (PG-AFM) and Boundary-Guided Feature Attention (BG-AFM), which fuse these heterogeneous feature representations for enhanced multi-scale modeling. For the segmentation head, a multi-task learning approach supervises both semantic and edge outputs. LULC_SKI covers seven representative ski resorts in Jilin Province, China, comprising 10,000 multi-seasonal images annotated with six land cover classes: roads, vegetation, built-up areas, ski runs, water bodies, and cropland. Experiments demonstrate that GDBNet achieves 85.44% mIoU and 91.84% mF1 on LULC_SKI, outperforming other advanced models, with particularly significant improvements for linear objects such as roads and ski runs. Extensive experimental comparisons show that GDBNet also delivers consistently excellent performance on the iSAID and LoveDA datasets, underscoring the superiority of the proposed method. Ablation studies validate the effectiveness of the triple-branch architecture, attention modules, and multi-task supervision. This work provides a modular framework for land cover classification in complex ski resort scenarios.

(This article belongs to the Special Issue Signal Processing, Image Processing and Fusion Techniques in Remote Sensing)

18 pages, 2135 KB

Open Access Article

An Efficient Remote Sensing Cross-Modal Retrieval Method Based on Hashing Contrastive Learning

by Jifei Fang and Dali Zhu

Remote Sens. 2026, 18(10), 1630; https://doi.org/10.3390/rs18101630 - 19 May 2026

Viewed by 183

Abstract

Cross-modal image–text retrieval enables the search and retrieval of semantically relevant data across heterogeneous modalities, acting as a pivotal technology for interpreting massive remote sensing (RS) data. Despite recent progress, most existing methods in remote sensing cross-modal image–text retrieval (RSCIR) rely on high-dimensional real-valued embeddings, which suffer from excessive storage overhead and slow retrieval speeds, severely limiting their scalability in real-world applications. Conversely, while hashing techniques offer efficiency, traditional methods often fail to preserve the fine-grained semantic consistency required for complex RS scenes, leading to significant performance degradation. To bridge this gap, this paper proposes a novel framework named ConHash (Cross-modal Contrastive Hashing), which transfers the discriminative power of pre-trained vision–language models into a compact binary Hamming space. Specifically, ConHash comprises three synergistic components: (1) a hash module designed to project continuous embeddings into a latent discrete space while reducing information loss; (2) a hash-aware contrastive constraint that enforces cross-modal alignment directly in the hash space; and (3) a collaborative hybrid optimization strategy that jointly constrains real-valued embeddings and hash representations. Extensive experiments on RSICD and RSITMD demonstrate that ConHash achieves a favorable balance between accuracy and efficiency. Using 512-bit hash codes with L1 quantization loss as the main configuration, ConHash achieves mR values of 21.69% and 35.79% on RSICD and RSITMD, respectively. It also provides up to 3.50× retrieval speedup and a 32× theoretical storage reduction compared with 512-dimensional float32 embeddings, making it suitable for scalable remote sensing retrieval applications.
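The 32× storage figure follows from simple arithmetic: a 512-dimensional float32 embedding takes 512 × 4 = 2048 bytes, while a 512-bit code takes 512 / 8 = 64 bytes. The sketch below checks this and shows Hamming-distance retrieval over packed codes. Sign thresholding is used here only as a stand-in for a learned hash module; it is an assumption and not the ConHash method, and the random embeddings are placeholder data.

```python
import numpy as np

def binarize(embeddings):
    """Threshold embeddings at zero and pack 8 bits into each byte."""
    bits = (np.asarray(embeddings) > 0).astype(np.uint8)
    return np.packbits(bits, axis=1)

def hamming_distances(query_code, database_codes):
    """Hamming distance from one packed code to every packed database code."""
    xor = np.bitwise_xor(database_codes, query_code)
    return np.unpackbits(xor, axis=1).sum(axis=1)

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(1000, 512)).astype(np.float32)  # placeholder data
codes = binarize(embeddings)

float_bytes = embeddings[0].nbytes  # 512 values * 4 bytes
hash_bytes = codes[0].nbytes        # 512 bits / 8 bits per byte
ratio = float_bytes / hash_bytes

distances = hamming_distances(codes[0], codes)
nearest = np.argsort(distances)[:5]  # indices of the five closest codes
```

The reported retrieval speedup is an empirical measurement and cannot be derived this way; only the storage ratio is a direct consequence of the code length.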

(This article belongs to the Special Issue Multimodal Learning for Intelligent Remote Sensing Interpretation)

24 pages, 6147 KB

Open Access Article

Multi-Scale Transformer-Based Neural Architecture Search for Hyperspectral Image Classification

by Aili Wang, Xinyu Liu and Haisong Chen

Remote Sens. 2026, 18(10), 1586; https://doi.org/10.3390/rs18101586 - 15 May 2026

Viewed by 232

Abstract

Hyperspectral image classification (HSIC) is a crucial task for remote sensing applications, requiring accurate pixel-level labeling while effectively capturing both spectral and spatial information. Traditional convolutional neural network architectures often struggle to balance local texture detail and global contextual consistency, and existing neural architecture search (NAS) methods rarely incorporate attention mechanisms, which limits their performance. To address these challenges, this study proposes a multi-scale Transformer-based NAS framework (TR-NAS) for fine-grained hyperspectral image classification. The framework combines local cube sampling, shallow and deep multi-scale convolutions, and a searchable Transformer module that adaptively selects among global, local window, and multi-scale attention operators. Lightweight enhanced convolution operators, including dual-gated (DG-Conv) and mixed depthwise (MixConv) convolutions, are incorporated to improve spectral discrimination and scale robustness. Extensive experiments on the PU and Hanchuan datasets demonstrate that TR-NAS achieves superior classification accuracy, stability, and boundary consistency compared with traditional methods and existing NAS architectures, showing improved robustness to spectral similarity and spatial heterogeneity in complex remote sensing scenes.

(This article belongs to the Special Issue Deep Learning for Multi-Sensor Remote Sensing: Advancements in Image Classification and Semantic Segmentation)

27 pages, 6893 KB

Open Access Article

LoRA-Based Deep Learning for High-Fidelity Satellite Image Super-Resolution in Big Data Remote Sensing

by Noha Rashad Mahmoud, Hussam Elbehiery, Basheer Abdel Fattah Youssef and Hanaa Bayomi Ali Mobarz

Computers 2026, 15(5), 313; https://doi.org/10.3390/computers15050313 - 14 May 2026

Viewed by 450

Abstract

High-resolution satellite imagery is pivotal for accurate analysis in remote sensing applications, including land-use monitoring, urban planning, and environmental assessment. However, obtaining such data is often costly and limited. Consequently, super-resolution techniques, including deep learning models and fine-tuning strategies such as LoRA, offer a promising alternative, although applying them remains a critical research challenge given the diversity and large scale of satellite datasets. While deep learning-based super-resolution models have recently shown great promise, their effectiveness, efficiency, and scalability across heterogeneous satellite scenes are not well studied. This work studies the performance of representative deep learning super-resolution frameworks, including the Enhanced Super-Resolution Generative Adversarial Network (ESRGAN), Swin Transformer for Image Restoration (SwinIR), and latent diffusion models (LDM), under unified experimental conditions using the WorldStrat dataset. The main goal is to establish whether parameter-efficient adaptation strategies can boost reconstruction quality while reducing computational and training costs. Toward this goal, we investigate hybrid sequential pipelines, ensemble averaging, and Low-Rank Adaptation (LoRA)-based fine-tuning. The experiments indicate that the multi-model pipelines achieve only marginal performance gains while incurring substantial increases in computational complexity. LoRA-based fine-tuning, by contrast, enhances reconstruction accuracy and quality across all model families despite using only a small percentage of trainable parameters, and it outperforms the multi-model methods in both efficiency and performance. The presented results confirm that LoRA is an effective and accessible technique for high-fidelity satellite image super-resolution. The manuscript identifies LoRA as one of the enabling technologies advancing the state of the art in deep learning-based super-resolution for large-scale satellite imagery.
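The small trainable-parameter footprint of LoRA can be seen in a single linear layer. The sketch below uses the standard LoRA formulation, in which a frozen weight W is augmented by a low-rank update (scale / rank) · B · A. The layer size, rank, and scale are illustrative assumptions and are not taken from the paper's configurations.

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, rank, scale = 256, 256, 4, 8.0  # illustrative sizes only

W = rng.normal(size=(d_out, d_in))        # frozen pretrained weight
A = rng.normal(size=(rank, d_in)) * 0.01  # trainable down-projection
B = np.zeros((d_out, rank))               # trainable up-projection, zero-initialized

def lora_forward(x):
    """Apply the frozen layer plus the scaled low-rank update."""
    return x @ W.T + (scale / rank) * (x @ A.T) @ B.T

x = rng.normal(size=(2, d_in))
y = lora_forward(x)

trainable = A.size + B.size   # parameters updated during fine-tuning
fraction = trainable / W.size # share relative to the frozen weight
```

Because B starts at zero, the adapted layer initially reproduces the pretrained output exactly, and in this configuration only 2048 of 65,536 weights (about 3.1%) would be trained.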

(This article belongs to the Special Issue Machine Learning: Techniques, Industry Applications, Code Sharing, and Future Trends)

30 pages, 2617 KB

Open Access Article

Time-Efficient Multi-Region SAR Imaging with Heterogeneous UAVs: Joint Task Assignment and Path Planning

by Deyu Song, Xiangyin Zhang, Baichuan Wang, Yalin Zhong, Yuan Yao and Kaiyu Qin

Remote Sens. 2026, 18(10), 1558; https://doi.org/10.3390/rs18101558 - 13 May 2026

Viewed by 358

Abstract

Unmanned aerial vehicles (UAVs) provide a highly flexible platform for synthetic aperture radar (SAR), enabling efficient, high-quality imaging in remote sensing applications. In realistic imaging missions, regions of interest (ROIs) usually have different sizes and spatial distributions. While deploying SAR-UAVs with heterogeneous flight and imaging capabilities can improve mission time efficiency, realizing this improvement depends critically on task assignment and path planning. This paper addresses the joint task assignment and path planning problem for heterogeneous SAR-UAVs in multi-region imaging missions. First, flight and imaging models of SAR-UAVs are established, and a constrained optimization problem is formulated to minimize the mission completion time. Then, an improved clustering strategy based on area-density and cost prediction (ADCP) is proposed to align ROI-dependent imaging workloads with heterogeneous SAR-UAV capabilities, thereby leveraging capability advantages and reducing the mission completion time. Finally, a discrete secretary bird optimization algorithm (DSBOA) is developed to generate feasible, high-quality paths. To accelerate convergence, UAV paths are encoded as waypoint sequences, and a mutation-based operator is introduced to update the population. Extensive Monte Carlo simulations show that the proposed approach consistently outperforms the baselines in mission completion time, demonstrating its effectiveness in improving time efficiency for multi-region SAR imaging missions. Ablation experiments further confirm the independent contributions of the proposed ADCP method and the DSBOA.

(This article belongs to the Special Issue Advancing UAV-Based Remote Sensing: Innovations, Techniques and Applications (Second Edition))

17 pages, 3640 KB

Open AccessCommunication

A Dual-Modal Mixture-of-Experts Attention U-Net (DMoE-AttU-Net) for Change Detection Using Heterogeneous Optical and SAR Remote Sensing Images

by Seyed Ehsan Khankeshizadeh, Ali Mohammadzadeh, Ali Jamali and Sadegh Jamali

Remote Sens. 2026, 18(10), 1508; https://doi.org/10.3390/rs18101508 - 11 May 2026

Viewed by 592

Abstract

Binary change detection (BCD) using heterogeneous optical and SAR imagery faces challenges due to modality-specific noise and the lack of adaptive fusion strategies. Existing methods often fail to suppress SAR speckle noise and accurately localize fine boundaries. This study proposes a novel deep architecture, termed Dual-Modal Mixture-of-Experts Attention U-Net (DMoE-AttU-Net), featuring (i) dual-stream encoders for modality-specific feature extraction, (ii) a mixture-of-experts (MoE) module in the SAR stream with a gating network for dynamic fusion, (iii) Squeeze-and-Excitation (SE) and spatial attention mechanisms in the decoder, and (iv) hierarchical skip connections for multi-scale fusion. Unlike existing multimodal change detection frameworks that apply uniform feature fusion, the proposed architecture introduces a modality-aware design in which the MoE mechanism is selectively applied to the SAR stream, enabling adaptive suppression of speckle noise while preserving complementary optical information. These components collectively enhance change localization and reduce noise-induced artifacts. The proposed model achieved a mean IoU of 0.855 and a kappa coefficient of 0.836 on three optical–SAR datasets, outperforming state-of-the-art methods in both accuracy and spatial consistency.

(This article belongs to the Section Remote Sensing Perspective)
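
The two scores quoted in the abstract above can be computed from a pair of binary change maps. The following is a minimal sketch, not the authors' evaluation code: the function name `binary_change_metrics` is hypothetical, and it assumes that "mean IoU" is the average of the IoU for the change and no-change classes (the abstract does not say how the mean is taken).

```python
import numpy as np


def binary_change_metrics(pred, ref):
    """Return (iou_change, iou_no_change, mean_iou, kappa) for two binary maps.

    Assumption: mean IoU is the average over the change / no-change classes.
    """
    pred = np.asarray(pred, dtype=bool).ravel()
    ref = np.asarray(ref, dtype=bool).ravel()

    # Confusion-matrix counts, treating "change" as the positive class.
    tp = int(np.sum(pred & ref))
    tn = int(np.sum(~pred & ~ref))
    fp = int(np.sum(pred & ~ref))
    fn = int(np.sum(~pred & ref))
    n = tp + tn + fp + fn

    def safe_div(a, b):
        # Return 0.0 instead of dividing by zero for degenerate inputs.
        return a / b if b else 0.0

    iou_change = safe_div(tp, tp + fp + fn)
    iou_no_change = safe_div(tn, tn + fp + fn)
    mean_iou = (iou_change + iou_no_change) / 2

    # Cohen's kappa: observed agreement corrected for chance agreement.
    observed = safe_div(tp + tn, n)
    expected = safe_div((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp), n * n)
    kappa = safe_div(observed - expected, 1 - expected)
    return iou_change, iou_no_change, mean_iou, kappa
```

For example, `binary_change_metrics([1, 1, 0, 0], [1, 0, 0, 0])` gives a change IoU of 0.5 and a kappa of 0.5.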

27 pages, 2483 KB

Open AccessReview

Estimation of Water Quality in Lakes and Rivers Using Remote Sensing and Artificial Intelligence: A Review of Image Processing and Validation Strategies

by Virgilio Zúñiga-Grajeda, Jennifer Aleysha Lomeli, Freddy Hernán Villota-González, César Alejandro García-García and Belkis Sulbarán-Rangel

Limnol. Rev. 2026, 26(2), 19; https://doi.org/10.3390/limnolrev26020019 - 10 May 2026

Viewed by 727

Abstract

Freshwater ecosystems are increasingly affected by eutrophication, sediment loading, and other anthropogenic pressures, creating a growing need for monitoring frameworks that are spatially extensive, temporally consistent, and methodologically robust. Although in situ sampling remains essential, its limited spatial coverage and operational constraints have accelerated the use of satellite remote sensing combined with artificial intelligence (AI) and machine learning (ML) for water quality assessment. This review critically examines recent studies published between 2020 and March 2026 on the estimation of physicochemical water quality parameters in lakes and rivers using remote sensing, with particular attention to the methodological structure of image processing workflows rather than performance metrics alone. The synthesis shows that predictive performance is strongly conditioned by three interrelated stages: atmospheric correction (AC), spectral feature construction, and validation design. Across the reviewed studies, substantial variation is observed in atmospheric correction processors, spectral engineering strategies, and model architectures, leading to differences in the spectral inputs and analytical conditions used for model development. Validation approaches remain highly heterogeneous and often rely on internal data splits without geographically independent testing, which weakens claims of model generalizability. In addition, few studies explicitly distinguish algorithmic, matchup, and preprocessing uncertainties, revealing a persistent gap in uncertainty reporting. Overall, the review suggests that improvements attributed to newer ML models may partly reflect upstream preprocessing choices rather than algorithmic superiority alone. Future research should prioritize transparent reporting of atmospheric correction pipelines, structured uncertainty decomposition, standardized validation protocols, and cross-site transferability assessments. 
By synthesizing these methodological patterns, this review supports improved reproducibility, comparability, and operational reliability of remote-sensing-based freshwater quality monitoring.
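
The contrast the review draws between internal data splits and geographically independent testing can be illustrated with a leave-one-site-out split. This is a minimal sketch under stated assumptions, not a protocol taken from the review: the function name `leave_one_site_out` and the site labels are hypothetical, and each sample is assumed to carry a label for the water body it came from.

```python
from collections import defaultdict


def leave_one_site_out(site_ids):
    """Yield (held_out_site, train_idx, test_idx), one fold per site.

    No site contributes samples to both sides of a fold, so the test score
    reflects transfer to an unseen location rather than interpolation
    within a known one.
    """
    by_site = defaultdict(list)
    for idx, site in enumerate(site_ids):
        by_site[site].append(idx)

    for site in sorted(by_site):
        test_idx = by_site[site]
        train_idx = sorted(
            i for other, idxs in by_site.items() if other != site for i in idxs
        )
        yield site, train_idx, test_idx


# Hypothetical site labels, one per matchup sample.
sites = ["lake_a", "lake_a", "river_b", "lake_c", "river_b"]
for held_out, train_idx, test_idx in leave_one_site_out(sites):
    print(held_out, train_idx, test_idx)
```

A random split of the same five samples could place one `lake_a` sample in training and the other in testing, which is the kind of internal split the review says weakens claims of generalizability.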

27 pages, 14664 KB

Open AccessArticle

S²AM: Dynamic Center–Surround Mechanism for Remote Sensing Salient Object Detection

by Yuzhe Sha, Zhenshan Tan, Xuejin Huo, Rui Liu, Zhanxin Luo and Xianyi Chen

Remote Sens. 2026, 18(10), 1490; https://doi.org/10.3390/rs18101490 - 9 May 2026

Viewed by 336

Abstract

Optical Remote Sensing Image Salient Object Detection (ORSI-SOD) aims to localize visually dominant regions in large-scale remote sensing scenes for applications such as disaster monitoring and urban analysis. Visual saliency fundamentally arises from contrast between a local region and its surrounding context, i.e., the center–surround mechanism. While ORSI-SOD further extends this principle, existing methods still rely on implicit or weak center bias and lack explicit modeling of center–surround spatial contrast, resulting in unstable saliency localization in complex remote sensing scenes. To address this issue, inspired by the human visual system, we propose a saliency detection framework based on SAM2 that explicitly embodies a dynamic center–surround mechanism, termed S²AM. S²AM explicitly reconstructs the saliency localization process by jointly modeling heterogeneous saliency cues, including semantic centers, surround contrast, and boundary constraints in a prompt-free manner. Specifically, we introduce a Saliency-Aware Domain Adapter (SADA) to inject saliency-sensitive activations into generic foundation features, alleviating the weak and implicit center bias inherited from SAM2. Building upon this, a Centroid-Guided Coarse Localization (CGCL) module explicitly predicts semantic centroids and constructs adaptive center–surround contrast structures, enabling robust localization under highly variable object distributions. Finally, a Structure-Constrained Saliency Location Decoder (SCLD) leverages structural cues as spatial constraints to enhance center saliency and suppress surrounding interference. Extensive experiments on the EORSSD, ORSSD, and ORSI-4199 benchmarks demonstrate that S²AM consistently outperforms state-of-the-art methods across multiple evaluation metrics, validating the effectiveness of dynamic center–surround-driven saliency modeling for challenging remote sensing scenarios.

(This article belongs to the Special Issue Advancements in Deep Learning for Object Detection and Segmentation in Remote Sensing Imagery)
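
The center–surround principle named in the abstract above, saliency as contrast between a local region and its surrounding context, can be illustrated with a fixed-window computation. This is a hand-crafted sketch for intuition only. It is not the learned, dynamic mechanism of S²AM, and the function name `center_surround_contrast` and its window radii are hypothetical choices.

```python
import numpy as np


def center_surround_contrast(image, row, col, center_radius=1, surround_radius=3):
    """Mean of a square center window minus the mean of the ring around it.

    Assumes a 2-D single-channel image and surround_radius > center_radius.
    Windows are clipped at the image border.
    """
    img = np.asarray(image, dtype=float)
    height, width = img.shape

    def window(radius):
        r0, r1 = max(row - radius, 0), min(row + radius + 1, height)
        c0, c1 = max(col - radius, 0), min(col + radius + 1, width)
        return img[r0:r1, c0:c1]

    center = window(center_radius)
    outer = window(surround_radius)

    # The surround ring is the outer window with the center window cut out.
    ring_count = outer.size - center.size
    if ring_count == 0:
        return 0.0
    ring_mean = (outer.sum() - center.sum()) / ring_count
    return float(center.mean() - ring_mean)


# A bright 3x3 patch on a dark background stands out from its surround.
scene = np.zeros((7, 7))
scene[2:5, 2:5] = 1.0
print(center_surround_contrast(scene, 3, 3))
```

A uniform image gives a contrast of zero everywhere, which matches the intuition that a region is salient only relative to its context.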

Search Results (637)
