Remote Sens., Volume 18, Issue 3 (February-1 2026) – 156 articles

Cover Story: Now Observation Assemble Horizon (NOAH) is an open-source, multi-modal sensor-fusion dataset collation framework designed for near-real-time generation of model-generated or synthetic estimates of individual Landsat bands. Its most prominent application is point-to-region synthetic data estimation. The study also trained a proof-of-concept UNet + FiLM model architecture to demonstrate the potential for filling in missing data. Other potential applications include producing hypothetical climate change scenarios, filling in data missing due to cloud cover, and training super-resolution models.
  • Issues are regarded as officially published after their release is announced to the table of contents alert mailing list.
  • You may sign up for e-mail alerts to receive table of contents of newly released issues.
  • PDF is the official format for papers, which are published in both HTML and PDF forms. To view a paper in PDF format, click on the "PDF Full-text" link and use the free Adobe Reader to open it.
23 pages, 49692 KB  
Article
SCOPE-YOLO: An Integrated Super-Resolution and Detection Framework for Power Transmission Tower Monitoring in Remote Sensing Imagery
by Dachuan Xu, Hao Wang, Shijie Li, Yuhao Ge, Yang Yang, Cheng Su, Zixuan Zhao and Shaohua Wang
Remote Sens. 2026, 18(3), 534; https://doi.org/10.3390/rs18030534 - 6 Feb 2026
Viewed by 906
Abstract
Reliable knowledge of power transmission tower locations is fundamental for large-scale inspection and asset management in modern power grids. However, in satellite and aerial remote sensing imagery, towers typically appear as small, slender structures embedded in cluttered backgrounds, which leads to frequent missed and false detections. To address this challenge, we propose SCOPE-YOLO, an integrated super-resolution-plus-detection framework tailored for scalable transmission and distribution tower monitoring. In the first stage, low-resolution image patches are enhanced by a Real-ESRGAN ×4 super-resolution frontend, which restores high-frequency lattice details and sharpens tower boundaries. The reconstructed images are then processed by SCOPE-YOLO, a YOLOv11-based detector that incorporates a Cross-Scale Feature Aggregation (CFA) module, a Gather–Distribute (GD) routing mechanism, and a high-resolution P2 detection head, together with SAT and layered inference strategies to strengthen small-object perception under complex backgrounds. Experiments on the public SRSPTD dataset demonstrate that SCOPE-YOLO improves F1 score by 0.051 and raises mAP@0.5 by 10.2 percentage points over the YOLOv11-s baseline, while maintaining a compact model size. Compared with a broad set of state-of-the-art detectors, SCOPE-YOLO achieves the best overall performance, reaching 82.8% mAP@0.5 for power tower detection. Cross-domain evaluation on the GZ-PTD test set further confirms the effectiveness of the super-resolution–detection pipeline: Real-ESRGAN×4@2048 + SCOPE-YOLO increases Recall from 0.8621 to 0.9278 and mAP@0.5 from 0.8365 to 0.9132 relative to the low-resolution baseline, substantially reducing missed detections of small and weak tower targets in real-world scenes. Full article
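A minimal sketch of the two-stage inference idea this abstract describes: super-resolve a low-resolution tile, detect on the reconstructed patch, then map boxes back to the original coordinates. The `upscale_x4` and `detect` functions are hypothetical stand-ins (not the Real-ESRGAN or SCOPE-YOLO implementations), and the simple tiling scheme is an assumption.

```python
import numpy as np

def upscale_x4(patch):
    # Stand-in for the Real-ESRGAN x4 frontend (nearest-neighbour here so the sketch runs).
    return patch.repeat(4, axis=0).repeat(4, axis=1)

def detect(image):
    # Stand-in for the SCOPE-YOLO detector; returns (x1, y1, x2, y2, score) boxes
    # in the coordinates of the super-resolved patch.
    return [(40.0, 40.0, 120.0, 120.0, 0.91)]

def sr_then_detect(lr_image, tile=256, sr_factor=4):
    """Tile a low-resolution scene, super-resolve each tile, detect towers,
    and map the boxes back to low-resolution scene coordinates."""
    boxes = []
    h, w = lr_image.shape[:2]
    for y0 in range(0, h, tile):
        for x0 in range(0, w, tile):
            hr_patch = upscale_x4(lr_image[y0:y0 + tile, x0:x0 + tile])
            for x1, y1, x2, y2, score in detect(hr_patch):
                boxes.append((x0 + x1 / sr_factor, y0 + y1 / sr_factor,
                              x0 + x2 / sr_factor, y0 + y2 / sr_factor, score))
    return boxes

print(sr_then_detect(np.zeros((512, 512), dtype=np.uint8)))
```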

24 pages, 7352 KB  
Article
Vertical Structures and Macro-Microphysical Characteristics of Southwest Vortex Precipitation over Sichuan, China
by Yanxia Liu, Jun Wen, Jiafeng Zheng and Hao Wang
Remote Sens. 2026, 18(3), 533; https://doi.org/10.3390/rs18030533 - 6 Feb 2026
Viewed by 365
Abstract
The Southwest China vortex (SWV) is a high-impact mesoscale cyclonic vortex that typically originates over Sichuan Province, China, and frequently produces hazardous rainfall. Yet systematic knowledge of the structural and microphysical properties of SWV precipitation remains insufficiently quantified. Using Global Precipitation Measurement Dual-frequency Precipitation Radar (GPM/DPR) observations from 2014 to 2022, this study investigates the vertical structure and macro- and microphysical characteristics of SWV precipitation, and quantifies their differences across life-cycle stages and precipitation types. The mature stage is characterized by higher echo tops, stronger radar reflectivity, higher strong-echo altitudes, and larger near-surface rainfall, together with a clearer melting-layer bright band and a stronger post-melting shift toward larger drops and lower number concentrations. The developing stage is weakest and shows the largest fraction of coalescence–breakup balance signatures, whereas the dissipating stage features enhanced evaporation- and breakup-related signals. Among precipitation types, deep strong convection exhibits the greatest vertical extent with enhanced ice/mixed-phase growth; stratiform precipitation produces stronger radar echoes and higher rainfall rates than deep weak convection despite similar echo-top heights; and shallow precipitation is characterized by smaller drops, higher concentrations, and active warm-rain spectral evolution. These findings provide satellite-based constraints for microphysics parameterization evaluation and improved numerical prediction of SWV-related rainfall over complex terrain. Full article
(This article belongs to the Special Issue State-of-the-Art Remote Sensing in Precipitation and Thunderstorm)

31 pages, 12517 KB  
Article
Remote Sensing Image Super-Resolution via Progressive Diffusion Schrödinger Bridge
by Shiyu Chen, Cailong Deng, Yong Zhang, Zihao Li, Tengfei Zhang and Hao Lin
Remote Sens. 2026, 18(3), 532; https://doi.org/10.3390/rs18030532 - 6 Feb 2026
Viewed by 797
Abstract
Super-resolution (SR) of remote sensing images (RSIs) is essential for advanced image analysis, yet its progress is challenged by the ill-posed nature of SR and the geometric displacement errors commonly found in paired low-resolution (LR) and high-resolution (HR) training data. These displacements violate the assumptions of Gaussian diffusion models and restrict their effectiveness, especially when the scale gap between LR and HR images is large. To address these issues, we enhance the diffusion Schrödinger bridge (DSB) to exploit its ability to construct diffusion trajectories between arbitrary distributions and develop a progressive DSB (PDSB) framework that incrementally reconstructs HR images from their LR counterparts. The method divides the overall scale change into equal intervals so that small-scale SR results are first generated and then used as conditions for larger-scale reconstruction. Experiments conducted using a dataset built from georeferenced Gaofen-6 (2 m) and Sentinel-2 (10 m) images show that PDSB outperforms the comparison methods in commonly used metrics. Notably, the FID of PDSB is 8.294, which is half that of the second-place method. These results indicate that PDSB effectively mitigates displacement issues, enhances reconstruction accuracy, and demonstrates strong robustness and generalizability for practical RSI applications. Full article
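As a rough illustration of the progressive schedule described above, the sketch below splits the overall Sentinel-2 (10 m) to Gaofen-6 (2 m) scale change into stages; the three-stage count and the equal per-stage ratio are assumptions for illustration, not values taken from the paper.

```python
# Overall scale change from Sentinel-2 (10 m) to Gaofen-6 (2 m) is x5; the number
# of stages and the equal per-stage ratio below are illustrative assumptions.
lr_gsd, hr_gsd, stages = 10.0, 2.0, 3
ratio = (lr_gsd / hr_gsd) ** (1.0 / stages)             # per-stage upscaling factor
gsds = [lr_gsd / ratio ** k for k in range(stages + 1)]
print([round(g, 2) for g in gsds])                      # each stage's output conditions the next
```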

29 pages, 14651 KB  
Article
LF-DETR: A Laplacian Frequency Enhanced DETR for Aerial RGB-Infrared Pedestrian Detection
by Herong Qi, Hui Qin, Xuanyu Xiang, Chunming Yang and Yihua Tan
Remote Sens. 2026, 18(3), 531; https://doi.org/10.3390/rs18030531 - 6 Feb 2026
Cited by 1 | Viewed by 882
Abstract
Pedestrian detection from unmanned aerial vehicles (UAVs) holds significant value in security surveillance and emergency response applications. While visible-infrared (RGB-IR) fusion technology demonstrates potential in handling complex lighting conditions through cross-modal information complementarity, current mainstream fusion mechanisms still suffer from two evident shortcomings: (1) Existing approaches insufficiently account for the significant differences in noise distribution between infrared and visible images under varying imaging conditions, leading to unstable feature representations and posing fundamental challenges to subsequent effective fusion; and (2) Existing fusion strategies lack dynamic adaptability to features from different modalities, making it difficult to fully exploit complementary key information across modalities. To address these issues, this paper proposes a novel Laplacian Frequency Enhanced DETR (LF-DETR). The core innovations are threefold: (1) A Laplacian of Gaussian feature enhancement module is designed to independently enhance features in the visible and infrared branches at the early stage of feature extraction, effectively improving the representation quality of each modality. (2) A learnable frequency-domain fusion module is constructed to achieve adaptive complementary fusion of cross-modal features. (3) A dual-domain collaborative framework is proposed to integrate the above modules within a unified DETR architecture for RGB-IR pedestrian detection. Experimental results on the public RGBTDronePerson, VTUAV-det and DVTOD datasets demonstrate that LF-DETR achieves state-of-the-art performance, with particularly significant detection gains in challenging scenarios such as nighttime and low-light conditions, validating the effectiveness and superiority of the proposed method. Full article
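A minimal sketch of a Laplacian-of-Gaussian enhancement step of the kind named above, applied independently to one modality; the sigma value and the subtractive sharpening form are illustrative assumptions rather than the paper's module.

```python
import numpy as np
from scipy.ndimage import gaussian_laplace

def log_enhance(band, sigma=1.5, weight=0.5):
    """Emphasize edges/blobs in one modality with a Laplacian-of-Gaussian response;
    sigma and the subtractive weighting are illustrative assumptions."""
    response = gaussian_laplace(band.astype(float), sigma=sigma)
    return band - weight * response  # subtracting the LoG response sharpens structures

enhanced = log_enhance(np.random.rand(128, 128))
print(enhanced.shape)
```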

23 pages, 13345 KB  
Article
Time-Series Monitoring and Mechanism Analysis of Surface Subsidence in Changchun City Using E-PS-InSAR
by Yunqi Liu, Ying Yang, Kaining Li, Di Liang, Chuanzeng Shu, Zhiguo Meng and Qing Ding
Remote Sens. 2026, 18(3), 530; https://doi.org/10.3390/rs18030530 - 6 Feb 2026
Viewed by 647
Abstract
Surface subsidence has grown to be a major geological problem for large and medium-sized cities in the context of urbanization and climate change. Changchun, a city of moderate size and rapid development, was chosen as the study region. The Enhanced Permanent Scatterer Interferometric Synthetic Aperture Radar (E-PS-InSAR) technique was applied to Sentinel-1A imagery to gather time-series surface deformation information, enabling long-term, high-precision monitoring and a mechanistic study of surface deformation in urban–rural integration areas. Temperature and land-use type data were then integrated for a thorough investigation using techniques including correlation analysis and functional fitting. The primary conclusions are as follows: (1) The E-PS-InSAR technique, integrating both PS and DS targets, significantly improves the density of monitoring points compared with traditional methods, providing more complete spatial coverage. (2) Changchun has an average annual subsidence rate of −0.14 mm/yr and an average cumulative subsidence of −0.08 mm. The largest cumulative subsidence reaches −41.31 mm, and the maximum subsidence rate is −17.27 mm/yr. (3) Surface subsidence was correlated with land use types, and cultivated land was the primary contributor to subsidence. (4) Surface subsidence exhibits distinct seasonal fluctuations, and climatic factors exert a lagged influence on it. These results are important for safe infrastructure operation, urban planning, and the timely prevention of geological hazards in mid-sized cities. Full article
(This article belongs to the Section Urban Remote Sensing)

26 pages, 8513 KB  
Article
A Sparsity-Assisted Minimum-Entropy Autofocus Algorithm for SAR Moving Target Imaging
by Xuejiao Wen, Xiaolan Qiu and Weidong Chen
Remote Sens. 2026, 18(3), 529; https://doi.org/10.3390/rs18030529 - 6 Feb 2026
Viewed by 562
Abstract
To address the slow convergence and sensitivity to a low signal-to-noise ratio (SNR) of the minimum-entropy autofocus (MEA) algorithm in the refocusing of moving targets, this paper proposes a sparsity-assisted minimum-entropy autofocus algorithm. Within the framework of the traditional gradient descent MEA with variable step size, the proposed method introduces soft-thresholding-based sparse reconstruction to make moving targets more prominent and suppress background clutter in the image domain. A joint metric combining image entropy and the Hoyer sparsity measure is then constructed, and a three-point adaptive, variable step-size search is employed to reduce the number of evaluations of the cost function, thereby effectively mitigating clutter interference and significantly accelerating the optimization while maintaining good focusing quality. Simulation and real-data experiments demonstrate that, under complex phase errors and different SNR conditions, the proposed algorithm outperforms the conventional variable-step MEA in terms of image entropy, image sparsity, and runtime, while keeping the phase error estimation accuracy within a small range. These results indicate that the proposed method can achieve satisfactory moving-target focusing performance and exhibits promising engineering applicability. Full article
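A compact sketch of the ingredients named in this abstract: soft thresholding for the sparsity-assisted step, image entropy, and the Hoyer sparsity measure. The entropy and Hoyer definitions below are the standard ones; the way they are weighted into a single joint metric (`alpha`) is an assumption, not the paper's exact formulation.

```python
import numpy as np

def soft_threshold(x, lam):
    # Soft thresholding used in the sparse reconstruction step.
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def image_entropy(img):
    # Standard image entropy over normalized intensity "probabilities".
    p = np.abs(img) ** 2
    p = p / p.sum()
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

def hoyer_sparsity(img):
    # Hoyer measure: 1 for a single nonzero pixel, 0 for a perfectly uniform image.
    x = np.abs(img).ravel()
    n = x.size
    return float((np.sqrt(n) - x.sum() / (np.linalg.norm(x) + 1e-12)) / (np.sqrt(n) - 1))

def joint_metric(img, alpha=0.5):
    # Lower entropy and higher sparsity are both "better"; alpha is an assumed weight.
    return image_entropy(img) - alpha * hoyer_sparsity(img)

img = soft_threshold(np.random.randn(64, 64), lam=1.0)
print(image_entropy(img), hoyer_sparsity(img), joint_metric(img))
```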

20 pages, 2432 KB  
Article
Potential of RGB-Derived Vegetation Indices as an Alternative to NIR-Based Vegetation Indices to Monitor Nitrogen Status in Maize
by Mohammad Mhaidat, Iván González-Pérez, José Ramón Rodríguez-Pérez, Jesús P. Val-Aguasca and Enoc Sanz-Ablanedo
Remote Sens. 2026, 18(3), 528; https://doi.org/10.3390/rs18030528 - 6 Feb 2026
Viewed by 772
Abstract
Unmanned aerial vehicles (UAVs) are increasingly used for crop monitoring, but their widespread adoption is limited since they often rely on non-standard specialized cameras equipped with near-infrared (NIR) sensors. More affordable and scalable crop monitoring solutions would be enabled, however, if data could be collected using standard RGB sensors. We compared visible-band indices that incorporate the blue spectral range (NDGBI and NDRBI) with traditional NIR-based indices (NDVI and GNDVI) for their effectiveness in monitoring maize growth and nitrogen status. UAV multispectral data captured at different maize growth stages were complemented by ground-based spectroradiometer measurements for calibration and validation. Various agronomic and yield variables (including cornstalk NO3–N content, grain yield, grain moisture, number of corncobs, and grain test weight) were recorded to link spectral responses with plant performance and nutritional status. The results show that the overall performance of the RGB-based approach was comparable to that of the NIR-based approach, with the visible-band indices proving to be highly sensitive to physiological stress, chlorophyll degradation, and nitrogen variability in maize. Our findings highlight the potential of the RGB-based indices to complement or even replace specialized NIR-based indices, providing a cost-effective, high-resolution tool for precision agriculture. Full article
(This article belongs to the Special Issue Perspectives of Remote Sensing for Precision Agriculture)
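For reference, the sketch below computes the NIR-based and visible-band indices compared in this study in their usual normalized-difference form; the band pairings assumed for NDGBI and NDRBI (green–blue and red–blue) are inferred from the index names and should be checked against the paper.

```python
def normalized_difference(a, b, eps=1e-9):
    # Generic normalized difference (a - b) / (a + b).
    return (a - b) / (a + b + eps)

# Toy reflectance values for one pixel.
blue, green, red, nir = 0.04, 0.08, 0.06, 0.45

ndvi  = normalized_difference(nir, red)     # NIR-based
gndvi = normalized_difference(nir, green)   # NIR-based
ndgbi = normalized_difference(green, blue)  # visible-band, assumed (green - blue) / (green + blue)
ndrbi = normalized_difference(red, blue)    # visible-band, assumed (red - blue) / (red + blue)
print(round(ndvi, 3), round(gndvi, 3), round(ndgbi, 3), round(ndrbi, 3))
```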

19 pages, 3571 KB  
Article
Few-Shot Class-Incremental SAR Target Recognition Based on Dynamic Task-Adaptive Classifier
by Dan Li, Feng Zhao, Yong Li and Wei Cheng
Remote Sens. 2026, 18(3), 527; https://doi.org/10.3390/rs18030527 - 6 Feb 2026
Viewed by 605
Abstract
Current synthetic aperture radar automatic target recognition (SAR ATR) tasks face challenges including limited training samples and poor generalization capability to novel classes. To address these issues, few-shot class-incremental learning (FSCIL) has emerged as a promising research direction. Few-shot learning facilitates the expedited adaptation to novel tasks utilizing a limited number of labeled samples, whereas incremental learning concentrates on the continuous refinement of the model as new categories are incorporated without eradicating previously learned knowledge. Although both methodologies present potential resolutions to the challenges of sample scarcity and class evolution in SAR target recognition, they are not without their own set of difficulties. Fine-tuning with emerging classes can perturb the feature distribution of established classes, culminating in catastrophic forgetting, while training exclusively on a handful of new samples can induce bias towards older classes, leading to distribution collapse and overfitting. To surmount these limitations and satisfy practical application requirements, we propose a Few-Shot Class-Incremental SAR Target Recognition method based on a Dynamic Task-Adaptive Classifier (DTAC). This approach underscores task adaptability through a feature extraction module, a task information encoding module, and a classifier generation module. The feature extraction module discerns both target-specific and task-specific characteristics, while the task information encoding module modulates the network parameters of the classifier generation module based on pertinent task information, thereby improving adaptability. Our innovative classifier generation module, honed with task-specific insights, dynamically assembles classifiers tailored to the current task, effectively accommodating a variety of scenarios and novel class samples. Our extensive experiments on SAR datasets demonstrate that our proposed method generally outperforms the baselines in few-shot class incremental SAR target recognition. Full article

27 pages, 2785 KB  
Article
HAFNet: Hybrid Attention Fusion Network for Remote Sensing Pansharpening
by Dan Xu, Jinyu Zhang, Wenrui Li, Xingtao Wang, Penghong Wang and Xiaopeng Fan
Remote Sens. 2026, 18(3), 526; https://doi.org/10.3390/rs18030526 - 5 Feb 2026
Viewed by 818
Abstract
Deep learning–based pansharpening methods for remote sensing have advanced rapidly in recent years. However, current methods still face three limitations that directly affect reconstruction quality. Content adaptivity is often implemented as an isolated step, which prevents effective interaction across scales and feature domains. Dynamic multi-scale mechanisms also remain constrained, since their scale selection is usually guided by global statistics and ignores regional heterogeneity. Moreover, frequency and spatial cues are commonly fused in a static manner, leading to an imbalance between global structural enhancement and local texture preservation. To address these issues, we design three complementary modules. We utilize the Adaptive Convolution Unit (ACU) to generate content-aware kernels through local feature clustering, thereby achieving fine-grained adaptation to diverse ground structures. We also develop the Multi-Scale Receptive Field Selection Unit (MSRFU), a module providing flexible scale modeling by selecting informative branches at varying receptive fields. Meanwhile, we incorporate the Frequency–Spatial Attention Unit (FSAU), designed to dynamically fuse spatial representations with frequency information. This effectively strengthens detail reconstruction while minimizing spectral distortion. Specifically, we propose the Hybrid Attention Fusion Network (HAFNet), which employs the Hybrid Attention-Driven Residual Block (HARB) as the fundamental utility to dynamically integrate the above three specialized components. This design enables dynamic content adaptivity, multi-scale responsiveness, and cross-domain feature fusion within a unified framework. Experiments on public benchmarks confirm the effectiveness of each component and demonstrate HAFNet’s state-of-the-art performance. Full article

32 pages, 8673 KB  
Article
Photogrammetric Processing of Regional ShadowCam and LROC NAC Controlled Mosaics, Evaluation of Positional Accuracies, and Scientific Applications
by William M. Collins, Seth A. Grieser, Megan R. Henriksen, Jaclyn D. Clark, Natalie F. Carr, Robert V. Wagner, Torie A. Roseborough, Steven E. Nystrom and Mark S. Robinson
Remote Sens. 2026, 18(3), 525; https://doi.org/10.3390/rs18030525 - 5 Feb 2026
Viewed by 985
Abstract
The Lunar Reconnaissance Orbiter Camera (LROC) Narrow Angle Camera (NAC) and Korea Pathfinder Lunar Orbiter (KPLO) ShadowCam provide high-resolution (0.5–2 m per pixel) images of the Moon. These high-resolution images facilitate the creation of highly detailed controlled mosaics, which can be used for applications such as regional geomorphic maps, crater size-frequency distribution analysis, and mission planning. We describe the methodology used to produce most of our LROC NAC and ShadowCam regional controlled mosaics, assess the accuracy of these mosaics, and discuss the utility of these products. The accuracy assessment comprises a comprehensive analysis of the bundle adjustment results for both the LROC NAC and ShadowCam controlled mosaics, as well as of the positional accuracy of the LROC NAC controlled mosaics, whose median positional offsets are <12 m in latitude and <5 m in longitude. Full article

32 pages, 53691 KB  
Article
Underwater SLAM and Calibration with a 3D Profiling Sonar
by António Ferreira, José Almeida, Aníbal Matos and Eduardo Silva
Remote Sens. 2026, 18(3), 524; https://doi.org/10.3390/rs18030524 - 5 Feb 2026
Viewed by 961
Abstract
High resolution underwater mapping is fundamental to the sustainable development of the blue economy, supporting offshore energy expansion, marine habitat protection, and the monitoring of both living and non-living resources. This work presents a pose-graph SLAM and calibration framework specifically designed for 3D profiling sonars, such as the Coda Octopus Echoscope 3D. The system integrates a probabilistic scan matching method (3DupIC) for direct registration of 3D sonar scans, enabling accurate trajectory and map estimation even under degraded dead reckoning conditions. Unlike other bathymetric SLAM methods that rely on submaps and assume short-term localization accuracy, the proposed approach performs direct scan-to-scan registration, removing this dependency. The factor graph is extended to represent the sonar extrinsic parameters, allowing the sonar-to-body transformation to be refined jointly with trajectory optimization. Experimental validation on a challenging real world dataset demonstrates outstanding localization and mapping performance. The use of refined extrinsic parameters further improves both accuracy and map consistency, confirming the effectiveness of the proposed joint SLAM and calibration approach for robust and consistent underwater mapping. Full article
(This article belongs to the Special Issue Underwater Remote Sensing: Status, New Challenges and Opportunities)

29 pages, 103124 KB  
Article
Enhancing Cross-Regional Generalization in UAV Forest Segmentation Across Plantation and Natural Forests with Attention-Refined PP-LiteSeg Networks
by Xinyu Ma, Shuang Zhang, Kaibo Li, Xiaorui Wang, Hong Lin and Zhenping Qiang
Remote Sens. 2026, 18(3), 523; https://doi.org/10.3390/rs18030523 - 5 Feb 2026
Viewed by 495
Abstract
Accurate fine-scale forest mapping is fundamental for ecological monitoring and resource management. While deep learning semantic segmentation methods have advanced the interpretation of high-resolution UAV imagery, their generalization across diverse forest regions remains challenging due to high spatial heterogeneity. To address this, we propose two enhanced versions based on the PP-LiteSeg architecture for robust cross-regional forest segmentation. Version 01 (V01) integrates a multi-branch attention fusion module composed of parallel channel, spatial, and pixel attention branches. This design enables fine-grained feature enhancement and precise boundary delineation in structurally regular artificial forests, such as the Huayuan Forest Farm. As a result, V01 achieves a mIoU of 92.64% and an F1-score of 96.10%, representing an approximately 18 percentage-point mIoU improvement over PSPNet and DeepLabv3+. Building on this, Version 02 (V02) introduces a lightweight residual connection that directly shortcuts the fused features, thereby improving feature stability and robustness under complex textures and illumination, and demonstrates stronger performance in naturally heterogeneous forests (Longhai Township), attaining an mIoU of 91.87% and an F1-score of 95.77% (5.72 percentage-point mIoU gain over DeepLabv3+). We further conduct comprehensive comparisons against conventional CNN baselines as well as representative lightweight and transformer-based models (BiSeNetV2 and SegFormer-B0). In bidirectional cross-region transfer (train on one region and directly test on the other), V02 exhibits the most stable performance with minimal degradation, highlighting its robustness under domain shift. On a combined cross-regional dataset, V02 achieves a leading mIoU of 91.50%, outperforming U-Net, DeepLabv3+, and PSPNet. In summary, V01 excels in boundary delineation for regular plantation forests, whereas V02 shows more stable generalization across highly varied natural forest landscapes, providing practical solutions for region-adaptive UAV forest segmentation. Full article
(This article belongs to the Special Issue Remote Sensing-Assisted Forest Inventory Planning)

22 pages, 7754 KB  
Article
CSSA: A Cross-Modal Spatial–Semantic Alignment Framework for Remote Sensing Image Captioning
by Xiao Han, Zhaoji Wu, Yunpeng Li, Xiangrong Zhang, Guanchun Wang and Biao Hou
Remote Sens. 2026, 18(3), 522; https://doi.org/10.3390/rs18030522 - 5 Feb 2026
Viewed by 684
Abstract
Remote sensing image captioning (RSIC) aims to generate natural language descriptions for a given remote sensing image, which requires a comprehensive, in-depth understanding of the image content and its summarization in sentences. Most RSIC methods extract visual features successfully, but their representations of spatial or fused features fail to fully account for cross-modal differences between remote sensing images and texts, resulting in unsatisfactory performance. Thus, we propose a novel cross-modal spatial–semantic alignment (CSSA) framework for the RSIC task, which consists of a multi-branch cross-modal contrastive learning (MCCL) mechanism and a dynamic geometry Transformer (DG-former) module. Specifically, compared with discrete text, remote sensing images are inherently noisy, which interferes with the extraction of valid visual features. Therefore, we present an MCCL mechanism to learn consistent representations between image and text, achieving cross-modal semantic alignment. In addition, most objects in remote sensing images are spatially scattered and sparse due to the overhead view. However, the Transformer structure mines the objects' relationships without considering their geometric information, leading to suboptimal capture of the spatial structure. To address this, a DG-former is designed to realize spatial alignment by introducing geometric information. We conduct experiments on three publicly available datasets (Sydney-Captions, UCM-Captions and RSICD), and the superior results demonstrate its effectiveness. Full article

27 pages, 20135 KB  
Article
Seeing Like Argus: Multi-Perspective Global–Local Context Learning for Remote Sensing Semantic Segmentation
by Hongbing Chen, Yizhe Feng, Kun Wang, Mingrui Liao, Haoting Zhai, Tian Xia, Yubo Zhang, Jianhua Jiao and Changji Wen
Remote Sens. 2026, 18(3), 521; https://doi.org/10.3390/rs18030521 - 5 Feb 2026
Viewed by 829
Abstract
Accurate semantic segmentation of high-resolution remote sensing imagery is crucial for applications such as land cover mapping, urban development monitoring, and disaster response. However, remote sensing data still present inherent challenges, including complex spatial structures, significant intra-class variability, and diverse object scales, which demand models capable of capturing rich contextual information from both local and global regions. To address these issues, we propose ArgusNet, a novel segmentation framework that enhances multi-scale representations through a series of carefully designed fusion mechanisms. At the core of ArgusNet lies the synergistic integration of Adaptive Windowed Additive Attention (AWAA) and 2D Selective Scan (SS2D). Specifically, our AWAA extends additive attention into a window-based structure with a dynamic routing mechanism, enabling multi-perspective local feature interaction via multiple global query vectors. Furthermore, we introduce a decoder optimization strategy incorporating three-stage feature fusion and a Macro Guidance Module (MGM) to improve spatial detail preservation and semantic consistency. Experiments on benchmark remote sensing datasets demonstrate that ArgusNet achieves competitive and improved segmentation performance compared to state-of-the-art methods, particularly in scenarios requiring fine-grained object delineation and robust multi-scale contextual understanding. Full article

23 pages, 20906 KB  
Article
Monitoring Heterogeneous Deformation of Transportation Infrastructure in Beijing Using Sentinel-1 InSAR Time Series
by Weizhen Lin, Xi Guo, Yidi Wang, Changyang Hu and Zhang Yunjun
Remote Sens. 2026, 18(3), 520; https://doi.org/10.3390/rs18030520 - 5 Feb 2026
Viewed by 778
Abstract
Transportation infrastructure is vulnerable to heterogeneous deformation, yet such deformation remains insufficiently monitored and characterized in metropolitan regions due to the lack of high-resolution deformation gradient products and comparison with industrial standards. Here, we generated a 45 m resolution interferometric synthetic aperture radar (InSAR) surface displacement time series across the Beijing Plain using Sentinel-1 SAR imagery acquired between 2014 and 2024, and calculated deformation gradients along all ring roads, major expressways, and airport runways. These deformation gradients are compared with national standards to evaluate their structural risks. Our analysis shows that (1) subsidence in the Beijing Plain is concentrated in the northern, eastern, and southern regions, where the northeastern region has been uplifting since 2018 due to the groundwater recovery in Beijing; (2) all ring roads, expressways, and airport runways are relatively stable during our observation period of 2015–2021, except for the central runway of Beijing Capital International Airport, which has accumulated a deformation gradient of 1.9‰ during 2015–2021, exceeding the safety limit of 1.5‰, indicating structural risks. These results demonstrate the effectiveness of high-resolution InSAR time series for monitoring deformation and pinpointing potential structural risks. Full article
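A small sketch of how a deformation gradient along a runway or road profile can be screened against the 1.5‰ limit cited above; the 45 m point spacing matches the product resolution mentioned in the abstract, but the profile values and the simple point-to-point difference are illustrative assumptions rather than the paper's exact definition.

```python
import numpy as np

def max_gradient_permille(cum_defo_mm, spacing_m):
    # Differential settlement between neighbouring points divided by their spacing,
    # in mm per m (i.e. per mille); return the largest value along the profile.
    return float((np.abs(np.diff(cum_defo_mm)) / spacing_m).max())

# Toy cumulative-deformation profile (mm, 2015-2021) sampled every 45 m along a runway.
profile = np.array([0.0, -10.0, -55.0, -140.0, -150.0])
g = max_gradient_permille(profile, spacing_m=45.0)
print(f"max gradient = {g:.2f} per mille, exceeds 1.5 per mille: {g > 1.5}")
```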

31 pages, 95642 KB  
Article
Promptable Foundation Models for SAR Remote Sensing: Adapting the Segment Anything Model for Snow Avalanche Segmentation
by Riccardo Gelato, Carlo Sgaravatti, Jakob Grahn, Giacomo Boracchi and Filippo Maria Bianchi
Remote Sens. 2026, 18(3), 519; https://doi.org/10.3390/rs18030519 - 5 Feb 2026
Viewed by 548
Abstract
Remote sensing solutions for avalanche segmentation and mapping are key to supporting risk forecasting and mitigation in mountain regions. Synthetic Aperture Radar (SAR) imagery from Sentinel-1 can be effectively used for this task, but training an effective detection model requires gathering a large dataset with high-quality annotations from domain experts, which is prohibitively time-consuming. In this work, we aim to facilitate and accelerate the annotation of SAR images for avalanche mapping. We build on the Segment Anything Model (SAM), a segmentation foundation model trained on natural images, and tailor it to Sentinel-1 SAR data. Adapting SAM to our use case requires addressing several domain-specific challenges: (1) domain mismatch, since SAM was not trained on satellite or SAR imagery; (2) input adaptation, because SAR products typically provide more than three channels while the SAM is constrained to RGB images; (3) robustness to imprecise prompts that can affect target identification and degrade the segmentation quality, an issue exacerbated in small, low-contrast avalanches; and (4) training efficiency, since standard fine-tuning is computationally demanding for the SAM. We tackle these challenges through a combination of adapters to mitigate the domain gap, multiple encoders to handle multi-channel SAR inputs, prompt-engineering strategies to improve avalanche localization accuracy, and a training algorithm that limits the training time of the encoder, which is recognized as the major bottleneck. We integrate the resulting model into a segmentation tool and show experimentally that it speeds up the annotation of SAR images. Full article
(This article belongs to the Section Environmental Remote Sensing)

25 pages, 62812 KB  
Article
From Prompts to Self-Prompts: Parameter-Efficient Multi-Label Remote Sensing via Mask-Guided Classification
by Ge Qu, Xiongwei Guan, Fei Wen and Xinyu Zou
Remote Sens. 2026, 18(3), 518; https://doi.org/10.3390/rs18030518 - 5 Feb 2026
Viewed by 516
Abstract
Multi-label remote sensing scene classification (MLRSSC) requires autonomous discovery of all relevant land-cover categories without human guidance. Conventional expert classifiers return only label vectors without spatial evidence, while foundation segmenters (e.g., SAM, RemoteSAM) remain passively dependent on external prompts—misaligned with autonomous interpretation. We introduce SAFI-XRS, a parameter-efficient self-prompted framework that transforms passive prompting into active scene parsing. By training only <2% of a 332M-parameter segmenter (∼2.4M parameters), SAFI-XRS generates class-aligned queries from images via a Semantic Query Generator (SQR), replacing external prompts with self-generated conditioning. A Mask-Guided Classifier (MGC) aggregates spatial evidence into label confidences, enabling mask-based explainability. Experiments on UCM-ML, DFC15-ML, and AID-ML show SAFI-XRS surpasses text-prompted foundation segmenters (+3.9/+3.8 mAP on balanced datasets) while achieving 6.8× parameter efficiency compared to expert models, validating a practical path toward autonomous, explainable RS scene understanding. Full article
(This article belongs to the Section AI Remote Sensing)

25 pages, 31622 KB  
Article
Frequency Domain and Gradient-Spatial Multi-Scale Swin KANsformer for Remote Sensing Scene Classification
by Xiaozhang Zhu, Junqing Huang and Haihui Wang
Remote Sens. 2026, 18(3), 517; https://doi.org/10.3390/rs18030517 - 5 Feb 2026
Viewed by 406
Abstract
Transformer-based deep learning techniques have recently shown outstanding potential in remote sensing scene classification (RSSC), benefiting from their ability to capture global semantic relationships and contextual dependencies. However, effectively utilizing the raw image and global semantic information while simultaneously taking into account detailed features and multi-scale spatial relationships remains a major challenge. Therefore, this paper proposes a novel FG-Swin KANsformer model that integrates frequency domain and gradient prior information from raw images with the Kolmogorov–Arnold Network (KAN) to enhance nonlinear feature modeling. The FG-Swin KANsformer consists of three key components: the Discrete Cosine Transform (DCT) module, the gradient-spatial feature extraction (GSFE) module, and the Swin Transformer module integrated with KAN. In the feature embedding phase, the DCT module extracts frequency domain features, while the GSFE module uses multi-scale convolutions and Sobel operators to extract spatial structures and gradient information at different scales, thereby enhancing the utilization of the original image’s frequency domain and gradient prior information. In the Swin Transformer feature modeling phase, the conventional multilayer perceptron (MLP) in Swin Transformer Blocks is replaced by KAN, which decomposes complex multivariate functions into univariate compositions, thereby improving nonlinear representation capacity and enhancing feature discrimination. The thorough experiments on three distinct public remote sensing (RS) datasets demonstrate that FG-Swin KANsformer exhibits outstanding performance. Full article
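A toy illustration of the two kinds of priors named above, frequency-domain features via a 2-D DCT and gradient information via Sobel operators; this is not the FG-Swin KANsformer's embedding stage, just the underlying operations on a single-channel image.

```python
import numpy as np
from scipy.fft import dctn
from scipy.ndimage import sobel

def frequency_and_gradient_priors(img):
    """Frequency-domain feature via a 2-D DCT and a gradient prior via Sobel operators
    (a toy version of the raw-image priors described in the abstract)."""
    freq = dctn(img, norm="ortho")                   # frequency-domain representation
    gx, gy = sobel(img, axis=1), sobel(img, axis=0)  # horizontal / vertical gradients
    return freq, np.hypot(gx, gy)                    # gradient magnitude

freq, grad = frequency_and_gradient_priors(np.random.rand(64, 64))
print(freq.shape, grad.shape)
```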

27 pages, 6439 KB  
Article
Contrastive–Transfer-Synergized Dual-Stream Transformer for Hyperspectral Anomaly Detection
by Lei Deng, Jiaju Ying, Qianghui Wang, Yue Cheng and Bing Zhou
Remote Sens. 2026, 18(3), 516; https://doi.org/10.3390/rs18030516 - 5 Feb 2026
Viewed by 629
Abstract
Hyperspectral anomaly detection (HAD) aims to identify pixels that significantly differ from the background without prior knowledge. While deep learning-based reconstruction methods have shown promise, they often suffer from limited feature representation, inefficient training cycles, and sensitivity to imbalanced data distributions. To address these challenges, this paper proposes a novel contrastive–transfer-synergized dual-stream transformer for hyperspectral anomaly detection (CTDST-HAD). The framework integrates contrastive learning and transfer learning within a dual-stream architecture, comprising a spatial stream and a spectral stream, which are pre-trained separately and synergistically fine-tuned. Specifically, the spatial stream leverages general visual and hyperspectral-view datasets with adaptive elastic weight consolidation (EWC) to mitigate catastrophic forgetting. The spectral stream employs a variational autoencoder (VAE) enhanced with the RossThick–LiSparseR (R-L) physical-kernel-driven model for spectrally realistic data augmentation. During fine-tuning, spatial and spectral features are fused for pixel-level anomaly detection, with focal loss addressing class imbalance. Extensive experiments on nine real hyperspectral datasets demonstrate that CTDST-HAD outperforms state-of-the-art methods in detection accuracy and efficiency, particularly in complex backgrounds, while maintaining competitive inference speed. Full article

30 pages, 12276 KB  
Article
Landslide Susceptibility Assessment in Zunyi City Incorporating MT-InSAR-Based Physical Constraints and Explainable Analysis
by Zirui Zhang, Qingfeng Hu, Haoran Fang, Wenkai Liu, Shoukai Chen, Qifan Wu, Peng Wang, Weiqiang Lu, Weibo Yin, Tangjing Ma and Ruimin Feng
Remote Sens. 2026, 18(3), 515; https://doi.org/10.3390/rs18030515 - 5 Feb 2026
Cited by 1 | Viewed by 519
Abstract
Landslide susceptibility maps (LSMs) are crucial for risk mitigation, but integrating Multi-temporal Interferometric Synthetic Aperture Radar (MT-InSAR) data is often hampered by a lack of physical interpretation. To address this issue, this study proposes an enhanced modeling framework that integrates multi-source monitoring data by coupling dynamic deformation features. Ground deformation velocity is obtained using MT-InSAR and embedded as dynamic physical constraints into the loss function of a Multi-Layer Perceptron (MLP) model. This approach enables the joint optimization of static geological factors and dynamic deformation characteristics in landslide susceptibility prediction. The proposed framework was applied to Zunyi City, Guizhou Province, China, utilizing an inventory of landslide hazard sites and a dataset of 16 susceptibility factors for model training and evaluation. The results demonstrated that the dynamically constrained model significantly improved predictive performance (AUC = 0.976, an increase of 0.032 compared to the baseline model), and enhanced spatial consistency, reflected by an average increase of 0.0184 in predicted susceptibility for inventoried landslide hazard sites. The framework also outperformed other conventional machine learning models across multiple evaluation metrics. Furthermore, SHAP (SHapley Additive exPlanations) analysis revealed that slope (18.68%), DEM (13.26%), rainfall (11.57%), and mining activities (8.79%) were the primary contributing factors in high-susceptibility areas. This study offers a physically interpretable and robust methodology that advances landslide risk assessment and contributes to disaster prevention strategies. Full article

30 pages, 25344 KB  
Article
PTU-Net: A Polarization-Temporal U-Net for Multi-Temporal Sentinel-1 SAR Crop Classification
by Feng Tan, Xikai Fu, Huiming Chai and Xiaolei Lv
Remote Sens. 2026, 18(3), 514; https://doi.org/10.3390/rs18030514 - 5 Feb 2026
Viewed by 389
Abstract
Accurate crop type mapping remains challenging in regions where persistent cloud cover limits the availability of optical imagery. Multi-temporal dual-polarization Sentinel-1 SAR data offer an all-weather alternative, yet existing approaches often underutilize polarization information and rely on single-scale temporal aggregation. This study proposes PTU-Net, a polarization–temporal U-Net designed specifically for pixel-wise crop segmentation from SAR time series. The model introduces a Polarization Channel Attention module to construct physically meaningful VV/VH combinations and adaptively enhance their contributions. It also incorporates a Multi-Scale Temporal Self-Attention mechanism to model pixel-level backscatter trajectories across multiple spatial resolutions. Using a 12-date Sentinel-1 stack over Kings County, California, and high-quality crop-type reference labels, the model was trained and evaluated under a spatially independent split. Results show that PTU-Net outperforms GRU, ConvLSTM, 3D U-Net, and U-Net–ConvLSTM baselines, achieving the highest overall accuracy and mean IoU among all tested models. Ablation studies confirm that both polarization enhancement and multi-scale temporal modeling contribute substantially to performance gains. These findings demonstrate that integrating polarization-aware feature construction with scale-adaptive temporal reasoning can substantially improve the effectiveness of SAR-based crop mapping, offering a promising direction for operational agricultural monitoring. Full article

24 pages, 17936 KB  
Article
Remote-Sensing Estimation of Evapotranspiration for Multiple Land Cover Types Based on an Improved Canopy Conductance Model
by Jianfeng Wang, Xiaozhou Xin, Zhiqiang Ye, Shihao Zhang, Tianci Li and Shanshan Yu
Remote Sens. 2026, 18(3), 513; https://doi.org/10.3390/rs18030513 - 5 Feb 2026
Cited by 1 | Viewed by 549
Abstract
Evapotranspiration (ET) links the water cycle with the energy balance and serves as a key driving process for ecosystem functioning and water resource management. Canopy conductance (Gc) plays a central role in regulating transpiration, but many models inadequately represent its regulatory mechanisms and show varying applicability across different land cover types. This study develops a remote-sensing ET estimation approach suitable for large scales and diverse land cover types and proposes an improved canopy conductance model for daily latent heat flux (LE) estimation. By integrating the canopy radiation transfer concept from the K95 model into the multiplicative Jarvis framework, an improved canopy conductance model is developed that includes limiting effects from photosynthetically active radiation (PAR), vapor pressure deficit (VPD), air temperature (T), and soil moisture (θ). Eighteen combinations of limiting functions are designed to evaluate structural performance differences. Using observations from 79 global flux sites during 2015–2023 and integrating multi-source datasets, including ERA5, MODIS, and SMAP, a two-stage parameter optimization was applied to determine the optimal limiting function combination for each land cover type. Nine sites representing nine different land cover types were then selected for independent spatial validation. Temporal validation within the optimization sites shows that, at the daily scale, the model achieves a Kling–Gupta efficiency (KGE) of 0.82, a correlation coefficient (R) of 0.82, and a Root Mean Square Error (RMSE) of 27.83 W/m2, demonstrating strong temporal stability. Spatial validation over independent holdout sites achieved KGE = 0.84, R = 0.84, and RMSE = 22.53 W/m2. At the 8-day scale, when evaluated over the holdout sites, the model achieves KGE = 0.87, R = 0.88, and RMSE = 18.74 W/m2. Compared with the K95 and Jarvis models, KGE increases by about 34% and 15%, while RMSE decreases by about 38% and 12%, respectively. Relative to the MOD16 and PML-V2 products, KGE increases by about 32% and 16%, while RMSE decreases by about 33% and 17%, respectively. Comprehensive comparisons show that explicitly coupling canopy structure with multiple environmental constraints within the Jarvis framework, together with structure optimization across land cover types, can markedly improve large-scale remote-sensing ET retrieval accuracy while maintaining physical consistency and physiological rationality. This provides an effective pathway and parameterization scheme for producing ET products applicable across ecosystems. Full article
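A minimal sketch of the multiplicative Jarvis form referred to above, Gc = Gc,max · f(PAR) · f(VPD) · f(T) · f(θ). The specific limiting functions below are illustrative placeholders (the study evaluates eighteen combinations), so their shapes and constants are assumptions.

```python
import numpy as np

def f_par(par, k=100.0):               # light response, saturating form (assumed)
    return par / (par + k)

def f_vpd(vpd, kd=0.5):                # vapour-pressure-deficit limitation (assumed)
    return float(np.clip(1.0 - kd * vpd, 0.05, 1.0))

def f_t(t, t_opt=25.0, width=15.0):    # temperature limitation, parabolic (assumed)
    return float(np.clip(1.0 - ((t - t_opt) / width) ** 2, 0.05, 1.0))

def f_sm(theta, wilt=0.1, crit=0.3):   # soil-moisture limitation, linear ramp (assumed)
    return float(np.clip((theta - wilt) / (crit - wilt), 0.05, 1.0))

def canopy_conductance(gc_max, par, vpd, t, theta):
    # Multiplicative Jarvis form: Gc = Gc_max * f(PAR) * f(VPD) * f(T) * f(theta)
    return gc_max * f_par(par) * f_vpd(vpd) * f_t(t) * f_sm(theta)

print(canopy_conductance(gc_max=0.02, par=800.0, vpd=1.2, t=22.0, theta=0.25))
```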

32 pages, 8099 KB  
Article
Morphometric Analysis of the Jingpo Lake Volcanic Field: A Terrestrial Analog for Lunar Lava Flow
by Haiting Yang, Teng Hu, Zhizhong Kang, Liang Gao, Lang Qin, Cheng Peng, Chenming Ye and Haoxiang Hu
Remote Sens. 2026, 18(3), 512; https://doi.org/10.3390/rs18030512 - 5 Feb 2026
Viewed by 691
Abstract
The lack of high-precision imaging data for lunar volcanic regions currently hinders the detailed characterization of lava tube systems and their associated fine-scale geomorphology. To address this information deficit, this study establishes the Jingpo Lake Volcanic Field (JLVF) in Northeast China as a primary terrestrial analog for the lunar Marius Hills complex. We systematically characterize the basaltic morphometric continuum, tracing the geological evolution from proximal scoria cones through medial lava tube skylights to distal lava plateaus. Focusing on the subsurface transport system, we identify a linear chain of discontinuous skylights that structurally mirrors the “proto-rille” stage of lunar sinuous rilles. Quantitative morphometry reveals that these terrestrial vents reproduce the geometric duality of lunar pits, ranging from stable “deep shafts” to degraded “funnel pits,” effectively validating the mechanical diversity of the lunar inventory. Critically, the “U-to-V” cross-sectional transition observed in JLVF collapse trenches serves as diagnostic ground-truth evidence, confirming that lunar rilles originate from the catastrophic roof failure of subsurface tubes rather than purely thermal erosion. Regarding the lava plateau, our field investigation resolves sub-meter micro-textures—including laminar pahoehoe ropes and inflation fissures—that are typically obscured by the resolution limits of current lunar orbiters. These findings suggest that the seemingly “smooth” lunar maria likely host complex, rugged micro-terrains. Therefore, comparing lunar volcanic regions with simulated volcanic fields from Earth is crucial. Analyzing potential volcanic products from angles undetectable by some lunar satellites can offer vital insights for future lunar exploration. Full article

23 pages, 13345 KB  
Article
Text2AIRS: Fine-Grained Airplane Image Generation in Remote Sensing from Natural Language
by Yunuo Yang, Youwei Cheng, Jinlong Hu, Yan Xia and Yu Zang
Remote Sens. 2026, 18(3), 511; https://doi.org/10.3390/rs18030511 - 5 Feb 2026
Viewed by 529
Abstract
As dynamic and critical components of remote sensing images, airplanes are among the most frequently investigated objects. Accurately identifying and monitoring airplane behaviors is crucial for effective air traffic management. However, existing methods for interpreting fine-grained airplanes in remote sensing data depend heavily on large annotated datasets, which are time-consuming to build and prone to errors due to the detailed nature of labeling individual points. In this paper, we introduce Text2AIRS, a novel method that generates fine-grained and realistic Airplane Images in Remote Sensing from textual descriptions. Text2AIRS significantly simplifies the process of generating diverse aircraft types, requiring only limited text and allowing for extensive variability in the generated images. Specifically, Text2AIRS is the first to incorporate ground sample distance into the text-to-image stable diffusion model, at both the data and feature levels. Extensive experiments demonstrate that Text2AIRS surpasses the state-of-the-art by a large margin on the Fair1M benchmark dataset. Furthermore, utilizing the fine-grained airplane images generated by Text2AIRS, the existing SOTA object detector achieves a 6.12% performance improvement, showing the practical impact of our approach. Full article

20 pages, 8410 KB  
Article
PC-YOLO: Moving Target Detection in Video SAR via YOLO on Principal Components
by Yu Han, Xinrong Wang, Jiaqing Jiang, Chao Xue, Rui Qin and Ganggang Dong
Remote Sens. 2026, 18(3), 510; https://doi.org/10.3390/rs18030510 - 5 Feb 2026
Viewed by 569
Abstract
Video synthetic aperture radar can provide more valuable information than static images. However, it suffers from several difficulties, such as strong clutter, a low signal-to-noise ratio, and variable target scale, which make moving target detection difficult. To solve these problems, this paper proposes a model- and data-co-driven learning method called look once on principal components (PC-YOLO). Unlike preceding works, we regarded the imaging scene as a combination of low-rank and sparse components: the former models the global, slowly varying background, while the latter expresses localized anomalies. These were then separated using the principal component decomposition technique to reduce the clutter while simultaneously enhancing the moving targets. The resulting principal components were then handled by an improved version of the look-once (YOLO) framework. Since the moving targets featured various scales and weak scattering coefficients, a hierarchical attention mechanism and a cross-scale feature fusion strategy were introduced to further improve the detection performance. Finally, multiple rounds of experiments were performed to verify the proposed method, with the results showing that it achieves more than a 30% improvement in mAP compared to classical methods. Full article
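The low-rank/sparse separation described above can be approximated with a simple principal-component background model. The NumPy sketch below subtracts a truncated-SVD background from a stack of registered frames so that movers remain in the residual; the rank, frame sizes, and synthetic target are assumptions for illustration, not the paper's exact decomposition.

    import numpy as np

    def lowrank_sparse_split(frames: np.ndarray, rank: int = 2):
        """Split a video SAR clip into a low-rank background and a sparse residual.

        frames: (T, H, W) array of registered intensity frames.
        rank:   number of principal components kept for the background.
        Returns (background, residual), both shaped (T, H, W).
        """
        T, H, W = frames.shape
        X = frames.reshape(T, H * W)                # one frame per row
        # Truncated SVD keeps the slowly varying clutter (low-rank part).
        U, s, Vt = np.linalg.svd(X, full_matrices=False)
        L = (U[:, :rank] * s[:rank]) @ Vt[:rank]    # low-rank background
        S = X - L                                   # residual: moving targets + noise
        return L.reshape(T, H, W), S.reshape(T, H, W)

    # Example: 16 frames of 128x128 clutter with a small simulated mover.
    rng = np.random.default_rng(0)
    clip = rng.normal(0.0, 0.1, size=(16, 128, 128)) + 1.0
    for t in range(16):
        clip[t, 60 + t, 40 + t] += 3.0              # moving point target
    background, residual = lowrank_sparse_split(clip, rank=1)
    print(residual.reshape(16, -1).max(axis=1))     # the mover stands out in the residual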

24 pages, 4274 KB  
Article
Observed Effects of Near-Surface Relative Humidity on Rainfall Microphysics During the LIAISE Field Campaign
by Francesc Polls, Joan Bech, Mireia Udina, Eric Peinó and Albert Garcia-Benadí
Remote Sens. 2026, 18(3), 509; https://doi.org/10.3390/rs18030509 - 5 Feb 2026
Viewed by 552
Abstract
This study, conducted in the framework of the LIAISE field campaign in NE Spain (May–September 2021), investigates how near-surface relative humidity influences early-stage rainfall characteristics, when precipitation is most affected by temperature and relative humidity before rainfall onset. Two instrumented sites were examined, using disdrometers, Micro Rain Radar (MRR), C-band weather radar data, and automatic weather stations. Rainfall events were first classified as stratiform or convective using weather radar data based on a texture analysis of the reflectivity field. Then, only stratiform events were selected and further classified into dry and moist categories according to the lower and upper terciles of near-surface (2 m) relative humidity at rainfall onset (dry < 54%; moist > 72%). Results show that during dry events, the time delay between the detection of precipitation at ~750 m above ground level (AGL) (by MRR or C-band radar) and its arrival at the surface (measured by the disdrometer) is consistently longer than during moist events, indicating possible evaporation of raindrops during their descent. Surface drop size distributions also differ: dry cases generally have fewer small drops (with diameters < 0.8 mm) but relatively more large drops, leading to higher radar reflectivity values despite similar surface rainfall amounts. However, reflectivity observed aloft by C-band radar and MRR does not show the dependence on relative humidity found at ground level. Findings reported here increase our understanding of the impact of low-level conditions on precipitation characteristics and associated microphysical processes and may contribute to improving correction schemes for operational weather radar quantitative precipitation estimates. Full article
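The onset-delay comparison described above can be expressed as a simple threshold-crossing difference between an aloft and a surface time series. The sketch below uses synthetic minute-resolution data and an arbitrary detection threshold; both are assumptions for illustration, not the campaign's processing chain.

    import numpy as np

    def onset_delay_minutes(aloft, surface, threshold=0.1, dt_min=1.0):
        """Delay (minutes) between the first threshold exceedance of the aloft series
        (e.g. MRR rain rate at ~750 m AGL) and of the surface series (disdrometer).
        Returns None if either series never exceeds the threshold."""
        aloft_idx = np.argmax(aloft > threshold)
        surf_idx = np.argmax(surface > threshold)
        if aloft[aloft_idx] <= threshold or surface[surf_idx] <= threshold:
            return None
        return (surf_idx - aloft_idx) * dt_min

    # Synthetic example: rain appears aloft at minute 10 and at the surface at minute 17.
    t = np.arange(60)
    aloft = np.where(t >= 10, 0.5, 0.0)
    surface = np.where(t >= 17, 0.4, 0.0)
    print(onset_delay_minutes(aloft, surface))   # 7.0 minutes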

26 pages, 3810 KB  
Article
Combining Hyperspectral Preprocessing and Feature Selection with Machine Learning for Inland Water Quality Parameter Inversion
by Jie Kong, Zhongfa Zhou, Rukai Xie, Xinyue Zhang, Rui Li and Caixia Ding
Remote Sens. 2026, 18(3), 508; https://doi.org/10.3390/rs18030508 - 5 Feb 2026
Viewed by 734
Abstract
The concentrations of carbon, nitrogen, and phosphorus in water bodies significantly influence aquatic ecological conditions. We collected multitemporal hyperspectral data and coincident water quality measurements from water bodies and, through systematic preprocessing of the hyperspectral data combined with multimethod sensitive-band selection, determined an optimal spectral feature subset. Within a machine learning framework, multiple combined remote sensing inversion models were constructed to identify the optimal inversion model for each water quality parameter, along with the corresponding preprocessing methods and sensitive bands. The results indicate that differential processing of remote sensing reflectance enhances model accuracy. Sensitive-band selection effectively eliminates redundant bands, significantly improving the computational efficiency of the inversion models. Among the 240 constructed water quality inversion models, XGBoost demonstrated superior accuracy owing to its algorithmic design. However, model accuracy is not determined solely by algorithmic complexity or predictive capability but rather by the combined effect of algorithm performance and input feature quality. Verification on an independent dataset confirmed the generalization ability of the inversion models. These findings provide valuable insights for the reliable application of hyperspectral data in aquatic environmental remote sensing and offer support for regional water quality conservation efforts. Full article
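A minimal sketch of the preprocessing-plus-inversion chain described above: first-derivative spectra, a simple correlation-based band ranking, and an XGBoost regressor. The synthetic data, band counts, and correlation criterion are assumptions for illustration, not the paper's exact pipeline.

    import numpy as np
    from xgboost import XGBRegressor

    rng = np.random.default_rng(42)
    n_samples, n_bands = 120, 200
    reflectance = rng.random((n_samples, n_bands))   # stand-in hyperspectral reflectance
    concentration = reflectance[:, 50] * 3.0 + rng.normal(0, 0.05, n_samples)  # e.g. a nutrient

    # 1) Preprocessing: first-derivative spectra emphasise absorption features.
    deriv = np.gradient(reflectance, axis=1)

    # 2) Sensitive-band selection: keep the bands most correlated with the target.
    corr = np.array([abs(np.corrcoef(deriv[:, b], concentration)[0, 1]) for b in range(n_bands)])
    sensitive = np.argsort(corr)[-20:]               # top-20 derivative bands

    # 3) Inversion model: gradient-boosted trees on the selected bands.
    model = XGBRegressor(n_estimators=300, max_depth=4, learning_rate=0.05)
    model.fit(deriv[:100, sensitive], concentration[:100])
    pred = model.predict(deriv[100:, sensitive])
    print(np.corrcoef(pred, concentration[100:])[0, 1])   # holdout correlation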

33 pages, 7494 KB  
Article
AI-Driven Wetland Mapping Across Diverse Natural Regions of Alberta, Canada, Using Combined Airborne and Satellite Remote Sensing Data
by Michael A. Merchant, Joshua Evans, Rebecca Edwards, Lyle Boychuk, John Simms, Jennifer N. Hird, Jenet Dooley, Thuy Doan, Sydney Toni, Danielle Cobbaert, Amanda Cooper, Craig Mahoney, Kristyn Mayner, Mina Nasr, Nicole Skakun, Marsha Trites-Russell and Cynthia N. McClain
Remote Sens. 2026, 18(3), 507; https://doi.org/10.3390/rs18030507 - 4 Feb 2026
Viewed by 1973
Abstract
This study evaluates the performance of artificial intelligence (AI) technologies for wetland classification in the province of Alberta, Canada, using integrated remote sensing inputs, including airborne light detection and ranging (LiDAR), orthophotography, and multi-sensor satellite imagery (Sentinel-1, Sentinel-2, PlanetScope). Our primary objective was to assess whether AI-driven modelling approaches, specifically machine learning (ML) and deep learning (DL), can meet Alberta’s provincial wetland mapping standards. We hypothesized that integrating high-resolution LiDAR with multi-seasonal optical and radar data composites into advanced AI algorithms would achieve the required classification accuracy, detail, and minimum mapping unit targets. We tested several methodologies in four ecologically distinct pilot areas representing Alberta’s Boreal, Grassland, and Parkland Natural Regions. AI models included ensemble ML using Extreme Gradient Boosting (XGBoost) and Random Forest, and a DL U-Net convolutional neural network (CNN). The models were trained on expert-labelled photoplots and validated using in situ field surveys. Our findings demonstrate that both ML and DL models met, and in several cases exceeded, the provincial mapping standards, with validation overall accuracies exceeding 70% (form), 80% (class), and 90% (wetland–upland). U-Net CNN models generally produced the highest overall accuracies and the most precise wetland extent delineation, whereas XGBoost offered finer granularity for mapping rare wetland forms. Integrating LiDAR data and derivatives further enhanced model performance, improving accuracy by as much as 13%. Based on these outcomes, we provide a set of recommendations for operational scaling, focusing on model selection, LiDAR data integration, and the continued value of field surveys to support AI-driven wetland inventory updates across Alberta’s diverse landscapes. However, key challenges remain, notably the cost of acquiring high-resolution LiDAR and satellite imagery. Full article
(This article belongs to the Special Issue Application of Remote Sensing Technology in Wetland Ecology)
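As a rough illustration of the per-pixel ML branch described above, the sketch below stacks a few LiDAR- and satellite-derived features and trains a Random Forest classifier. All feature names, values, and class labels are synthetic assumptions rather than the study's training data, and the U-Net DL branch is not shown.

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    # Hypothetical per-pixel feature stack: LiDAR canopy height, Sentinel-1 backscatter,
    # and two optical indices (values synthetic; feature choices are assumptions).
    rng = np.random.default_rng(7)
    n_pixels = 5000
    features = np.column_stack([
        rng.gamma(2.0, 1.5, n_pixels),      # canopy height model (m)
        rng.normal(-12, 3, n_pixels),       # Sentinel-1 VV backscatter (dB)
        rng.normal(-18, 3, n_pixels),       # Sentinel-1 VH backscatter (dB)
        rng.uniform(-0.2, 0.9, n_pixels),   # NDVI
        rng.uniform(-0.5, 0.6, n_pixels),   # NDWI
    ])
    labels = rng.integers(0, 5, n_pixels)   # e.g. bog, fen, marsh, swamp, upland

    clf = RandomForestClassifier(n_estimators=200, n_jobs=-1, random_state=0)
    clf.fit(features[:4000], labels[:4000])
    print(clf.score(features[4000:], labels[4000:]))  # ~chance here, since labels are random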

22 pages, 11216 KB  
Article
A Multi-Scale Remote Sensing Image Change Detection Network Based on Vision Foundation Model
by Shenbo Liu, Dongxue Zhao and Lijun Tang
Remote Sens. 2026, 18(3), 506; https://doi.org/10.3390/rs18030506 - 4 Feb 2026
Cited by 2 | Viewed by 913
Abstract
As a key technology in the intelligent interpretation of remote sensing, remote sensing image change detection aims to automatically identify surface changes from images of the same area acquired at different times. Although vision foundation models have demonstrated outstanding capabilities in image feature representation, their inherent patch-based processing and global attention mechanisms limit their effectiveness in perceiving multi-scale targets. To address this, we propose a multi-scale remote sensing image change detection network based on a vision foundation model, termed SAM-MSCD. This network integrates an efficient parameter fine-tuning strategy with a cross-temporal multi-scale feature fusion mechanism, significantly improving change perception accuracy in complex scenarios. Specifically, the Low-Rank Adaptation mechanism is adopted for parameter-efficient fine-tuning of the Segment Anything Model (SAM) image encoder, adapting it for the remote sensing change detection task. A bi-temporal feature interaction module (BIM) is designed to enhance the semantic alignment and the modeling of change relationships between feature maps from different time phases. Furthermore, a change feature enhancement module (CFEM) is proposed to fuse and highlight differential information from different levels, achieving precise capture of multi-scale changes. Comprehensive experimental results on four public remote sensing change detection datasets, namely LEVIR-CD, WHU-CD, NJDS, and MSRS-CD, demonstrate that SAM-MSCD surpasses current state-of-the-art (SOTA) methods on several key evaluation metrics, including the F1-score and Intersection over Union (IoU), indicating its broad prospects for practical application. Full article
(This article belongs to the Section AI Remote Sensing)
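Low-Rank Adaptation of a frozen encoder, as mentioned above, amounts to adding a small trainable low-rank update to each frozen linear projection. The PyTorch sketch below is a generic LoRA wrapper assumed for illustration, not the exact adapter used in SAM-MSCD.

    import torch
    import torch.nn as nn

    class LoRALinear(nn.Module):
        """Wraps a frozen nn.Linear with a trainable low-rank update: W x + (B A x) * scale."""

        def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
            super().__init__()
            self.base = base
            for p in self.base.parameters():
                p.requires_grad_(False)                   # pretrained weights stay frozen
            self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
            self.B = nn.Parameter(torch.zeros(base.out_features, rank))
            self.scale = alpha / rank

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            return self.base(x) + (x @ self.A.t() @ self.B.t()) * self.scale

    # Usage sketch: adapt one attention projection of a stand-in transformer block.
    qkv = nn.Linear(768, 768 * 3)
    qkv_lora = LoRALinear(qkv, rank=8)
    tokens = torch.randn(2, 196, 768)
    out = qkv_lora(tokens)                                # (2, 196, 2304)
    print(sum(p.numel() for p in qkv_lora.parameters() if p.requires_grad))  # only A and B train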

26 pages, 16412 KB  
Article
Unsupervised Tree Detection from UAV Imagery and 3D Point Clouds via Distance Transform-Based Circle Estimation and AIC Optimization
by Smaragda Markaki and Costas Panagiotakis
Remote Sens. 2026, 18(3), 505; https://doi.org/10.3390/rs18030505 - 4 Feb 2026
Viewed by 1061
Abstract
This work proposes a novel tree detection methodology, named DTCD (Distance Transform Circle Detection), based on a fast circle detection method via Distance Transform and Akaike Information Criterion (AIC) optimization. More specifically, a visible-band vegetation index (RGBVI) is calculated to enhance canopy regions, followed by morphological filtering to delineate individual tree crowns. The Euclidean Distance Transform is then applied, and the local maxima of the smoothed distance map are extracted as candidate tree locations. The final detections are iteratively refined using the AIC to optimize the number of trees with respect to canopy coverage efficiency. Additionally, this work introduces DTCD-PC, a modified algorithm tailored for point clouds, which significantly enhances detection accuracy in complex environments. This work makes a significant contribution to tree detection in the following ways: (1) by creating a tree detection framework entirely based on an unsupervised technique, which outperforms state-of-the-art unsupervised and supervised tree detection methods; (2) by introducing a new urban dataset, named AgiosNikolaos-3, that consists of orthomosaics and photogrammetrically reconstructed 3D point clouds, allowing the assessment of the proposed method in complex urban environments. The proposed DTCD approach was evaluated on the Acacia-6 dataset, consisting of UAV images of six-month-old Acacia trees in Southeast Asia, demonstrating superior detection performance compared to existing state-of-the-art techniques, both unsupervised and supervised. Additional experiments were conducted on the custom-developed AgiosNikolaos-3 urban dataset, confirming the robustness and generalizability of the DTCD-PC method in heterogeneous environments. Full article
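The core imagery steps described above (vegetation-index masking, Euclidean distance transform, local-maxima extraction) can be sketched with NumPy/SciPy as below. The RGBVI threshold, smoothing radius, and synthetic image are illustrative assumptions, and the AIC refinement stage is omitted.

    import numpy as np
    from scipy import ndimage

    def candidate_tree_centers(rgb: np.ndarray, rgbvi_thresh: float = 0.1, min_dist: int = 5):
        """Detect candidate tree locations from an RGB orthomosaic (H, W, 3) in [0, 1].

        Steps: RGBVI vegetation mask -> Euclidean distance transform ->
        local maxima of the smoothed distance map (AIC refinement not shown).
        """
        r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
        rgbvi = (g**2 - r * b) / (g**2 + r * b + 1e-6)      # visible-band vegetation index
        mask = rgbvi > rgbvi_thresh

        dist = ndimage.distance_transform_edt(mask)          # distance to canopy edge
        dist = ndimage.gaussian_filter(dist, sigma=1.0)

        # A pixel is a local maximum if it equals the maximum of its neighbourhood.
        neighborhood_max = ndimage.maximum_filter(dist, size=2 * min_dist + 1)
        peaks = (dist == neighborhood_max) & (dist > 0)
        return np.argwhere(peaks)                             # (N, 2) row/col candidates

    # Synthetic example: two bright-green "crowns" on a dark background.
    img = np.zeros((64, 64, 3))
    img[10:20, 10:20, 1] = 0.8
    img[40:52, 30:42, 1] = 0.8
    print(candidate_tree_centers(img))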
