MDPI - Publisher of Open Access Journals

22 pages, 5363 KiB

Open Access Article

Accurate Extraction of Rural Residential Buildings in Alpine Mountainous Areas by Combining Shadow Processing with FF-SwinT

by Guize Luan, Jinxuan Luo, Zuyu Gao and Fei Zhao

Remote Sens. 2025, 17(14), 2463; https://doi.org/10.3390/rs17142463 - 16 Jul 2025

Abstract

Precise extraction of rural settlements in alpine regions is critical for geographic data production, rural development, and spatial optimization. However, existing deep learning models are hindered by insufficient datasets and suboptimal algorithm structures, resulting in blurred boundaries and inadequate extraction accuracy. Therefore, this study uses high-resolution unmanned aerial vehicle (UAV) remote sensing images to construct a specialized dataset for the extraction of rural settlements in alpine mountainous areas, while introducing an innovative shadow mitigation technique that integrates multiple spectral characteristics. This methodology effectively addresses the challenges posed by intense shadows in settlements and environmental occlusions common in mountainous terrain analysis. Based on comparative experiments with existing deep learning models, the Swin Transformer was selected as the baseline model. Building upon this, the Feature Fusion Swin Transformer (FF-SwinT) model was constructed by optimizing the data processing, loss function, and multi-view feature fusion. Finally, we rigorously evaluated it through ablation studies, generalization tests, and large-scale image application experiments. The results show that FF-SwinT improves on the traditional Swin Transformer across many indicators, and its recognition results have clear edges and strong integrity. These results suggest that FF-SwinT establishes a novel framework for rural settlement extraction in alpine mountain regions, which is of great significance for regional spatial optimization and development policy formulation.

23 pages, 3492 KiB

Open Access Article

A Multimodal Deep Learning Framework for Accurate Biomass and Carbon Sequestration Estimation from UAV Imagery

by Furkat Safarov, Ugiloy Khojamuratova, Misirov Komoliddin, Xusinov Ibragim Ismailovich and Young Im Cho

Drones 2025, 9(7), 496; https://doi.org/10.3390/drones9070496 - 14 Jul 2025

Viewed by 88

Abstract

Accurate quantification of above-ground biomass (AGB) and carbon sequestration is vital for monitoring terrestrial ecosystem dynamics, informing climate policy, and supporting carbon neutrality initiatives. However, conventional methods, ranging from manual field surveys to remote sensing techniques based solely on 2D vegetation indices, often fail to capture the intricate spectral and structural heterogeneity of forest canopies, particularly at fine spatial resolutions. To address these limitations, we introduce ForestIQNet, a novel end-to-end multimodal deep learning framework designed to estimate AGB and associated carbon stocks from UAV-acquired imagery with high spatial fidelity. ForestIQNet combines dual-stream encoders for processing multispectral UAV imagery and a voxelized Canopy Height Model (CHM), fused via a Cross-Attentional Feature Fusion (CAFF) module, enabling fine-grained interaction between spectral reflectance and 3D structure. A lightweight Transformer-based regression head then performs multitask prediction of AGB and CO₂e, capturing long-range spatial dependencies and enhancing generalization. The proposed method achieves an R² of 0.93 and an RMSE of 6.1 kg for AGB prediction, compared to 0.78 R² and 11.7 kg RMSE for XGBoost and 0.73 R² and 13.2 kg RMSE for Random Forest. Despite its architectural complexity, ForestIQNet maintains a low inference cost (27 ms per patch) and generalizes well across species, terrain, and canopy structures. These results establish a new benchmark for UAV-enabled biomass estimation and provide scalable, interpretable tools for climate monitoring and forest management.

(This article belongs to the Special Issue UAVs for Nature Conservation Tasks in Complex Environments)
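
The ForestIQNet abstract compares models by R² and RMSE. As a reminder of what those two numbers measure, here is a minimal sketch using the standard definitions; the function names and the sample values are illustrative assumptions and are not taken from the paper.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error, in the same unit as the targets (e.g. kg)."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Hypothetical biomass values (kg) for four plots, for illustration only.
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 39.0]
print(rmse(observed, predicted), r_squared(observed, predicted))
```

A lower RMSE means smaller typical errors in kilograms, while an R² closer to 1 means the model explains more of the variance in the observed biomass.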

15 pages, 6090 KiB

Open Access Article

Automated Detection of Tailing Impoundments in Multi-Sensor High-Resolution Satellite Images Through Advanced Deep Learning Architectures

by Lin Qin and Wenyue Song

Sensors 2025, 25(14), 4387; https://doi.org/10.3390/s25144387 - 14 Jul 2025

Viewed by 164

Abstract

Accurate spatial mapping of Tailing Impoundments (TIs) is vital for environmental sustainability in mining ecosystems. While remote sensing enables large-scale monitoring, conventional methods relying on single-sensor data and traditional machine learning-based algorithms suffer from reduced accuracy in cluttered environments. This research proposes a deep learning framework leveraging multi-source high-resolution imagery to address these limitations. An upgraded You Only Look Once (YOLO) model is introduced, integrating three key innovations: a multi-scale feature aggregation layer, a lightweight hierarchical fusion mechanism, and a modified loss metric. These components enhance the model’s ability to capture spatial dependencies, optimize inference speed, and ensure stable training dynamics. A comprehensive dataset of TIs across varied terrains was constructed and expanded through affine transformations, spectral perturbations, and adversarial sample synthesis. Evaluations confirm the framework’s superior performance in complex scenarios, achieving higher precision and computational efficiency than state-of-the-art detectors.

(This article belongs to the Section Remote Sensors)

20 pages, 10558 KiB

Open Access Article

Spatial–Spectral Feature Fusion and Spectral Reconstruction of Multispectral LiDAR Point Clouds by Attention Mechanism

by Guoqing Zhou, Haoxin Qi, Shuo Shi, Sifu Bi, Xingtao Tang and Wei Gong

Remote Sens. 2025, 17(14), 2411; https://doi.org/10.3390/rs17142411 - 12 Jul 2025

Viewed by 218

Abstract

High-quality multispectral LiDAR (MSL) data are crucial for land cover (LC) classification. However, the Titan MSL system encounters challenges of inconsistent spatial–spectral information due to its unique scanning and data saving method, restricting subsequent classification accuracy. Existing spectral reconstruction methods often require empirical parameter settings and involve high computational costs, limiting automation and complicating application. To address this problem, we introduce the dual attention spectral optimization reconstruction network (DossaNet), leveraging an attention mechanism and spatial–spectral information. DossaNet can adaptively adjust weight parameters, streamline the multispectral point cloud acquisition process, and integrate it into classification models end-to-end. The experimental results show the following: (1) DossaNet exhibits excellent generalizability, effectively recovering accurate LC spectra and improving classification accuracy. Metrics across the six classification models show some improvements. (2) Compared with the method lacking spectral reconstruction, DossaNet can improve the overall accuracy (OA) and average accuracy (AA) of PointNet++ and RandLA-Net by a maximum of 4.8%, 4.47%, 5.93%, and 2.32%. Compared with the inverse distance weighted (IDW) and k-nearest neighbor (KNN) approaches, DossaNet can improve the OA and AA of PointNet++ and DGCNN by a maximum of 1.33%, 2.32%, 0.86%, and 2.08% (IDW) and 1.73%, 3.58%, 0.28%, and 2.93% (KNN). The findings further validate the effectiveness of our proposed method. This method provides a more efficient and simplified approach to enhancing the quality of multispectral point cloud data.

(This article belongs to the Special Issue Advanced Lidar Remote Sensing for Atmosphere, Vegetation, and Ocean Observations)
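
The DossaNet abstract uses inverse distance weighted (IDW) interpolation as one of its comparison baselines. The sketch below shows the generic IDW idea for filling in a spectral value at a query point from nearby points; it is not the paper's implementation, and the function name, the `power` parameter, and the sample coordinates are assumptions made for illustration.

```python
import math

def idw_value(query, neighbors, power=2.0, eps=1e-12):
    """Estimate a value at `query` from (point, value) pairs.

    Each neighbor is weighted by 1 / distance**power, so closer points
    dominate. If the query coincides with a neighbor, that neighbor's
    value is returned directly to avoid division by zero.
    """
    numerator = 0.0
    denominator = 0.0
    for point, value in neighbors:
        distance = math.dist(query, point)
        if distance < eps:
            return value
        weight = 1.0 / distance ** power
        numerator += weight * value
        denominator += weight
    return numerator / denominator

# Hypothetical 3D points with an intensity value from another channel.
neighbors = [((1.0, 0.0, 0.0), 10.0), ((2.0, 0.0, 0.0), 20.0)]
print(idw_value((0.0, 0.0, 0.0), neighbors))
```

The `power` exponent is the kind of empirical setting the abstract refers to: larger values make the estimate depend more strongly on the nearest point.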

20 pages, 10137 KiB

Open Access Article

A Multi-Feature Fusion Approach for Sea Fog Detection Under Complex Background

by Shuyuan Yang, Yuzhu Tang, Zeming Zhou, Xiaofeng Zhao, Pinglv Yang, Yangfan Hu and Ran Bo

Remote Sens. 2025, 17(14), 2409; https://doi.org/10.3390/rs17142409 - 12 Jul 2025

Viewed by 119

Abstract

Sea fog is a natural phenomenon that significantly reduces visibility, posing navigational hazards for ships and impacting coastal activities. Geostationary meteorological satellite data have proven to be indispensable for sea fog monitoring due to their large spatial coverage and spatiotemporal consistency. However, the spectral similarities between sea fog and low clouds result in omissions and misclassifications. Furthermore, high clouds obscure certain sea fog regions, leading to under-detection and high false alarm rates. In this paper, we present a novel sea fog detection method to alleviate these challenges. Specifically, the approach leverages a fusion of spectral, motion, and spatiotemporal texture consistency features to effectively differentiate sea fog and low clouds. Additionally, a multi-scale self-attention module is incorporated to recover the sea fog region obscured by clouds. Based on the spatial distribution characteristics of sea fog and clouds, we redesigned the loss function to integrate total variation loss, focal loss, and dice loss. Experimental results validate the effectiveness of the proposed method, and its detection results show a high level of consistency with the vertical feature mask produced by CALIOP.

(This article belongs to the Special Issue Observations of Atmospheric and Oceanic Processes by Remote Sensing)
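
The sea fog abstract describes a loss that integrates total variation, focal, and dice terms but does not state how they are weighted. The following is a minimal sketch of one way such a combination could look for a binary fog mask, using standard textbook forms of each term; the weights `w_tv`, `w_focal`, and `w_dice`, the `gamma` value, and all function names are assumptions, not values from the paper.

```python
import math

def dice_loss(probs, targets, smooth=1.0):
    """Soft Dice loss over flat lists of probabilities and 0/1 labels."""
    intersection = sum(p * t for p, t in zip(probs, targets))
    return 1.0 - (2.0 * intersection + smooth) / (sum(probs) + sum(targets) + smooth)

def focal_loss(probs, targets, gamma=2.0, eps=1e-7):
    """Binary focal loss: down-weights pixels that are already well classified."""
    total = 0.0
    for p, t in zip(probs, targets):
        p_t = p if t == 1 else 1.0 - p
        p_t = min(max(p_t, eps), 1.0)  # clamp to keep log() finite
        total += -((1.0 - p_t) ** gamma) * math.log(p_t)
    return total / len(probs)

def total_variation(prob_map):
    """Mean absolute difference between horizontally and vertically adjacent pixels."""
    rows, cols = len(prob_map), len(prob_map[0])
    diffs = []
    for r in range(rows):
        for c in range(cols):
            if c + 1 < cols:
                diffs.append(abs(prob_map[r][c + 1] - prob_map[r][c]))
            if r + 1 < rows:
                diffs.append(abs(prob_map[r + 1][c] - prob_map[r][c]))
    return sum(diffs) / len(diffs)

def combined_loss(prob_map, target_map, w_tv=0.1, w_focal=1.0, w_dice=1.0):
    """Weighted sum of the three terms; the weights are illustrative assumptions."""
    probs = [p for row in prob_map for p in row]
    targets = [t for row in target_map for t in row]
    return (w_tv * total_variation(prob_map)
            + w_focal * focal_loss(probs, targets)
            + w_dice * dice_loss(probs, targets))
```

In this form the total variation term penalizes speckled, fragmented predictions, the focal term concentrates on hard pixels, and the dice term counteracts the imbalance between fog and background area.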

18 pages, 4631 KiB

Open Access Article

Semantic Segmentation of Rice Fields in Sub-Meter Satellite Imagery Using an HRNet-CA-Enhanced DeepLabV3+ Framework

by Yifan Shao, Pan Pan, Hongxin Zhao, Jiale Li, Guoping Yu, Guomin Zhou and Jianhua Zhang

Remote Sens. 2025, 17(14), 2404; https://doi.org/10.3390/rs17142404 - 11 Jul 2025

Viewed by 287

Abstract

Accurate monitoring of rice-planting areas underpins food security and evidence-based farm management. Recent work has advanced along three complementary lines: multi-source data fusion (to mitigate cloud and spectral confusion), temporal feature extraction (to exploit phenology), and deep-network architecture optimization. However, even the best fusion- and time-series-based approaches still struggle to preserve fine spatial details in sub-meter scenes. Targeting this gap, we propose an HRNet-CA-enhanced DeepLabV3+ that retains the original model’s strengths while resolving its two key weaknesses: (i) detail loss caused by repeated down-sampling and feature-pyramid compression and (ii) boundary blurring due to insufficient multi-scale information fusion. The Xception backbone is replaced with a High-Resolution Network (HRNet) to maintain full-resolution feature streams through multi-resolution parallel convolutions and cross-scale interactions. A coordinate attention (CA) block is embedded in the decoder to strengthen spatially explicit context and sharpen class boundaries. We constructed a rice dataset of 23,295 images (11,295 rice + 12,000 non-rice) through preprocessing and manual labeling and benchmarked the proposed model against classical segmentation networks. Our approach boosts boundary segmentation accuracy to 92.28% mIoU and raises texture-level discrimination to 95.93% F1, without extra inference latency. Although this study focuses on architecture optimization, the HRNet-CA backbone is readily compatible with future multi-source fusion and time-series modules, offering a unified path toward operational paddy mapping in fragmented sub-meter landscapes.

(This article belongs to the Topic Advances in Smart Agriculture with Remote Sensing as the Core and Its Applications in Crops Field)

20 pages, 6074 KiB

Open Access Article

Remote Sensing Archaeology of the Xixia Imperial Tombs: Analyzing Burial Landscapes and Geomantic Layouts

by Wei Ji, Li Li, Jia Yang, Yuqi Hao and Lei Luo

Remote Sens. 2025, 17(14), 2395; https://doi.org/10.3390/rs17142395 - 11 Jul 2025

Viewed by 237

Abstract

The Xixia Imperial Tombs (XITs) represent a crucial, yet still largely mysterious, component of the Tangut civilization’s legacy. Located in northwestern China, this extensive necropolis offers invaluable insights into the Tangut state, culture, and burial practices. This study employs an integrated approach utilizing multi-resolution and multi-temporal satellite remote sensing data, including Gaofen-2 (GF-2), Landsat-8 OLI, declassified GAMBIT imagery, and Google Earth, combined with deep learning techniques, to conduct a comprehensive archaeological investigation of the XITs’ burial landscape. We performed geomorphological analysis of the surrounding environment and automated identification and mapping of burial mounds and mausoleum features using YOLOv5, complemented by manual interpretation of very-high-resolution (VHR) satellite imagery. Spectral indices and image fusion techniques were applied to enhance the detection of archaeological features. Our findings demonstrated the efficacy of this combined methodology for archaeological prospection, providing valuable insights into the spatial layout, geomantic considerations, and preservation status of the XITs. Notably, the analysis of declassified GAMBIT imagery facilitated the identification of a suspected true location for the ninth imperial tomb (M9), a significant contribution to understanding Xixia history through remote sensing archaeology. This research provides a replicable framework for the detection and preservation of archaeological sites using readily available satellite data, underscoring the power of advanced remote sensing and machine learning in heritage studies.

(This article belongs to the Special Issue Multiscale and Multitemporal High Resolution Remote Sensing for Archaeology and Heritage: From Research to Preservation)

24 pages, 3937 KiB

Open Access Article

HyperTransXNet: Learning Both Global and Local Dynamics with a Dual Dynamic Token Mixer for Hyperspectral Image Classification

by Xin Dai, Zexi Li, Lin Li, Shuihua Xue, Xiaohui Huang and Xiaofei Yang

Remote Sens. 2025, 17(14), 2361; https://doi.org/10.3390/rs17142361 - 9 Jul 2025

Viewed by 238

Abstract

Recent advances in hyperspectral image (HSI) classification have demonstrated the effectiveness of hybrid architectures that integrate convolutional neural networks (CNNs) and Transformers, leveraging CNNs for local feature extraction and Transformers for global dependency modeling. However, existing fusion approaches face three critical challenges: (1) insufficient synergy between spectral and spatial feature learning due to rigid coupling mechanisms; (2) high computational complexity resulting from redundant attention calculations; and (3) limited adaptability to spectral redundancy and noise in small-sample scenarios. To address these limitations, we propose HyperTransXNet, a novel CNN-Transformer hybrid architecture that incorporates adaptive spectral-spatial fusion. Specifically, the proposed HyperTransXNet comprises three key modules: (1) a Hybrid Spatial-Spectral Module (HSSM) that captures the refined local spectral-spatial features and models global spectral correlations by combining depth-wise dynamic convolution with frequency-domain attention; (2) a Mixture-of-Experts Routing (MoE-R) module that adaptively fuses multi-scale features by dynamically selecting optimal experts via Top-K sparse weights; and (3) a Spatial-Spectral Tokens Enhancer (SSTE) module that ensures causality-preserving interactions between spectral bands and spatial contexts. Extensive experiments on the Indian Pines, Houston 2013, and WHU-Hi-LongKou datasets demonstrate the superiority of HyperTransXNet.

(This article belongs to the Special Issue AI-Driven Hyperspectral Remote Sensing of Atmosphere and Land)
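
The HyperTransXNet abstract mentions selecting experts via Top-K sparse weights. A minimal sketch of generic Top-K gating follows: keep the K largest router scores, renormalize them with a softmax, and zero out the rest. This illustrates the general mixture-of-experts technique rather than the paper's MoE-R module; the function names and example scores are hypothetical.

```python
import math

def top_k_gate(logits, k):
    """Return one weight per expert: softmax over the k largest logits, 0 elsewhere."""
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    peak = max(logits[i] for i in top)
    exps = {i: math.exp(logits[i] - peak) for i in top}  # shift for numerical stability
    norm = sum(exps.values())
    return [exps[i] / norm if i in exps else 0.0 for i in range(len(logits))]

def mix_experts(expert_outputs, weights):
    """Weighted sum of expert feature vectors; experts with weight 0 contribute nothing."""
    dim = len(expert_outputs[0])
    return [sum(w * out[d] for w, out in zip(weights, expert_outputs))
            for d in range(dim)]

# Hypothetical router scores for four experts, keeping the best two.
weights = top_k_gate([2.0, 0.0, 2.0, -1.0], k=2)
features = mix_experts([[1.0, 1.0], [5.0, 5.0], [3.0, 3.0], [9.0, 9.0]], weights)
print(weights, features)
```

In a real network the experts with zero weight would simply not be evaluated, which is where the computational saving of sparse routing comes from.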

28 pages, 14588 KiB

Open Access Article

CAU²DNet: A Dual-Branch Deep Learning Network and a Dataset for Slum Recognition with Multi-Source Remote Sensing Data

by Xi Lyu, Chenyu Zhang, Lizhi Miao, Xiying Sun, Xinxin Zhou, Xinyi Yue, Zhongchang Sun and Yueyong Pang

Remote Sens. 2025, 17(14), 2359; https://doi.org/10.3390/rs17142359 - 9 Jul 2025

Viewed by 158

Abstract

The efficient and precise identification of urban slums is a significant challenge for urban planning and sustainable development, as their morphological diversity and complex spatial distribution make it difficult to use traditional remote sensing inversion methods. Current deep learning (DL) methods mainly face challenges such as limited receptive fields and insufficient sensitivity to spatial locations when integrating multi-source remote sensing data, and high-quality datasets that integrate multi-spectral and geoscientific indicators to support them are scarce. In response to these issues, this study proposes a DL model (coordinate-attentive U²-DeepLab network [CAU²DNet]) that integrates multi-source remote sensing data. The model integrates the multi-scale feature extraction capability of U²-Net with the global receptive field advantage of DeepLabV3+ through a dual-branch architecture. Thereafter, the spatial semantic perception capability is enhanced by introducing the CoordAttention mechanism, and ConvNextV2 is adopted to optimize the backbone network of the DeepLabV3+ branch, thereby improving the modeling capability of low-resolution geoscientific features. The two branches adopt a decision-level fusion mechanism for feature fusion, which means that the results of each are weighted and summed using learnable weights to obtain the final output feature map. Furthermore, this study constructs the São Paulo slums dataset for model training due to the lack of a multi-spectral slum dataset. This dataset covers 7978 samples of 512 × 512 pixels, integrating high-resolution RGB images, Normalized Difference Vegetation Index (NDVI)/Modified Normalized Difference Water Index (MNDWI) geoscientific indicators, and POI infrastructure data, which can significantly enrich multi-source slum remote sensing data. 
Experiments have shown that CAU²DNet achieves an intersection over union (IoU) of 0.6372 and an F1 score of 77.97% on the São Paulo slums dataset, indicating a significant improvement in accuracy over the baseline model. The ablation experiments verify that the improvements made in this study have resulted in a 16.12% increase in precision. Moreover, CAU²DNet also achieved the best results in all metrics during the cross-domain testing on the WHU building dataset, further confirming the model’s generalizability.

(This article belongs to the Special Issue Target Detection, Recognition, Tracking, and Positioning Using Remote Sensing and AI Techniques)
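
The CAU²DNet abstract describes decision-level fusion in which the two branch outputs are weighted and summed using learnable weights. The sketch below illustrates that idea on two small probability maps. Normalizing the raw weights with a softmax so they sum to 1 is an assumption made here for illustration (the abstract does not say how the weights are constrained), and all names are hypothetical.

```python
import math

def softmax2(a, b):
    """Normalize two raw (learnable) scalars into weights that sum to 1."""
    peak = max(a, b)
    ea, eb = math.exp(a - peak), math.exp(b - peak)
    return ea / (ea + eb), eb / (ea + eb)

def fuse_decisions(map_a, map_b, raw_w_a, raw_w_b):
    """Pixel-wise weighted sum of two branch probability maps."""
    w_a, w_b = softmax2(raw_w_a, raw_w_b)
    return [[w_a * a + w_b * b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(map_a, map_b)]

def to_mask(fused, threshold=0.5):
    """Binarize the fused map into a slum / non-slum mask."""
    return [[1 if p >= threshold else 0 for p in row] for row in fused]

# Hypothetical outputs of the two branches for a 1 x 2 patch.
fused = fuse_decisions([[0.9, 0.2]], [[0.5, 0.4]], raw_w_a=0.0, raw_w_b=0.0)
print(fused, to_mask(fused))
```

During training, `raw_w_a` and `raw_w_b` would be updated by gradient descent along with the rest of the network, letting the model learn how much to trust each branch.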

46 pages, 5911 KiB

Open Access Article

Leveraging Prior Knowledge in Semi-Supervised Learning for Precise Target Recognition

by Guohao Xie, Zhe Chen, Yaan Li, Mingsong Chen, Feng Chen, Yuxin Zhang, Hongyan Jiang and Hongbing Qiu

Remote Sens. 2025, 17(14), 2338; https://doi.org/10.3390/rs17142338 - 8 Jul 2025

Viewed by 263

Abstract

Underwater acoustic target recognition (UATR) is challenged by complex marine noise, scarce labeled data, and inadequate multi-scale feature extraction in conventional methods. This study proposes DART-MT, a semi-supervised framework that integrates a Dual Attention Parallel Residual Network Transformer with a mean teacher paradigm, enhanced by domain-specific prior knowledge. The architecture employs a Convolutional Block Attention Module (CBAM) for localized feature refinement, a lightweight New Transformer Encoder for global context modeling, and a novel TriFusion Block to synergize spectral–temporal–spatial features through parallel multi-branch fusion, addressing the limitations of single-modality extraction. Leveraging the mean teacher framework, DART-MT optimizes consistency regularization to exploit unlabeled data, effectively mitigating class imbalance and annotation scarcity. Evaluations on the DeepShip and ShipsEar datasets demonstrate state-of-the-art accuracy: with 10% labeled data, DART-MT achieves 96.20% (DeepShip) and 94.86% (ShipsEar), surpassing baseline models by 7.2–9.8% in low-data regimes, while reaching 98.80% (DeepShip) and 98.85% (ShipsEar) with 90% labeled data. Under varying noise conditions (−20 dB to 20 dB), the model maintained a robust performance (F1-score: 92.4–97.1%) with 40% lower variance than its competitors, and ablation studies validated each module’s contribution (TriFusion Block alone improved accuracy by 6.9%). This research advances UATR by (1) resolving multi-scale feature fusion bottlenecks, (2) demonstrating the efficacy of semi-supervised learning in marine acoustics, and (3) providing an open-source implementation for reproducibility. In future work, we will extend cross-domain adaptation to diverse oceanic environments.

(This article belongs to the Special Issue Remote Sensing Target Recognition and Detection: Theory and Applications (Second Edition))
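
The DART-MT abstract builds on the mean teacher paradigm with consistency regularization. In the generic form of that technique, the teacher's parameters are an exponential moving average (EMA) of the student's, and unlabeled samples contribute a loss that penalizes disagreement between the two models' predictions. The sketch below shows those two pieces in isolation; the decay `alpha`, the mean squared error form of the consistency term, and the function names are assumptions and may differ from the paper's choices.

```python
def ema_update(teacher_params, student_params, alpha=0.99):
    """Move each teacher parameter a small step toward the student's value."""
    return [alpha * t + (1.0 - alpha) * s
            for t, s in zip(teacher_params, student_params)]

def consistency_loss(student_probs, teacher_probs):
    """Mean squared difference between student and teacher class probabilities."""
    n = len(student_probs)
    return sum((s - t) ** 2 for s, t in zip(student_probs, teacher_probs)) / n

# Hypothetical two-parameter model and a two-class prediction on one unlabeled clip.
teacher = ema_update([1.0, 0.0], [0.0, 1.0], alpha=0.9)
loss = consistency_loss([0.8, 0.2], [0.6, 0.4])
print(teacher, loss)
```

Because the consistency term needs no labels, it can be computed on every unlabeled recording, which is how this family of methods exploits data that a purely supervised model would ignore.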

28 pages, 35973 KiB

Open Access Article

SFT-GAN: Sparse Fast Transformer Fusion Method Based on GAN for Remote Sensing Spatiotemporal Fusion

by Zhaoxu Ma, Wenxing Bao, Wei Feng, Xiaowu Zhang, Xuan Ma and Kewen Qu

Remote Sens. 2025, 17(13), 2315; https://doi.org/10.3390/rs17132315 - 5 Jul 2025

Viewed by 258

Abstract

Multi-source remote sensing spatiotemporal fusion aims to enhance the temporal continuity of high-spatial, low-temporal-resolution images. In recent years, deep learning-based spatiotemporal fusion methods have achieved significant progress in this field. However, existing methods face three major challenges. First, large differences in spatial resolution among heterogeneous remote sensing images hinder the reconstruction of high-quality texture details. Second, most current deep learning-based methods prioritize spatial information while overlooking spectral information. Third, these methods often depend on complex network architectures, resulting in high computational costs. To address these challenges, this article proposes a Sparse Fast Transformer fusion method based on a Generative Adversarial Network (SFT-GAN). First, the method introduces a multi-scale feature extraction and fusion architecture to capture temporal variation features and spatial detail features across multiple scales. A channel attention mechanism is subsequently designed to integrate these heterogeneous features adaptively. Second, two information compensation modules are introduced: a detail compensation module, which enhances high-frequency information to improve the fidelity of spatial details, and a spectral compensation module, which improves spectral fidelity by leveraging the intrinsic spectral correlation of the image. In addition, the proposed sparse fast transformer significantly reduces both the computational and memory complexity of the method. Experimental results on four publicly available benchmark datasets showed that the proposed SFT-GAN achieved the best fusion accuracy among state-of-the-art methods while reducing computational cost by approximately 70%. Additional classification experiments further validated the practical effectiveness of SFT-GAN.
Overall, this approach presents a new paradigm for balancing accuracy and efficiency in spatiotemporal fusion.

(This article belongs to the Special Issue Remote Sensing Data Fusion and Applications (2nd Edition))

20 pages, 4929 KiB

Open Access Article

Remote Sensing Image-Based Building Change Detection: A Case Study of the Qinling Mountains in China

by Lei Fu, Yunfeng Zhang, Keyun Zhao, Lulu Zhang, Ying Li, Changjing Shang and Qiang Shen

Remote Sens. 2025, 17(13), 2249; https://doi.org/10.3390/rs17132249 - 30 Jun 2025

Viewed by 269

Abstract

With the widespread application of deep learning in Earth observation, remote sensing image-based building change detection has achieved numerous groundbreaking advancements. However, differences across time periods caused by temporal variations in land cover, as well as the complex spatial structures in remote sensing scenes, significantly constrain the performance of change detection. To address these challenges, a change detection algorithm based on spatio-spectral information aggregation is proposed, which consists of two key modules: the Cross-Scale Heterogeneous Convolution module (CSHConv) and the Spatio-Spectral Information Fusion module (SSIF). CSHConv mitigates information loss caused by scale heterogeneity, thereby enhancing the effective utilization of multi-scale features. Meanwhile, SSIF models spatial and spectral information jointly, capturing interactions across different spatial scales and spectral domains. The approach is illustrated with a case study on the real-world dataset QL-CD (Qinling change detection), acquired in the Qinling region of China. The work includes the construction of QL-CD, which comprises 12,724 pairs of images captured by the Gaofen-1 satellite. Experimental results demonstrate that the proposed approach outperforms a wide range of state-of-the-art algorithms.

(This article belongs to the Special Issue Artificial Intelligence Remote Sensing for Earth Observation)

22 pages, 2999 KiB

Open Access Article

MSFNet: A Multi-Source Fusion-Based Method with Enhanced Hierarchical Spectral Semantic Perception for Wheat Disease Region Classification

by Wenxu Jia, Ziyang Guo, Wenjing Zhang, Haixi Zhang and Bin Liu

Appl. Sci. 2025, 15(13), 7317; https://doi.org/10.3390/app15137317 - 29 Jun 2025

Viewed by 210

Abstract

Wheat diseases threaten yield and food security, highlighting the need for rapid, accurate diagnosis in precision agriculture. However, current remote sensing methods often lack hierarchical spectral semantic perception or rely on single-source data and simple fusion, limiting diagnostic performance. To address these challenges, this study proposed MSFNet, a novel multi-source fusion network with enhanced hierarchical spectral semantic perception, to achieve the precise regional classification of wheat diseases. Specifically, a multi-source fusion module (MSFM) was developed, employing a dual-branch architecture to simultaneously enhance spatial–spectral semantics and comprehensively explore complementary cross-modal features, thereby enabling the effective integration of critical information from both modalities. Furthermore, a hierarchical spectral semantic fusion module (HSSFM) was developed, which employs a pyramid architecture integrated with attention mechanisms to fuse hierarchical spectral semantics, thereby significantly enhancing the model’s hierarchical feature representation capacity. To support this research, we constructed a new multispectral remote sensing dataset, MSWDD2024, tailored for wheat disease region diagnosis. Experimental evaluations on MSWDD2024 demonstrated that MSFNet achieved 95.4% accuracy, 95.6% precision, and 95.6% recall, surpassing ResNet18 by 6.0%, 6.0%, and 5.8%, respectively, and outperforming RGB-only models by over 12% across all metrics. Moreover, MSFNet consistently exceeded the performance of existing state-of-the-art methods. These results confirm the superior effectiveness of MSFNet in remote sensing-based wheat disease diagnosis, offering a promising solution for robust and accurate monitoring in precision agriculture.

31 pages, 6788 KiB

Open AccessArticle

A Novel Dual-Modal Deep Learning Network for Soil Salinization Mapping in the Keriya Oasis Using GF-3 and Sentinel-2 Imagery

by Ilyas Nurmemet, Yang Xiang, Aihepa Aihaiti, Yu Qin, Yilizhati Aili, Hengrui Tang and Ling Li

Agriculture 2025, 15(13), 1376; https://doi.org/10.3390/agriculture15131376 - 27 Jun 2025

Abstract

Soil salinization poses a significant threat to agricultural productivity, food security, and ecological sustainability in arid and semi-arid regions. Effective and timely mapping of soils with different degrees of salinization is essential for sustainable land management and ecological restoration. Although deep learning (DL) methods have been widely employed for soil salinization extraction from remote sensing (RS) data, the integration of multi-source RS data with DL methods remains challenging due to issues such as limited data availability, speckle noise, geometric distortions, and suboptimal data fusion strategies. This study focuses on the Keriya Oasis, Xinjiang, China, utilizing RS data, including Sentinel-2 multispectral and GF-3 full-polarimetric SAR (PolSAR) images, to conduct soil salinization classification. We propose a Dual-Modal deep learning network for Soil Salinization named DMSSNet, which aims to improve the mapping accuracy of salinized soils by effectively fusing spectral and polarimetric features. DMSSNet incorporates self-attention mechanisms and a Convolutional Block Attention Module (CBAM) within a hierarchical fusion framework, enabling the model to capture both intra-modal and cross-modal dependencies and to improve spatial feature representation. Polarimetric decomposition features and spectral indices are jointly exploited to characterize diverse land surface conditions. Comprehensive field surveys and expert interpretation were employed to construct a high-quality training and validation dataset. Experimental results indicate that DMSSNet achieves an overall accuracy of 92.94%, a Kappa coefficient of 79.12%, and a macro F1-score of 86.52%, outperforming conventional DL models (ResUNet, SegNet, DeepLabv3+). The results confirm the superiority of attention-guided dual-branch fusion networks for distinguishing varying degrees of soil salinization across heterogeneous landscapes and highlight the value of integrating Sentinel-2 optical and GF-3 PolSAR data for complex land surface classification tasks. Full article
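
For readers unfamiliar with the three metrics reported in this abstract (overall accuracy, Kappa coefficient, macro F1-score), the following is a minimal sketch of how they are commonly derived from a confusion matrix. It is not the authors' evaluation code; the function names and the toy 2x2 matrix are hypothetical and chosen only for illustration, and rows are assumed to be true classes and columns predicted classes.

```python
from typing import List

Matrix = List[List[int]]  # cm[i][j]: samples of true class i predicted as class j (assumed layout)


def overall_accuracy(cm: Matrix) -> float:
    """Fraction of all samples that lie on the diagonal."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total


def cohen_kappa(cm: Matrix) -> float:
    """Cohen's kappa: agreement corrected for chance agreement."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    p_observed = sum(cm[i][i] for i in range(n)) / total
    row_sums = [sum(cm[i]) for i in range(n)]
    col_sums = [sum(cm[i][j] for i in range(n)) for j in range(n)]
    p_expected = sum(row_sums[i] * col_sums[i] for i in range(n)) / (total * total)
    return (p_observed - p_expected) / (1 - p_expected)


def macro_f1(cm: Matrix) -> float:
    """Unweighted mean of the per-class F1 scores."""
    n = len(cm)
    scores = []
    for k in range(n):
        tp = cm[k][k]
        fp = sum(cm[i][k] for i in range(n)) - tp  # predicted k, true class differs
        fn = sum(cm[k]) - tp                       # true k, predicted as another class
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / n


if __name__ == "__main__":
    toy_cm = [[50, 10], [5, 35]]  # hypothetical counts, not data from the article
    print(overall_accuracy(toy_cm), cohen_kappa(toy_cm), macro_f1(toy_cm))
```

Because Kappa subtracts chance agreement, it can sit well below overall accuracy when class frequencies are unbalanced, which is one plausible reading of the gap between the 92.94% and 79.12% figures quoted above.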

(This article belongs to the Section Digital Agriculture)

24 pages, 2802 KiB

Open AccessArticle

MSDCA: A Multi-Scale Dual-Branch Network with Enhanced Cross-Attention for Hyperspectral Image Classification

by Ning Jiang, Shengling Geng, Yuhui Zheng and Le Sun

Remote Sens. 2025, 17(13), 2198; https://doi.org/10.3390/rs17132198 - 26 Jun 2025

Abstract

The high dimensionality of hyperspectral data, coupled with limited labeled samples and complex scene structures, makes spatial–spectral feature learning particularly challenging. To address these limitations, we propose a dual-branch deep learning framework named MSDCA, which performs spatial–spectral joint modeling under limited supervision. First, a multiscale 3D spatial–spectral feature extraction module (3D-SSF) employs parallel 3D convolutional branches with diverse kernel sizes and dilation rates, enabling hierarchical modeling of spatial–spectral representations from large-scale patches and effectively capturing both fine-grained textures and global context. Second, a multi-branch directional feature module (MBDFM) enhances the network’s sensitivity to directional patterns and long-range spatial relationships. It achieves this by applying axis-aware depthwise separable convolutions along both horizontal and vertical axes, thereby significantly improving the representation of spatial features. Finally, the enhanced cross-attention Transformer encoder (ECATE) integrates a dual-branch fusion strategy, where a cross-attention stream learns semantic dependencies across multi-scale tokens, and a residual path ensures the preservation of structural integrity. The fused features are further refined through lightweight channel and spatial attention modules. This adaptive alignment process enhances the discriminative power of heterogeneous spatial–spectral features. The experimental results on three widely used benchmark datasets demonstrate that the proposed method consistently outperforms state-of-the-art approaches in terms of classification accuracy and robustness. Notably, the framework is particularly effective for small-sample classes and complex boundary regions, while maintaining high computational efficiency. Full article
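
As an illustration of the idea behind axis-aware depthwise separable convolutions along the horizontal and vertical axes, here is a minimal pure-Python sketch. It is not the MBDFM module from the article: the function names, the zero padding, the summation of the two directional branches, and the final 1x1 pointwise mix are all assumptions made for this example.

```python
from typing import List

Tensor3 = List[List[List[float]]]  # feature map indexed as [channel][row][col]


def depthwise_1d(x: Tensor3, kernels: List[List[float]], axis: str) -> Tensor3:
    """Per-channel 1-D cross-correlation with zero padding.

    axis="h" slides a 1 x k kernel along each row (horizontal),
    axis="v" slides a k x 1 kernel along each column (vertical).
    kernels[c] is the odd-length kernel used for channel c only.
    """
    C, H, W = len(x), len(x[0]), len(x[0][0])
    out = [[[0.0] * W for _ in range(H)] for _ in range(C)]
    for c in range(C):
        k = len(kernels[c])
        r = k // 2
        for i in range(H):
            for j in range(W):
                acc = 0.0
                for t in range(k):
                    ii, jj = (i, j + t - r) if axis == "h" else (i + t - r, j)
                    if 0 <= ii < H and 0 <= jj < W:  # out-of-range taps count as zero
                        acc += kernels[c][t] * x[c][ii][jj]
                out[c][i][j] = acc
    return out


def pointwise(x: Tensor3, weights: List[List[float]]) -> Tensor3:
    """1x1 convolution: weights[o][c] mixes input channel c into output channel o."""
    C, H, W = len(x), len(x[0]), len(x[0][0])
    return [[[sum(weights[o][c] * x[c][i][j] for c in range(C))
              for j in range(W)] for i in range(H)] for o in range(len(weights))]


def axis_aware_block(x: Tensor3, k_h: List[List[float]], k_v: List[List[float]],
                     w_point: List[List[float]]) -> Tensor3:
    """Sum the horizontal and vertical depthwise branches, then mix channels."""
    h = depthwise_1d(x, k_h, "h")
    v = depthwise_1d(x, k_v, "v")
    C, H, W = len(x), len(x[0]), len(x[0][0])
    summed = [[[h[c][i][j] + v[c][i][j] for j in range(W)]
               for i in range(H)] for c in range(C)]
    return pointwise(summed, w_point)
```

Splitting a k x k kernel into separate 1 x k and k x 1 depthwise passes keeps the parameter count per channel at roughly 2k instead of k squared, which is one common motivation for this kind of design.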

Search Results (396)
