Search Results (163)

Search Parameters:
Keywords = spectral information encoding

23 pages, 3492 KiB  
Article
A Multimodal Deep Learning Framework for Accurate Biomass and Carbon Sequestration Estimation from UAV Imagery
by Furkat Safarov, Ugiloy Khojamuratova, Misirov Komoliddin, Xusinov Ibragim Ismailovich and Young Im Cho
Drones 2025, 9(7), 496; https://doi.org/10.3390/drones9070496 - 14 Jul 2025
Viewed by 88
Abstract
Accurate quantification of above-ground biomass (AGB) and carbon sequestration is vital for monitoring terrestrial ecosystem dynamics, informing climate policy, and supporting carbon neutrality initiatives. However, conventional methods—ranging from manual field surveys to remote sensing techniques based solely on 2D vegetation indices—often fail to capture the intricate spectral and structural heterogeneity of forest canopies, particularly at fine spatial resolutions. To address these limitations, we introduce ForestIQNet, a novel end-to-end multimodal deep learning framework designed to estimate AGB and associated carbon stocks from UAV-acquired imagery with high spatial fidelity. ForestIQNet combines dual-stream encoders for processing multispectral UAV imagery and a voxelized Canopy Height Model (CHM), fused via a Cross-Attentional Feature Fusion (CAFF) module, enabling fine-grained interaction between spectral reflectance and 3D structure. A lightweight Transformer-based regression head then performs multitask prediction of AGB and CO₂e, capturing long-range spatial dependencies and enhancing generalization. The proposed method achieves an R² of 0.93 and an RMSE of 6.1 kg for AGB prediction, compared to an R² of 0.78 and RMSE of 11.7 kg for XGBoost, and an R² of 0.73 and RMSE of 13.2 kg for Random Forest. Despite its architectural complexity, ForestIQNet maintains a low inference cost (27 ms per patch) and generalizes well across species, terrain, and canopy structures. These results establish a new benchmark for UAV-enabled biomass estimation and provide scalable, interpretable tools for climate monitoring and forest management.
(This article belongs to the Special Issue UAVs for Nature Conservation Tasks in Complex Environments)
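The abstract leaves the CAFF module's internals unspecified; below is a minimal, hedged sketch of cross-attentional fusion between spectral tokens and CHM-derived structural tokens, built on standard PyTorch multi-head attention. Module names, dimensions, and token counts are illustrative assumptions, not details from the paper.

```python
# Minimal sketch of cross-attentional feature fusion between a spectral
# stream and a structural (CHM) stream; names and dimensions are
# illustrative assumptions, not taken from the ForestIQNet paper.
import torch
import torch.nn as nn

class CrossAttentionFusion(nn.Module):
    def __init__(self, dim=256, heads=8):
        super().__init__()
        # Spectral tokens attend to structural tokens and vice versa.
        self.spec_to_struct = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.struct_to_spec = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, spec_tokens, struct_tokens):
        # spec_tokens:   (B, N_spec, dim) from the multispectral encoder
        # struct_tokens: (B, N_chm, dim)  from the voxelized CHM encoder
        s, _ = self.spec_to_struct(spec_tokens, struct_tokens, struct_tokens)
        c, _ = self.struct_to_spec(struct_tokens, spec_tokens, spec_tokens)
        # Concatenate the two cross-attended streams for the regression head.
        fused = torch.cat([self.norm(spec_tokens + s),
                           self.norm(struct_tokens + c)], dim=1)
        return fused  # (B, N_spec + N_chm, dim)

fused = CrossAttentionFusion()(torch.randn(2, 64, 256), torch.randn(2, 32, 256))
```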

30 pages, 5053 KiB  
Article
Dual-Branch Spatial–Spectral Transformer with Similarity Propagation for Hyperspectral Image Classification
by Teng Wen, Heng Wang and Liguo Wang
Remote Sens. 2025, 17(14), 2386; https://doi.org/10.3390/rs17142386 - 10 Jul 2025
Viewed by 274
Abstract
In recent years, Vision Transformers (ViTs) have gained significant traction in hyperspectral image classification due to their strength in modeling long-range dependencies between spectral bands and spatial pixels. However, when multiple Transformer encoders are stacked, information degradation can emerge during forward propagation: existing Transformer-based methods have limited ability to retain and effectively exploit information as it passes through the network. To tackle these challenges, this paper proposes a novel dual-branch spatial–spectral Transformer model that incorporates similarity propagation (DBSSFormer-SP). Specifically, the model first employs a Hybrid Pooling Spatial Channel Attention (HPSCA) module to integrate global information by pooling across different dimensional directions, thereby enhancing its ability to extract salient features. Secondly, we introduce a similarity-attention transfer mechanism that retains and strengthens key semantic features, mitigating information degradation. Additionally, a Spectral Transformer (SpecFormer) module captures long-range dependencies among spectral bands. Finally, the extracted spatial and spectral features are fed into a multilayer perceptron (MLP) module for classification. The proposed method is evaluated against several mainstream approaches on four public datasets. Experimental results demonstrate that DBSSFormer-SP exhibits excellent classification performance.
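The abstract does not describe HPSCA's internals; below is a hedged sketch of channel attention driven by pooling over the spatial dimensions, following the common avg+max pattern. The layer sizes and exact pooling directions are assumptions, not details from the paper.

```python
# Hedged sketch of pooling-driven channel attention in the spirit of the
# HPSCA module; the actual design is not specified in the abstract.
import torch
import torch.nn as nn

class HybridPoolingChannelAttention(nn.Module):
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )

    def forward(self, x):              # x: (B, C, H, W) hyperspectral feature map
        avg = x.mean(dim=(2, 3))       # global average pooling over space
        mx = x.amax(dim=(2, 3))        # global max pooling over space
        gate = torch.sigmoid(self.mlp(avg) + self.mlp(mx))  # (B, C) channel gate
        return x * gate[:, :, None, None]  # reweight channels

out = HybridPoolingChannelAttention(64)(torch.randn(2, 64, 9, 9))
```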

20 pages, 2387 KiB  
Article
Contrastive Learning-Based Hyperspectral Image Target Detection Using a Gated Dual-Path Network
by Jiake Wu, Rong Liu and Nan Wang
Remote Sens. 2025, 17(14), 2345; https://doi.org/10.3390/rs17142345 - 9 Jul 2025
Viewed by 261
Abstract
Deep learning-based hyperspectral target detection (HTD) methods often face the challenge of insufficient prior information and difficulty in distinguishing local and global spectral differences. To address these problems, we propose a self-supervised framework, the Gated Dual-Path Network with Contrastive Learning (GDPNCL), which leverages contrastive learning to reduce dependence on prior knowledge. In this work, we introduce a novel sample augmentation strategy for deep network training, in which each pixel in the scene is processed using a dual concentric window to generate positive and negative samples. In addition, a Gated Dual-Path Network (GDPN) is proposed to effectively extract and discriminate local and global information from the spectra. Moreover, to mitigate false negative samples within the same class and to enhance the contrast between negative samples, we design a Weight Information Noise contrastive estimation (WIN) loss, which exploits the relationships between samples to help the model learn representations that distinguish targets from diverse backgrounds. Finally, the trained encoder extracts features from the prior spectrum and the test pixels, and the cosine similarity between them serves as the detection metric. Comprehensive experiments on four challenging hyperspectral datasets demonstrate that GDPNCL outperforms state-of-the-art methods, highlighting its effectiveness and robustness in HTD.
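The detection stage is stated explicitly in the abstract: the trained encoder embeds the prior target spectrum and each test pixel, and cosine similarity in feature space is the detection score. A minimal sketch follows; the `encoder` here is a stand-in, not the trained GDPN.

```python
# Detection stage as described in the abstract: encode the prior target
# spectrum and every test pixel, then score pixels by cosine similarity.
import torch
import torch.nn.functional as F

def detect(encoder, prior_spectrum, pixels):
    # prior_spectrum: (bands,), pixels: (num_pixels, bands)
    with torch.no_grad():
        target_feat = encoder(prior_spectrum.unsqueeze(0))   # (1, d)
        pixel_feats = encoder(pixels)                        # (num_pixels, d)
    # Cosine similarity in feature space serves as the detection map.
    return F.cosine_similarity(pixel_feats, target_feat, dim=1)

encoder = torch.nn.Sequential(torch.nn.Linear(200, 64), torch.nn.ReLU(),
                              torch.nn.Linear(64, 32))       # stand-in encoder
scores = detect(encoder, torch.randn(200), torch.randn(1000, 200))
```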

24 pages, 2324 KiB  
Article
FUSE-Net: Multi-Scale CNN for NIR Band Prediction from RGB Using GNDVI-Guided Green Channel Enhancement
by Gwanghyeong Lee, Deepak Ghimire, Donghoon Kim, Sewoon Cho, Byoungjun Kim and Sunghwan Jeong
Sensors 2025, 25(13), 4076; https://doi.org/10.3390/s25134076 - 30 Jun 2025
Viewed by 326
Abstract
Hyperspectral imaging (HSI) is a powerful tool for precision imaging tasks such as vegetation analysis, but its widespread use remains limited due to the high cost of equipment and challenges in data acquisition. To explore a more accessible alternative, we propose a Green Normalized Difference Vegetation Index (GNDVI)-guided green channel adjustment method, termed G-RGB, which enables the estimation of near-infrared (NIR) reflectance from standard RGB image inputs. The G-RGB method enhances the green channel to encode NIR-like information, generating a spectrally enriched representation. Building on this, we introduce FUSE-Net, a novel deep learning model that combines multi-scale convolutional layers and MLP-Mixer-based channel learning to effectively model spatial and spectral dependencies. For evaluation, we constructed a high-resolution RGB-HSI paired dataset by capturing basil leaves under controlled conditions. Through ablation studies and band combination analysis, we assessed the model’s ability to recover spectral information. The experimental results showed that the G-RGB input consistently outperformed unmodified RGB across multiple metrics, including mean squared error (MSE), peak signal-to-noise ratio (PSNR), spectral correlation coefficient (SCC), and structural similarity (SSIM), with the best performance observed when paired with FUSE-Net. While our method does not replace true NIR data, it offers a viable approximation during inference when only RGB images are available, supporting cost-effective analysis in scenarios where HSI systems are inaccessible.
(This article belongs to the Section Intelligent Sensors)
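GNDVI itself is a standard index, GNDVI = (NIR − G)/(NIR + G). The abstract does not specify how G-RGB re-encodes the green channel, so the modulation in the sketch below is purely illustrative.

```python
# GNDVI is standard: (NIR - G) / (NIR + G). How G-RGB adjusts the green
# channel is not spelled out in the abstract; the scaling below is an
# illustrative assumption, not the paper's method.
import numpy as np

def gndvi(nir, green, eps=1e-8):
    return (nir - green) / (nir + green + eps)

def g_rgb(rgb, nir):
    # rgb: (H, W, 3) and nir: (H, W), reflectance in [0, 1].
    out = rgb.astype(np.float32).copy()
    g = gndvi(nir, out[..., 1])
    # Illustrative choice: scale green by (1 + GNDVI) to embed NIR-like cues.
    out[..., 1] = np.clip(out[..., 1] * (1.0 + g), 0.0, 1.0)
    return out

enhanced = g_rgb(np.random.rand(64, 64, 3), np.random.rand(64, 64))
```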

27 pages, 4947 KiB  
Article
From Coarse to Crisp: Enhancing Tree Species Maps with Deep Learning and Satellite Imagery
by Taebin Choe, Seungpyo Jeon, Byeongcheol Kim and Seonyoung Park
Remote Sens. 2025, 17(13), 2222; https://doi.org/10.3390/rs17132222 - 28 Jun 2025
Viewed by 343
Abstract
Accurate, detailed, and up-to-date tree species distribution information is essential for effective forest management and environmental research. However, existing tree species maps face limitations in resolution and update cycle, making it difficult to meet modern demands. To overcome these limitations, this study proposes a novel framework that uses existing medium-resolution national tree species maps as ‘weak labels’ and fuses multi-temporal Sentinel-2 and PlanetScope satellite imagery. Specifically, a super-resolution (SR) technique, using PlanetScope imagery as a reference, was first applied to Sentinel-2 data to enhance its resolution to 2.5 m. These enhanced Sentinel-2 bands were then combined with PlanetScope bands to construct the final multi-spectral, multi-temporal input data. Deep learning (DL) model training data was constructed by strategically sampling information-rich pixels from the national tree species map. The proposed methodology was applied to Sobaeksan and Jirisan National Parks in South Korea, and the performance of various machine learning (ML) and DL models was compared, including traditional ML (linear regression, random forest) and DL architectures (multilayer perceptron (MLP), spectral encoder block (SEB)-linear, and SEB-transformer). The MLP model demonstrated optimal performance, achieving over 85% overall accuracy (OA) and more than 81% accuracy in classifying the spectrally similar and difficult-to-distinguish species Quercus mongolica (QM) and Quercus variabilis (QV). Furthermore, while spectral and temporal information were confirmed to contribute significantly to tree species classification, the contribution of spatial (texture) information was experimentally found to be limited at the 2.5 m resolution level. This study presents a practical method for creating high-resolution tree species maps scalable to the national level by fusing existing tree species maps with Sentinel-2 and PlanetScope imagery, without requiring costly separate field surveys, establishing a foundation for forest resource management, biodiversity conservation, and climate change research.
(This article belongs to the Special Issue Digital Modeling for Sustainable Forest Management)
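The abstract describes sampling information-rich pixels from the coarse national map as weak labels but not the sampling rule itself; a simple per-class balanced draw is sketched below as one plausible reading, not the paper's procedure.

```python
# Hedged sketch of drawing training pixels from a coarse species map used
# as weak labels; the paper's exact "strategic sampling" rule is not given
# in the abstract, so a per-class balanced draw is shown instead.
import numpy as np

def sample_weak_labels(label_map, per_class=1000, rng=np.random.default_rng(0)):
    rows, cols, labels = [], [], []
    for cls in np.unique(label_map):
        r, c = np.nonzero(label_map == cls)
        take = rng.choice(len(r), size=min(per_class, len(r)), replace=False)
        rows.append(r[take]); cols.append(c[take])
        labels.append(np.full(len(take), cls))
    return (np.concatenate(rows), np.concatenate(cols)), np.concatenate(labels)

coarse_map = np.random.randint(0, 5, size=(512, 512))   # stand-in species map
(rr, cc), y = sample_weak_labels(coarse_map)
```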

22 pages, 969 KiB  
Article
A Spectral Interpretable Bearing Fault Diagnosis Framework Powered by Large Language Models
by Panfeng Bao, Wenjun Yi, Yue Zhu, Yufeng Shen and Haotian Peng
Sensors 2025, 25(12), 3822; https://doi.org/10.3390/s25123822 - 19 Jun 2025
Viewed by 575
Abstract
Most existing fault diagnosis methods, although capable of extracting interpretable features such as attention-weighted fault-related frequencies, remain essentially black-box models: they provide only classification results, without transparent reasoning or diagnostic justification, which limits users’ ability to understand and trust the outcomes. In this work, we present a novel, interpretable fault diagnosis framework that integrates spectral feature extraction with large language models (LLMs). Vibration signals are first transformed into spectral representations using Hilbert- and Fourier-based encoders to highlight key frequencies and amplitudes. A channel attention-augmented convolutional neural network provides an initial fault type prediction. Subsequently, structured information—including operating conditions, spectral features, and CNN outputs—is fed into a fine-tuned, enhanced LLM, which delivers both an accurate diagnosis and a transparent reasoning process. Experiments demonstrate that our framework achieves high diagnostic performance while substantially improving interpretability, making advanced fault diagnosis accessible to non-expert users in industrial settings.
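The abstract specifies what the LLM receives (operating conditions, spectral features, CNN outputs) but not the format; below is a hedged sketch of assembling that structured input into a prompt. All field names and wording are illustrative assumptions.

```python
# Hedged sketch of the structured input handed to the LLM: operating
# conditions, dominant spectral peaks, and the CNN's preliminary prediction.
# Field names and phrasing are illustrative, not taken from the paper.
def build_diagnosis_prompt(rpm, load_kw, peaks_hz, cnn_label, cnn_conf):
    peak_text = ", ".join(f"{f:.1f} Hz (amp {a:.2f})" for f, a in peaks_hz)
    return (
        "You are a bearing-fault diagnosis assistant.\n"
        f"Operating conditions: shaft speed {rpm} rpm, load {load_kw} kW.\n"
        f"Dominant spectral peaks: {peak_text}.\n"
        f"CNN preliminary prediction: {cnn_label} (confidence {cnn_conf:.2f}).\n"
        "State the most likely fault type and explain the reasoning from the "
        "characteristic fault frequencies."
    )

prompt = build_diagnosis_prompt(1750, 2.2, [(162.2, 0.81), (324.4, 0.35)],
                                "outer-race fault", 0.92)
```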

32 pages, 8835 KiB  
Article
SIG-ShapeFormer: A Multi-Scale Spatiotemporal Feature Fusion Network for Satellite Cloud Image Classification
by Xuan Liu, Zhenyu Lu, Bingjian Lu, Zhuang Li, Zhongfeng Chen and Yongjie Ma
Remote Sens. 2025, 17(12), 2034; https://doi.org/10.3390/rs17122034 - 12 Jun 2025
Viewed by 1440
Abstract
Satellite cloud images exhibit complex multidimensional characteristics, including spectral, textural, and spatiotemporal dynamics. The temporal evolution of cloud systems plays a crucial role in accurate classification, particularly when multiple weather systems coexist. However, most existing models—such as those based on convolutional neural networks (CNNs), Transformer architectures, and their variants like Swin Transformer—primarily focus on spatial modeling of static images and do not explicitly incorporate temporal information, limiting their ability to integrate spatiotemporal features effectively. To address this limitation, we propose SIG-ShapeFormer, a novel classification model specifically designed for satellite cloud images with temporal continuity. To the best of our knowledge, this work is the first to transform satellite cloud data into multivariate time series and introduce a unified framework for multi-scale and multimodal feature fusion. SIG-ShapeFormer consists of three core components: (1) a Shapelet-based module that captures discriminative and interpretable local temporal patterns; (2) a multi-scale Inception module combining 1D convolutions and Transformer encoders to extract temporal features across different scales; and (3) a differentially enhanced Gramian Angular Summation Field (GASF) module that converts time series into 2D texture representations, significantly improving the recognition of cloud internal structures. Experimental results demonstrate that SIG-ShapeFormer achieves a classification accuracy of 99.36% on the LSCIDMR-S dataset, outperforming the original ShapeFormer by 2.2% and surpassing other CNN- and Transformer-based models. Moreover, the model exhibits strong generalization on the UCM remote sensing dataset and several benchmark tasks from the UEA time-series archive. SIG-ShapeFormer is particularly suitable for remote sensing applications involving continuous temporal sequences, such as extreme weather warnings and dynamic cloud system monitoring. However, it relies on temporally coherent input data and may perform suboptimally on datasets with limited or irregular temporal resolution.
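The GASF transform in component (3) is standard: rescale the series to [−1, 1], map each value to an angle φ = arccos(x), and form the matrix cos(φ_i + φ_j). The sketch below shows the plain GASF; the paper's differential enhancement step is not described in the abstract and is omitted here.

```python
# Standard Gramian Angular Summation Field: rescale the series to [-1, 1],
# take phi = arccos(x), then GASF[i, j] = cos(phi_i + phi_j).
import numpy as np

def gasf(series):
    x = np.asarray(series, dtype=np.float64)
    # Min-max rescale to [-1, 1] so arccos is defined.
    x = 2 * (x - x.min()) / (x.max() - x.min() + 1e-12) - 1
    phi = np.arccos(np.clip(x, -1, 1))
    return np.cos(phi[:, None] + phi[None, :])    # (T, T) texture image

image = gasf(np.sin(np.linspace(0, 6.28, 64)))    # 64x64 GASF of a sine wave
```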

23 pages, 10060 KiB  
Article
MFA-SCDNet: A Semantic Change Detection Network for Visible and Infrared Image Pairs
by Xingyu Li, Jiulu Gong, Jianxiong Wen and Zepeng Wang
Remote Sens. 2025, 17(12), 2011; https://doi.org/10.3390/rs17122011 - 11 Jun 2025
Viewed by 912
Abstract
Semantic Change Detection (SCD) in remote sensing imagery is a common technique for monitoring surface dynamics. However, geospatial data acquisition increasingly involves collecting paired visible and infrared images, and SCD on such pairs must distinguish genuine semantic change from spectral discrepancies caused by the heterogeneous imaging mechanisms. To address this issue, we propose the Modal Feature Analysis Semantic Change Detection Network (MFA-SCDNet), a novel framework that analyzes cross-modal features for change identification. The architecture comprises three principal components: an infrared feature enhancement module that transforms infrared inputs into three-channel representations through spectral domain adaptation, improving the network’s perception of both high-frequency and low-frequency image content; an encoder–decoder structure that extracts modality-specific features and common features simultaneously through adversarial learning; and a synergistic information fusion mechanism that integrates semantic recognition with change detection through multi-task optimization. The modality-specific features are used for semantic recognition, while the common features are used for change detection, yielding a comprehensive account of semantic change. Experiments on public datasets show that MFA-SCDNet improves mIoU_bc by 9.4% and mIoU_sc by 12.9% on average compared with the alternatives, demonstrating superior performance on SCD for heterogeneous image pairs.

31 pages, 8699 KiB  
Article
Transformer-Based Dual-Branch Spatial–Temporal–Spectral Feature Fusion Network for Paddy Rice Mapping
by Xinxin Zhang, Hongwei Wei, Yuzhou Shao, Haijun Luan and Da-Han Wang
Remote Sens. 2025, 17(12), 1999; https://doi.org/10.3390/rs17121999 - 10 Jun 2025
Viewed by 365
Abstract
Deep neural network fusion approaches utilizing multimodal remote sensing are essential for crop mapping. However, challenges such as insufficient spatiotemporal feature extraction and ineffective fusion strategies persist, reducing mapping accuracy and robustness when these approaches are applied across spatial–temporal regions. In this study, we propose a novel rice mapping approach based on dual-branch transformer fusion networks, named RDTFNet. Specifically, we implemented a dual-branch encoder based on two improved transformer architectures: a multiscale transformer block that extracts spatial–spectral features from a single-phase optical image, and a Restormer block that extracts spatial–temporal features from time-series synthetic aperture radar (SAR) images. Both sets of extracted features are combined in a feature fusion module (FFM) to generate fully fused spatial–temporal–spectral (STS) features, which are finally fed into a U-Net decoder for rice mapping. The model’s performance was evaluated on Sentinel-1 and Sentinel-2 datasets from the United States. Compared with conventional models, RDTFNet achieved the best performance, with an overall accuracy (OA), intersection over union (IoU), precision, recall, and F1-score of 96.95%, 88.12%, 95.14%, 92.27%, and 93.68%, respectively, improvements of 1.61%, 5.37%, 5.16%, 1.12%, and 2.53% over the baseline model. Furthermore, in cross-regional and cross-temporal tests, RDTFNet outperformed other classical models, with improvements of 7.11% and 12.10% in F1-score and 11.55% and 18.18% in IoU, respectively, confirming its robustness. The proposed RDTFNet can therefore effectively fuse STS features from multimodal images and exhibits strong generalization, providing valuable information for agricultural management.
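The metrics quoted above are the standard binary segmentation quantities; for reference, a minimal computation from rice/non-rice prediction and ground-truth masks:

```python
# Standard binary segmentation metrics (OA, IoU, precision, recall, F1)
# computed from prediction and ground-truth masks with values in {0, 1}.
import numpy as np

def rice_metrics(pred, gt):
    tp = np.sum((pred == 1) & (gt == 1))
    fp = np.sum((pred == 1) & (gt == 0))
    fn = np.sum((pred == 0) & (gt == 1))
    tn = np.sum((pred == 0) & (gt == 0))
    oa = (tp + tn) / pred.size
    iou = tp / (tp + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return dict(OA=oa, IoU=iou, precision=precision, recall=recall, F1=f1)

m = rice_metrics(np.random.randint(0, 2, (256, 256)),
                 np.random.randint(0, 2, (256, 256)))
```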

29 pages, 18946 KiB  
Article
YOLO-SBA: A Multi-Scale and Complex Background Aware Framework for Remote Sensing Target Detection
by Yifei Yuan, Yingmei Wei, Xiaoyan Zhou, Yanming Guo, Jiangming Chen and Tingshuai Jiang
Remote Sens. 2025, 17(12), 1989; https://doi.org/10.3390/rs17121989 - 9 Jun 2025
Viewed by 487
Abstract
Remote sensing target detection faces significant challenges in handling multi-scale targets, and the high similarity in color and shape between targets and backgrounds in complex scenes further complicates the task. To address these challenges, we propose a multi-Scale and complex Background Aware network for remote sensing target detection, named YOLO-SBA. YOLO-SBA first processes the input through the Multi-Branch Attention Feature Fusion Module (MBAFF) to extract global contextual dependencies and local detail features. It then integrates these features using the Bilateral Attention Feature Mixer (BAFM) for efficient fusion, enhancing the saliency of multi-scale target features to handle target scale variations. Next, the Gated Multi-scale Attention Pyramid (GMAP) performs channel–spatial dual reconstruction and gated fusion encoding on multi-scale feature maps, enhancing target features while finely suppressing spectral redundancy. Additionally, to prevent the loss of information extracted by key modules during inference, we improve the downsampling method with Asymmetric Dynamic Downsampling (ADDown), maximizing the retention of image detail. YOLO-SBA achieves the best performance on the DIOR, DOTA, and RSOD datasets; on DIOR, it improves mAP by 16.6% and single-category detection AP by 0.8–23.8% over the existing state-of-the-art algorithm.

21 pages, 10091 KiB  
Article
Scalable Hyperspectral Enhancement via Patch-Wise Sparse Residual Learning: Insights from Super-Resolved EnMAP Data
by Parth Naik, Rupsa Chakraborty, Sam Thiele and Richard Gloaguen
Remote Sens. 2025, 17(11), 1878; https://doi.org/10.3390/rs17111878 - 28 May 2025
Viewed by 637
Abstract
A majority of hyperspectral super-resolution methods aim to enhance the spatial resolution of hyperspectral imaging data (HSI) by integrating high-resolution multispectral imaging data (MSI), leveraging rich spectral information for various geospatial applications. Key challenges include spectral distortions from high-frequency spatial data, high computational complexity, and limited training data, particularly for new-generation sensors with unique noise patterns. In this contribution, we propose a novel parallel patch-wise sparse residual learning (P2SR) algorithm for resolution enhancement based on the fusion of HSI and MSI. The proposed method uses multi-decomposition techniques (independent component analysis, non-negative matrix factorization, and 3D wavelet transforms) to extract spatial and spectral features that form a sparse dictionary. The spectral and spatial characteristics of the scene encoded in the dictionary enable reconstruction through a first-order optimization algorithm, ensuring an efficient sparse representation. The final spatially enhanced HSI is reconstructed by combining the features learned from the low-resolution HSI and applying an MSI-regulated guided filter to enhance spatial fidelity while minimizing artifacts. P2SR is deployable on a high-performance computing (HPC) system with parallel processing, ensuring scalability and computational efficiency for large HSI datasets. Extensive evaluations on three diverse study sites demonstrate that P2SR consistently outperforms traditional and state-of-the-art (SOA) methods in both quantitative metrics and qualitative spatial assessments. Specifically, P2SR achieved the best average PSNR (25.2100) and SAM (12.4542) scores, indicating superior spatio-spectral reconstruction with sharper spatial features, fewer mixed pixels, and enhanced geological features. P2SR also achieved the best average ERGAS (8.9295) and Q2n (0.5156) scores, suggesting better overall fidelity across all bands and perceptual accuracy with the least spectral distortion. Importantly, we show that P2SR preserves critical spectral signatures, such as the Fe²⁺ absorption feature, and improves the detection of fine-scale environmental and geological structures. P2SR’s ability to maintain spectral fidelity while enhancing spatial detail makes it a powerful tool for high-precision remote sensing applications, including mineral mapping, land-use analysis, and environmental monitoring.
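The abstract names a first-order optimization algorithm for the sparse reconstruction but not which one; ISTA is the textbook first-order solver for the implied problem min_a 0.5*||D a − y||² + λ||a||₁ and is shown below purely as an illustrative stand-in.

```python
# ISTA (iterative soft-thresholding) as an illustrative first-order solver
# for sparse coding over a learned dictionary D and an observed patch y.
# The paper's actual solver is not specified in the abstract.
import numpy as np

def ista(D, y, lam=0.1, n_iter=200):
    a = np.zeros(D.shape[1])
    step = 1.0 / np.linalg.norm(D, 2) ** 2       # 1 / Lipschitz constant
    for _ in range(n_iter):
        grad = D.T @ (D @ a - y)                 # gradient of the data term
        z = a - step * grad
        a = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)  # soft-threshold
    return a

D = np.random.randn(128, 512)                    # stand-in sparse dictionary
code = ista(D, np.random.randn(128))
```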

23 pages, 7221 KiB  
Article
SFANet: A Ground Object Spectral Feature Awareness Network for Multimodal Remote Sensing Image Semantic Segmentation
by Yizhou Lan, Daoyuan Zheng, Yingjun Zheng, Feizhou Zhang, Zhuodong Xu, Ke Shang and Zeyu Wan
Remote Sens. 2025, 17(10), 1797; https://doi.org/10.3390/rs17101797 - 21 May 2025
Viewed by 473
Abstract
The semantic segmentation of remote sensing images is vital for accurate surface monitoring and environmental assessment. Multimodal remote sensing images (RSIs) provide a more comprehensive dimension of information, enabling faster and more scientific decision-making. However, existing methods primarily focus on modality and spectral channels when utilizing spectral features, with limited consideration of their association with ground object types. This association, commonly referred to as the spectral characteristics of ground objects (SCGO), results in distinct spectral responses across different modalities and holds significant potential for improving the segmentation accuracy of multimodal RSIs. Meanwhile, the inclusion of redundant features in the fusion process can interfere with model performance. To address these problems, we propose a ground object spectral feature awareness network (SFANet) that effectively leverages spectral features by incorporating the SCGO. SFANet includes two innovative modules: (1) the Spectral Aware Feature Fusion module, which integrates multimodal features in the encoder based on the SCGO, and (2) the Adaptive Spectral Enhancement module, which reduces confusion from redundant information in the decoder. With adaptively enhanced spectral feature awareness, SFANet improves mIoU by 5.66% and 4.76% over the baseline on two datasets and outperforms existing multimodal RSI segmentation networks, offering a new, spectrally informed perspective for RSI-specific network design.

24 pages, 7284 KiB  
Article
Soybean Lodging Classification and Yield Prediction Using Multimodal UAV Data Fusion and Deep Learning
by Xingmei Xu, Yushi Fang, Guangyao Sun, Yong Zhang, Lei Wang, Chen Chen, Lisuo Ren, Lei Meng, Yinghui Li, Lijuan Qiu, Yan Guo, Helong Yu and Yuntao Ma
Remote Sens. 2025, 17(9), 1490; https://doi.org/10.3390/rs17091490 - 23 Apr 2025
Viewed by 787
Abstract
UAV remote sensing is widely used in the agricultural sector due to its non-destructive, rapid, and cost-effective advantages. This study used two years of field data with multisource fused imagery of soybeans to evaluate lodging conditions and investigate the impact of lodging grade information on yield prediction. Unlike traditional approaches that build empirical lodging models from band reflectance, vegetation indices, and texture features, this research introduces a transfer learning framework that employs a ResNet18 encoder to extract features directly from raw images, bypassing manual feature engineering. To address the imbalance in the lodging dataset, the Synthetic Minority Over-sampling Technique (SMOTE) was applied in the feature space to balance the training set. The findings reveal that deep learning effectively extracts meaningful features from UAV imagery, outperforming traditional methods in lodging grade classification across all growth stages. At 65 days after emergence (DAE), lodging grade classification using ResNet18 features achieved the highest accuracy (accuracy = 0.76, recall = 0.76, F1-score = 0.73), significantly exceeding traditional methods. However, classification accuracy was relatively low in plots with higher lodging grades (grades 3, 5, and 7), with an accuracy of 0.42 and an F1-score of 0.56. After applying SMOTE to balance the samples, classification accuracy in these plots improved to 0.65, an increase of 54.76%. To improve yield prediction accuracy, this study integrates lodging information with other features, such as canopy spectral reflectance, vegetation indices, and texture features, using two multimodal data fusion strategies: input-level fusion (ResNet-EF) and intermediate-level fusion (ResNet-MF). The intermediate-level fusion strategy consistently outperformed input-level fusion in yield prediction across all growth stages; in particular, the intermediate-level fusion model incorporating measured lodging grade information achieved the highest prediction accuracy at 85 DAE (R² = 0.65, RMSE = 529.56 kg/ha). Furthermore, when predicted lodging information was used, model performance remained comparable to that with measured lodging grades, underscoring the critical role of lodging factors in enhancing yield estimation accuracy.
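SMOTE in feature space, as described above, is available directly in imbalanced-learn; a short example with illustrative data shapes (512-dimensional ResNet18 features, lodging grades 1/3/5/7 as stand-in class counts):

```python
# SMOTE applied in feature space to balance lodging grades before
# classification; uses the imbalanced-learn API with illustrative data.
import numpy as np
from imblearn.over_sampling import SMOTE

X = np.random.randn(200, 512)                              # ResNet18 features per plot
y = np.array([1] * 150 + [3] * 30 + [5] * 15 + [7] * 5)    # imbalanced grades
X_bal, y_bal = SMOTE(k_neighbors=3, random_state=0).fit_resample(X, y)
print(np.bincount(y_bal))                                  # classes now equally sized
```

Note that `k_neighbors` must stay below the size of the smallest class, which is why it is reduced from the default of 5 here.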

24 pages, 7592 KiB  
Article
DB-MFENet: A Dual-Branch Multi-Frequency Feature Enhancement Network for Hyperspectral Image Classification
by Chen Zang, Gaochao Song, Lei Li, Guangrui Zhao, Wanxuan Lu, Guiyuan Jiang and Qian Sun
Remote Sens. 2025, 17(8), 1458; https://doi.org/10.3390/rs17081458 - 18 Apr 2025
Viewed by 453
Abstract
Hyperspectral image (HSI) classification is essential for monitoring and analyzing the Earth’s surface, and methods based on convolutional neural networks (CNNs) and transformers have rapidly gained prominence in recent years. However, CNNs are limited by their restricted receptive fields and can only process local information, while transformers, despite excelling at establishing long-range dependencies, underutilize the spatial information of HSIs. To tackle these challenges, we present a dual-branch multi-frequency feature enhancement network (DB-MFENet) for HSI classification. First, orthogonal position encoding (OPE) maps image coordinates into a high-dimensional space, which is then combined with the corresponding spectral values to compute a multi-frequency feature. Next, the multi-frequency feature is divided into low-frequency and high-frequency components, which are separately enhanced through a dual-branch structure and then fused. Finally, a transformer encoder and a linear layer encode and classify the enhanced multi-frequency feature. Experimental results demonstrate that our method is efficient and robust for HSI classification, achieving overall accuracies of 97.05%, 91.92%, 98.72%, and 96.31% on the Indian Pines, Salinas, Pavia University, and WHU-Hi-HanChuan datasets, respectively.
(This article belongs to the Section Remote Sensing Image Processing)
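The abstract gives only the outline of OPE and the frequency split; the sketch below shows one plausible reading: sinusoidal coordinate features at geometrically spaced frequencies, weighted by the spectral values, then split by frequency for the two branches. None of these construction details come from the paper.

```python
# Hedged sketch of a multi-frequency feature: sinusoidal encodings of pixel
# coordinates at several frequencies, weighted by the spectrum, then split
# into low- and high-frequency halves. The actual OPE in DB-MFENet is not
# specified in the abstract; this is an illustrative assumption.
import numpy as np

def multi_frequency_feature(coord, spectrum, n_freqs=8):
    # coord: (2,) normalized (row, col); spectrum: (bands,)
    freqs = 2.0 ** np.arange(n_freqs)                    # low -> high frequencies
    angles = np.pi * np.outer(freqs, coord)              # (n_freqs, 2)
    pos = np.concatenate([np.sin(angles), np.cos(angles)], axis=1)  # (n_freqs, 4)
    feat = spectrum[:, None, None] * pos[None]           # (bands, n_freqs, 4)
    low, high = feat[:, : n_freqs // 2], feat[:, n_freqs // 2 :]
    return low, high        # fed to the two enhancement branches

low, high = multi_frequency_feature(np.array([0.3, 0.7]), np.random.rand(200))
```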

16 pages, 927 KiB  
Article
Cross-Layer Stream Allocation of mMIMO-OFDM Hybrid Beamforming Video Communications
by You-Ting Chen, Shu-Ming Tseng, Yung-Fang Chen and Chao Fang
Sensors 2025, 25(8), 2554; https://doi.org/10.3390/s25082554 - 17 Apr 2025
Viewed by 362
Abstract
This paper proposes a source encoding rate control and cross-layer data stream allocation scheme for uplink millimeter-wave (mmWave) multi-user massive MIMO (MU-mMIMO) orthogonal frequency division multiplexing (OFDM) hybrid beamforming video communication systems. Unlike most previous studies that focus on the downlink scenario, our proposed scheme optimizes the uplink transmission while also addressing the limitation of prior works that only consider single-data-stream users. A key distinction of our approach is the integration of cross-layer resource allocation, which jointly considers both the physical layer channel state information (CSI) and the application layer video rate-distortion (RD) function. While traditional methods optimize for spectral efficiency (SE), our proposed method directly maximizes the peak signal-to-noise ratio (PSNR) to enhance video quality, aligning with the growing demand for high-quality video communication. We introduce a novel iterative cross-layer dynamic data stream allocation scheme, where the initial allocation is based on conventional physical-layer data stream allocation, followed by iterative refinement. Through multiple iterations, users with lower PSNR can dynamically contend for data streams, leading to a more balanced and optimized resource allocation. Our approach is a general framework that can incorporate any existing physical-layer data stream allocation as an initialization step before iteration. Simulation results demonstrate that the proposed cross-layer scheme outperforms three conventional physical-layer schemes by 0.4 to 1.14 dB in PSNR for 4–6 users, at the cost of a 1.8 to 2.3× increase in computational complexity (requiring 3.6–5.8 iterations).
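PSNR, the quantity the cross-layer allocator maximizes, follows directly from the MSE of a reconstructed frame: PSNR = 10·log10(MAX²/MSE), with MAX = 255 for 8-bit video. A minimal reference implementation:

```python
# Standard PSNR computation for 8-bit frames: 10 * log10(255^2 / MSE).
import numpy as np

def psnr(reference, reconstructed, max_val=255.0):
    mse = np.mean((reference.astype(np.float64)
                   - reconstructed.astype(np.float64)) ** 2)
    return 10.0 * np.log10(max_val ** 2 / mse)

ref = np.random.randint(0, 256, (720, 1280), dtype=np.uint8)
rec = np.clip(ref + np.random.randint(-5, 6, ref.shape), 0, 255).astype(np.uint8)
print(f"PSNR = {psnr(ref, rec):.2f} dB")
```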
