Search Results (451)

Search Parameters:
Keywords = remote scene classification

24 pages, 15825 KB  
Article
Enhancing High-Resolution Land Cover Classification Using Multi-Level Cross-Modal Attention Fusion
by Yangwei Jiang, Ting Liu, Junhao Zhou, Yihan Guo and Tangao Hu
Land 2026, 15(1), 181; https://doi.org/10.3390/land15010181 - 19 Jan 2026
Viewed by 117
Abstract
High-precision land cover classification is fundamental to environmental monitoring, urban planning, and sustainable land-use management. With the growing availability of multimodal remote sensing data, combining spectral and structural information has become an effective strategy for improving classification performance in complex high-resolution scenes. However, most existing methods predominantly rely on shallow feature concatenation, which fails to capture long-range dependencies and cross-modal interactions that are critical for distinguishing fine-grained land cover categories. This study proposes a multi-level cross-modal attention fusion network, Cross-Modal Cross-Attention UNet (CMCAUNet), which integrates a Cross-Modal Cross-Attention Fusion (CMCA) module and a Skip-Connection Attention Gate (SCAG) module. The CMCA module progressively enhances multimodal feature representations throughout the encoder, while the SCAG module leverages high-level semantics to refine spatial details during decoding and improve boundary delineation. Together, these modules enable more effective integration of spectral–textural and structural information. Experiments conducted on the ISPRS Vaihingen and Potsdam datasets demonstrate the effectiveness of the proposed approach. CMCAUNet achieves mean Intersection over Union (mIoU) scores of 81.49% and 84.76%, with Overall Accuracy (OA) of 90.74% and 90.28%, respectively. The model also shows superior performance in small object classification, achieving 90.85% and 96.98% OA for the “Car” category. Ablation studies further confirm that the combination of CMCA and SCAG modules significantly improves feature discriminability and leads to more accurate and detailed land cover maps. Full article
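As a rough illustration of the cross-attention fusion pattern this abstract describes, the sketch below lets two modality feature maps attend to each other; it is a generic, minimal example with hypothetical dimensions, not the authors' CMCA or SCAG module (those are detailed in the full article).

```python
import torch
import torch.nn as nn

class CrossModalAttentionFusion(nn.Module):
    """Minimal cross-attention fusion of two modality feature maps (e.g., spectral and DSM)."""
    def __init__(self, dim: int = 256, heads: int = 8):
        super().__init__()
        self.attn_a = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.attn_b = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, feat_a, feat_b):
        # feat_a, feat_b: (B, C, H, W) encoder features from the two modalities
        b, c, h, w = feat_a.shape
        ta = feat_a.flatten(2).transpose(1, 2)   # (B, H*W, C) token sequence, modality A
        tb = feat_b.flatten(2).transpose(1, 2)   # (B, H*W, C) token sequence, modality B
        a2b, _ = self.attn_a(ta, tb, tb)         # modality A queries modality B
        b2a, _ = self.attn_b(tb, ta, ta)         # modality B queries modality A
        fused = self.norm(ta + a2b) + self.norm(tb + b2a)
        return fused.transpose(1, 2).reshape(b, c, h, w)
```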

29 pages, 44274 KB  
Article
MSFFDet: A Meta-Learning-Based Support-Guided Feature Fusion Detector for Few-Shot Remote Sensing Detection
by Haoxiang Qi, Wenzhe Zhao, Ting Zhang and Guangyao Zhou
Appl. Sci. 2026, 16(2), 917; https://doi.org/10.3390/app16020917 - 15 Jan 2026
Viewed by 114
Abstract
Few-shot object detection in remote sensing imagery faces significant challenges, including limited labeled samples, complex scene backgrounds, and subtle inter-class differences. To tackle these issues, we design a novel detection framework that effectively transfers supervision from a few annotated support examples to the query domain. We introduce a feature enhancement mechanism that injects fine-grained support cues into the query representation, helping the model focus on relevant regions and suppress background noise. This allows the model to generate more accurate proposals and perform robust classification, especially for visually confusing or small objects. Additionally, our method enhances feature interaction between support and query images through a nonlinear combination strategy, which captures both semantic similarity and discriminative differences. The proposed framework is fully end-to-end and jointly optimizes the feature fusion and detection processes. Experiments on three challenging benchmarks, NWPU VHR-10, iSAID and DIOR, demonstrate that our method consistently achieves state-of-the-art results under different few-shot settings and category splits. Compared with other advanced methods, it yields superior performance, highlighting its strong generalization ability in low-data remote sensing scenarios. Full article
(This article belongs to the Special Issue AI in Object Detection)

31 pages, 6416 KB  
Article
FireMM-IR: An Infrared-Enhanced Multi-Modal Large Language Model for Comprehensive Scene Understanding in Remote Sensing Forest Fire Monitoring
by Jinghao Cao, Xiajun Liu and Rui Xue
Sensors 2026, 26(2), 390; https://doi.org/10.3390/s26020390 - 7 Jan 2026
Viewed by 213
Abstract
Forest fire monitoring in remote sensing imagery has long relied on traditional perception models that primarily focus on detection or segmentation. However, such approaches fall short in understanding complex fire dynamics, including contextual reasoning, fire evolution description, and cross-modal interpretation. With the rise of multi-modal large language models (MLLMs), it becomes possible to move beyond low-level perception toward holistic scene understanding that jointly reasons about semantics, spatial distribution, and descriptive language. To address this gap, we introduce FireMM-IR, a multi-modal large language model tailored for pixel-level scene understanding in remote-sensing forest-fire imagery. FireMM-IR incorporates an infrared-enhanced classification module that fuses infrared and visual modalities, enabling the model to capture fire intensity and hidden ignition areas under dense smoke. Furthermore, we design a mask-generation module guided by language-conditioned segmentation tokens to produce accurate instance masks from natural-language queries. To effectively learn multi-scale fire features, a class-aware memory mechanism is introduced to maintain contextual consistency across diverse fire scenes. We also construct FireMM-Instruct, a unified corpus of 83,000 geometrically aligned RGB–IR pairs with instruction-aligned descriptions, bounding boxes, and pixel-level annotations. Extensive experiments show that FireMM-IR achieves superior performance on pixel-level segmentation and strong results on instruction-driven captioning and reasoning, while maintaining competitive performance on image-level benchmarks. These results indicate that infrared–optical fusion and instruction-aligned learning are key to physically grounded understanding of wildfire scenes. Full article
(This article belongs to the Special Issue Remote Sensing and UAV Technologies for Environmental Monitoring)

29 pages, 11148 KB  
Article
Fine-Grained Classification of Lakeshore Wetland–Cropland Mosaics via Multimodal RS Data Fusion and Weakly Supervised Learning: A Case Study of Bosten Lake, China
by Jinyi Zhang, Alim Samat, Erzhu Li, Enzhao Zhu and Wenbo Li
Land 2026, 15(1), 92; https://doi.org/10.3390/land15010092 - 1 Jan 2026
Viewed by 308
Abstract
High-precision monitoring of arid wetlands is vital for ecological conservation, yet traditional methods incur prohibitive labeling costs due to complex features. In this study, the wetland of Bosten Lake in Xinjiang is selected as a case area, where Pleiades and PlanetScope-3 multimodal remote sensing data are fused using the Gram–Schmidt method to generate imagery with high spatial and spectral resolution. Based on this dataset, we systematically compare the performance of fully supervised models (FCN, U-Net, DeepLabV3+, and SegFormer) with a weakly supervised learning model, One Model Is Enough (OME), for classifying 19 wetland–cropland mosaic types. Results demonstrate that: (1) SegFormer achieved the best overall performance (98.75% accuracy, 95.33% mIoU), leveraging its attention mechanism to enhance semantic understanding of complex scenes. (2) The weakly supervised OME, using only image-level labels, matched fully supervised performance (98.76% accuracy, 92.82% F1-score) while drastically reducing labeling effort. (3) Multimodal fusion boosted all models’ accuracy, most notably increasing U-Net’s mIoU by 63.39%. (4) Models exhibited complementary strengths: U-Net excelled in wetland vegetation segmentation, DeepLabV3+ in crop classification, and OME in preserving spatial details. This study validates a pathway integrating multimodal fusion with WSL to balance high accuracy and low labeling costs for arid wetland mapping. Full article
(This article belongs to the Special Issue Challenges and Future Trends in Land Cover/Use Monitoring)
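The Gram–Schmidt fusion step named in this abstract is not detailed in the listing; purely as a sketch, the detail-injection form of Gram–Schmidt sharpening can be written as below (simplified, with per-band covariance gains; the authors' actual processing of the Pleiades/PlanetScope data may differ).

```python
import numpy as np

def gram_schmidt_fuse(ms, pan):
    """Simplified Gram-Schmidt-style sharpening: inject panchromatic spatial detail into
    upsampled multispectral bands. ms: (H, W, B) resampled to the pan grid; pan: (H, W)."""
    sim_pan = ms.mean(axis=2)                      # simulated low-resolution pan band
    # Match the real pan to the simulated one so only spatial detail is injected.
    pan_adj = (pan - pan.mean()) / pan.std() * sim_pan.std() + sim_pan.mean()
    detail = pan_adj - sim_pan
    gains = np.array([np.cov(ms[..., b].ravel(), sim_pan.ravel())[0, 1] / sim_pan.var()
                      for b in range(ms.shape[2])])
    return ms + detail[..., None] * gains[None, None, :]
```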

17 pages, 11372 KB  
Article
Integrating CNN-Mamba and Frequency-Domain Information for Urban Scene Classification from High-Resolution Remote Sensing Images
by Shirong Zou, Gang Yang, Yixuan Wang, Kunyu Wang and Shouhang Du
Appl. Sci. 2026, 16(1), 251; https://doi.org/10.3390/app16010251 - 26 Dec 2025
Viewed by 263
Abstract
Urban scene classification in high-resolution remote sensing images is critical for applications such as power facility site selection and grid security monitoring. However, the complexity and variability of ground objects present significant challenges to accurate classification. While convolutional neural networks (CNNs) excel at extracting local features, they often struggle to model long-range dependencies. Transformers can capture global context but incur high computational costs. To address these limitations, this paper proposes a Global–Local Information Fusion Network (GLIFNet), which integrates VMamba for efficient global modeling with CNN for local detail extraction, enabling more effective fusion of fine-grained and semantic information. Furthermore, a Haar Wavelet Transform Attention Mechanism (HWTAM) is designed to explicitly exploit frequency-domain characteristics, facilitating refined fusion of multi-scale features. The experiments compared GLIFNet against nine commonly used or state-of-the-art methods. The results show that GLIFNet achieves mean F1 scores (mF1) of 90.08% and 87.44% on the ISPRS Potsdam and ISPRS Vaihingen datasets, respectively. This represents improvements of 1.26% and 1.91% over the baseline comparison model. The overall accuracy (OA) reaches 90.43% and 92.87%, with respective gains of 2.28% and 1.58%. Experimental results on the LandCover.ai dataset demonstrate that GLIFNet achieves an mF1 score of 88.39% and an accuracy of 92.23%, relative improvements of 0.3% and 0.28% over the baseline model. In summary, GLIFNet demonstrates advanced performance in urban scene classification from high-resolution remote sensing images and can provide accurate foundational data for power infrastructure construction. Full article
(This article belongs to the Special Issue Advances in Big Data Analysis in Smart Cities)
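The frequency-domain input that the HWTAM builds on can be illustrated with a standard one-level 2D Haar decomposition; the sketch below shows only the transform itself, not the attention mechanism built on top of it.

```python
import numpy as np

def haar_dwt2(x):
    """One level of the 2D Haar transform: returns LL, LH, HL, HH subbands at half resolution.
    x: (H, W) image or feature map with even height and width."""
    a, b = x[0::2, 0::2], x[0::2, 1::2]
    c, d = x[1::2, 0::2], x[1::2, 1::2]
    ll = (a + b + c + d) / 2.0   # low-frequency approximation
    lh = (a + b - c - d) / 2.0   # detail from differences across rows
    hl = (a - b + c - d) / 2.0   # detail from differences across columns
    hh = (a - b - c + d) / 2.0   # diagonal detail
    return ll, lh, hl, hh
```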

27 pages, 5157 KB  
Article
Remote Sensing Scene Classification via Multi-Feature Fusion Based on Discriminative Multiple Canonical Correlation Analysis
by Shavkat Fazilov, Ozod Yusupov, Yigitali Khandamov, Erali Eshonqulov, Jalil Khamidov and Khabiba Abdieva
AI 2026, 7(1), 5; https://doi.org/10.3390/ai7010005 - 23 Dec 2025
Viewed by 547
Abstract
Scene classification in remote sensing images is a pressing task that demands improved recognition accuracy because of complex spatial structures and high inter-class similarity. Although feature extraction using convolutional neural networks provides high efficiency, combining deep features obtained from different architectures in a semantically consistent manner remains an important scientific problem. In this study, a DMCCA + SVM model is proposed, in which Discriminative Multiple Canonical Correlation Analysis (DMCCA) is applied to fuse multi-source deep features, and final classification is performed using a Support Vector Machine (SVM). Unlike conventional fusion methods, DMCCA projects heterogeneous features into a unified low-dimensional latent space by maximizing within-class correlation and minimizing between-class correlation, resulting in a more separable and compact feature space. The proposed approach was evaluated on three widely used benchmark datasets—NWPU-RESISC45, AID, and PatternNet—and achieved accuracy scores of 92.75%, 93.92%, and 99.35%, respectively. The results showed that the model outperforms modern individual CNN architectures. Additionally, the model’s stability and generalization capability were confirmed through K-fold cross-validation. Overall, the proposed DMCCA + SVM model was experimentally validated as an effective and reliable solution for high-accuracy classification of remote sensing scenes. Full article
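As a minimal stand-in for the fuse-then-classify pipeline described above, the sketch below uses plain two-view CCA (rather than the discriminative multiple-set variant) and hypothetical feature dimensions with scikit-learn.

```python
import numpy as np
from sklearn.cross_decomposition import CCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def fuse_and_classify(feats_a, feats_b, labels, n_components=64):
    """Fuse two sets of CNN features with two-view CCA and classify with an RBF SVM.
    feats_a, feats_b: (n_samples, d1) and (n_samples, d2) deep features for the same images."""
    cca = CCA(n_components=n_components).fit(feats_a, feats_b)
    za, zb = cca.transform(feats_a, feats_b)     # correlated low-dimensional projections
    fused = np.concatenate([za, zb], axis=1)     # serial fusion of the projected views
    clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=10.0)).fit(fused, labels)
    return cca, clf
```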

31 pages, 7858 KB  
Article
Domain-Adapted MLLMs for Interpretable Road Traffic Accident Analysis Using Remote Sensing Imagery
by Bing He, Wei He, Qing Chang, Wen Luo and Lingli Xiao
ISPRS Int. J. Geo-Inf. 2026, 15(1), 8; https://doi.org/10.3390/ijgi15010008 - 21 Dec 2025
Cited by 1 | Viewed by 358
Abstract
Traditional road traffic accident analysis has long relied on structured data, making it difficult to integrate high-dimensional heterogeneous information such as remote sensing imagery and leading to an incomplete understanding of accident scene environments. This study proposes a road traffic accident analysis framework based on Multimodal Large Language Models. The approach integrates high-resolution remote sensing imagery with structured accident data through a three-stage progressive training pipeline. Specifically, we fine-tune three open-source vision–language models using Low-Rank Adaptation (LoRA) to sequentially optimize the model’s capabilities in visual environmental description, multi-task accident classification, and Chain-of-Thought (CoT) driven causal reasoning. A multimodal dataset was constructed containing remote sensing image descriptions, accident classification labels, and interpretable reasoning chains. Experimental results show that the fine-tuned model achieved a maximum improvement in the CIDEr score for image description tasks. In the joint classification task of accident severity and duration, the model achieved an accuracy of 71.61% and an F1-score of 0.8473. In the CoT reasoning task, both METEOR and CIDEr scores improved significantly. These results validate the effectiveness of structured reasoning mechanisms in multimodal fusion for transportation applications, providing a feasible path toward interpretable and intelligent analysis for real-world traffic management. Full article
(This article belongs to the Topic Artificial Intelligence Models, Tools and Applications)
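LoRA, the adaptation method named in this abstract, inserts small trainable low-rank matrices alongside frozen pretrained weights; a minimal sketch of that core idea (not the authors' three-stage training pipeline) follows.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen pretrained linear layer with a trainable low-rank update: W x + (alpha/r) * B A x."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                          # pretrained weights stay frozen
        self.lora_a = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(base.out_features, r))  # zero init: no change at start
        self.scaling = alpha / r

    def forward(self, x):
        return self.base(x) + self.scaling * (x @ self.lora_a.T @ self.lora_b.T)
```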

41 pages, 7642 KB  
Review
Fine-Grained Interpretation of Remote Sensing Image: A Review
by Dongbo Wang, Zedong Yan and Peng Liu
Remote Sens. 2025, 17(23), 3887; https://doi.org/10.3390/rs17233887 - 30 Nov 2025
Viewed by 1347
Abstract
This article presents a systematic review of the fine-grained interpretation of remote sensing images, delving into its background, current state, datasets, methodology, and future trends, aiming to provide a comprehensive reference framework for research in this field. For datasets, we introduce representative fine-grained interpretation benchmarks and analyze their key characteristics, such as the number of categories, sample size, and resolution, as well as their benchmarking role in research. For methodologies, we classify the core methods by interpretation level and systematically summarize the deep-learning methods, models, and architectures used at each level, including pixel-level classification and segmentation, object-level detection, and scene-level recognition. Finally, the review concludes that although deep learning has driven substantial advances in accuracy and applicability, fine-grained interpretation remains an inherently challenging problem due to issues such as distinguishing highly similar categories, cross-sensor domain migration, and high annotation costs. We also outline future directions, emphasizing the need to enhance generalization, further support open-world recognition, and adapt to complex real-world scenarios. This review aims to promote the application of fine-grained interpretation technology for remote sensing images across a broader range of fields. Full article

25 pages, 23748 KB  
Article
HyperHazeOff: Hyperspectral Remote Sensing Image Dehazing Benchmark
by Artem Nikonorov, Dmitry Sidorchuk, Nikita Odinets, Vladislav Volkov, Anastasia Sarycheva, Ekaterina Dudenko, Mikhail Zhidkov and Dmitry Nikolaev
J. Imaging 2025, 11(12), 422; https://doi.org/10.3390/jimaging11120422 - 26 Nov 2025
Viewed by 693
Abstract
Hyperspectral remote sensing images (HSIs) provide invaluable information for environmental and agricultural monitoring, yet they are often degraded by atmospheric haze, which distorts spatial and spectral content and hinders downstream analysis. Progress in hyperspectral dehazing has been limited by the absence of paired real-haze benchmarks; most prior studies rely on synthetic haze or unpaired data, restricting fair evaluation and generalization. We present HyperHazeOff, the first comprehensive benchmark for hyperspectral dehazing that unifies data, tasks, and evaluation protocols. It comprises (i) RRealHyperPDID, 110 scenes with paired real-haze and haze-free HSIs (plus RGB images), and (ii) RSyntHyperPDID, 2616 paired samples generated using a physically grounded haze formation model. The benchmark also provides agricultural field delineation and land classification annotations for downstream task quality assessment, standardized train/validation/test splits, preprocessing pipelines, baseline implementations, pretrained weights, and evaluation tools. Across six state-of-the-art methods (three RGB-based and three HSI-specific), we find that hyperspectral models trained on the widely used HyperDehazing dataset fail to generalize to real haze, while training on RSyntHyperPDID enables significant real-haze restoration by AACNet. HyperHazeOff establishes reproducible baselines and is openly available to advance research in hyperspectral dehazing. Full article
(This article belongs to the Special Issue Multispectral and Hyperspectral Imaging: Progress and Challenges)
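The benchmark's "physically grounded haze formation model" is not specified in this abstract; the classic atmospheric scattering model is the usual starting point for such synthesis and is sketched below purely for illustration (realistic hyperspectral haze would use per-band beta and airlight).

```python
import numpy as np

def add_synthetic_haze(cube, depth, beta=1.0, airlight=0.9):
    """Apply the standard atmospheric scattering model I = J*t + A*(1 - t) to a clean image.
    cube: clean image (H, W, bands) in [0, 1]; depth: per-pixel scene depth or path length (H, W).
    For hyperspectral data, beta and airlight would realistically vary per band."""
    t = np.exp(-beta * depth)[..., None]   # Beer-Lambert transmission
    return cube * t + airlight * (1.0 - t)
```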

27 pages, 1949 KB  
Article
Hierarchical Prompt Engineering for Remote Sensing Scene Understanding with Large Vision–Language Models
by Tianyang Chen and Jianliang Ai
Remote Sens. 2025, 17(22), 3727; https://doi.org/10.3390/rs17223727 - 16 Nov 2025
Viewed by 1074
Abstract
Vision–language models (VLMs) show strong potential for remote-sensing scene classification but still struggle with fine-grained categories and distribution shifts. We introduce a hierarchical prompting framework that decomposes recognition into a coarse-to-fine decision process with structured outputs, combined with parameter-efficient adaptation using LoRA/QLoRA. To evaluate robustness without depending on external benchmarks, we construct five protocol variants of the AID (V0–V4) that systematically vary label granularity, class consolidation, and augmentation settings. Each variant is designed to align with a specific prompting style and hierarchy. The data pipeline follows a strict split-before-augment strategy, in which augmentation is applied only to the training split to avoid train-test leakage. We further audit leakage using rotation/flip–invariant perceptual hashing across splits to ensure reproducibility. Experiments on all five AID variants show that hierarchical prompting consistently outperforms non-hierarchical prompts and matches or exceeds full fine-tuning, while requiring substantially less compute. Ablation studies on prompt design, adaptation strategy, and model capacity—together with confusion matrices and class-wise metrics—indicate improved recognition at both coarse and fine levels, as well as robustness to rotations and flips. The proposed framework provides a strong, reproducible baseline for remote-sensing scene classification under constrained compute and includes complete prompt templates and processing scripts to support replication. Full article
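The rotation/flip-invariant perceptual-hash leakage audit described above can be approximated as follows; this is a sketch using the third-party imagehash package and Pillow ≥ 9.1, not the authors' released scripts.

```python
from PIL import Image
import imagehash  # third-party: pip install imagehash

TRANSFORMS = [None, Image.Transpose.ROTATE_90, Image.Transpose.ROTATE_180,
              Image.Transpose.ROTATE_270, Image.Transpose.FLIP_LEFT_RIGHT,
              Image.Transpose.FLIP_TOP_BOTTOM, Image.Transpose.TRANSPOSE,
              Image.Transpose.TRANSVERSE]

def canonical_hash(path):
    """Perceptual hash invariant to 90-degree rotations and flips:
    take the lexicographically smallest pHash over the eight dihedral variants."""
    img = Image.open(path).convert("L")
    return min(str(imagehash.phash(img if t is None else img.transpose(t))) for t in TRANSFORMS)

def find_leaks(train_paths, test_paths):
    """Flag test images whose canonical hash also occurs in the training split."""
    train_hashes = {canonical_hash(p) for p in train_paths}
    return [p for p in test_paths if canonical_hash(p) in train_hashes]
```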

26 pages, 13736 KB  
Article
Off-Nadir Satellite Image Scene Classification: Benchmark Dataset, Angle-Aware Active Domain Adaptation, and Angular Impact Analysis
by Feifei Peng, Mengchu Guo, Haoqing Hu, Tongtong Yan and Liangcun Jiang
Remote Sens. 2025, 17(22), 3697; https://doi.org/10.3390/rs17223697 - 12 Nov 2025
Viewed by 824
Abstract
Accurate remote sensing scene classification is essential for applications such as environmental monitoring and disaster management. In real-world scenarios, particularly during emergency response and disaster relief operations, acquiring nadir-view satellite images is often infeasible due to cloud cover, satellite scheduling constraints, or dynamic scene conditions. Instead, off-nadir images are frequently captured and can provide enhanced spatial understanding through angular perspectives. However, remote sensing scene classification has primarily relied on nadir-view satellite or airborne imagery, leaving off-nadir perspectives largely unexplored. This study addresses this gap by introducing Off-nadir-Scene10, the first controlled and comprehensive benchmark dataset specifically designed for off-nadir satellite image scene classification. The Off-nadir-Scene10 dataset contains 5200 images across 10 common scene categories captured at 26 different off-nadir angles. All images were collected under controlled single-day conditions, ensuring that viewing geometry was the sole variable and effectively minimizing confounding factors such as illumination, atmospheric conditions, seasonal changes, and sensor characteristics. To effectively leverage abundant nadir imagery for advancing off-nadir scene classification, we propose an angle-aware active domain adaptation method that incorporates geometric considerations into sample selection and model adaptation processes. The method strategically selects informative off-nadir samples while transferring discriminative knowledge from nadir to off-nadir domains. The experimental results show that the method achieves consistent accuracy improvements across three different training ratios: 20%, 50%, and 80%. The comprehensive angular impact analysis reveals that models trained on larger off-nadir angles generalize better to smaller angles than vice versa, indicating that exposure to stronger geometric distortions promotes the learning of view-invariant features. This asymmetric transferability primarily stems from geometric perspective effects, as temporal, atmospheric, and sensor-related variations were rigorously minimized through controlled single-day image acquisition. Category-specific analysis demonstrates that angle-sensitive classes, such as sparse residential areas, benefit significantly from off-nadir viewing observations. This study provides a controlled foundation and practical guidance for developing robust, geometry-aware off-nadir scene classification systems. Full article

23 pages, 3485 KB  
Article
MMA-Net: A Semantic Segmentation Network for High-Resolution Remote Sensing Images Based on Multimodal Fusion and Multi-Scale Multi-Attention Mechanisms
by Xuanxuan Huang, Xuejie Zhang, Longbao Wang, Dandan Yuan, Shufang Xu, Fengguang Zhou and Zhijun Zhou
Remote Sens. 2025, 17(21), 3572; https://doi.org/10.3390/rs17213572 - 28 Oct 2025
Viewed by 1415
Abstract
Semantic segmentation of high-resolution remote sensing images has great application value in fields such as natural disaster monitoring. Current multimodal semantic segmentation methods have improved the model’s ability to recognize different ground objects and complex scenes by integrating multi-source remote sensing data. However, these methods still face challenges such as blurred boundary segmentation and insufficient perception of multi-scale ground objects when achieving high-precision classification. To address these issues, this paper proposes MMA-Net, a semantic segmentation network enhanced by two key modules: a cross-layer multimodal fusion module and a multi-scale multi-attention module. These modules effectively improve the model’s ability to capture detailed features and model multi-scale ground objects, thereby enhancing boundary segmentation accuracy, detail feature preservation, and consistency in multi-scale object segmentation. Specifically, the cross-layer multimodal fusion module adopts a staged fusion strategy to integrate detailed information and multimodal features, realizing detail preservation and modal synergy enhancement. The multi-scale multi-attention module combines cross-attention and self-attention to leverage long-range dependencies and inter-modal complementary relationships, strengthening the model’s feature representation for multi-scale ground objects. Experimental results show that MMA-Net outperforms state-of-the-art methods on the Potsdam and Vaihingen datasets. Its mIoU reaches 88.74% and 84.92% on the two datasets, respectively. Ablation experiments further verify that each proposed module contributes to the final performance. Full article

26 pages, 6191 KB  
Article
HLAE-Net: A Hierarchical Lightweight Attention-Enhanced Strategy for Remote Sensing Scene Image Classification
by Mingyuan Yang, Cuiping Shi, Kangning Tan, Haocheng Wu, Shenghan Wang and Liguo Wang
Remote Sens. 2025, 17(19), 3279; https://doi.org/10.3390/rs17193279 - 24 Sep 2025
Viewed by 817
Abstract
Remote sensing scene image classification has extensive applications in fields such as land use monitoring and environmental assessment. However, traditional methodologies based on convolutional neural networks (CNNs) face considerable challenges caused by uneven image quality, imbalanced sample distribution, intra-class similarities, and limited computing resources. To address such issues, this study proposes a hierarchical lightweight attention-enhanced network (HLAE-Net), which employs a hierarchical feature collaborative extraction (HFCE) strategy. By considering the differences in resolution and receptive field, as well as the varying effectiveness of attention mechanisms across different network layers, the network uses different attention modules to progressively extract features from the images. This approach forms a complementary, enhanced feature chain across layers and establishes efficient collaboration between the attention modules. In addition, an improved lightweight attention module group is proposed, including a lightweight dual coordinate spatial attention module (DCSAM), which captures spatial and channel information, as well as a lightweight multiscale spatial and channel attention module. These improved modules are incorporated into the featured average sampling (FAS) bottleneck and basic bottlenecks. Experiments were conducted on four public benchmark datasets, and the results show that the proposed model outperforms several mainstream models from recent years in overall accuracy (OA), with particularly competitive performance at small training ratios. While maintaining a modest parameter scale, it offers both good classification ability and computational efficiency, providing a strong solution for scene image classification. Full article

26 pages, 10719 KB  
Article
MPGH-FS: A Hybrid Feature Selection Framework for Robust Multi-Temporal OBIA Classification
by Xiangchao Xu, Huijiao Qiao, Zhenfan Xu and Shuya Hu
Sensors 2025, 25(18), 5933; https://doi.org/10.3390/s25185933 - 22 Sep 2025
Viewed by 918
Abstract
Object-Based Image Analysis (OBIA) generates high-dimensional features that frequently induce the curse of dimensionality, impairing classification efficiency and generalizability in high-resolution remote sensing images. To address these challenges while simultaneously overcoming the limitations of single-criterion feature selection and enhancing temporal adaptability, we propose a novel feature selection framework named Mutual information Pre-filtering and Genetic-Hill climbing hybrid Feature Selection (MPGH-FS), which integrates Mutual Information Correlation Coefficient (MICC) pre-filtering, Genetic Algorithm (GA) global search, and Hill Climbing (HC) local optimization. Experiments based on multi-temporal GF-2 imagery from 2018 to 2023 demonstrated that MPGH-FS could reduce the feature dimension from 232 to 9, and it achieved the highest Overall Accuracy (OA) of 85.55% and a Kappa coefficient of 0.75 in full-scene classification, with training and inference times limited to 6 s and 1 min, respectively. Cross-temporal transfer experiments further validated the method’s robustness to inter-annual variation within the same area, with classification accuracy fluctuations remaining below 4% across different years, outperforming comparative methods. These results confirm that MPGH-FS offers significant advantages in feature compression, classification performance, and temporal adaptability, providing a robust technical foundation for efficient and accurate multi-temporal remote sensing classification. Full article
(This article belongs to the Special Issue Remote Sensing Image Processing, Analysis and Application)
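The MICC pre-filtering stage can be illustrated with a simple relevance-versus-redundancy score; the scoring formula below is an illustrative mRMR-style stand-in, not the paper's exact MICC definition.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif

def mi_prefilter(X, y, keep=40):
    """Illustrative pre-filter: rank object features by mutual information with the class
    label, penalized by mean absolute correlation with the other features, and keep the best."""
    relevance = mutual_info_classif(X, y, random_state=0)
    corr = np.abs(np.corrcoef(X, rowvar=False))
    redundancy = (corr.sum(axis=0) - 1.0) / (X.shape[1] - 1)   # mean |corr| with other features
    score = relevance - redundancy                             # simple relevance/redundancy trade-off
    return np.argsort(score)[::-1][:keep]                      # indices of retained features
```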

47 pages, 13862 KB  
Review
Land Use/Land Cover Remote Sensing Classification in Complex Subtropical Karst Environments: Challenges, Methodological Review, and Research Frontiers
by Denghong Huang, Zhongfa Zhou, Zhenzhen Zhang, Qingqing Dai, Huanhuan Lu, Ya Li and Youyan Huang
Appl. Sci. 2025, 15(17), 9641; https://doi.org/10.3390/app15179641 - 2 Sep 2025
Cited by 2 | Viewed by 1733
Abstract
Land use/land cover (LULC) data serve as a critical information source for understanding the complex interactions between human activities and global environmental change. The subtropical karst region, characterized by fragmented terrain, spectral confusion, topographic shadowing, and frequent cloud cover, represents one of the most challenging natural scenes for remote sensing classification. This study reviews the evolution of multi-source data acquisition (optical, SAR, LiDAR, UAV) and preprocessing strategies tailored for subtropical regions. It evaluates the applicability and limitations of various methodological frameworks, ranging from traditional approaches and GEOBIA to machine learning and deep learning. The importance of uncertainty modeling and robust accuracy assessment systems is emphasized. The study identifies four major bottlenecks: scarcity of high-quality samples, lack of scale awareness, poor model generalization, and insufficient integration of geoscientific knowledge. It suggests that future breakthroughs lie in developing remote sensing intelligent models that are driven by few samples, integrate multi-modal data, and possess strong geoscientific interpretability. The findings provide a theoretical reference for LULC information extraction and ecological monitoring in heterogeneous geomorphic regions. Full article
