Search Results (372)

Search Parameters:
Keywords = spectral-spatial representation

28 pages, 11618 KB  
Article
Cascaded Multi-Attention Feature Recurrent Enhancement Network for Spectral Super-Resolution Reconstruction
by He Jin, Jinhui Lan, Zhixuan Zhuang and Yiliang Zeng
Remote Sens. 2026, 18(2), 202; https://doi.org/10.3390/rs18020202 - 8 Jan 2026
Abstract
Hyperspectral imaging (HSI) captures the same scene across multiple spectral bands, providing richer spectral characteristics of materials than conventional RGB images. The spectral reconstruction task seeks to map RGB images into hyperspectral images, enabling high-quality HSI data acquisition without additional hardware investment. Traditional methods based on linear models or sparse representations struggle to effectively model the nonlinear characteristics of hyperspectral data. Although deep learning approaches have made significant progress, issues such as detail loss and insufficient modeling of spatial–spectral relationships persist. To address these challenges, this paper proposes the Cascaded Multi-Attention Feature Recurrent Enhancement Network (CMFREN). This method achieves targeted breakthroughs over existing approaches through a cascaded architecture of feature purification, spectral balancing and progressive enhancement. This network comprises two core modules: (1) the Hierarchical Residual Attention (HRA) module, which suppresses artifacts in illumination transition regions through residual connections and multi-scale contextual feature fusion, and (2) the Cascaded Multi-Attention (CMA) module, which incorporates a Spatial–Spectral Balanced Feature Extraction (SSBFE) module and a Spectral Enhancement Module (SEM). The SSBFE combines Multi-Scale Residual Feature Enhancement (MSRFE) with Spectral-wise Multi-head Self-Attention (S-MSA) to achieve dynamic optimization of spatial–spectral features, while the SEM synergistically utilizes attention and convolution to progressively enhance spectral details and mitigate spectral aliasing in low-resolution scenes. Experiments across multiple public datasets demonstrate that CMFREN achieves state-of-the-art (SOTA) performance on metrics including RMSE, PSNR, SAM, and MRAE, validating its superiority under complex illumination conditions and detail-degraded scenarios. Full article
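The Spectral-wise Multi-head Self-Attention (S-MSA) named in this abstract treats spectral channels, rather than pixels, as attention tokens. Below is a minimal PyTorch sketch of that general mechanism; the class name, head count, and learnable temperature are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of spectral-wise multi-head self-attention (S-MSA):
# attention is computed across channels (spectral bands) rather than pixels.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpectralMSA(nn.Module):
    def __init__(self, channels: int, heads: int = 4):
        super().__init__()
        assert channels % heads == 0
        self.heads = heads
        self.to_qkv = nn.Conv2d(channels, channels * 3, kernel_size=1, bias=False)
        self.scale = nn.Parameter(torch.ones(heads, 1, 1))  # learnable temperature per head
        self.proj = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        q, k, v = self.to_qkv(x).chunk(3, dim=1)

        # flatten spatial dims; tokens are spectral channels, features are pixels
        def split(t):
            return t.reshape(b, self.heads, c // self.heads, h * w)

        q, k, v = map(split, (q, k, v))
        q = F.normalize(q, dim=-1)
        k = F.normalize(k, dim=-1)
        attn = (q @ k.transpose(-2, -1)) * self.scale   # (b, heads, c/h, c/h)
        attn = attn.softmax(dim=-1)
        out = (attn @ v).reshape(b, c, h, w)
        return self.proj(out) + x                        # residual connection

if __name__ == "__main__":
    y = SpectralMSA(channels=32, heads=4)(torch.randn(2, 32, 16, 16))
    print(y.shape)  # torch.Size([2, 32, 16, 16])
```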

23 pages, 10516 KB  
Article
SSGTN: Spectral–Spatial Graph Transformer Network for Hyperspectral Image Classification
by Haotian Shi, Zihang Luo, Yiyang Ma, Guanquan Zhu and Xin Dai
Remote Sens. 2026, 18(2), 199; https://doi.org/10.3390/rs18020199 - 7 Jan 2026
Abstract
Hyperspectral image (HSI) classification is fundamental to a wide range of remote sensing applications, such as precision agriculture, environmental monitoring, and urban planning, because HSIs provide rich spectral signatures that enable the discrimination of subtle material differences. Deep learning approaches, including Convolutional Neural Networks (CNNs), Graph Convolutional Networks (GCNs), and Transformers, have achieved strong performance in learning spatial–spectral representations. However, these models often face difficulties in jointly modeling long-range dependencies, fine-grained local structures, and non-Euclidean spatial relationships, particularly when labeled training data are scarce. This paper proposes a Spectral–Spatial Graph Transformer Network (SSGTN), a dual-branch architecture that integrates superpixel-based graph modeling with Transformer-based global reasoning. SSGTN consists of four key components, namely (1) an LDA-SLIC superpixel graph construction module that preserves discriminative spectral–spatial structures while reducing computational complexity, (2) a lightweight spectral denoising module based on 1×1 convolutions and batch normalization to suppress redundant and noisy bands, (3) a Spectral–Spatial Shift Module (SSSM) that enables efficient multi-scale feature fusion through channel-wise and spatial-wise shift operations, and (4) a dual-branch GCN-Transformer block that jointly models local graph topology and global spectral–spatial dependencies. Extensive experiments on three public HSI datasets (Indian Pines, WHU-Hi-LongKou, and Houston2018) under limited supervision (1% training samples) demonstrate that SSGTN consistently outperforms state-of-the-art CNN-, Transformer-, Mamba-, and GCN-based methods in overall accuracy, Average Accuracy, and the κ coefficient. The proposed framework provides an effective baseline for HSI classification under limited supervision and highlights the benefits of integrating graph-based structural priors with global contextual modeling. Full article
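The Spectral–Spatial Shift Module (SSSM) described above mixes features through cheap shift operations rather than large convolutions. A minimal PyTorch sketch of the general idea follows; the group split and shift directions are assumptions, not the paper's exact design.

```python
# Minimal sketch of shift-based spectral-spatial feature mixing: groups of channels
# are shifted along the spectral (channel) and spatial axes so that a following
# 1x1 convolution can mix neighbouring information at low cost.
import torch
import torch.nn as nn

class SpectralSpatialShift(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.mix = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        g = c // 4                                   # four shift groups
        out = x.clone()
        out[:, :g] = torch.roll(x[:, :g], shifts=1, dims=1)              # spectral shift (+1 band)
        out[:, g:2 * g] = torch.roll(x[:, g:2 * g], shifts=-1, dims=1)   # spectral shift (-1 band)
        out[:, 2 * g:3 * g] = torch.roll(x[:, 2 * g:3 * g], shifts=1, dims=2)  # spatial shift (down)
        out[:, 3 * g:] = torch.roll(x[:, 3 * g:], shifts=1, dims=3)            # spatial shift (right)
        return self.mix(out) + x

if __name__ == "__main__":
    print(SpectralSpatialShift(16)(torch.randn(1, 16, 8, 8)).shape)
```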

18 pages, 4244 KB  
Article
Semantic-Guided Kernel Low-Rank Sparse Preserving Projections for Hyperspectral Image Dimensionality Reduction and Classification
by Junjun Li, Jinyan Hu, Lin Huang, Chao Hu and Meinan Zheng
Appl. Sci. 2026, 16(1), 561; https://doi.org/10.3390/app16010561 - 5 Jan 2026
Abstract
Hyperspectral images present significant challenges for conventional dimensionality reduction methods due to their high dimensionality, spectral redundancy, and complex spatial–spectral dependencies. While kernel-based sparse representation methods have shown promise in handling spectral non-linearities, they often fail to preserve spatial consistency and semantic discriminability during feature transformation. To address these limitations, we propose a novel semantic-guided kernel low-rank sparse preserving projection (SKLSPP) framework. Unlike previous approaches that primarily focus on spectral information, our method introduces three key innovations: a semantic-aware kernel representation that maintains discriminability through label constraints, a spatially adaptive manifold regularization term that preserves local pixel affinities in the reduced subspace, and an efficient optimization framework that jointly learns sparse codes and projection matrices. Extensive experiments on benchmark datasets demonstrate that SKLSPP achieves superior performance compared to state-of-the-art methods, showing enhanced feature discrimination, reduced redundancy, and improved robustness to noise while maintaining spatial coherence in the dimensionality-reduced features. Full article
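For readers unfamiliar with sparse preserving projections, a generic kernel low-rank sparse self-representation objective followed by an affinity-preserving projection looks roughly as follows; the symbols and regularizers here are illustrative and are not the authors' exact SKLSPP formulation.

```latex
% Illustrative only: a generic kernel low-rank sparse self-representation
% objective (phi maps pixels into a kernel feature space), not the authors'
% exact SKLSPP formulation.
\min_{Z}\ \tfrac{1}{2}\,\lVert \phi(X) - \phi(X)Z \rVert_F^2
  + \lambda_1 \lVert Z \rVert_*
  + \lambda_2 \lVert Z \rVert_1
  \quad \text{s.t. } \operatorname{diag}(Z) = 0
% A projection P is then learned that preserves the resulting affinities,
% optionally with a label-aware graph Laplacian L_s for semantic guidance:
\min_{P}\ \sum_{i,j} z_{ij}\, \lVert P^{\top}x_i - P^{\top}x_j \rVert_2^2
  + \beta\, \operatorname{tr}\!\bigl(P^{\top} X L_s X^{\top} P\bigr)
  \quad \text{s.t. } P^{\top}P = I
```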

21 pages, 4547 KB  
Article
Attention-Gated U-Net for Robust Cross-Domain Plastic Waste Segmentation Using a UAV-Based Hyperspectral SWIR Sensor
by Soufyane Bouchelaghem, Marco Balsi and Monica Moroni
Remote Sens. 2026, 18(1), 182; https://doi.org/10.3390/rs18010182 - 5 Jan 2026
Abstract
The proliferation of plastic waste across natural ecosystems has created a global environmental and public health crisis. Monitoring plastic litter using remote sensing remains challenging due to the significant variability in terrain, lighting, and weather conditions. Although earlier approaches, including classical supervised machine learning techniques such as Linear Discriminant Analysis (LDA) and Support Vector Machine (SVM), applied to hyperspectral and multispectral data have shown promise in controlled settings, they often face challenges in generalizing across diverse environmental conditions encountered in real-world scenarios. In this work, we present a deep learning framework for pixel-wise segmentation of plastic waste in short-wave infrared (900–1700 nm) hyperspectral imagery acquired from an Unmanned Aerial Vehicle (UAV). Our architecture integrates attention gates and residual connections within a U-Net backbone to enhance contextual modeling and spatial-spectral consistency. We introduce a multi-flight dataset spanning 9 UAV missions across varied environmental settings, consisting of hyperspectral cubes with centimeter-level resolution. Using a leave-one-out cross-validation protocol, our model achieves a test accuracy of up to 96.8% (average 90.5%) and a 91.1% F1 score, demonstrating robust generalization to unseen data collected in different environments. Compared to classical models, the deep network captures richer semantic representations, particularly under challenging conditions. This work offers a scalable and deployable tool for automated plastic waste monitoring and represents a significant advancement in remote environmental sensing. Full article
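The attention gates referenced above follow the familiar additive-gating pattern on U-Net skip connections. Below is a minimal PyTorch sketch of such a gate; layer names and channel sizes are illustrative assumptions, not the authors' code.

```python
# Minimal sketch of an additive attention gate on a U-Net skip connection:
# the coarser decoder feature gates the encoder skip feature pixel by pixel.
import torch
import torch.nn as nn

class AttentionGate(nn.Module):
    def __init__(self, skip_ch: int, gate_ch: int, inter_ch: int):
        super().__init__()
        self.w_skip = nn.Conv2d(skip_ch, inter_ch, kernel_size=1)
        self.w_gate = nn.Conv2d(gate_ch, inter_ch, kernel_size=1)
        self.psi = nn.Sequential(nn.Conv2d(inter_ch, 1, kernel_size=1), nn.Sigmoid())
        self.relu = nn.ReLU(inplace=True)

    def forward(self, skip: torch.Tensor, gate: torch.Tensor) -> torch.Tensor:
        # gate comes from the coarser decoder level; upsample it to the skip resolution
        gate = nn.functional.interpolate(gate, size=skip.shape[-2:], mode="bilinear",
                                         align_corners=False)
        alpha = self.psi(self.relu(self.w_skip(skip) + self.w_gate(gate)))  # (B,1,H,W)
        return skip * alpha  # suppress irrelevant background responses in the skip path

if __name__ == "__main__":
    s, g = torch.randn(1, 64, 64, 64), torch.randn(1, 128, 32, 32)
    print(AttentionGate(64, 128, 32)(s, g).shape)  # torch.Size([1, 64, 64, 64])
```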

28 pages, 13623 KB  
Article
PAFNet: A Parallel Attention Fusion Network for Water Body Extraction of Remote Sensing Images
by Shaochuan Chen, Chenlong Ding, Mutian Li, Xin Lyu, Xin Li, Zhennan Xu, Yiwei Fang and Heng Li
Remote Sens. 2026, 18(1), 153; https://doi.org/10.3390/rs18010153 - 3 Jan 2026
Abstract
Water body extraction plays a crucial role in remote sensing, supporting applications such as environmental monitoring and disaster prevention. Although Deep Convolutional Neural Networks (DCNNs) have achieved remarkable progress, their hierarchical architectures often introduce channel redundancy and hinder the joint representation of fine spatial structures and high-level semantics, leading to ineffective feature fusion and poor discrimination of water features. To address these limitations, a Parallel Attention Fusion Network (PAFNet) is proposed to achieve more effective multi-scale feature aggregation through parallel attention and adaptive fusion. First, the Feature Refinement Module (FRM) employs multi-branch asymmetric convolutions to extract multi-scale features, which are subsequently fused to suppress channel redundancy and preserve fine spatial details. Then, the Parallel Attention Module (PAM) applies spatial and channel attention in parallel, improving the discriminative representation of water features while mitigating interference from spectrally similar land covers. Finally, a Semantic Feature Fusion Module (SFM) integrates adjacent multi-level features through adaptive channel weighting, thereby achieving precise boundary recovery and robust noise suppression. Extensive experiments conducted on four representative datasets (GID, LandCover.ai, QTPL, and LoveDA) demonstrate the superiority of PAFNet over existing state-of-the-art methods. Specifically, the proposed model achieves 94.29% OA and 95.95% F1-Score on GID, 86.17% OA and 88.70% F1-Score on LandCover.ai, 98.99% OA and 98.96% F1-Score on QTPL, and 89.01% OA and 85.59% F1-Score on LoveDA. Full article
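As a rough illustration of applying spatial and channel attention in parallel, as the Parallel Attention Module (PAM) is described to do, consider the PyTorch sketch below; the summation-based fusion and the reduction ratio are assumptions.

```python
# Minimal sketch of parallel channel and spatial attention with additive fusion.
import torch
import torch.nn as nn

class ParallelAttention(nn.Module):
    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        self.channel_att = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1), nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1), nn.Sigmoid())
        self.spatial_att = nn.Sequential(
            nn.Conv2d(2, 1, kernel_size=7, padding=3), nn.Sigmoid())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        ca = x * self.channel_att(x)                           # channel branch
        pooled = torch.cat([x.mean(1, keepdim=True),
                            x.amax(1, keepdim=True)], dim=1)   # (B,2,H,W) pooled maps
        sa = x * self.spatial_att(pooled)                      # spatial branch
        return ca + sa                                         # parallel fusion

if __name__ == "__main__":
    print(ParallelAttention(32)(torch.randn(2, 32, 40, 40)).shape)
```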

23 pages, 8875 KB  
Article
SAR and Visible Image Fusion via Retinex-Guided SAR Reconstruction
by Yuman Yuan, Tianyu Deng, Yi Le, Hongyang Bai, Shuai Guo, Shangjing Sun and Yuanbo Chen
Remote Sens. 2026, 18(1), 111; https://doi.org/10.3390/rs18010111 - 28 Dec 2025
Abstract
The fusion of synthetic aperture radar (SAR) and visible images offers complementary spatial and spectral information, enabling more reliable and comprehensive scene interpretation. However, SAR speckle noise and the intrinsic modality gap pose significant challenges for existing methods in extracting consistent and complementary features. To address these issues, we propose VGSRF-Net, a Retinex-guided SAR reconstruction-driven fusion network that leverages visible-image priors to refine SAR features. The cross-modality reconstruction module (CMRM) reconstructs SAR features guided by visible priors, effectively reducing modality discrepancies before fusion and enabling improved multi-modal representation. The multi-modal feature joint representation module (MFJRM) enhances cross-modal complementarity by integrating global contextual interactions and local dynamic convolution, thereby achieving further feature alignment. Finally, the feature enhancement module (FEM) refines multi-scale spatial features and selectively enhances high-frequency details in the frequency domain, improving structural clarity and texture fidelity. Extensive experiments on diverse real-world remote sensing datasets demonstrate that VGSRF-Net surpasses state-of-the-art methods in denoising, structural preservation, and generalization under varying noise and illumination conditions. Full article
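The Retinex guidance mentioned above builds on the classical decomposition of a visible image into illumination and reflectance. The sketch below shows a single-scale Retinex decomposition in PyTorch for intuition; it is the textbook formulation, not the paper's specific module.

```python
# Minimal sketch of single-scale Retinex: illumination is a Gaussian-smoothed
# version of the image; reflectance is the log-domain residual carrying structure.
import torch
import torch.nn.functional as F

def gaussian_kernel(size: int = 15, sigma: float = 5.0) -> torch.Tensor:
    ax = torch.arange(size, dtype=torch.float32) - (size - 1) / 2
    g = torch.exp(-(ax ** 2) / (2 * sigma ** 2))
    k = torch.outer(g, g)
    return (k / k.sum()).view(1, 1, size, size)

def retinex_decompose(img: torch.Tensor, eps: float = 1e-3):
    """img: (B,1,H,W) grayscale visible image in [0,1]."""
    k = gaussian_kernel().to(img.device)
    illumination = F.conv2d(img, k, padding=k.shape[-1] // 2)            # low-frequency light
    reflectance = torch.log(img + eps) - torch.log(illumination + eps)   # structure / texture
    return illumination, reflectance

if __name__ == "__main__":
    L, R = retinex_decompose(torch.rand(1, 1, 64, 64))
    print(L.shape, R.shape)
```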

29 pages, 4508 KB  
Article
Multi-Perspective Information Fusion Network for Remote Sensing Segmentation
by Jianchao Liu, Shuli Cheng and Anyu Du
Remote Sens. 2026, 18(1), 100; https://doi.org/10.3390/rs18010100 - 27 Dec 2025
Abstract
Remote sensing acquires Earth surface information without physical contact through sensors operating at diverse spatial, spectral, and temporal resolutions. In high-resolution remote sensing imagery, objects often exhibit large scale variation, complex spatial distributions, and strong inter-class similarity, posing persistent challenges for accurate semantic segmentation. Existing methods still struggle to simultaneously preserve fine boundary details and model long-range spatial dependencies, and lack explicit mechanisms to decouple low-frequency semantic context from high-frequency structural information. To address these limitations, we propose the Multi-Perspective Information Fusion Network (MPIFNet) for remote sensing semantic segmentation, motivated by the need to integrate global context, local structures, and multi-frequency information into a unified framework. MPIFNet employs a Global and Local Mamba Block Self-Attention (GLMBSA) module to capture long-range dependencies while preserving local details, and a Double-Branch Haar Wavelet Transform (DBHWT) module to separate and enhance low- and high-frequency features. By fusing spatial, hierarchical, and frequency representations, MPIFNet learns more discriminative and robust features. Evaluations on the Vaihingen, Potsdam, and LoveDA datasets through ablation and comparative studies highlight the strong generalization of our model, yielding mIoU results of 86.03%, 88.36%, and 55.76%. Full article
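The Double-Branch Haar Wavelet Transform (DBHWT) separates low- and high-frequency content; a single-level 2D Haar decomposition can be written as a fixed, stride-2 grouped convolution, as in the minimal sketch below (class name and normalization are assumptions).

```python
# Minimal sketch of a single-level 2D Haar wavelet decomposition implemented as a
# fixed grouped convolution, separating low-frequency (LL) context from
# high-frequency (LH/HL/HH) structure.
import torch
import torch.nn as nn
import torch.nn.functional as F

class HaarDWT(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        ll = torch.tensor([[0.5, 0.5], [0.5, 0.5]])
        lh = torch.tensor([[-0.5, -0.5], [0.5, 0.5]])
        hl = torch.tensor([[-0.5, 0.5], [-0.5, 0.5]])
        hh = torch.tensor([[0.5, -0.5], [-0.5, 0.5]])
        bank = torch.stack([ll, lh, hl, hh]).unsqueeze(1)   # (4,1,2,2)
        # one filter bank per input channel (grouped convolution)
        self.register_buffer("weight", bank.repeat(channels, 1, 1, 1))  # (4*C,1,2,2)
        self.channels = channels

    def forward(self, x: torch.Tensor):
        out = F.conv2d(x, self.weight, stride=2, groups=self.channels)
        b, _, h, w = out.shape
        out = out.view(b, self.channels, 4, h, w)
        ll, high = out[:, :, 0], out[:, :, 1:]              # low- and high-frequency parts
        return ll, high

if __name__ == "__main__":
    ll, high = HaarDWT(8)(torch.randn(1, 8, 32, 32))
    print(ll.shape, high.shape)  # (1, 8, 16, 16) and (1, 8, 3, 16, 16)
```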

25 pages, 8187 KB  
Article
Cascaded Local–Nonlocal Pansharpening with Adaptive Channel-Kernel Convolution and Multi-Scale Large-Kernel Attention
by Junru Yin, Zhiheng Huang, Qiqiang Chen, Wei Huang, Le Sun, Qinggang Wu and Ruixia Hou
Remote Sens. 2026, 18(1), 97; https://doi.org/10.3390/rs18010097 - 27 Dec 2025
Abstract
Pansharpening plays a crucial role in remote sensing applications, as it enables the generation of high-spatial-resolution multispectral images that simultaneously preserve spatial and spectral information. However, most current methods struggle to preserve local textures and exploit spectral correlations across bands while modeling nonlocal information in source images. To address these issues, we propose a cascaded local–nonlocal pansharpening network (CLNNet) that progressively integrates local and nonlocal features through stacked Progressive Local–Nonlocal Fusion (PLNF) modules. This cascaded design allows CLNNet to gradually refine spatial–spectral information. Each PLNF module combines Adaptive Channel-Kernel Convolution (ACKC), which extracts local spatial features using channel-specific convolution kernels, and a Multi-Scale Large-Kernel Attention (MSLKA) module, which leverages multi-scale large-kernel convolutions with varying receptive fields to capture nonlocal information. The attention mechanism in MSLKA enhances spatial–spectral feature representation by integrating information across multiple dimensions. Extensive experiments on the GaoFen-2, QuickBird, and WorldView-3 datasets demonstrate that the proposed method outperforms state-of-the-art methods in quantitative metrics and visual quality. Full article
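Large-kernel attention of the kind used in MSLKA is typically assembled from depthwise and dilated depthwise convolutions whose output gates the input. The following PyTorch sketch illustrates a multi-scale variant; kernel sizes, dilations, and the additive fusion are assumptions rather than the authors' exact configuration.

```python
# Minimal sketch of multi-scale large-kernel attention: depthwise and dilated
# depthwise convolutions build an attention map that modulates the input.
import torch
import torch.nn as nn

class MultiScaleLKA(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.local = nn.Conv2d(channels, channels, 5, padding=2, groups=channels)
        # dilated depthwise convolutions approximate increasingly large receptive fields
        self.branches = nn.ModuleList([
            nn.Conv2d(channels, channels, 7, padding=3 * d, dilation=d, groups=channels)
            for d in (1, 2, 3)])
        self.fuse = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        a = self.local(x)
        a = sum(branch(a) for branch in self.branches)   # multi-scale nonlocal context
        attn = self.fuse(a)
        return x * attn                                  # attention map gates the input

if __name__ == "__main__":
    print(MultiScaleLKA(32)(torch.randn(1, 32, 24, 24)).shape)
```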

25 pages, 3845 KB  
Article
Multimodal Optical Biosensing and 3D-CNN Fusion for Phenotyping Physiological Responses of Basil Under Water Deficit Stress
by Yu-Jin Jeon, Hyoung Seok Kim, Taek Sung Lee, Soo Hyun Park, Heesup Yun and Dae-Hyun Jung
Agronomy 2026, 16(1), 55; https://doi.org/10.3390/agronomy16010055 - 24 Dec 2025
Abstract
Water availability critically affects basil (Ocimum basilicum L.) growth and physiological performance, making the early and precise monitoring of water-deficit responses essential for precision irrigation. However, conventional visual or biochemical methods are destructive and unsuitable for real-time assessment. This study presents a multimodal optical biosensing and 3D convolutional neural network (3D-CNN) fusion framework for phenotyping physiological responses of basil under water-deficit stress. RGB, depth, and chlorophyll fluorescence (CF) imaging were integrated to capture complementary morphological and photosynthetic information. Through the fusion of 130 optical parameter layers, the 3D-CNN model learned spatial and temporal–spectral features associated with resistance and recovery dynamics, achieving 96.9% classification accuracy—outperforming both 2D-CNN and traditional machine-learning classifiers. Feature-space visualization using t-SNE confirmed that the learned latent representations reflected biologically meaningful stress–recovery trajectories rather than superficial visual differences. This multimodal fusion framework provides a scalable and interpretable approach for the real-time, non-destructive monitoring of crop water stress, establishing a foundation for adaptive irrigation control and intelligent environmental management in precision agriculture. Full article
(This article belongs to the Special Issue Smart Farming: Advancing Techniques for High-Value Crops)
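As a rough illustration of feeding a stack of optical parameter layers to a 3D-CNN classifier, as described above, a minimal PyTorch sketch follows; the depth of 130 matches the abstract, while channel widths and the three-class head are illustrative assumptions.

```python
# Minimal sketch of a 3D-CNN classifier over a stack of optical parameter layers
# treated as the depth axis of a single-channel volume.
import torch
import torch.nn as nn

class Small3DCNN(nn.Module):
    def __init__(self, n_classes: int = 3):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(1, 8, kernel_size=3, padding=1), nn.BatchNorm3d(8), nn.ReLU(inplace=True),
            nn.MaxPool3d(2),
            nn.Conv3d(8, 16, kernel_size=3, padding=1), nn.BatchNorm3d(16), nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool3d(1))
        self.classifier = nn.Linear(16, n_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, 1, D, H, W) where D indexes the stacked RGB/depth/CF parameter layers
        return self.classifier(self.features(x).flatten(1))

if __name__ == "__main__":
    logits = Small3DCNN()(torch.randn(2, 1, 130, 64, 64))
    print(logits.shape)  # torch.Size([2, 3])
```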

25 pages, 3370 KB  
Article
A SimAM-Enhanced Multi-Resolution CNN with BiGRU for EEG Emotion Recognition: 4D-MRSimNet
by Yutao Huang and Jijie Deng
Electronics 2026, 15(1), 39; https://doi.org/10.3390/electronics15010039 - 22 Dec 2025
Abstract
This study proposes 4D-MRSimNet, a framework that employs attention mechanisms to focus on distinct dimensions. The approach enhances key responses in the spatial and spectral domains and characterizes dynamic evolution in the temporal domain, extracting and integrating complementary emotional features to facilitate final classification. At the feature level, differential entropy (DE) and power spectral density (PSD) are combined within four core frequency bands (θ, α, β, and γ). These bands are recognized as closely related to emotional processing. This integration constructs a complementary feature representation that preserves both energy distribution and entropy variability. These features are organized into a 4D representation that integrates electrode topology, frequency characteristics, and temporal dependencies inherent in EEG signals. At the network level, a multi-resolution convolutional module embedded with SimAM attention extracts spatial and spectral features at different scales and adaptively emphasizes key information. A bidirectional GRU (BiGRU) integrated with temporal attention further emphasizes critical time segments and strengthens the modeling of temporal dependencies. Experiments show that our method achieves an accuracy of 97.68% for valence and 97.61% for arousal on the DEAP dataset and 99.60% for valence and 99.46% for arousal on the DREAMER dataset. The results demonstrate the effectiveness of complementary feature fusion, multidimensional feature representation, and the complementary dual attention enhancement strategy for EEG emotion recognition. Full article
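SimAM, the parameter-free attention named in the title, weights each activation by an energy term measuring how much it stands out within its channel. The sketch below follows the standard SimAM formulation; it is illustrative and not the authors' exact code.

```python
# Minimal sketch of SimAM parameter-free attention: activations that deviate most
# from their channel mean receive higher weights via an energy-based sigmoid gate.
import torch
import torch.nn as nn

class SimAM(nn.Module):
    def __init__(self, lambda_: float = 1e-4):
        super().__init__()
        self.lambda_ = lambda_

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        n = h * w - 1
        d = (x - x.mean(dim=(2, 3), keepdim=True)) ** 2
        v = d.sum(dim=(2, 3), keepdim=True) / n        # channel-wise variance estimate
        e_inv = d / (4 * (v + self.lambda_)) + 0.5     # inverse energy per activation
        return x * torch.sigmoid(e_inv)

if __name__ == "__main__":
    print(SimAM()(torch.randn(2, 16, 9, 9)).shape)
```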

27 pages, 25451 KB  
Article
Multi-Domain Feature Fusion Transformer with Cross-Domain Robustness for Facial Expression Recognition
by Katherine Lin Shu and Mu-Jiang-Shan Wang
Symmetry 2026, 18(1), 15; https://doi.org/10.3390/sym18010015 - 21 Dec 2025
Abstract
Facial expression recognition (FER) is a key task in affective computing and human–computer interaction, aiming to decode facial muscle movements into emotional categories. Although deep learning-based FER has achieved remarkable progress, robust recognition under uncontrolled conditions (e.g., illumination change, pose variation, occlusion, and cultural diversity) remains challenging. Traditional Convolutional Neural Networks (CNNs) are effective at local feature extraction but limited in modeling global dependencies, while Vision Transformers (ViT) provide global context modeling yet often neglect fine-grained texture and frequency cues that are critical for subtle expression discrimination. Moreover, existing approaches usually focus on single-domain representations and lack adaptive strategies to integrate heterogeneous cues across spatial, semantic, and spectral domains, leading to limited cross-domain generalization. To address these limitations, this study proposes a unified Multi-Domain Feature Enhancement and Fusion (MDFEFT) framework that combines a ViT-based global encoder with three complementary branches—channel, spatial, and frequency—for comprehensive feature learning. Taking into account the approximately bilateral symmetry of human faces and the asymmetric distortions introduced by pose, occlusion, and illumination, the proposed MDFEFT framework is designed to learn symmetry-aware and asymmetry-robust representations for facial expression recognition across diverse domains. An adaptive Cross-Domain Feature Enhancement and Fusion (CDFEF) module is further introduced to align and integrate heterogeneous features, achieving domain-consistent and illumination-robust expression understanding. The experimental results show that the proposed method consistently outperforms existing CNN-, Transformer-, and ensemble-based models. The proposed model achieves accuracies of 0.997, 0.796, and 0.776 on KDEF, FER2013, and RAF-DB, respectively. Compared with the strongest baselines, it further improves accuracy by 0.3%, 2.2%, and 1.9%, while also providing higher F1-scores and better robustness in cross-domain testing. These results confirm the effectiveness and strong generalization ability of the proposed framework for real-world facial expression recognition. Full article
(This article belongs to the Section Computer)

28 pages, 33315 KB  
Article
Hyperspectral Image Classification with Multi-Path 3D-CNN and Coordinated Hierarchical Attention
by Wenyi Hu, Wei Shi, Chunjie Lan, Yuxia Li and Lei He
Remote Sens. 2025, 17(24), 4035; https://doi.org/10.3390/rs17244035 - 15 Dec 2025
Abstract
Convolutional Neural Networks (CNNs) have been extensively applied for the extraction of deep features in hyperspectral imagery tasks. However, traditional 3D-CNNs are limited by their fixed-size receptive fields and inherent locality. This restricts their ability to capture multi-scale objects and model long-range dependencies, ultimately hindering the representation of large-area land-cover structures. To overcome these drawbacks, we present a new framework designed to integrate multi-scale feature fusion and a hierarchical attention mechanism for hyperspectral image classification. Channel-wise Squeeze-and-Excitation (SE) and Convolutional Block Attention Module (CBAM) spatial attention are combined to enhance feature representation from both spectral bands and spatial locations, allowing the network to emphasize critical wavelengths and salient spatial structures. Finally, by integrating the self-attention inherent in the Transformer architecture with a Cross-Attention Fusion (CAF) mechanism, a local-global feature fusion module is developed. This module effectively captures extended-span interdependencies present in hyperspectral remote sensing images, and this process facilitates the effective integration of both localized and holistic attributes. On the Salinas Valley dataset, the proposed method delivers an Overall Accuracy (OA) of 0.9929 and an Average Accuracy (AA) of 0.9949, attaining perfect recognition accuracy for certain classes. The proposed model demonstrates commendable class balance and classification stability. Across multiple publicly available hyperspectral remote sensing image datasets, it systematically produces classification outcomes that significantly outperform those of established benchmark methods, exhibiting distinct advantages in feature representation, structural modeling, and the discrimination of complex ground objects. Full article
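The channel-wise Squeeze-and-Excitation (SE) attention referenced above reweights spectral bands from globally pooled statistics. A minimal PyTorch sketch follows; the reduction ratio is an illustrative assumption.

```python
# Minimal sketch of channel-wise Squeeze-and-Excitation: global average pooling
# "squeezes" spatial information, a small MLP "excites" informative channels.
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        w = self.fc(x.mean(dim=(2, 3)))       # squeeze: global average over space
        return x * w.view(b, c, 1, 1)         # excite: reweight spectral channels

if __name__ == "__main__":
    print(SEBlock(64)(torch.randn(1, 64, 11, 11)).shape)
```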

26 pages, 10331 KB  
Article
STM-Net: A Multiscale Spectral–Spatial Representation Hybrid CNN–Transformer Model for Hyperspectral Image Classification
by Yicheng Hu, Jia Ge and Shufang Tian
Remote Sens. 2025, 17(24), 4031; https://doi.org/10.3390/rs17244031 - 14 Dec 2025
Abstract
Hyperspectral images (HSIs) have been broadly applied in remote sensing, environmental monitoring, agriculture, and other fields due to their rich spectral information and complex spatial properties. However, the inherent redundancy, spectral aliasing, and spatial heterogeneity of high-dimensional data pose significant challenges to classification accuracy. Therefore, this study proposes STM-Net, a hybrid deep learning model that integrates SSRE (Spectral–Spatial Residual Extraction Module), Transformer, and MDRM (Multi-scale Differential Residual Module) architectures to comprehensively exploit spectral–spatial features and enhance classification performance. First, the SSRE module employs 3D convolutional layers combined with residual connections to extract multi-scale spectral–spatial features, thereby improving the representation of both local and deep-level characteristics. Second, the MDRM incorporates multi-scale differential convolution and the Convolutional Block Attention Module mechanism to refine local feature extraction and enhance inter-class discriminability at category boundaries. Finally, the Transformer branch equipped with a Dual-Branch Global-Local (DBGL) mechanism integrates local convolutional attention and global self-attention, enabling synergistic optimization of long-range dependency modeling and local feature enhancement. In this study, STM-Net is extensively evaluated on three benchmark HSI datasets: Indian Pines, Pavia University, and Salinas. Additionally, experimental results demonstrate that the proposed model consistently outperforms existing methods regarding OA, AA, and the Kappa coefficient, exhibiting superior generalization capability and stability. Furthermore, ablation studies validate that the SSRE, MDRM, and Transformer components each contribute significantly to improving classification performance. This study presents an effective spectral–spatial feature fusion framework for hyperspectral image classification, offering a novel technical solution for remote sensing data analysis. Full article

28 pages, 27771 KB  
Article
Scalable Context-Preserving Model-Aware Deep Clustering for Hyperspectral Images
by Xianlu Li, Nicolas Nadisic, Shaoguang Huang, Nikos Deligiannis and Aleksandra Pižurica
Remote Sens. 2025, 17(24), 4030; https://doi.org/10.3390/rs17244030 - 14 Dec 2025
Abstract
Subspace clustering has become widely adopted for the unsupervised analysis of hyperspectral images (HSIs). Recent model-aware deep subspace clustering methods often use a two-stage framework, involving the calculation of a self-representation matrix with a complexity of O(n²), followed by spectral clustering. However, these methods are computationally intensive, generally incorporating only local or non-local structure constraints, and their structural constraints fall short of effectively supervising the entire clustering process. We propose a scalable, context-preserving deep clustering method based on basis representation, which jointly captures local and non-local structures for efficient HSI clustering. To preserve local structure—i.e., spatial continuity within subspaces—we introduce a spatial smoothness constraint that aligns clustering predictions with their spatially filtered versions. For non-local structure—i.e., spectral continuity—we employ a mini-cluster-based scheme that refines predictions at the group level, encouraging spectrally similar pixels to belong to the same subspace. These two constraints are jointly optimized to reinforce each other. Specifically, our model is designed as a one-stage approach, in which the structural constraints are applied to the entire clustering process. The time and space complexity of our method are O(n), making it applicable to large-scale HSI data. Experiments on real-world datasets show that our method outperforms state-of-the-art techniques. Full article
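The spatial smoothness constraint described above aligns soft cluster assignments with their spatially filtered versions. One way to write such a constraint is sketched below; the average-pooling filter and the KL-divergence form are assumptions, not the authors' exact loss.

```python
# Minimal sketch of a spatial-smoothness clustering constraint: each pixel's soft
# cluster assignment is pulled toward the locally averaged assignment around it.
import torch
import torch.nn.functional as F

def spatial_smoothness_loss(assign: torch.Tensor) -> torch.Tensor:
    """assign: (B, K, H, W) soft cluster assignments (softmax over K)."""
    smoothed = F.avg_pool2d(assign, kernel_size=3, stride=1, padding=1)
    smoothed = smoothed / smoothed.sum(dim=1, keepdim=True).clamp_min(1e-8)
    # KL divergence between each assignment and its spatially filtered version
    return F.kl_div(assign.clamp_min(1e-8).log(), smoothed, reduction="batchmean")

if __name__ == "__main__":
    logits = torch.randn(1, 6, 32, 32)
    loss = spatial_smoothness_loss(logits.softmax(dim=1))
    print(float(loss))
```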

36 pages, 7233 KB  
Article
Deep Learning for Tumor Segmentation and Multiclass Classification in Breast Ultrasound Images Using Pretrained Models
by K. E. ArunKumar, Matthew E. Wilson, Nathan E. Blake, Tylor J. Yost and Matthew Walker
Sensors 2025, 25(24), 7557; https://doi.org/10.3390/s25247557 - 12 Dec 2025
Abstract
Early detection of breast cancer commonly relies on imaging technologies such as ultrasound, mammography and MRI. Among these, breast ultrasound is widely used by radiologists to identify and assess lesions. In this study, we developed image segmentation techniques and multiclass classification artificial intelligence (AI) tools based on pretrained models to segment lesions and detect breast cancer. The proposed workflow includes both the development of segmentation models and development of a series of classification models to classify ultrasound images as normal, benign or malignant. The pretrained models were trained and evaluated on the Breast Ultrasound Images (BUSI) dataset, a publicly available collection of grayscale breast ultrasound images with corresponding expert-annotated masks. For segmentation, images and ground-truth masks were used to train pretrained encoder (ResNet18, EfficientNet-B0 and MobileNetV2)–decoder (U-Net, U-Net++ and DeepLabV3) models, including the DeepLabV3 architecture integrated with a Frequency-Domain Feature Enhancement Module (FEM). The proposed FEM improves spatial and spectral feature representations using Discrete Fourier Transform (DFT), GroupNorm, dropout regularization and adaptive fusion. For classification, each image was assigned a label (normal, benign or malignant). Optuna, an open-source software framework, was used for hyperparameter optimization and for the testing of various pretrained models to determine the best encoder–decoder segmentation architecture. Five different pretrained models (ResNet18, DenseNet121, InceptionV3, MobileNetV3 and GoogleNet) were optimized for multiclass classification. DeepLabV3 outperformed other segmentation architectures, with consistent performance across training, validation and test images, with Dice Similarity Coefficient (DSC, a metric describing the overlap between predicted and true lesion regions) values of 0.87, 0.80 and 0.83 on training, validation and test sets, respectively. ResNet18:DeepLabV3 achieved an Intersection over Union (IoU) score of 0.78 during training, while ResNet18:U-Net++ achieved the best Dice coefficient (0.83) and IoU (0.71) and area under the curve (AUC, 0.91) scores on the test (unseen) dataset when compared to other models. However, the proposed ResNet18:FrequencyAwareDeepLabV3 (FADeepLabV3) achieved a DSC of 0.85 and an IoU of 0.72 on the test dataset, demonstrating improvements over standard DeepLabV3. Notably, the frequency-domain enhancement substantially improved the AUC from 0.90 to 0.98, indicating enhanced prediction confidence and clinical reliability. For classification, ResNet18 produced an F1 score—a measure combining precision and recall—of 0.95 and an accuracy of 0.90 on the training dataset, while InceptionV3 performed best on the test dataset, with an F1 score of 0.75 and accuracy of 0.83. We demonstrate a comprehensive approach that uses transfer learning models to automate the segmentation of breast ultrasound images and their multiclass classification into benign, malignant or normal on an imbalanced ultrasound image dataset. Full article
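The Frequency-Domain Feature Enhancement Module (FEM) combines a DFT, GroupNorm, dropout, and adaptive fusion. The PyTorch sketch below assembles those named ingredients in one plausible way; it is illustrative, and the authors' FEM may differ in detail.

```python
# Minimal sketch of frequency-domain feature enhancement: spectral magnitudes from a
# 2D DFT are normalized, regularized, and adaptively fused back with spatial features.
import torch
import torch.nn as nn

class FrequencyEnhancement(nn.Module):
    def __init__(self, channels: int, groups: int = 8, p_drop: float = 0.1):
        super().__init__()
        self.norm = nn.GroupNorm(groups, channels)
        self.mix = nn.Conv2d(channels, channels, kernel_size=1)
        self.drop = nn.Dropout2d(p_drop)
        self.alpha = nn.Parameter(torch.tensor(0.5))   # adaptive fusion weight

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        spec = torch.fft.rfft2(x, norm="ortho")         # 2D DFT over spatial dims
        mag = torch.abs(spec)                           # spectral magnitude features
        freq_feat = self.drop(self.mix(self.norm(mag)))
        freq_feat = nn.functional.interpolate(freq_feat, size=x.shape[-2:],
                                              mode="bilinear", align_corners=False)
        return self.alpha * freq_feat + (1 - self.alpha) * x   # adaptive fusion

if __name__ == "__main__":
    print(FrequencyEnhancement(32)(torch.randn(1, 32, 48, 48)).shape)
```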
