Article

Cross-Domain Hyperspectral Image Classification Combined Sharpness-Aware Minimization with Local-to-Global Feature Enhancement

Heilongjiang Province Key Laboratory of Laser Spectroscopy Technology and Application, Harbin University of Science and Technology, Harbin 150080, China
* Author to whom correspondence should be addressed.
Remote Sens. 2026, 18(5), 740; https://doi.org/10.3390/rs18050740
Submission received: 26 November 2025 / Revised: 7 January 2026 / Accepted: 22 January 2026 / Published: 28 February 2026
(This article belongs to the Section AI Remote Sensing)

Highlights

What are the main findings?
  • This study proposes a novel paradigm for classifying hyperspectral satellite imagery using UAV hyperspectral data, enabling effective utilization of large amounts of unlabeled satellite data. By integrating cross-domain learning with the high spatial resolution and abundant labeled information of UAV hyperspectral data, the proposed method significantly enhances the fine-grained classification performance of satellite hyperspectral images in broad-area scenes. This approach offers a new research direction for the intelligent interpretation of hyperspectral remote sensing data acquired from heterogeneous sensor platforms.
  • The proposed method achieves state-of-the-art classification performance, significantly outperforming advanced cross-domain classification approaches, including the state-of-the-art method DSFormer, on four standard benchmark datasets.
What are the implications of the main findings?
  • A local–global feature extraction model is developed. Initially, the model captures local edge information from cross-domain data, followed by global feature alignment through an improved self-attention mechanism. This strategy enhances boundary detail representation through local feature extraction and optimizes cross-domain feature consistency via global feature alignment, thereby improving the model’s adaptability and robustness in cross-domain hyperspectral classification tasks.
  • An improved Sharpness-Aware Minimization (ISAM) strategy is proposed to overcome the local optima and reduced generalization caused by spectral shift in hyperspectral cross-domain classification tasks. To reduce computational complexity and improve training efficiency, this work refines the gradient perturbation strategy by using a single forward propagation to compute approximate perturbations. Furthermore, by combining square-root gradient approximation perturbation with a nonlinear gradient scaling mechanism, the gradient update amplitude grows gradually relative to the gradient magnitude. This adaptive adjustment of feature update intensity suppresses the dominance of large gradients, enhances the influence of small gradients, and ensures more balanced cross-domain feature alignment.

Abstract

With the increasing availability of satellite imagery and the shortening revisit intervals, efficiently processing satellite hyperspectral images has become a critical task. However, in practice, a large portion of satellite hyperspectral data remains unlabeled, making it difficult to achieve satisfactory classification performance using satellite data alone. Meanwhile, UAV-based platforms offer acquisition flexibility, which facilitates the collection of rich and detailed information. To address these challenges, this paper proposes a method called Sharpness-Aware Minimization with Local-to-Global Feature Enhancement (SAMLFE), which uses UAV hyperspectral images for training to enhance the fine-grained classification performance of satellite hyperspectral images in large scenes. Specifically, a spectral dimension mapping model is first employed to unify UAV and satellite images into a common spectral dimension, thereby mitigating the impact of inconsistent feature representations. Next, a local-to-global feature extraction network is constructed to capture both local details and global semantics. Few-shot learning is applied to extract discriminative features from both the source and target domains within the shared feature space, thereby enhancing the model’s ability to utilize limited labeled data efficiently. Furthermore, a conditional adversarial domain adaptation strategy is adopted to align the feature distributions of the source and target domains, thereby alleviating spectral shift. Meanwhile, the integration of an improved Sharpness-Aware Minimization (ISAM) enhances the model’s robustness across domains. Finally, the K-Nearest Neighbor algorithm is employed to perform accurate classification. Experimental results on multiple datasets demonstrate that the proposed method achieves superior generalization and classification performance in cross-domain hyperspectral image classification. It also outperforms existing methods in terms of feature distribution alignment, robustness of feature extraction, and adaptability to small-sample scenarios.

1. Introduction

Hyperspectral remote sensing integrates imaging and spectral detection technologies, covering electromagnetic wave bands including visible light, near-infrared, mid-infrared, and thermal infrared. During the imaging of ground objects’ spatial characteristics, spectral measurements are conducted on each spatial unit, simultaneously capturing spatial and spectral information [1]. Owing to the “combination of image and spectrum” characteristic of hyperspectral images, they contain far richer information about ground objects. By fully exploiting this characteristic, land covers can be classified accurately [2]. Therefore, hyperspectral remote sensing has been widely applied in urban planning [3], environmental monitoring [4], precision agriculture [5], and modern medical diagnostics [6].
In recent years, deep learning methods have advanced rapidly, particularly in hyperspectral image classification, significantly enhancing classification accuracy through powerful feature extraction capabilities. Deep learning models can automatically learn spectral and spatial features. In particular, three-dimensional convolutional neural networks (3D CNNs) effectively capture spectral–spatial information, achieving outstanding performance in classification tasks. However, as CNN models deepen, the overfitting phenomenon inevitably occurs [7]. To address this, ResNet was introduced to mitigate gradient vanishing and overfitting issues in deep network training by employing residual blocks. The identity mapping within these blocks allows deep networks to be trained more effectively without increasing complexity [8]. Building upon this framework, the supervised spectral–spatial residual network (SSRN) was specifically optimized for hyperspectral image classification tasks [9]. By incorporating identity mapping, SSRN effectively reduces potential accuracy loss during feature extraction, significantly enhancing classification accuracy.
Despite the considerable success of deep learning in HSI classification, training such models typically requires a large amount of labeled data. However, in practical applications, newly acquired hyperspectral images often lack sufficient labels. Data labeling is time-consuming, labor-intensive, and expensive, greatly limiting the learning potential of current deep learning models. Therefore, few-shot learning (FSL) has attracted considerable attention due to its ability to quickly adapt to new tasks using prior knowledge and a limited number of labeled samples [10,11]. Various strategies have been developed to implement FSL effectively. For instance, the Deep Fuzzy Metric Learning (DFML) method leverages fuzzy logic theory to construct a spatial–spectral fuzzy metric space, enhancing the characterization of uncertainty in mixed and boundary pixel categories [12]. By combining a hybrid CNN–Transformer network with a Gaussian membership function-based fuzzy set representation, DFML improves classification performance under few-shot conditions. To further enhance feature discriminability, the Spatial–Spectral Enhancement and Fusion Network (SSEFN) utilizes a spatial–spectral enhancement strategy to facilitate model learning. Moreover, SSEFN incorporates an Adaptive Decision Fusion (ADF) module to integrate classification decisions from multiple enhanced features, effectively mitigating model overfitting [13]. In the context of few-shot open-set classification, the Self-Supervised Multitask Learning (SSMTL) framework was proposed to enhance feature extraction by introducing a self-supervised reconstruction task. By integrating modules such as the Data Diversification Module (DDM), Three-Dimensional Multi-Scale Attention Module (3D-MAM), and Adaptive Threshold Module (ATM), SSMTL dynamically adjusts thresholds based on uncertainty, thereby improving open-set classification performance [14].
Although the aforementioned methods have achieved promising classification performance, the generalization ability of the models remains limited due to insufficient labeled samples in the target domain. To address this challenge, domain adaptation methods have been widely adopted to tackle the problem of limited labeled samples. These methods leverage source domain data rich in annotations to extract potential correlation information (i.e., domain-invariant features) between different domains, and subsequently classify unlabeled target domains by mapping different but similar scenes into a shared feature space. This classification approach is known as cross-scene classification [15]. In the context of remote sensing, one effective strategy involves aligning data distributions. For example, semi-supervised Transfer Component Analysis (TCA) reduces inter-domain discrepancies in the feature space, significantly improving performance under scarce label conditions [16]. Beyond statistical methods, deep learning architectures have also been utilized for feature alignment. Specifically, models employing recurrent neural networks (RNNs) have been proposed to extract features followed by transformation learning to obtain domain-invariant representations [17]. Additionally, approaches based on deep metric learning have been developed to align embedded features via unsupervised domain adaptation, facilitating classification using the nearest neighbor (NN) algorithm even when the target scene contains limited labeled samples [18].
In practical cross-scene transfer learning, source and target datasets are often collected using different sensors or affected by varying external conditions. These differences not only lead to distributional shifts, but also result in inconsistencies in spectral dimensions, class distributions, and spatial resolution, thereby presenting significant challenges for cross-scene feature transfer under heterogeneous domain adaptation. In fact, compared to isomorphic transfer learning, heterogeneous relationships between source and target domains are more common in real-world applications. As a result, heterogeneous transfer learning has garnered increasing attention in recent years [19].
To address this challenge, Deep Cross-domain Few-shot Learning (DCFSL) was introduced as a heterogeneous transfer learning approach. Using meta-learning, DCFSL effectively handles few-shot HSI classification. Moreover, DCFSL incorporates a conditional domain adversarial strategy to mitigate domain shift between the source and target domains, thereby enhancing cross-domain classification performance [20]. From a metric learning perspective, the Class Covariance Metric-based Few-shot Learning (CMFSL) method was proposed to improve adaptability [21]. It employs interactive training and replaces the traditional Euclidean metric with class covariance distance to learn invariant features across domains, thereby enhancing model adaptability and improving HSI classification accuracy. To capture non-local spatial dependencies effectively, Graph Information Aggregation Cross-domain Few-shot Learning (Gia-CFSL) integrates information propagation and graph alignment within graph neural networks [22]. By aligning non-local spatial information at both feature and distribution levels, Gia-CFSL mitigates spectral shifts and improves cross-domain classification accuracy [22]. For heterogeneous domain adaptation, a transfer model based on the Extreme Learning Machine (ELM) network was developed to enforce feature dimension consistency between source and target domains to achieve heterogeneous domain adaptation [23]. Focusing on feature relationships, another innovative method integrates a spectral–spatial enhanced channel attention mechanism to dynamically extract multi-scale global-to-local features. This approach also incorporates correlation alignment losses to reduce distribution discrepancies [24]. From an optimization perspective, Decoupled Knowledge Distillation-based FSL (DKD-FSL) [25] was introduced. DKD-FSL formulates meta-knowledge extraction and debiasing as a collaborative optimization task and introduces a knowledge distillation strategy to efficiently acquire and utilize unbiased meta-knowledge. Additionally, DKD-FSL employs a decoupled log interaction module to optimize the interaction between task-relevant and data-internal knowledge, and integrates a discriminant information refining module to enhance the separability of similar spectral bands [25].
However, the above method primarily focuses on cross-domain learning using data collected from different sensors on the same platform. In practical scenarios, where a large amount of hyperspectral data remains unlabeled, relying solely on single-platform data restricts the diversity of information sources. Therefore, expanding the data scope beyond a single platform presents a promising avenue to improve classification performance [26]. As an effective supplement, UAV remote sensing technology provides high operational flexibility. It effectively captures data even under cloud cover and delivers high-resolution imagery. Furthermore, the detailed spatial and spectral information inherent in UAV data greatly enhances the precision of surface feature identification and classification [27,28,29]. While UAV remote sensing data excels in high resolution and timeliness, satellite remote sensing is characterized by its wide-area coverage and stable observation capabilities. Leveraging these complementary strengths, pre-training the model on UAV data and transferring the learned features to satellite images effectively integrates rich detailed information with large-scale global contexts. In addition, this strategy uses large amounts of unlabeled satellite images and applies cross-domain learning to extract latent information, thereby enhancing the model’s generalization and classification accuracy on satellite data. This approach significantly reduces dependence on manual annotation and improves the model’s practicality and scalability in large-scale remote sensing applications. However, integrating both approaches presents challenges, including differences in spatial–spectral resolution, changes in imaging conditions, and inconsistencies in data sources. Particularly, when migrating small-scene UAV data to large-scene satellite data, addressing these differences while maintaining high classification accuracy has become a key challenge in heterogeneous transfer learning. To accurately capture intrinsic features under spectral resolution variations, the Subpixel Spectral Variability Network (S2VNet) models spectral variability and nonlinear mixture characteristics to deeply integrate complete subpixel information with class features [30]. This mechanism significantly enhances the model’s discriminative capability, achieving precise feature capture in complex scenarios. Meanwhile, to address the challenge of distribution inconsistency in multisource data, Multisource Collaborative Domain Generalization (MS-CDG) employs a distribution consistency alignment strategy [31]. This enables the model to effectively extract domain-invariant features, thereby improving generalization capability under multisource conditions. Although the aforementioned methods have achieved significant progress in subpixel-level feature mining and multisource domain generalization, simultaneously overcoming the significant spatial–spectral resolution discrepancies between UAV and satellite data and enhancing model generalization remains a critical challenge.
Based on this, this paper proposes a cross-domain hyperspectral image classification model that incorporates sharpness-aware minimization and local-to-global feature enhancement. It utilizes UAV hyperspectral images for learning and training, aiming to achieve fine-grained classification of large-scene satellite hyperspectral images. The model primarily addresses the challenges of large spatial–spectral resolution differences and the cross-domain learning of small-scene UAV data and large-scene satellite data under varying imaging conditions. In summary, the main contributions of this work are as follows:
  • This study proposes a novel paradigm for classifying hyperspectral satellite imagery using UAV hyperspectral data, enabling effective utilization of large amounts of unlabeled satellite data. By integrating cross-domain learning with the high spatial resolution and abundant labeled information of UAV hyperspectral data, the proposed method significantly enhances the fine-grained classification performance of satellite hyperspectral images in broad-area scenes. This approach offers a new research direction for the intelligent interpretation of hyperspectral remote sensing data acquired from heterogeneous sensor platforms;
  • A local–global feature extraction model is developed. Initially, the model captures local edge information from cross-domain data, followed by global feature alignment through an improved self-attention mechanism. This strategy enhances boundary detail representation through local feature extraction and optimizes cross-domain feature consistency via global feature alignment, thereby improving the model’s adaptability and robustness in cross-domain hyperspectral classification tasks;
  • An improved Sharpness-Aware Minimization (ISAM) strategy is proposed to overcome the local optima and reduced generalization caused by spectral shift in hyperspectral cross-domain classification tasks. To reduce computational complexity and improve training efficiency, this work refines the gradient perturbation strategy by using a single forward propagation to compute approximate perturbations. Furthermore, by combining square-root gradient approximation perturbation with a nonlinear gradient scaling mechanism, the gradient update amplitude grows gradually relative to the gradient magnitude. This adaptive adjustment of feature update intensity suppresses the dominance of large gradients, enhances the influence of small gradients, and ensures more balanced cross-domain feature alignment.

2. Related Works

2.1. Hyperspectral Image Classification via Deep Neural Network

The deep learning models for hyperspectral image classification can be broadly categorized into spectral models and spatial–spectral models. For spectral deep learning, Hu et al. proposed a one-dimensional convolutional neural network that relies solely on spectral band information for classification. By modeling each spectral band individually, the representation capability of spectral information was significantly enhanced [32]. Mou et al. employed a recurrent neural network (RNN) to model spectral bands as sequences, achieving good classification performance [33]. However, although these methods are effective in modeling spectral information, there is potential to further improve classification performance by leveraging the rich spatial features in hyperspectral images.
To address this issue, the researchers began to explore deep learning models that integrate spatial information to fully exploit the feature representation capabilities of hyperspectral images. Li et al. proposed a three-dimensional convolutional neural network (3D CNN) capable of simultaneously extracting spatial and spectral features, thereby significantly enhancing classification performance [34]. In addition, Liu et al. combined two-dimensional and three-dimensional CNNs for feature extraction, which not only enhanced feature representation capability but also effectively reduced computational cost [35]. To further improve the stability of deep networks, Transformer models have been increasingly applied to hyperspectral classification tasks beyond traditional CNN architectures, due to their powerful global modeling capabilities. Ahmad et al. proposed the Spectral–Spatial Wavelet Transformer, which captures both local and global features simultaneously, thereby improving classification accuracy [36]. However, while these methods perform well in single-domain settings, there remains potential to adapt them for cross-domain hyperspectral classification, particularly to better handle differences in data distribution between domains.
Given the limitations of single-domain classification methods, researchers have shifted their focus to cross-domain hyperspectral image classification to enhance model generalization. In cross-domain hyperspectral classification tasks, ResNet is regarded as a key method for enhancing generalization, owing to its strong feature extraction abilities and stable optimization characteristics. Li et al. demonstrated that the residual structure can stably learn features despite changes in data distribution, thus improving model accuracy in cross-domain hyperspectral image classification tasks [20]. Consequently, ResNet and its variants have been widely employed in cross-domain learning tasks. Zhang et al. proposed a comparative learning method based on ResNet, which leverages contrastive learning to enhance cross-domain shared features, thereby improving target domain classification accuracy [37]. Additionally, graph neural networks (GNNs) have been introduced into cross-domain hyperspectral classification tasks. Ye et al. proposed a cross-domain few-shot learning method based on GNNs. By constructing an inter-domain relationship graph, they structured the samples from both source and target domains to enhance cross-domain classification performance [38]. Inspired by previous studies, this paper proposes a local–global feature extraction model that combines the residual learning framework of ResNet with the global modeling capability of the Transformer. This integration enhances the model’s generalization and robustness in cross-domain hyperspectral classification tasks.

2.2. Strategies for Enhancing Model Generalization

Improving the generalization ability of models has become a key challenge in cross-domain hyperspectral image classification tasks. To address this challenge, Domain Adaptation (DA) and Domain Generalization (DG) techniques have been introduced into cross-domain hyperspectral image classification tasks. Specifically, DA aims to reduce the feature distribution discrepancy between the source and target domains, whereas DG focuses on mining domain-invariant features to ensure model robustness on unseen domains. Existing methods mainly include data augmentation, regularization techniques, and domain adaptation, each improving model generalization from different perspectives. These methods have shown promising results, suggesting avenues for continued improvement.
Data augmentation methods expand datasets using generative adversarial networks (GANs) or traditional transformations, such as rotation, flipping, and noise perturbation, thereby improving the model’s adaptability to changes in inter-domain distributions. The GAN-based data augmentation strategy proposed by Miftahushudur et al. can improve cross-domain classification performance; however, the high dimensionality and complexity of hyperspectral data could pose challenges for the quality of generated samples [39]. Moreover, although traditional data augmentation methods are simple and efficient, when faced with significant inter-domain differences, augmented samples may still struggle to fully compensate for the feature shift between the source and target domains. Regularization techniques enhance generalization capabilities by optimizing the model parameter update process. Dropout reduces the network’s reliance on specific neurons by randomly discarding units and their connections during training, thus effectively mitigating overfitting [40]. However, to further extend their benefits to cross-domain scenarios, additional mechanisms may be incorporated to better handle target domain features.
Early domain adaptation methods primarily focused on mitigating distribution discrepancies through statistical alignment. Wang et al. proposed a maximum mean discrepancy (MMD)-based method that measures the mean difference between the feature distributions of the source and target domains in a reproducing kernel Hilbert space. This approach narrows the cross-domain distribution deviation and enhances the model’s generalization ability [41]. Additionally, some studies have incorporated manifold learning strategies. These methods abstract visual and semantic spaces into graph structures and utilize matrix factorization to enforce the optimized semantic manifold to approximate the visual manifold in terms of geometric topology, thereby constructing a visually aligned semantic graph to align cross-modal domain distributions [42]. However, these traditional approaches typically assume that the source and target domains share a consistent class space. In practice, scenarios are often more complex, with the target domain frequently exhibiting spectral shifts or containing unknown classes. Addressing these realistic conditions introduces significant generalization challenges. With advancements in deep learning, domain adaptation has witnessed significant breakthroughs, and current methods can be broadly categorized into fine-tuning-based approaches and feature-level domain adaptation.
Fine-tuning-based approaches typically involve transferring the weights of a network pre-trained on the source domain to a target domain model, enabling the model to adapt to the target data distribution through subsequent fine-tuning operations. For instance, Mei et al. trained a five-layer CNN classifier on source domain data and subsequently extracted the fully connected feature layers to construct a feature learning framework tailored for the target domain [43]. Similarly, Yang et al. proposed a dual-branch convolutional neural network that utilizes a fine-tuning strategy to transfer the weights of the initial layers from the pre-trained network to the target domain model [44]. While this strategy proves effective when the source and target domains share similar features, relying solely on fine-tuning may be insufficient to extract deep, robust features when facing significant data discrepancies, such as differences in resolution or sensor types.
Another approach is feature-level domain adaptation, which aims to extract domain-invariant features by minimizing the discrepancy between the feature distributions of the source and target domains. Othman et al. proposed a deep domain adaptation network utilizing a mini-batch gradient optimization algorithm to minimize the feature distribution error between the two domains, thereby effectively achieving cross-scene classification [45]. Similarly, Wang et al. proposed a neural network-based domain adaptation framework that achieves high-precision classification by jointly optimizing three objectives in the embedding space: source discriminative classification, cross-domain distribution alignment, and manifold structure preservation [46]. While the aforementioned methods excel at aligning global feature distributions, there remains room for improvement in capturing the fine-grained class structures intrinsic to the data. To address this, Conditional Adversarial Domain Adaptation strategies have been introduced for cross-domain classification tasks. The primary advantage of this strategy is that it not only focuses on feature alignment but also incorporates the classifier’s predictions (i.e., class information) as a critical condition into the adversarial network. This mechanism encourages a more compact distribution of same-class objects in the feature space (i.e., enhancing intra-class compactness), thereby enabling superior classification performance even when target domain samples are scarce [47].
Although Conditional Adversarial Domain Adaptation strategies effectively facilitate fine-grained alignment between source and target domains by incorporating class prediction information, relying solely on feature-level constraints is often insufficient to fully guarantee model generalization in complex cross-domain scenarios. In fact, the improvement of cross-domain classification performance depends not only on feature alignment but is also intrinsically linked to the optimization process of model parameters. Particularly under small-sample conditions, models tend to be highly sensitive to distribution discrepancies, where even subtle spectral variations can significantly impact final performance. Therefore, while adversarial adaptation strategies establish the correct direction for feature alignment, further incorporating an advanced optimization mechanism is essential. By guiding the model to converge to a flatter and more robust minimum, such a mechanism can significantly enhance the model’s resilience to environmental changes.
In recent years, several studies have aimed to improve model generalization through optimization strategies. Among these, Sharpness-Aware Minimization (SAM) has emerged as a powerful algorithm due to its superior noise robustness and generalization capabilities [48]. SAM introduces a new optimization paradigm, where the core idea is to optimize the worst loss in the neighborhood during the gradient update process. This guides the model to learn on a smoother loss surface, alleviating the issue of local optimization and improving generalization. Although SAM has shown promising results in two-dimensional natural image classification tasks, it faces challenges such as high optimization complexity and unstable local perturbation directions when applied to three-dimensional data, especially in tasks with high-dimensional structures and cross-domain differences, such as hyperspectral images. Further improvements and adaptations are needed. Inspired by previous studies, this paper proposes an improved SAM algorithm that enhances model optimization efficiency and significantly improves generalization in cross-domain scenarios.

3. Proposed SAMLFE for HSI Classification

Figure 1 illustrates the architecture of the proposed SAMLFE model, using the WHU-Hi-HanChuan and Pavia University datasets as examples. First, the target domain images (Pavia University), with limited labels, and the source domain images (WHU-Hi-HanChuan), with abundant labels, are input into the spectral dimension mapping module to align their spectral dimensions. The mapped data is then passed into the local-to-global feature extraction model, which comprehensively extracts both local details and global semantic features. This ensures that the global modeling stage retains key detail information while preventing the model from overfocusing on irrelevant local features. Additionally, few-shot learning is applied in the local-to-global feature space to extract shared deep features between the source and target domains, enabling rapid adaptation to spectral offsets in new tasks. A conditional adversarial domain adaptation strategy is then employed to align the feature distributions of the source and target domains, effectively mitigating the spectral offset problem. Simultaneously, the parameter optimization process is refined using the improved sharpness-aware minimization strategy to reduce the model’s sensitivity to feature distribution changes, thus minimizing performance fluctuations caused by spectral offsets. Finally, the extracted features are classified using KNN to determine the final geographic category.

3.1. Spectral Dimension Mapping Model Between Source and Target Domains

Differences in the number of spectral bands between the source and target domains (e.g., 274 bands in the source domain versus 103 bands in the target domain) can cause significant deviations in their feature distributions, thereby affecting the cross-domain performance of the classification model. The mapping model, first proposed in [49], is adopted to ensure that input samples share the same dimensionality before entering the embedded feature extractor. To mitigate feature inconsistencies caused by spectral dimension differences between the source domain (e.g., UAV images) and the target domain (e.g., satellite images), this study employs a spectral dimension mapping model to uniformly map the input data, which enables feature alignment across domains and enhances the model’s cross-domain adaptability. For a detailed description of the training process, refer to Algorithm 1.
The spectral dimension mapping model, illustrated in Figure 2, consists of two components: a 2D convolutional layer and a batch normalization layer. The figure depicts the source domain as an example, while the target domain follows an identical mechanism. First, the input data from the source and target domains, denoted as $x_s$ and $x_t$, are individually transformed using the spectral dimension mapping models $M_s$ and $M_t$. This operation projects the heterogeneous data into a unified spectral feature space with a target dimension of $d = 100$. Subsequently, the outputs generated by the 2D convolution layers are normalized through batch normalization (BN) to accelerate training and enhance model stability. Finally, the resulting aligned feature maps are obtained, denoted as $x_s'$ and $x_t'$, as formulated in Equation (1):
$x_s' = M_s(x_s), \quad x_t' = M_t(x_t)$ (1)
where $x_s$ and $x_t$ represent the input features of the source and target domains, with dimensions of $B_s \times H_s \times W_s$ and $B_t \times H_t \times W_t$, respectively. $M_s$ and $M_t$ denote the dimension mapping layers, both with a kernel size of $d \times 1 \times 1$. After mapping, the band dimensions $B_s$ and $B_t$ are projected onto the target dimension $d$. The aligned features of the source and target domains are denoted as $x_s'$ and $x_t'$, with dimensions $d \times H_s \times W_s$ and $d \times H_t \times W_t$, respectively, where $d$ is set to 100.
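To make the mapping concrete, the following is a minimal PyTorch sketch of a 1 × 1 convolution followed by batch normalization that projects an arbitrary number of input bands to the shared dimension d = 100, in the spirit of Equation (1); the module name SpectralMapper and the example tensor shapes are illustrative assumptions, not the authors' released code.

```python
import torch
import torch.nn as nn

class SpectralMapper(nn.Module):
    """Hypothetical sketch of the spectral dimension mapping model:
    a 1x1 2D convolution projects the input bands to d target bands,
    followed by batch normalization (Equation (1))."""
    def __init__(self, in_bands: int, d: int = 100):
        super().__init__()
        self.proj = nn.Conv2d(in_bands, d, kernel_size=1, bias=False)  # d x 1 x 1 mapping layer
        self.bn = nn.BatchNorm2d(d)

    def forward(self, x):              # x: (batch, bands, H, W)
        return self.bn(self.proj(x))   # aligned features: (batch, d, H, W)

# Illustrative usage: a 274-band source patch and a 103-band target patch
# are both projected to the shared dimension d = 100.
M_s, M_t = SpectralMapper(274), SpectralMapper(103)
x_s, x_t = torch.randn(8, 274, 9, 9), torch.randn(8, 103, 9, 9)
x_s_aligned, x_t_aligned = M_s(x_s), M_t(x_t)  # both (8, 100, 9, 9)
```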
Algorithm 1 Pseudocode for the Training Process of the Proposed SAMLFE
Input: $S_T$, $Q_T$ of $D_t$; $S_S$, $Q_S$ of $D_S$; the number of training episodes.
Output: The classification accuracy of each class of the target dataset.
1: Spectral Dimension Mapping
2: Calculate $x_s'$, $x_t'$ by Equation (1);
3: Local-to-Global Feature Extraction
4: for episode = 1 : episodes do
5:     Randomly select $S_T$, $Q_T$ from $D_t$ and $S_S$, $Q_S$ from $D_S$;
6:     Extract deep representations from the mapped data;
7:     Calculate local features $y_{LFEM}$ by Equation (8);
8:     Calculate global features $O$ by Equation (14);
9:     Perform few-shot learning on the extracted features;
10:    Calculate $L_{fsl}^s$, $L_{fsl}^t$ by Equations (18) and (19);
11:    $L_{fsl} = L_{fsl}^s + L_{fsl}^t$;
12:    $L_{fsl}$.backward();
13: end for
14: Conditional Domain Discriminator
15: for episode = 1 : episodes do
16:    Calculate $L_d$ by Equation (22);
17:    $Loss = L_{fsl} + L_d$;
18:    $Loss$.backward();
19: end for
20: ISAM Parameter Optimization
21: for episode = 1 : episodes do
22:    Calculate $e_w$, $\hat{w}$, and $\tilde{e}_w$ by Equations (24)–(26);
23:    Update model parameters to reduce loss fluctuations;
24:    Calculate $W_{new}$ by Equation (27);
25: end for
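As a reading aid, the episodic flow of Algorithm 1 could be organized roughly as in the sketch below; every argument (sample_episode, mapper_s, mapper_t, extract_features, fsl_loss, domain_adversarial_loss, isam_step) is a hypothetical placeholder for the components defined in Sections 3.1–3.5, not code provided by the authors.

```python
def train_samlfe(episodes, D_s, D_t, sample_episode, mapper_s, mapper_t,
                 extract_features, fsl_loss, domain_adversarial_loss, isam_step):
    """Hypothetical skeleton of Algorithm 1. All arguments are caller-supplied
    callables/data standing in for the components described in Sections 3.1-3.5."""
    for _ in range(episodes):
        # Step 5: randomly build source/target support and query sets (C-way K-shot).
        (S_s, Q_s), (S_t, Q_t) = sample_episode(D_s), sample_episode(D_t)

        # Steps 2, 6-8: spectral mapping (Eq. 1) and local-to-global features (Eqs. 8, 14).
        f_s = extract_features(mapper_s(S_s), mapper_s(Q_s))
        f_t = extract_features(mapper_t(S_t), mapper_t(Q_t))

        # Steps 9-11: episodic few-shot losses on both domains (Eqs. 18-19).
        loss_fsl = fsl_loss(f_s) + fsl_loss(f_t)

        # Steps 16-17: conditional adversarial domain loss (Eq. 22) and total loss (Eq. 23).
        loss_total = loss_fsl + 2 * domain_adversarial_loss(f_s, f_t)

        # Steps 21-24: backward pass and ISAM parameter update (Eqs. 24-27).
        loss_total.backward()
        isam_step()
```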

3.2. Local-to-Global Feature Extraction Model

In constructing the local–global feature extraction model, this study adopts a “local-to-global” feature extraction strategy, differing from conventional network practices. The core innovation of this strategy lies in first employing a local feature extraction module to precisely capture details and edge information, followed by a global feature extraction module to model cross-domain relationships. This approach ensures that critical details are preserved during global modeling, while preventing excessive attention to irrelevant local features. In the local feature extraction module, the traditional residual block is replaced with a 3D asymmetric residual block. This module decomposes the three-dimensional convolution into kernels along three directions and executes them sequentially in a predefined order, enabling the model to flexibly capture spectral and spatial dependencies, thereby achieving precise local feature extraction under cross-domain conditions. In the global feature extraction module, an improved self-attention mechanism is introduced, with the weights of attention-enhanced features and asymmetric residual features balanced via an adaptive scaling parameter, γ. During the early training stages, γ is small, and the model primarily relies on domain-invariant residual features to prevent overfitting in the source domain. As training progresses, γ gradually increases, adaptively introducing global attention, which progressively aligns target domain features and enhances the robustness and generalization of cross-domain representations. The architecture of the local–global feature extraction model is illustrated in Figure 3. Moreover, although the network performs identical operations at different depths and across various channels, the resulting feature maps exhibit distinct semantic meanings because of variations in the input. Consequently, the feature representations differ across stages. To intuitively illustrate these semantic differences, different colors, sizes, and shapes are used in the schematic diagram for visual annotation. As illustrated in the bottom left of Figure 3, the Asymmetric Residual Block decomposes a standard 3D convolution into three sequential kernels: 1 × 1 × 3 (vertical), 1 × 3 × 1 (horizontal), and 3 × 1 × 1 (spectral). This decomposition aligns with Equations (3)–(5). Furthermore, the right side of Figure 3 visualizes the feature space distribution. Different colors and shapes (e.g., blue circles for ‘Trees’, gray dots for ‘Meadows’) represent the semantic separation of classes achieved after the Global Feature Extraction, demonstrating the enhanced feature extraction capability of the proposed model.
The Local Feature Extraction Model (LFEM) is constructed, which draws on the concept of residual learning and designs residual blocks with asymmetric structures. By flexibly configuring convolutional kernels in different directions, the model effectively extracts local feature information and accurately captures the edges and contours of land objects. This structure preserves critical boundary information during subsequent global feature modeling, preventing the loss of important details and enhancing the model’s ability to perceive complex land structures.
First, the input data is processed through the initial 3D convolution (3DConv) layer, simultaneously capturing spatial and spectral features while fully exploiting the spatial–spectral correlation in hyperspectral images, thereby enhancing the accuracy and representational capability of feature extraction. The detailed operations are presented in Equation (2).
$y_1(t, i, j) = R\left( \sum_{p=0}^{P-1} \sum_{m=0}^{k-1} \sum_{n=0}^{k-1} x(t+p,\ i+m,\ j+n)\, w(p, m, n) + b \right)$ (2)
where $R(\cdot)$ denotes the ReLU activation function, $y_1(t, i, j)$ represents the output feature map of the first convolutional layer at position $(t, i, j)$, and $x(t+p,\ i+m,\ j+n)$ corresponds to the input value at position $(t+p,\ i+m,\ j+n)$. $w(p, m, n)$ represents the weight at position $(p, m, n)$ of the three-dimensional convolution kernel, where $P \times k \times k$ indicates the size of the convolution kernel and $b$ is the bias term. Here, $(t, i, j)$ denotes the specific coordinate position within the 3D feature map, where $t$ indexes the spectral dimension, while $i$ and $j$ index the spatial vertical (row) and horizontal (column) dimensions, respectively.
Subsequently, the detailed features are further extracted using asymmetric convolution to enhance the representation of HSI details, including edges, textures and contours. The asymmetric convolution processes feature maps using multi-directional convolution kernels to capture fine-grained information such as edges and contours within the image. In this study, asymmetric convolution is applied sequentially along the y, x, and z axes. The y axis in the spatial dimension corresponds to the vertical direction in the HSI. Convolution operations along the y axis focus on extracting features related to the vertical structure of land objects, which helps the model classify objects with vertically distributed characteristics, such as buildings and trees. The process is shown in Equation (3).
$y_y(t, i, j) = \sum_{n=0}^{k-1} y_z(t,\ i,\ j+n)\, W_y(n) + b_y$ (3)
where $y_y(t, i, j)$ represents the feature map obtained after convolution along the y axis, $y_z(t,\ i,\ j+n)$ represents the value of the input data at position $(t,\ i,\ j+n)$, $W_y(n)$ denotes the weight of the 3D convolution at position $n$, where the convolution kernel has a size of $1 \times 1 \times 3$, and $b_y$ is the offset term along the y axis.
The x axis in the spatial dimension corresponds to the horizontal direction in the HSI. Convolution operations along the x axis help extract the horizontal spatial characteristics of the HSI, especially for land objects with horizontal arrangement or expansion (such as roads, building profiles, etc.), as shown in Equation (4).
$y_x(t, i, j) = \sum_{m=0}^{k-1} y_x(t,\ i+m,\ j)\, W_x(m) + b_x$ (4)
where $y_x(t, i, j)$ represents the feature map obtained after convolution along the x axis, $y_x(t,\ i+m,\ j)$ denotes the value of the input data at position $(t,\ i+m,\ j)$, $W_x(m)$ represents the weight of the three-dimensional convolution at position $m$, where $1 \times 3 \times 1$ is the size of the convolution kernel, and $b_x$ is the offset term along the x axis.
In hyperspectral images, the z axis corresponds to the spectral dimension. By performing convolution operations along the z axis, the model can more accurately capture pixel variations across different spectral bands. The process is shown in Equation (5).
$y_z(t, i, j) = \sum_{p=0}^{P-1} y_2(t+p,\ i,\ j)\, W_z(p) + b_z$ (5)
where $y_z(t, i, j)$ represents the feature map obtained after convolution along the z axis, $y_2(t+p,\ i,\ j)$ represents the value of the input data at position $(t+p,\ i,\ j)$, $W_z(p)$ represents the weight of the 3D convolution at position $p$, where the convolution kernel has a size of $3 \times 1 \times 1$, and $b_z$ is the offset term along the z axis.
Additionally, after obtaining multiple feature maps, they are fused to enhance the quality of local information as shown in Equation (6). Using asymmetric convolution, the embedded features from both the source and target domains effectively capture details, particularly the edge and contour information in the image. These embedded features not only enhance the accuracy of image representation but also provide stronger support for the subsequent classification tasks.
$y_2(t, i, j) = R\big( y_z(t, i, j) + y_y(t, i, j) + y_x(t, i, j) \big)$ (6)
The feature map is then further processed using 3D convolution operations to uncover the complex relationships between spatial and spectral dimensions, generating a new feature map, as shown in Equation (7).
$y_3(t, i, j) = \sum_{p=0}^{P-1} \sum_{m=0}^{k-1} \sum_{n=0}^{k-1} y_2(t+p,\ i+m,\ j+n)\, w(p, m, n) + b$ (7)
Finally, we apply the concept of residual learning to fuse multiple feature maps and generate the new feature maps. This strategy helps preserve important local features while enhancing the model’s ability to learn complex features through residual connections, improving the expression capability and classification performance of the final output, as shown in Equation (8).
$y_{LFEM} = R(y_1 + y_3)$ (8)
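To illustrate how the asymmetric residual block could be assembled, the following PyTorch sketch implements Equations (2)–(8). The sequential wiring of the three directional branches reflects one plausible reading of Equations (3)–(5), whose exact input chaining is left somewhat ambiguous in the text, and the kernel tuples are written in PyTorch's (depth, height, width) order; the class name is hypothetical.

```python
import torch
import torch.nn as nn

class AsymmetricResidualBlock3D(nn.Module):
    """Sketch of the 3D asymmetric residual block (Eqs. (2)-(8)). Kernel tuples
    use PyTorch's (depth, height, width) ordering, corresponding to the paper's
    spectral, vertical, and horizontal directions."""
    def __init__(self, channels: int):
        super().__init__()
        self.conv_in = nn.Conv3d(channels, channels, kernel_size=3, padding=1)                 # Eq. (2)
        self.conv_z = nn.Conv3d(channels, channels, kernel_size=(3, 1, 1), padding=(1, 0, 0))  # spectral
        self.conv_y = nn.Conv3d(channels, channels, kernel_size=(1, 3, 1), padding=(0, 1, 0))  # vertical
        self.conv_x = nn.Conv3d(channels, channels, kernel_size=(1, 1, 3), padding=(0, 0, 1))  # horizontal
        self.conv_out = nn.Conv3d(channels, channels, kernel_size=3, padding=1)                # Eq. (7)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):                      # x: (batch, C, spectral, H, W)
        y1 = self.relu(self.conv_in(x))        # initial 3D convolution, Eq. (2)
        y_z = self.conv_z(y1)                  # spectral branch, Eq. (5)
        y_y = self.conv_y(y_z)                 # vertical branch, Eq. (3)
        y_x = self.conv_x(y_y)                 # horizontal branch, Eq. (4)
        y2 = self.relu(y_z + y_y + y_x)        # directional fusion, Eq. (6)
        y3 = self.conv_out(y2)                 # Eq. (7)
        return self.relu(y1 + y3)              # residual fusion, Eq. (8)
```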
The global feature extraction module (GFEM) leverages the global modeling capability of the improved self-attention mechanism in Transformer. By capturing the long-distance dependencies, it focuses on identifying shared features between the source and target domains. Additionally, the model automatically highlights stable and clear areas in the feature map while suppressing interference from noisy and uncertain regions, enhancing the perception of information in heterogeneous structures, which effectively mitigates information misalignment due to differences in spatial structure and resolution, improving the model’s generalization ability across domains. After local feature extraction, 3D maximum pooling operation is applied to reduce feature dimensionality, preserving the most representative values, emphasizing key local information, and providing a more compact and discriminative input for subsequent global modeling, as shown in Equation (9).
$y_4(t, i, j) = \max_{\delta_z, \delta_h, \delta_w}\ y_{LFEM}(t+\delta_z,\ i+\delta_h,\ j+\delta_w)$ (9)
where $y_4(t, i, j)$ represents the 3D maximum pooling output, with the size of the pooling kernel being $k_z \times k_h \times k_w$, and $0 \le \delta_z < k_z$, $0 \le \delta_h < k_h$, $0 \le \delta_w < k_w$.
Next, a global feature extraction space is constructed. An improved self-attention mechanism is employed to adaptively focus on key features while suppressing irrelevant or redundant information. This approach effectively establishes long-range dependencies between pixels in the feature map, thereby enhancing the model’s capacity to capture global structural information. The structure of the improved self-attention mechanism is illustrated in Figure 4. Initially, three convolutional layers with kernel size $1 \times 1 \times 1$ are applied to the input $y_4(t, i, j)$, projecting it into the Query, Key, and Value representations. This process is illustrated in Equation (10).
$Q_1 = \mathrm{Conv3d}_{query}(x), \quad Q_1 \in \mathbb{R}^{l \times \frac{C}{8} \times D \times W \times H}$
$K_1 = \mathrm{Conv3d}_{key}(x), \quad K_1 \in \mathbb{R}^{l \times \frac{C}{8} \times D \times W \times H}$
$V_1 = \mathrm{Conv3d}_{value}(x), \quad V_1 \in \mathbb{R}^{l \times C \times D \times W \times H}$ (10)
where $l$ represents the batch size, $C$ is the number of input channels, and $D$, $W$, and $H$ denote the depth, width, and height, respectively, while $Q_1$, $K_1$, and $V_1$ represent the Query, Key, and Value representations obtained after the first calculation, respectively.
In addition, to simplify subsequent matrix operations, the spatial dimension D × W × H is flattened into a unified dimension N, where N = D × W × H , as shown in Equation (11).
$Q_2 = \mathrm{permute}(\mathrm{reshape}(Q_1)), \quad Q_2 \in \mathbb{R}^{l \times N \times G}$
$K_2 = \mathrm{reshape}(K_1), \quad K_2 \in \mathbb{R}^{l \times G \times N}$
$V_2 = \mathrm{reshape}(V_1), \quad V_2 \in \mathbb{R}^{l \times C \times N}$ (11)
where $Q_2$, $K_2$, and $V_2$ represent the features output after transformation, and $G = C / 8$.
Subsequently, the attention score $A$ is calculated by computing the similarity between $Q_2$ and $K_2$, thereby focusing on features with higher scores, as shown in Equation (12).
$S = Q_2 K_2, \quad S \in \mathbb{R}^{l \times N \times N}$
$A = \mathrm{softmax}(S), \quad A \in \mathbb{R}^{l \times N \times N}$ (12)
After calculating the score matrix, the data is reshaped to restore the original spatial dimensions. First, the attention matrix is transposed by exchanging its last two dimensions, and then batch matrix multiplication is performed with $V_2$. The specific process is shown in Equation (13).
$\tilde{Y} = \mathrm{reshape}(V_2 A^{T}), \quad \tilde{Y} \in \mathbb{R}^{l \times C \times D \times W \times H}$ (13)
In the output of the previous step, this paper draws inspiration from the scaling factor in the Linformer self-attention mechanism and introduces a learnable scalar γ to adaptively control the weighting between the self-attention module output and the original input. γ is a trainable parameter optimized through backpropagation during training. The model automatically adjusts the value of γ, thereby adaptively balancing the contributions of the improved self-attention output and the input, as shown in Equation (14).
$O = \gamma \tilde{Y} + x$ (14)
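The improved self-attention step of Equations (10)–(14) could be realized as in the minimal PyTorch sketch below; the class name ImprovedSelfAttention3D is an illustrative assumption, and the module simply combines the 1 × 1 × 1 projections, the flattened attention over N = D × W × H positions, and the learnable scalar γ described above.

```python
import torch
import torch.nn as nn

class ImprovedSelfAttention3D(nn.Module):
    """Sketch of the improved self-attention mechanism (Eqs. (10)-(14)):
    1x1x1 projections to Query/Key/Value, attention over the flattened
    spatial-spectral positions N = D*W*H, and a learnable scalar gamma
    blending the attention output with the residual input."""
    def __init__(self, channels: int):
        super().__init__()
        g = max(channels // 8, 1)                              # G = C / 8
        self.query = nn.Conv3d(channels, g, kernel_size=1)
        self.key = nn.Conv3d(channels, g, kernel_size=1)
        self.value = nn.Conv3d(channels, channels, kernel_size=1)
        self.gamma = nn.Parameter(torch.zeros(1))              # small at first, grows during training

    def forward(self, x):                                      # x: (l, C, D, W, H)
        l, C, D, W, H = x.shape
        N = D * W * H
        q = self.query(x).reshape(l, -1, N).permute(0, 2, 1)   # Q2: (l, N, G), Eq. (11)
        k = self.key(x).reshape(l, -1, N)                      # K2: (l, G, N)
        v = self.value(x).reshape(l, C, N)                     # V2: (l, C, N)
        attn = torch.softmax(torch.bmm(q, k), dim=-1)          # A: (l, N, N), Eq. (12)
        y = torch.bmm(v, attn.transpose(1, 2)).reshape(l, C, D, W, H)  # Eq. (13)
        return self.gamma * y + x                              # O = gamma * Y~ + x, Eq. (14)
```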

3.3. Source and Target Few-Shot Learning

Due to sensor discrepancies and environmental variations, the spectral characteristics of the same object may shift across different scenarios, leading to performance degradation in cross-domain applications. To mitigate the spectral shift between UAV and satellite data, this paper performs few-shot learning tasks based on source and target domains within the local–global feature space, as illustrated in Figure 5. By constructing K-shot C-way tasks, the few-shot learning utilizes a limited number of samples across multiple training rounds, enabling the model to progressively learn feature representations with enhanced generalization across different categories and scenarios. Combined with a multi-task training strategy, the model extracts the deep features shared between the source and target domains, enabling rapid adaptation to new tasks under spectral shifts while maintaining high classification accuracy. Figure 5 illustrates the construction of meta-learning tasks for the source and target domains. Specifically, the source task is constructed using the Support Set $S_s$ and Query Set $Q_s$, while the target task is similarly composed of $S_t$ and $Q_t$. As depicted in the right section, the diagram focuses on the calculation of the Euclidean metric within each constructed task. Through meta-learning training, the model encourages samples of the same class to cluster closer together in the feature space. As illustrated in the right section of the figure, where distinct colors represent different categories, the samples are shown aggregating towards each other, thereby achieving greater intra-class compactness.
Assume that the source domain dataset in the local–global feature space is denoted as $D_s = \{x_i^s,\ y_i^s\}_{i=1}^{n_s}$, where $x_i^s$ represents the HSI data of the $i$-th sample and $y_i^s$ denotes its corresponding category label. The target domain dataset is denoted as $D_t = \{x_i^t,\ y_i^t\}_{i=1}^{n_t}$, where $x_i^t$ represents the HSI data of the $i$-th sample and $y_i^t$ denotes the corresponding category label in the target domain. $D_t$ contains a small amount of labeled data $D_f$ and a large amount of unlabeled data $D_u$, where $D_t = D_f \cup D_u$. Few-shot learning tasks are performed on $D_s$ and $D_t$, respectively. First, a few-shot learning task is performed on the source domain $D_s$. Specifically, $C_s$ classes are randomly selected from the source domain dataset $D_s$. For each class, $K_s$ labeled samples are selected to form the support set $S_S$. $N_s$ unlabeled samples are then selected from the same $C_s$ classes to form the query set $Q_S$. The corresponding formula is expressed as follows:
$S_S = \{x_i^s,\ y_i^s\}_{i=1}^{C_s \times K_s}, \quad Q_S = \{x_j^s,\ y_j^s\}_{j=1}^{C_s \times N_s}$ (15)
The few-shot learning task for the target domain $D_t$ replicates the same construction, generating the support set $S_T$ and query set $Q_T$ for the target domain as follows:
$S_T = \{x_i^t,\ y_i^t\}_{i=1}^{C_t \times K_t}, \quad Q_T = \{x_j^t,\ y_j^t\}_{j=1}^{C_t \times N_t}$ (16)
During the training phase, the Softmax method is applied to calculate the similarity between the query sample and the category center of the support set. The network parameters are then updated by optimizing the negative log-probability loss function. For the query sample x j , its class distribution probability is calculated as follows:
$P(y_j = k \mid x_j \in Q_s) = \dfrac{\exp\!\left(-d\!\left(f_\phi(x_j),\ c_k\right)\right)}{\sum_{k'=1}^{C} \exp\!\left(-d\!\left(f_\phi(x_j),\ c_{k'}\right)\right)}$ (17)
where $d(\cdot,\cdot)$ represents the Euclidean distance, $c_k$ is the embedding feature center (prototype) of the $k$-th class in the support set, $C$ is the number of categories in the task, $f_\phi$ represents the embedding feature extraction network with mapping parameters $\phi$, $x_j$ is a sample in the query set, and $y_j$ is the true category label of $x_j$.
The classification loss for the source domain episode task is calculated by summing the negative logarithmic probabilities of all query samples:
$L_{fsl}^s = \mathbb{E}_{(S_s,\ Q_s)}\left[ -\sum_{(x,\ y) \in Q_s} \log p_\phi(y = k \mid x) \right]$ (18)
where $\mathbb{E}_{(S_s,\ Q_s)}$ denotes the expectation over the support set and the query set, and $S_s$ and $Q_s$ represent the support set and query set of the source domain, respectively.
Similarly, FSL is applied to the target domain data, and the classification loss for the target domain episode task is calculated:
$L_{fsl}^t = \mathbb{E}_{(S_t,\ Q_t)}\left[ -\sum_{(x,\ y) \in Q_t} \log p_\phi(y = k \mid x) \right]$ (19)
where $S_t$ and $Q_t$ represent the small-sample support set and query set of the target domain, respectively, and $p_\phi$ is the probability distribution output by the network parameterized by $\phi$.
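A compact sketch of this episodic loss, assuming prototype-based classification in the spirit of Equations (17)–(19), is shown below; the function name prototypical_fsl_loss and the toy episode sizes are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def prototypical_fsl_loss(support_feats, support_labels, query_feats, query_labels):
    """Sketch of the episodic few-shot loss (Eqs. (17)-(19)): class prototypes are
    the mean support embeddings, query samples are scored by a softmax over
    negative Euclidean distances, and the negative log-probability is averaged."""
    classes = torch.unique(support_labels)
    # c_k: prototype (embedding feature center) of each class in the support set.
    prototypes = torch.stack([support_feats[support_labels == c].mean(dim=0) for c in classes])
    dists = torch.cdist(query_feats, prototypes) ** 2          # (num_query, C)
    log_p = F.log_softmax(-dists, dim=1)                       # Eq. (17)
    # Map original labels to prototype indices before computing the loss.
    target_idx = torch.stack([(classes == y).nonzero(as_tuple=True)[0][0] for y in query_labels])
    return F.nll_loss(log_p, target_idx)                       # Eqs. (18)/(19)

# Illustrative usage with a 3-way 5-shot episode and 64-dimensional embeddings.
s_feats, s_labels = torch.randn(15, 64), torch.arange(3).repeat_interleave(5)
q_feats, q_labels = torch.randn(9, 64), torch.arange(3).repeat_interleave(3)
loss = prototypical_fsl_loss(s_feats, s_labels, q_feats, q_labels)
```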

3.4. Conditional Domain Discriminator Model

To reduce the distributional difference between UAV and satellite data, this paper employs a conditional domain discriminator and optimizes using domain adversarial losses. The model distinguishes data sources (e.g., UAVs or satellites) using domain discriminators, while the local–global feature extractor learns domain-independent shared features via adversarial training, thereby reducing domain differences such as those related to sensors and environments. As training progresses, the local–global feature extractor and domain discriminator interact, gradually aligning feature distributions and improving classification performance in the target domain (e.g., satellite data), effectively alleviating the impact of spectral offset. The network architecture is illustrated in Figure 6. To address the dimension explosion issue inherent in standard multilinear maps, where the output dimension $d_f \times d_g$ becomes computationally prohibitive, the Randomized Multilinear Map described in Equation (20) is employed to project the joint features into a compact 1024-dimensional space. Subsequently, the Conditional Domain Discriminator is implemented as a five-layer Multilayer Perceptron. The blocks labeled ‘FRD’ in the figure represent the hidden layers, where each block consists of a Fully Connected (FC) layer, a ReLU activation function, and a Dropout layer to prevent overfitting. Specifically, the Dropout and ReLU are applied after each FC layer except for the final one. Finally, the last layer utilizes a Softmax function to predict the domain probability.
Suppose $h = (f,\ g)$ represents the joint variable of $f$ and $g$. The multilinear mapping $f \otimes g$ is selected to condition the domain discriminator on $g$. Compared to simple concatenation strategies, the multilinear mapping $f \otimes g$ can more effectively capture multimodal structures in complex data distributions. However, its main disadvantage is its tendency to cause dimensional explosion. Assuming $d_f$ and $d_g$ represent the dimensions of $f$ and $g$, respectively, the output dimension of the multilinear map is $d_f \times d_g$, which is often difficult to embed into deep models. To address the issue of dimensional explosion, the traditional multilinear mapping is replaced with a randomized multilinear mapping. The multilinear map $T_\otimes(f,\ g)$ can be approximated by $T_\odot(f,\ g)$ using an element-wise product, as shown in the following formula:
$T_\odot(f,\ g) = \frac{1}{\sqrt{d}} \left(R_f f\right) \odot \left(R_g g\right)$ (20)
where $\odot$ represents the element-wise (Hadamard) product, and $R_f \in \mathbb{R}^{d \times d_f}$ and $R_g \in \mathbb{R}^{d \times d_g}$ are two random matrices. These random matrices are sampled once and fixed during the training phase. They satisfy $d \ll d_f \times d_g$, where $f$ is the feature extracted by the local–global feature extractor, and $g$ is the category information predicted by the classifier. Additionally, both matrices $R_f$ and $R_g$ follow symmetric distributions with a mean of zero. Finally, the following conditional strategy is employed:
$T(h) = \begin{cases} T_\otimes(f,\ g), & d_f \times d_g \le 1024 \\ T_\odot(f,\ g), & \text{otherwise} \end{cases}$ (21)
where 1024 represents the largest number of units in the model. If the dimension of the multilinear map T exceeds 1024, a randomized multilinear map will be used. The domain adversarial loss function is then defined as follows:
$L_d = \min_D \max_T L = -\,\mathbb{E}_{x_i^s \sim P_s(x)} \log\!\left[D\!\left(T\!\left(h_i^s\right)\right)\right] - \mathbb{E}_{x_j^t \sim P_t(x)} \log\!\left[1 - D\!\left(T\!\left(h_j^t\right)\right)\right]$ (22)
where $D$ represents the discriminator, and $h_i^s$ and $h_j^t$ are the embedded features of the source domain and target domain samples, where $h = (f,\ g)$ combines the feature $f$ extracted by the local-to-global feature extractor with the category information $g$ predicted by the classifier, and $T$ is the (randomized) multilinear dimensional transformation. $P_s(x)$ and $P_t(x)$ denote the source domain feature distribution and the target domain feature distribution, respectively. Specifically, $x_i^s$ consists of support samples from the source domain, while $x_j^t$ comprises query samples from the target domain. Finally, the loss function in this paper consists of $L_{fsl}^s$, $L_{fsl}^t$, and $L_d$, as follows:
$$L_{\mathrm{total}} = L_{\mathrm{total}}^{s} + L_{\mathrm{total}}^{t} = L_{fsl}^{s} + L_{fsl}^{t} + 2\,L_d$$
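The following sketch shows how the adversarial term and the total objective could be assembled, assuming a discriminator D and conditioning map T as sketched earlier; the few-shot losses L_fsl_s and L_fsl_t are placeholders for the source- and target-domain few-shot classification losses and are not defined here, and the source/target label convention is an assumption.

```python
import torch

def domain_adversarial_loss(D, T, f_s, g_s, f_t, g_t, eps=1e-8):
    """Sketch of L_d: negative log-likelihood of the conditional domain
    discriminator, with source samples labeled 1 and target samples labeled 0."""
    p_s = D(T(f_s, g_s))[:, 1]    # P(domain = source) for source support samples
    p_t = D(T(f_t, g_t))[:, 1]    # P(domain = source) for target query samples
    return -(torch.log(p_s + eps).mean() + torch.log(1.0 - p_t + eps).mean())

# Total objective of the equation above: L_total = L_fsl_s + L_fsl_t + 2 * L_d.
# In adversarial training the discriminator descends on L_d while the feature
# extractor ascends on it (e.g., via a gradient reversal layer).
def total_loss(L_fsl_s, L_fsl_t, L_d):
    return L_fsl_s + L_fsl_t + 2.0 * L_d
```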

3.5. Improved Sharpness-Aware Minimization Strategy

In cross-domain hyperspectral image classification tasks, the significant spectral shift between the source and target domains often causes substantial changes in feature distribution, leading to large fluctuations in the model’s loss surface. To address this issue, this paper proposes an improved SAM algorithm that uses a single forward pass to compute an approximate perturbation. This avoids the strong perturbations introduced by the two forward passes of the traditional SAM algorithm and thereby improves the stability of the gradient perturbation. In addition, the original perturbation strategy is refined with a nonlinear gradient scaling mechanism, so that the magnitude of the gradient update grows more gradually with the gradient size. This adaptive adjustment of feature update strength yields more stable perturbation directions that are better aligned with the characteristics of cross-domain classification, effectively smoothing the loss function and optimizing the model’s performance in the worst-case scenario. Figure 7 illustrates the schematic operation of the improved Sharpness-Aware Minimization (ISAM) strategy and its expected impact on the optimization landscape. As depicted, the initial loss surfaces (representing components such as $L_s$ and $L_d$) typically exhibit pronounced peaks and steep valleys caused by cross-domain spectral shifts, whereas the total loss surface ($L_{\mathrm{total}}$) obtained after applying ISAM is significantly smoother and flatter. This visualization shows how ISAM mitigates the instability of gradient perturbations and guides the model towards flatter minima, thereby enhancing generalization in worst-case scenarios.
First, the gradient perturbation amplitude of each parameter $w$ is computed by taking the square root of the squared gradient (i.e., its magnitude) and adding a small positive constant $\epsilon$, where $\epsilon$ is set to $1 \times 10^{-8}$ for numerical stability, a value commonly used in the Adam optimizer, as shown in Equation (24).
$$e_w = \sqrt{\big(\nabla_w L_{\mathrm{total}}\big)^2} + \epsilon = \big|\nabla_w L_{\mathrm{total}}\big| + \epsilon$$
Here, $e_w$ measures the magnitude of each parameter’s gradient and is stored in the optimizer state, $w$ denotes the model parameter, and $\nabla_w L_{\mathrm{total}}$ is the gradient of the loss function $L_{\mathrm{total}}$ with respect to $w$.
Next, the Adam optimizer’s update rule is applied to the parameters, yielding the intermediate result $\hat{w}$. This step is equivalent to a standard gradient descent update, as outlined in the following process.
$$\hat{w} = \mathrm{BaseOptimizerStep}\big(w, \nabla_w L_{\mathrm{total}}\big)$$
Subsequently, each parameter is corrected in a gradient-free (no-grad) context. Using the stored perturbation amplitude $e_w$, $\epsilon$ is added to obtain the update factor $\tilde{e}_w$, and the parameters are adjusted accordingly.
$$\tilde{e}_w = e_w + \epsilon = \big|\nabla_w L_{\mathrm{total}}\big| + 2\epsilon$$
Finally, based on the corrected perturbation factor, an update is applied to the model parameters, as shown in the following equation. This update not only accounts for the relative relationship between the original parameters and the perturbation direction but also refines the parameters through proportional weighting, ensuring model stability while guiding optimization in a more robust direction and thereby enhancing the model’s generalization ability in cross-domain hyperspectral image classification tasks.
$$w_{\mathrm{new}} = (1 - \rho)\,\hat{w} + \rho\,\frac{e_w}{\tilde{e}_w}$$
where $\rho$ is a hyperparameter controlling the weight-adjustment amplitude during the perturbation correction process, set to $1.1 \times 10^{-8}$; $(1 - \rho)\,\hat{w}$ represents the contribution of the base update, and $\rho\, e_w / \tilde{e}_w$ denotes the perturbation correction applied to the parameters. This ensures an adaptive balance of update strength across different features during gradient updates, preventing the loss of key information caused by gradient normalization.
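To make the four steps above concrete, the sketch below applies them to a PyTorch model after a single backward pass. It is a minimal illustration of the update arithmetic under the stated values of $\rho$ and $\epsilon$, not the authors’ implementation.

```python
import torch

@torch.no_grad()
def isam_step(params, base_optimizer, rho=1.1e-8, eps=1e-8):
    """Minimal sketch of the ISAM update, assuming loss.backward() has
    already populated the gradients of L_total."""
    params = list(params)

    # Step 1: perturbation amplitude e_w = |grad L_total| + eps per parameter
    e_ws = [p.grad.abs() + eps if p.grad is not None else None for p in params]

    # Step 2: base optimizer update, w_hat = BaseOptimizerStep(w, grad)
    base_optimizer.step()

    # Steps 3-4: corrected update w_new = (1 - rho) * w_hat + rho * e_w / e_tilde_w
    for p, e_w in zip(params, e_ws):
        if e_w is None:
            continue
        e_tilde = e_w + eps
        p.mul_(1.0 - rho).add_(rho * (e_w / e_tilde))

# Usage sketch: adam = torch.optim.Adam(model.parameters(), lr=1e-3)
#               loss.backward(); isam_step(model.parameters(), adam); adam.zero_grad()
```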

4. Experimental Validation and Analysis

4.1. Dataset Descriptions

To evaluate the performance and efficiency of the proposed model, experiments were conducted on four target-domain datasets using overall accuracy (OA), average accuracy (AA), the Kappa coefficient, and the F1 score. Specifically, OA is the proportion of correctly classified samples among all test samples; AA is the mean of the per-class accuracies and reflects classification differences between categories; the Kappa coefficient (K × 100) evaluates the consistency between the classification results and the ground truth; and the F1 score assesses the model’s generalization ability and robustness under class imbalance. Higher values of these indicators indicate better classification performance. The model size is used to evaluate spatial complexity, where a larger size indicates higher complexity, and the time metric denotes the training duration, with a smaller value corresponding to faster training. WHU-Hi-HanChuan serves as the source-domain (UAV) dataset; Pavia University, Indian Pines, and Salinas are used as airborne target-domain datasets; and the HZ dataset is used as the satellite target-domain dataset.
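For completeness, the sketch below shows one way to compute OA, AA, and the Kappa coefficient from a confusion matrix; it is a generic illustration of the definitions above rather than the evaluation code used in the experiments.

```python
import numpy as np

def classification_metrics(conf):
    """Compute OA, AA and Kappa (all in %) from a confusion matrix whose
    rows are true classes and columns are predicted classes."""
    conf = np.asarray(conf, dtype=float)
    n = conf.sum()
    oa = np.trace(conf) / n                                   # overall accuracy
    per_class = np.diag(conf) / conf.sum(axis=1)              # per-class accuracy
    aa = per_class.mean()                                     # average accuracy
    pe = (conf.sum(axis=0) * conf.sum(axis=1)).sum() / n**2   # expected chance agreement
    kappa = (oa - pe) / (1.0 - pe)
    return 100 * oa, 100 * aa, 100 * kappa
```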
(1) Source domain dataset: The WHU-Hi-HanChuan (WHHC) dataset was collected by the Leica Aibot X6 UAV platform in Hanchuan City, Hubei Province, China, in 2016. This dataset consists of 274 spectral bands, with a wavelength range of 400 nm to 1000 nm, a spatial resolution of 0.109 m, and an image size of 1217 × 303 pixels. The details of the WHU-Hi-HanChuan dataset are shown in Table 1, and the false-color image and ground truth are shown in Figure 8.
(2) Target domain dataset: The Pavia University (PU) dataset was collected using ROSIS sensors, consisting of 610 × 340 pixels, with a spatial resolution of 1.3 m. This dataset contains 103 spectral bands, with a wavelength range of 430 nm to 860 nm. The dataset includes 9 categories. The details of the PU dataset are shown in Table 2, and the false-color image and ground-truth are shown in Figure 9.
The Salinas (SA) dataset has an image size of 512 × 217 pixels, a spatial resolution of 3.7 m, and covers 204 spectral bands with a wavelength range from 400 nm to 2500 nm. The dataset was collected using an airborne visible/infrared imaging spectrometer (AVIRIS) sensor over the SA Valley in California, USA. The details of the SA dataset are shown in Table 3, and the false-color image and ground truth are shown in Figure 10.
The Hangzhou (HZ) dataset was acquired by the NASA EO-1 Hyperion sensor. There are 198 spectral bands after uncalibrated and noisy bands are removed. The spatial size of the HZ dataset is 590 × 230 with 30 m spatial resolution. The details of the HZ dataset are shown in Table 4, and the false-color image and ground truth are shown in Figure 11.
The Indian Pines (IP) dataset was collected by AVIRIS in 1992 in Northwest Indiana, USA. This dataset contains 200 spectral bands, with a wavelength range of 400 nm to 2500 nm, a spatial resolution of 20 m, and an image size of 145 × 145 pixels. The dataset includes 16 vegetation categories. The details of the IP dataset are shown in Table 5, and the false-color image and ground truth are shown in Figure 12.

4.2. Experimental Setting

The method proposed in this paper uses a warmup cosine learning rate scheduler for optimization. In the early stage of training, the learning rate gradually increases from 1 × 10−5 to 1 × 10−3, preventing instability caused by an excessively high learning rate; afterwards, the learning rate decreases gradually following a cosine curve, which allows the model to converge more stably in the later stages of training. The model is optimized using the Adam optimizer, with the number of iterations set to 10,000. To reduce the randomness caused by the training samples, the final result of each trial was obtained by averaging 10 repetitions. To ensure a rigorous and fair comparison, all experiments were conducted on a unified hardware platform equipped with a single NVIDIA RTX 2080Ti GPU (Shanghai finehoo Technology Co., Ltd., Shanghai, China) and implemented in PyTorch 1.7.1. Specifically, we standardized the input data volume by evaluating all methods under a consistent 5-shot setting (i.e., 5 labeled samples per class) to guarantee identical supervision information across the different algorithms. Crucially, regarding the training details of the comparative state-of-the-art (SOTA) methods, we strictly adhered to the hyperparameter configurations (e.g., learning rate, weight decay, and training epochs) recommended in their original papers and implementations. This protocol ensures that each baseline is evaluated at its optimal performance level, eliminating potential bias arising from improper parameter tuning. The proposed SAMLFE method is compared with several representative algorithms, including traditional machine learning methods (XGBoost [50] and SVM [51]), deep learning methods (3D-CNN [34] and SSRN [9]), popular cross-domain classification methods (DCFSL [20] and Gia-CFSL [22]), and state-of-the-art methods (GSCViT [52] and DSFormer [53]). A systematic comparison with these methods provides a comprehensive evaluation of SAMLFE’s performance advantages and generalization capabilities across different modeling paradigms and cross-domain scenarios.
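As a reference for the schedule described above, the sketch below implements a warmup cosine learning-rate function; the number of warmup steps is an assumption, since the text only specifies the 1 × 10−5 to 1 × 10−3 range and the 10,000-iteration budget.

```python
import math

def warmup_cosine_lr(step, total_steps=10000, warmup_steps=500,
                     lr_min=1e-5, lr_max=1e-3):
    """Warmup cosine schedule: linear warmup from lr_min to lr_max,
    then cosine decay back down to lr_min."""
    if step < warmup_steps:
        return lr_min + (lr_max - lr_min) * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * progress))

# Usage sketch: set the learning rate manually at each iteration, e.g.
#   for g in optimizer.param_groups: g["lr"] = warmup_cosine_lr(step)
```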

4.3. Classification Maps and Categorized Results

Table 6, Table 7, Table 8 and Table 9 show the classification results of the different methods on the Pavia University (PU), Indian Pines (IP), Salinas (SA), and Hangzhou (HZ) datasets, with the best accuracy highlighted in bold. The proposed SAMLFE method demonstrates significant advantages over the other methods, achieving the highest overall accuracy (OA) on the PU, IP, SA, and HZ datasets, with 84.27%, 65.45%, 90.61%, and 80.73%, respectively. In particular, on the PU dataset, the OA of SAMLFE is higher than that of DSFormer, GSCViT, Gia-CFSL, DCFSL, SSRN, 3D-CNN, SVM, and XGBoost by 2.07%, 2.92%, 1.63%, 2.78%, 5.19%, 19.17%, 21.54%, and 33.76%, respectively. These results clearly demonstrate that SAMLFE can effectively handle cross-domain HSI classification tasks and maintain high classification accuracy even when significant domain differences exist. In contrast, the other methods perform poorly on cross-domain data, particularly when there are large structural and resolution differences between the source and target domains.
Moreover, the time consumption results reveal clear hierarchical differences in computational complexity among the methods. Traditional machine learning methods, such as SVM and XGBoost, exhibit very low computational overhead due to their simple architectures and the absence of deep feature extraction. As the model depth increases, deep learning methods such as 3D-CNN and SSRN require more convolutional operations for extracting spatial–spectral features, thereby increasing computational time. Meanwhile, although GSCViT and DSFormer possess strong feature extraction capabilities, their architectures are optimized for computational efficiency, reducing convolutional operations and resulting in lower training costs compared to 3D-CNN and SSRN. In contrast, DCFSL and Gia-CFSL incur higher computational overhead due to complex processes, including small-sample learning strategies and feature alignment. SAMLFE falls into the category of models with high computational load, primarily due to the small-sample learning strategy and the ISAM module. While these strategies enhance cross-domain feature extraction and discriminative capabilities, they inevitably increase computational costs. Nevertheless, based on cross-domain HSI classification results, SAMLFE significantly outperforms existing methods across multiple metrics and maintains superior classification performance even when the source and target domains differ substantially.
The model size indicates that SAMLFE is only slightly larger than the 3D-CNN model, suggesting that its space complexity remains low. This implies that the method maintains a small number of parameters and low storage requirements while sustaining high model performance. Although training time is relatively long, the model’s compact architecture and limited parameter count provide strong deployment advantages, enabling adaptation to multiple scenarios and platforms, thereby demonstrating the method’s practicality and scalability.
As shown in Table 6, SAMLFE achieves the best classification results for three land-cover types: Asphalt, Trees, and Bare soil, which demonstrates the effectiveness of the multi-directional feature extraction enabled by the asymmetric residual blocks. Specifically, asphalt exhibits flat and regular surface structures, so strengthening horizontal feature extraction in hyperspectral images helps capture its texture and geometric characteristics more accurately, improving classification accuracy. Trees, in contrast, have pronounced vertical structures; strengthening vertical feature modeling allows their spatial characteristics to be captured more accurately, further improving classification performance. Figure 13 presents the classification results for the PU dataset, which indicate that SAMLFE closely matches the ground truth in spatial distribution and boundary delineation, reflecting high accuracy and strong generalization.
Table 7 shows the classification results for the IP dataset. Among these categories, category 15, “Buildings-Grass-Trees-Drives,” shows a significant improvement in classification accuracy. This category contains rich texture features and spatial structures in multiple directions; the local–global feature enhancement model effectively combines local details with global features, capturing long-range spatial dependencies and further improving classification accuracy. The average accuracy (AA) of SAMLFE is, however, lower than that of DSFormer, because DSFormer achieves higher accuracy on several small-sample categories (classes 4, 9, 13, and 16), which raises its AA. To account for the class imbalance problem, the F1 score is introduced to jointly assess the precision and recall of each category; it is less affected by variations in sample size and thus provides a fairer evaluation across all categories. In terms of the F1 score, which reflects the sample sizes of all categories, the proposed method still achieves the best overall performance.
Figure 14 shows the classification maps for the IP dataset. Compared with XGBoost, SVM, 3D-CNN, SSRN, DCFSL, Gia-CFSL, GSCViT, and DSFormer, SAMLFE exhibits the smallest misclassified area and most closely matches the ground truth. This suggests that SAMLFE captures subtle differences between land-cover categories more accurately, particularly in complex or mixed areas, and further confirms the effectiveness of the proposed method in improving classification accuracy and reducing errors.
Table 8 shows the classification results for the SA dataset, with the corresponding classification maps shown in Figure 15. The OA, AA, Kappa, and F1 achieved by SAMLFE reach 90.61%, 94.53%, 89.56%, and 93.97%, respectively, all of which outperform the comparison methods. Specifically, compared with the SOTA methods DSFormer and GSCViT, the OA of SAMLFE is higher by 1.07% and 1.71%, respectively. Furthermore, as shown in Figure 15, SAMLFE distinguishes boundary regions between classes more accurately, effectively reducing the classification confusion commonly observed in traditional methods when handling fuzzy boundaries. This advantage stems mainly from two strategies: the local-to-global feature extraction approach and the improved sharpness-aware minimization technique. The former employs asymmetric residual blocks to extract edge details and integrates an improved self-attention mechanism to achieve global feature alignment; the latter suppresses the impact of spectral shifts on generalization by introducing a nonlinear gradient perturbation mechanism. These results demonstrate that the method maintains excellent cross-domain generalization, even under significant differences between the source and target domains.
Figure 15 shows the pseudo-color classification results of various methods on the SA dataset. As shown in the figure, category 8, “Grapes_untrained,” exhibits a high rate of misclassification across all methods, indicating the difficulty of classifying this type of land cover. Nevertheless, the proposed method achieves the best classification performance for this category. This type of land cover is typically interspersed with surrounding soil, weeds, and shrubs, leading to significant spectral overlap with other categories, along with serious boundary blurring and the presence of mixed pixels. The local–global feature enhancement model proposed in this paper captures long-range dependencies between cells while finely extracting local detailed features, thereby effectively mitigating the uncertainty caused by mixed categories and enhancing the discrimination of complex land covers.
As shown in Table 9, on HZ dataset, the proposed method achieves a significant improvement in classification accuracy for the Land/Building class, reaching 84.40%. Furthermore, compared with other methods, the OA shows improvements of 4.14%, 5.17%, 2.64%, 2.87%, 4.81%, 7.75%, 13.03%, and 16.66%, respectively.
Figure 16 shows that the classification results of the proposed method most closely match the distribution of real ground objects, with the fewest misclassified areas. This indicates that the Local-to-Global Feature Extraction Model extracts more discriminative features, enabling accurate differentiation of various types of ground objects. Moreover, the introduction of the ISAM further enhances the model’s generalization ability, ensuring stable performance in complex scenarios.

4.4. Ablation Study of SAMLFE Model

In this subsection, the contribution of each module to classification performance is evaluated through ablation experiments. The main contributions of the proposed SAMLFE lie in the ISAM, improved self-attention, and asymmetric residual block components. To better analyze their roles, six configurations are investigated, as shown in Table 10: (1) performing DCFSL on the source and target domains only; (2) adding ISAM on the basis of (1); (3) adding ISAM and the asymmetric residual block on the basis of (1); (4) adding improved self-attention and the asymmetric residual block on the basis of (1); (5) adding ISAM and improved self-attention on the basis of (1); and (6) adding ISAM, improved self-attention, and the asymmetric residual block on the basis of (1).
The model without any additional modules (ID1) serves as the baseline reference. Introducing only ISAM (ID2) significantly improves model performance, indicating that ISAM effectively reduces spectral shifts between the source and target domains, thereby enhancing classification accuracy. Building on this, adding the asymmetric residual block (ID3) further improves overall accuracy, suggesting that it complements ISAM in feature extraction: ISAM primarily addresses inter-domain shifts, while the asymmetric residual block focuses on capturing local detailed features, so their combination enables both local detail extraction and enhanced domain generalization. Combining improved self-attention and the asymmetric residual block (ID4) also improves performance: improved self-attention captures global contextual information and facilitates the extraction of more discriminative features, while the asymmetric residual block enhances local features, and integrating local and global information yields more refined representations; these two components play complementary roles in feature representation. When ISAM and improved self-attention are applied together (ID5), the improvement is limited, indicating that without the asymmetric residual block, relying solely on global features is insufficient to further enhance performance. Finally, introducing all three modules simultaneously (ID6) achieves the best overall results, demonstrating that ISAM, improved self-attention, and the asymmetric residual block produce complementary and synergistic effects in cross-domain generalization, global context capture, and local feature enhancement. Although ID2 obtains the best results on the SA dataset, this is acceptable given the consistent improvements observed on the other datasets.
When only the ISAM module is retained, the model still achieves significant performance improvements on the target domain, indicating that ISAM plays a critical role in mitigating spectral offsets between the source and target domains. With the gradual introduction of the improved Self-Attention mechanism and asymmetric residual blocks, model performance steadily improves. Compared with the baseline model DCFSL, the complete SAMLFE structure achieves better classification results in the target domain, fully demonstrating the rationality and effectiveness of the proposed model design.
As shown in Figure 17, the ISAM module yields the most significant performance improvement, indicating that it plays a critical role in enhancing the adaptability of cross-domain HSI classification tasks. Although the improvements from other modules are less pronounced than those of ISAM, they still provide notable performance enhancements compared to the baseline model. This demonstrates that these modules are also effective in cross-domain classification tasks, particularly in improving target domain classification accuracy, reducing distribution differences between the source and target domains, and enhancing the model’s generalization ability and robustness.

5. Discussion

5.1. Effectiveness of the Number of Hyperparameters on the Model

To evaluate the parameter sensitivity of SAMLFE on the target-domain datasets, a parameter sensitivity analysis was conducted. The hyperparameter controlling the perturbation correction process in ISAM is denoted as $\lambda_1$, and the minimum learning rate in the warmup cosine annealing strategy as $\lambda_2$. The candidate values were selected from $\lambda_1 \in \{1 \times 10^{-9},\ 1 \times 10^{-8},\ 1.1 \times 10^{-8},\ 1.2 \times 10^{-8}\}$ and $\lambda_2 \in \{1 \times 10^{-7},\ 1 \times 10^{-6},\ 1 \times 10^{-5},\ 1.1 \times 10^{-5}\}$ for the combination experiments. Figure 18 illustrates the trends of OA, AA, and Kappa as the combinations of $\lambda_1$ and $\lambda_2$ change on the PU dataset. In Figure 18, the two horizontal axes represent the parameters $\lambda_1$ and $\lambda_2$, while the vertical axis corresponds to the model’s OA, AA, and Kappa coefficient; surfaces of different colors illustrate how these metrics vary across parameter combinations, with peaks indicating higher values and troughs indicating lower performance. To reduce the computational burden of hyperparameter optimization on the same dataset, the parameter combination achieving better overall performance was selected as the experimental setting: for the PU, IP, SA, and HZ datasets, the final parameters were $\lambda_1 = 1.1 \times 10^{-8}$ and $\lambda_2 = 1 \times 10^{-5}$.

5.2. Analysis of the Impact of Sample Size on SAMLFE Classification Accuracy

To assess the impact of varying sample sizes on SAMLFE performance, this study sets the number of samples per class to 1, 2, 3, 4, and 5, respectively. The 1–5 sample setting covers various small-sample learning scenarios and enables systematic evaluation of the model’s cross-domain learning ability under different data availability conditions. As the sample size increases (e.g., 3–5), target domain samples offer richer intra- and inter-class feature information, enabling the model to better adjust feature distributions and achieve effective cross-domain learning. Therefore, the 1–5 sample setting not only preserves the characteristics of small-sample learning but also validates the model’s generalization ability across domains. Most relevant studies (e.g., Li [20] and Zhang [22]) adopt similar settings to ensure fair comparisons among algorithms and reproducibility of experimental results. Furthermore, as shown in Table 11 and Figure 19, when the sample size is 5, OA, AA, Kappa, and F1 reach their highest values. Therefore, this study ultimately selects 5 samples per class for the experiments.

5.3. Analyzing the Impact of Batch Size on the SAMLFE Framework

In cross-domain hyperspectral image classification, batch size determines the number of samples used by the model for gradient estimation in a single iteration, thereby influencing its generalization performance and feature learning capability. Smaller batch sizes (e.g., 32 or 64) involve fewer samples in gradient computation, causing model convergence to fluctuate and making stable convergence difficult. Moreover, small batches cannot fully capture the diverse spectral–spatial features of the source and target domains, potentially weakening the model’s cross-domain feature learning capability. Conversely, a larger batch size (e.g., 150) introduces more samples per iteration, facilitating smoother gradient estimates, stabilizing model convergence, and promoting the learning of more robust cross-domain features. However, an excessively large batch size exposes the model to many samples from different domains simultaneously, averaging gradient directions, reducing the diversity of gradient updates, and masking subtle inter-domain feature distribution differences. This over-smoothed gradient update diminishes the model’s sensitivity to cross-domain feature variations, ultimately reducing its cross-domain learning ability in the target domain. Based on the above analysis, sensitivity experiments were conducted under four batch size settings: 32, 64, 128, and 150. As shown in Table 12 and Figure 20, increasing batch size gradually improves multiple metrics, including OA, AA, Kappa coefficient, and F1 score. When the batch size is 128, the model achieves optimal performance; further increasing it to 150 slightly reduces performance. Therefore, 128 is selected as the optimal batch size for subsequent experiments.

5.4. Analyzing the Impact of Parameter Gamma on the Improved Self-Attention

To verify the effectiveness of parameter gamma in the Improved Self-attention mechanism, an ablation study was conducted, with results shown in Table 13. It can be observed that the introduction of parameter gamma enables the model to adaptively adjust the weighting ratio between the input and the output of the self-attention mechanism during training. This capability enhances the model’s adaptability to cross-domain classification tasks, leading to significant improvements in OA, AA, and Kappa coefficients.

5.5. Feature Visualization of the Target Domain

This section employs t-SNE to visualize the two-dimensional feature projections of the original HSI data, Gia-CFSL, and SAMLFE across the PU, IP, and SA datasets. The visualization results are presented in Figure 21, Figure 22 and Figure 23. In t-SNE visualizations, each color represents a feature category, illustrating the model’s ability to distinguish different categories in a low-dimensional feature space. The clustering patterns shown by t-SNE directly reflect the model’s hyperspectral feature extraction ability, where tight intra-class clustering indicates that the model learns stable and discriminative features. Clear inter-class separation demonstrates that the model effectively captures subtle spectral–spatial differences between similar categories, thereby distinguishing these differences in the low-dimensional feature space. Compared to the original HSI data and the Gia-CFSL method, the proposed SAMLFE demonstrates reduced inter-class feature confusion and clearer classification boundaries across all three datasets, highlighting its ability to learn more discriminative feature representations. Specifically, Figure 21a, Figure 22a and Figure 23a illustrate that in the original HSI data, various land object categories exhibit wide coverage and significant overlap, complicating effective classification. For instance, in the IP dataset, the cross-domain method Gia-CFSL also exhibits notable category confusion in Figure 22b, particularly between “Corn-notill” and “Corn-mintill,” due to their similar surface coverage types. In contrast, Figure 22c shows that SAMLFE significantly reduces inter-class overlap, presents clearer feature distributions, and achieves better separability. Similarly, in the SA dataset, Figure 23b,c reveal that SAMLFE achieves more compact clustering for easily confused categories such as “Grapes_untrained” and “Vineyard_untrained” in Class 8, with minimal confusion, further validating its capability to distinguish complex classes. Overall, the proposed SAMLFE method significantly enhances category separability in the feature space, effectively improving the model’s discriminative ability and classification performance.
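For reference, the following sketch reproduces the style of the t-SNE visualizations in Figures 21–23 using scikit-learn and matplotlib; the perplexity and initialization settings are assumptions, and the input features would be the embeddings produced by the model under comparison (or the raw HSI spectra).

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

def plot_tsne(features, labels, title):
    """Project high-dimensional features to 2-D with t-SNE and color by class."""
    features = np.asarray(features)
    labels = np.asarray(labels)
    emb = TSNE(n_components=2, perplexity=30, init="pca",
               random_state=0).fit_transform(features)
    plt.figure(figsize=(5, 5))
    for c in np.unique(labels):
        idx = labels == c
        plt.scatter(emb[idx, 0], emb[idx, 1], s=2, label=str(c))
    plt.title(title)
    plt.legend(markerscale=4, fontsize=6, loc="best")
    plt.tight_layout()
    plt.show()
```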

6. Conclusions

This paper proposes a cross-domain hyperspectral image classification method that integrates sharpness-aware minimization with local-to-global feature enhancement, establishing a novel paradigm for large-scene satellite image classification supported by UAV hyperspectral data. The local-to-global feature extraction model simultaneously captures fine-grained local details and long-range dependencies, enabling effective extraction of shared semantic features across domains. When combined with the improved sharpness-aware minimization strategy, the model achieves enhanced cross-domain generalization and more precise feature alignment. Experiments on the PU, IP, SA, and HZ datasets demonstrate that the proposed method outperforms mainstream approaches in both classification accuracy and cross-domain adaptability. Notably, the method maintains strong robustness and generalization capability even when the source and target domains exhibit significant discrepancies. Compared with the SOTA method DSFormer, the overall accuracy (OA) on the four datasets improves by 2.07%, 1.00%, 1.07%, and 4.14%, respectively. These results confirm the method’s effectiveness in extracting semantic features across domains, enhancing feature alignment, and improving classification performance. Future work will focus on exploring more efficient feature alignment strategies and extending the method’s applicability to broader and more diverse remote sensing scenarios.

Author Contributions

Conceptualization, C.L., A.W. and H.W.; methodology, software, validation, C.L.; writing—review and editing, A.W., H.W. and C.L.; visualization, C.L., A.W., M.W. and S.Y.; supervision, L.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the Key Research and Development Plan Project of Heilongjiang (JD2023SJ19), the National Key Support Project for Foreign Experts of Northeast Special Project (D20250098) and the Program for Young Talents of Basic Research in Universities of Heilongjiang Province (YQJH2024077) and the Postdoctoral Fellowship Program of China Postdoctoral Science Foundation (GZC20252304).

Data Availability Statement

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Wang, Q.; Huang, J.; Wang, S.; Zhang, Z.; Shen, T.; Gu, Y. Community Structure Guided Network for Hyperspectral Image Classification. IEEE Trans. Geosci. Remote Sens. 2025, 63, 4404115. [Google Scholar] [CrossRef]
  2. Tu, B.; Ren, Q.; Li, Q.; He, W.; He, W. Hyperspectral Image Classification Using a Superpixel–Pixel–Subpixel Multilevel Network. IEEE Trans. Geosci. Remote Sens. 2023, 72, 5013616. [Google Scholar] [CrossRef]
  3. Weber, C.; Aguejdad, R.; Briottet, X.; Avala, J.; Fabre, S.; Demuynck, J.; Zenou, E.; Deville, Y.; Karoui, M.S.; Benhalouche, F.Z.; et al. Hyperspectral Imagery for Environmental Urban Planning. In Proceedings of the IGARSS 2018—2018 IEEE International Geoscience and Remote Sensing Symposium, Valencia, Spain, 22–27 July 2018; pp. 1628–1631. [Google Scholar]
  4. Yang, X.; Yu, Y. Estimating Soil Salinity Under Various Moisture Conditions: An Experimental Study. IEEE Trans. Geosci. Remote Sens. 2017, 55, 2525–2533. [Google Scholar] [CrossRef]
  5. Liang, L.; Di, L.; Zhang, L.; Deng, M.; Qin, Z.; Zhao, S.; Lin, H. Estimation of crop LAI using hyperspectral vegetation indices and a hybrid inversion method. Remote Sens. Environ. 2015, 165, 123–134. [Google Scholar] [CrossRef]
  6. Hao, Q.; Pei, Y.; Zhou, R.; Sun, B.; Sun, J.; Li, S.; Kang, X. Fusing Multiple Deep Models for in Vivo Human Brain Hyperspectral Image Classification to Identify Glioblastoma Tumor. IEEE Trans. Instrum. Meas. 2021, 70, 4007314. [Google Scholar] [CrossRef]
  7. Kanthi, M.; Sarma, T.H.; Bindu, C.S. A 3D-Deep CNN Based Feature Extraction and Hyperspectral Image Classification. In Proceedings of the 2020 IEEE India Geoscience and Remote Sensing Symposium, Ahmedabad, India, 1–4 December 2020; pp. 229–232. [Google Scholar]
  8. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
  9. Zhong, Z.; Li, J.; Luo, Z.; Chapman, M. Spectral–Spatial Residual Network for Hyperspectral Image Classification: A 3-D Deep Learning Framework. IEEE Trans. Geosci. Remote Sens. 2018, 56, 847–858. [Google Scholar] [CrossRef]
  10. Wei, W.; Tong, L.; Guo, B.; Zhou, J.; Xiao, C. Few-Shot Hyperspectral Image Classification Using Relational Generative Adversarial Network. IEEE Trans. Geosci. Remote Sens. 2024, 62, 5539016. [Google Scholar] [CrossRef]
  11. Yu, C.; Gong, B.; Song, M.; Zhao, E.; Chang, C.-I. Multiview Calibrated Prototype Learning for Few-Shot Hyperspectral Image Classification. IEEE Trans. Geosci. Remote Sens. 2022, 60, 5544713. [Google Scholar] [CrossRef]
  12. Tang, H.; Zhang, C.; Tang, D.; Lin, X.; Yang, X.; Xie, W. Few-Shot Hyperspectral Image Classification with Deep Fuzzy Metric Learning. IEEE Geosci. Remote Sens. Lett. 2025, 22, 5502205. [Google Scholar] [CrossRef]
  13. Liu, S.; Fu, C.; Duan, Y.; Wang, X.; Luo, F. Spatial–Spectral Enhancement and Fusion Network for Hyperspectral Image Classification with Few Labeled Samples. IEEE Trans. Geosci. Remote Sens. 2025, 63, 5502414. [Google Scholar] [CrossRef]
  14. Mu, C.; Liu, Y.; Yan, X.; Ali, A.; Liu, Y. Few-Shot Open-Set Hyperspectral Image Classification with Adaptive Threshold Using Self-Supervised Multitask Learning. IEEE Trans. Geosci. Remote Sens. 2024, 62, 5526618. [Google Scholar] [CrossRef]
  15. Zhao, C.; Qin, B.; Feng, S.; Zhu, W.; Zhang, L.; Ren, J. An Unsupervised Domain Adaptation Method Towards Multi-Level Features and Decision Boundaries for Cross-Scene Hyperspectral Image Classification. IEEE Trans. Geosci. Remote Sens. 2022, 60, 5546216. [Google Scholar] [CrossRef]
  16. Matasci, G.; Volpi, M.; Kanevski, M.; Tuia, D. Semisupervised Transfer Component Analysis for Domain Adaptation in Remote Sensing Image Classification. IEEE Trans. Geosci. Remote Sens. 2015, 53, 3550–3564. [Google Scholar] [CrossRef]
  17. Zhou, X.; Prasad, S. Deep Feature Alignment Neural Networks for Domain Adaptation of Hyperspectral Data. IEEE Trans. Geosci. Remote Sens. 2018, 56, 5863–5872. [Google Scholar] [CrossRef]
  18. Deng, B.; Jia, S.; Shi, D. Deep Metric Learning-Based Feature Embedding for Hyperspectral Image Classification. IEEE Trans. Geosci. Remote Sens. 2020, 58, 1422–1435. [Google Scholar] [CrossRef]
  19. Wang, Y.; Liu, G.; Yang, L.; Liu, J.; Wei, L. An Attention-Based Feature Processing Method for Cross-Domain Hyperspectral Image Classification. IEEE Signal Process. Lett. 2025, 32, 196–200. [Google Scholar] [CrossRef]
  20. Li, Z.; Liu, M.; Chen, Y.; Xu, Y.; Li, W.; Du, Q. Deep Cross-Domain Few-Shot Learning for Hyperspectral Image Classification. IEEE Trans. Geosci. Remote Sens. 2022, 60, 5501618. [Google Scholar] [CrossRef]
  21. Xi, B.; Li, J.; Li, Y.; Song, R.; Hong, D.; Chanussot, J. Few-Shot Learning with Class-Covariance Metric for Hyperspectral Image Classification. IEEE Trans. Image Process. 2022, 31, 5079–5092. [Google Scholar] [CrossRef]
  22. Zhang, Y.; Li, W.; Zhang, M.; Wang, S.; Tao, R.; Du, Q. Graph Information Aggregation Cross-Domain Few-Shot Learning for Hyperspectral Image Classification. IEEE Trans. Neural Netw. Learn. Syst. 2024, 35, 1912–1925. [Google Scholar] [CrossRef] [PubMed]
  23. Zhou, L.; Ma, L. Extreme Learning Machine-Based Heterogeneous Domain Adaptation for Classification of Hyperspectral Images. IEEE Geosci. Remote Sens. Lett. 2019, 16, 1781–1785. [Google Scholar] [CrossRef]
  24. Dang, Y.; Li, H.; Liu, B.; Zhang, X. Cross-Domain Few-Shot Learning for Hyperspectral Image Classification Based on Global-to-Local Enhanced Channel Attention. IEEE Geosci. Remote Sens. Lett. 2025, 22, 5501905. [Google Scholar] [CrossRef]
  25. Feng, S.; Zhang, H.; Xi, B.; Zhao, C.; Li, Y.; Chanussot, J. Cross-Domain Few-Shot Learning Based on Decoupled Knowledge Distillation for Hyperspectral Image Classification. IEEE Trans. Geosci. Remote Sens. 2024, 62, 5534414. [Google Scholar] [CrossRef]
  26. Jiang, Z.; Li, Z.; Wang, Y.; Li, W.; Wang, K.; Tian, J.; Wang, C.; Du, Q. Lifelong Learning with Adaptive Knowledge Fusion and Class Margin Dynamic Adjustment for Hyperspectral Image Classification. IEEE Trans. Geosci. Remote Sens. 2025, 63, 5505619. [Google Scholar] [CrossRef]
  27. Zhong, Y.; Hu, X.; Luo, C.; Wang, X.; Zhao, J.; Zhang, L. WHU-Hi: UAV-Borne Hyperspectral with High Spatial Resolution (H2) Benchmark Datasets and Classifier for Precise Crop Identification Based on Deep Convolutional Neural Network with CRF. Remote Sens. Environ. 2020, 250, 112012. [Google Scholar] [CrossRef]
  28. Wei, L.; Yu, M.; Zhong, Y.; Zhao, J.; Liang, Y.; Hu, X. Spatial-Spectral Fusion Based on Conditional Random Fields for the Fine Classification of Crops in UAV-Borne Hyperspectral Remote Sensing Imagery. Remote Sens. 2019, 11, 780. [Google Scholar] [CrossRef]
  29. Zhong, Y.; Xu, Y.; Wang, X.; Jia, T.; Xia, G.; Ma, A.; Zhang, L. Pipeline Leakage Detection for District Heating Systems Using Multisource Data in Mid and High-Latitude Regions. ISPRS J. Photogramm. Remote Sens. 2019, 151, 207–222. [Google Scholar] [CrossRef]
  30. Han, Z.; Yang, J.; Gao, L.; Zeng, Z.; Zhang, B.; Chanussot, J. Subpixel Spectral Variability Network for Hyperspectral Image Classification. IEEE Trans. Geosci. Remote Sens. 2025, 63, 5504014. [Google Scholar] [CrossRef]
  31. Han, Z.; Zhang, C.; Gao, L.; Zeng, Z.; Ng, M.K.; Zhang, B.; Chanussot, J. Multisource Collaborative Domain Generalization for Cross-Scene Remote Sensing Image Classification. IEEE Trans. Geosci. Remote Sens. 2024, 62, 5535815. [Google Scholar] [CrossRef]
  32. Hu, W.; Huang, Y.; Wei, L.; Zhang, F.; Li, H. Deep Convolutional Neural Networks for Hyperspectral Image Classification. J. Sensors. 2015, 2015, 258619. [Google Scholar] [CrossRef]
  33. Mou, L.; Ghamisi, P.; Zhu, X.X. Deep Recurrent Neural Networks for Hyperspectral Image Classification. IEEE Trans. Geosci. Remote Sens. 2017, 55, 3639–3655. [Google Scholar] [CrossRef]
  34. Li, Y.; Zhang, H.; Shen, Q. Spectral–Spatial Classification of Hyperspectral Imagery with 3D Convolutional Neural Network. Remote Sens. 2017, 9, 67. [Google Scholar] [CrossRef]
  35. Liu, X.; Liu, S.; Chen, W.; Qu, S. HDECGCN: A Heterogeneous Dual Enhanced Network Based on Hybrid CNNs Joint Multiscale Dynamic GCNs for Hyperspectral Image Classification. IEEE Trans. Geosci. Remote Sens. 2024, 62, 5515717. [Google Scholar] [CrossRef]
  36. Ahmad, M.; Ghous, U.; Usama, M.; Mazzara, M. WaveFormer: Spectral–Spatial Wavelet Transformer for Hyperspectral Image Classification. IEEE Geosci. Remote Sens. Lett. 2024, 21, 5502405. [Google Scholar] [CrossRef]
  37. Zhang, S.; Chen, Z.; Wang, D.; Wang, Z.J. Cross-Domain Few-Shot Contrastive Learning for Hyperspectral Images Classification. IEEE Geosci. Remote Sens. Lett. 2022, 19, 5514505. [Google Scholar] [CrossRef]
  38. Ye, Z.; Wang, J.; Sun, T.; Zhang, J.; Li, W. Cross-Domain Few-Shot Learning Based on Graph Convolution Contrast for Hyperspectral Image Classification. IEEE Trans. Geosci. Remote Sens. 2024, 62, 5504614. [Google Scholar] [CrossRef]
  39. Miftahushudur, T.; Grieve, B.; Yin, H. Permuted KPCA and SMOTE to Guide GAN-Based Oversampling for Imbalanced HSI Classification. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2024, 17, 489–505. [Google Scholar] [CrossRef]
  40. Paoletti, M.E.; Haut, J.M.; Plaza, J.; Plaza, A. Neighboring Region Dropout for Hyperspectral Image Classification. IEEE Geosci. Remote Sens. Lett. 2020, 17, 1032–1036. [Google Scholar] [CrossRef]
  41. Wang, W.; Wang, X.; Liu, Y.; Yang, J. Rethinking Maximum Mean Discrepancy for Visual Domain Adaptation. IEEE Trans. Neural Netw. Learn. Syst. 2023, 34, 264–277. [Google Scholar] [CrossRef]
  42. Li, Y.; Hu, H.; Wang, D. Learning Visually Aligned Semantic Graph for Cross-Modal Manifold Matching. In Proceedings of the 2019 IEEE International Conference on Image Processing (ICIP), Taipei, Taiwan, 22–25 September 2019; pp. 3412–3416. [Google Scholar]
  43. Mei, S.; Ji, J.; Hou, J.; Li, X.; Du, Q. Learning Sensor-Specific Spatial-Spectral Features of Hyperspectral Images via Convolutional Neural Networks. IEEE Trans. Geosci. Remote Sens. 2017, 55, 4520–4533. [Google Scholar] [CrossRef]
  44. Yang, J.; Zhao, Y.; Chan, J.C. Learning and Transferring Deep Joint Spectral–Spatial Features for Hyperspectral Classification. IEEE Trans. Geosci. Remote Sens. 2017, 55, 4729–4742. [Google Scholar] [CrossRef]
  45. Othman, E.; Bazi, Y.; Melgani, F.; Alhichri, H.; Alajlan, N.; Zuair, M. Domain Adaptation Network for Cross-Scene Classification. IEEE Trans. Geosci. Remote Sens. 2017, 55, 4441–4456. [Google Scholar] [CrossRef]
  46. Wang, Z.; Du, B.; Shi, Q.; Tu, W. Domain Adaptation with Discriminative Distribution and Manifold Embedding for Hyperspectral Image Classification. IEEE Geosci. Remote Sens. Lett. 2019, 16, 1155–1159. [Google Scholar] [CrossRef]
  47. Wang, M.; Chen, J.; Wang, Y.; Wang, S.; Li, L.; Su, H.; Gong, Z. Joint Adversarial Domain Adaptation With Structural Graph Alignment. IEEE Trans. Netw. Sci. Eng. 2024, 11, 604–612. [Google Scholar] [CrossRef]
  48. Liu, L.; Zhang, Y.; Tang, J.; Chen, Q. Generalizable Prompt Learning via Gradient Constrained Sharpness-Aware Minimization. IEEE Trans. Multimed. 2025, 27, 1100–1113. [Google Scholar] [CrossRef]
  49. He, X.; Chen, Y.; Ghamisi, P. Heterogeneous Transfer Learning for Hyperspectral Image Classification Based on Convolutional Neural Network. IEEE Trans. Geosci. Remote Sens. 2020, 58, 3246–3263. [Google Scholar] [CrossRef]
  50. Chen, T.; Guestrin, C. XGBoost: A Scalable Tree Boosting System. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 785–794. [Google Scholar]
  51. Zhong, S.; Chang, C.-I.; Zhang, Y. Iterative Support Vector Machine for Hyperspectral Image Classification. In Proceedings of the 2018 25th IEEE International Conference on Image Processing, Athens, Greece, 7–10 October 2018; pp. 3309–3312. [Google Scholar]
  52. Zhao, Z.; Xu, X.; Li, S.; Plaza, A. Hyperspectral Image Classification Using Groupwise Separable Convolutional Vision Transformer Network. IEEE Trans. Geosci. Remote Sens. 2024, 62, 5511817. [Google Scholar] [CrossRef]
  53. Xu, Y.; Wang, D.; Zhang, L.; Zhang, L. Dual Selective Fusion Transformer Network for Hyperspectral Image Classification. Neural Netw. 2025, 187, 107311. [Google Scholar] [CrossRef]
Figure 1. Framework of the proposed SAMLFE for HSI classification.
Figure 2. The structure of spectral dimension mapping model.
Figure 3. The structure of local-to-global feature extraction model.
Figure 4. The feature space of improved self-attention mechanism.
Figure 5. The few-shot learning between source and target domain.
Figure 6. The structure of conditional domain discriminator model.
Figure 7. The schematic diagram of ISAM.
Figure 8. WHHC dataset. (a) False-color image; (b) Ground-truth map.
Figure 9. PU dataset. (a) False-color image; (b) Ground-truth map.
Figure 10. SA dataset. (a) False-color image. (b) Ground-truth map.
Figure 11. HZ dataset. (a) False-color image. (b) Ground-truth map.
Figure 12. IP dataset. (a) False-color image; (b) Ground-truth map.
Figure 13. Classification maps of PU dataset by different methods. (a) Ground truth; (b) XGBoost; (c) SVM; (d) 3DCNN; (e) SSRN; (f) DCFSL; (g) Gia-CFSL; (h) GSCViT; (i) DSFormer; (j) SAMLFE.
Figure 14. Classification maps of IP dataset by different methods. (a) Ground truth; (b) XGBoost; (c) SVM; (d) 3DCNN; (e) SSRN; (f) DCFSL; (g) Gia-CFSL; (h) GSCViT; (i) DSFormer; (j) SAMLFE.
Figure 15. Classification maps of SA dataset by different methods. (a) Ground truth; (b) XGBoost; (c) SVM; (d) 3DCNN; (e) SSRN; (f) DCFSL; (g) Gia-CFSL; (h) GSCViT; (i) DSFormer; (j) SAMLFE.
Figure 16. Classification maps of HZ dataset by different methods. (a) Ground truth; (b) XGBoost; (c) SVM; (d) 3DCNN; (e) SSRN; (f) DCFSL; (g) Gia-CFSL; (h) GSCViT; (i) DSFormer; (j) SAMLFE.
Figure 17. The performance comparison of the SAMLFE model across different ablation settings. (a) PU; (b) IP; (c) SA.
Figure 18. The effect of hyperparameters on the SAMLFE model. (a) OA. (b) AA. (c) Kappa.
Figure 19. The performance comparison of the SAMLFE model across different sample sizes. (a) PU; (b) IP; (c) SA.
Figure 20. The performance comparison of the SAMLFE model across different batch size. (a) PU; (b) IP; (c) SA.
Figure 21. Two-dimensional feature visualization on the PU dataset. (a) Original samples; (b) Features by Gia-CFSL; (c) Features by SAMLFE.
Figure 22. Two-dimensional feature visualization on the IP dataset. (a) Original samples; (b) Features by Gia-CFSL; (c) Features by SAMLFE.
Figure 23. Two-dimensional feature visualization on the SA dataset. (a) Original samples; (b) Features by Gia-CFSL; (c) Features by SAMLFE.
Table 1. The Number of Samples of the WHHC dataset.
Class | Name | Pixels
1 | Strawberry | 44,735
2 | Cowpea | 22,753
3 | Soybean | 10,287
4 | Sorghum | 5353
5 | Water spinach | 1200
6 | Watermelon | 4533
7 | Greens | 5903
8 | Trees | 17,978
9 | Grass | 9469
10 | Red roof | 10,516
11 | Gray roof | 16,911
12 | Plastic | 3679
13 | Bare soil | 9116
14 | Road | 18,560
15 | Bright object | 1136
16 | Water | 75,401
Table 2. The Number of Samples of the PU dataset.
Class | Name | Pixels
1 | Asphalt | 6631
2 | Meadows | 18,649
3 | Gravel | 2099
4 | Trees | 3064
5 | Sheets | 1345
6 | Bare soil | 5029
7 | Bitumen | 1330
8 | Bricks | 3682
9 | Shadow | 947
Table 3. The Number of Samples of the SA dataset.
Class | Name | Pixels
1 | Brocoli_green_weeds_1 | 2009
2 | Brocoli_green_weeds_2 | 3726
3 | Fallow | 1976
4 | Fallow_rough_plow | 1394
5 | Fallow_smooth | 2678
6 | Stubble | 3959
7 | Celery | 3579
8 | Grapes_untrained | 11,271
9 | Soil_vinyard_develop | 6203
10 | Corn_senesced_green_weeds | 3278
11 | Lettuce_romaine_4wk | 1068
12 | Lettuce_romaine_5wk | 1927
13 | Lettuce_romaine_6wk | 916
14 | Lettuce_romaine_7wk | 1070
15 | Vinyard_untrained | 7268
16 | Vinyard_vertical_trellis | 1807
Table 4. The Number of Samples of the HZ dataset.
Class | Name | Pixels
1 | Water | 18,043
2 | Land/building | 77,450
3 | Plants | 40,207
Table 5. The Number of Samples of the IP dataset.
Class | Name | Pixels
1 | Alfalfa | 46
2 | Corn-notill | 1428
3 | Corn-mintill | 830
4 | Corn | 237
5 | Grass-pasture | 483
6 | Grass-trees | 730
7 | Grass-pasture-mowed | 28
8 | Hay-windrowed | 478
9 | Oats | 20
10 | Soybean-notill | 972
11 | Soybean-mintill | 2455
12 | Soybean-clean | 593
13 | Wheat | 205
14 | Woods | 1265
15 | Buildings-Grass-Trees-Drives | 386
16 | Stone-Steel-Towers | 93
Table 6. Classification results for the PU dataset.
Class | XGBoost | SVM | 3DCNN | SSRN | DCFSL | Gia-CFSL | GSCViT | DSFormer | SAMLFE
147.89 ± 20.4066.69 ± 4.5056.87 ± 3.2347.65 ± 8.3878.44 ± 6.9279.98 ± 2.7977.63 ± 13.1968.04 ± 2.4080.75 ± 4.06
247.52 ± 0.4356.21 ± 13.6270.50 ± 12.3985.85 ± 8.0684.24 ± 10.9389.41 ± 6.0179.76 ± 7.2587.37 ± 13.3488.29 ± 4.98
334.13 ± 18.9156.96 ± 8.6762.37 ± 5.8866.33 ± 18.2967.24 ± 12.2267.83 ± 7.9357.45 ± 23.9689.26 ± 5.3270.83 ± 11.23
463.11 ± 18.7873.44 ± 22.7271.76 ± 7.9783.93 ± 4.6993.26 ± 3.9492.64 ± 2.7890.11 ± 7.3187.09 ± 6.0893.83 ± 3.03
596.77 ± 1.4196.84 ± 1.5896.57 ± 4.1799.81 ± 0.1999.09 ± 1.0696.24 ± 6.5699.97 ± 0.05100.00 ± 0.0099.05 ± 0.85
639.98 ± 1.4952.38 ± 25.5449.77 ± 19.4277.63 ± 6.9176.24 ± 7.9065.38 ± 13.4877.77 ± 5.5566.81 ± 9.8779.28 ± 6.74
766.84 ± 18.3977.46 ± 12.1481.03 ± 6.4199.70 ± 0.3078.88 ± 10.5881.18 ± 3.3394.78 ± 5.6899.12 ± 0.5182.72 ± 10.73
850.92 ± 8.2069.93 ± 2.8845.77 ± 16.0186.32 ± 2.8369.83 ± 14.1869.23 ± 13.3791.09 ± 4.0278.90 ± 6.2369.60 ± 11.79
989.53 ± 8.0999.86 ± 0.1690.66 ± 7.4799.68 ± 0.3294.08 ± 6.1394.73 ± 8.8499.52 ± 0.5293.84 ± 10.4893.38 ± 7.39
OA(%)50.51 ± 2.2062.73 ± 5.1565.10 ± 4.2879.08 ± 2.8781.49 ± 4.7782.64 ± 2.2881.35 ± 4.0982.20 ± 4.7084.27 ± 1.90
AA(%)59.63 ± 0.1172.20 ± 2.9569.48 ± 1.3182.98 ± 2.0182.37 ± 2.7181.85 ± 1.6085.34 ± 0.3385.60 ± 1.0084.19 ± 1.77
K × 10040.01 ± 2.6653.78 ± 5.1055.83 ± 4.0673.01 ± 3.3976.17 ± 5.4977.25 ± 2.6376.01 ± 4.8276.92 ± 5.0479.51 ± 2.31
F149.59 ± 3.1166.70 ± 0.6661.12 ± 5.0679.78 ± 1.8479.28 ± 2.6979.48 ± 3.5475.67 ± 0.4876.28 ± 4.7281.26 ± 1.87
Model size (MB)0.120.760.290.860.592.580.27
Time(s)0.340.0161.0575.061989.375503.9216.4858.455030.01
Table 7. Classification results for the IP dataset.
Class | XGBoost | SVM | 3DCNN | SSRN | DCFSL | Gia-CFSL | GSCViT | DSFormer | SAMLFE
168.29 ± 12.9167.48 ± 17.1363.90 ± 7.4697.56 ± 2.44 94.15 ± 8.8184.88 ± 9.0597.22 ± 3.9399.39 ± 1.2295.37 ± 5.71
224.43 ± 18.8328.48 ± 8.6926.52 ± 9.32 39.72 ± 7.15 40.93 ± 6.6943.18 ± 8.0644.79 ± 23.9445.06 ± 2.2543.13 ± 12.66
325.13 ± 15.0034.34 ± 11.2827.01 ± 8.83 32.61 ± 23.17 45.70 ± 6.3947.64 ± 11.6662.38 ± 1.1251.18 ± 1.7253.53 ± 7.37
429.60 ± 2.8760.20 ± 3.5132.50 ± 12.38 54.20 ± 25.76 72.16 ± 16.8381.12 ± 5.4185.90 ± 11.8491.49 ± 6.4371.38 ± 15.32
548.26 ± 19.9050.28 ± 4.5161.88 ± 13.14 84.73 ± 2.63 71.23 ± 7.8773.26 ± 6.0363.01 ± 1.4971.97 ± 13.1174.54 ± 7.53
663.91 ± 6.2277.61 ± 2.6875.03 ± 13.12 74.24 ± 22.87 83.35 ± 7.0976.86 ± 6.7995.28 ± 0.5992.76 ± 2.9185.83 ± 5.24
768.12 ± 20.5585.51 ± 17.5788.70 ± 14.45 96.74 ± 5.65 98.70 ± 3.9196.52 ± 3.25100.00 ± 0.0094.57 ± 10.8798.70 ± 1.99
836.36 ± 3.9670.54 ± 12.7681.61 ± 7.81 80.92 ± 12.81 81.16 ± 13.8492.35 ± 4.1276.50 ± 33.2486.36 ± 9.2784.55 ± 10.50
962.22 ± 23.4191.11 ± 15.4098.67 ± 2.67 76.67 ± 20.41 99.33 ± 2.0097.33 ± 5.3395.00 ± 7.07100.00 ± 0.0098.67 ± 2.67
1029.85 ± 15.3545.81 ± 3.3840.02 ± 7.05 53.77 ± 18.93 56.29 ± 10.7758.47 ± 6.5959.25 ± 16.0252.43 ± 1.9756.83 ± 12.41
1119.69 ± 3.8435.01 ± 16.0450.52 ± 11.45 45.90 ± 9.25 57.49 ± 12.7856.78 ± 7.1348.16 ± 13.7357.43 ± 8.7263.40 ± 7.00
1224.94 ± 8.3338.72 ± 7.2324.97 ± 6.09 54.00 ± 11.48 43.84 ± 15.1743.47 ± 12.8248.97 ± 19.7739.67 ± 9.6339.44 ± 10.89
1382.33 ± 9.6592.50 ± 9.1097.60 ± 4.55 98.38 ± 1.98 96.35 ± 4.9994.10 ± 3.4499.75 ± 0.3699.63 ± 0.2596.80 ± 2.23
1457.17 ± 6.9863.97 ± 14.0254.40 ± 10.9879.86 ± 8.72 86.21 ± 6.2781.30 ± 6.2688.05 ± 16.0086.19 ± 3.6586.33 ± 7.92
1527.56 ± 7.8337.97 ± 16.1932.02 ± 6.46 61.09 ± 16.02 68.92 ± 8.7152.81 ± 9.2159.84 ± 1.1367.06 ± 8.9972.89 ± 11.42
1689.39 ± 8.8380.30 ± 10.9284.55 ± 7.35 98.58 ± 1.8698.64 ± 1.4295.91 ± 6.5391.57 ± 11.93100.00 ± 0.0097.50 ± 3.40
OA(%)34.70 ± 0.7646.82 ± 6.0747.32 ± 3.9257.46 ± 3.8362.65 ± 2.6062.17 ± 2.4163.23 ± 8.6164.45 ± 2.5565.45 ± 2.93
AA(%)47.33 ± 1.6459.99 ± 3.9358.74 ± 2.6670.56 ± 7.1774.65 ± 1.9373.50 ± 1.5875.98 ± 3.5877.20 ± 1.7076.18 ± 1.88
K × 10028.47 ± 1.2640.99 ± 6.0740.83 ± 3.9952.31 ± 4.6158.08 ± 2.7257.39 ± 2.8058.87 ± 9.5860.07 ± 2.7061.06 ± 3.15
F136.46 ± 1.2548.38 ± 2.0448.04 ± 3.2058.69 ± 2.4861.37 ± 1.7758.96 ± 1.3159.55 ± 4.9058.32 ± 0.8165.52 ± 2.54
Model size (MB)0.431.320.330.892.442.590.33
Time(s)0.690.0123.6135.463211.047602.7423.9961.536266.61
Table 8. Classification results for the SA dataset.
Class | XGBoost | SVM | 3DCNN | SSRN | DCFSL | Gia-CFSL | GSCViT | DSFormer | SAMLFE
192.65 ± 1.5795.43 ± 2.8098.25 ± 0.2097.85 ± 4.29 99.61 ± 0.8599.54 ± 0.2499.75 ± 0.47100.00 ± 0.0099.64 ± 0.55
272.92 ± 6.6795.05 ± 2.4298.70 ± 1.25 95.87 ± 8.26 99.01 ± 1.2399.11 ± 0.6999.81 ± 0.1896.41 ± 5.0799.56 ± 0.37
361.15 ± 18.3787.23 ± 12.3695.61 ± 0.43 93.08 ± 13.71 90.27 ± 10.2590.54 ± 7.6083.95 ± 12.8597.64 ± 3.0596.00 ± 3.95
497.43 ± 0.9698.61 ± 0.8397.34 ± 0.94 98.08 ± 1.77 99.40 ± 0.4897.48 ± 2.2999.69 ± 0.3299.89 ± 0.0499.12 ± 0.76
594.71 ± 2.8788.04 ± 7.3293.08 ± 4.08 95.23 ± 2.64 91.50 ± 2.8193.00 ± 2.3992.40 ± 5.9184.36 ± 5.5790.18 ± 6.38
687.73 ± 8.5399.36 ± 0.2299.73 ± 0.27 99.94 ± 0.11 99.39 ± 0.9798.41 ± 1.5799.37 ± 0.61100.00 ± 0.0099.05 ± 1.16
790.68 ± 7.7597.96 ± 2.3596.75 ± 1.82 99.96 ± 0.06 98.27 ± 1.3397.37 ± 1.5799.94 ± 0.0999.94 ± 0.0198.52 ± 1.10
845.81 ± 16.8966.11 ± 5.7271.39 ± 8.25 60.08 ± 25.00 75.88 ± 10.5371.11 ± 11.9868.59 ± 12.5470.00 ± 17.1679.06 ± 5.53
972.95 ± 19.0190.01 ± 8.6893.63 ± 0.34 99.74 ± 0.35 99.32 ± 0.7699.26 ± 0.6298.62 ± 2.2099.98 ± 0.0299.36 ± 0.73
1065.70 ± 8.8782.43 ± 0.7683.13 ± 8.65 94.01 ± 1.36 88.00 ± 4.3584.32 ± 4.7989.01 ± 7.3395.95 ± 0.2888.57 ± 4.07
1156.22 ± 27.1287.80 ± 10.7477.28 ± 15.95 99.34 ± 0.48 98.78 ± 1.1695.71 ± 3.9996.06 ± 4.5589.84 ± 3.1997.22 ± 4.28
1281.70 ± 8.7796.86 ± 0.6298.47 ± 1.53 99.38 ± 0.86 99.14 ± 1.4897.61 ± 2.5198.95 ± 1.1299.24 ± 1.0699.80 ± 0.18
1393.08 ± 4.3297.91 ± 0.1197.64 ± 1.92 99.30 ± 0.89 99.33 ± 0.7898.70 ± 0.6199.39 ± 1.3697.86 ± 3.0298.63 ± 1.82
1482.60 ± 7.2190.67 ± 0.3884.46 ± 6.43 97.78 ± 2.41 97.91 ± 1.5798.70 ± 0.8298.28 ± 2.7696.47 ± 4.4498.04 ± 1.72
1553.69 ± 16.0252.93 ± 5.4055.73 ± 0.40 61.66 ± 27.80 76.02 ± 8.0978.05 ± 8.5182.82 ± 12.1181.50 ± 17.9077.56 ± 5.67
1683.94 ± 4.9771.29 ± 8.1388.43 ± 7.41 91.01 ± 6.16 89.54 ± 6.7795.21 ± 3.1693.70 ± 7.0799.14 ± 0.9792.26 ± 7.13
OA(%)69.40 ± 0.8481.09 ± 0.6184.14 ± 3.1484.82 ± 1.7989.45 ± 1.8688.49 ± 1.7488.90 ± 2.7889.54 ± 1.1490.61 ± 0.93
AA(%)77.06 ± 2.8287.36 ± 0.5189.35 ± 3.3892.64 ± 1.3593.83 ± 0.9393.38 ± 0.9093.77 ± 1.5094.26 ± 0.1394.53 ± 0.82
K × 10066.34 ± 1.0378.99 ± 0.7082.37 ± 3.4683.16 ± 1.9388.28 ± 2.0387.24 ± 1.8887.69 ± 3.0888.41 ± 1.2189.56 ± 1.03
F172.86 ± 1.7684.35 ± 2.0384.77 ± 6.1186.69 ± 5.6193.68 ± 0.3592.41 ± 0.3991.95 ± 1.9593.19 ± 2.8893.97 ± 1.00
| Model size (MB) | – | – | 0.44 | 1.35 | 0.33 | 0.89 | 0.69 | 2.59 | 0.33 |
| Time (s) | 0.55 | 0.01 | 117.98 | 150.75 | 3217.59 | 7705.78 | 26.43 | 62.64 | 7129.28 |
Table 9. Classification results for the HZ dataset.
| Class | XGBoost | SVM | 3DCNN | SSRN | DCFSL | Gia-CFSL | GSCViT | DSFormer | SAMLFE |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 97.25 ± 3.08 | 89.58 ± 9.33 | 86.07 ± 2.13 | 89.93 ± 3.10 | 84.74 ± 2.01 | 83.63 ± 3.65 | 83.17 ± 13.02 | 87.73 ± 4.05 | 83.61 ± 1.94 |
| 2 | 64.42 ± 11.16 | 62.65 ± 14.45 | 70.14 ± 9.25 | 78.19 ± 10.29 | 78.86 ± 2.55 | 77.79 ± 4.52 | 76.80 ± 20.60 | 73.47 ± 16.83 | 84.40 ± 5.17 |
| 3 | 48.51 ± 3.22 | 67.61 ± 17.24 | 72.57 ± 9.38 | 65.26 ± 5.38 | 72.84 ± 6.01 | 76.19 ± 6.41 | 69.74 ± 18.49 | 77.61 ± 9.34 | 72.35 ± 7.29 |
| OA (%) | 64.07 ± 6.92 | 67.70 ± 1.90 | 72.98 ± 3.12 | 75.92 ± 3.87 | 77.86 ± 1.74 | 78.09 ± 2.95 | 75.56 ± 6.00 | 76.59 ± 8.61 | 80.73 ± 1.21 |
| AA (%) | 70.06 ± 3.77 | 73.28 ± 4.04 | 76.26 ± 1.31 | 77.80 ± 0.61 | 78.81 ± 1.84 | 79.20 ± 2.39 | 76.57 ± 3.20 | 79.60 ± 4.31 | 80.12 ± 0.44 |
| K × 100 | 41.76 ± 8.75 | 47.69 ± 1.05 | 54.22 ± 3.85 | 58.87 ± 4.20 | 61.38 ± 3.16 | 62.50 ± 4.53 | 58.12 ± 6.03 | 60.77 ± 12.11 | 65.78 ± 1.52 |
| F1 | 62.46 ± 5.84 | 67.44 ± 2.69 | 75.12 ± 2.81 | 75.05 ± 5.15 | 79.14 ± 1.72 | 78.57 ± 3.97 | 49.20 ± 2.56 | 49.83 ± 5.96 | 81.37 ± 0.77 |
| Model size (MB) | – | – | 0.08 | 1.31 | 0.32 | 0.89 | 0.68 | 2.58 | 0.30 |
| Time (s) | 0.17 | 0.01 | 137.64 | 160.03 | 1574.63 | 5960.51 | 17.73 | 55.75 | 4282.37 |
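
The OA, AA, K × 100, and F1 values reported throughout these tables follow the usual confusion-matrix definitions: overall accuracy, mean per-class accuracy, Cohen's kappa scaled by 100, and (assumed here) macro-averaged F1. The sketch below is only a minimal illustration of how such scores are typically computed; the function name `summary_metrics`, the macro-averaging assumption, and the toy confusion matrix are illustrative additions, not taken from the paper's code.

```python
import numpy as np

def summary_metrics(conf):
    """Compute OA, AA, Cohen's kappa x 100, and macro-F1 (all in percent)
    from a confusion matrix whose rows are reference labels and whose
    columns are predicted labels."""
    conf = np.asarray(conf, dtype=float)
    total = conf.sum()
    hits = np.diag(conf)
    recall = hits / conf.sum(axis=1)          # per-class (producer's) accuracy
    precision = hits / conf.sum(axis=0)       # per-class user's accuracy
    oa = hits.sum() / total                   # overall accuracy
    aa = recall.mean()                        # average accuracy over classes
    pe = (conf.sum(axis=1) * conf.sum(axis=0)).sum() / total ** 2  # chance agreement
    kappa = (oa - pe) / (1.0 - pe)            # Cohen's kappa
    f1 = (2 * precision * recall / (precision + recall)).mean()    # macro-averaged F1
    return 100 * oa, 100 * aa, 100 * kappa, 100 * f1

# Toy 3-class confusion matrix; the numbers are illustrative only.
cm = [[50, 3, 2],
      [4, 45, 6],
      [1, 5, 40]]
print(["%.2f" % v for v in summary_metrics(cm)])
```

The mean ± standard deviation entries in the tables would then come from repeating such a computation over several independent training runs.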
Table 10. Sequential ablation study on three datasets.
| Dataset | ID | ISAM / Improved Self-Attention / Asymmetric Residual Block | OA (%) | AA (%) | K × 100 | F1 |
|---|---|---|---|---|---|---|
| PU | 1 | × × × | 81.49 ± 4.77 | 82.37 ± 2.71 | 76.17 ± 5.49 | 79.28 ± 2.69 |
| PU | 2 | × × | 82.75 ± 3.57 | 82.25 ± 1.42 | 77.61 ± 4.26 | 80.35 ± 4.08 |
| PU | 3 | × | 81.53 ± 1.05 | 82.48 ± 0.15 | 76.92 ± 1.19 | 79.29 ± 1.10 |
| PU | 4 | × | 81.62 ± 3.91 | 83.79 ± 0.98 | 77.41 ± 4.41 | 79.93 ± 1.15 |
| PU | 5 | × | 81.59 ± 2.08 | 82.98 ± 2.49 | 76.31 ± 2.38 | 79.31 ± 0.71 |
| PU | 6 | ✓ ✓ ✓ | 84.27 ± 1.90 | 84.19 ± 1.77 | 79.51 ± 2.31 | 81.26 ± 1.87 |
| IP | 1 | × × × | 62.65 ± 2.60 | 74.65 ± 1.93 | 58.08 ± 2.72 | 61.37 ± 1.77 |
| IP | 2 | × × | 63.85 ± 2.05 | 76.39 ± 2.17 | 59.24 ± 2.26 | 63.11 ± 5.79 |
| IP | 3 | × | 64.31 ± 3.87 | 77.01 ± 1.62 | 59.87 ± 4.02 | 64.85 ± 1.88 |
| IP | 4 | × | 64.27 ± 2.88 | 77.00 ± 1.54 | 59.67 ± 3.16 | 64.74 ± 1.87 |
| IP | 5 | × | 64.15 ± 3.75 | 76.08 ± 2.95 | 59.80 ± 4.02 | 64.12 ± 1.83 |
| IP | 6 | ✓ ✓ ✓ | 65.45 ± 2.93 | 76.18 ± 1.88 | 61.06 ± 3.15 | 65.52 ± 2.54 |
| SA | 1 | × × × | 89.45 ± 1.86 | 93.83 ± 0.93 | 88.28 ± 2.03 | 93.68 ± 0.35 |
| SA | 2 | × × | 91.50 ± 0.86 | 95.28 ± 0.64 | 90.54 ± 0.96 | 94.52 ± 1.31 |
| SA | 3 | × | 90.71 ± 0.81 | 94.24 ± 0.29 | 89.65 ± 0.89 | 93.60 ± 4.81 |
| SA | 4 | × | 90.63 ± 0.64 | 93.61 ± 1.51 | 89.57 ± 0.72 | 93.19 ± 0.63 |
| SA | 5 | × | 89.62 ± 2.14 | 94.58 ± 1.35 | 88.46 ± 2.35 | 93.56 ± 0.45 |
| SA | 6 | ✓ ✓ ✓ | 90.61 ± 0.93 | 94.53 ± 0.82 | 89.56 ± 1.03 | 93.97 ± 1.00 |
Table 11. Impact of sample size on the classification metrics.
| Dataset | Metric | Sample size 1 | Sample size 2 | Sample size 3 | Sample size 4 | Sample size 5 |
|---|---|---|---|---|---|---|
| PU | OA (%) | 58.74 ± 2.59 | 62.41 ± 11.28 | 65.60 ± 10.20 | 75.66 ± 9.65 | 84.27 ± 1.90 |
| PU | AA (%) | 63.63 ± 2.93 | 70.76 ± 2.69 | 74.22 ± 2.25 | 82.46 ± 3.42 | 84.19 ± 1.77 |
| PU | K × 100 | 48.29 ± 2.21 | 54.45 ± 11.71 | 58.23 ± 10.16 | 69.79 ± 10.83 | 79.51 ± 2.31 |
| PU | F1 | 56.61 ± 2.63 | 64.35 ± 5.62 | 67.75 ± 4.43 | 78.17 ± 4.60 | 81.26 ± 1.87 |
| IP | OA (%) | 41.97 ± 0.84 | 53.97 ± 2.55 | 56.11 ± 2.64 | 60.21 ± 0.74 | 65.45 ± 2.93 |
| IP | AA (%) | 51.40 ± 0.67 | 65.59 ± 0.31 | 70.22 ± 0.55 | 74.12 ± 0.41 | 76.18 ± 1.88 |
| IP | K × 100 | 35.29 ± 1.44 | 48.48 ± 2.71 | 50.88 ± 2.65 | 55.34 ± 0.77 | 61.06 ± 3.15 |
| IP | F1 | 38.92 ± 1.24 | 53.42 ± 0.55 | 54.97 ± 2.44 | 60.96 ± 1.19 | 65.52 ± 2.54 |
| SA | OA (%) | 76.31 ± 0.09 | 83.46 ± 2.19 | 89.17 ± 0.46 | 88.34 ± 0.78 | 90.61 ± 0.93 |
| SA | AA (%) | 79.59 ± 3.00 | 88.91 ± 0.73 | 93.47 ± 1.10 | 93.17 ± 0.24 | 94.53 ± 0.82 |
| SA | K × 100 | 73.68 ± 0.23 | 81.52 ± 2.46 | 87.99 ± 0.49 | 87.03 ± 0.84 | 89.56 ± 1.03 |
| SA | F1 | 77.93 ± 0.25 | 87.19 ± 0.01 | 91.38 ± 1.30 | 91.36 ± 0.32 | 93.97 ± 1.00 |
Table 12. Classification results across different datasets and batch sizes.
| Dataset | Metric | Batch size 32 | Batch size 64 | Batch size 128 | Batch size 150 |
|---|---|---|---|---|---|
| PU | OA (%) | 77.26 ± 7.46 | 79.23 ± 4.86 | 84.27 ± 1.90 | 81.31 ± 4.50 |
| PU | AA (%) | 80.08 ± 4.79 | 81.87 ± 2.97 | 84.19 ± 1.77 | 83.01 ± 2.67 |
| PU | K × 100 | 70.86 ± 9.29 | 73.48 ± 5.90 | 79.51 ± 2.31 | 76.07 ± 5.42 |
| PU | F1 | 74.63 ± 6.26 | 75.09 ± 4.16 | 81.26 ± 1.87 | 78.63 ± 3.12 |
| IP | OA (%) | 65.22 ± 0.91 | 64.59 ± 0.79 | 65.45 ± 2.93 | 63.68 ± 1.65 |
| IP | AA (%) | 77.61 ± 0.94 | 77.20 ± 0.54 | 76.18 ± 1.88 | 77.09 ± 1.34 |
| IP | K × 100 | 60.67 ± 1.18 | 60.32 ± 0.78 | 61.06 ± 3.15 | 59.21 ± 1.77 |
| IP | F1 | 64.22 ± 1.79 | 64.11 ± 0.85 | 65.52 ± 2.54 | 63.04 ± 1.51 |
| SA | OA (%) | 89.53 ± 0.08 | 90.11 ± 0.51 | 90.61 ± 0.93 | 89.96 ± 1.78 |
| SA | AA (%) | 92.95 ± 0.60 | 94.04 ± 0.70 | 94.53 ± 0.82 | 94.60 ± 0.71 |
| SA | K × 100 | 88.32 ± 0.11 | 88.96 ± 0.58 | 89.56 ± 1.03 | 88.85 ± 1.97 |
| SA | F1 | 92.94 ± 0.39 | 93.79 ± 1.06 | 93.97 ± 1.00 | 93.59 ± 4.66 |
Table 13. Sensitivity analysis of the parameter gamma.
| Dataset | Metric | Baseline | Without Gamma | With Gamma |
|---|---|---|---|---|
| PU | OA | 81.49 ± 4.77 | 75.93 ± 5.80 | 82.37 ± 5.43 |
| PU | AA | 82.37 ± 2.71 | 80.05 ± 2.81 | 83.18 ± 2.12 |
| PU | Kappa | 76.17 ± 5.49 | 69.73 ± 6.38 | 77.48 ± 6.18 |
| PU | F1 | 79.28 ± 2.69 | 76.18 ± 2.30 | 79.39 ± 2.92 |
| IP | OA | 62.65 ± 2.60 | 62.45 ± 1.67 | 63.96 ± 1.47 |
| IP | AA | 74.65 ± 1.93 | 72.63 ± 1.96 | 75.52 ± 1.83 |
| IP | Kappa | 58.08 ± 2.72 | 57.69 ± 2.04 | 59.44 ± 1.61 |
| IP | F1 | 61.37 ± 1.77 | 61.68 ± 3.12 | 63.33 ± 1.82 |
| SA | OA | 89.45 ± 1.86 | 88.13 ± 0.89 | 90.39 ± 1.84 |
| SA | AA | 93.83 ± 0.93 | 93.16 ± 1.28 | 94.94 ± 1.09 |
| SA | Kappa | 88.28 ± 2.03 | 86.82 ± 0.96 | 89.33 ± 2.05 |
| SA | F1 | 93.68 ± 0.35 | 91.97 ± 1.32 | 94.09 ± 0.92 |