A Deep Transfer Contrastive Learning Network for Few-Shot Hyperspectral Image Classification

Yang, Gan; Wang, Zhaohui

doi:10.3390/rs17162800

by Gan Yang and Zhaohui Wang *

Faculty of Computer Science and Technology, Hainan University, Haikou 570228, China

* Author to whom correspondence should be addressed.

Remote Sens. 2025, 17(16), 2800; https://doi.org/10.3390/rs17162800

Submission received: 11 June 2025 / Revised: 7 August 2025 / Accepted: 7 August 2025 / Published: 13 August 2025


Abstract

Over recent decades, the hyperspectral image (HSI) classification landscape has undergone significant transformations driven by advances in deep learning (DL). Despite substantial progress, few-shot scenarios remain a significant challenge, primarily due to the high cost of manual annotation and the unreliability of visual interpretation. Traditional DL models require massive datasets to learn sophisticated feature representations, hindering their full potential in data-scarce contexts. To tackle this issue, a deep transfer contrastive learning network is proposed. A spectral data augmentation module is incorporated to expand limited sample pairs. Subsequently, a spatial–spectral feature extraction module is designed to fuse the learned feature information. The weights of the spatial feature extraction network are initialized with knowledge transferred from source-domain pretraining, while the spectral residual network acquires rich spectral information. Furthermore, contrastive learning is integrated to enhance discriminative representation learning from scarce samples, effectively mitigating obstacles arising from the high inter-class similarity and large intra-class variance inherent in HSIs. Experiments on four public HSI datasets demonstrate that our method achieves competitive performance against state-of-the-art approaches.

Keywords: hyperspectral image classification; transfer learning; contrastive learning; few-shot learning

1. Introduction

Hyperspectral images (HSIs) captured by airborne or spaceborne sensors comprise many narrow, contiguous spectral channels, which provide rich spectral and spatial information [1,2]. In the past few years, the HSI classification task has received widespread attention [3,4], and it has found extensive applications in fields such as smart agriculture [5], environmental monitoring [6], and mineral surveys [7]. Despite this progress, accurate classification remains a formidable challenge, especially with limited training samples [8]. Consequently, few-shot HSI classification has emerged as an active area of research.

HSI data collection and labeling present persistent bottlenecks in hyperspectral remote sensing applications [9]. First, hyperspectral sensors produce an immense volume of data, with each pixel exhibiting responses across dozens to hundreds of spectral bands. This necessitates a substantial number of labeled samples to adequately represent the data. Second, the labor cost of labeling a single HSI scene is high, and the interpretation of HSIs is a specialized task demanding profound expertise; only professionals with extensive field knowledge can precisely identify and classify the specific materials and objects in the imagery. Additionally, the collection of some HSI data involves inaccessible areas, which poses significant challenges for ground-truth surveys and sample collection. Thus, the availability of labeled HSI samples is currently quite constrained.

In previous research, scholars dedicated their efforts to the meticulous design of feature extraction and selection processes, aiming to address the difficulties arising from data scarcity. A set of techniques was adopted to effectively harness the discriminative power of features, e.g., a histogram of oriented gradients [10], principal component analysis [11], and local binary patterns [12]. As research has progressed, some classification algorithms have been exploited to improve the accuracy, such as random forest [13], support vector machine (SVM) [14], k-nearest neighbors [15], multinomial logistic regression [16], and so on. However, these algorithms can only extract shallow spectral and spatial features from HSIs and lack sufficient generalization and representation capabilities when dealing with limited HSI samples.

Deep learning (DL) techniques, particularly convolutional neural networks [17], stacked autoencoders [18], and deep belief networks [19], have demonstrated remarkable capabilities and achieved breakthroughs in the challenging task of few-shot hyperspectral image (HSI) classification over recent years [20]. These DL classification models can automatically learn high-level feature representations of the hierarchical features of known samples and effectively handle the large spatial variability and high dimensionality of HSIs. For instance, Li et al. [21] introduced a lightweight 3D-CNN framework that directly processes raw HSI cubes, eliminating preprocessing needs while efficiently extracting joint spectral–spatial features with minimal parameters. Zhong et al. [22] introduced an end-to-end spectral–spatial residual network (SSRN) with dedicated residual blocks, which processes raw HSI cubes through identity mapping and batch normalization to mitigate gradient vanishing in deep 3D-CNNs. However, model performance is empirically observed to correlate with training sample abundance. Consequently, the application of traditional DL models in scenarios where training samples are limited is prone to result in overfitting and suboptimal classification performance.

Given the pressing need for solutions to few-shot HSI classification tasks, researchers have studied them for more than a decade [23,24,25,26,27,28,29]. Among these studies, some have focused on extending limited sample sets. Haut et al. [24] adopted random occlusion of pixels in different rectangular spatial regions to generate training samples. Zhu et al. [25] combined GAN-generated adversarial samples with real data to fine-tune discriminative CNNs. However, these methods may introduce noise during data augmentation and have difficulty in fully considering spatial–spectral information. On the other hand, some scholars use deep models to learn better representations from insufficient data. In [27], an adaptive graph convolution extracts multi-scale patch features, while a BiLSTM captures cross-scale spectral–spatial dependencies. Jiang and Jia [30] applied a Siamese network with depthwise separable convolution to capture relationships between data. These deep models contain numerous trainable parameters (e.g., network weights), which often makes them difficult to train in practice.

To resolve this matter, several frameworks have shown more potential, including meta-learning [21,22,23,24,25,26,27,28,29,30,31,32,33], self-supervised learning [34], transfer learning [35,36], active learning [37,38], and contrastive learning [39,40,41]. These diverse approaches offer different ways to enhance the classifier’s ability to generalize from limited data. Meta-learning strategies focus on learning rapid-adaptation capabilities and optimizing the initialization and inner loop of the learning algorithm, thereby coping with the problem of few-shot learning. Self-supervised learning [42] uses the structure of the data to generate pseudo-labels, side-stepping the need for extensive manual annotations. Transfer learning bridges domains by transferring knowledge from rich source domains to target tasks with limited data [43,44]. Active learning [45] strategically identifies the most informative samples to label, making the most of scarce annotations. Contrastive learning learns discriminative features by contrasting different samples.

Despite the individual merits of existing frameworks, their effectiveness in few-shot learning for HSIs can be further enhanced through a synergistic approach [46]. To this end, we propose a deep transfer contrastive learning network (DTCLN). Transfer learning leverages knowledge from source tasks to improve performance on target tasks [47]. Specifically, the spatial features of HSI (dominated by low-frequency edges and repetitive texture patterns) are consistent with low-level features learned from large-scale natural image corpora such as ImageNet. This consistency leads to high inter-domain transferability, effectively reducing sample complexity and providing a powerful regularizer to prevent overfitting in few-shot environments. Building on this principle, our approach employs a model pre-trained on ImageNet as the spatial feature extraction module, which is subsequently fine-tuned on the HSI dataset to learn domain-specific features. To augment the limited training data, a novel spectral data augmentation module is introduced that incorporates random Gaussian noise. Recognizing the unique characteristics of HSIs, a spectral residual feature extraction network is adopted to effectively capture spectral complexity. Furthermore, a spatial–spectral feature extraction module is incorporated to fuse the learned features. To significantly boost discriminative feature learning from scarce samples, an integrated contrastive loss strategy is designed to address the challenges of high inter-class similarity and large intra-class variance [48]. This loss strategy measures local feature similarity between sample pairs to learn more discriminative representations. The main contributions of this article are summarized as follows.

(1): A spectral data augmentation module improves sample diversity through random spectral shift and noise injection.
(2): Transfer learning and residual networks are introduced collaboratively into few-shot hyperspectral classification, and a spectral–spatial dual-branch feature extraction scheme is designed. Combining ImageNet pre-trained spatial features with a spectral residual network optimizes feature expression. In addition, a spatial attention module (SAM) is introduced to adaptively weight multi-scale spatial features.
(3): The cross-entropy and supervised contrastive losses are jointly optimized to explicitly maximize the ratio of inter-class margin to intra-class radius, directly alleviating the inherent HSI problem of high inter-class similarity and large intra-class variance.

2. Methodology

2.1. Overview of the Proposed Network

As illustrated in Figure 1, the proposed method comprises two main phases: few-shot pre-training on the source and target domain datasets, and testing on the remaining samples in the target-domain dataset. The pre-training phase first utilizes the source-domain dataset to learn transferable knowledge from the source classes. However, given the rich spatial information inherent in ImageNet images, the features learned during source-domain training may only be partially transferable to the target task. Consequently, the model parameters obtained from source-domain training initialize the weights of the spatial–spectral feature extraction module for the target domain, aiming to enhance knowledge transfer effectiveness. During the testing phase, labeled and unlabeled samples from the HSI dataset are sequentially input to the mapping layer and the pre-trained spatial–spectral feature extraction module to generate fused embedding features. A nearest neighbors (NN) classifier is then trained using the few labeled samples and employed to predict the unlabeled samples. The trained spatial–spectral feature extraction module clusters intra-class samples tightly while distancing inter-class samples in the embedding space. Subsequently, the trained NN classifier leverages this discriminative embedding space to classify unlabeled samples through comparative analysis. Finally, classification maps for all unlabeled samples are generated to evaluate the method’s effectiveness.
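The testing-phase classifier described above can be illustrated with a minimal nearest-neighbor sketch. It assumes that embeddings have already been produced by the feature extraction module and that Euclidean distance is used; the text does not state the distance metric, and the function name `nearest_neighbor_predict` is hypothetical.

```python
import numpy as np

def nearest_neighbor_predict(support_emb, support_labels, query_emb):
    """Give each query embedding the label of its closest support embedding."""
    # Pairwise squared Euclidean distances, shape (num_query, num_support).
    # Euclidean distance is an assumption of this sketch.
    diff = query_emb[:, None, :] - support_emb[None, :, :]
    dist = np.sum(diff ** 2, axis=-1)
    return support_labels[np.argmin(dist, axis=1)]

# Toy embeddings: two classes with one labeled sample each.
support = np.array([[0.0, 0.0], [1.0, 1.0]])
labels = np.array([0, 1])
queries = np.array([[0.1, -0.1], [0.9, 1.2]])
pred = nearest_neighbor_predict(support, labels, queries)
```

Because the classifier has no trainable parameters of its own, its accuracy depends entirely on how well the embedding space separates the classes.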

2.2. Few-Shot Pre-Training Phase

Training an efficient feature extraction module to adapt to few-shot application scenarios plays a key role in the proposal. The pre-training phase of the proposed spatial–spectral feature extraction module is shown in the upper part of Figure 1. It contains two training phases: one is on the ImageNet dataset, and the other is on the HSI dataset.

2.2.1. Pre-Training Phase on the Source-Domain

In the feature extraction of HSIs, both spatial information and spectral information are typically taken into account. Some spatial features, such as spots, corners, and edges, are universal image representations. The ImageNet dataset, being extremely large and diverse, encompasses a wide range of natural images. Models pre-trained on this vast dataset can learn rich and robust spatial feature representations that generalize well to other image-related tasks, including HSI tasks. This is because the large scale and diversity of ImageNet enable models to capture universal spatial patterns effectively. Therefore, VGG16 [49], which has demonstrated powerful feature extraction capabilities and strong performance on the ImageNet dataset [50], is employed as a pre-trained model to extract spatial features for transfer learning in HSI tasks.

The pre-training process on the ImageNet dataset is depicted in the upper part of Figure 1. The VGG16 architecture comprises 13 convolutional and 3 fully connected layers. Its earlier convolutional operations extract primitive visual elements (e.g., edges, textural patterns), whereas deeper layers progressively abstract high-level semantic constructs. These intermediate layers are particularly beneficial for extracting spatial information from HSIs. In the process of transfer learning, certain layers of the model are typically frozen to preserve their learned features, while supplemental trainable modules are integrated atop the base architecture for target domain specialization. This methodology facilitates effective fine-tuning of pre-trained models on HSI datasets while preserving their inherent stability and representational efficacy. By replicating VGG16’s first seven-layer convolutional structure with frozen pretrained parameters, the transfer spatial feature extraction module effectively harnesses multi-scale feature extraction hierarchies inherent in the source model.

2.2.2. Pre-Training Phase on the Target Domain

The trained transfer spatial feature extraction module is used as a part of the proposed spatial–spectral feature extraction module for training on the HSI dataset. The normalized HSIs are sliced into patches centered on every pixel. Each patch is denoted by $x_{k}$. To enhance the diversity of training samples and enable the model to learn more spectral feature representations, spectral data augmentation is applied. Inspired by Song et al. [51], a random spectral shift is employed in this strategy. Specifically, a portion of the spectral bands is randomly selected, where half of the spectral values are shifted horizontally by five pixels and the other half are shifted vertically by five pixels. To enhance model robustness against real-world perturbations, Gaussian noise is injected to simulate common degradation sources. The augmented sample $\tilde{x}_{k}$ is generated by

$$\tilde{x}_{k} = f(x_{k}) + \gamma n \qquad (1)$$

where $f(\cdot)$ denotes the spectral augmentation operations, $\gamma = 0.01$ scales the noise magnitude, and $n \sim N(0, 1)$ models noise artifacts emerging from inter-pixel interference or acquisition errors.
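Equation (1) can be sketched in a few lines of NumPy. Several details below are assumptions rather than the authors' implementation: the fraction of bands selected (`band_fraction`, a hypothetical parameter), the reading that half of the selected bands are shifted along each spatial axis, and the wrap-around behaviour of `np.roll` at the patch border. Only the five-pixel shift and $\gamma = 0.01$ come from the text.

```python
import numpy as np

def spectral_augment(patch, rng, band_fraction=0.2, shift=5, gamma=0.01):
    """Return f(patch) + gamma * n for a patch of shape (h, w, b)."""
    h, w, b = patch.shape
    out = patch.copy()
    # Randomly select a portion of the bands (band_fraction is hypothetical).
    num_selected = int(round(band_fraction * b))
    selected = rng.choice(b, size=num_selected, replace=False)
    half = num_selected // 2
    # First half of the selected bands: shift along the width axis.
    # np.roll wraps around at the border, an assumption of this sketch.
    for band in selected[:half]:
        out[:, :, band] = np.roll(patch[:, :, band], shift, axis=1)
    # Remaining selected bands: shift along the height axis.
    for band in selected[half:]:
        out[:, :, band] = np.roll(patch[:, :, band], shift, axis=0)
    # Additive Gaussian noise n ~ N(0, 1), scaled by gamma.
    noise = rng.standard_normal(patch.shape)
    return out + gamma * noise

rng = np.random.default_rng(0)
patch = rng.random((9, 9, 20))   # toy 9 x 9 patch with 20 bands
augmented = spectral_augment(patch, rng)
```

With $\gamma = 0.01$ and data normalized to $[0, 1]$, the noise perturbs each value by roughly one percent of the dynamic range, so the augmented patch stays close to the original spectrum.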

In few-shot learning tasks, a limited number of labeled samples (the support set) is used to classify a large number of unlabeled samples (the query set). To enable the model to adapt to the few-shot scenario, the samples are first divided into a support set $S$ and a query set $Q$. In each training episode, $C$ classes are randomly selected, each containing only a few labeled samples. Subsequently, $K$ labeled samples are selected in each class to serve as the support set $S = \{(\tilde{x}_{i}, y_{i})\}_{i=1}^{C \times K}$, where $\tilde{x}_{i}$ is the $i$th sample of the support set and $y_{i}$ is its label. Meanwhile, $N$ unlabeled samples per class are randomly sampled from the remainder of the same $C$ classes to form a query set $Q = \{(\tilde{x}_{j}, y_{j})\}_{j=1}^{C \times N}$, where $(\tilde{x}_{j}, y_{j})$ denotes a query sample and its label. The entire selection process is called a $C$-way $K$-shot task, which helps the model learn to quickly adapt and recognize new classes.
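The episode construction can be sketched as follows. This is an illustrative sampler, not the authors' code: the function name `sample_episode` is hypothetical, and it assumes that every class present in the label array takes part in the episode, so that $C$ equals the number of classes.

```python
import numpy as np

def sample_episode(labels, k_shot, n_query, rng):
    """Return (support_idx, query_idx) for one C-way K-shot episode."""
    support_idx, query_idx = [], []
    for cls in np.unique(labels):
        # Shuffle the indices of this class, then split them without overlap.
        idx = rng.permutation(np.flatnonzero(labels == cls))
        support_idx.extend(idx[:k_shot])
        query_idx.extend(idx[k_shot:k_shot + n_query])
    return np.array(support_idx), np.array(query_idx)

rng = np.random.default_rng(0)
labels = np.repeat(np.arange(3), 30)   # toy data: 3 classes, 30 samples each
support_idx, query_idx = sample_episode(labels, k_shot=1, n_query=19, rng=rng)
```

Drawing the query samples from the indices that remain after the support split guarantees that no sample appears in both sets of the same episode.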

Because heterogeneous datasets are used in the pre-training phase, it is difficult to directly apply a model pre-trained on optical images to the HSI classification task. To address this limitation, a mapping layer is designed to facilitate discriminative feature extraction. The input HSI $\tilde{x} \in \mathbb{R}^{h \times w \times b}$ is multiplied by a mapping matrix $K \in \mathbb{R}^{b \times 3}$ to obtain $x' \in \mathbb{R}^{h \times w \times 3}$:

$$x' = \tilde{x} \times K \qquad (2)$$

The mapping layer parameters are optimized via backpropagation using the available training samples.
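As a shape check for Equation (2), the mapping amounts to a per-pixel matrix product over the band axis. The sketch below uses a randomly initialized matrix in place of the learned one; the patch size and the initialization scale are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
h, w, b = 9, 9, 103                       # toy patch size; 103 bands as in Pavia University
x_tilde = rng.random((h, w, b))           # augmented HSI patch
K = rng.standard_normal((b, 3)) * 0.01    # stand-in for the learned mapping matrix

# Matrix product over the last axis: every pixel's b-band spectrum
# is projected to 3 channels, giving shape (h, w, 3).
x_prime = x_tilde @ K
```

The three-channel output matches the input format expected by a network pre-trained on RGB images.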

The spatial feature extraction module leverages a transfer learning model pre-trained on ImageNet, providing robust two-dimensional spatial feature representations. To effectively process the continuous spectral bands (dozens to hundreds) characteristic of HSIs, a dedicated spectral residual feature network utilizing convolutional operations is introduced for the spectral dimension. As depicted in Figure 2, this network architecture comprises convolutional layers, spectral residual blocks, and a fully connected (FC) layer. The initial convolutional layer processes the input band vector $x' \in \mathbb{R}^{B \times 1}$ to generate feature vectors. These vectors are subsequently fed into the spectral residual block. Crucially, padding is applied within each convolutional layer to maintain feature dimensionality, ensuring the output size matches the input. Following the spectral residual processing, a final convolutional layer further refines the feature representation. Each convolutional operation is followed by batch normalization (BN) and a ReLU activation function, enhancing training stability and accelerating convergence. Finally, a fully connected layer with 100 units produces the extracted spectral features, denoted as $X'_{spe}$.
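The role of padding in the residual block can be illustrated with a single-channel sketch. It is a simplification under stated assumptions: the block is taken to consist of two convolutions with a skip connection, batch normalization is omitted, kernels are fixed rather than learned, and `np.convolve` flips the kernel (true convolution), unlike the cross-correlation used in most deep learning libraries.

```python
import numpy as np

def conv1d_same(x, kernel):
    """1D convolution with zero padding so that len(output) == len(x)."""
    return np.convolve(x, kernel, mode="same")

def relu(x):
    return np.maximum(x, 0.0)

def spectral_residual_block(x, kernel1, kernel2):
    """y = ReLU(x + conv(ReLU(conv(x)))); batch normalization is omitted."""
    hidden = relu(conv1d_same(x, kernel1))
    # The skip connection requires the convolved signal to keep the input length.
    return relu(x + conv1d_same(hidden, kernel2))

band_vector = np.array([1.0, -2.0, 3.0, 0.5, -1.0])   # toy spectrum with 5 bands
smooth = np.array([0.25, 0.5, 0.25])                   # illustrative fixed kernel
out = spectral_residual_block(band_vector, smooth, smooth)
```

Without same-length padding, each convolution would shorten the band vector and the element-wise addition in the skip connection would no longer be defined.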

To amalgamate spatial and spectral feature information derived from different training datasets, a spatial–spectral feature extraction module is employed. The feature vectors $X'_{spa}$ and $X'_{spe}$, learned from the spatial and spectral branches, are concatenated along the channel dimension. Next, a 2D convolutional layer is utilized to fuse the features and mitigate data variance. Considering the specificity of spatial features in the target domain, the fused feature map $X''$ is input into the spatial attention module (SAM). To capture multi-scale spatial contexts with limited samples, two parallel pooling paths are employed: max pooling extracts locally salient features, and average pooling preserves global consistency. The outputs from both paths are concatenated channel-wise and processed by a 2D convolutional layer with 3 × 3 kernels, reducing channels to 1/4 of the input for computational efficiency. To seamlessly integrate these multimodal features, a ReLU activation introduces non-linearity, followed by a dropout layer to prevent overfitting in few-shot scenarios. Finally, an additional fully connected layer is introduced to generate the embedded features $X'''$. This strategy is pivotal for capturing a richer and more holistic set of data features, thereby enhancing the model’s adaptability to scenarios characterized by limited sample availability.

To address the issues of class imbalance and hard-to-classify samples, an integrated loss function is employed to update the parameters of the model. This approach capitalizes on the stability of cross-entropy (CE) loss while incorporating contrastive loss as a data-driven regularizer, thereby enhancing the robustness of the feature embeddings. It is calculated as

$$L_{total} = L_{ce} + L_{con} \qquad (3)$$

Here, $L_{ce}$ and $L_{con}$ represent the CE loss and the supervised contrastive loss, respectively.

$L_{ce}$, serving as the predominant optimization objective for deep learning classifiers, is defined by

$$L_{ce} = E_{S,Q}\left[-\sum_{(x,y) \in Q} \log p_{\varphi}(y = k \mid x)\right] \qquad (4)$$

where $S$ and $Q$ denote the support and query sets, and $x \in Q$ represents a query sample with ground-truth label $k$. Existing research [52] reveals that the standard CE loss suffers from two critical limitations: vulnerability to label noise and insufficient margin optimization, which ultimately compromise model generalizability. To enhance feature transferability and mitigate these constraints, a contrastive learning strategy is integrated into cross-entropy optimization. Formally, given a sample index $i \in I = \{1, \dots, N\}$, the supervised contrastive loss is defined as

$$L_{con} = \sum_{i \in I} \frac{-1}{|P(i)|} \sum_{p \in P(i)} \log \frac{\exp(\mathrm{sim}_{g}(X_{i}, X_{p}) / \tau)}{\sum_{t \in O(i)} \exp(\mathrm{sim}_{g}(X_{i}, X_{t}) / \tau)} \qquad (5)$$

where $X_{i}$ denotes the feature embedding of sample $i$ after extraction, and $\mathrm{sim}_{g}$ is the cosine similarity between projected features. $O(i)$ is defined as $I \setminus \{i\}$, and $P(i) = \{p \in O(i) : y_{p} = y_{i}\}$ denotes the indices of the positive samples. The temperature hyperparameter $\tau = 0.5$ scales the distribution. This supervised contrastive loss pulls samples of the same class closer together in the feature space and pushes samples of different classes farther apart, thereby directly addressing the aliasing problem caused by high intra-class variance and inter-class similarity in HSIs [53].
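A minimal NumPy sketch of the supervised contrastive term may help make the index sets concrete. It follows the standard supervised contrastive formulation, with the denominator summed over all samples other than the anchor, and it skips anchors that have no positive sample in the batch; that last choice is an assumption, since the text does not say how such anchors are handled.

```python
import numpy as np

def supervised_contrastive_loss(embeddings, labels, tau=0.5):
    """Supervised contrastive loss summed over anchors that have a positive."""
    # Cosine similarity is the dot product of L2-normalized embeddings.
    z = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    logits = z @ z.T / tau
    n = len(labels)
    not_self = ~np.eye(n, dtype=bool)                        # O(i): all t != i
    positives = (labels[:, None] == labels[None, :]) & not_self   # P(i)
    # Log-softmax over O(i); the anchor itself is excluded from the denominator.
    exp_logits = np.exp(logits) * not_self
    log_prob = logits - np.log(exp_logits.sum(axis=1, keepdims=True))
    num_pos = positives.sum(axis=1)
    valid = num_pos > 0                                      # skip anchors without positives
    per_anchor = -(log_prob * positives).sum(axis=1)[valid] / num_pos[valid]
    return per_anchor.sum()

emb = np.array([[1.0, 0.0], [1.0, 0.1], [0.0, 1.0], [0.1, 1.0]])
loss_clustered = supervised_contrastive_loss(emb, np.array([0, 0, 1, 1]))
loss_mixed = supervised_contrastive_loss(emb, np.array([0, 1, 0, 1]))
```

In this toy example the labeling that agrees with the geometric clusters yields the smaller loss, which is the behaviour the combined objective $L_{ce} + L_{con}$ relies on.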

3. Experiments and Analyses

3.1. Datasets and Configuration

The proposed model was evaluated using four benchmark HSI datasets: Pavia University (PU), Salinas (SA), Indian Pines (IP), and LongKou. Representing distinct geographical environments and diverse object categories, this selection enables a comprehensive test of the model’s generalization ability. False-color imagery and the corresponding ground-truth (GT) maps for all datasets appear in Figure 3. The specific details of each dataset are listed in Table 1, Table 2, Table 3 and Table 4.

Pavia University (PU): Acquired by the ROSIS-03 sensor over the University of Pavia, Italy, in 2001, the PU dataset has a spatial resolution of 1.3 m and spatial dimensions of 610 × 340 pixels. It originally contained 115 spectral bands ranging from 0.43 to 0.86 µm; after removing 12 noisy bands, 103 bands were retained for experiments. The dataset includes nine ground-truth classes.

Salinas (SA): The Salinas dataset has a spatial resolution of roughly 3.7 m and a spatial scope of 512 × 217 pixels. It features 224 spectral channels from 400 to 2500 nm. However, only 204 bands are used due to water vapor absorption. The dataset has 16 agricultural land-cover classes.

Indian Pines (IP): The Indian Pines dataset was gathered in 1992 via the AVIRIS sensor. It has a spatial resolution of approximately 20 m and a spatial extent of 145 × 145 pixels. Initially, it had 224 spectral bands. However, 20 water-absorption bands and four zero bands were eliminated, leaving 200 bands for experiments. The dataset encompasses 16 classes.

LongKou: The WHU-Hi LongKou dataset was gathered on 17 July 2018 over Longkou town in Hubei Province, China. It was collected by a Headwall Nano-Hyperspec sensor mounted on a drone. The dataset has a spatial size of 550 × 400 pixels and a spatial resolution of roughly 0.463 m. It consists of 270 spectral channels spanning 400–1000 nm. There are nine land-cover types in this area.

Both the ImageNet and HSI training iterations were set to 1000. To rigorously simulate an extreme data regime, we set K = 1 (only one labeled sample per class) and N = 19 for each training episode, so that every episode was evaluated on a balanced set of 19 query samples per class across all datasets. C is determined by the number of classes in the HSI dataset. During the experiments, five labeled samples from each class of the HSI dataset were randomly picked as the training set, while the remaining samples served as the test set. For few-shot learning methods, 200 labeled samples per class were randomly chosen. In addition, to reduce the impact of random sampling, all experiments were repeated ten times. Our model was optimized with Adam, and the learning rate was set to 0.001. The overall accuracy (OA), average accuracy (AA), and kappa coefficient ($\kappa$) were used to evaluate the classification performance of all methods.
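For reference, the three metrics can be computed from a confusion matrix as in the sketch below. These are the standard definitions (OA as the fraction of correct predictions, AA as the mean per-class recall, and Cohen's kappa); the function name is hypothetical, and the code assumes every class occurs at least once in the ground truth.

```python
import numpy as np

def classification_scores(y_true, y_pred, num_classes):
    """Return (OA, AA, kappa) computed from the confusion matrix."""
    cm = np.zeros((num_classes, num_classes), dtype=np.int64)
    np.add.at(cm, (y_true, y_pred), 1)      # rows: true class, columns: predicted class
    total = cm.sum()
    oa = np.trace(cm) / total               # overall accuracy
    per_class = np.diag(cm) / cm.sum(axis=1)
    aa = per_class.mean()                   # average of per-class accuracies
    # Agreement expected by chance, from the row and column marginals.
    pe = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / total ** 2
    kappa = (oa - pe) / (1 - pe)
    return oa, aa, kappa

y_true = np.array([0, 0, 0, 0, 1, 1])
y_pred = np.array([0, 0, 0, 1, 1, 1])
oa, aa, kappa = classification_scores(y_true, y_pred, num_classes=2)
```

AA weights every class equally regardless of its size, which is why it can differ noticeably from OA on imbalanced scenes such as Indian Pines.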

3.2. Ablation Experiments

3.2.1. Combination of Different Modules

To evaluate the transfer spatial feature extraction module (SpaM) and spectral residual feature extraction module (SpeM) in few-shot HSI tasks, four schemes were tested across four datasets. The overall accuracies (OAs) for each scheme are presented in Table 5, with the best results highlighted in bold. The table includes the “SpaM-Scratch” configuration, in which the spatial module is trained from scratch on HSI data, thereby providing a direct assessment of the benefit gained from ImageNet pre-training. Compared with “SpaM-Scratch”, the ImageNet-pre-trained SpaM yields statistically significant OA improvements of 3.3–4.1%, confirming the value of transfer learning. When used individually, SpaM consistently outperforms SpeM, indicating that the pre-trained spatial features remain more informative than spectral features learned from extremely few samples. Combining the pre-trained SpaM with SpeM achieves the highest OA on every dataset, demonstrating that the fusion of complementary spatial and spectral representations yields the most comprehensive feature set for few-shot HSI classification.

3.2.2. SAM Contribution Analysis

To verify the impact of the SAM, the proposed model was trained with and without the SAM under the standard five-shot setting (five samples per class). All other modules and hyperparameters were kept identical. Table 6 reports the average results over 10 independent runs. As shown in Table 6, the SAM brings significant performance improvements on all four benchmark datasets, with the largest improvement (+3.58%) on the Indian Pines dataset, which has the most complex spatial structure. This can be attributed to the dual-path pooling mechanism of the SAM: the max pooling branch effectively captures the irregular boundary characteristics of farmland patches in this dataset, while the average pooling branch maintains spectral consistency in homogeneous crop areas. It is worth noting that although introducing the SAM slightly increases the standard deviation of the model (Pavia University: ±1.56 vs. ±0.4), its classification accuracy in boundary-sensitive areas improves significantly (for example, the F1 value of the “Gravel” class increased by 9.2%), which verifies the robustness enhancement provided by the module in spatially heterogeneous scenarios.

3.2.3. Loss Function

The loss function within our proposed methodology is paramount in improving the classification accuracy. The loss strategy amalgamates cross-entropy with contrastive supervision to learn domain-agnostic feature representations. This fusion is designed to capitalize on the strengths of both components: the discriminative power of cross-entropy and the distinctive feature separation afforded by contrastive learning.

To validate the impact of the proposed loss strategy, all other model components are held constant while the loss function is varied. Table 7 compares the resulting performance across the four datasets. The results reveal that supervised contrastive learning significantly outperforms cross-entropy loss used in isolation, which demonstrates that contrastive learning enhances the model’s capacity to discern between classes. Moreover, the combined loss function, which incorporates both the cross-entropy and contrastive losses, is the most effective, demonstrating the superiority of our proposed strategy.

3.3. Comparison with State-of-the-Art Algorithms

In this experiment, several state-of-the-art algorithms were employed for comparison to verify the validity of the proposed method: the supervised machine learning method SVM [14], the 3D convolution-based method SSRN [22], the data augmentation method RODA [24], a heterogeneous transfer learning network (HTLN) [36], the few-shot learning methods DFSL [23] and S3Net [49], a deep cross-domain few-shot learning method for hyperspectral image classification (DCLN) [47], and a contrastive learning network enhanced with positive pair mining (CPPM) [54]. To apply uniform evaluation criteria, SVM was also trained on five randomly selected labeled samples per class. All benchmark methods underwent identical training–validation–testing procedures on consistent datasets.

3.3.1. The Quantitative Evaluations

The detailed classification evaluations presented in Table 8, Table 9, Table 10 and Table 11 provide a comprehensive assessment of the performance of various models across four distinct datasets. These data show that SVM performs worst with extremely small sample sizes, demonstrating the bottleneck of shallow models in learning high-dimensional spectral features. While the data augmentation method RODA outperforms SVM, its performance fluctuates significantly, reflecting the noise sensitivity of the generated samples, which conflicts with the stringent spectral fidelity requirements of HSIs. The 3D-CNN architecture SSRN performs well with sufficient samples but degrades significantly in the five-shot setting, highlighting the dependence of convolutional networks on data size. The few-shot models DFSL, S3Net, and CPPM optimize feature embedding through meta-learning and self-supervised learning and perform well on the PU and SA datasets, but their performance drops sharply on the IP dataset, indicating that spectral–spatial correlation modeling in complex farmland scenes is still insufficient. HTLN delivers competitive performance relative to few-shot learning methods on multiple datasets, confirming transfer learning’s capability to address data scarcity challenges. Notably, our method delivers competitive accuracy in classes C1, C2, and C8 on the Pavia University (PU) dataset. On the Salinas dataset, our method yields satisfactory classification outcomes for nearly all classes, which is attributed to the ability of the residual spectral network to extract narrowband features and the multi-scale fusion of the SAM. Furthermore, our proposal secures the highest results in seven classes of the Indian Pines (IP) dataset. On the LongKou dataset, which has a higher spatial resolution, the performance of all models improves. The overall performance metrics of our method are also superior, with the highest OA, AA, and $\kappa$ values across the four datasets. These findings suggest that our proposed method offers a robust strategy for enhancing model performance in few-shot scenarios.

3.3.2. The Qualitative Evaluations

The visual classification maps generated by a variety of methods across the four datasets are depicted in Figure 4, Figure 5, Figure 6 and Figure 7. A comparative analysis reveals that few-shot methods, including DFSL, S3Net, DCLN, and our proposed approach, yield purer and smoother classification maps compared to other methodologies. The SVM classification maps exhibit numerous speckles across many regions and display relatively blurred boundaries between different categories, indicating that SVM, with its simple structure, performs poorly when confronted with small-sample challenges. While the RODA classification maps are comparatively clearer than those of SVM, they remain less than satisfactory. This observation suggests that data augmentation strategies can partially mitigate the impact of the small-sample problem. However, since the information primarily derives from limited labeled samples, model stability is susceptible to the influence of randomly sampled training data. The SSRN classification maps demonstrate poor results, potentially due to constraints in data volume. Although the classification maps from the few-shot models are not without imperfections, the results are deemed acceptable given the scarcity of training samples. For the Pavia University dataset, our proposed method accurately identifies almost all pixels in the “Gravel,” “Metal-sheets,” “Bare-soil,” and “Bitumen” (C3, C5, C6, and C7) categories. Their boundary clarity is significantly better than that of SVM and SSRN, reflecting the strong representation power of the transferred spatial features on heterogeneous objects. CPPM’s misclassification of the “Bare-soil” class (C6) reveals the limitations of simple contrastive learning in spectrally confusing scenes, while DTCLN corrects such errors through spectral residuals. The “Lettuce” classes (C11–C14) in Salinas show large areas of confusion in S3Net, while DTCLN maintains 98.20% accuracy, thanks to the noise injection of the spectral augmentation module, which improves the model’s robustness to band shifts. With only 200 samples in the IP “Wheat” class (C13), DTCLN still achieves 99.14% accuracy, indicating that under the hybrid loss constraint, the model avoids overfitting to the minority class.

Overall, when compared to other few-shot methods, our proposed model achieves acceptable classification maps that closely resemble the corresponding ground-truth images across the four datasets. This performance improvement stems from our proposed model’s enhanced ability to learn discriminative spectral–spatial features from limited samples. This capability is achieved by leveraging a transferred spatial model pre-trained on large datasets and incorporating a specially designed spectral residual network.

3.4. Effect of the Labeled Samples

To explore the model’s generalization capability with very few labeled samples, we conducted supplementary experiments with 3–7 labeled samples per class. As shown in Figure 8, Figure 9, Figure 10 and Figure 11, the OAs of all methods rise steadily as the number of available labels increases, consistent with the expectation that more samples lead to better performance. Within this range, our method achieves relatively high OAs in most cases, demonstrating good adaptability to changes in sample size. Furthermore, even in the scarcest setting (three samples per class), its OA remains competitive, suggesting the potential advantages of the proposed strategy in extremely low-sample scenarios.
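
A k-samples-per-class split of the kind used in this experiment can be sketched as follows. This is only an illustration under stated assumptions: the function name `split_k_per_class`, the convention that label 0 marks unlabeled background, and the toy label vector are all hypothetical and are not taken from the authors' code.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def split_k_per_class(
    labels: Sequence[int], k: int, seed: int = 0
) -> Tuple[List[int], List[int]]:
    """Pick k labeled pixel indices per class for training; the rest are test.

    labels[i] is the class of pixel i; 0 marks unlabeled background
    (an assumed convention) and is excluded from both sets.
    """
    rng = random.Random(seed)
    by_class: Dict[int, List[int]] = defaultdict(list)
    for idx, cls in enumerate(labels):
        if cls != 0:
            by_class[cls].append(idx)

    train: List[int] = []
    test: List[int] = []
    for cls in sorted(by_class):
        indices = by_class[cls]
        if len(indices) <= k:
            raise ValueError(f"class {cls} has too few labeled pixels for k={k}")
        rng.shuffle(indices)
        train.extend(indices[:k])
        test.extend(indices[k:])
    return train, test


# Toy label vector (made up): two classes with 10 labeled pixels each.
toy_labels = [0, 0] + [1] * 10 + [2] * 10
splits = {k: split_k_per_class(toy_labels, k, seed=42) for k in range(3, 8)}
```

Repeating such a split with several seeds is one common way to obtain mean ± standard deviation figures; whether the authors followed exactly this protocol is not stated in this section.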

3.5. Analysis of Computational Complexity

Table 12 reports the training time, testing time, and parameter count of each method. SVM is by far the cheapest, with training times of only a few seconds and a minimal parameter count. All deep learning approaches incur substantially higher time overheads because of their network architectures and backpropagation. SSRN and RODA, owing to their efficient feature extraction and sample augmentation, keep training times within roughly 400–1300 s on all four datasets with parameter counts of at most 0.40 M, giving the lowest overall training burden among the deep models. DFSL and S3Net, which use minimalist frameworks, further reduce the trainable parameters to 0.03–0.09 M while keeping test latency below 7.5 s on the PU, Salinas, and IP datasets, demonstrating the advantages of their lightweight design. In contrast, HTLN, CPPM, and DCLN introduce cross-domain alignment or positive sample mining modules, which raises their parameter counts to 0.63–0.85 M. Their training times generally exceed 1400 s, and test latency reaches nearly 25 s on LongKou, making them the most computationally demanding methods. The proposed method has fewer trainable parameters (0.46 M) than HTLN, DCLN, and CPPM while keeping training and testing times within an acceptable range, indicating that it achieves a better balance between computational efficiency and classification performance.
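
To make the units in Table 12 concrete, the sketch below shows how a parameter count in millions and a wall-clock time in seconds are typically obtained. The layer sizes describe a hypothetical toy network, not the architecture of any compared method, and the helper names are illustrative assumptions.

```python
import time
from typing import Callable, Tuple


def conv2d_params(c_in: int, c_out: int, kernel: int, bias: bool = True) -> int:
    """Trainable parameters of a standard 2-D convolution layer."""
    return c_in * c_out * kernel * kernel + (c_out if bias else 0)


def linear_params(n_in: int, n_out: int, bias: bool = True) -> int:
    """Trainable parameters of a fully connected layer."""
    return n_in * n_out + (n_out if bias else 0)


def to_millions(n_params: int) -> float:
    """Express a raw parameter count in millions (the 'M' unit of Table 12)."""
    return n_params / 1e6


def timed(fn: Callable[[], object]) -> Tuple[object, float]:
    """Run fn once and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start


# Hypothetical toy network (NOT the architecture of any compared method).
toy_layers = [
    conv2d_params(c_in=30, c_out=64, kernel=3),
    conv2d_params(c_in=64, c_out=128, kernel=3),
    linear_params(n_in=128, n_out=9),
]
total_m = to_millions(sum(toy_layers))

# Time an arbitrary stand-in workload; a real measurement would wrap training or inference.
_, seconds = timed(lambda: sum(range(100_000)))
```

Wall-clock figures of this kind depend heavily on hardware and batch size, so they are best compared only within a single experimental setup.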

4. Conclusions

In this work, a deep transfer contrastive learning network is proposed for few-shot HSI classification. To mitigate sample scarcity, a spectral data augmentation method based on random spectral shifting and noise injection improves the model’s robustness to real-world perturbations. Transfer learning is then incorporated into the proposed spatial–spectral feature extraction network: combining an ImageNet-pretrained spatial feature extraction module with a spectral residual network significantly improves feature representation. The spatial attention module (SAM) captures multi-scale context through dual-path pooling (max pooling and average pooling), which improves classification in boundary-sensitive areas. In addition, the hybrid loss function explicitly enlarges inter-class distances and compresses intra-class radii, directly alleviating the high inter-class similarity and large intra-class variance of HSIs. Experimental results on four public HSI datasets demonstrate that our method achieves competitive performance even with limited training samples. However, the method is still limited by its computational complexity and by its generalization ability in extreme scenarios (such as low-resolution HSIs). Future work will explore lightweight deployment and cross-sensor generalization.
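
As a rough illustration of the spectral augmentation idea summarized above, the sketch below applies a random band shift followed by additive noise to one pixel spectrum. The shift range, the edge-replication rule, the Gaussian noise model, and the name `augment_spectrum` are all assumptions made for this example; the exact scheme used by the authors is not specified in this section.

```python
import random
from typing import List, Optional, Sequence


def augment_spectrum(
    spectrum: Sequence[float],
    max_shift: int = 2,
    noise_std: float = 0.01,
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Return one augmented copy of a single pixel's spectrum.

    Assumed scheme (not taken from the paper): shift the band axis by a
    random integer offset in [-max_shift, max_shift], replicate the edge
    value where bands fall off the end, then add zero-mean Gaussian noise.
    """
    rng = rng or random.Random()
    n_bands = len(spectrum)
    shift = rng.randint(-max_shift, max_shift)

    # Output band i reads from source band (i - shift), clamped to the valid range.
    shifted = [spectrum[min(max(i - shift, 0), n_bands - 1)] for i in range(n_bands)]

    # Independent Gaussian noise per band.
    return [value + rng.gauss(0.0, noise_std) for value in shifted]


# Toy 8-band spectrum (made-up reflectance values).
toy = [0.10, 0.12, 0.15, 0.30, 0.45, 0.40, 0.35, 0.33]
views = [augment_spectrum(toy, rng=random.Random(seed)) for seed in range(4)]
```

Several augmented views of the same labeled pixel could then serve as additional positive pairs for a contrastive objective, which is one plausible reading of how augmentation expands the limited sample pairs.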

Author Contributions

Conceptualization, G.Y. and Z.W.; methodology, G.Y.; software, G.Y.; validation, G.Y. and Z.W.; formal analysis, G.Y. and Z.W.; investigation, G.Y.; resources, G.Y.; data curation, G.Y.; writing—original draft preparation, G.Y.; writing—review and editing, G.Y. and Z.W.; visualization, G.Y. and Z.W.; supervision, Z.W.; project administration, Z.W.; funding acquisition, Z.W. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Hainan Key Research and Development Plan for Scientific and Technological Collaboration Projects under Grant GHYF2022015-Research on Medical Imaging Aided Diagnosis of Infant Brain Development Diseases.

Data Availability Statement

The original contributions presented in this study are included in the article and were derived from the following resources available in the public domain: Public HSI dataset at https://www.ehu.eus/ccwintco/index.php?title=Hyperspectral_Remote_Sensing_Scenes (accessed on 1 December 2024).

Conflicts of Interest

The authors declare no conflict of interest.

References

Lu, B.; Dao, P.D.; Liu, J.; He, Y.; Shang, J. Recent advances of hyperspectral imaging technology and applications in agriculture. Remote Sens. 2020, 12, 2659.
Li, Y.; Wang, T.; Cao, Z.; Xin, H.; Wang, R. Efficient Unsupervised Clustering of Hyperspectral Images via Flexible Multi-Anchor Graphs. Remote Sens. 2025, 17, 2647.
Peng, J.; Sun, W.; Li, W.; Li, H.-C.; Meng, X.; Ge, C.; Du, Q. Low-rank and sparse representation for hyperspectral image processing: A review. IEEE Geosci. Remote Sens. Mag. 2022, 10, 10–43.
Li, N.; Wang, Z.; Cheikh, F.A. Discriminating spectral–spatial feature extraction for hyperspectral image classification: A review. Sensors 2024, 24, 2987.
Liu, J.; Feng, Q.; Liang, T.; Yin, J.; Gao, J.; Ge, J.; Hou, M.; Wu, C.; Li, W. Estimating the forage neutral detergent fiber content of alpine grassland in the Tibetan Plateau using hyperspectral data and machine learning algorithms. IEEE Trans. Geosci. Remote Sens. 2022, 60, 4405017.
Anand, R.; Veni, S.; Araint, J. Big data challenges in airborne hyperspectral image for urban landuse classification. In Proceedings of the 2017 International Conference on Advances in Computing, Communications and Informatics (ICACCI), Udupi, India, 13–16 September 2017; pp. 1808–1814.
Yang, X.; Yu, Y. Estimating soil salinity under various moisture conditions: An experimental study. IEEE Trans. Geosci. Remote Sens. 2017, 55, 2525–2533.
Yang, G.; Wang, Z. Self-supervised contrastive learning residual network for hyperspectral image classification under limited labeled samples. In Proceedings of the 2024 7th International Conference on Image and Graphics Processing, Beijing, China, 19–21 January 2024; pp. 116–121.
Démoulin, R.; Gastellu-Etchegorry, J.-P.; Briottet, X.; Marionneau, M.; Zhen, Z.; Adeline, K.; Dantec, V.L. Hyperspectral Remote Sensing and 3D Radiative Transfer Modelling for Maize Crop Monitoring. In Proceedings of the 13th EARSeL Workshop on Imaging Spectroscopy, Valence, Spain, 24–26 April 2024.
Dalal, N.; Triggs, B. Histograms of oriented gradients for human detection. In Proceedings of the Computer Vision and Pattern Recognition, San Diego, CA, USA, 20–26 June 2005; Volume 1, pp. 886–893.
Abdi, H.; Williams, L.J. Principal component analysis. Wiley Interdiscip. Rev. Comput. Stat. 2010, 2, 433–459.
Li, W.; Chen, C.; Su, H.; Du, Q. Local binary patterns and extreme learning machine for hyperspectral imagery classification. IEEE Trans. Geosci. Remote Sens. 2015, 53, 3681–3693.
Zhang, Y.; Cao, G.; Li, X.; Wang, B. Cascaded random forest for hyperspectral image classification. IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens. 2018, 11, 1082–1094.
Melgani, F.; Bruzzone, L. Classification of hyperspectral remote sensing images with support vector machines. IEEE Trans. Geosci. Remote Sens. 2004, 42, 1778–1790.
Ma, L.; Crawford, M.M.; Tian, J. Local manifold learning-based k-nearest-neighbor for hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 2010, 48, 4099–4109.
Wang, X. Hyperspectral image classification powered by Khatri-Rao decomposition-based multinomial logistic regression. IEEE Trans. Geosci. Remote Sens. 2022, 60, 5530015.
Maffei, A.; Haut, J.M.; Paoletti, M.E.; Plaza, J.; Bruzzone, L.; Plaza, A. A single model CNN for hyperspectral image denoising. IEEE Trans. Geosci. Remote Sens. 2020, 58, 2516–2529.
Tao, C.; Pan, H.; Li, Y.; Zou, Z. Unsupervised spectral-spatial feature learning with stacked sparse autoencoder for hyperspectral imagery classification. IEEE Geosci. Remote Sens. Lett. 2015, 12, 2438–2442.
Li, C.; Wang, Y.; Zhang, X.; Gao, H.; Yang, Y.; Wang, J. Deep belief network for spectral–spatial classification of hyperspectral remote sensor data. Sensors 2019, 19, 204.
Audebert, N.; Le Saux, B.; Lefèvre, S. Deep learning for classification of hyperspectral data: A comparative review. IEEE Geosci. Remote Sens. Mag. 2019, 7, 159–173.
Li, Y.; Zhang, H.; Shen, Q. Spectral-spatial classification of hyperspectral imagery with 3D convolutional neural network. Remote Sens. 2017, 9, 67.
Zhong, Z.; Li, J.; Luo, Z.; Chapman, M. Spectral-spatial residual network for hyperspectral image classification: A 3-D deep learning framework. IEEE Trans. Geosci. Remote Sens. 2018, 56, 847–858.
Liu, B.; Yu, X.; Yu, A.; Zhang, P.; Wan, G.; Wang, R. Deep few-shot learning for hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 2019, 57, 2290–2304.
Haut, J.M.; Paoletti, M.E.; Plaza, J.; Plaza, A.; Li, J. Hyperspectral image classification using random occlusion data augmentation. IEEE Geosci. Remote Sens. Lett. 2019, 16, 1751–1755.
Zhu, L.; Chen, Y.; Ghamisi, P.; Benediktsson, J.A. Generative adversarial networks for hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 2018, 56, 5046–5063.
Aburaed, N.; Alkhatib, M.Q.; Marshall, S.; Zabalza, J.; Al Ahmad, H. Hyperspectral data scarcity problem from a super resolution perspective: Data augmentation analysis and scheme. In Proceedings of the IEEE International Geoscience and Remote Sensing Symposium, Pasadena, CA, USA, 16–21 July 2023; pp. 5056–5057.
Liu, S.; Li, H.; Jiang, C.; Feng, J. Spectral–spatial graph convolutional network with dynamic-synchronized multiscale features for few-shot hyperspectral image classification. Remote Sens. 2024, 16, 895.
Shi, M.; Ren, J. A lightweight dense relation network with attention for hyperspectral image few-shot classification. Eng. Appl. Artif. Intell. 2023, 126, 106842.
Zhang, Z.; Gao, D.; Liu, D. Spectral-spatial domain attention network for hyperspectral image few-shot classification. Remote Sens. 2024, 16, 592.
Jiang, S.; Jia, S. A 3D lightweight Siamese network for hyperspectral image classification with limited samples. In Proceedings of the 10th International Conference on Computing and Pattern Recognition, Shanghai, China, 15–17 October 2021; pp. 142–148.
Gao, K.; Liu, B.; Yu, X.; Zhang, P.; Tan, X.; Sun, Y. Small sample classification of hyperspectral image using model-agnostic meta-learning algorithm and convolutional neural network. Int. J. Remote Sens. 2020, 42, 3090–3122.
Zhou, F.; Zhang, L.; Wei, W.; Bai, Z.; Zhang, Y. Meta transfer learning for few-shot hyperspectral image classification. In Proceedings of the IEEE International Geoscience and Remote Sensing Symposium, Brussels, Belgium, 11–16 July 2021; pp. 3681–3684.
Gao, K.; Liu, B.; Yu, X.; Yu, A. Unsupervised meta learning with multiview constraints for hyperspectral image small sample set classification. IEEE Trans. Image Process. 2022, 31, 3449–3462.
Zhao, L.; Luo, W.; Liao, Q.; Chen, S.; Wu, J. Hyperspectral image classification with contrastive self-supervised learning under limited labeled samples. IEEE Geosci. Remote Sens. Lett. 2022, 19, 6008205.
Qu, Y.; Baghbaderani, R.K.; Qi, H. Few-shot hyperspectral image classification through multitask transfer learning. In Proceedings of the 2019 10th Workshop on Hyperspectral Imaging and Signal Processing: Evolution in Remote Sensing (WHISPERS), Amsterdam, The Netherlands, 24–26 September 2019; pp. 1–5.
He, X.; Chen, Y.; Ghamisi, P. Heterogeneous transfer learning for hyperspectral image classification based on convolutional neural network. IEEE Trans. Geosci. Remote Sens. 2020, 58, 3246–3263.
Li, X.; Cao, Z.; Zhao, L.; Jiang, J. ALPN: Active-learning-based prototypical network for few-shot hyperspectral imagery classification. IEEE Geosci. Remote Sens. Lett. 2022, 19, 5508305.
Thoreau, R.; Achard, V.; Risser, L.; Berthelot, B.; Briottet, X. Active learning for hyperspectral image classification: A comparative review. IEEE Geosci. Remote Sens. Mag. 2022, 10, 256–278.
Zhang, S.; Chen, Z.; Wang, D.; Wang, Z.J. Cross-domain few-shot contrastive learning for hyperspectral images classification. IEEE Geosci. Remote Sens. Lett. 2022, 19, 5514505.
Liu, Q.; Peng, J.; Zhang, G.; Sun, W.; Du, Q. Deep contrastive learning network for small-sample hyperspectral image classification. J. Remote Sens. 2023, 3, 0025.
Liu, Q.; Peng, J.; Chen, N.; Sun, W.; Du, Q.; Zhou, Y. Refined prototypical contrastive learning for few-shot hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 2023, 61, 5506214.
Jaiswal, A.; Babu, A.R.; Zadeh, M.Z.; Banerjee, D.; Makedon, F. A survey on contrastive self-supervised learning. Technologies 2020, 9, 2.
Devabathini, N.J.; Mathivanan, P. Sign language recognition through video frame feature extraction using transfer learning and neural networks. In Proceedings of the 2023 International Conference on Next Generation Electronics (NEleX), Vellore, India, 14–16 December 2023; pp. 1–6.
Dhillon, G.S.; Chaudhari, P.; Ravichandran, A.; Soatto, S. A baseline for few-shot image classification. arXiv 2019, arXiv:1909.02729.
Settles, B. Active Learning Literature Survey; University of Wisconsin-Madison: Madison, WI, USA, 2009.
Li, Z.; Liu, M.; Chen, Y.; Xu, Y.; Li, W.; Du, Q. Deep cross-domain few-shot learning for hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 2022, 60, 5501618.
Pan, S.J.; Yang, Q. A survey on transfer learning. IEEE Trans. Knowl. Data Eng. 2009, 22, 1359.
Xue, Z.; Zhou, Y.; Du, P. S3Net: Spectral–spatial Siamese network for few-shot hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 2022, 60, 5531219.
Theckedath, D.; Sedamkar, R.R. Detecting affect states using VGG16, ResNet50 and SE-ResNet50 networks. SN Comput. Sci. 2020, 1, 79.
Elsayed, G.; Krishnan, D.; Mobahi, H.; Regan, K.; Bengio, S. Large margin deep networks for classification. In Proceedings of the Advances in Neural Information Processing Systems, Montreal, Canada, 3–8 December 2018; pp. 850–860.
Song, L.; Feng, Z.; Yang, S.; Zhang, X.; Jiao, L. Self-supervised assisted semi-supervised residual network for hyperspectral image classification. Remote Sens. 2022, 14, 2997.
Yang, H.; Ni, J.; Gao, J.; Han, Z.; Luan, T. A novel method for peanut variety identification and classification by improved VGG16. Sci. Rep. 2021, 11, 15756.
Chen, T.; Kornblith, S.; Norouzi, M.; Hinton, G. A Simple Framework for Contrastive Learning of Visual Representations. In Proceedings of the 37th International Conference on Machine Learning, Virtual Event, 13–18 July 2020; Volume 119, pp. 1597–1607.
Braham, N.A.A.; Mairal, J.; Chanussot, J.; Mou, L.; Zhu, X.X. Enhancing contrastive learning with positive pair mining for few-shot hyperspectral image classification. IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens. 2024, 17, 8509–8526.

Figure 1. Flowchart of the proposed network.

Figure 2. Framework of the proposed spectral residual feature module.

Figure 3. The false-color images and ground-truth maps. (a,b): Pavia University. (c,d): Salinas. (e,f): Indian Pines. (g,h): LongKou.

Figure 4. Classification maps for different methods using the PU dataset.

Figure 5. Classification maps for different methods using the Salinas dataset.

Figure 6. Classification maps for different methods using the IP dataset.

Figure 7. Classification maps for different methods using the LongKou dataset.

Figure 8. Classification accuracies (%) generated by different numbers of labeled samples from the PU dataset.

Figure 9. Classification accuracies (%) generated by different numbers of labeled samples from the Salinas dataset.

Figure 10. Classification accuracies (%) generated by different numbers of labeled samples from the IP dataset.

Figure 11. Classification accuracies (%) generated by different numbers of labeled samples from the LongKou dataset.

Table 1. Samples from the Pavia University dataset.

Pavia University
No.	Land-Cover Type	Training	Test
1	Asphalt	5	6626
2	Meadows	5	18,644
3	Gravel	5	2094
4	Trees	5	3059
5	Metal-sheets	5	1340
6	Bare-soil	5	5024
7	Bitumen	5	1325
8	Bricks	5	3677
9	Shadows	5	942
	Total	45	42,731

Table 2. Samples from the Salinas dataset.

Salinas
No.	Land-Cover Type	Training	Test
1	Brocoli 1	5	2004
2	Brocoli 2	5	3721
3	Fallow	5	1971
4	Fallow_plow	5	1389
5	Fallow_smooth	5	2673
6	Stubble	5	3954
7	Celery	5	3574
8	Grapes_untrained	5	11,266
9	Soil	5	6198
10	Corn	5	3273
11	Lettuce_4_wk	5	1063
12	Lettuce_5_wk	5	1922
13	Lettuce_6_wk	5	911
14	Lettuce_7_wk	5	1065
15	Vinyard-untrained	5	7263
16	Vinyard-vertical	5	1802
	Total	80	54,049

Table 3. Samples from the Indian Pines dataset.

Indian Pines
No.	Land-Cover Type	Training	Test
1	Alfalfa	5	41
2	Corn-notill	5	1432
3	Corn-mintill	5	825
4	Corn	5	232
5	Grass-pasture	5	478
6	Grass-trees	5	725
7	Grass-pasture mowed	5	23
8	Hay-windrowed	5	473
9	Oats	5	15
10	Soybean-notill	5	967
11	Soybean-mintill	5	2450
12	Soybean-clean	5	588
13	Wheat	5	200
14	Woods	5	1260
15	Buildings-grass	5	381
16	Stone-steel	5	88
	Total	80	10,169

Table 4. Samples from the LongKou dataset.

LongKou
No.	Land-Cover Type	Training	Test
1	Corn	5	34,506
2	Cotton	5	8369
3	Sesame	5	3026
4	Broad-leaf soybean	5	63,207
5	Narrow-leaf soybean	5	4146
6	Rice	5	11,849
7	Water	5	67,051
8	Roads and Houses	5	7119
9	Mixed weed	5	5224
	Total	45	204,497

Table 5. OAs of different schemes for the four datasets.

Dataset	Schemes	OA (%)
Pavia University	SpaM (Pre-trained)	$82.24 \pm 1.48$
	SpaM (Scratch)	$78.12 \pm 2.05$
	SpeM	$81.71 \pm 1.22$
	SpaM (Pre-trained) + SpeM	$85.46 \pm 1.56$
Salinas	SpaM (Pre-trained)	$89.43 \pm 0.83$
	SpaM (Scratch)	$86.71 \pm 1.36$
	SpeM	$87.91 \pm 0.99$
	SpaM (Pre-trained) + SpeM	$91.10 \pm 2.32$
Indian Pines	SpaM (Pre-trained)	$74.59 \pm 2.10$
	SpaM (Scratch)	$71.05 \pm 2.44$
	SpeM	$73.86 \pm 1.73$
	SpaM (Pre-trained) + SpeM	$75.40 \pm 2.21$
LongKou	SpaM (Pre-trained)	$91.51 \pm 1.82$
	SpaM (Scratch)	$89.20 \pm 1.68$
	SpeM	$90.79 \pm 1.10$
	SpaM (Pre-trained) + SpeM	$94.05 \pm 1.37$

Table 6. Influence of the spatial attention module (SAM) on overall accuracy (OA%).

Dataset	With SAM	Without SAM	∆OA (%)
Pavia University	$85.46 \pm 1.56$	$82.11 \pm 0.4$	$+ 3.35$
Salinas	$91.10 \pm 2.32$	$88.96 \pm 0.3$	$+ 2.14$
Indian Pines	$75.40 \pm 2.21$	$71.82 \pm 0.6$	$+ 3.58$
LongKou	$94.05 \pm 1.37$	$91.74 \pm 0.3$	$+ 2.31$

Table 7. Ablation studies on the loss function.

Dataset	Loss Function	OA (%)
Pavia University	$L_{CE}$	$84.73 \pm 2.47$
	$L_{con}$	$85.21 \pm 1.12$
	$L_{CE} + L_{con}$	$85.46 \pm 1.56$
Salinas	$L_{CE}$	$90.73 \pm 1.49$
	$L_{con}$	$90.13 \pm 1.76$
	$L_{CE} + L_{con}$	$91.10 \pm 2.32$
Indian Pines	$L_{CE}$	$73.79 \pm 0.97$
	$L_{con}$	$74.34 \pm 1.33$
	$L_{CE} + L_{con}$	$75.40 \pm 2.21$
LongKou	$L_{CE}$	$89.97 \pm 1.72$
	$L_{con}$	$91.39 \pm 1.62$
	$L_{CE} + L_{con}$	$94.05 \pm 1.37$

Table 8. Classification results from different models for the Pavia University dataset.

	SVM	SSRN	RODA	DFSL	HTLN	S3Net	CPPM	DCLN	Ours
1	72.42	76.99	35.53	98.24	89.77	81.76	79.12	73.86	83.21
2	82.69	63.97	51.80	98.45	74.37	75.51	85.48	61.77	91.14
3	40.71	55.98	84.78	48.04	62.87	70.08	63.13	39.89	80.47
4	67.90	91.82	86.77	50.94	94.69	85.83	92.07	94.34	92.54
5	99.99	99.51	100	99.92	99.86	99.93	99.72	98.79	98.48
6	27.83	63.29	49.78	77.96	57.79	64.17	71.07	86.05	70.34
7	41.89	55.67	94.72	73.53	93.12	95.14	79.17	90.76	89.60
8	54.84	45.85	20.44	84.24	81.35	72.45	69.97	63.91	74.43
9	81.75	99.63	96.82	75.68	98.06	94.90	97.60	98.97	99.18
OA	59.54 ±2.38	67.49 ±1.27	54.30 ±1.03	78.58 ±2.77	78.20 ±0.98	77.15 ±1.11	81.39 ±2.93	82.04 ±1.42	85.46 ±1.56
AA	63.36 ±3.85	72.52 ±2.89	68.96 ±0.12	78.56 ±1.65	83.53 ±0.45	82.19 ±0.67	81.93 ±1.94	68.92 ±2.31	86.60 ±1.32
$κ$	49.49 ±2.71	58.97 ±0.56	44.88 ±2.34	78.50 ±2.01	72.10 ±1.88	71.02 ±2.56	76.24 ±3.58	75.10 ±1.87	81.18 ±2.12

Table 9. Classification results from different models for the Salinas dataset.

	SVM	SSRN	RODA	DFSL	HTLN	S3Net	CPPM	DCLN	Ours
1	89.94	94.17	81.26	99.70	100	99.91	99.12	99.27	98.54
2	97.36	98.57	87.92	99.60	99.80	94.24	97.50	99.51	98.49
3	84.33	90.56	75.26	98.83	96.40	96.94	96.43	90.60	99.79
4	99.31	98.93	75.94	98.20	83.27	98.80	96.65	99.67	99.54
5	92.13	95.23	87.41	73.19	89.22	97.52	98.36	92.20	96.38
6	97.13	99.25	90.02	97.60	99.97	99.28	98.07	98.73	97.96
7	96.58	98.82	85.57	99.22	98.48	98.89	99.03	99.33	98.65
8	66.48	78.50	83.63	80.24	89.48	81.66	63.39	72.28	76.02
9	97.41	94.11	93.53	100	99.45	99.71	98.99	99.95	98.61
10	76.63	85.56	84.84	74.83	90.05	92.57	92.90	91.43	93.99
11	76.39	90.63	62.94	95.30	97.06	99.53	98.42	98.16	98.20
12	88.41	99.48	79.74	81.12	97.41	88.67	83.17	98.90	99.34
13	91.20	20.08	62.15	100	97.09	97.96	99.02	98.75	98.10
14	79.13	66.29	65.63	32.83	81.96	91.57	96.99	97.97	97.98
15	46.37	59.14	69.97	57.72	60.39	73.39	88.14	85.96	81.33
16	91.23	66.96	72.47	95.62	99.48	96.10	86.50	91.56	97.09
OA	79.50 ±2.09	83.09 ±1.42	80.50 ±1.19	84.76 ±2.22	88.08 ±1.36	90.85 ±1.74	88.37 ±0.75	90.46 ±1.21	91.10 ±2.32
AA	85.63 ±2.08	85.46 ±2.67	78.64 ±2.45	86.50 ±0.76	92.47 ±1.79	94.29 ±2.09	93.30 ±0.54	94.64 ±0.31	95.62 ±0.62
$κ$	77.26 ±3.51	81.07 ±2.81	78.42 ±1.93	83.07 ±1.58	86.80 ±2.83	89.82 ±0.34	87.12 ±0.81	89.42 ±0.57	90.11 ±1.44

Table 10. Classification results from different models for the Indian Pines dataset.

	SVM	SSRN	RODA	DFSL	HTLN	S3Net	CPPM	DCLN	Ours
1	28.19	77.35	92.68	80.56	36.53	99.76	100	82.41	89.23
2	39.74	18.52	55.31	29.20	68.15	47.38	38.44	57.12	72.97
3	37.05	54.66	51.52	79.02	65.64	58.82	56.70	62.47	74.01
4	26.06	32.30	37.50	87.22	49.04	89.61	74.70	89.24	84.31
5	42.93	9.73	85.36	85.20	86.01	78.97	85.33	65.24	90.99
6	80.92	85.53	88.14	88.75	97.58	95.45	91.27	70.15	94.14
7	23.70	10.27	100	100	19.00	100	100	97.42	95.00
8	92.34	78.50	61.31	100	47.06	83.30	85.90	91.42	97.94
9	12.92	12.32	93.33	100	25.86	100	100	93.24	85.00
10	38.63	62.28	15.31	54.16	64.71	57.23	68.63	64.70	78.97
11	56.25	66.86	30.57	42.54	74.96	58.13	74.44	60.17	59.71
12	22.23	28.25	9.18	37.91	56.76	60.41	43.91	67.04	82.76
13	83.20	99.18	96.00	97.95	76.92	99.05	99.70	84.28	99.14
14	83.40	85.98	68.17	73.39	92.32	83.16	89.35	86.27	77.48
15	27.96	13.82	40.68	58.24	51.46	76.27	79.74	92.84	79.65
16	85.20	87.78	100	100	57.14	99.43	95.00	84.57	99.44
OA	47.98 ±3.72	58.01 ±1.13	48.74 ±2.90	59.70 ±2.97	69.08 ±3.01	67.52 ±3.19	70.82 ±1.30	70.25 ±2.50	75.40 ±2.21
AA	48.79 ±2.91	50.56 ±2.26	64.07 ±2.39	75.88 ±3.28	60.57 ±2.77	80.43 ±2.38	80.19 ±1.61	77.97 ±1.61	85.05 ±1.32
$κ$	41.91 ±2.45	52.07 ±1.69	42.44 ±4.82	55.17 ±4.33	65.21 ±2.54	63.57 ±3.71	66.85 ±1.47	68.38 ±1.24	72.06 ±0.97

Table 11. Classification results from different models for the LongKou dataset.

	SVM	SSRN	RODA	DFSL	HTLN	S3Net	CPPM	DCLN	Ours
1	87.00	90.49	81.32	91.08	77.61	95.25	89.29	88.17	92.3
2	64.61	80.13	78.42	78.79	92.93	87.28	76.07	97.08	72.22
3	57.40	78.84	80.33	93.66	87.13	97.56	97.90	75.29	97.63
4	95.55	92.64	94.13	94.26	77.60	96.93	77.49	97.88	90.15
5	99.82	99.77	98.75	99.96	88.11	99.23	80.12	100.0	92.35
6	58.82	69.78	75.96	91.18	96.12	92.25	91.02	93.97	85.53
7	78.75	97.93	91.42	80.77	89.27	94.92	98.69	75.35	86.65
8	66.12	86.53	79.24	92.64	83.94	97.50	53.25	79.88	81.71
9	99.26	86.05	99.93	99.99	56.41	95.11	51.03	97.27	87.58
OA	71.70 ±4.53	83.20 ±2.64	81.28 ±1.72	86.37 ±2.46	87.73 ±2.33	91.43 ±1.24	85.99 ±2.11	92.29 ±1.08	94.05 ±1.37
AA	78.59 ±3.76	86.92 ±1.31	86.61 ±3.23	91.37 ±1.96	83.24 ±2.33	93.87 ±1.22	79.43 ±3.10	89.43 ±1.79	93.84 ±2.54
$κ$	64.42 ±4.22	78.31 ±1.77	75.81 ±2.64	82.66 ±2.77	83.63 ±3.79	89.83 ±2.68	82.11 ±4.02	89.74 ±2.01	94.99 ±1.79

Table 12. Computational times (seconds) and parameters (M) of different methods for each dataset.

		SVM	SSRN	RODA	DFSL	HTLN	S3Net	CPPM	DCLN	Ours
PU	Train	4.24	722.43	634.23	809.58	1482.30	1204.34	1847.30	1437.82	1567.34
	Test	0.57	3.89	3.38	4.57	8.42	6.23	7.98	7.32	8.61
	Params	-	0.35	0.20	0.03	0.68	0.05	0.85	0.63	0.46
SA	Train	7.32	1029.4	874.87	1198.32	1624.06	1352.33	1629.04	1577.95	1431.51
	Test	0.82	6.22	5.29	7.32	8.11	7.39	8.23	7.91	8.92
	Params	-	0.40	0.20	0.03	0.68	0.09	0.85	0.65	0.46
IP	Train	3.97	463.21	423.25	927.43	1432.57	983.46	1563.97	1360.32	1523.95
	Test	0.49	3.78	3.12	5.43	7.29	5.88	7.92	8.13	7.33
	Params	-	0.35	0.20	0.03	0.68	0.09	0.85	0.63	0.46
LongKou	Train	9.42	509.32	1302.34	1628.94	1937.32	1799.21	2384.22	1988.29	1803.98
	Test	2.35	13.97	20.21	19.39	22.34	21.15	21.33	24.90	20.32
	Params	-	0.35	0.20	0.03	0.68	0.09	0.85	0.65	0.46

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
