LGDAF-Net: A Lightweight CNN–Transformer Framework for Cross-Domain Few-Shot Hyperspectral Image Classification
Abstract
1. Introduction
- We propose a lightweight hierarchical CNN–Transformer framework, termed LGDAF-Net, for cross-domain few-shot hyperspectral image classification, enabling effective feature learning under limited labeled samples.
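- We develop a collaborative feature learning mechanism that combines the spectral–spatial dual attention preprocessing module (SESA), the localized spatial perception module (LASPM), and a lightweight Transformer-based global context modeling module (GACM) to jointly capture fine-grained local structures and long-range spatial dependencies.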
- Kernel triplet loss and domain adversarial learning are incorporated into the training framework to enhance feature discriminability and improve cross-domain feature alignment while maintaining a compact model structure.
2. Materials and Methods
2.1. Datasets
2.2. Implementation Environment
2.3. Model Architecture
2.3.1. Input Mapping Layer
2.3.2. Spectral–Spatial Dual Attention Preprocessing Module (SESA)
2.3.3. Localized Spatial Perception Module (LASPM)
2.3.4. Global Context Modeling Module (GACM)
2.3.5. Loss Functions and Training Strategy
- Cross-Entropy Loss. This loss supervises the classification of labeled samples from both the source and target domains. It is defined as $L_{CE} = -\frac{1}{N}\sum_{i=1}^{N} \log p_i(y_i)$, where $N$ denotes the batch size, $y_i$ is the ground-truth label of sample $i$, and $p_i(y_i)$ is the probability that the model predicts for that class.
- Kernel Triplet Loss. To enhance the discriminability of the extracted features, this paper designs a kernel triplet loss by introducing a kernel mapping into the standard triplet loss. Whereas the standard triplet loss computes distances directly on the original features, the proposed loss maps features into a high-dimensional space via a kernel function, which captures non-linear feature relationships more effectively. For an anchor $x_a$, a positive sample $x_p$, and a negative sample $x_n$, the loss is $L_{KT} = \max\left(0,\ d_k(x_a, x_p) - d_k(x_a, x_n) + m\right)$, where $d_k(x_a, x_p)$ is the distance between the anchor and the positive sample in the kernel space, $d_k(x_a, x_n)$ is the distance between the anchor and the negative sample, and $m$ is the margin hyperparameter, which is set to 0.3 in this work.
- Domain Adversarial Loss. This loss aligns the feature distributions of the source and target domains, with adversarial training implemented via the Gradient Reversal Layer (GRL). The domain classifier tries to distinguish source samples from target samples, while the feature encoder tries to generate domain-invariant features that the domain classifier cannot classify correctly. The loss is $L_{DA} = -\frac{1}{M}\sum_{j=1}^{M}\left[d_j \log \hat{d}_j + (1 - d_j)\log\left(1 - \hat{d}_j\right)\right]$, where $M$ is the total number of samples from the source and target domains, $d_j$ is the domain label ($d_j = 1$ for the source domain, $d_j = 0$ for the target domain), and $\hat{d}_j$ is the domain probability predicted by the domain classifier. The weight of the gradient reversal layer is adjusted dynamically during training: it is initialized to 0.1 and increases linearly to 1.0 as the training episode grows.
- Total Loss Function and Phased Training. A phased training strategy is adopted in this work, and the composition of the total loss changes with the training episode:
  - Stage 1 (episode ≤ 500): Only the classification loss of the target domain is optimized.
  - Stage 2 (500 < episode ≤ 1000): The classification loss of the source domain is added.
  - Stage 3 (episode > 1000): The domain adversarial loss is added to achieve feature alignment.

  The three loss-weighting hyperparameters are set to 0.5, 0.3, and 0.2.
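The loss terms and the training schedule described above can be illustrated with a minimal sketch. It is not the authors' implementation: the RBF kernel, the use of a squared kernel-space distance, the `gamma` value, and the `total_episodes` argument are assumptions made for illustration, since the text specifies only the margin (0.3), the GRL weight range (0.1 to 1.0), and the stage boundaries (500 and 1000 episodes). The text also does not state in which stage the kernel triplet term becomes active, so the stage helper lists only the three terms it names.

```python
import math


def rbf_kernel(x, y, gamma=1.0):
    """RBF kernel exp(-gamma * ||x - y||^2). The kernel choice is an assumption."""
    squared_norm = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_norm)


def kernel_distance(x, y, gamma=1.0):
    """Squared distance in the kernel-induced space: k(x,x) - 2k(x,y) + k(y,y)."""
    return rbf_kernel(x, x, gamma) - 2.0 * rbf_kernel(x, y, gamma) + rbf_kernel(y, y, gamma)


def kernel_triplet_loss(anchor, positive, negative, margin=0.3, gamma=1.0):
    """Hinge on d(anchor, positive) - d(anchor, negative) + margin."""
    d_ap = kernel_distance(anchor, positive, gamma)
    d_an = kernel_distance(anchor, negative, gamma)
    return max(0.0, d_ap - d_an + margin)


def grl_weight(episode, total_episodes, start=0.1, end=1.0):
    """Linear ramp of the gradient-reversal weight from `start` to `end`.

    `total_episodes` is a hypothetical argument; the ramp length is not
    given in the text.
    """
    progress = min(max(episode / total_episodes, 0.0), 1.0)
    return start + (end - start) * progress


def active_losses(episode):
    """Loss terms switched on at a given episode, following the three stages."""
    terms = ["target_classification"]
    if episode > 500:
        terms.append("source_classification")
    if episode > 1000:
        terms.append("domain_adversarial")
    return terms
```

In a training loop, `active_losses(episode)` would select which terms enter the weighted sum, and `grl_weight(episode, total_episodes)` would scale the reversed gradient passed from the domain classifier back to the encoder.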
2.4. Experimental Setup and Cross-Domain Few-Shot Training Protocol
2.4.1. Basic Experimental Parameters
2.4.2. Support/Query Set Construction in Each Episode
- Source domain support set: In each episode, labeled samples are randomly sampled from the source domain data loader. These samples provide sufficient supervised prior knowledge for the model, and are used for classification learning and intra-class/inter-class feature discriminability enhancement.
- Target domain support set: In each episode, 15% of the labeled samples of each land cover class in the target dataset are randomly sampled; these are the only labeled target-domain samples used for model training. They provide few-shot supervised guidance that adapts the model to the target-domain data distribution.
- Target domain query set: Unlabeled samples from the target dataset are sampled in each episode. These samples are only used for cross-domain feature alignment via domain adversarial learning, and their category labels are completely invisible to the model during the entire training process, which strictly follows the few-shot learning setting.
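The episode construction described above could be sketched as follows. This is an illustrative outline under stated assumptions rather than the authors' data pipeline: `build_episode`, the sizes `n_source` and `n_query`, and the list-based data containers are hypothetical, and only the 15% per-class fraction comes from the text.

```python
import random


def build_episode(source, target_labeled, target_unlabeled,
                  n_source=64, target_fraction=0.15, n_query=64, seed=None):
    """Draw one training episode.

    source, target_labeled: lists of (sample, label) pairs.
    target_unlabeled: list of samples without labels.
    n_source and n_query are hypothetical sizes, not values from the paper.
    """
    rng = random.Random(seed)

    # Source support set: labeled samples drawn at random from the source domain.
    source_support = rng.sample(source, min(n_source, len(source)))

    # Target support set: a fixed fraction of the labeled samples of each class.
    by_class = {}
    for sample, label in target_labeled:
        by_class.setdefault(label, []).append((sample, label))
    target_support = []
    for label in sorted(by_class):
        items = by_class[label]
        k = max(1, round(target_fraction * len(items)))
        target_support.extend(rng.sample(items, k))

    # Target query set: unlabeled samples, used only for domain alignment.
    target_query = rng.sample(target_unlabeled, min(n_query, len(target_unlabeled)))
    return source_support, target_support, target_query
```

Keeping the query set as bare samples, with no label field at all, is one simple way to guarantee that target query labels cannot leak into the training step.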
2.4.3. Target Label Usage Rule in Cross-Domain Few-Shot Setting
2.4.4. Dataset Fold Division and Validation Protocol
- For each target dataset, all labeled samples are randomly divided into 5 independent folds, with the same land cover class proportion maintained in each fold to avoid data imbalance;
- In each independent experiment, one fold is used as the labeled target support set for training, and the remaining 4 folds are used as the held-out test set for performance evaluation;
- All comparative experiments (including the proposed LGDAF-Net and two state-of-the-art methods) are conducted under the exact same data division and random seed setting, to ensure the fairness of the comparison;
- All experiments are repeated 5 times independently, and the final reported results are the mean and standard deviation (SD) of the 5 independent runs, to verify the stability of the proposed method.
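A minimal sketch of this validation protocol is given below, assuming a simple round-robin stratified split and the sample standard deviation (n − 1 denominator); neither detail is stated in the text. The function names are hypothetical.

```python
import random
import statistics


def stratified_folds(labels, n_folds=5, seed=0):
    """Split sample indices into n_folds folds with per-class proportions preserved."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    folds = [[] for _ in range(n_folds)]
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Deal the shuffled indices of each class round-robin across the folds.
        for pos, idx in enumerate(indices):
            folds[pos % n_folds].append(idx)
    return folds


def train_test_split_by_fold(folds, train_fold):
    """One fold is the labeled target support set; the other folds are held out."""
    train = list(folds[train_fold])
    test = [idx for f, fold in enumerate(folds) if f != train_fold for idx in fold]
    return train, test


def summarize(scores):
    """Mean and sample standard deviation over independent runs."""
    return statistics.mean(scores), statistics.stdev(scores)
```

Fixing `seed` across all compared methods reproduces the same fold assignment for each of them, which is what makes the comparison described above fair.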
2.5. Evaluation Metrics
2.6. Comparative Methods
3. Results
3.1. Quantitative Comparison with State-of-the-Art Methods
3.2. Classification Map Visualization
3.3. Performance Under Different Few-Shot Settings
Training Convergence Analysis
3.4. Ablation Study
3.5. Computational Efficiency
4. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest

| Algorithm | OA (%) | AA (%) | Kappa (%) |
|---|---|---|---|
| CTF-SSCL | 89.16 | 81.22 | 78.77 |
| CFSL-KT | 89.28 | 93.4 | 87.8 |
| LGDAF-Net | 91.31 ± 0.67 | 94.81 ± 0.76 | 90.11 ± 0.76 |

| Algorithm | OA (%) | AA (%) | Kappa (%) |
|---|---|---|---|
| CTF-SSCL | 92.41 | 92.34 | 89.93 |
| CFSL-KT | 97.51 | 95.67 | 96.69 |
| LGDAF-Net | 98.16 ± 0.67 | 96.62 ± 0.76 | 97.56 ± 0.76 |

| Algorithm | OA (%) | AA (%) | Kappa (%) |
|---|---|---|---|
| CTF-SSCL | 97.71 | 96.84 | 94.17 |
| CFSL-KT | 97.52 | 98.71 | 97.24 |
| LGDAF-Net | 98.46 ± 0.67 | 98.44 ± 0.76 | 98.29 ± 0.76 |

| Shot | CTF-SSCL | CFSL-KT | LGDAF-Net |
|---|---|---|---|
| 5 | 72.47 | 79.62 | 76.88 |
| 10 | 88.92 | 84.41 | 84.89 |
| 15 | 89.16 | 89.28 | 91.31 |

| LASPM | GACM | SESA | OA (%) | AA (%) | Kappa (%) | Time (s) |
|---|---|---|---|---|---|---|
| × | √ | √ | 90.74 | 94.8 | 89.49 | 657 |
| √ | × | √ | 89.77 | 94.18 | 88.38 | 608 |
| √ | √ | × | 90.92 | 94.86 | 89.69 | 672 |
| √ | √ | √ | 91.31 | 94.81 | 90.11 | 784 |

| Model | Parameters (k) | FLOPs (G) | Training Time (s) | Relative Difference |
|---|---|---|---|---|
| LGDAF-Net | 287 | 0.84 | 784 | – |
| CFSL-KT | 737 | 2.05 | 712 | ↑ 61.1% parameters, ↑ 58.7% FLOPs |
| CTF-SSCL | 339 | 1.02 | 756 | ↑ 15.6% parameters, ↑ 17.3% FLOPs |
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Yang, G.; Fang, J.; Zhu, D.; Zuo, X. LGDAF-Net: A Lightweight CNN–Transformer Framework for Cross-Domain Few-Shot Hyperspectral Image Classification. Electronics 2026, 15, 1606. https://doi.org/10.3390/electronics15081606

