FC-SBAAT: A Few-Shot Image Classification Approach Based on Feature Collaboration and Sparse Bias-Aware Attention in Transformers
Abstract
1. Introduction
- We propose FC-SBAAT, a unified framework that jointly designs feature collaboration, prototype rectification, and attention-based matching, and that establishes an information-flow coupling from prototype rectification to relevance-bias construction and sparse selection. The rectified prototypes directly shape the matching bias and determine the sparse aggregation scope, so prototype construction and relation matching are optimized jointly, improving accuracy, robustness, and generalization under noisy few-shot prototypes and complex inter-class relations.
- We introduce a two-subspace feature enhancement mechanism that strengthens fine-grained representations in two complementary subspaces and builds intra-class relations that characterize the reliability of support samples. An MLP then adaptively fuses the enhanced features, producing more informative task representations for prototype generation (a sketch of this mechanism appears after this list).
- We develop an intra-class-consistency-based prototype rectification strategy that suppresses the influence of outliers and noisy samples on mean prototypes through intra-class-relation-guided symmetric weighted aggregation, thereby mitigating prototype shift. The strategy avoids cross-class neighborhood and query-driven heuristics, making it particularly suitable for novel-class few-shot scenarios (see the second sketch below).
- We introduce a Sparse Bias-Aware Attention mechanism in the Transformer decoder used for matching. A relevance-driven bias term is injected into the attention logits, and sparse selection is performed on the biased logits, so that normalization and value aggregation are computed only over the most task-relevant subset; this reduces the time complexity of attention while improving semantic focus (see the third sketch below).
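To make the first of these mechanisms concrete, the following minimal PyTorch sketch shows how two-subspace enhancement followed by MLP fusion could look. It is an illustrative reading of the bullet above, not the paper's FCM/AFM code; the class and parameter names (`TwoSubspaceEnhancer`, `sub_dim`) are our assumptions.

```python
import torch
import torch.nn as nn

class TwoSubspaceEnhancer(nn.Module):
    """Enhance features in two complementary subspaces, then fuse with an MLP
    (a plausible sketch of the FCM + AFM idea, not the authors' code)."""
    def __init__(self, dim: int, sub_dim: int):
        super().__init__()
        self.proj_a = nn.Linear(dim, sub_dim)  # first subspace projection
        self.proj_b = nn.Linear(dim, sub_dim)  # complementary subspace projection
        self.fuse = nn.Sequential(             # MLP that adaptively fuses both views
            nn.Linear(2 * sub_dim, dim),
            nn.ReLU(),
            nn.Linear(dim, dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_samples, dim) backbone embeddings of support/query images
        z_a = torch.relu(self.proj_a(x))
        z_b = torch.relu(self.proj_b(x))
        return x + self.fuse(torch.cat([z_a, z_b], dim=-1))  # residual fusion
```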
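The prototype rectification bullet can likewise be illustrated with a small function that weights each support shot by its mean similarity to its class peers, so that outliers contribute less to the prototype. The cosine-similarity relation and the temperature `tau` are our assumptions; the paper's exact weighting may differ.

```python
import torch
import torch.nn.functional as F

def rectified_prototype(support: torch.Tensor, tau: float = 10.0) -> torch.Tensor:
    """support: (K, dim) embeddings of one class's K support shots.
    Returns an intra-class-consistency weighted prototype (illustrative only)."""
    z = F.normalize(support, dim=-1)
    sim = z @ z.t()                  # (K, K) symmetric intra-class relation matrix
    sim.fill_diagonal_(0.0)          # a shot's self-similarity carries no evidence
    consistency = sim.sum(dim=1) / max(support.size(0) - 1, 1)  # mean peer similarity
    w = torch.softmax(tau * consistency, dim=0)   # low weight for outlier shots
    return (w.unsqueeze(1) * support).sum(dim=0)  # weighted aggregation, not plain mean
```

With a single shot (K = 1) the weight is 1 and the function reduces to the plain mean prototype, matching the intuition that rectification only matters when multiple shots disagree.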
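Finally, the sparse bias-aware attention bullet amounts to adding a relevance bias to the attention logits and normalizing only over the top-k biased entries per query. The sketch below assumes single-head attention and a precomputed `bias` derived from the rectified prototypes; `top_k` and the function name are hypothetical.

```python
import torch

def sparse_bias_attention(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor,
                          bias: torch.Tensor, top_k: int) -> torch.Tensor:
    """q: (nq, d); k, v: (nk, d); bias: (nq, nk) relevance bias.
    Softmax and value aggregation run only over the top_k biased logits."""
    logits = (q @ k.t()) / (q.size(-1) ** 0.5) + bias  # bias-aware logits
    vals, idx = logits.topk(top_k, dim=-1)             # sparse selection per query
    attn = torch.softmax(vals, dim=-1)                 # normalize over the subset only
    return torch.einsum('qk,qkd->qd', attn, v[idx])    # aggregate selected values
```

Because softmax and aggregation touch only `top_k` of the `nk` keys per query, the value-aggregation cost drops from O(nq·nk·d) to O(nq·top_k·d), consistent with the complexity reduction claimed above (the top-k selection itself still scans all logits).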
2. Related Work
2.1. Few-Shot Learning
2.2. Contrastive Learning in Few-Shot Learning
2.3. Multi-Head Attention
2.4. Transformer-Based Methods
3. Methodology
3.1. Notations and Problem Definition
3.2. Framework
3.3. FCM
3.4. AFM
3.5. IPRM
3.6. SBAATM
Algorithm 1: The training procedure of FC-SBAAT.
4. Experimental Results and Discussion
4.1. Datasets and Experimental Environment
4.2. Experimental Details
4.3. Comparison with State of the Arts
4.4. Ablation Study
4.4.1. The Impact of Different Structures
4.4.2. The Impact of Different Hyper-Parameters
4.4.3. The Impact of Different Feature Enhancement Strategies
4.4.4. The Computational Impact of Time Complexity
4.5. Visualization
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Snell, J.; Swersky, K.; Zemel, R. Prototypical networks for few-shot learning. In Proceedings of the Advances in Neural Information Processing Systems 30 (NeurIPS 2017), Long Beach, CA, USA, 4–9 December 2017. [Google Scholar]
- Sung, F.; Yang, Y.; Zhang, L.; Xiang, T.; Torr, P.H.S.; Hospedales, T.M. Learning to compare: Relation network for few-shot learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2018), Salt Lake City, UT, USA, 18–23 June 2018; pp. 1199–1208. [Google Scholar]
- Vinyals, O.; Blundell, C.; Lillicrap, T.; Wierstra, D. Matching networks for one shot learning. In Proceedings of the Advances in Neural Information Processing Systems 29 (NeurIPS 2016), Barcelona, Spain, 5–10 December 2016; pp. 3630–3638. [Google Scholar]
- Cheng, Y.; Yu, M.; Guo, X.; Zhou, B. Few-shot learning with meta metric learners. arXiv 2019, arXiv:1901.09890. [Google Scholar] [CrossRef]
- Garcia, V.; Bruna, J. Few-shot learning with graph neural networks. arXiv 2017, arXiv:1711.04043. [Google Scholar] [CrossRef]
- Liu, J.; Song, L.; Qin, Y. Prototype rectification for few-shot learning. In Proceedings of the Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, 23–28 August 2020; Proceedings, Part I; Springer: Cham, Switzerland, 2020; pp. 741–756. [Google Scholar] [CrossRef]
- Tsai, Y.-H.H.; Salakhutdinov, R. Improving one-shot learning through fusing side information. arXiv 2017, arXiv:1710.08347. [Google Scholar] [CrossRef]
- Li, X.; Tian, T.; Liu, Y.; Yu, H.; Cao, J.; Ma, Z. Adaptive multi-prototype relation network. In Proceedings of the 2020 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), Auckland, New Zealand, 7–10 December 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 1707–1712. [Google Scholar]
- Zhang, W.; Gu, X. Task-aware prototype refinement for improved Few-Shot learning. Neural Comput. Appl. 2023, 35, 17899–17913. [Google Scholar] [CrossRef]
- Li, Z.; Hu, Z.; Luo, W.; Hu, X. SaberNet: Self-attention based effective relation network for few-shot learning. Pattern Recognit. 2023, 133, 109024. [Google Scholar] [CrossRef]
- Li, H.; Huang, G.; Yuan, X.; Zheng, Z.; Chen, X.; Zhong, G.; Pun, C.-M. PSANet: Prototype-guided salient attention for few-shot segmentation. Vis. Comput. 2025, 41, 2987–3001. [Google Scholar] [CrossRef]
- Lim, J.Y.; Lim, K.M.; Lee, C.P.; Tan, Y.X. SCL: Self-supervised contrastive learning for few-shot image classification. Neural Netw. 2023, 165, 19–30. [Google Scholar] [CrossRef] [PubMed]
- Hospedales, T.; Antoniou, A.; Micaelli, P.; Storkey, A. Meta-learning in neural networks: A survey. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 44, 5149–5169. [Google Scholar] [CrossRef] [PubMed]
- Koch, G.; Zemel, R.; Salakhutdinov, R. Siamese neural networks for one-shot image recognition. In Proceedings of the ICML Deep Learning Workshop, Lille, France, 6–11 July 2015; Volume 2, pp. 1–30. [Google Scholar]
- Finn, C.; Abbeel, P.; Levine, S. Model-agnostic meta-learning for fast adaptation of deep networks. In Proceedings of the 34th International Conference on Machine Learning (ICML 2017), Sydney, Australia, 6–11 August 2017; PMLR: Cambridge, MA, USA, 2017; Volume 70, pp. 1126–1135. [Google Scholar]
- Li, Z.; Zhou, F.; Chen, F.; Li, H. Meta-SGD: Learning to learn quickly for few-shot learning. arXiv 2017, arXiv:1707.09835. [Google Scholar]
- Nichol, A.; Schulman, J. Reptile: A scalable metalearning algorithm. arXiv 2018, arXiv:1803.02999. [Google Scholar]
- Lee, K.; Maji, S.; Ravichandran, A.; Soatto, S. Meta-learning with differentiable convex optimization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2019), Long Beach, CA, USA, 15–20 June 2019; pp. 10657–10665. [Google Scholar] [CrossRef]
- Ravi, S.; Larochelle, H. Optimization as a model for few-shot learning. In Proceedings of the 5th International Conference on Learning Representations (ICLR 2017), Toulon, France, 24–26 April 2017. [Google Scholar]
- Munkhdalai, T.; Yu, H. Meta networks. In Proceedings of the 34th International Conference on Machine Learning (ICML 2017), Sydney, Australia, 6–11 August 2017; PMLR: Cambridge, MA, USA, 2017; pp. 2554–2563. [Google Scholar]
- Mishra, N.; Rohaninejad, M.; Chen, X.; Abbeel, P. A simple neural attentive meta-learner. arXiv 2017, arXiv:1707.03141. [Google Scholar]
- Sung, F.; Zhang, L.; Xiang, T.; Hospedales, T.; Yang, Y. Learning to learn: Meta-critic networks for sample efficient learning. arXiv 2017, arXiv:1706.09529. [Google Scholar] [CrossRef]
- Finn, C.; Xu, K.; Levine, S. Probabilistic model-agnostic meta-learning. In Proceedings of the Advances in Neural Information Processing Systems 31 (NeurIPS 2018), Montréal, QC, Canada, 3–8 December 2018; pp. 9516–9527. [Google Scholar]
- Taud, H.; Mas, J.-F. Multilayer perceptron (MLP). In Geomatic Approaches for Modeling Land Change Scenarios; Springer: Cham, Switzerland, 2018; pp. 451–455. [Google Scholar] [CrossRef]
- Chen, D.; Chen, Y.; Li, Y.; Mao, F.; He, Y.; Xue, H. Self-supervised learning for few-shot image classification. In Proceedings of the ICASSP 2021—2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Toronto, ON, Canada, 6–11 June 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 1745–1749. [Google Scholar] [CrossRef]
- Li, A.; Luo, T.; Xiang, T.; Huang, W.; Wang, L. Few-shot learning with global class representations. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV 2019), Seoul, Republic of Korea, 27 October–2 November 2019; pp. 9714–9723. [Google Scholar] [CrossRef]
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. In Proceedings of the Advances in Neural Information Processing Systems 30 (NeurIPS 2017), Long Beach, CA, USA, 4–9 December 2017; pp. 5998–6008. [Google Scholar]
- Liu, S.; Johns, E.; Davison, A.J. End-to-end multi-task learning with attention. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2019), Long Beach, CA, USA, 15–20 June 2019; pp. 1871–1880. [Google Scholar] [CrossRef]
- Huang, X.; Choi, S.H. SAPENet: Self-attention based prototype enhancement network for few-shot learning. Pattern Recognit. 2023, 135, 109170. [Google Scholar] [CrossRef]
- Lin, H.; Cheng, X.; Wu, X.; Shen, D. CAT: Cross attention in vision transformer. In Proceedings of the 2022 IEEE International Conference on Multimedia and Expo (ICME), Taipei, Taiwan, 18–22 July 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 1–6. [Google Scholar] [CrossRef]
- Hou, R.; Chang, H.; Ma, B.; Shan, S.; Chen, X. Cross attention network for few-shot classification. In Proceedings of the Advances in Neural Information Processing Systems 32 (NeurIPS 2019), Vancouver, BC, Canada, 8–14 December 2019; pp. 4003–4014. [Google Scholar]
- Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV 2021), Montreal, QC, Canada, 10–17 October 2021; pp. 10012–10022. [Google Scholar] [CrossRef]
- Pfeiffer, J.; Kamath, A.; Rücklé, A.; Cho, K.; Gurevych, I. AdapterFusion: Non-destructive task composition for transfer learning. arXiv 2020, arXiv:2005.00247. [Google Scholar] [CrossRef]
- Sun, Z.; Zheng, W.; Wang, M. SLTRN: Sample-level transformer-based relation network for few-shot classification. Neural Netw. 2024, 176, 106344. [Google Scholar] [CrossRef] [PubMed]
- Wah, C.; Branson, S.; Welinder, P.; Perona, P.; Belongie, S. The Caltech-UCSD Birds-200-2011 Dataset; Technical Report CNS-TR-2011-001; California Institute of Technology: Pasadena, CA, USA, 2011. [Google Scholar]
- Li, H.; Sun, Y.; Qiao, S. Enhanced lithology identification with few-shot well-logging data using a confidence-enhanced semi-supervised meta-learning approach. Measurement 2025, 247, 116762. [Google Scholar] [CrossRef]
- Li, H.; Qiao, S.; Sun, Y. A depth graph attention-based multi-channel transfer learning network for fluid classification from logging data. Phys. Fluids 2024, 36, 106623. [Google Scholar] [CrossRef]
- Chen, W.-Y.; Liu, Y.-C.; Kira, Z.; Wang, Y.-C.F.; Huang, J.-B. A closer look at few-shot classification. arXiv 2019, arXiv:1904.04232. [Google Scholar] [CrossRef]
- Oreshkin, B.; Rodríguez López, P.; Lacoste, A. TADAM: Task dependent adaptive metric for improved few-shot learning. In Proceedings of the Advances in Neural Information Processing Systems 31 (NeurIPS 2018), Montréal, QC, Canada, 3–8 December 2018; pp. 721–731. [Google Scholar]
- Hao, F.; He, F.; Cheng, J.; Wang, L.; Cao, J.; Tao, D. Collect and select: Semantic alignment metric learning for few-shot learning. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV 2019), Seoul, Republic of Korea, 27 October–2 November 2019; pp. 8460–8469. Available online: https://github.com/haofusheng/saml (accessed on 2 January 2026).
- Zhang, C.; Cai, Y.; Lin, G.; Shen, C. DeepEMD: Differentiable Earth Mover’s distance for few-shot learning. arXiv 2020, arXiv:2003.06777. [Google Scholar] [CrossRef]
- Ye, H.-J.; Hu, H.; Zhan, D.-C.; Sha, F. Few-shot learning via embedding adaptation with set-to-set functions. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2020), Seattle, WA, USA, 13–19 June 2020; pp. 8808–8817. Available online: https://openaccess.thecvf.com/content_CVPR_2020/html/Ye_Few-Shot_Learning_via_Embedding_Adaptation_With_Set-to-Set_Functions_CVPR_2020_paper.html (accessed on 2 January 2026).
- Zhou, Z.; Qiu, X.; Xie, J.; Wu, J.; Zhang, C. Binocular mutual learning for improving few-shot classification. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV 2021), Montreal, QC, Canada, 10–17 October 2021; pp. 8402–8411. [Google Scholar] [CrossRef]
- Qiao, S.; Huang, M.; Li, H.; Wang, L.; Wenjing, Y.; Sun, Y.; Zhao, Z. FedSSH: A consumer-oriented federated semi-supervised heterogeneous IoMT framework. IEEE Trans. Consum. Electron. 2025, 71, 8465–8476. [Google Scholar] [CrossRef]
- Qiao, S.; Guo, Q.; Wang, M.; Zhu, H.; Rodrigues, J.J.P.C.; Lyu, Z. FRW-TRACE: Forensic-ready watermarking framework for tamper-resistant biometric data and attack traceability in consumer electronics. IEEE Trans. Consum. Electron. 2025, 71, 8234–8245. [Google Scholar] [CrossRef]
- Qiao, S.; Zhu, H.; Sha, L.; Wang, M.; Guo, Q. DynMark: A dynamic packet counting watermarking scheme for robust traffic tracing in network flows. Comput. Secur. 2025, 157, 104571. [Google Scholar] [CrossRef]
- Qiao, S.; Guo, Q.; Wang, M.; Zhu, H.; Rodrigues, J.J.P.C.; Lyu, Z. Advances in network flow watermarking: A survey. Comput. Secur. 2025, 159, 104653. [Google Scholar] [CrossRef]
| Method | Backbone | MiniImageNet 1-Shot | MiniImageNet 5-Shot | CUB 1-Shot | CUB 5-Shot |
|---|---|---|---|---|---|
| Cosine classifier (2019) [38] | Conv4 | 47.99 ± 0.18 | 66.93 ± 0.17 | 57.79 ± 0.22 | 74.03 ± 0.18 |
| MAML (2017) [15] | Conv4 | 48.70 ± 1.84 | 63.11 ± 0.92 | 47.85 ± 0.22 | 64.77 ± 0.20 |
| ProtoNet (2017) [1] | Conv4 | 49.42 ± 0.78 | 68.20 ± 0.66 | 54.52 ± 0.23 | 73.30 ± 0.17 |
| RelationNet (2018) [2] | Conv4 | 50.44 ± 0.82 | 65.32 ± 0.70 | 58.81 ± 0.24 | 75.23 ± 0.18 |
| TADAM (2018) [39] | Conv4 | 50.50 ± 0.20 | 69.09 ± 0.16 | 56.64 ± 0.23 | 73.66 ± 0.17 |
| MetaOptNet (2019) [18] | Conv4 | 51.28 ± 0.20 | 69.71 ± 0.16 | 49.52 ± 0.22 | 71.68 ± 0.18 |
| SAML (2019) [40] | Conv4 | 52.88 ± 0.20 | 68.17 ± 0.16 | 62.75 ± 0.23 | 78.24 ± 0.16 |
| DeepEMD (2020) [41] | Conv4 | 53.81 ± 0.20 | 70.56 ± 0.16 | 62.09 ± 0.23 | 83.58 ± 0.17 |
| SLTRN (2024) [34] | Conv4 | 52.11 ± 0.86 | 66.54 ± 0.70 | 67.55 ± 0.96 | 80.07 ± 0.65 |
| Ours | Conv4 | 55.71 ± 0.63 | 73.87 ± 0.44 | 70.37 ± 0.36 | 83.86 ± 0.23 |
| Cosine classifier (2019) [38] | ResNet-12 | 55.43 ± 0.81 | 77.18 ± 0.61 | 67.30 ± 0.86 | 84.75 ± 0.60 |
| ProtoNet (2017) [1] | ResNet-12 | 62.39 ± 0.21 | 80.53 ± 0.34 | 71.88 ± 0.91 | 87.42 ± 0.48 |
| DeepEMD (2020) [41] | ResNet-12 | 65.91 ± 0.82 | 82.41 ± 0.56 | 75.65 ± 0.83 | 88.69 ± 0.50 |
| MetaOptNet (2019) [18] | ResNet-12 | 62.64 ± 0.61 | 78.63 ± 0.46 | 72.00 ± 0.70 | 84.20 ± 0.50 |
| FEAT (2020) [42] | ResNet-12 | 66.78 ± 0.20 | 82.05 ± 0.14 | 73.27 ± 0.22 | 85.77 ± 0.14 |
| BML (2021) [43] | ResNet-12 | 67.04 ± 0.63 | 83.63 ± 0.29 | 76.21 ± 0.63 | 90.45 ± 0.36 |
| Ours | ResNet-12 | 67.42 ± 0.63 | 85.71 ± 0.57 | 77.13 ± 0.41 | 91.48 ± 0.69 |
| Method | MiniImageNet 1-Shot | MiniImageNet 5-Shot | CUB 1-Shot | CUB 5-Shot |
|---|---|---|---|---|
| Base | | | | |
| Base-MHA | | | | |
| Base-FCM | | | | |
| Base-FCM-AFM | | | | |
| Base-FCM-AFM+IPRM (Ours) | | | | |
| Method | Backbone | MiniImageNet 1-Shot | MiniImageNet 5-Shot | CUB 1-Shot | CUB 5-Shot |
|---|---|---|---|---|---|
| Vanilla Transformer | Conv4 | 54.61 ± 0.15 | 71.44 ± 0.36 | 68.96 ± 0.37 | 81.93 ± 0.54 |
| Ours | Conv4 | 55.71 ± 0.63 | 73.87 ± 0.44 | 70.37 ± 0.36 | 83.86 ± 0.23 |
| Method | MiniImageNet 1-Shot | MiniImageNet 5-Shot | CUB 1-Shot | CUB 5-Shot |
|---|---|---|---|---|
| Traditional | 55.49 ± 0.22 | 73.34 ± 0.48 | 69.92 ± 0.57 | 83.75 ± 0.17 |
| Ours | 55.71 ± 0.63 | 73.87 ± 0.44 | 70.37 ± 0.36 | 83.86 ± 0.23 |
| Method | FLOPs (M) | Parameters (K) | Processing Time (ms) |
|---|---|---|---|
| Vanilla Transformer | 404.91 | 397.65 | 5.51 |
| Ours | 401.03 | 397.65 | 3.96 |