Prompt-Based Few-Shot Text Classification with Multi-Granularity Label Augmentation and Adaptive Verbalizer
Abstract
1. Introduction
- (1)
- We propose a BMI-Driven Adaptive Verbalizer Construction method that combines a Bayesian-inspired re-weighting strategy with mutual information. By formalizing the probability refinement process, our method dynamically integrates the general linguistic knowledge of PLMs with the data distribution of the specific few-shot task. This alleviates biases in prior knowledge and yields a robust adaptive verbalizer.
- (2)
- We develop a prompt-template-guided multi-granularity label data augmentation approach that generates high-quality, semantically coherent samples. We further introduce a data augmentation training framework that integrates adversarial training and Difficulty-Aware Learning to address the inherent complexity of FSTC tasks and improve model robustness. The method also shows promise for other fine-grained natural language understanding tasks.
- (3)
- Extensive experiments on four benchmark datasets demonstrate that our method outperforms existing FSTC models and fine-tuning strategies, achieving state-of-the-art results.
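The first contribution re-weights PLM-proposed candidate label words by how informative they are on the few-shot support set. The paper's exact refinement formula is not reproduced in this outline; the sketch below shows one plausible Bayesian-style interpolation between the PLM prior and a mutual-information score, with the `alpha` hyperparameter and the geometric-mean form being illustrative assumptions.

```python
import math

def refine_candidates(prior, mi, alpha=0.5):
    """Bayesian-style re-weighting of candidate label words (illustrative).

    prior: {word: p}  PLM prior probability of the candidate at [MASK]
    mi:    {word: I}  mutual information between word and class on the support set
    alpha: interpolation weight (hypothetical hyperparameter)

    Returns candidates sorted by the refined, renormalized score.
    """
    scores = {w: (prior[w] ** (1 - alpha)) * (math.exp(mi[w]) ** alpha)
              for w in prior}
    total = sum(scores.values())
    posterior = {w: s / total for w, s in scores.items()}
    return sorted(posterior.items(), key=lambda kv: kv[1], reverse=True)

# "time" has the highest PLM prior but carries little task information,
# so the refinement demotes it below the task-relevant words.
top = refine_candidates(
    prior={"game": 0.30, "match": 0.25, "athlete": 0.10, "time": 0.35},
    mi={"game": 0.8, "match": 0.9, "athlete": 0.7, "time": 0.05},
)
```

The point of the interpolation is exactly the bias correction described above: a word favored only by the PLM's prior loses rank to words that discriminate between classes in the few-shot data.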
2. Related Work
2.1. Few-Shot Text Classification
2.2. Prompt Tuning for PLMs
3. Methodology
3.1. BMI-Driven Adaptive Verbalizer Construction
3.1.1. PLM-Based Candidate Word Search
3.1.2. Bayesian Mutual Information Refinement
3.2. Prompt-Template-Guided Multi-Granularity Label Data Augmentation
3.3. Low-Entropy Selector
3.4. Difficulty-Aware Adversarial Training
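The difficulty-aware adversarial training named above pairs input perturbations with per-sample weighting. The paper's exact formulation is not shown in this outline; as a hedged illustration, an FGM-style perturbation on an input embedding combined with a simple confidence-based difficulty weight might look like the following (both the linear weighting scheme and the parameter names are assumptions).

```python
import math

def fgm_perturb(embedding, grad, epsilon=1.0):
    """FGM-style adversarial perturbation: a step of size epsilon along the
    normalized gradient of the loss w.r.t. the input embedding."""
    norm = math.sqrt(sum(g * g for g in grad)) or 1.0
    return [e + epsilon * g / norm for e, g in zip(embedding, grad)]

def difficulty_weight(confidence):
    """Weight hard samples (low model confidence) more heavily.
    A simple linear scheme; the paper's exact weighting is not shown here."""
    return 1.0 - confidence

# Toy 3-dimensional embedding and loss gradient.
emb = [0.2, -0.5, 0.1]
adv = fgm_perturb(emb, grad=[0.3, 0.0, -0.4], epsilon=0.1)
```

Because the gradient is normalized, the perturbation always has norm `epsilon`, which keeps the adversarial example in a controlled neighborhood of the original embedding.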
4. Experiments
4.1. Datasets and Templates
4.2. Baselines
- Fine-tuning: As a traditional transfer learning baseline, it adapts pre-trained language models to downstream tasks by directly adjusting their weights [56]. We include it to demonstrate the limitations of standard fine-tuning methods in few-shot settings, particularly their sensitivity to data sparsity and overfitting.
- Prompt tuning: As a representative prompt-based method, it reformulates classification tasks as masked language modeling problems using manually crafted templates [57]. We incorporate it to validate the advantages of prompt-based approaches over fine-tuning in few-shot scenarios while highlighting the limitations of manual prompts.
- EDA: As a classical data augmentation technique, it generates synthetic samples through lexical-level transformations [15]. We select it to compare traditional augmentation strategies with our proposed semantic-consistent augmentation approach, particularly emphasizing EDA’s shortcomings in preserving label semantics and ensuring sample quality.
- KPT: As a knowledge-enhanced prompt-tuning method, it enriches prompt content by incorporating external knowledge bases [44]. We use it as a baseline to quantify the costs incurred by external-knowledge-based methods in few-shot settings and to demonstrate the advantages of our external-knowledge-free adaptive approach.
- SKP: As a state-of-the-art soft prompt-tuning method, it constructs verbalizers using learnable soft tokens [43]. We choose this advanced baseline to demonstrate the efficacy of our method in verbalizer construction and its theoretical advantages over soft prompts in label mapping.
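The lexical-level transformations behind the EDA baseline are easy to make concrete. Below is a minimal sketch of two of EDA's four operations, random swap and random deletion (synonym replacement and random insertion need a thesaurus such as WordNet and are omitted to keep the snippet self-contained); it also illustrates why such edits can damage label semantics, since words are dropped without regard to their importance.

```python
import random

def random_swap(words, n=1, rng=None):
    """EDA-style random swap: exchange the positions of n random word pairs."""
    rng = rng or random.Random(0)
    words = words[:]
    for _ in range(n):
        i, j = rng.randrange(len(words)), rng.randrange(len(words))
        words[i], words[j] = words[j], words[i]
    return words

def random_deletion(words, p=0.1, rng=None):
    """EDA-style random deletion: drop each word independently with probability p."""
    rng = rng or random.Random(0)
    kept = [w for w in words if rng.random() > p]
    return kept or [rng.choice(words)]  # never return an empty sentence

sent = "the quick brown fox jumps over the lazy dog".split()
aug = random_deletion(sent, p=0.2, rng=random.Random(42))
```

Nothing in either operation checks whether the deleted or moved word carries the label signal, which is the noise problem the comparison in Section 4.4 highlights.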
4.3. Experimental Setting
4.4. Experimental Results
- (1)
- Across the 1/5/10/20-shot settings, our method achieved the highest Micro-F1 scores on almost all datasets. For example, it reached 87.1 Micro-F1 on AG’s News (5-shot), 98.1 on DBPedia (5-shot), and 93.7 on Amazon (5-shot). These results demonstrate AMLDA’s strong performance in few-shot text classification, particularly when training samples are extremely limited.
- (2)
- Compared with data augmentation baselines such as PT+EDA: EDA indiscriminately deletes or inserts tokens, injecting noise that degrades the prediction model’s decisions. AMLDA consistently outperformed PT+EDA across all settings, highlighting its effectiveness in improving sample quality and reducing noise.
- (3)
- In most cases, our approach outperforms state-of-the-art prompt-tuning methods such as KPT and SKP, demonstrating that AMLDA improves the accuracy of prompt-based FSTC; the exception is the IMDB dataset, where it performs slightly worse than SKP. Notably, our method achieves not only better average performance but also superior stability. For instance, on the Amazon dataset with 1-shot learning, our method’s standard deviation is 1.2 versus SKP’s 2.1. This indicates that, with very few samples available, our method is more robust to variations in the input data.
- (1)
- Semantic Distinctiveness: Despite having the highest number of classes (14), DBpedia yields the highest performance. This is because its categories are semantically well separated (e.g., Artist vs. OfficeHolder), which aligns effectively with our BMI-driven verbalizer. Conversely, AG’s News involves topics with higher lexical overlap (e.g., Business and Politics), making the few-shot boundary inherently more difficult to define.
- (2)
- Domain-Knowledge Alignment: The performance gap also stems from the varying degrees of alignment between the PLM’s pre-trained knowledge and the target domain. RoBERTa exhibits stronger zero-shot priors for the global facts found in DBpedia than for the specific linguistic patterns in localized news or sentiment datasets.
- (3)
- Label Granularity: In binary sentiment tasks like Amazon and IMDB, the model focuses on polar semantics. The higher baseline variance compared to DBpedia suggests that while the task is simpler (fewer classes), the model’s sensitivity to the specific nuances of “positive/negative” phrasing in few-shot samples is higher, a challenge that our DAAT module specifically aims to mitigate.
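The Micro-F1 metric used throughout these comparisons pools true positives, false positives, and false negatives over all classes before computing F1; for single-label multi-class prediction it therefore reduces to plain accuracy. A minimal sketch:

```python
def micro_f1(y_true, y_pred):
    """Micro-averaged F1: pool TP/FP/FN across classes before computing F1.
    For single-label multi-class prediction this equals plain accuracy."""
    classes = set(y_true) | set(y_pred)
    tp = fp = fn = 0
    for c in classes:
        tp += sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp += sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn += sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Toy AG's News-style predictions: 3 of 4 correct -> micro-F1 = 0.75.
score = micro_f1(["Sports", "Business", "World", "Sports"],
                 ["Sports", "World", "World", "Sports"])
```

Because every misclassification counts once as a false positive and once as a false negative, pooled precision and recall coincide, which is why the micro average is insensitive to class imbalance in a way the macro average is not.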
4.5. Computational Complexity Analysis
4.6. Ablations
4.7. AMLDA and Conventional Data Augmentation
4.7.1. Comparison with Conventional Data Augmentation
4.7.2. Combination with Conventional Data Augmentation
4.8. Parameter Sensitivity
4.9. Error Analysis and Case Studies
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Alzubaidi, L.; Bai, J.; Al-Sabaawi, A.; Santamaría, J.; Albahri, A.S.; Al-Dabbagh, B.S.; Fadhel, M.A.; Manoufali, M.; Zhang, J.; Al-Timemy, A.H.; et al. A survey on deep learning tools dealing with data scarcity: Definitions, challenges, solutions, tips, and applications. J. Big Data 2023, 10, 46. [Google Scholar] [CrossRef]
- Kotei, E.; Thirunavukarasu, R. A systematic review of transformer-based pre-trained language models through self-supervised learning. Information 2023, 14, 187. [Google Scholar] [CrossRef]
- Zhu, K.; Wang, J.; Zhou, J.; Wang, Z.; Chen, H.; Wang, Y.; Yang, L.; Ye, W.; Zhang, Y.; Gong, N.; et al. Promptrobust: Towards evaluating the robustness of large language models on adversarial prompts. In Proceedings of the 1st ACM Workshop on Large AI Systems and Models with Privacy and Safety Analysis, Salt Lake City, UT, USA, 14–18 October 2024; pp. 57–68. [Google Scholar]
- Schick, T.; Schütze, H. Exploiting Cloze-Questions for Few-Shot Text Classification and Natural Language Inference. In Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics, Online, 19–23 April 2021; pp. 255–269. [Google Scholar] [CrossRef]
- Ding, N.; Hu, S.; Zhao, W.; Chen, Y.; Liu, Z.; Zheng, H.; Sun, M. OpenPrompt: An Open-source Framework for Prompt-learning. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics: System Demonstrations, Dublin, Ireland, 22–27 May 2022; pp. 105–113. [Google Scholar] [CrossRef]
- Ju, T.; Zheng, Y.; Wang, H.; Zhao, H.; Liu, G. Is continuous prompt a combination of discrete prompts? towards a novel view for interpreting continuous prompts. In Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, Toronto, ON, Canada, 9–14 July 2023; pp. 7804–7819. [Google Scholar]
- Chen, Y.; Yang, G.; Wang, D.; Li, D. Eliciting knowledge from language models with automatically generated continuous prompts. Expert Syst. Appl. 2024, 239, 122327. [Google Scholar] [CrossRef]
- Hambardzumyan, K.; Khachatrian, H.; May, J. WARP: Word-level Adversarial ReProgramming. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, Online, 1–6 August 2021; pp. 4921–4933. [Google Scholar] [CrossRef]
- Zhao, Z.; Wallace, E.; Feng, S.; Klein, D.; Singh, S. Calibrate Before Use: Improving Few-shot Performance of Language Models. In Proceedings of the International Conference on Machine Learning, Online, 18–24 July 2021; Volume 139, pp. 12697–12706. [Google Scholar]
- Liu, P.; Yuan, W.; Fu, J.; Jiang, Z.; Hayashi, H.; Neubig, G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Comput. Surv. 2023, 55, 1–35. [Google Scholar] [CrossRef]
- Chen, C.; Shu, K. PromptDA: Label-guided Data Augmentation for Prompt-based Few Shot Learners. In Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics 2023, Dubrovnik, Croatia, 2–6 May 2023; pp. 562–574. [Google Scholar] [CrossRef]
- Schick, T.; Schmid, H.; Schütze, H. Automatically Identifying Words That Can Serve as Labels for Few-Shot Text Classification. In Proceedings of the 28th International Conference on Computational Linguistics, Barcelona, Spain, 8–13 December 2020; pp. 5569–5578. [Google Scholar] [CrossRef]
- Holtzman, A.; West, P.; Shwartz, V.; Choi, Y.; Zettlemoyer, L. Surface Form Competition: Why the Highest Probability Answer Isn’t Always Right. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, Punta Cana, Dominican Republic, 7–11 November 2021; pp. 7038–7051. [Google Scholar] [CrossRef]
- Guo, Y.; Guo, M.; Su, J.; Yang, Z.; Zhu, M.; Li, H.; Qiu, M.; Liu, S.S. Bias in large language models: Origin, evaluation, and mitigation. arXiv 2024, arXiv:2411.10915. [Google Scholar] [CrossRef]
- Wei, J.; Zou, K. EDA: Easy Data Augmentation Techniques for Boosting Performance on Text Classification Tasks. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Hong Kong, China, 3–7 November 2019; pp. 6382–6388. [Google Scholar] [CrossRef]
- Bayer, M.; Kaufhold, M.A.; Buchhold, B.; Keller, M.; Dallmeyer, J.; Reuter, C. Data augmentation in natural language processing: A novel text generation approach for long and short text classifiers. Int. J. Mach. Learn. Cybern. 2023, 14, 135–150. [Google Scholar] [CrossRef]
- Bayer, M.; Kaufhold, M.A.; Reuter, C. A survey on data augmentation for text classification. ACM Comput. Surv. 2022, 55, 1–39. [Google Scholar] [CrossRef]
- Zhou, J.; Zheng, Y.; Tang, J.; Jian, L.; Yang, Z. FlipDA: Effective and Robust Data Augmentation for Few-Shot Learning. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Dublin, Ireland, 22–27 May 2022; pp. 8646–8665. [Google Scholar] [CrossRef]
- Abaskohi, A.; Rothe, S.; Yaghoobzadeh, Y. LM-CPPF: Paraphrasing-Guided Data Augmentation for Contrastive Prompt-Based Few-Shot Fine-Tuning. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), Toronto, ON, Canada, 9–14 July 2023; pp. 670–681. [Google Scholar] [CrossRef]
- Luo, Q.; Liu, L.; Lin, Y.; Zhang, W. Don’t Miss the Labels: Label-semantic Augmented Meta-Learner for Few-Shot Text Classification. In Proceedings of the Findings of Association for Computational Linguistics, Online, 1–6 August 2021; pp. 2773–2782. [Google Scholar]
- Cao, C.; Zhou, F.; Dai, Y.; Wang, J.; Zhang, K. A survey of mix-based data augmentation: Taxonomy, methods, applications, and explainability. ACM Comput. Surv. 2024, 57, 1–38. [Google Scholar] [CrossRef]
- Sui, D.; Chen, Y.; Mao, B.; Qiu, D.; Liu, K.; Zhao, J. Knowledge Guided Metric Learning for Few-Shot Text Classification. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Online, 6–11 June 2021; pp. 3266–3271. [Google Scholar] [CrossRef]
- Zhuo, L.; Wang, Z.; Fu, Y.; Qian, T. Prompt as free lunch: Enhancing diversity in source-free cross-domain few-shot learning through semantic-guided prompting. arXiv 2024, arXiv:2412.00767. [Google Scholar]
- Kumar, V.; Choudhary, A.; Cho, E. Data Augmentation using Pre-trained Transformer Models. In Proceedings of the 2nd Workshop on Life-long Learning for Spoken Language Systems, Suzhou, China, 21 September–9 October 2020; pp. 18–26. [Google Scholar]
- Ma, L.; Liang, L. Adaptive adversarial training to improve adversarial robustness of DNNs for medical image segmentation and detection. arXiv 2022, arXiv:2206.01736. [Google Scholar] [CrossRef]
- Fang, H.; Kong, J.; Yu, W.; Chen, B.; Li, J.; Wu, H.; Xia, S.; Xu, K. One perturbation is enough: On generating universal adversarial perturbations against vision-language pre-training models. arXiv 2024, arXiv:2406.05491. [Google Scholar]
- Zheng, H.; Zhong, Q.; Ding, L.; Tian, Z.; Niu, X.; Wang, C.; Li, D.; Tao, D. Self-Evolution Learning for Mixup: Enhance Data Augmentation on Few-Shot Text Classification Tasks. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, Singapore, 6–10 December 2023; pp. 8964–8974. [Google Scholar] [CrossRef]
- Aljehani, A.; Hasan, S.H.; Khan, U.A. Advancing text classification: A systematic review of few-shot learning approaches. Int. J. Comput. Digit. Syst. 2024, 16, 1–14. [Google Scholar] [CrossRef]
- Chae, Y.; Davidson, T. Large language models for text classification: From zero-shot learning to instruction-tuning. Sociol. Methods Res. 2025. [Google Scholar] [CrossRef]
- Lei, T.; Hu, H.; Luo, Q.; Peng, D.; Wang, X. Adaptive Meta-learner via Gradient Similarity for Few-shot Text Classification. In Proceedings of the 29th International Conference on Computational Linguistics, Gyeongju, Republic of Korea, 12–17 October 2022; pp. 4873–4882. [Google Scholar]
- Vettoruzzo, A.; Bouguelia, M.R.; Rögnvaldsson, T. Multimodal meta-learning through meta-learned task representations. Neural Comput. Appl. 2024, 36, 8519–8529. [Google Scholar] [CrossRef]
- Huang, Z.; Shen, L.; Yu, J.; Han, B.; Liu, T. Flatmatch: Bridging labeled data and unlabeled data with cross-sharpness for semi-supervised learning. Adv. Neural Inf. Process. Syst. 2023, 36, 18474–18494. [Google Scholar]
- Chen, Y.; Mancini, M.; Zhu, X.; Akata, Z. Semi-supervised and unsupervised deep visual learning: A survey. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 46, 1327–1347. [Google Scholar] [CrossRef]
- Wei, X.S.; Xu, H.Y.; Zhang, F.; Peng, Y.; Zhou, W. An embarrassingly simple approach to semi-supervised few-shot learning. Adv. Neural Inf. Process. Syst. 2022, 35, 14489–14500. [Google Scholar]
- Zhu, D.; Shen, X.; Mosbach, M.; Stephan, A.; Klakow, D. Weaker than you think: A critical look at weakly supervised learning. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Toronto, ON, Canada, 9–14 July 2023; pp. 14229–14253. [Google Scholar]
- Park, S.; Lee, J. LIME: Weakly-Supervised Text Classification without Seeds. In Proceedings of the 29th International Conference on Computational Linguistics, Gyeongju, Republic of Korea, 12–17 October 2022; pp. 1083–1088. [Google Scholar]
- Wang, T.; Wang, Z.; Liu, W.; Shang, J. WOT-Class: Weakly Supervised Open-world Text Classification. In Proceedings of the 32nd ACM International Conference on Information and Knowledge Management, Birmingham, UK, 21–25 October 2023; pp. 2666–2675. [Google Scholar]
- Liu, X.; Ji, K.; Fu, Y.; Tam, W.; Du, Z.; Yang, Z.; Tang, J. P-Tuning: Prompt Tuning Can Be Comparable to Fine-tuning Across Scales and Tasks. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), Dublin, Ireland, 22–27 May 2022; pp. 61–68. [Google Scholar]
- Shi, Z. Optimising Language Models for Downstream Tasks: A Post-Training Perspective. arXiv 2025, arXiv:2506.20917. [Google Scholar] [CrossRef]
- Schick, T.; Schütze, H. It’s Not Just Size That Matters: Small Language Models Are Also Few-Shot Learners. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Online, 6–11 June 2021; pp. 2339–2352. [Google Scholar] [CrossRef]
- Chang, K.; Xu, S.; Wang, C.; Luo, Y.; Xiao, T.; Zhu, J. Efficient Prompting Methods for Large Language Models: A Survey. arXiv 2024, arXiv:2404.01077. [Google Scholar] [CrossRef]
- Cohen, Y.; Aperstein, Y. A Review of Generative Pretrained Multi-step Prompting Schemes and a New Multi-step Prompting Framework. Preprints 2024. [Google Scholar]
- Zhu, Y.; Wang, Y.; Mu, J.; Li, Y.; Qiang, J.; Yuan, Y.; Wu, X. Short text classification with Soft Knowledgeable Prompt-tuning. Expert Syst. Appl. 2024, 246, 123248. [Google Scholar] [CrossRef]
- Hu, S.; Ding, N.; Wang, H.; Liu, Z.; Wang, J.; Li, J.; Wu, W.; Sun, M. Knowledgeable Prompt-tuning: Incorporating Knowledge into Prompt Verbalizer for Text Classification. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics, Dublin, Ireland, 22–27 May 2022; pp. 2225–2240. [Google Scholar] [CrossRef]
- Yin, W.; Hay, J.; Roth, D. Benchmarking Zero-shot Text Classification: Datasets, Evaluation and Entailment Approach. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Hong Kong, China, 3–7 November 2019; pp. 3914–3923. [Google Scholar] [CrossRef]
- Ling, T.; Chen, L.; Lai, Y.; Liu, H.L. Evolutionary Verbalizer Search for Prompt-Based Few Shot Text Classification. In Proceedings of the International Conference on Knowledge Science, Engineering and Management, Guangzhou, China, 16–18 August 2023; Springer: Cham, Switzerland, 2023; pp. 279–290. [Google Scholar]
- Gao, T.; Fisch, A.; Chen, D. Making Pre-trained Language Models Better Few-shot Learners. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), Virtual Event, 1–6 August 2021; pp. 3816–3830. [Google Scholar] [CrossRef]
- Yang, Y.; Pedersen, J.O. A comparative study on feature selection in text categorization. In Proceedings of the 14th International Conference on Machine Learning (ICML), Nashville, TN, USA, 8–12 July 1997; Volume 97, pp. 412–420. [Google Scholar]
- Kraskov, A.; Stögbauer, H.; Grassberger, P. Estimating mutual information. Phys. Rev. E 2004, 69, 066138. [Google Scholar] [CrossRef]
- Newman, M.E. Power laws, Pareto distributions and Zipf’s law. Contemp. Phys. 2005, 46, 323–351. [Google Scholar] [CrossRef]
- Kim, M.; Tack, J.; Shin, J.; Hwang, S.J. Entropy weighted adversarial training. In Proceedings of the ICML Workshop, Online, 18–24 July 2021. [Google Scholar]
- Zhang, X.; Zhao, J.; LeCun, Y. Character-level Convolutional Networks for Text Classification. In Proceedings of the Advances in Neural Information Processing Systems, Montreal, QC, Canada, 7–12 December 2015; Curran Associates, Inc.: Red Hook, NY, USA, 2015; Volume 28. [Google Scholar]
- Lehmann, J.; Isele, R.; Jakob, M.; Jentzsch, A.; Kontokostas, D.; Mendes, P.N.; Hellmann, S.; Morsey, M.; Van Kleef, P.; Auer, S.; et al. Dbpedia—A large-scale, multilingual knowledge base extracted from wikipedia. Semant. Web 2015, 6, 167–195. [Google Scholar] [CrossRef]
- Maas, A.L.; Daly, R.E.; Pham, P.T.; Huang, D.; Ng, A.Y.; Potts, C. Learning Word Vectors for Sentiment Analysis. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Portland, OR, USA, 19–24 June 2011; pp. 142–150. [Google Scholar]
- McAuley, J.; Leskovec, J. Hidden factors and hidden topics: Understanding rating dimensions with review text. In Proceedings of the 7th ACM Conference on Recommender Systems, Hong Kong, China, 12–16 October 2013; Association for Computing Machinery: New York, NY, USA, 2013; pp. 165–172. [Google Scholar]
- Houlsby, N.; Giurgiu, A.; Jastrzebski, S.; Morrone, B.; De Laroussilhe, Q.; Gesmundo, A.; Attariyan, M.; Gelly, S. Parameter-efficient transfer learning for NLP. In Proceedings of the International Conference on Machine Learning, PMLR, Long Beach, CA, USA, 9–15 June 2019; pp. 2790–2799. [Google Scholar]
- Lester, B.; Al-Rfou, R.; Constant, N. The Power of Scale for Parameter-Efficient Prompt Tuning. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, Online and Punta Cana, Dominican Republic, 7–11 November 2021; pp. 3045–3059. [Google Scholar]
- Ni, S.; Kao, H.Y. KPT++: Refined knowledgeable prompt tuning for few-shot text classification. Knowl.-Based Syst. 2023, 274, 110647. [Google Scholar] [CrossRef]









| Dataset | Type | Classes | Train Set | Test Set |
|---|---|---|---|---|
| AG’s News | Topic classification | 4 | 120,000 | 7600 |
| DBPedia | Topic classification | 14 | 560,000 | 70,000 |
| Amazon | Sentiment classification | 2 | 20,000 | 10,000 |
| IMDB | Sentiment classification | 2 | 25,000 | 25,000 |
| Shot | Method | AG’s News | DBPedia | Amazon | IMDB |
|---|---|---|---|---|---|
| 1 | Fine-tuning | 19.8 ± 10.4 | 8.6 ± 4.5 | 49.9 ± 0.2 | 50.0 ± 0.0 |
| 1 | Prompt-tuning | 80.0 ± 6.0 | 92.2 ± 2.5 | 91.9 ± 2.7 | 91.2 ± 3.7 |
| 1 | PT + EDA | 79.3 ± 7.6 | 90.3 ± 1.9 | 88.6 ± 3.5 | 89.5 ± 2.4 |
| 1 | KPT | 83.7 ± 3.5 | 93.7 ± 1.8 | 93.2 ± 1.3 | 92.2 ± 3.0 |
| 1 | SKP | 84.3 ± 2.9 | / | 92.6 ± 2.1 | 92.9 ± 1.7 |
| 1 | Ours | 86.1 ± 2.5 | 94.5 ± 1.7 | 92.4 ± 1.2 | 91.2 ± 1.6 |
| 5 | Fine-tuning | 37.9 ± 10.0 | 95.8 ± 1.3 | 52.1 ± 1.3 | 51.4 ± 1.4 |
| 5 | Prompt-tuning | 82.7 ± 2.7 | 97.0 ± 0.6 | 92.2 ± 3.3 | 91.9 ± 3.1 |
| 5 | PT + EDA | 80.6 ± 4.1 | 92.7 ± 2.9 | 89.5 ± 3.1 | 89.8 ± 3.6 |
| 5 | KPT | 85.0 ± 1.2 | 97.1 ± 0.4 | 93.4 ± 1.9 | 92.7 ± 1.5 |
| 5 | SKP | 84.4 ± 1.8 | / | 93.3 ± 1.7 | 93.1 ± 1.6 |
| 5 | Ours | 87.1 ± 1.4 | 98.1 ± 0.6 | 93.7 ± 1.3 | 92.8 ± 1.4 |
| 10 | Fine-tuning | 75.9 ± 8.4 | 93.8 ± 2.2 | 83.0 ± 7.0 | 76.2 ± 8.7 |
| 10 | Prompt-tuning | 84.9 ± 2.4 | 97.6 ± 0.4 | 93.9 ± 1.3 | 93.0 ± 1.7 |
| 10 | PT + EDA | 81.4 ± 3.6 | 93.5 ± 2.4 | 92.7 ± 1.8 | 91.1 ± 3.1 |
| 10 | KPT | 86.3 ± 1.6 | 98.0 ± 0.2 | 93.8 ± 1.2 | 92.9 ± 1.8 |
| 10 | SKP | 86.6 ± 1.2 | / | 94.1 ± 1.2 | 94.1 ± 1.5 |
| 10 | Ours | 89.1 ± 1.1 | 98.3 ± 0.2 | 94.6 ± 0.8 | 93.6 ± 1.3 |
| 20 | Fine-tuning | 85.4 ± 1.8 | 97.9 ± 0.2 | 71.4 ± 4.3 | 78.5 ± 10.1 |
| 20 | Prompt-tuning | 86.5 ± 1.6 | 97.7 ± 0.3 | 93.5 ± 1.0 | 93.0 ± 1.1 |
| 20 | PT + EDA | 84.1 ± 1.4 | 93.6 ± 2.7 | 92.9 ± 1.9 | 92.8 ± 2.0 |
| 20 | KPT | 87.2 ± 0.8 | 98.1 ± 0.3 | 93.7 ± 1.6 | 93.1 ± 1.1 |
| 20 | SKP | 88.0 ± 1.1 | / | 94.8 ± 2.2 | 95.0 ± 1.8 |
| 20 | Ours | 89.8 ± 0.6 | 98.8 ± 0.4 | 95.7 ± 1.3 | 95.1 ± 1.4 |
| Method | AGNews | DBPedia | Amazon | IMDB |
|---|---|---|---|---|
| AMLDA | 89.8 ± 0.6 | 98.8 ± 0.4 | 95.7 ± 1.3 | 95.1 ± 1.4 |
| AMLDA w/o all | 86.5 ± 1.6 | 97.7 ± 0.3 | 93.5 ± 1.0 | 93.0 ± 1.1 |
| AMLDA w/o DAAT | 89.1 ± 1.3 | 98.2 ± 0.6 | 94.5 ± 1.4 | 94.3 ± 2.1 |
| AMLDA w/o PTMD-LDA | 87.3 ± 0.9 | 97.9 ± 0.5 | 94.3 ± 0.8 | 93.2 ± 1.3 |
| AMLDA w/o Classical MI | 88.9 ± 1.1 | 98.3 ± 0.5 | 94.8 ± 1.2 | 93.8 ± 1.5 |
| AMLDA w/o BAVC | 88.6 ± 0.8 | 98.0 ± 0.6 | 94.5 ± 1.3 | 93.5 ± 1.6 |
| Method | AGNews | DBPedia | Amazon | IMDB |
|---|---|---|---|---|
| PT | 86.5 ± 1.6 | 97.7 ± 0.3 | 93.5 ± 1.0 | 93.0 ± 1.1 |
| PT + ConvDA | 84.1 ± 1.4 | 93.6 ± 2.7 | 92.9 ± 1.9 | 92.8 ± 2.0 |
| AMLDA | 89.8 ± 0.6 | 98.8 ± 0.4 | 95.7 ± 1.3 | 95.1 ± 1.4 |
| AMLDA + ConvDA | 89.9 ± 0.5 | 98.8 ± 0.2 | 96.2 ± 1.1 | 95.4 ± 0.8 |
| AMLDA + ConvDA (w/o LES) | 88.6 ± 1.8 | 97.9 ± 0.6 | 94.5 ± 1.3 | 93.2 ± 2.1 |
| Phase | Operation | Sample/Intermediate Result |
|---|---|---|
| Input (Original) | Raw input text with Ground Truth Label | Text: “Michael Phelps won his eighth gold medal at the Beijing Games, breaking Mark Spitz’s record.” Label: Sports |
| Phase 1: BAVC | Bayesian Mutual Information extracts the top-m label words | Adaptive Verbalizer (Top-3): { match, athlete, game } |
| Phase 2: LEG | Label-Enhanced Generator: Constructs label-enriched templates (semantic components) | Template A: “News about { match, athlete, game } and [MASK].” Template B: “This outlines { match, athlete, game } in [MASK].” Template C: “Regarding the { match, athlete, game } [MASK].” |
| Phase 3: PGG | Prompt-Guided Generator: Synthesizes samples with diverse structural positions (Front/Back) relative to input | Candidate 1: “[CLS] News about { match, athlete, game } and [MASK]. [SEP] Michael Phelps won... [SEP]” Candidate 2: “[CLS] Michael Phelps won... [SEP] This outlines { match, athlete, game } in [MASK]. [SEP]” Candidate 3: “[CLS] Regarding the { match, athlete, game } [MASK]. [SEP] Michael Phelps won... [SEP]” |
| Phase 4: LES | Low-Entropy Selector: Sorts candidates by prediction entropy and keeps the lowest-entropy candidates up to the retention size | Candidate 1: Entropy = 0.08 (Pred: Sports) → Rank 1 (Keep) Candidate 2: Entropy = 0.15 (Pred: Sports) → Rank 2 (Keep) Candidate 3: Entropy = 0.92 (Pred: Ambiguous) → Rank 3 (Discard) (Reason: High entropy indicates the template “Regarding...” introduced semantic ambiguity.) |
| Final Sample | Final Augmented Training Set for this instance | { Original Input, Candidate 1, Candidate 2 } |
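The Phase 4 selection step above can be sketched directly: compute the Shannon entropy of the model's predicted class distribution for each augmented candidate, rank, and keep the most confident ones. The candidate texts and probabilities below are illustrative, not the paper's actual model outputs.

```python
import math

def entropy(probs):
    """Shannon entropy (in nats) of a predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def low_entropy_select(candidates, k):
    """Keep the k augmented candidates whose predicted class distribution
    has the lowest entropy, i.e., where the model is most confident.
    candidates: list of (text, class_probs) pairs."""
    ranked = sorted(candidates, key=lambda c: entropy(c[1]))
    return [text for text, _ in ranked[:k]]

# Confident predictions (Candidates 1-2) survive; the ambiguous one is dropped.
cands = [
    ("Candidate 1", [0.97, 0.01, 0.01, 0.01]),
    ("Candidate 2", [0.94, 0.03, 0.02, 0.01]),
    ("Candidate 3", [0.40, 0.35, 0.15, 0.10]),
]
kept = low_entropy_select(cands, k=2)
```

Low entropy is a proxy for label fidelity here: a template that confuses the model about its own label (high entropy) is assumed to have broken the sample's semantics, matching the discard rationale in the table.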
| Misclassified Sample | Label | Prediction |
|---|---|---|
| Briefly: China interest in key Yukos unit China is interested in participating in the bidding for Yuganskneftegaz, the top oil-producing subsidiary of the Russian oil giant Yukos, a Chinese economic official was quoted as saying in a report Thursday by the Russian news agency Interfax. (AG’s News) | Business | Politics |
| Security scare as intruder dives in A CANADIAN husband’s love for his wife has led to a tightening of security at all Olympic venues in Athens. (AG’s News) | Sports | Politics |
| Disturbing readings.... This collection is terribly read, especially the woman’s voice, the strange crying tune she had bothered me so much that none of the words registered. If you like to buy an audio reading of poems, I highly recommend the collection producted by BBC, it is so far the best. (Amazon) | Negative | Positive |
| It was. This is a book about adoption. The subject description that I see listed for this book is WRONG at this time! This book is actually about: “Mary Bradford Clark, the author, is an adoptee. She gives birth to her daughter at 18, and places her for adoption. The daughter, Kathy, searches for Mary and they are reunited. Unfortunately, Kathy’s adoptive parents were not the greatest people, and her life was difficult. It all ends up with Mary ( her birthmom ) ADOPTING Kathy!” (Amazon) | Positive | Negative |
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Share and Cite
Huang, D.; Li, Z.; Yu, J.; Zhou, Y. Prompt-Based Few-Shot Text Classification with Multi-Granularity Label Augmentation and Adaptive Verbalizer. Information 2026, 17, 58. https://doi.org/10.3390/info17010058

