P-Distill: Efficient and Effective Prompt Tuning Using Knowledge Distillation
Abstract
1. Introduction
- We propose P-Distill, a method that compresses continuous prompts, mitigating the reduction in usable sequence length that long prompts impose in prompt-based learning.
- We introduce a prompt distillation method that transfers knowledge from the teacher model's prompt-influenced hidden states and predictions, together with a prompt initialization scheme that stabilizes distillation (a minimal sketch of both steps follows this list).
- We validate P-Distill on multiple NLP benchmarks, showing that it maintains or improves accuracy while shortening prompts by up to eight times.
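The following is a minimal, PyTorch-style sketch of the two steps named above. It is an illustration under assumptions, not the authors' implementation: the names (`PromptInitializer`, `p_distill_loss`), the use of a learnable linear projection over the prompt-length dimension for initialization, and the specific KL/MSE formulation of the two distillation terms are ours; the paper's exact equations may differ.

```python
# Illustrative sketch only (not the authors' released code). Assumes a trained
# teacher prompt of length l_T and a shorter student prompt of length l_S,
# both with hidden size d.
import torch
import torch.nn as nn
import torch.nn.functional as F


class PromptInitializer(nn.Module):
    """Hypothetical prompt initialization: compress a teacher prompt (l_T x d)
    into a student prompt (l_S x d) with a learnable weight matrix and bias
    applied along the length dimension."""

    def __init__(self, teacher_len: int, student_len: int):
        super().__init__()
        self.proj = nn.Linear(teacher_len, student_len)

    def forward(self, teacher_prompt: torch.Tensor) -> torch.Tensor:
        # (l_T, d) -> (d, l_T) -> (d, l_S) -> (l_S, d)
        return self.proj(teacher_prompt.transpose(0, 1)).transpose(0, 1)


def p_distill_loss(student_logits, teacher_logits,
                   student_hidden, teacher_hidden,
                   labels, alpha=1.0, beta=1.0, temperature=1.0):
    """Hypothetical combined objective: task cross-entropy plus prediction-layer
    (KL divergence) and hidden-state (MSE) distillation, weighted by alpha/beta."""
    task_loss = F.cross_entropy(student_logits, labels)
    pred_loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)
    hidden_loss = F.mse_loss(student_hidden, teacher_hidden)
    return task_loss + alpha * pred_loss + beta * hidden_loss
```

In this sketch, the student's shorter prompt would start from `PromptInitializer(l_T, l_S)(teacher_prompt)` and then be trained with `p_distill_loss` against the frozen, prompt-tuned teacher; whether the projection is trained jointly or used only once for initialization is left open here.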
2. Preliminaries
2.1. Pre-Trained Language Models Based on the Transformer
2.2. Prompt-Based Learning Methods
2.3. Knowledge Distillation
3. Methodology
3.1. Prompt-Based Teacher Learning
3.2. Prompt-Enhanced Distillation (P-Distill)
3.2.1. Prompt Initialization
3.2.2. Prompt Distillation
3.3. Distillation-Based Student Learning
4. Experiments
4.1. Datasets
4.2. Baselines
4.3. Experimental Details
4.4. Results
4.4.1. Results on SuperGLUE
4.4.2. Results on Other Tasks
4.4.3. Impact of Compression Ratio
4.5. Ablation Study
4.6. Impact of Prompt Initialization
4.7. Experimental Results of Applying P-Distill to P-Tuning Methodology
4.8. Inference Costs
4.9. Qualitative Analysis in Long-Sequence Classification
5. Conclusions
6. Limitations
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
| Symbol | Description |
|---|---|
| x | Input sequence or data point |
| Q, K, V | Query, key, and value matrices in the attention mechanism |
| d_k | Dimensionality of key vectors; scaling factor for gradient stabilization |
| softmax | Softmax function used for normalization in attention |
| P_K, P_V | Continuous prompts added to key and value vectors |
| | Length of continuous prompts |
| d | Dimensionality of hidden states and key/value vectors |
| | Behavior functions of teacher and student models in knowledge distillation |
| L_KD | Knowledge distillation loss function |
| X | Dataset used for training or evaluation |
| L | Loss function (e.g., KL divergence, MSE) |
| | Prompt length of teacher model |
| | Prompt length of student model |
| | Learnable weight matrices for prompt initialization |
| | Bias terms for prompt initialization |
| L_CE | Cross-entropy loss function |
| L_pred | Prediction-layer distillation loss |
| L_hidden | Hidden-state distillation loss |
| | Weighted coefficients for loss combination |
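For orientation, the symbols above typically combine as in the standard prefix-tuning attention and a weighted distillation objective. The following is a hedged reconstruction under that assumption: the coefficient names α and β and the symbol h for the attention output are conventional choices, not necessarily the paper's exact notation.

```latex
% Scaled dot-product attention with continuous prompts prepended to keys and values
\mathrm{Attn}(Q, K, V) = \mathrm{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V,
\qquad
h = \mathrm{Attn}\bigl(Q,\; [P_K ; K],\; [P_V ; V]\bigr)

% Student training objective: task loss plus weighted distillation terms
\mathcal{L} = \mathcal{L}_{\mathrm{CE}} + \alpha \, \mathcal{L}_{\mathrm{pred}} + \beta \, \mathcal{L}_{\mathrm{hidden}}
```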
| | BoolQ Acc. | CB Acc. | COPA Acc. | MultiRC F1a | ReCoRD F1 | RTE Acc. | WiC Acc. | WSC Acc. | Average |
|---|---|---|---|---|---|---|---|---|---|
| Fine-tuning | 0.777 | 0.946 | 0.710 | 0.705 | 0.706 | 0.762 | 0.746 | 0.683 | 0.754 |
| P-tuning v2 | 0.764(8) | 0.946(32) | 0.810(4) | 0.711(16) | 0.728(16) | 0.794(4) | 0.756(4) | 0.731(16) | 0.780 |
| P-tuning v2 (short prompt) | 0.738(1) | 0.929(4) | 0.790(1) | 0.707(2) | 0.721(2) | 0.783(1) | 0.745(1) | 0.692(2) | 0.763 |
| P-Distill | 0.776(1) | 0.964(4) | 0.810(1) | 0.718(2) | 0.726(2) | 0.798(1) | 0.759(1) | 0.721(2) | 0.784 |

Numbers in parentheses denote the prompt length used for each task.
| | CoNLL03 (NER) | CoNLL04 (NER) | OntoNotes 5.0 (NER) | CoNLL05 Brown (SRL) | CoNLL05 WSJ (SRL) | SQuAD 1.1 Dev (QA) | SQuAD 2.0 Dev (QA) | Allsides (SC) |
|---|---|---|---|---|---|---|---|---|
| Fine-tuning | 0.928 | 0.882 | 0.890 | 0.827 | 0.885 | 0.911 | 0.819 | 0.780 |
| P-tuning v2 | 0.919(64) | 0.880(128) | 0.885(128) | 0.837(32) | 0.890(128) | 0.902(64) | 0.782(128) | 0.775(32) |
| P-tuning v2 (short prompt) | 0.914(8) | 0.866(16) | 0.881(16) | 0.807(4) | 0.877(16) | 0.891(8) | 0.771(16) | 0.772(4) |
| P-Distill | 0.919(8) | 0.888(16) | 0.886(16) | 0.817(4) | 0.885(16) | 0.896(8) | 0.775(16) | 0.783(4) |
| CoNLL05 WSJ | Fine-Tuning | P-Tuning v2 | P-Distill |
|---|---|---|---|
| | 0.885 | 0.890(128) | 0.890(64) |
| | | | 0.888(32) |
| | | | 0.885(16) |
| | CoNLL03 | CoNLL04 | CoNLL05 WSJ | CoNLL05 Brown |
|---|---|---|---|---|
| P-Distill | 0.919 | 0.888 | 0.885 | 0.817 |
| P-Distill (mean) | 0.915 | 0.875 | 0.878 | 0.809 |
| P-Distill (max) | 0.912 | 0.872 | 0.872 | 0.803 |
| | CB | COPA | RTE |
|---|---|---|---|
| Fine-tuning | 0.946 | 0.710 | 0.762 |
| P-tuning | 0.821(16) | 0.760(16) | 0.657(16) |
| P-tuning (short prompt) | 0.786(2) | 0.700(2) | 0.621(4) |
| P-Distill | 0.821(2) | 0.760(2) | 0.646(4) |
| Task | Fine-Tuning | P-Tuning v2 | P-Distill |
|---|---|---|---|
| BoolQ | 89.06 | 89.17(8) | 89.07(1) |
| CB | 53.94 | 54.22(32) | 53.97(4) |
| COPA | 21.82 | 21.83(4) | 21.82(1) |
| MultiRC | 226.94 | 227.50(16) | 227.01(2) |
| ReCoRD | 163.12 | 163.53(16) | 163.17(2) |
| RTE | 42.78 | 42.81(4) | 42.79(1) |
| WiC | 18.83 | 18.84(4) | 18.83(1) |
| WSC | 23.11 | 23.17(4) | 23.11(1) |
P-tuning v2 Input Text (480 tokens): President Trump on Wednesday lashed out over a critical news report and escalated his previous attacks on the media by suggesting that news organizations he disagrees with be shut down, alarming free-speech advocates who compared the tactics to intimidation efforts by the Nixon administration. […] Last week, angered by the ongoing investigations into his campaign’s ties to Russia, Trump suggested that the Senate Intelligence Committee investigate news outlets over “fake news”. Over the weekend, he expressed disdain at late-night television hosts over their “anti-Trump” material and proposed bringing back the Fairness Doctrine, a rule phased out in 1987 that had required

Prediction: Left

P-Distill Input Text (508 tokens): President Trump on Wednesday lashed out over a critical news report and escalated his previous attacks on the media by suggesting that news organizations he disagrees with be shut down, alarming free-speech advocates who compared the tactics to intimidation efforts by the Nixon administration. […] Last week, angered by the ongoing investigations into his campaign’s ties to Russia, Trump suggested that the Senate Intelligence Committee investigate news outlets over “fake news”. Over the weekend, he expressed disdain at late-night television hosts over their “anti-Trump” material and proposed bringing back the Fairness Doctrine, a rule phased out in 1987 that had required broadcasters to provide “equal time” for divergent political views on certain issues. First Amendment advocates roundly condemned the president over his remarks, calling them an assault

Prediction: Center
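The difference in usable input length above is consistent with simple arithmetic. The check below is ours, assuming a 512-token maximum input, ignoring special tokens, and using the Allsides prompt lengths reported in the earlier table (32 for P-tuning v2, 4 for P-Distill).

```latex
% Usable input length = maximum sequence length - continuous prompt length
512 - 32 = 480 \quad \text{(P-tuning v2 on Allsides, prompt length 32)} \\
512 - 4 = 508 \quad \text{(P-Distill on Allsides, prompt length 4)}
```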