MKDAT: Multi-Level Knowledge Distillation with Adaptive Temperature for Distantly Supervised Relation Extraction
Abstract
1. Introduction
- To the best of our knowledge, this paper is the first to propose hardening labels in the course of softening them in knowledge distillation; a minimal sketch of this adaptive-temperature idea follows this list.
- We explored combining multi-instance learning with knowledge distillation to train the teacher and student models at the bag level, the sentence level, or both.
- We explored the effects of multi-level training modes for the teacher and student models in knowledge distillation.
- We implemented three distillation instances based on the CNN, PCNN, and ATT-BiLSTM neural networks, all of which outperformed their corresponding baselines.
- Our distilled student models, trained on distantly supervised data, achieved consistent improvements in sentence-level evaluation performance on manually corrected distantly supervised data.
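To make the adaptive-temperature idea above concrete, here is a minimal sketch, not the paper's released code, of temperature-scaled distillation in which each instance's temperature is derived from the entropy of the teacher's prediction: confident predictions get T < 1 and are sharpened (label hardening), while uncertain ones get T > 1 and are smoothed (label softening). The function names (`adaptive_temperature`, `distill_loss`), the linear entropy-to-temperature mapping, and the loss weighting `alpha` are illustrative assumptions, not MKDAT's exact formulation.

```python
# Minimal sketch of knowledge distillation with a per-instance adaptive
# temperature. Assumption: the temperature is a linear function of the
# normalized entropy of the teacher's prediction, so confident teachers
# harden the soft labels (T < 1) and uncertain teachers soften them (T > 1).
import torch
import torch.nn.functional as F


def adaptive_temperature(teacher_logits, t_min=0.5, t_max=2.0):
    """Map each instance's normalized prediction entropy to a temperature in [t_min, t_max]."""
    probs = F.softmax(teacher_logits, dim=-1)
    entropy = -(probs * probs.clamp_min(1e-12).log()).sum(dim=-1)
    max_entropy = torch.log(torch.tensor(float(teacher_logits.size(-1))))
    return t_min + (t_max - t_min) * entropy / max_entropy  # shape: (batch,)


def distill_loss(student_logits, teacher_logits, labels, alpha=0.5):
    """Hard-label cross-entropy plus KL divergence to the re-tempered teacher."""
    T = adaptive_temperature(teacher_logits).unsqueeze(-1)  # (batch, 1)
    soft_targets = F.softmax(teacher_logits.detach() / T, dim=-1)  # T<1 hardens, T>1 softens
    log_student = F.log_softmax(student_logits / T, dim=-1)
    kd = F.kl_div(log_student, soft_targets, reduction="batchmean")
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1 - alpha) * ce
```

In a multi-level setting such as the one the contributions describe, a loss of this form would be computed at the bag level, the sentence level, or both, with the distantly supervised bag label serving as `labels`; the sketch covers only a single level.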
2. Related Work
2.1. Distantly Supervised Relation Extraction
2.2. Knowledge Distillation for DSRE
3. Method
3.1. Task Definition
3.2. Preliminary
3.3. Method Overview
3.4. Adaptive Temperature Regulation
3.5. Multi-Level Knowledge Distilling
3.6. Loss and Training
3.7. Implementation of MKDAT
4. Experiments
4.1. Datasets
4.2. Comparative Models and Evaluation Metrics
4.3. Experimental Settings
5. Results and Analysis
5.1. Main Results
5.2. Ablation Study
5.3. Case Study
6. Conclusions
6.1. Theoretical and Practical Implications
6.2. Summary and Future Work
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Mintz, M.; Bills, S.; Snow, R.; Jurafsky, D. Distant supervision for relation extraction without labeled data. In Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP, Suntec, Singapore, 2–7 August 2009; pp. 1003–1011.
- Auer, S.; Bizer, C.; Kobilarov, G.; Lehmann, J.; Cyganiak, R.; Ives, Z. DBpedia: A nucleus for a web of open data. In The Semantic Web; Springer: Berlin/Heidelberg, Germany, 2007; pp. 722–735.
- Bollacker, K.; Evans, C.; Paritosh, P.; Sturge, T.; Taylor, J. Freebase: A collaboratively created graph database for structuring human knowledge. In Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data, Vancouver, BC, Canada, 10–12 June 2008; pp. 1247–1250.
- Jiang, H.; Cui, L.; Xu, Z.; Yang, D.; Chen, J.; Li, C.; Liu, J.; Liang, J.; Wang, C.; Xiao, Y.; et al. Relation Extraction Using Supervision from Topic Knowledge of Relation Labels. In Proceedings of the IJCAI, Macao, China, 10–16 August 2019; pp. 5024–5030.
- Zhang, N.; Deng, S.; Sun, Z.; Wang, G.; Chen, X.; Zhang, W.; Chen, H. Long-tail Relation Extraction via Knowledge Graph Embeddings and Graph Convolution Networks. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), Minneapolis, MN, USA, 2–7 June 2019; pp. 3016–3025.
- Li, Y.; Long, G.; Shen, T.; Zhou, T.; Yao, L.; Huo, H.; Jiang, J. Self-attention enhanced selective gate with entity-aware embedding for distantly supervised relation extraction. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; Volume 34, pp. 8269–8276.
- Lin, X.; Liu, T.; Jia, W.; Gong, Z. Distantly Supervised Relation Extraction using Multi-Layer Revision Network and Confidence-based Multi-Instance Learning. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, Online and Punta Cana, Dominican Republic, 7–11 November 2021; pp. 165–174.
- Riedel, S.; Yao, L.; McCallum, A. Modeling relations and their mentions without labeled text. In Proceedings of the Joint European Conference on Machine Learning and Knowledge Discovery in Databases, Barcelona, Spain, 20–24 September 2010; Springer: Berlin/Heidelberg, Germany, 2010; pp. 148–163.
- Wu, Y.; Bamman, D.; Russell, S. Adversarial training for relation extraction. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, Copenhagen, Denmark, 7–11 September 2017; pp. 1778–1783.
- Feng, J.; Huang, M.; Zhao, L.; Yang, Y.; Zhu, X. Reinforcement learning for relation classification from noisy data. In Proceedings of the AAAI Conference on Artificial Intelligence, New Orleans, LA, USA, 2–7 February 2018; Volume 32.
- He, Z.; Chen, W.; Wang, Y.; Zhang, W.; Wang, G.; Zhang, M. Improving neural relation extraction with positive and unlabeled learning. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; Volume 34, pp. 7927–7934.
- Chen, J.; Guo, Z.; Yang, J. Distant Supervision for Relation Extraction via Noise Filtering. In Proceedings of the 2021 13th International Conference on Machine Learning and Computing, Shenzhen, China, 26 February–1 March 2021; pp. 361–367.
- Shang, Y.; Huang, H.Y.; Mao, X.L.; Sun, X.; Wei, W. Are noisy sentences useless for distant supervised relation extraction? In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; Volume 34, pp. 8799–8806.
- Tang, S.; Zhang, J.; Zhang, N.; Wu, F.; Xiao, J.; Zhuang, Y. ENCORE: External neural constraints regularized distant supervision for relation extraction. In Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, Tokyo, Japan, 7–11 August 2017; pp. 1113–1116.
- Lei, K.; Chen, D.; Li, Y.; Du, N.; Yang, M.; Fan, W.; Shen, Y. Cooperative denoising for distantly supervised relation extraction. In Proceedings of the 27th International Conference on Computational Linguistics, Santa Fe, NM, USA, 20–26 August 2018; pp. 426–436.
- Li, R.; Yang, C.; Li, T.; Su, S. MiDTD: A Simple and Effective Distillation Framework for Distantly Supervised Relation Extraction. ACM Trans. Inf. Syst. (TOIS) 2022, 40, 1–32.
- Zeng, D.; Liu, K.; Chen, Y.; Zhao, J. Distant Supervision for Relation Extraction via Piecewise Convolutional Neural Networks. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, Lisbon, Portugal, 17–21 September 2015; pp. 1753–1762.
- Shi, G.; Feng, C.; Huang, L.; Zhang, B.; Ji, H.; Liao, L.; Huang, H. Genre Separation Network with Adversarial Training for Cross-genre Relation Extraction. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, 31 October–4 November 2018; pp. 1018–1023.
- Zeng, X.; He, S.; Liu, K.; Zhao, J. Large scaled relation extraction with reinforcement learning. In Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, New Orleans, LA, USA, 2–7 February 2018.
- Zhao, H.; Li, R.; Li, X.; Tan, H. CFSRE: Context-aware based on frame-semantics for distantly supervised relation extraction. Knowl.-Based Syst. 2020, 210, 106480.
- Liu, T.; Wang, K.; Chang, B.; Sui, Z. A soft-label method for noise-tolerant distantly supervised relation extraction. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, Copenhagen, Denmark, 7–11 September 2017; pp. 1790–1795.
- Wu, S.; Fan, K.; Zhang, Q. Improving distantly supervised relation extraction with neural noise converter and conditional optimal selector. In Proceedings of the AAAI Conference on Artificial Intelligence, Honolulu, HI, USA, 27 January–1 February 2019; Volume 33, pp. 7273–7280.
- Ye, Z.X.; Ling, Z.H. Distant Supervision Relation Extraction with Intra-Bag and Inter-Bag Attentions. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), Minneapolis, MN, USA, 2–7 June 2019; pp. 2810–2819.
- Yuan, Y.; Liu, L.; Tang, S.; Zhang, Z.; Zhuang, Y.; Pu, S.; Wu, F.; Ren, X. Cross-relation cross-bag attention for distantly-supervised relation extraction. In Proceedings of the AAAI Conference on Artificial Intelligence, Honolulu, HI, USA, 27 January–1 February 2019; Volume 33, pp. 419–426.
- Wang, J.; Liu, Q. Distant supervised relation extraction with position feature attention and selective bag attention. Neurocomputing 2021, 461, 552–561.
- Shen, S.; Duan, S.; Gao, H.; Qi, G. Improved distant supervision relation extraction based on edge-reasoning hybrid graph model. J. Web Semant. 2021, 70, 100656.
- Huang, W.; Mao, Y.; Yang, L.; Yang, Z.; Long, J. Local-to-global GCN with knowledge-aware representation for distantly supervised relation extraction. Knowl.-Based Syst. 2021, 234, 107565.
- Zhou, Q.; Zhang, Y.; Ji, D. Distantly supervised relation extraction with KB-enhanced reconstructed latent iterative graph networks. Knowl.-Based Syst. 2022, 260, 110108.
- Alt, C.; Hübner, M.; Hennig, L. Fine-tuning pre-trained transformer language models to distantly supervised relation extraction. arXiv 2019, arXiv:1906.08646.
- Peng, T.; Han, R.; Cui, H.; Yue, L.; Han, J.; Liu, L. Distantly supervised relation extraction using global hierarchy embeddings and local probability constraints. Knowl.-Based Syst. 2022, 235, 107637.
- Gou, Y.; Lei, Y.; Liu, L.; Zhang, P.; Peng, X. A dynamic parameter enhanced network for distant supervised relation extraction. Knowl.-Based Syst. 2020, 197, 105912.
- Hinton, G.; Vinyals, O.; Dean, J. Distilling the knowledge in a neural network. arXiv 2015, arXiv:1503.02531.
- Kang, M.; Kang, S. Data-free knowledge distillation in neural networks for regression. Expert Syst. Appl. 2021, 175, 114813.
- Jose, A.; Shetty, S.D. DistilledCTR: Accurate and scalable CTR prediction model through model distillation. Expert Syst. Appl. 2022, 193, 116474.
- Li, Y.; Yang, J.; Song, Y.; Cao, L.; Luo, J.; Li, L.J. Learning from noisy labels with distillation. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 1910–1918.
- Sarfraz, F.; Arani, E.; Zonooz, B. Knowledge distillation beyond model compression. In Proceedings of the 2020 25th International Conference on Pattern Recognition (ICPR), Milan, Italy, 10–15 January 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 6136–6143.
- Zhou, H.; Song, L.; Chen, J.; Zhang, Y.; Wang, G.; Yuan, J.; Zhang, Q. Rethinking soft labels for knowledge distillation: A bias-variance tradeoff perspective. In Proceedings of the International Conference on Learning Representations (ICLR), Virtual Event, Austria, 3–7 May 2021.
- Furlanello, T.; Lipton, Z.; Tschannen, M.; Itti, L.; Anandkumar, A. Born again neural networks. In Proceedings of the International Conference on Machine Learning, Stockholm, Sweden, 10–15 July 2018; PMLR: New York, NY, USA, 2018; pp. 1607–1616.
- Pennington, J.; Socher, R.; Manning, C.D. GloVe: Global vectors for word representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), Doha, Qatar, 25–29 October 2014; pp. 1532–1543.
- Zhu, T.; Wang, H.; Yu, J.; Zhou, X.; Chen, W.; Zhang, W.; Zhang, M. Towards Accurate and Consistent Evaluation: A Dataset for Distantly-Supervised Relation Extraction. In Proceedings of the 28th International Conference on Computational Linguistics, Barcelona, Spain, 8–13 December 2020; pp. 6436–6447.
- Sepahvand, M.; Abdali-Mohammadi, F.; Taherkordi, A. Teacher–student knowledge distillation based on decomposed deep feature representation for intelligent mobile applications. Expert Syst. Appl. 2022, 202, 117474.
- Tzelepi, M.; Passalis, N.; Tefas, A. Online subclass knowledge distillation. Expert Syst. Appl. 2021, 181, 115132.
| Hyperparameter | CNN | PCNN | BLSTM |
|---|---|---|---|
| | 9 | 6 | - |
| | 256 | 128 | - |
| | - | - | 128 |
| | - | - | |
| | - | - | |
| | 32 | 64 | 64 |
| Model | F1 (DSGT), Bag | F1 (DSGT), Sentence | F1 (MAGT), Bag | F1 (MAGT), Sentence |
|---|---|---|---|---|
| CNN * | 54.707 | - | 38.989 | - |
| CR-CNN * | 60.953 | - | 45.060 | - |
| PCNN * | 57.687 | - | 44.011 | - |
| ATT-BLSTM * | 60.165 | - | 45.928 | - |
| CNN + ONE * | 32.711 | 45.325 | 31.695 | 39.539 |
| CNN + ATT * | 30.976 | 31.660 | 30.488 | 26.433 |
| PCNN + ONE * | 33.893 | 44.299 | 34.981 | 41.843 |
| PCNN + ATT * | 31.913 | 34.879 | 32.367 | 28.805 |
| CNN † | 75.178 | 67.699 | 62.789 | 59.043 |
| PCNN † | 76.344 | 67.982 | 62.930 | 60.206 |
| BLSTM † | 76.819 | 69.284 | 63.178 | 60.789 |
| CNN- | 77.788 | 69.379 | 63.513 | 60.644 |
| PCNN- | 77.206 | 70.766 | 64.297 | 62.085 |
| BLSTM- | 78.085 | 68.982 | 64.699 | 60.319 |
| CNN- | 78.043 | 70.439 | 64.040 | 60.746 |
| PCNN- | 77.994 | 71.817 | 65.040 | 62.714 |
| BLSTM- | 78.971 | 69.909 | 65.279 | 61.173 |
| Model | F1 (MAGT), Bag | F1 (MAGT), Sentence |
|---|---|---|
| Full PCNN- | 64.043 | 61.919 |
| - No Temp. | 62.782 | 60.184 |
| - Fixed Temp. | 63.518 | 60.971 |
| - Fixed Anchor | 63.820 | 61.269 |
| Full PCNN- | 64.957 | 62.259 |
| - Only Sent. | 63.178 | 60.439 |
| - Averaged Sent. | 63.597 | 61.032 |
| - Only Bag | 64.191 | 61.723 |
Long, J.; Yin, Z.; Han, Y.; Huang, W. MKDAT: Multi-Level Knowledge Distillation with Adaptive Temperature for Distantly Supervised Relation Extraction. Information 2024, 15, 382. https://doi.org/10.3390/info15070382