Multi-Head Self-Attention-Enhanced Prototype Network with Contrastive–Center Loss for Few-Shot Relation Extraction
Abstract
1. Introduction
2. Related Work
2.1. Relation Extraction
2.2. Few-Shot Learning
2.3. Few-Shot Relation Extraction
3. Problem Formulation
4. Methodology
4.1. Framework
4.2. Sentence Encoder
4.2.1. Sentence Representations
4.2.2. Relation Representations
4.3. Prototype Enhancement Module
4.3.1. Basic Prototype
4.3.2. Enhanced Prototype
4.3.3. Final Prototype
4.4. Contrastive–Center Loss
5. Experimental Settings
5.1. Dataset
5.2. Baselines
5.3. Implementation Details
5.4. Main Results
5.5. Domain Adaptation Results
5.6. Ablation Study
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Lauriola, I.; Lavelli, A.; Aiolli, F. An introduction to deep learning in natural language processing: Models, techniques, and tools. Neurocomputing 2022, 470, 443–456.
- Xiao, G.; Corman, J. Ontology-Mediated SPARQL Query Answering over Knowledge Graphs. Big Data Res. 2021, 23, 100177.
- Garcia, X.; Bansal, Y.; Cherry, C.; Foster, G.; Krikun, M.; Johnson, M.; Firat, O. The Unreasonable Effectiveness of Few-shot Learning for Machine Translation. In Proceedings of the 40th International Conference on Machine Learning, Honolulu, HI, USA, 23–29 July 2023; Volume 202, pp. 10867–10878.
- Lawrie, D.; Yang, E.; Oard, D.W.; Mayfield, J. Neural Approaches to Multilingual Information Retrieval. In Advances in Information Retrieval; Kamps, J., Goeuriot, L., Crestani, F., Maistro, M., Joho, H., Davis, B., Gurrin, C., Kruschwitz, U., Caputo, A., Eds.; Springer: Cham, Switzerland, 2023; pp. 521–536.
- Wang, Y.; Ma, W.; Zhang, M.; Liu, Y.; Ma, S. A Survey on the Fairness of Recommender Systems. ACM Trans. Inf. Syst. 2023, 41, 1–43.
- Mintz, M.; Bills, S.; Snow, R.; Jurafsky, D. Distant supervision for relation extraction without labeled data. In Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP, Suntec, Singapore, 2–7 August 2009; pp. 1003–1011.
- Ye, Q.; Liu, L.; Zhang, M.; Ren, X. Looking Beyond Label Noise: Shifted Label Distribution Matters in Distantly Supervised Relation Extraction. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Hong Kong, China, 3–7 November 2019; pp. 3841–3850.
- Zhang, N.; Deng, S.; Sun, Z.; Wang, G.; Chen, X.; Zhang, W.; Chen, H. Long-tail Relation Extraction via Knowledge Graph Embeddings and Graph Convolution Networks. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), Minneapolis, MN, USA, 2–7 June 2019; pp. 3016–3025.
- Luo, X.; Zhou, W.; Wang, W.; Zhu, Y.; Deng, J. Attention-Based Relation Extraction With Bidirectional Gated Recurrent Unit and Highway Network in the Analysis of Geological Data. IEEE Access 2018, 6, 5705–5715.
- Li, Y.; Long, G.; Shen, T.; Zhou, T.; Yao, L.; Huo, H.; Jiang, J. Self-attention enhanced selective gate with entity-aware embedding for distantly supervised relation extraction. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; Volume 34, pp. 8269–8276.
- Lin, X.; Liu, T.; Jia, W.; Gong, Z. Distantly Supervised Relation Extraction using Multi-Layer Revision Network and Confidence-based Multi-Instance Learning. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, Punta Cana, Dominican Republic, 7–11 November 2021; pp. 165–174.
- Augenstein, I.; Maynard, D.; Ciravegna, F. Relation Extraction from the Web Using Distant Supervision. In Knowledge Engineering and Knowledge Management; Janowicz, K., Schlobach, S., Lambrix, P., Hyvönen, E., Eds.; Springer: Cham, Switzerland, 2014; pp. 26–41.
- Sun, Q.; Liu, Y.; Chua, T.S.; Schiele, B. Meta-Transfer Learning for Few-Shot Learning. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 403–412.
- Lee, H.y.; Li, S.W.; Vu, T. Meta Learning for Natural Language Processing: A Survey. In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Seattle, WA, USA, 10–15 July 2022; pp. 666–684.
- Mettes, P.; van der Pol, E.; Snoek, C.G.M. Hyperspherical Prototype Networks. In Proceedings of the 33rd International Conference on Neural Information Processing Systems, Vancouver, BC, Canada, 8–14 December 2019; Curran Associates Inc.: Red Hook, NY, USA, 2019.
- Yang, K.; Zheng, N.; Dai, X.; He, L.; Huang, S.; Chen, J. Enhance prototypical network with text descriptions for few-shot relation classification. In Proceedings of the 29th ACM International Conference on Information & Knowledge Management, Shanghai, China, 19–23 October 2020; pp. 2273–2276.
- Han, J.; Cheng, B.; Lu, W. Exploring Task Difficulty for Few-Shot Relation Extraction. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, Punta Cana, Dominican Republic, 7–11 November 2021; pp. 2605–2616.
- Liu, Y.; Hu, J.; Wan, X.; Chang, T.H. A Simple yet Effective Relation Information Guided Approach for Few-Shot Relation Extraction. In Findings of the Association for Computational Linguistics: ACL 2022, Dublin, Ireland, 22–27 May 2022; pp. 757–763.
- Liu, Y.; Hu, J.; Wan, X.; Chang, T.H. Learn from Relation Information: Towards Prototype Representation Rectification for Few-Shot Relation Extraction. In Findings of the Association for Computational Linguistics: NAACL 2022, Seattle, WA, USA, 10–15 July 2022; pp. 1822–1831.
- Wen, M.; Xia, T.; Liao, B.; Tian, Y. Few-shot relation classification using clustering-based prototype modification. Knowl.-Based Syst. 2023, 268, 110477.
- Zelenko, D.; Aone, C.; Richardella, A. Kernel Methods for Relation Extraction. J. Mach. Learn. Res. 2003, 3, 1083–1106.
- Deng, B.; Fan, X.; Yang, L. Entity relation extraction method using semantic pattern. Jisuanji Gongcheng/Comput. Eng. 2007, 33, 212–214.
- Shlezinger, N.; Whang, J.; Eldar, Y.C.; Dimakis, A.G. Model-Based Deep Learning. Proc. IEEE 2023, 111, 465–499.
- Shen, Y.; Huang, X. Attention-Based Convolutional Neural Network for Semantic Relation Extraction. In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, Osaka, Japan, 11–16 December 2016; pp. 2526–2536.
- Wang, L.; Cao, Z.; de Melo, G.; Liu, Z. Relation Classification via Multi-Level Attention CNNs. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Berlin, Germany, 7–12 August 2016.
- Ebrahimi, J.; Dou, D. Chain based RNN for relation classification. In Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Denver, CO, USA, 31 May–5 June 2015; pp. 1244–1249.
- Nguyen, T.H.; Grishman, R. Combining Neural Networks and Log-linear Models to Improve Relation Extraction. arXiv 2015, arXiv:1511.05926.
- Li, F.; Zhang, M.; Fu, G.; Qian, T.; Ji, D.H. A Bi-LSTM-RNN Model for Relation Classification Using Low-Cost Sequence Features. arXiv 2016, arXiv:1608.07720.
- Zhou, P.; Shi, W.; Tian, J.; Qi, Z.; Li, B.; Hao, H.; Xu, B. Attention-Based Bidirectional Long Short-Term Memory Networks for Relation Classification. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, Berlin, Germany, 7–12 August 2016.
- Huang, Y.Y.; Wang, W.Y. Deep Residual Learning for Weakly-Supervised Relation Extraction. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, Copenhagen, Denmark, 7–11 September 2017; pp. 1803–1807.
- Zeng, D.; Dai, Y.; Li, F.; Sherratt, R.S.; Wang, J. Adversarial learning for distant supervised relation extraction. Comput. Mater. Contin. 2018, 55, 121–136.
- Qin, P.; Xu, W.; Wang, W.Y. Robust Distant Supervision Relation Extraction via Deep Reinforcement Learning. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Melbourne, Australia, 15–20 July 2018; pp. 2137–2147.
- Qin, P.; Xu, W.; Wang, W.Y. DSGAN: Generative Adversarial Training for Distant Supervision Relation Extraction. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Melbourne, Australia, 15–20 July 2018; pp. 496–505.
- Santoro, A.; Bartunov, S.; Botvinick, M.; Wierstra, D.; Lillicrap, T. Meta-Learning with Memory-Augmented Neural Networks. In Proceedings of the 33rd International Conference on International Conference on Machine Learning, New York, NY, USA, 19–24 June 2016; Volume 48, pp. 1842–1850.
- Mishra, N.; Rohaninejad, M.; Chen, X.; Abbeel, P. A Simple Neural Attentive Meta-Learner. In Proceedings of the International Conference on Learning Representations, Vancouver, BC, Canada, 30 April–3 May 2018.
- Ren, M.; Liao, R.; Fetaya, E.; Zemel, R. Incremental few-shot learning with attention attractor networks. Adv. Neural Inf. Process. Syst. 2019, 32, 5275–5285.
- Finn, C.; Abbeel, P.; Levine, S. Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks. In Proceedings of the 34th International Conference on Machine Learning, Volume 70, Sydney, Australia, 6–11 August 2017; pp. 1126–1135.
- Elsken, T.; Staffler, B.; Metzen, J.; Hutter, F. Meta-Learning of Neural Architectures for Few-Shot Learning. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Los Alamitos, CA, USA, 13–19 June 2020; pp. 12362–12372.
- Koch, G.; Zemel, R.; Salakhutdinov, R. Siamese neural networks for one-shot image recognition. In Proceedings of the ICML Deep Learning Workshop, Lille, France, 6–11 July 2015; Volume 2.
- Vinyals, O.; Blundell, C.; Lillicrap, T.; Kavukcuoglu, K.; Wierstra, D. Matching Networks for One Shot Learning. In Advances in Neural Information Processing Systems; Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R., Eds.; Curran Associates, Inc.: New York, NY, USA, 2016; Volume 29.
- Snell, J.; Swersky, K.; Zemel, R. Prototypical Networks for Few-shot Learning. In Advances in Neural Information Processing Systems; Guyon, I., Luxburg, U.V., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S., Garnett, R., Eds.; Curran Associates, Inc.: New York, NY, USA, 2017; Volume 30.
- Han, X.; Zhu, H.; Yu, P.; Wang, Z.; Yao, Y.; Liu, Z.; Sun, M. FewRel: A Large-Scale Supervised Few-Shot Relation Classification Dataset with State-of-the-Art Evaluation. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, 31 October–4 November 2018.
- Gao, T.; Han, X.; Zhu, H.; Liu, Z.; Li, P.; Sun, M.; Zhou, J. FewRel 2.0: Towards More Challenging Few-Shot Relation Classification. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Hong Kong, China, 3–7 November 2019; pp. 6250–6255.
- Ye, Z.X.; Ling, Z.H. Multi-Level Matching and Aggregation Network for Few-Shot Relation Classification. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Florence, Italy, 28 July–2 August 2019; pp. 2872–2881.
- Gao, T.; Han, X.; Liu, Z.; Sun, M. Hybrid Attention-Based Prototypical Networks for Noisy Few-Shot Relation Classification. In Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence (AAAI'19/IAAI'19/EAAI'19), Honolulu, HI, USA, 27 January–1 February 2019; AAAI Press: Washington, DC, USA, 2019.
- Wang, M.; Zheng, J.; Cai, F.; Shao, T.; Chen, H. DRK: Discriminative Rule-based Knowledge for Relieving Prediction Confusions in Few-shot Relation Extraction. In Proceedings of the 29th International Conference on Computational Linguistics, Gyeongju, Republic of Korea, 12–17 October 2022; pp. 2129–2140.
- Yang, S.; Zhang, Y.; Niu, G.; Zhao, Q.; Pu, S. Entity Concept-enhanced Few-shot Relation Extraction. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 2: Short Papers), Online, 1–6 August 2021; pp. 987–991.
- Peng, H.; Gao, T.; Han, X.; Lin, Y.; Li, P.; Liu, Z.; Sun, M.; Zhou, J. Learning from Context or Names? An Empirical Study on Neural Relation Extraction. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), Online, 16–20 November 2020; pp. 3661–3672.
- Dong, M.; Pan, C.; Luo, Z. MapRE: An Effective Semantic Mapping Approach for Low-resource Relation Extraction. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, Punta Cana, Dominican Republic, 7–11 November 2021; pp. 2694–2704.
- Devlin, J.; Chang, M.W.; Lee, K.; Toutanova, K. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), Minneapolis, MN, USA, 2–7 June 2019; pp. 4171–4186.
- Liu, S.; Huang, D.; Wang, Y. Learning Spatial Fusion for Single-Shot Object Detection. arXiv 2019, arXiv:1911.09516.
- Wen, Y.; Zhang, K.; Li, Z.; Qiao, Y. A Discriminative Feature Learning Approach for Deep Face Recognition. In Computer Vision–ECCV 2016, Proceedings of the 14th European Conference, Amsterdam, The Netherlands, 11–14 October 2016; Leibe, B., Matas, J., Sebe, N., Welling, M., Eds.; Springer International Publishing: Cham, Switzerland, 2016; pp. 499–515.
- Yu, T.; Yang, M.; Zhao, X. Dependency-aware Prototype Learning for Few-shot Relation Classification. In Proceedings of the 29th International Conference on Computational Linguistics, Gyeongju, Republic of Korea, 12–17 October 2022; pp. 2339–2345.
- Zhang, P.; Lu, W. Better Few-Shot Relation Extraction with Label Prompt Dropout. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, Abu Dhabi, United Arab Emirates, 7–11 December 2022; pp. 6996–7006.
Table 1. Statistics of FewRel 1.0 and FewRel 2.0.

| Corpus | Split | #Relations | #Entities | #Sentences | #Test |
|---|---|---|---|---|---|
| FewRel 1.0 | Train | 64 | 89,600 | 44,800 | — |
| FewRel 1.0 | Validation | 16 | 22,400 | 11,200 | — |
| FewRel 1.0 | Test (unpublished) | 20 | 28,000 | 14,000 | 10,000 |
| FewRel 2.0 | Validation | 10 | 2,000 | 1,000 | — |
| FewRel 2.0 | Test (unpublished) | 15 | 3,000 | 1,500 | 10,000 |
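The splits above are consumed episodically rather than as flat classification data: each training or evaluation step samples an N-way-K-shot task. As a minimal sketch of that standard sampling protocol (not the authors' code; the `data` layout and the `sample_episode` helper are illustrative assumptions mirroring the published FewRel JSON format):

```python
import random

def sample_episode(data, n_way=5, k_shot=1, q_queries=1):
    """Draw one N-way-K-shot episode from a FewRel-style split.

    `data` is assumed to map each relation name to a list of labeled
    sentence instances (an assumption for illustration).
    """
    relations = random.sample(sorted(data), n_way)  # pick N relations
    support, query = [], []
    for label, rel in enumerate(relations):
        picked = random.sample(data[rel], k_shot + q_queries)
        support += [(s, label) for s in picked[:k_shot]]  # K support per class
        query += [(s, label) for s in picked[k_shot:]]    # Q queries per class
    return support, query
```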
Table 2. Hyperparameter settings.

| Parameter | Value |
|---|---|
| Encoder | BERT |
| Backbone model | BERT / CP |
| Learning rate | 1 × 10 / 5 × 10 |
| Max length | 128 |
| Hidden size | 768 |
| Batch size | 4 |
| Optimizer | AdamW |
| Validation step | 1000 |
| Max training iterations | 30,000 |
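To make the settings concrete, the sketch below wires a BERT encoder to an AdamW optimizer following the table. The learning-rate value is a placeholder because the exponents in the table did not survive extraction, and the checkpoint name is an assumption:

```python
import torch
from transformers import AutoModel, AutoTokenizer

MAX_LENGTH = 128   # Max length from the table
BATCH_SIZE = 4     # Batch size from the table
LR = 1e-5          # placeholder: the table's learning-rate exponents are garbled

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")  # assumed checkpoint
encoder = AutoModel.from_pretrained("bert-base-uncased")        # hidden size 768

# AdamW as listed in the table; validate every 1000 steps, up to 30,000 iterations
optimizer = torch.optim.AdamW(encoder.parameters(), lr=LR)

batch = tokenizer(["example sentence"] * BATCH_SIZE,
                  padding="max_length", truncation=True,
                  max_length=MAX_LENGTH, return_tensors="pt")
hidden = encoder(**batch).last_hidden_state  # shape (4, 128, 768)
```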
Table 3. Accuracy (%) on FewRel 1.0; each cell reports validation/test accuracy, and "—" marks results not reported.

| Encoder | Model | 5-Way-1-Shot | 5-Way-5-Shot | 10-Way-1-Shot | 10-Way-5-Shot |
|---|---|---|---|---|---|
| CNN | Proto-CNN | 72.65/74.52 | 86.15/88.40 | 60.13/62.38 | 76.20/80.45 |
| CNN | Proto-HATT | 75.01/— | 87.09/90.12 | 62.48/— | 77.50/83.05 |
| CNN | MLMAN | 79.01/— | 88.86/92.66 | 67.37/75.59 | 80.07/87.29 |
| BERT | Proto-BERT | 84.77/89.33 | 89.54/94.13 | 76.85/83.41 | 83.42/90.25 |
| BERT | TD-proto | —/84.76 | —/92.38 | —/74.32 | —/85.92 |
| BERT | ConceptFERE | —/89.21 | —/90.34 | —/75.72 | —/81.82 |
| BERT | DAPL | —/85.94 | —/94.28 | —/77.59 | —/89.26 |
| BERT | HCRP (BERT) | 90.90/93.76 | 93.22/95.66 | 84.11/89.95 | 87.79/92.10 |
| BERT | DRK | —/89.94 | —/92.42 | —/81.94 | —/85.23 |
| BERT | SimpleFSRE | 91.29/94.42 | 94.05/96.37 | 86.09/90.73 | 89.68/93.47 |
| BERT | Ours (BERT) | 92.31/94.83 | 94.05/97.07 | 86.92/90.46 | 89.36/93.65 |
| CP | CP | —/95.10 | —/97.10 | —/91.20 | —/94.70 |
| CP | MapRE | —/95.73 | —/97.84 | —/93.18 | —/95.64 |
| CP | HCRP (CP) | 94.10/96.42 | 96.05/97.96 | 89.13/93.97 | 93.10/96.46 |
| CP | LPD | 93.51/95.12 | 94.33/95.79 | 87.77/90.73 | 89.19/92.15 |
| CP | CBPM | —/90.89 | —/94.68 | —/82.54 | —/89.67 |
| CP | Ours (CP) | 96.48/97.14 | 97.93/97.98 | 93.88/95.24 | 95.61/96.27 |
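All prototype-based models in this table share the classification rule of Prototypical Networks (Snell et al., cited above): a class prototype is the mean of its support embeddings, and a query is assigned to the nearest prototype. A minimal sketch of that rule, assuming embeddings are already computed; the squared-Euclidean metric is an assumption here, and the paper's final prototype construction (Section 4.3) is more involved:

```python
import torch

def proto_classify(support_emb, support_labels, query_emb, n_way):
    """Nearest-prototype classification in the style of Snell et al. (2017).

    support_emb: (N*K, d) support embeddings; support_labels: (N*K,)
    integer labels in [0, n_way); query_emb: (Q, d) query embeddings.
    """
    # prototype = mean of each class's support embeddings
    prototypes = torch.stack([
        support_emb[support_labels == c].mean(dim=0) for c in range(n_way)
    ])                                                # (n_way, d)
    # squared Euclidean distance to every prototype (metric is an assumption)
    dists = torch.cdist(query_emb, prototypes) ** 2   # (Q, n_way)
    return dists.argmin(dim=1)                        # predicted labels, (Q,)
```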
Table 4. Accuracy (%) on the FewRel 2.0 domain-adaptation test set.

| Model | 5-Way-1-Shot | 5-Way-5-Shot | 10-Way-1-Shot | 10-Way-5-Shot |
|---|---|---|---|---|
| Proto-CNN * | 35.09 | 49.37 | 22.98 | 35.22 |
| Proto-BERT * | 40.12 | 51.50 | 26.45 | 36.93 |
| BERT-PAIR * | 56.25 | 67.44 | 43.64 | 53.17 |
| Proto-CNN-ADV * | 42.21 | 58.71 | 28.91 | 44.35 |
| Proto-BERT-ADV * | 41.90 | 54.74 | 27.36 | 37.40 |
| HCRP | 76.34 | 83.03 | 63.77 | 72.94 |
| Ours (CP) | 81.28 | 88.92 | 68.18 | 79.03 |
Table 5. Ablation results (accuracy, %).

| Model | 5-Way-1-Shot | 10-Way-1-Shot |
|---|---|---|
| SACT | 96.48 | 93.88 |
| w/o prototype modification | 94.89 | 87.07 |
| w/o contrastive–center loss | 94.86 | 87.47 |
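The "w/o contrastive–center loss" row removes the auxiliary objective named in the title; its exact formulation lives in Section 4.4, which is not reproduced here. The sketch below implements one common contrastive–center formulation (pull features toward their own class center, push them away from the other centers) and is an assumption for illustration, not the paper's exact loss:

```python
import torch

def contrastive_center_loss(features, labels, centers, eps=1e-6):
    """Generic contrastive-center loss (an assumed formulation): the ratio of
    a feature's distance to its own class center over its summed distance to
    all other centers, so minimizing it both compacts and separates classes.

    features: (B, d); labels: (B,) ints in [0, C); centers: (C, d).
    """
    dists = torch.cdist(features, centers) ** 2            # (B, C)
    idx = torch.arange(features.size(0))
    intra = dists[idx, labels]                              # own-center distance
    mask = torch.ones_like(dists, dtype=torch.bool)
    mask[idx, labels] = False                               # drop own-center column
    inter = dists[mask].view(features.size(0), -1).sum(dim=1)
    return (intra / (inter + eps)).mean()
```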
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).