2. Related Work
In this section, we review related work on intent classification with large pre-trained models and on the incorporation of external knowledge into intent classifiers.
Leveraging large pre-trained embedding models for intent classification is explored by Cavalin et al. [5], where class labels are represented not as a discrete set of symbols but as a space into which the word graphs associated with each class are mapped using typical graph embedding techniques. This allows the classification algorithm to take into account inter-class similarities arising from the repeated occurrence of some words in the training examples of different classes. Classification is carried out by mapping text embeddings onto the word graph embeddings of the classes. Their results demonstrate a considerable positive impact on the detection of out-of-scope examples when an appropriate sentence embedding, such as one produced by an LSTM or BERT, is used. Similarly, Zhang et al. [6] proposed IntentBERT, a pre-trained model for few-shot intent classification. The model is trained by fine-tuning BERT on a small set of publicly available labelled utterances. The authors demonstrate that fine-tuning on a small amount of task-relevant data is far more effective and efficient than the common practice of fine-tuning on a large labelled or unlabelled dialogue corpus. Furthermore, Zhang et al. [7] focused on the compositional aspects of intent classification. The authors decompose intents and queries into four factors, i.e., topic, predicate, object/condition, and query type, and combine coarse-grained intents with fine-grained factor information through multitask learning. Purohit et al. [8] studied the intent classification of short social media texts, combining knowledge-guided patterns with syntactic features based on a bag of n-gram tokens. The authors explored knowledge sources to create pattern sets for improving multiclass intent classification, demonstrating significant performance gains on a data set collected solely from Twitter.
Combining large pre-trained models with KGs is explored by Ahmad et al. [9], who study a joint intent classification and slot-filling task with unsupervised information extraction for KG construction. The authors trained the intent classifier in a supervised way but used it for the slot-filling task in an unsupervised manner. They trained a BERT-based classifier for intent classification, which is used in a masking-based occlusion algorithm that extracts slot information from an utterance. The paper also describes an algorithm for constructing a KG from dialogue data. In their evaluation, the authors observed that the occlusion-based slot-information extraction method yielded good results in a completely unsupervised setting. Yu et al. [10] capture commonsense knowledge of e-commerce behaviours by semi-automatically constructing a KG for intent classification. The authors leverage large language models to semi-automatically construct an intention KG, which is then evaluated and curated by human annotators. The annotation is performed on a large number of assertions that can explain a purchasing or co-purchasing behaviour, whereby the intention can be an open reason or a predicate falling into one of 18 categories aligned with ConceptNet, e.g., IsA, MadeOf, UsedFor. Furthermore, Pinhanez et al. [11] manually leveraged symbolic knowledge from curators of conversational systems to improve the accuracy of those systems. The authors draw on the real-world practice of such curators, who often embed taxonomically structured meta-knowledge, i.e., knowledge graphs, into their documentation. The work demonstrates that these knowledge graphs can be integrated into a dialogue system to improve its accuracy and to enable tools that support curatorial tasks. He et al. [12] presented their user intent system and demonstrated its effectiveness in downstream applications deployed in an industrial setting. For KG construction, the authors combined lexical rule matching, part-of-speech tagging, and short text matching to build a KG with “isA” relations between the “intent” nodes.
Further work has leveraged large but generic knowledge bases or knowledge graphs for intent classification. Zhang et al. [13] demonstrated that informative entities in KGs can enhance language representation with external knowledge. The authors utilised large-scale textual corpora and KGs to train an enhanced language representation model that can leverage lexical, syntactic, and knowledge information simultaneously. He et al. [14] proposed a multitask learning intent-detection system that jointly models intent detection and slot filling with a knowledge base, sharing information and rich external signals between the intent and slot modules; LSTM and convolutional networks were combined with the knowledge base to improve the model’s performance. Siddique et al. [15] proposed an intent detection model, named RIDE, that leverages commonsense knowledge from ConceptNet in an unsupervised fashion to overcome training data scarcity. The model computes robust and generalisable relationship meta-features that capture deep semantic relationships between utterances and intent labels; these features are computed by considering how the concepts in an utterance are linked, via commonsense knowledge, to those in an intent label. Shabbir et al. [16] presented the generation of accurate intents for unstructured data in Romanised Urdu and integrated this corpus into a RASA NLU module for intent classification. The authors embedded a KG into the RASA framework to maintain the dialogue history for semantics-based chatbot communication, and compared their results with existing linguistic systems combined with semantic technologies. Similarly, Sant’Anna et al. [17] employed RASA to extract intents and entities from a given sentence, investigating the effectiveness of automatic systems for answering consumer questions about products on e-commerce platforms. Hu et al. [18] proposed a general methodology for query intent classification by leveraging Wikipedia. Concepts in Wikipedia are used as the intent representation space; each intent domain is represented as a set of Wikipedia articles and categories, and the intent of an input query is identified by mapping the query into this representation space. The authors demonstrated the effectiveness of this method in three different applications, i.e., travel, job, and person name.
In contrast to the approaches above, our work injects domain-specific knowledge into the classification model by automatically generating semantically structured resources, i.e., knowledge graphs, from the targeted data sets. This allows us to automatically generate a knowledge graph from documents of a targeted domain, eliminating the human intervention or the dependency on existing knowledge graphs otherwise needed to guide intent classification within a goal-oriented dialogue system.
4. Methodology
In this section, we describe how the KGs are automatically generated from the targeted data sets, covering NER, dependency parsing for relation extraction, and a relation filtering approach. Each step of the KG generation allowed us to evaluate the impact of the semantic information represented in the KG on the classification task. As an example, Table 2 illustrates the different KGs generated from the ProductServiceQA data set. We conclude this section with a manual evaluation of the automatically generated KGs.
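As an illustration of what these KGs contain, the sketch below models the three layers compared in Table 2 (taxonomy, semantic relations, and named entities) as a minimal Python container; the class and its helper methods are our own illustrative construction, not part of the paper’s pipeline.

```python
from dataclasses import dataclass, field

@dataclass
class KnowledgeGraph:
    """Minimal container for the KG layers compared in Table 2."""
    taxonomy: set = field(default_factory=set)    # (parent, child) hyponym pairs
    relations: set = field(default_factory=set)   # (head, predicate, tail) triples
    entities: dict = field(default_factory=dict)  # NE surface form -> NE type

    def unique_concepts(self) -> set:
        """All distinct terms, relation arguments, and NEs in the KG."""
        concepts = {term for pair in self.taxonomy for term in pair}
        for head, _, tail in self.relations:
            concepts.update((head, tail))
        concepts.update(self.entities)
        return concepts

    def unique_relations(self) -> set:
        """Distinct predicates; every taxonomic link counts as one isA relation."""
        return {"isA"} | {pred for _, pred, _ in self.relations}

# Toy example in the spirit of the ProductServiceQA domain
kg = KnowledgeGraph()
kg.taxonomy.add(("payment", "flash payment"))
kg.relations.add(("card", "supports", "flash payment"))
kg.entities["visa"] = "CARD_TYPE"
```

Under this representation, a taxonomy-only KG yields a single relation type, which is consistent with the “Unique Relations = 1” entries in Table 2.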
4.5. Manual Evaluation of KGs
We manually analysed and curated the automatically generated ProductServiceQA KGs, resulting in benchmark KGs for this data set. These benchmark KGs allowed us to evaluate the quality of the automatically generated ProductServiceQA KGs; they are not used in the training of the intent classification models or in the generation of any other automatic KGs. Three curators, one male and two female, all NLP specialists in knowledge extraction, performed the curation.
Term Extraction Curation: The term list was provided to the three annotators, who independently identified terms that were correctly extracted based on the definition of a term and the domain of the data set. As an example, the extracted term pay card swiping was annotated as incorrectly extracted, whereas pay card was labelled as correct. Where the term span was incorrect, a corrected version was proposed where possible; for instance, wearable device support bank was corrected in the benchmark KGs to wearable device. Within this manual curation step, 50% of the terms were identified as correct, and 13 terms were modified.
Taxonomic Relations Curation: A similar curation was performed on the extracted taxonomic relations. The curators were presented with pairs of terms involved in a taxonomic (hyponym) relation, i.e., parent_term → child_term. The annotators had to identify whether the parent term (payment) was correctly identified for the child term (flash payment); a wrongly identified relation pair would be device → support. If the taxonomic relation was not correctly extracted, the experts proposed a replacement parent term from the list, or a new term if none was deemed appropriate. In this step, 33% of the relations were considered correct, and 20 new terms were defined and added to the benchmark taxonomy. The benchmark KG used to evaluate the automatically generated KGs contains 83 terms within a taxonomy of depth 5.
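The reported taxonomy depth can be computed directly from the curated parent → child pairs. The helper below is an illustrative sketch: it assumes the pairs form a forest (no cycles) and counts depth as the number of term levels along the longest root-to-leaf path, which is our assumed convention rather than one stated in the paper.

```python
def taxonomy_depth(edges):
    """Longest root-to-leaf path length (in term levels) over (parent, child)
    hyponym pairs; assumes the pairs form a forest (no cycles)."""
    children, parents, has_parent = {}, set(), set()
    for parent, child in edges:
        children.setdefault(parent, []).append(child)
        parents.add(parent)
        has_parent.add(child)
    roots = parents - has_parent  # terms that never appear as a child

    def depth(node):
        return 1 + max((depth(c) for c in children.get(node, [])), default=0)

    return max(depth(root) for root in roots)

# Example: payment -> {flash payment, card payment}, flash payment -> QR flash payment
edges = [("payment", "flash payment"),
         ("payment", "card payment"),
         ("flash payment", "QR flash payment")]
print(taxonomy_depth(edges))  # 3
```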
Named Entity with Dependency Relation Curation: For the benchmark KG, we collected a list of NEs and their types, which resulted in 619 NEs (e.g., card) belonging to 22 different types (CARD_TYPE). In order to add the NEs to the benchmark KG, we selected the NE types that match a term in the taxonomy. Seven such types were identified. We then collected all the NEs corresponding to these seven types from the list (amounting to 25 NEs) and added them to their parent in the benchmark KG using a taxonomic relation.
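The type-to-term matching step can be sketched as follows; since the exact matching rule is not spelled out here, the normalisation below (lowercase the type label, drop a trailing _TYPE, replace underscores with spaces) is a hypothetical stand-in.

```python
def attach_entities(taxonomy_terms, ne_list):
    """Return new (parent_term, entity) taxonomic edges for NEs whose type
    matches a taxonomy term. The normalisation of type labels used here is a
    hypothetical rule, not the paper's exact matching criterion."""
    edges = []
    for surface, ne_type in ne_list:
        # e.g. "CARD_TYPE" -> "card"; "PAYMENT_METHOD" -> "payment method"
        term = ne_type.lower().removesuffix("_type").replace("_", " ")
        if term in taxonomy_terms:
            edges.append((term, surface))
    return edges

# "visa" (CARD_TYPE) attaches under the taxonomy term "card";
# "alice" (PERSON) has no matching term and is skipped.
print(attach_entities({"card", "payment"},
                      [("visa", "CARD_TYPE"), ("alice", "PERSON")]))
# [('card', 'visa')]
```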
Within the same curation step, the dependency-based relation extraction algorithm was applied, extracting predicates involving two NEs, or an NE and a term (from the initial list of terms in the third step of the approach). A set of 126 triples with terms and NEs was then added to the previously mentioned benchmark KG as relations that contain NEs.
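A minimal version of such dependency-based triple extraction can be sketched as follows. To stay self-contained, the sketch takes an already-parsed sentence as (text, head_index, dependency_label) tuples rather than calling a parser, and keeps only verb-mediated triples whose two arguments are known NEs or terms; this filtering rule is a simplification of the actual algorithm, not a reproduction of it.

```python
def extract_triples(tokens, known_phrases):
    """Extract (argument, predicate, argument) triples from one dependency
    parse. `tokens` is a list of (text, head_index, dep_label) tuples with
    0-based head indices; `known_phrases` is the set of NEs and terms
    allowed as arguments."""
    subjects, objects = {}, {}
    for text, head, dep in tokens:
        if dep == "nsubj":
            subjects.setdefault(head, []).append(text)
        elif dep in ("obj", "dobj"):  # UD vs. older label for direct objects
            objects.setdefault(head, []).append(text)

    triples = []
    for verb_idx, subj_list in subjects.items():
        predicate = tokens[verb_idx][0]
        for subj in subj_list:
            for obj in objects.get(verb_idx, []):
                # keep only triples whose arguments are known NEs or terms
                if subj in known_phrases and obj in known_phrases:
                    triples.append((subj, predicate, obj))
    return triples

# "card supports flash payment", with "flash payment" pre-merged into one token
parse = [("card", 1, "nsubj"), ("supports", 1, "ROOT"), ("flash payment", 1, "obj")]
print(extract_triples(parse, {"card", "flash payment"}))
# [('card', 'supports', 'flash payment')]
```

In practice the head indices and dependency labels would come from a dependency parser; the token-tuple representation here only stands in for that parser output.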
Author Contributions
Conceptualization, S.D., H.A., J.P.M. and P.B.; methodology, M.A., S.D., H.A., J.P.M. and P.B.; software, S.M., C.R., G.V., D.P. and S.S.; validation, S.M., C.R., G.V., D.P. and S.S.; formal analysis, S.M., C.R. and G.V.; investigation, M.A. and G.V.; writing—original draft preparation, M.A., S.M., C.R., G.V., D.P., S.S., S.D., H.A., J.P.M. and P.B.; writing—review and editing, M.A., G.V., S.D., H.A., J.P.M. and P.B.; supervision, P.B.; project administration, M.A.; funding acquisition, P.B. All authors have read and agreed to the published version of the manuscript.
Funding
This publication has emanated from research supported in part by a grant from Science Foundation Ireland under Grant number SFI/12/RC/2289_P2. For the purpose of Open Access, the author has applied a CC BY public copyright licence to any Author Accepted Manuscript version arising from this submission.
Data Availability Statement
Not applicable.
Conflicts of Interest
The authors declare no conflict of interest.
References
1. Temerak, M.S.; El-Manstrly, D. The influence of goal attainment and switching costs on customers’ staying intentions. J. Retail. Consum. Serv. 2019, 51, 51–61.
2. Abujabal, A.; Roy, R.S.; Yahya, M.; Weikum, G. ComQA: A Community-sourced Dataset for Complex Factoid Question Answering with Paraphrase Clusters. arXiv 2018, arXiv:1809.09528.
3. Fader, A.; Zettlemoyer, L.; Etzioni, O. Paraphrase-Driven Learning for Open Question Answering. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, Sofia, Bulgaria, 4–9 August 2013.
4. Lehmann, J.; Isele, R.; Jakob, M.; Jentzsch, A.; Kontokostas, D.; Mendes, P.N.; Hellmann, S.; Morsey, M.; van Kleef, P.; Auer, S.; et al. DBpedia—A Large-scale, Multilingual Knowledge Base Extracted from Wikipedia. Semant. Web J. 2015, 6, 167–195.
5. Cavalin, P.; Alves Ribeiro, V.H.; Appel, A.; Pinhanez, C. Improving Out-of-Scope Detection in Intent Classification by Using Embeddings of the Word Graph Space of the Classes. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), Online, 16–20 November 2020; pp. 3952–3961.
6. Zhang, H.; Zhang, Y.; Zhan, L.M.; Chen, J.; Shi, G.; Wu, X.M.; Lam, A.Y. Effectiveness of Pre-training for Few-shot Intent Classification. In Findings of the Association for Computational Linguistics: EMNLP 2021, Punta Cana, Dominican Republic, 16–20 November 2021; pp. 1114–1120.
7. Zhang, J.; Ye, Y.; Zhang, Y.; Qiu, L.; Fu, B.; Li, Y.; Yang, Z.; Sun, J. Multi-Point Semantic Representation for Intent Classification. Proc. AAAI Conf. Artif. Intell. 2020, 34, 9531–9538.
8. Purohit, H.; Dong, G.; Shalin, V.; Thirunarayan, K.; Sheth, A. Intent Classification of Short-Text on Social Media. In Proceedings of the 2015 IEEE International Conference on Smart City/SocialCom/SustainCom (SmartCity), Chengdu, China, 19–21 December 2015; pp. 222–228.
9. Ahmad, Z.; Ekbal, A.; Sengupta, S.; Maitra, A.; Ramnani, R.; Bhattacharyya, P. Unsupervised Approach for Knowledge-Graph Creation from Conversation: The Use of Intent Supervision for Slot Filling. In Proceedings of the 2021 International Joint Conference on Neural Networks (IJCNN), Shenzhen, China, 18–22 July 2021; pp. 1–8.
10. Yu, C.; Wang, W.; Liu, X.; Bai, J.; Song, Y.; Li, Z.; Gao, Y.; Cao, T.; Yin, B. FolkScope: Intention Knowledge Graph Construction for Discovering E-commerce Commonsense. arXiv 2022, arXiv:2211.08316.
11. Pinhanez, C.S.; Candello, H.; Cavalin, P.; Pichiliani, M.C.; Appel, A.P.; Alves Ribeiro, V.H.; Nogima, J.; de Bayser, M.; Guerra, M.; Ferreira, H.; et al. Integrating Machine Learning Data with Symbolic Knowledge from Collaboration Practices of Curators to Improve Conversational Systems. In Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems, Yokohama, Japan, 8–13 May 2021; Association for Computing Machinery: New York, NY, USA, 2021.
12. He, Y.; Jia, Q.; Yuan, L.; Li, R.; Ou, Y.; Zhang, N. A Concept Knowledge Graph for User Next Intent Prediction at Alipay. arXiv 2023, arXiv:2301.00503.
13. Zhang, Z.; Han, X.; Liu, Z.; Jiang, X.; Sun, M.; Liu, Q. ERNIE: Enhanced Language Representation with Informative Entities. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Florence, Italy, 28 July–2 August 2019; pp. 1441–1451.
14. He, T.; Xu, X.; Wu, Y.; Wang, H.; Chen, J. Multitask Learning with Knowledge Base for Joint Intent Detection and Slot Filling. Appl. Sci. 2021, 11, 4887.
15. Siddique, A.B.; Jamour, F.T.; Xu, L.; Hristidis, V. Generalized Zero-shot Intent Detection via Commonsense Knowledge. In Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, Virtual Event, Canada, 11–15 July 2021.
16. Shabbir, J.; Arshad, M.U.; Shahzad, W. NUBOT: Embedded Knowledge Graph With RASA Framework for Generating Semantic Intents Responses in Roman Urdu. arXiv 2021, arXiv:2102.10410.
17. Sant’Anna, D.T.; Caus, R.O.; dos Santos Ramos, L.; Hochgreb, V.; dos Reis, J.C. Generating Knowledge Graphs from Unstructured Texts: Experiences in the E-commerce Field for Question Answering. In Joint Proceedings of Workshops AI4LEGAL2020, NLIWOD, PROFILES 2020, QuWeDa 2020 and SEMIFORM2020, Co-located with the 19th International Semantic Web Conference (ISWC 2020), Virtual Conference, 1–6 November 2020; Koubarakis, M., Alani, H., Antoniou, G., Bontcheva, K., Breslin, J.G., Collarana, D., Demidova, E., Dietze, S., Gottschalk, S., Governatori, G., et al., Eds.; CEUR Workshop Proceedings; 2020; Volume 2722, pp. 56–71.
18. Hu, J.; Wang, G.; Lochovsky, F.; Sun, J.-T.; Chen, Z. Understanding User’s Query Intent with Wikipedia. In Proceedings of the 18th International Conference on World Wide Web, WWW ’09, Madrid, Spain, 20–24 April 2009; pp. 471–480.
19. Balažević, I.; Allen, C.; Hospedales, T.M. TuckER: Tensor Factorization for Knowledge Graph Completion. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing, Hong Kong, China, 3–7 November 2019.
20. Pennington, J.; Socher, R.; Manning, C. GloVe: Global Vectors for Word Representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), Doha, Qatar, 25–29 October 2014; pp. 1532–1543.
21. Artetxe, M.; Schwenk, H. Massively Multilingual Sentence Embeddings for Zero-Shot Cross-Lingual Transfer and Beyond. Trans. Assoc. Comput. Linguist. 2019, 7, 597–610.
22. Reimers, N.; Gurevych, I. Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing, Hong Kong, China, 3–7 November 2019.
23. Song, K.; Tan, X.; Qin, T.; Lu, J.; Liu, T.Y. MPNet: Masked and Permuted Pre-training for Language Understanding. In Advances in Neural Information Processing Systems, Virtual, 6–12 December 2020; Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H., Eds.; Curran Associates, Inc.: Red Hook, NY, USA, 2020; Volume 33, pp. 16857–16867.
24. Ribeiro, M.T.; Singh, S.; Guestrin, C. “Why Should I Trust You?”: Explaining the Predictions of Any Classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 1135–1144.
25. McNemar, Q. Note on the sampling error of the difference between correlated proportions or percentages. Psychometrika 1947, 12, 153–157.
26. Bordea, G.; Buitelaar, P.; Polajnar, T. Domain-independent term extraction through domain modelling. In Proceedings of the 10th International Conference on Terminology and Artificial Intelligence (TIA 2013), Paris, France, 28–30 October 2013.
27. Manjunath, S.H.; McCrae, J.P. Encoder-Attention-Based Automatic Term Recognition (EA-ATR). In Proceedings of the 3rd Conference on Language, Data and Knowledge (LDK 2021), Zaragoza, Spain, 1–3 September 2021; Schloss Dagstuhl–Leibniz-Zentrum für Informatik: Dagstuhl, Germany, 2021.
28. Akbik, A.; Blythe, D.; Vollgraf, R. Contextual String Embeddings for Sequence Labeling. In Proceedings of COLING 2018, the 27th International Conference on Computational Linguistics, Santa Fe, NM, USA, 20–26 August 2018; pp. 1638–1649.
29. Pereira, B.; Robin, C.; Daudert, T.; McCrae, J.P.; Mohanty, P.; Buitelaar, P. Taxonomy Extraction for Customer Service Knowledge Base Construction. In Semantic Systems. The Power of AI and Knowledge Graphs; Acosta, M., Cudré-Mauroux, P., Maleshkova, M., Pellegrini, T., Sack, H., Sure-Vetter, Y., Eds.; Springer International Publishing: Cham, Switzerland, 2019; pp. 175–190.
30. Chen, D.; Manning, C. A Fast and Accurate Dependency Parser using Neural Networks. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), Doha, Qatar, 25–29 October 2014; pp. 740–750.
31. Nair, V.; Hinton, G.E. Rectified Linear Units Improve Restricted Boltzmann Machines. In Proceedings of the 27th International Conference on Machine Learning (ICML), Haifa, Israel, 21–24 June 2010.
32. Kingma, D.P.; Ba, J. Adam: A Method for Stochastic Optimization. In Proceedings of the 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, 7–9 May 2015.
Figure 1.
Visualisation of the LIME framework explanation with the top-k important words (k = 5) and their contribution degree based on the provided question.
Figure 2.
Knowledge graph generation pipeline.
Table 1.
Statistics on the data sets used, i.e., ComQA, ParaLex, ProductServiceQA, and ATIS.
| | ProductServiceQA | ComQA | ParaLex | ATIS |
|---|---|---|---|---|
| # Total samples | 7611 | 1829 | 21,306 | 5632 |
| # Samples (train) | 5328 | 1463 | 17,045 | 4833 |
| # Samples (test) | 2283 | 366 | 4261 | 799 |
| # Classes | 338 | 272 | 275 | 8 |
Table 2.
Information on different KGs and statistics on the benchmarks and the automatically generated KGs of the ProductServiceQA data set.
| | Benchmark KG | Benchmark KG | Benchmark KG | KG | KG | KG |
|---|---|---|---|---|---|---|
| Taxonomy | Y | Y | Y | Y | Y | Y |
| Semantic Relations | N | Y | Y | N | Y | Y |
| Named Entities | N | N | Y | N | N | Y |
| Unique Concepts | 84 | 84 | 97 | 100 | 100 | 908 |
| Unique Relations | 1 | 221 | 221 | 1 | 230 | 259 |
| Vocabulary | 60 | 190 | 392 | 36 | 166 | 468 |
Table 3.
Flair results for different embedding types on the ProductServiceQA data set.
| Embedding | Precision | Recall | F |
|---|---|---|---|
| Flair (Forward+Backward) | 0.94 | 0.92 | 0.93 |
| Flair (Forward+Backward)+GloVe | 0.95 | 0.92 | 0.93 |
| Flair (Forward)+GloVe | 0.94 | 0.92 | 0.93 |
| GloVe | 0.92 | 0.91 | 0.91 |
| BERT | 0.93 | 0.91 | 0.93 |
| ELMo | 0.94 | 0.91 | 0.93 |
Table 4.
Intent classification evaluation for the targeted data sets using an RNN (bold numbers indicate the best results for each setting).
ComQA Data Set / ParaLex Data Set:

| SOTA Embeddings | Dimension | Precision | SOTA Embeddings | Dimension | Precision |
|---|---|---|---|---|---|
| SBERT | 768 | 98.36 | SBERT | 768 | 54.06 |
| LASER | 1024 | 96.75 | LASER | 1024 | 52.92 |
| MPNet | 768 | 98.63 | MPNet | 768 | 53.80 |
| LASER+SBERT | 1792 | 98.28 | LASER+SBERT | 1792 | 54.07 |
| LASER+SBERT+GloVe | 2092 | 98.63 | LASER+SBERT+GloVe | 2092 | 54.41 |
| Best Embeddings with KG | Dimension | Precision | Best Embeddings with KG | Dimension | Precision |
| LASER+SBERT+KG (750) | 2092 | 99.45 | LASER+MPNet+KG (750)/GloVe | 2092 | 55.42 |
| LASER+MPNet+KG (500) | 2092 | 99.45 | | | |
| LASER+SBERT+KG (750)/GloVe | 2092 | 99.45 | | | |

ProductServiceQA Data Set / ATIS Data Set:

| SOTA Embeddings | Dimension | Precision | SOTA Embeddings | Dimension | Precision |
|---|---|---|---|---|---|
| SBERT | 768 | 68.02 | SBERT | 768 | 98.67 |
| LASER | 1024 | 62.68 | LASER | 1024 | 98.87 |
| MPNet | 768 | 69.25 | MPNet | 768 | 98.43 |
| LASER+SBERT | 1792 | 68.60 | LASER+SBERT | 1792 | 98.50 |
| LASER+SBERT+GloVe | 2092 | 68.40 | LASER+SBERT+GloVe | 2092 | 98.62 |
| Best Embeddings with KG | Dimension | Precision | Best Embeddings with KG | Dimension | Precision |
| LASER+MPNet+KG (DBpedia) | 2092 | 70.00 | LASER+KG (100) | 1324 | 99.25 |
| | | | LASER+SBERT+KG (100) | 2092 | 99.25 |
| | | | LASER+MPNet+KG (100) | 2092 | 99.25 |
| | | | LASER+SBERT+KG (100)/GloVe | 2092 | 99.25 |
| | | | LASER+MPNet+KG (100)/GloVe | 2092 | 99.25 |
Table 5.
Statistics on the automatically generated KG with different thresholds of terms.
| Terms | 100 | 200 | 300 | 500 | 1000 |
|---|---|---|---|---|---|
| Unique Concepts | 908 | 1008 | 1108 | 1308 | 1808 |
| Unique Relations | 259 | 279 | 299 | 305 | 324 |
| Vocabulary | 468 | 494 | 529 | 553 | 653 |
Table 6.
Impact of different sets of terms within the KG for intent classification, based on ProductServiceQA (bold numbers indicate the best results for each setting).
| SOTA Embeddings | Dimension | Precision | Best Embeddings with KG | Dimension | Precision |
|---|---|---|---|---|---|
| SBERT | 768 | 68.02 | LASER+MPNet+KG (100) | 2092 | 69.99 |
| LASER | 1024 | 62.68 | | | |
| MPNet | 768 | 69.25 | | | |
| LASER+SBERT | 1792 | 68.60 | | | |
| LASER+SBERT+GloVe | 2092 | 68.40 | | | |

Number of Set Terms:

| | Embeddings with KG | Dimension | 100 | 200 | 300 | 500 | 1000 |
|---|---|---|---|---|---|---|---|
| | KG | 300 | 40.34 | 40.34 | 41.61 | 42.14 | 44.20 |
| Concat. | LASER+KG | 1324 | 62.15 | 62.15 | 61.94 | 62.85 | 52.91 |
| | LASER+SBERT+KG | 2092 | 68.24 | 68.24 | 67.89 | 67.85 | 67.85 |
| | LASER+MPNet+KG | 2092 | 69.99 | 68.37 | 68.77 | 68.29 | 68.46 |
| Substit. | LASER+KG/GloVe | 1324 | 62.51 | 60.58 | 61.54 | 62.64 | 60.36 |
| | LASER+SBERT+KG/GloVe | 2092 | 68.20 | 68.37 | 68.20 | 67.81 | 67.41 |
| | LASER+MPNet+KG/GloVe | 2092 | 67.89 | 67.90 | 67.19 | 67.76 | 67.24 |
Table 7.
Intent classification evaluation for the targeted data sets using a Siamese network (bold numbers indicate the best results for each setting; * denotes a statistically significant difference at p = 0.05).
ComQA Data Set / ParaLex Data Set:

| SOTA Embeddings | Dimension | Precision | SOTA Embeddings | Dimension | Precision |
|---|---|---|---|---|---|
| SBERT | 768 | 95.18 | SBERT | 768 | 48.81 |
| SBERT+LASER | 1792 | 94.66 | SBERT+LASER | 1792 | 49.75 |
| MPNet | 768 | 94.37 | MPNet | 768 | 50.33 |
| MPNet+LASER | 1792 | 94.14 | MPNet+LASER | 1792 | 50.47 |
| Best Embeddings with KG | Dimension | Precision | Best Embeddings with KG | Dimension | Precision |
| SBERT+LASER+KG (100) | 2092 | 95.18 | MPNet+KG (100) | 1068 | 52.29 * |

ProductServiceQA Data Set / ATIS Data Set:

| SOTA Embeddings | Dimension | Precision | SOTA Embeddings | Dimension | Precision |
|---|---|---|---|---|---|
| SBERT | 768 | 73.94 | SBERT | 768 | 99.37 |
| SBERT+LASER | 1792 | 73.77 | SBERT+LASER | 1792 | 99.00 |
| MPNet | 768 | 73.55 | MPNet | 768 | 98.62 |
| MPNet+LASER | 1792 | 73.51 | MPNet+LASER | 1792 | 98.62 |
| Best Embeddings with KG | Dimension | Precision | Best Embeddings with KG | Dimension | Precision |
| MPNet+LASER+KG (100) | 2092 | 74.69 | MPNet+KG (100) | 1068 | 99.50 |
Table 8.
Intent classification evaluation for the ComQA data set, filtered by questions with a frequency between two and five, using a Siamese network (bold numbers indicate the best results for each setting).
| SOTA Embeddings | Dimension | Precision |
|---|---|---|
| SBERT | 768 | 83.31 |
| SBERT+LASER | 1792 | 84.12 |
| MPNet | 768 | 83.25 |
| MPNet+LASER | 1792 | 84.23 |
| Best Embeddings with KG | Dimension | Precision |
| SBERT+LASER+KG (100) | 2092 | 84.87 |
Table 11.
Intent classification evaluation for the Spanish ProductServiceQA data set translated into English using a Siamese network (bold numbers indicate the best results for each setting).
| SOTA Embeddings | Dimension | Precision |
|---|---|---|
| SBERT | 768 | 62.24 |
| MPNet | 768 | 64.34 |
| SBERT+LASER | 1092 | 61.62 |
| MPNet+LASER | 1092 | 65.00 |
| Best Embeddings with KG | Dimension | Precision |
| MPNet+LASER+KG (100) | 1392 | 65.57 |
Table 12.
Intent classification evaluation for the Chinese ProductServiceQA data set translated into English using a Siamese network (bold numbers indicate the best results for each setting).
| SOTA Embeddings | Dimension | Precision |
|---|---|---|
| SBERT | 768 | 58.12 |
| MPNet | 768 | 59.30 |
| SBERT+LASER | 1092 | 59.01 |
| MPNet+LASER | 1092 | 59.70 |
| Best Embeddings with KG | Dimension | Precision |
| MPNet+KG (100) | 1092 | 60.66 |
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).