Article

Intent Classification by the Use of Automatically Generated Knowledge Graphs

1 Insight SFI Research Centre for Data Analytics, Data Science Institute, University of Galway, H91 AEX4 Galway, Ireland
2 Huawei Research, D02 R156 Dublin, Ireland
3 Amazon Alexa AI, Cambridge CB1 2GA, UK
* Author to whom correspondence should be addressed.
This work was conducted when the author was working at Huawei Research, Ireland.
Information 2023, 14(5), 288; https://doi.org/10.3390/info14050288
Submission received: 30 March 2023 / Revised: 28 April 2023 / Accepted: 10 May 2023 / Published: 12 May 2023
(This article belongs to the Special Issue Knowledge Graph Technology and its Applications II)

Abstract

Intent classification is an essential task in goal-oriented dialogue systems for automatically identifying customers’ goals. Although intent classification performs well in general settings, domain-specific user goals can still present a challenge for this task. To address this challenge, we automatically generate knowledge graphs for targeted data sets to capture domain-specific knowledge and leverage embeddings trained on these knowledge graphs for the intent classification task. As existing knowledge graphs might not be suitable for a targeted domain of interest, our automatic generation of knowledge graphs can extract the semantic information of any domain, which can then be incorporated within the classification process. We compare our results with state-of-the-art pre-trained sentence embeddings, and our evaluation on four data sets shows improvement in the intent classification task in terms of precision.

1. Introduction

A large part of global business in the consumer domain is providing services, such as consumer payments, mobile cloud services, and more. In providing these services to the customers, a business also needs to provide services to satisfy the customer needs that arise from their customer base [1]. Much of this customer support is provided through online interactions in the form of web chats. The ability to address these customer requests more efficiently can be of significant business benefit.
The intent classification task is the automated categorisation of text into different intents that reflect customer goals, using machine learning (ML) and natural language processing (NLP) techniques. In a general setting, a sentence such as “Where is the best place to buy a television?” could be associated with the purchase intent. Because most goal-oriented dialogue systems engage with customers through personalised conversations, intent classification is an essential component of these systems, where an intent can be aligned with the question a customer asks. Therefore, the automated classification of users’ intent can significantly reduce the manual effort of analysing user comments to identify avenues for improvement and issue remediation.
To enrich the classical classification task with domain-specific knowledge, we focus in this work on automatic knowledge graph (KG) generation, which is incorporated into the classification task. To automatically generate a KG, we leverage term extraction techniques, named entity recognition (NER), and dependency parsing to align the concepts, i.e., terms and named entities, with semantic relations. We perform intent classification on three publicly available data sets, i.e., ComQA [2], ParaLex [3], and ATIS, as well as on one proprietary domain-specific data set, named ProductServiceQA, in the telecommunications domain. For this, we automatically generate KGs based on the data sets used in this study, whereby we distinguish between generic and domain-specific KGs. Because the automatically generated KGs are based on domain-specific data, they emphasise the depth of knowledge. We compare these results to a general KG, i.e., DBpedia [4], which is based on common knowledge and emphasises the breadth of knowledge. Within the process of automatic KG generation, we evaluate the knowledge extraction, in particular, the extraction of entity classes and the semantic relations between them, as expressed within the data set. Finally, we leverage this information as knowledge graph embeddings (KGEs) for intent classification according to the extracted classes and relations.

2. Related Work

In this section, we provide an overview of related work focusing on intent classification using large pre-trained models and the incorporation of external knowledge for intent classification.
Leveraging large pre-trained embedding models for intent classification is explored in Cavalin et al. [5], where class labels are not represented as a discrete set of symbols but as a space into which the word graphs associated with each class are mapped using typical graph embedding techniques. This allows the classification algorithm to take into account inter-class similarities provided by the repeated occurrence of some words in the training examples of the different classes. The classification is carried out by mapping text embeddings to the word graph embeddings of the classes. Their results demonstrate a considerable positive impact on the detection of out-of-scope examples when an appropriate sentence embedding, such as LSTM or BERT, is used. Similarly, Zhang et al. [6] proposed IntentBERT, a pre-trained model for few-shot intent classification. The model is trained by fine-tuning BERT on a small set of publicly available labelled utterances. The authors demonstrate that using small task-relevant data for fine-tuning is far more effective and efficient than the current practice of fine-tuning on a large labelled or unlabelled dialogue corpus. Furthermore, Zhang et al. [7] focused on the compositional aspects of intent classification. The authors decompose intents and queries into four factors, i.e., topic, predicate, object/condition, and query type. To leverage this information, they combine coarse-grained intents and fine-grained factor information by applying multitask learning. Purohit et al. [8] studied the intent classification of short social media texts, combining knowledge-guided patterns with syntactic features based on a bag of n-gram tokens. The authors explored knowledge sources to create pattern sets for examining improvement in multiclass intent classification. The work demonstrated significant performance gains, although only on a data set collected from Twitter.
Combining large pre-trained models with KGs is explored in Ahmad et al. [9], where the authors study a joint intent classification and slot-filling task with unsupervised information extraction for KG construction. The authors trained the intent classifier in a supervised way but used this intent classifier for the slot-filling task in an unsupervised manner. They trained a BERT-based classifier for the intent classification task, which is used in a masking-based occlusion algorithm that extracts information for the slots from an utterance. A KG construction algorithm from dialogue data is also described in this paper. Within their evaluation, they observed that in a completely unsupervised setting the occlusion-based slot-information extraction method yielded good results. Yu et al. [10] capture commonsense knowledge for e-commerce behaviours by semi-automatically constructing a KG for intent classification. The authors leverage large language models to semi-automatically construct an intention KG, which is then evaluated and curated by human annotators. The annotation is performed on a large number of assertions that can explain a purchasing or co-purchasing behaviour, whereby the intention can be an open reason or a predicate falling into one of 18 categories aligning with ConceptNet, e.g., IsA, MadeOf, UsedFor. Furthermore, Pinhanez et al. [11] manually leveraged symbolic knowledge from curators of conversational systems to improve the accuracy of those systems. The authors use the context of a real-world practice of curators of conversational systems who often embed taxonomically structured meta-knowledge, i.e., knowledge graphs, into their documentation. The work demonstrates that the knowledge graphs can be integrated into the dialogue system to improve its accuracy and to enable tools to support curatorial tasks. He et al. [12] presented their user intent system and demonstrated its effectiveness in downstream applications deployed in an industrial setting. For KG construction, the authors leveraged lexical rule matching, part-of-speech tagging, and short text matching to construct a KG with “isA” relations between the “intent” nodes.
Further work focused on leveraging large but generic knowledge bases or knowledge graphs for intent classification. Within this line of work, Zhang et al. [13] demonstrated that informative entities in KGs can enhance language representation with external knowledge. The authors utilised large-scale textual corpora and KGs to train an enhanced language representation model that can leverage lexical, syntactic, and knowledge information simultaneously. By leveraging a knowledge base and a joint slot-filling model, He et al. [14] proposed a multitask learning intent-detection system. The proposed approach was used to share information and rich external knowledge between the intent and slot modules, and LSTM and convolutional networks were combined with a knowledge base to improve the model’s performance. Siddique et al. [15] proposed an intent detection model, named RIDE, that leverages commonsense knowledge from ConceptNet in an unsupervised fashion to overcome the issue of training data scarcity. The model computes robust and generalisable relationship meta-features that capture deep semantic relationships between utterances and intent labels. These features are computed by considering how the concepts in an utterance are linked to those in an intent label via commonsense knowledge. Shabbir et al. [16] presented the generation of accurate intents for unstructured data in Romanised Urdu and integrated this corpus into a RASA NLU module for intent classification. The authors embedded the KG within the RASA framework to maintain the dialogue history for a semantics-based natural language mechanism for chatbot communication and compared their results with existing linguistic systems combined with semantic technologies. Similarly, Sant’Anna et al. [17] employed RASA to extract intents and entities from a given sentence; using RASA, the authors investigated the effectiveness of automatic systems for answering consumer questions about products on e-commerce platforms. Hu et al. [18] proposed a general methodology for the problem of query intent classification by leveraging Wikipedia. The concepts in Wikipedia were used as the intent representation space; thus, each intent domain was represented as a set of Wikipedia articles and categories. The intent of any input query was identified by mapping the query into the Wikipedia representation space. The authors demonstrated the effectiveness of this method in three different applications, i.e., travel, job, and person name.
Unlike the approaches noted above, our work focuses on injecting domain-specific knowledge into the classification model by automatically generating semantically structured resources, i.e., knowledge graphs, from the targeted data sets. This allows us to automatically generate a knowledge graph from the documents of a targeted domain, which eliminates the human intervention or the dependency on existing knowledge graphs otherwise needed to guide intent classification within a goal-oriented dialogue system.

3. Experimental Setup

In this section, we provide information on the KG extraction framework, the generation of KGEs, the state-of-the-art (SOTA) pre-trained sentence embeddings, and the data sets used in this work.

3.1. Saffron—Knowledge Extraction Framework

To automatically generate KGs from the targeted data sets, we used the KG extraction framework Saffron (https://saffron.insight-centre.org/, accessed on 30 March 2023). The tool is designed to create a KG automatically from a large text corpus by identifying terms and relations between them using syntactic and corpus frequency information.

3.2. Knowledge Graph Embeddings

In a given KG, each subject entity h or object entity t can be represented as a point in a continuous vector space. In this work, we use TuckER [19], which employs a three-way Tucker tensor decomposition: the binary tensor of triples $\mathcal{T}$ is factorised into a core tensor $\mathcal{G}$ and three matrices holding the embeddings of entities ($A$ and $C$) and the relations ($B$) between them, i.e., $\mathcal{T} \approx \mathcal{G} \times_1 A \times_2 B \times_3 C$. This allows us to create KGEs that are used in the network embedding layers of our system.
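For illustration, the following minimal NumPy sketch shows how a TuckER-style score for a triple is obtained by contracting the core tensor with the subject, relation, and object embeddings; the dimensions and values are toy placeholders rather than trained embeddings.

```python
import numpy as np

def tucker_score(W, e_s, w_r, e_o):
    """Contract the core tensor W (d_e x d_r x d_e) with the subject,
    relation, and object embeddings (the TuckER scoring function)."""
    x = np.einsum('irj,i->rj', W, e_s)  # mode-1 product with the subject
    x = np.einsum('rj,r->j', x, w_r)    # mode-2 product with the relation
    return float(x @ e_o)               # mode-3 product with the object

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 3, 4))          # toy core tensor: d_e = 4, d_r = 3
score = tucker_score(W, rng.normal(size=4), rng.normal(size=3),
                     rng.normal(size=4))
print(score)
```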

3.3. Pre-Trained Word and Sentence Embeddings

In addition to the KGs and KGEs automatically generated from the targeted data sets, we leverage different SOTA pre-trained word and sentence embeddings for the classification task. First, we leveraged the pre-trained GloVe [20] word embeddings, which were trained on six billion tokens extracted from Wikipedia and the Gigaword archive (https://catalog.ldc.upenn.edu/LDC2011T07, accessed on 30 March 2023). LASER [21] is a multilingual sentence encoder that learns joint sentence representations for 93 languages, using a single Bi-LSTM encoder combined with a decoder, trained on publicly available corpora. LASER transforms sentences into language-independent vectors, which allows a classifier to be learned using training data in any of the covered languages. Furthermore, we use SBERT [22], which uses Siamese and triplet network structures for generating sentence embeddings. Finally, we leverage MPNet [23], which is trained through permuted language modelling (PLM), allowing a better understanding of bidirectional contexts. MPNet leverages the dependency among predicted tokens through PLM and takes auxiliary position information as input to make the model see a full sentence. The model is trained on various corpora (over 160 GB of text) and fine-tuned on a variety of downstream tasks, such as GLUE and SQuAD, among others.
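A minimal sketch of obtaining such sentence embeddings with the sentence-transformers library is shown below; the checkpoint names are illustrative stand-ins, as the exact checkpoints used are not listed here.

```python
from sentence_transformers import SentenceTransformer

# Checkpoint names are illustrative stand-ins for the SBERT and MPNet
# models; any compatible checkpoint from the model hub can be used.
sbert = SentenceTransformer("all-MiniLM-L6-v2")
mpnet = SentenceTransformer("all-mpnet-base-v2")

queries = ["Where is the best place to buy a television?"]
print(sbert.encode(queries).shape)  # (1, embedding dimension)
print(mpnet.encode(queries).shape)
```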

3.4. LIME

To better understand the predictions made by our intent classification models, we used the local interpretable model-agnostic explanations (LIME, https://github.com/marcotcr/lime, accessed on 30 March 2023) algorithm introduced by Ribeiro et al. [24]. LIME learns an interpretable model locally around a prediction to explain the predictions of any given classifier. For each prediction, it illustrates the degree to which each feature contributed to the model’s output. LIME implements this by using the original model on fresh samples generated through slight perturbations of the feature values of the instance to be explained. Each of these samples is then given a weight based on its resemblance to the instance we are seeking to explain. The explainable model is then trained using this weighted proxy data.
Figure 1 illustrates the LIME visualisation of important words (i.e., recharge, ticket, develop) and their contribution degrees for the intent classification of the question “Hello, may I recharge my account? Where can U develop my ticket?”. Using LIME allowed us to identify the most important words that contribute to the classification task. Furthermore, we compare the top-k words (k = 5) provided by LIME with the terms automatically extracted by Saffron. This allowed us to filter the automatically generated KGs based on the important words provided by LIME (cf. Section 4.3).
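A minimal sketch of this step with the lime package follows; the classifier function and class names are placeholders for the trained intent model.

```python
import numpy as np
from lime.lime_text import LimeTextExplainer

# Placeholder for the trained intent classifier: maps a list of texts
# to an (n_samples x n_classes) array of class probabilities.
def predict_proba(texts):
    rng = np.random.default_rng(0)
    p = rng.random((len(texts), 2))
    return p / p.sum(axis=1, keepdims=True)

explainer = LimeTextExplainer(class_names=["recharge_account", "other"])
exp = explainer.explain_instance(
    "Hello, may I recharge my account? Where can U develop my ticket?",
    predict_proba,
    num_features=5,   # the top-k words later compared with Saffron terms
)
print(exp.as_list())  # [(word, weight), ...]
```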

3.5. Significance Testing

To compare the predictive accuracy of two models, we use McNemar’s test [25], which is based on a two-by-two contingency table of the two models’ predictions. The null hypothesis of McNemar’s test is that there is no difference between the marginal frequencies; therefore, if the p-value is greater than 0.05, we cannot conclude that there is a significant difference between the two models’ false negatives and false positives. The alternative hypothesis is that there is a significant difference between the marginal frequencies, which we accept when the p-value is less than or equal to 0.05.
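The following sketch shows the test as applied here, assuming two arrays of predicted labels for the same test set; the contingency table counts where the two models agree and disagree on correctness.

```python
import numpy as np
from statsmodels.stats.contingency_tables import mcnemar

def mcnemar_table(y_true, pred_a, pred_b):
    """2x2 table over (model A correct?, model B correct?)."""
    a_ok, b_ok = pred_a == y_true, pred_b == y_true
    return np.array([[np.sum(a_ok & b_ok), np.sum(a_ok & ~b_ok)],
                     [np.sum(~a_ok & b_ok), np.sum(~a_ok & ~b_ok)]])

y_true = np.array([0, 1, 1, 0, 1, 0, 1, 1])
pred_a = np.array([0, 1, 0, 0, 1, 1, 1, 1])
pred_b = np.array([0, 0, 1, 0, 1, 0, 0, 1])
result = mcnemar(mcnemar_table(y_true, pred_a, pred_b), exact=True)
print(result.pvalue)  # p <= 0.05 -> significant difference in error rates
```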

3.6. Data Sets

First, we used a proprietary question–answer data set, named the ProductServiceQA data set (Table 1). It consists of 7611 user queries, such as “Can the VISA and MASTER cards be added to the card package?”, which are distributed among 338 different classes (e.g., Bank cards that can be added).
The ComQA data set [2] (http://qa.mpi-inf.mpg.de/comqa/, accessed on 30 March 2023) consists of 11,214 questions of users’ interest, which were collected from WikiAnswers, a community question-answering website. The data set contains questions with various challenging phenomena, such as the need for temporal reasoning, comparison, compositionality, and unanswerable questions (e.g., Who was the first human being on Mars?). The questions in ComQA are originally grouped into 4834 clusters, which are annotated with their answer(s) in the form of Wikipedia entities. To evaluate all data sets with a similar set of classes, we selected from ComQA only the QA pairs that appear more than six times in the data set.
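The frequency filter can be sketched as follows, with `pairs` standing in for the (question, cluster) tuples loaded from ComQA.

```python
from collections import Counter

def filter_by_frequency(pairs, min_count=7):
    """Keep only QA pairs whose cluster appears more than six times."""
    counts = Counter(cluster for _, cluster in pairs)
    return [(q, c) for q, c in pairs if counts[c] >= min_count]

pairs = [("who founded rome?", "c1")] * 7 + [("who was on mars first?", "c2")] * 2
print(len(filter_by_frequency(pairs)))  # 7: only cluster c1 survives
```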
The ParaLex data set [3] (http://knowitall.cs.washington.edu/paralex/, accessed on 30 March 2023) contains paraphrases, word alignments, and basic NLP-processed versions of the questions. There are about 2.5 million distinct questions and 18 million distinct paraphrase pairs. As an example, “What are the green blobs in plant cells?” and “a green substance in the plant cell be the?” represent a question pair within this data set. This allowed us to evaluate the targeted data sets with a similar set of intents, ranging between 272 intents for ComQA and 338 for ProductServiceQA.
The ATIS (Airline Travel Information Systems, https://www.kaggle.com/code/siddhadev/atis-dataset-from-ms-cntk?scriptVersionId=10371998, accessed on 30 March 2023) data set contains manual transcripts of humans asking automated airline travel inquiry systems for flight information. The data consist of 17 unique intent categories.

4. Methodology

In this section, we provide insights into the automatic generation of KGs from the targeted data sets, covering term extraction, NER, dependency parsing for relation extraction, and a relation filtering approach. Each step of the KG generation allowed us to evaluate the impact of the semantic information represented in the KG on the classification task. As an example, Table 2 illustrates the different KGs generated from the ProductServiceQA data set. We conclude this section with the manual evaluation of the automatically generated KGs.

4.1. Knowledge Graph Creation Pipeline

The creation of domain-specific KGs comprises NLP methods for term extraction, followed by NER and relation extraction, provided by the Saffron tool for KG generation (Figure 2). To automatically generate KGs, domain-specific terms and NEs are extracted from the corpus and used as a base for the generation of a taxonomy. Additional relations are extracted from the text corpus and added to the taxonomy to form a KG.

4.1.1. Term Extraction

For the first step, we use the term extraction module implemented in Saffron [26]. The approach extracts noun phrases and uses distribution metrics to select term candidates. Then, scoring functions based on occurrence frequency, context relevance, reference corpus usage (e.g., Wikipedia), and topic modelling are used to measure the domain relevance of the terms.
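A simplified stand-in for this step is sketched below using spaCy noun chunks ranked by corpus frequency alone; Saffron’s actual scoring additionally uses context relevance, reference corpora, and topic modelling.

```python
from collections import Counter
import spacy

nlp = spacy.load("en_core_web_sm")
corpus = [
    "Can the VISA and MASTER cards be added to the card package?",
    "How can the customer pay the bill with a pay card?",
]

candidates = Counter()
for doc in nlp.pipe(corpus):
    for chunk in doc.noun_chunks:              # noun phrases as candidates
        candidates[chunk.lemma_.lower()] += 1  # frequency as a proxy score
print(candidates.most_common(5))
```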

4.1.2. Named Entity Recognition

To obtain NEs from the targeted data sets, we used Flair’s NER model, which is based on XLM-R embeddings (https://huggingface.co/flair/ner-english-large, accessed on 30 March 2023). To include domain-specific NEs of relevance for the proprietary ProductServiceQA data set, a domain-specific NER model was built to extend the term extraction step [27]. A list of NEs that are specific to the ProductServiceQA data set was provided and used to train the NER system. For this, we used Flair [28], more concretely the “Flair (forward+backward)+GloVe” embeddings, as they performed best for our targeted domain. Table 3 provides the results of a comparison of different embedding methods on the ProductServiceQA data set.
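A minimal sketch of tagging with the referenced public Flair model is shown below (the domain-specific ProductServiceQA model was trained separately on the provided NE list).

```python
from flair.data import Sentence
from flair.models import SequenceTagger

# The public XLM-R-based model referenced above; loading downloads it.
tagger = SequenceTagger.load("flair/ner-english-large")

sentence = Sentence("Can the VISA card be added to the card package?")
tagger.predict(sentence)
for entity in sentence.get_spans("ner"):
    print(entity.text, entity.tag, round(entity.score, 2))
```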

4.1.3. Taxonomy Generation

The taxonomy generation step constructs a taxonomy using the top N ranked terms (Saffron’s default setting is N = 100) and the NEs obtained from the previous steps [29]. For each distinct pair of concepts c, d ∈ C, we attempt to estimate the probability p(c → d) that c is a parent (broader concept) of d. Based on the probability scores given by this pairwise scoring, a likelihood function is defined that represents how likely a given structure over the concepts is a taxonomy for the set of terms provided. Then, greedy search is used to find the KG_t with a taxonomic IS-A relation that maximises the value of the likelihood function.
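The search can be sketched as a greedy tree construction over the pairwise scores; this is a simplification of Saffron’s likelihood-based procedure, with the score dictionary standing in for the pairwise probabilities.

```python
def build_taxonomy(pair_scores):
    """Greedily attach each concept to its highest-scoring parent while
    keeping the structure a tree (single parent, no cycles)."""
    edges = sorted(((s, child, par) for (par, child), s in pair_scores.items()),
                   reverse=True)
    parent = {}
    for _, child, candidate in edges:
        if child in parent:
            continue                        # child already attached
        node, cyclic = candidate, False
        while node in parent:               # walk up to detect a cycle
            node = parent[node]
            if node == child:
                cyclic = True
                break
        if not cyclic:
            parent[child] = candidate
    return parent                           # child -> parent (IS-A) edges

# toy pairwise probabilities p(c -> d): c is a parent of d
scores = {("payment", "flash payment"): 0.9, ("flash payment", "payment"): 0.1,
          ("payment", "pay card"): 0.7, ("pay card", "payment"): 0.2,
          ("pay card", "flash payment"): 0.3, ("flash payment", "pay card"): 0.1}
print(build_taxonomy(scores))  # {'flash payment': 'payment', 'pay card': 'payment'}
```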

4.1.4. Relation Extraction

To extract relations between the terms, we make use of dependency parsing. For this, the corpus is parsed using the universal dependencies of the Stanford parser [30] implemented in Stanza (https://stanfordnlp.github.io/stanza/depparse.html, accessed on 30 March 2023). All dependencies involving a term, extracted previously using the Saffron framework, and a verb (using the POS information) are extracted, which provides us with a set of predicate–term pairs, e.g., nsubj(pay, customer) or obj(pay, bill). For phrasal verbs, particles are added to the predicate using a hyphen, e.g., get-up. Similarly, for dependencies involving a preposition (obl dependency type), we concatenate the preposition to the predicate, e.g., (add_to, phone). A triple (term1, predicate, term2) is constructed by combining any dependency pairs where, in the same sentence, the same predicate is the head of two dependencies in the list of pairs obtained in the previous step, e.g., nsubj_obj(customer, pay, bill). The triple relations are added to the previously generated taxonomy (KG_t). The outcome of this step is KG_tr, with additional lexical relations between the extracted terms.
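A sketch of the triple construction with Stanza is given below, with `terms` standing in for the previously extracted term list.

```python
import stanza

stanza.download("en")  # fetches the English models on first use
nlp = stanza.Pipeline("en", processors="tokenize,pos,lemma,depparse")
terms = {"customer", "bill"}               # assumed Saffron output

doc = nlp("The customer pays the bill.")
for sent in doc.sentences:
    args = {}
    for word in sent.words:
        head = sent.words[word.head - 1] if word.head > 0 else None
        if head is not None and head.upos == "VERB" and word.text.lower() in terms:
            args.setdefault(head.lemma, {})[word.deprel] = word.text.lower()
    for predicate, deps in args.items():
        if "nsubj" in deps and "obj" in deps:  # combine pairs sharing a head
            print((deps["nsubj"], predicate, deps["obj"]))  # ('customer', 'pay', 'bill')
```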

4.1.5. Knowledge Graph Generation

By performing term extraction, NER, and relation extraction, we use the obtained triples to generate KG_tre, incorporating taxonomic and lexical relations between the extracted terms and named entities.

4.2. Intent Classification with Pre-Trained and Knowledge Graph Embeddings

Finally, we leverage the KGEs trained with TuckER on the KGs noted above for intent classification, combining them with the pre-trained sentence embeddings. For this, we use a multi-layer feed-forward neural network. It is a fully connected network with five hidden layers, whereby the dimension of the input layer is set according to the dimension of the input embedding. The activation function used is ReLU [31], and we use the Softmax function in the output layer. Categorical cross-entropy is used as the loss function, and Adam [32] is used as the optimiser. We apply dropout (0.3 dropout rate) between the last two hidden layers and between the last hidden layer and the output layer. The number of training epochs is 300, and the batch size is 512.
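A sketch of this network in Keras under the stated hyper-parameters is shown below; the hidden-layer width (1024) is an assumption, as it is not reported above, and the dropout placement follows our reading of the description.

```python
import tensorflow as tf

def build_classifier(input_dim, n_intents, width=1024):
    """Five hidden ReLU layers; dropout after the last two hidden layers,
    i.e., between them and before the softmax output."""
    inputs = tf.keras.Input(shape=(input_dim,))
    x = inputs
    for i in range(5):
        x = tf.keras.layers.Dense(width, activation="relu")(x)  # assumed width
        if i >= 3:
            x = tf.keras.layers.Dropout(0.3)(x)                 # stated rate
    outputs = tf.keras.layers.Dense(n_intents, activation="softmax")(x)
    model = tf.keras.Model(inputs, outputs)
    model.compile(optimizer="adam", loss="categorical_crossentropy",
                  metrics=["accuracy"])  # precision is reported separately
    return model

# e.g., LASER+SBERT concatenation (1024 + 768 dims) over the 338 intents
model = build_classifier(input_dim=1792, n_intents=338)
# model.fit(X_train, y_train, epochs=300, batch_size=512)
```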
The embeddings are fed through the above-explained network architecture for model building. With this, we leverage pre-trained embeddings (GloVe, LASER, SBERT, MPNet) in combination with KGEs. The various sentence embedding approaches used in our work can be categorised into three broad methods. In the first approach, the network is trained with a single SOTA pre-trained model, i.e., LASER, SBERT, or MPNet; the results obtained from a single embedding category are considered our baseline results. Additionally, we performed a Concatenation approach, where, for a given sentence, two or more embeddings obtained from LASER, SBERT, GloVe, or the KGEs are concatenated into the embedding matrix (E). For Substitution, we examine whether a term extracted from the data set is present in the KG. If it is, we use the KGEs to obtain the embeddings; otherwise, GloVe embeddings are used. As both the KGEs and GloVe have 300 dimensions, the input layer dimension remains the same.
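The two combination strategies can be sketched as follows, with `kge` and `glove` standing in for the 300-dimensional embedding lookups.

```python
import numpy as np

rng = np.random.default_rng(0)
kge = {"pay card": rng.normal(size=300)}                  # from TuckER
glove = {w: rng.normal(size=300) for w in ("pay card", "television")}

def concatenate(vectors):
    """Concatenation: stack sentence and KG embeddings into one input."""
    return np.concatenate(vectors)

def substitute(term):
    """Substitution: prefer the KGE if the term is in the KG, else GloVe."""
    return kge[term] if term in kge else glove[term]

print(substitute("pay card").shape)    # taken from the KGEs
print(substitute("television").shape)  # GloVe fallback, same 300 dims
```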

4.3. Filtering Knowledge Graphs with LIME

We run the LIME algorithm on the targeted data sets and extract all words the model focuses on while making a prediction. The obtained ranked LIME list is compared with the top words provided by the Saffron tool; whereas LIME extracts only unigrams, Saffron also provides bi-grams extracted from the targeted data sets. We then ran experiments with all KGs reduced to only the important words marked by LIME. This reduced the vocabulary of the KGs by removing excess noise.
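A sketch of this filtering is given below, with `triples` standing in for a generated KG and `lime_words` for the important unigrams; because Saffron terms may be bi-grams, the comparison is performed at the token level.

```python
def filter_kg(triples, lime_words):
    """Keep only triples whose terms share a token with the LIME list."""
    important = set(lime_words)
    def keep(term):
        return any(token in important for token in term.split())
    return [(h, r, t) for h, r, t in triples if keep(h) and keep(t)]

triples = [("pay card", "is-a", "card"), ("screen saver", "is-a", "software")]
print(filter_kg(triples, ["pay", "card", "recharge", "ticket", "account"]))
# [('pay card', 'is-a', 'card')]
```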

4.4. Intent Classification on Intents Translated into English

Along with its intents in English, ProductServiceQA also holds the intents in Spanish and Chinese. To simulate intent classification for these languages, we leveraged a machine translation pipeline and tested how a slightly noisy, translated data set affects the Siamese network classifier.

4.5. Manual Evaluation of KGs

We manually analysed and curated the automatically generated ProductServiceQA KGs, which resulted in the benchmark KGs for this data set. These benchmark KGs allowed us to evaluate the quality of the automatically generated ProductServiceQA KGs and are not used in the training of the intent classification models or in the generation of any other automatic KGs. Three curators, one male and two female, all NLP specialists in knowledge extraction, performed the curation.
Term Extraction Curation: The term list was provided to the three annotators who independently identified terms that were correctly extracted based on the definition of a term and the domain of the data set. As an example, the extracted term pay card swiping was annotated as an incorrectly extracted term, whereas pay card was labelled as correct. Where possible, if the term span was incorrect, a corrected version was proposed. In this case, wearable device support bank was corrected in the benchmark KGs to wearable device. Within this manual curation step, 50% of terms were identified as correct, whereby 13 terms were modified.
Taxonomic Relations Curation: A similar curation was performed on the extracted taxonomic relations. The curators were presented with pairs of terms involved in a taxonomic (hyponym) relation, i.e., parent_term → child_term. The annotators had to identify whether the parent term (payment) was correctly identified for the child term (flash payment). A wrongly identified relation pair would be device → support. If the taxonomic relation was not correctly extracted, the experts proposed a replacement parent term from the list or a new term if none was deemed appropriate. Evaluating this step, 33% of relations were considered correct, whereby 20 new terms were defined and added to the benchmark taxonomy. The benchmark KG_tr, which was used to evaluate the automatically generated KGs, contains 83 terms within a taxonomy of depth 5.
Named Entity with Dependency Relation Curation: For the benchmark KG_tre, we collected a list of NEs and their types, which resulted in 619 NEs (e.g., card) belonging to 22 different types (e.g., CARD_TYPE). In order to add the NEs to the benchmark KG_tre, we selected the NE types that match a term in the taxonomy. Seven such types were identified. We then collected all the NEs corresponding to these seven types from the list (amounting to 25 NEs) and added them to their parents in the benchmark KG_tre using a taxonomic relation.
Within the same curation step, the dependency-based relation extraction algorithm was performed, extracting predicates involving two NEs, or an NE and a term (from the initial list of terms in the third step of the approach). A set of 126 triples with terms and NEs was finally added as relations that contain NEs to the previously mentioned benchmark KG_tre.

5. Results

In this section, we provide insights into experiments using an RNN as well as Siamese networks for the intent classification task. Additionally, we illustrate the performance of the classification task with the most important terms in the KGs, as identified by the LIME framework. Finally, we evaluate the performance of the classification task in a multilingual setting.

5.1. Intent Classification with Recurrent Neural Networks

Analysing the results for the ComQA data set in the top part of Table 4, MPNet embeddings contribute best to the classification task compared to LASER or SBERT. The KGEs trained on the automatically generated KGs do not outperform the SOTA embeddings, although the performance of the KGs improves with the number of terms within the KG. As seen in Table A1 (see Appendix A), KG_t with 100 terms achieves a precision of 40.71, whereas KG_tre, with 750 terms and relations between them, achieves a precision of 93.34.
When concatenating sentence embeddings with GloVe or the automatically generated KGs, KG_t with 500 and 750 terms performs best (99.45) when combined with LASER and SBERT or MPNet. Comparing the performance between the GloVe embeddings and the automatically generated KGs, the latter outperform the former in the majority of the setups. Substitution performs comparably to the concatenation approach, where combining LASER+SBERT+KG_tr achieves the same precision as the best-reported concatenation approach.
For the ParaLex data set (Table 4), leveraging the SBERT pre-trained model as a single resource performs best (54.06). Nevertheless, when combining different embeddings, LASER+SBERT+GloVe outperforms the standalone embeddings (54.41). Similarly to the ComQA data set, although extracting more terms for KG generation improves the classification task (a precision of 22.38 with KG_t with 100 terms vs. 50.45 with KG_tre with 750 terms), it does not outperform any SOTA pre-trained models (Table A2 in Appendix A). On the other hand, in combination with LASER+MPNet, the KGEs trained on KG_tre with 750 terms and extracted relations outperform the SOTA embeddings for the ParaLex data set using the substitution approach (55.42).
Next, we leverage the SOTA sentence embeddings on the proprietary ProductServiceQA data set (lower part of Table 4). Analysing single embeddings, compared to SBERT, LASER, or the KGEs trained on the automatically generated KGs and DBpedia, MPNet performs best on the proprietary data set in the telecommunication domain (69.25). When combining sentence embeddings with the KGs, DBpedia in combination with LASER+MPNet contributes the most when using the concatenation approach. Similarly to the data sets described above, embedding substitution does not outperform the concatenation approach (see Table A3 in Appendix A).
In addition to the experiments noted above on the proprietary ProductServiceQA data set, we analysed the impact of the set of terms within KG_tre extracted by the Saffron tool. As Saffron in its default setting extracts the 100 most domain-specific terms from the targeted document, we extended the set of domain-specific terms gradually (Table 5). As seen in Table 6, extending the set of terms positively contributes to the classification precision when using the KGs as a single embedding resource. Nevertheless, even the KG with 1000 terms does not outperform any of the pre-trained sentence embeddings used in this work. When concatenating the KGs with SOTA pre-trained embeddings, however, LASER+MPNet+KG_tre with 100 terms performs best (69.99).
For the ATIS data set (bottom part of Table 4) in the aviation travel inquiry domain, LASER performs best (98.87) as a single embedding resource, whereas combining LASER with SBERT or SBERT+GloVe does not improve the performance on the classification task. In contrast, many systems that leverage the information of the KGs outperform the setting in which only the LASER pre-trained embeddings are used (99.25).

5.2. Siamese Network

In addition to the experiments using the RNN architecture, we employed a Siamese network for the classification task. The top part of Table 7 illustrates the results for the ComQA data set, where, compared to LASER as well as the concatenation of SBERT and LASER embeddings, SBERT embeddings contribute best (95.18) to the intent classification task. When concatenating sentence embeddings with KGEs trained on the automatically generated KGs, KG_tre with 100 terms combined with SBERT and LASER performs the same as the SBERT pre-trained embeddings alone (95.18).
For the ParaLex data set (top right part of Table 7), the MPNet pre-trained model as a single resource performs best within the classification task (50.33). When combining different embeddings, MPNet+LASER slightly improves the performance on the classification task (50.47). Leveraging KG_t with 100 terms in combination with MPNet significantly (p < 0.05) outperforms MPNet+LASER on the classification task.
The lower part of Table 7 illustrates the intent classification task on the ProductServiceQA data set using the Siamese network. Analysing the SOTA pre-trained embeddings, SBERT performs best (73.94); when leveraging the KGEs trained on the automatically generated KGs, the combination of MPNet+LASER+KG with 100 terms further improves the performance of the classification task compared to the existing pre-trained models (74.69 vs. 73.94).
For the ATIS data set, SBERT demonstrates the best performance among the SOTA pre-trained models (lower part of Table 7). When leveraging the KGEs based on the automatically generated KGs, MPNet+KG_tre further improves the performance of the intent classification task.
As the sentences in the ComQA data set appear frequently and are thus repetitive, we filtered the data set so that each sentence appears only between two and five times. Compared to the entire ComQA data set (top part of Table 7), the classification precision drops due to the smaller set of sentences used to train the Siamese network (≈95 vs. ≈84). Table 8 demonstrates the best performance for the concatenation of the MPNet+LASER embeddings (84.23), while concatenating sentence embeddings with the automatically generated KGs, i.e., SBERT+LASER+KG_t with 100 terms, outperforms the usage of SOTA embeddings (84.87 vs. 84.23). Table A6 in Appendix A illustrates the extended analysis with different KGs and sets of terms and relations extracted by Saffron.

5.3. Filtering Knowledge Graphs Using LIME

As a next step, we analysed the significance of the automatically extracted terms and relations within the KGs. To this end, we leveraged the LIME toolkit (cf. Section 3.4) to exclude terms in the original KGs that are not considered important by LIME.

5.3.1. Filtering for Intent Classification with Recurrent Neural Networks

For the ComQA data set (top part of Table 9), we observed two cases, i.e., KG_tr with 100 terms and KG_tr with 750 terms, where the classification performance significantly (p < 0.05) improved compared to the original KGs (90.49 vs. 90.67 and 94.96 vs. 95.25). This demonstrates that these filtered KGs, which are reduced to one-third (or less) of their original size, hold important domain-specific information to guide the classifier towards predicting the correct intent.
For the ParaLex data set, three filtered KGs significantly improve ( p < 0.05 ) the performance of the classification task over the original KGs (54.72 vs. 55.14, 55.07 vs. 55.47 and 54.45 vs. 54.62).
Table 9. Intent classification evaluation in terms of precision for the targeted data sets using an RNN and the most important terms within the KGs according to LIME (bold numbers indicate the best results for each setting; Orig. = original KG, Filt. = filtered KG; * denotes statistical significance at p = 0.05).
ComQA Data Set (Orig. / Filt.) | KG_t (100) | KG_t (500) | KG_t (750) | KG_tr (100) | KG_tr (500) | KG_tr (750) | KG_tre (100) | KG_tre (500) | KG_tre (750)
LASER+KG | 89.74 / 89.10 | 91.54 / 90.26 | 90.49 / 90.49 | 90.03 / 90.67 * | 90.26 / 90.72 | 89.33 / 90.90 | 91.48 / 91.13 | 89.74 / 89.74 | 89.22 / 89.45
LASER+SBERT+KG | 94.38 / 94.84 | 95.07 / 95.25 | 95.19 / 94.67 | 95.77 / 95.65 | 94.96 / 95.36 | 94.67 / 95.25 * | 94.78 / 95.30 | 94.78 / 95.48 | 94.78 / 94.61
LASER+MPNet+KG | 94.84 / 95.19 | 95.19 / 94.90 | 94.72 / 94.26 | 95.13 / 94.61 | 94.03 / 94.61 * | 93.91 / 93.91 | 94.49 / 94.38 | 92.93 / 93.39 | 93.16 / 94.20
ParaLex Data Set (Orig. / Filt.) | KG_t (100) | KG_t (500) | KG_t (750) | KG_tr (100) | KG_tr (500) | KG_tr (750) | KG_tre (100) | KG_tre (500) | KG_tre (750)
LASER+KG | 54.04 / 53.57 | 54.39 / 54.22 | 54.72 / 55.14 * | 53.94 / 54.15 | 54.74 / 55.14 | 54.48 / 54.62 * | 54.39 / 54.29 | 54.34 / 55.09 | 54.08 / 55.33
LASER+SBERT+KG | 54.25 / 54.08 | 54.76 / 55.11 | 54.48 / 54.27 | 54.04 / 54.11 | 54.43 / 54.41 | 55.00 / 54.48 | 53.92 / 54.46 | 54.55 / 54.69 | 55.23 / 55.14
LASER+MPNet+KG | 54.48 / 54.20 | 55.40 / 54.83 | 54.81 / 54.53 | 53.89 / 53.73 | 55.07 / 55.47 | 55.16 / 55.16 | 54.41 / 54.69 | 54.95 / 54.67 | 55.21 / 55.16
ProductServiceQA Data Set (Orig. / Filt.) | Dimension | KG_t | KG_tr | KG_tre
LASER+KG | 1324 | 63.64 / 63.51 | 63.16 / 62.42 | 63.42 / 63.16
LASER+SBERT+KG | 2092 | 68.50 / 68.46 | 68.76 / 68.37 | 67.89 / 68.86
LASER+MPNet+KG | 2092 | 69.60 / 68.94 | 69.03 / 69.16 | 68.77 / 68.16
Similarly to the aforementioned data sets, KG_tr and KG_tre improve the performance of the classification task over the original KGs generated on the ProductServiceQA data set (lower part of Table 9).

5.3.2. Filtering for Intent Classification with Siamese Networks

In addition to the RNN classification using the filtered KGs, we performed the same experiment with the Siamese network. As seen in the top part of Table 10 for the ComQA data set, the filtered KG_t with 100 and 500 terms, respectively, significantly (p < 0.05) outperforms the classification performance obtained with the original KGs, which contain a larger set of terms and relations.
Similarly, applying the filtered KGs to the ParaLex data set, KG_tr and KG_tre outperform their original counterparts when using the SBERT and MPNet embeddings, respectively.
As seen in the lower part of Table 10, the filtered KGs did not significantly outperform any of the original KGs generated from the ProductServiceQA data set. Nevertheless, minor improvement is detected for all KG variants with different SOTA embeddings.
Table 10. Intent classification evaluation in terms of precision for the targeted data sets using a Siamese network and the most important terms within the KGs according to LIME (bold numbers indicate the best results for each setting; Orig. = original KG, Filt. = filtered KG; * denotes statistical significance at p = 0.05).
ComQA Data Set (Orig. / Filt.) | KG_t (100) | KG_t (500) | KG_t (750) | KG_tr (100) | KG_tr (500) | KG_tr (750) | KG_tre (100) | KG_tre (500) | KG_tre (750)
SBERT+KG | 94.96 / 94.67 | 94.43 / 94.78 | 94.78 / 94.90 | 94.78 / 95.13 | 94.31 / 94.78 | 94.78 / 94.55 | 94.78 / 95.13 | 94.32 / 94.78 | 94.78 / 94.55
SBERT+LASER+KG | 95.13 / 94.49 | 94.55 / 95.25 * | 94.60 / 94.84 | 94.78 / 94.96 | 94.55 / 94.55 | 94.95 / 94.66 | 94.14 / 94.61 | 93.80 / 94.32 | 92.93 / 92.12
MPNet+KG | 95.13 / 98.63 * | 94.49 / 94.38 | 92.86 / 93.28 | 94.49 / 94.72 | 94.03 / 94.32 | 92.34 / 92.35 | 94.49 / 94.72 | 94.03 / 94.32 | 92.35 / 92.35
MPNet+LASER+KG | 95.02 / 89.62 | 94.43 / 94.49 | 92.92 / 93.28 | 94.14 / 94.61 | 93.79 / 94.32 | 92.92 / 92.12 | 94.14 / 94.61 | 93.80 / 94.32 | 92.93 / 92.12
ParaLex Data Set (Orig. / Filt.) | KG_t (100) | KG_t (500) | KG_t (750) | KG_tr (100) | KG_tr (500) | KG_tr (750) | KG_tre (100) | KG_tre (500) | KG_tre (750)
SBERT+KG | 49.26 / 48.59 | 48.82 / 48.43 | 49.49 / 49.53 | 48.93 / 49.13 | 49.43 / 50.23 | 49.31 / 50.42 | 49.30 / 50.14 * | 50.59 / 50.54 | 49.91 / 49.91
SBERT+LASER+KG | 49.17 / 48.94 | 49.52 / 48.59 | 49.17 / 49.86 | 48.65 / 48.83 | 49.28 / 50.40 | 49.35 / 50.28 | 52.72 / 52.04 | 53.66 / 53.10 | 52.77 / 52.44
MPNet+KG | 52.29 / 50.35 | 51.21 / 52.11 | 51.54 / 52.28 | 50.18 / 52.16 | 50.55 / 52.79 | 51.86 / 54.48 * | 52.89 / 52.63 | 53.14 / 53.07 | 52.79 / 52.60
MPNet+LASER+KG | 51.49 / 50.54 | 50.97 / 52.42 | 50.57 / 52.46 | 50.86 / 53.26 | 51.04 / 52.58 | 51.75 / 53.87 | 52.72 / 52.04 | 53.66 / 53.10 | 52.77 / 52.44
ProductServiceQA Data Set (Orig. / Filt.) | Dimension | KG_t | KG_tr | KG_tre
SBERT+KG | 1068 | 73.51 / 73.23 | 73.77 / 73.67 | 73.73 / 73.67
SBERT+LASER+KG | 2092 | 73.16 / 73.06 | 74.08 / 74.06 | 73.07 / 73.36
MPNet+KG | 1068 | 73.64 / 73.80 | 74.37 / 74.50 | 73.59 / 73.45
MPNet+LASER+KG | 2092 | 73.81 / 73.49 | 73.77 / 73.14 | 73.29 / 73.49

5.4. Multilingual Setting

As a final experiment, we leverage the translations into English of the multilingual ProductServiceQA data set. Table 11 illustrates the intent classification task when the Spanish intents translated into English are used. The best performance with pre-trained models is demonstrated by the SOTA MPNet+LASER embeddings. When leveraging the KGEs trained on the automatically generated KGs, the classification precision increases to 65.57 when combining the embeddings as MPNet+LASER+KG.
We performed the same experiment with the Chinese intents, which were translated into English. In this setting, the MPNet+LASER embedding combination outperforms other SOTA pre-trained embeddings (Table 12). Similarly to the experiment on the Spanish language, employing the automatically generated KG, in this case in combination with MPNet, further improves the performance of the classification task.

6. Conclusions

In this paper, we presented work on leveraging automatically generated knowledge graphs for intent classification. We provide an analysis of each step towards the creation of knowledge graphs, i.e., term extraction, named entity recognition, and relation extraction, and provide insights into their evaluation and manual curation. We perform intent classification using state-of-the-art sentence embeddings and combine these with domain-specific knowledge graph embeddings trained on the automatically generated knowledge graphs. We evaluate our methodology on four different data sets and demonstrate that the domain-specific knowledge within knowledge graphs further improves the performance on the intent classification task. Furthermore, we study the set of terms and relations within the knowledge graphs and filter them by importance using the LIME tool. Finally, we use the Spanish and Chinese intents of the proprietary ProductServiceQA data set and leverage machine translation to perform the classification on noisy intents translated into English. Our ongoing work focuses on the use of knowledge graph extraction for multi-turn intent identification, more specifically on generating questions that direct a user to a more specific answer through knowledge subgraph identification.

Author Contributions

Conceptualization, S.D., H.A., J.P.M. and P.B.; methodology, M.A., S.D., H.A., J.P.M. and P.B.; software, S.M., C.R., G.V., D.P. and S.S.; validation, S.M., C.R., G.V., D.P. and S.S.; formal analysis, S.M., C.R. and G.V.; investigation, M.A. and G.V.; writing—original draft preparation, M.A., S.M., C.R., G.V., D.P., S.S., S.D., H.A., J.P.M. and P.B.; writing—review and editing, M.A., G.V., S.D., H.A., J.P.M. and P.B.; supervision, P.B.; project administration, M.A.; funding acquisition, P.B. All authors have read and agreed to the published version of the manuscript.

Funding

This publication has emanated from research supported in part by a grant from Science Foundation Ireland under Grant number SFI/12/RC/2289_P2. For the purpose of Open Access, the author has applied a CC BY public copyright licence to any Author Accepted Manuscript version arising from this submission.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

Table A1. Extended intent classification evaluation for the ComQA data set using an RNN and the automatically extracted KGs (bold numbers indicate the best results for each setting).
SOTA Embeddings | Dim. | Prec.
SBERT | 768 | 98.36
LASER | 1024 | 96.75
MPNet | 768 | 98.63
LASER+SBERT | 1792 | 98.28
LASER+SBERT+GloVe | 2092 | 98.63

Best Embeddings with KG | Dim. | Prec.
LASER+SBERT+KG_t (500) | 2092 | 99.45
LASER+MPNet+KG_t (750) | 2092 | 99.45
LASER+SBERT+KG_t (750)/GloVe | 2092 | 99.45

Embeddings with KG | Dim. | KG_t (100) | KG_t (500) | KG_t (750) | KG_tr (100) | KG_tr (500) | KG_tr (750) | KG_tre (100) | KG_tre (500) | KG_tre (750) | DBpedia
KG | 300 | 40.71 | 75.41 | 86.89 | 45.08 | 75.13 | 83.61 | 79.96 | 84.70 | 93.34 | 14.92
Concat. LASER+KG | 1324 | 95.35 | 95.62 | 95.08 | 95.63 | 95.08 | 95.08 | 95.90 | 95.90 | 95.63 | 96.17
Concat. LASER+SBERT+KG | 2092 | 98.90 | 99.18 | 99.45 | 98.91 | 98.63 | 98.63 | 98.36 | 98.63 | 98.91 | 98.91
Concat. LASER+MPNet+KG | 2092 | 99.18 | 99.45 | 98.09 | 98.91 | 98.36 | 98.63 | 98.09 | 98.63 | 98.36 | 98.36
Substit. LASER+KG/GloVe | 1324 | 94.81 | 94.54 | 95.36 | 94.81 | 93.72 | 94.26 | 95.36 | 96.72 | 95.36 | 96.72
Substit. LASER+SBERT+KG/GloVe | 2092 | 98.36 | 98.63 | 98.91 | 98.09 | 98.91 | 99.45 | 98.91 | 98.91 | 98.36 | 98.09
Substit. LASER+MPNet+KG/GloVe | 2092 | 97.54 | 98.09 | 98.36 | 97.54 | 98.36 | 98.09 | 98.36 | 98.91 | 97.81 | 98.36
Table A2. Extended intent classification evaluation for the ParaLex data set using an RNN and the automatically extracted KGs (bold numbers indicate the best results for each setting).
SOTA Embeddings | Dim. | Precision
SBERT | 768 | 54.06
LASER | 1024 | 52.92
MPNet | 768 | 53.80
LASER+SBERT | 1792 | 54.07
LASER+SBERT+GloVe | 2092 | 54.41

Best Embeddings with KG | Dim. | Precision
LASER+MPNet+KG/GloVe | 2092 | 55.42

Embeddings with KG | Dim. | KG_t (100) | KG_t (500) | KG_t (750) | KG_tr (100) | KG_tr (500) | KG_tr (750) | KG_tre (100) | KG_tre (500) | KG_tre (750) | DBpedia
KG | 300 | 22.38 | 46.67 | 49.39 | 25.86 | 47.82 | 47.65 | 30.34 | 48.69 | 50.45 | 20.15
Concat. LASER+KG | 1324 | 54.04 | 54.39 | 54.72 | 53.94 | 54.74 | 54.48 | 54.43 | 54.95 | 54.46 | 53.24
Concat. LASER+SBERT+KG | 2092 | 54.25 | 54.76 | 54.48 | 54.04 | 54.43 | 55.00 | 54.11 | 54.67 | 54.29 | 53.66
Concat. LASER+MPNet+KG | 2092 | 54.48 | 55.40 | 54.81 | 53.89 | 55.07 | 55.16 | 54.46 | 55.28 | 55.14 | 53.66
Substit. LASER+KG/GloVe | 1324 | 51.41 | 54.27 | 53.47 | 52.91 | 54.20 | 54.27 | 54.25 | 54.46 | 54.29 | 51.55
Substit. LASER+SBERT+KG/GloVe | 2092 | 52.37 | 54.39 | 53.26 | 52.11 | 52.49 | 53.54 | 54.58 | 54.90 | 55.16 | 53.43
Substit. LASER+MPNet+KG/GloVe | 2092 | 51.69 | 54.65 | 53.10 | 53.45 | 53.40 | 54.79 | 54.62 | 55.35 | 55.42 | 51.64
Table A3. Extended intent classification evaluation for the ProductServiceQA data set using an RNN and the automatically extracted KG with 100 terms (bold numbers indicate the best results for each setting).
SOTA Embeddings | Dim. | Precision
SBERT | 768 | 68.02
LASER | 1024 | 62.68
MPNet | 768 | 69.25
LASER+SBERT | 1792 | 68.60
LASER+SBERT+GloVe | 2092 | 68.40

Best Embeddings with KG | Dim. | Precision
LASER+MPNet+DBpedia | 2092 | 70.00

Embeddings with KG | Dim. | Bench. KG_t | Bench. KG_tr | Bench. KG_tre | KG_t | KG_tr | KG_tre | KG_tre (filt.) | DBpedia
KG | 300 | 26.19 | 34.91 | 38.10 | 25.62 | 31.80 | 45.15 | 39.33 | 23.61
Concat. LASER+KG | 1324 | 63.20 | 62.06 | 62.46 | 63.64 | 63.16 | 63.42 | 63.03 | 62.77
Concat. LASER+SBERT+KG | 2092 | 68.68 | 68.37 | 67.14 | 68.50 | 68.76 | 67.89 | 68.11 | 67.37
Concat. LASER+MPNet+KG | 2092 | 68.77 | 68.94 | 68.24 | 69.51 | 68.16 | 68.77 | 69.21 | 70.00
Substit. LASER+KG/GloVe | 1324 | 59.75 | 61.76 | 60.93 | 59.75 | 60.18 | 62.33 | 62.07 | 60.27
Substit. LASER+SBERT+KG/GloVe | 2092 | 67.15 | 67.85 | 68.33 | 67.76 | 68.55 | 68.46 | 68.07 | 67.76
Substit. LASER+MPNet+KG/GloVe | 2092 | 67.59 | 67.02 | 66.14 | 67.85 | 68.51 | 67.15 | 68.37 | 68.64
Table A4. Extended intent classification evaluation for the ATIS data set using an RNN and the automatically extracted KG with 100 terms (bold numbers indicate the best results for each setting).
SOTA Embeddings | Dim. | Precision
SBERT | 768 | 98.67
LASER | 1024 | 98.87
MPNet | 768 | 98.43
LASER+SBERT | 1792 | 98.50
LASER+SBERT+GloVe | 2092 | 98.62

Best Embeddings with KG | Dim. | Precision
LASER+KG | 1324 | 99.25
LASER+MPNet+KG | 2092 | 99.25
LASER+SBERT+KG/GloVe | 2092 | 99.25
LASER+MPNet+KG/GloVe | 2092 | 99.25

Embeddings with KG | Dim. | KG_t | KG_tr | KG_tre
KG | 300 | 91.37 | 91.37 | 93.87
Concat. KG+GloVe | 600 | 98.25 | 98.62 | 98.00
Concat. LASER+KG | 1324 | 99.25 | 98.62 | 98.25
Concat. LASER+SBERT+KG | 2092 | 98.25 | 99.25 | 98.50
Concat. LASER+MPNet+KG | 2092 | 99.25 | 99.25 | 98.50
Substit. LASER+KG/GloVe | 1324 | 98.87 | 98.62 | 97.87
Substit. LASER+SBERT+KG/GloVe | 2092 | 99.00 | 98.37 | 99.25
Substit. LASER+MPNet+KG/GloVe | 2092 | 99.12 | 99.12 | 99.25
Table A5. Extended intent classification evaluation for the ComQA data set using a Siamese network and the automatically extracted KGs (bold numbers indicate the best results for each setting).
SOTA Embeddings | Dim. | Precision
SBERT | 768 | 95.18
SBERT+LASER | 1792 | 94.66
MPNet | 768 | 94.37
MPNet+LASER | 1792 | 94.14

Best Embeddings with KG | Dim. | Precision
SBERT+LASER+KG | 2092 | 95.18

Embeddings with KG | Dim. | KG_t (100) | KG_t (500) | KG_t (750) | KG_tr (100) | KG_tr (500) | KG_tr (750) | KG_tre (100) | KG_tre (500) | KG_tre (750)
SBERT+KG | 1068 | 94.96 | 94.43 | 94.78 | 94.78 | 94.31 | 94.78 | 94.55 | 94.61 | 94.78
SBERT+LASER+KG | 2092 | 95.13 | 94.55 | 94.60 | 94.78 | 94.55 | 94.95 | 95.18 | 94.55 | 94.43
MPNet+KG | 1068 | 95.13 | 94.49 | 92.86 | 94.49 | 94.03 | 92.34 | 93.43 | 92.87 | 92.23
MPNet+LASER+KG | 2092 | 95.02 | 94.43 | 92.92 | 94.14 | 93.79 | 92.92 | 93.91 | 93.27 | 91.65
Table A6. Extended intent classification evaluation for the ComQA data set, filtered by questions with a frequency between two and five, using a Siamese network and the automatically extracted KGs (bold numbers indicate the best results for each setting).
SOTA Embeddings | Dim. | Precision
SBERT | 768 | 83.31
SBERT+LASER | 1792 | 84.12
MPNet | 768 | 83.25
MPNet+LASER | 1792 | 84.23

Best Embeddings with KG | Dim. | Precision
SBERT+LASER+KG | 2092 | 84.87

Embeddings with KG | Dim. | KG_t (100) | KG_t (500) | KG_t (750) | KG_tr (100) | KG_tr (500) | KG_tr (750) | KG_tre (100) | KG_tre (500) | KG_tre (750)
SBERT+KG | 1068 | 83.78 | 83.48 | 84.24 | 84.07 | 84.65 | 83.95 | 84.01 | 83.31 | 83.60
SBERT+LASER+KG | 2092 | 84.87 | 84.42 | 83.54 | 83.31 | 83.89 | 84.01 | 84.18 | 83.14 | 83.78
MPNet+KG | 1068 | 84.29 | 84.07 | 84.12 | 84.30 | 83.37 | 83.89 | 83.95 | 83.89 | 83.95
MPNet+LASER+KG | 2092 | 83.02 | 83.89 | 83.49 | 83.89 | 83.31 | 83.89 | 83.54 | 84.36 | 83.78
Table A7. Extended intent classification evaluation for the ParaLex data set using a Siamese network and the automatically extracted KGs (bold numbers indicate the best results for each setting; * denotes statistical significance at p = 0.05).
SOTA Embeddings | Dim. | Precision
SBERT | 768 | 48.81
SBERT+LASER | 1792 | 49.75
MPNet | 768 | 50.33
MPNet+LASER | 1792 | 50.47

Best Embeddings with KG | Dim. | Precision
MPNet+KG | 1068 | 52.29

Embeddings with KG | Dim. | KG_t (100) | KG_t (500) | KG_t (750) | KG_tr (100) | KG_tr (500) | KG_tr (750) | KG_tre (100) | KG_tre (500) | KG_tre (750)
SBERT+KG | 1068 | 49.26 | 48.82 | 49.49 | 48.93 | 49.43 | 49.31 | 49.28 | 49.73 | 49.12
SBERT+LASER+KG | 2092 | 49.17 | 49.52 | 49.17 | 48.65 | 49.28 | 49.35 | 48.89 | 48.93 | 49.49
MPNet+KG | 1068 | 52.29 * | 51.21 | 51.54 | 50.18 | 50.55 | 51.86 | 51.18 | 51.68 | 50.62
MPNet+LASER+KG | 2092 | 51.49 | 50.97 | 50.57 | 50.86 | 51.04 | 51.75 | 51.28 | 50.69 | 51.20
Table A8. Extended intent classification evaluation for the ProductServiceQA data set using a Siamese network and the automatically extracted KGs with 100 terms (bold numbers indicate the best results for each setting).
SOTA Embeddings | Dim. | Precision
SBERT | 768 | 73.94
SBERT+LASER | 1792 | 73.77
MPNet | 768 | 73.55
MPNet+LASER | 1792 | 73.51

Best Embeddings with KG | Dim. | Precision
MPNet+LASER+KG | 2092 | 74.69

Embeddings with KG | Dim. | Bench. KG_t | Bench. KG_tr | Bench. KG_tre | KG_t | KG_tr | KG_tre
SBERT+KG | 1068 | 74.03 | 73.73 | 74.56 | 73.51 | 73.77 | 73.73
SBERT+LASER+KG | 2092 | 73.68 | 73.77 | 73.51 | 73.16 | 74.08 | 73.07
MPNet+KG | 1068 | 74.08 | 73.64 | 73.68 | 73.64 | 74.37 | 73.59
MPNet+LASER+KG | 2092 | 74.64 | 74.69 | 74.16 | 73.81 | 73.77 | 73.29
Table A9. Extended intent classification evaluation for the ATIS data set using a Siamese network and the automatically extracted KGs with 100 terms (bold numbers indicate the best results for each setting).
SOTA Embeddings | Dim. | Precision
SBERT | 768 | 99.37
SBERT+LASER | 1792 | 99.00
MPNet | 768 | 98.62
MPNet+LASER | 1792 | 98.62

Best Embeddings with KG | Dim. | Precision
MPNet+KG | 1068 | 99.50

Embeddings with KG | Dim. | KG_t | KG_tr | KG_tre
SBERT+KG | 1068 | 99.25 | 99.50 | 99.37
SBERT+LASER+KG | 2092 | 99.00 | 99.37 | 99.25
MPNet+KG | 1068 | 99.12 | 99.12 | 99.50
MPNet+LASER+KG | 2092 | 98.75 | 98.50 | 99.37
Table A10. Extended intent classification evaluation for the Spanish ProductServiceQA data set translated into English using a Siamese network and the automatically extracted KGs with 100 terms (bold numbers indicate the best results for each setting).
SOTA Embeddings | Dim. | Precision
SBERT | 768 | 62.24
MPNet | 768 | 64.34
SBERT+LASER | 1792 | 61.62
MPNet+LASER | 1792 | 65.00

Best Embeddings with KG | Dim. | Precision
MPNet+LASER+KG | 2092 | 65.57

Embeddings with KG | Dim. | Bench. KG_t | Bench. KG_tr | Bench. KG_tre | KG_t | KG_tr | KG_tre
SBERT+KG | 1068 | 62.89 | 61.58 | 61.76 | 62.33 | 62.41 | 62.54
SBERT+LASER+KG | 2092 | 62.33 | 61.97 | 62.94 | 61.32 | 62.46 | 60.92
MPNet+KG | 1068 | 63.95 | 60.00 | 61.06 | 64.34 | 63.90 | 60.31
MPNet+LASER+KG | 2092 | 65.57 | 59.13 | 62.41 | 64.91 | 63.29 | 59.83
Table A11. Extended intent classification evaluation for the Chinese ProductServiceQA data set translated into English using a Siamese network and the automatically extracted KGs with 100 terms (bold numbers indicate the best results for each setting).
SOTA Embeddings | Dim. | Precision
SBERT | 768 | 58.12
MPNet | 768 | 59.30
SBERT+LASER | 1792 | 59.01
MPNet+LASER | 1792 | 59.70

Best Embeddings with KG | Dim. | Precision
MPNet+KG | 1068 | 60.66

Embeddings with KG | Dim. | Bench. KG_t | Bench. KG_tr | Bench. KG_tre | KG_t | KG_tr | KG_tre
SBERT+KG | 1068 | 58.47 | 58.25 | 58.56 | 58.08 | 59.70 | 58.12
SBERT+LASER+KG | 2092 | 57.32 | 57.42 | 58.51 | 58.30 | 57.42 | 58.34
MPNet+KG | 1068 | 60.57 | 55.71 | 56.15 | 60.66 | 59.48 | 55.54
MPNet+LASER+KG | 2092 | 60.18 | 55.23 | 57.59 | 60.40 | 58.82 | 55.45

References

  1. Temerak, M.S.; El-Manstrly, D. The influence of goal attainment and switching costs on customers’ staying intentions. J. Retail. Consum. Serv. 2019, 51, 51–61.
  2. Abujabal, A.; Roy, R.S.; Yahya, M.; Weikum, G. ComQA: A Community-sourced Dataset for Complex Factoid Question Answering with Paraphrase Clusters. arXiv 2018, arXiv:1809.09528.
  3. Fader, A.; Zettlemoyer, L.; Etzioni, O. Paraphrase-Driven Learning for Open Question Answering. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, Sofia, Bulgaria, 4–9 August 2013.
  4. Lehmann, J.; Isele, R.; Jakob, M.; Jentzsch, A.; Kontokostas, D.; Mendes, P.N.; Hellmann, S.; Morsey, M.; van Kleef, P.; Auer, S.; et al. DBpedia—A Large-scale, Multilingual Knowledge Base Extracted from Wikipedia. Semant. Web J. 2015, 6, 167–195.
  5. Cavalin, P.; Alves Ribeiro, V.H.; Appel, A.; Pinhanez, C. Improving Out-of-Scope Detection in Intent Classification by Using Embeddings of the Word Graph Space of the Classes. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), Online, 16–20 November 2020; pp. 3952–3961.
  6. Zhang, H.; Zhang, Y.; Zhan, L.M.; Chen, J.; Shi, G.; Wu, X.M.; Lam, A.Y. Effectiveness of Pre-training for Few-shot Intent Classification. In Findings of the Association for Computational Linguistics: EMNLP 2021, Punta Cana, Dominican Republic, 16–20 November 2021; pp. 1114–1120.
  7. Zhang, J.; Ye, Y.; Zhang, Y.; Qiu, L.; Fu, B.; Li, Y.; Yang, Z.; Sun, J. Multi-Point Semantic Representation for Intent Classification. Proc. AAAI Conf. Artif. Intell. 2020, 34, 9531–9538.
  8. Purohit, H.; Dong, G.; Shalin, V.; Thirunarayan, K.; Sheth, A. Intent Classification of Short-Text on Social Media. In Proceedings of the 2015 IEEE International Conference on Smart City/SocialCom/SustainCom (SmartCity), Chengdu, China, 19–21 December 2015; pp. 222–228.
  9. Ahmad, Z.; Ekbal, A.; Sengupta, S.; Maitra, A.; Ramnani, R.; Bhattacharyya, P. Unsupervised Approach for Knowledge-Graph Creation from Conversation: The Use of Intent Supervision for Slot Filling. In Proceedings of the 2021 International Joint Conference on Neural Networks (IJCNN), Shenzhen, China, 18–22 July 2021; pp. 1–8.
  10. Yu, C.; Wang, W.; Liu, X.; Bai, J.; Song, Y.; Li, Z.; Gao, Y.; Cao, T.; Yin, B. FolkScope: Intention Knowledge Graph Construction for Discovering E-commerce Commonsense. arXiv 2022, arXiv:2211.08316.
  11. Pinhanez, C.S.; Candello, H.; Cavalin, P.; Pichiliani, M.C.; Appel, A.P.; Alves Ribeiro, V.H.; Nogima, J.; de Bayser, M.; Guerra, M.; Ferreira, H.; et al. Integrating Machine Learning Data with Symbolic Knowledge from Collaboration Practices of Curators to Improve Conversational Systems. In Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems, Yokohama, Japan, 8–13 May 2021; Association for Computing Machinery: New York, NY, USA, 2021.
  12. He, Y.; Jia, Q.; Yuan, L.; Li, R.; Ou, Y.; Zhang, N. A Concept Knowledge Graph for User Next Intent Prediction at Alipay. arXiv 2023, arXiv:2301.00503.
  13. Zhang, Z.; Han, X.; Liu, Z.; Jiang, X.; Sun, M.; Liu, Q. ERNIE: Enhanced Language Representation with Informative Entities. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Florence, Italy, 28 July–2 August 2019; pp. 1441–1451.
  14. He, T.; Xu, X.; Wu, Y.; Wang, H.; Chen, J. Multitask Learning with Knowledge Base for Joint Intent Detection and Slot Filling. Appl. Sci. 2021, 11, 4887.
  15. Siddique, A.B.; Jamour, F.T.; Xu, L.; Hristidis, V. Generalized Zero-shot Intent Detection via Commonsense Knowledge. In Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, Virtual Event, Canada, 11–15 July 2021.
  16. Shabbir, J.; Arshad, M.U.; Shahzad, W. NUBOT: Embedded Knowledge Graph With RASA Framework for Generating Semantic Intents Responses in Roman Urdu. arXiv 2021, arXiv:2102.10410.
  17. Sant’Anna, D.T.; Caus, R.O.; dos Santos Ramos, L.; Hochgreb, V.; dos Reis, J.C. Generating Knowledge Graphs from Unstructured Texts: Experiences in the E-commerce Field for Question Answering. In Joint Proceedings of Workshops AI4LEGAL2020, NLIWOD, PROFILES 2020, QuWeDa 2020 and SEMIFORM2020, Co-located with the 19th International Semantic Web Conference (ISWC 2020), Virtual Conference, 1–6 November 2020; Koubarakis, M., Alani, H., Antoniou, G., Bontcheva, K., Breslin, J.G., Collarana, D., Demidova, E., Dietze, S., Gottschalk, S., Governatori, G., et al., Eds.; CEUR Workshop Proceedings; Volume 2722, pp. 56–71.
  18. Hu, J.; Wang, G.; Lochovsky, F.; Sun, J.-T.; Chen, Z. Understanding User’s Query Intent with Wikipedia. In Proceedings of the 18th International Conference on World Wide Web, WWW ’09, Madrid, Spain, 20–24 April 2009; pp. 471–480.
  19. Balažević, I.; Allen, C.; Hospedales, T.M. TuckER: Tensor Factorization for Knowledge Graph Completion. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing, Hong Kong, China, 3–7 November 2019.
  20. Pennington, J.; Socher, R.; Manning, C. GloVe: Global Vectors for Word Representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), Doha, Qatar, 25–29 October 2014; pp. 1532–1543.
  21. Artetxe, M.; Schwenk, H. Massively Multilingual Sentence Embeddings for Zero-Shot Cross-Lingual Transfer and Beyond. Trans. Assoc. Comput. Linguist. 2019, 7, 597–610.
  22. Reimers, N.; Gurevych, I. Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing, Hong Kong, China, 3–7 November 2019.
  23. Song, K.; Tan, X.; Qin, T.; Lu, J.; Liu, T.Y. MPNet: Masked and Permuted Pre-training for Language Understanding. In Advances in Neural Information Processing Systems, Virtual, 6–12 December 2020; Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H., Eds.; Curran Associates, Inc.: Red Hook, NY, USA, 2020; Volume 33, pp. 16857–16867.
  24. Ribeiro, M.T.; Singh, S.; Guestrin, C. “Why Should I Trust You?”: Explaining the Predictions of Any Classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 1135–1144.
  25. McNemar, Q. Note on the sampling error of the difference between correlated proportions or percentages. Psychometrika 1947, 12, 153–157.
  26. Bordea, G.; Buitelaar, P.; Polajnar, T. Domain-independent term extraction through domain modelling. In Proceedings of the 10th International Conference on Terminology and Artificial Intelligence (TIA 2013), Paris, France, 28–30 October 2013.
  27. Manjunath, S.H.; McCrae, J.P. Encoder-Attention-Based Automatic Term Recognition (EA-ATR). In Proceedings of the 3rd Conference on Language, Data and Knowledge (LDK 2021), Zaragoza, Spain, 1–3 September 2021; Schloss Dagstuhl–Leibniz-Zentrum für Informatik: Dagstuhl, Germany, 2021.
  28. Akbik, A.; Blythe, D.; Vollgraf, R. Contextual String Embeddings for Sequence Labeling. In Proceedings of COLING 2018, the 27th International Conference on Computational Linguistics, Santa Fe, NM, USA, 20–26 August 2018; pp. 1638–1649.
  29. Pereira, B.; Robin, C.; Daudert, T.; McCrae, J.P.; Mohanty, P.; Buitelaar, P. Taxonomy Extraction for Customer Service Knowledge Base Construction. In Semantic Systems. The Power of AI and Knowledge Graphs; Acosta, M., Cudré-Mauroux, P., Maleshkova, M., Pellegrini, T., Sack, H., Sure-Vetter, Y., Eds.; Springer International Publishing: Cham, Switzerland, 2019; pp. 175–190.
  30. Chen, D.; Manning, C. A Fast and Accurate Dependency Parser using Neural Networks. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), Doha, Qatar, 25–29 October 2014; pp. 740–750.
  31. Nair, V.; Hinton, G.E. Rectified Linear Units Improve Restricted Boltzmann Machines. In Proceedings of the 27th International Conference on Machine Learning (ICML), Haifa, Israel, 21–24 June 2010.
  32. Kingma, D.P.; Ba, J. Adam: A Method for Stochastic Optimization. In Proceedings of the 3rd International Conference on Learning Representations (ICLR 2015), San Diego, CA, USA, 7–9 May 2015.
Figure 1. Visualisation of the LIME framework explanation with the top-k important words (k = 5) and their contribution degree based on the provided question.
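As a rough illustration of how an explanation like Figure 1 can be generated, the snippet below uses the `lime` package [24] with a toy stand-in classifier; the TF-IDF model, its training texts, and its labels are illustrative placeholders, not components of this paper.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from lime.lime_text import LimeTextExplainer

# Toy stand-in classifier; the actual models in this paper are embedding-based.
texts = ["where can I buy a television", "how do I cancel my subscription"]
labels = ["purchase", "cancellation"]
intent_model = make_pipeline(TfidfVectorizer(), LogisticRegression()).fit(texts, labels)

explainer = LimeTextExplainer(class_names=intent_model.classes_)
explanation = explainer.explain_instance(
    "Where is the best place to buy a television?",
    intent_model.predict_proba,  # maps a list of texts to class probabilities
    num_features=5,              # top-k important words, k = 5 as in Figure 1
)
print(explanation.as_list())     # [(word, contribution degree), ...]
```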
Figure 2. Knowledge graph generation pipeline.
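The sketch below is a strongly simplified, illustrative stand-in for the KG generation pipeline of Figure 2, using spaCy in place of the dedicated term extraction [26,27], NER [28], taxonomy construction [29], and dependency parsing [30] components: noun chunks and named entities act as candidate concepts, and verbs connecting a subject concept to an object concept yield relation triples.

```python
import spacy

nlp = spacy.load("en_core_web_sm")

def extract_triples(sentence):
    """Return (concept, relation, concept) triples from one sentence."""
    doc = nlp(sentence)
    # Candidate concepts: noun chunks and named entities, indexed by head token.
    concepts = {chunk.root.i: chunk.text for chunk in doc.noun_chunks}
    concepts.update({ent.root.i: ent.text for ent in doc.ents})
    triples = []
    for token in doc:
        if token.pos_ == "VERB":
            subjects = [c for c in token.children if c.dep_ in ("nsubj", "nsubjpass")]
            objects = [c for c in token.children if c.dep_ in ("dobj", "attr")]
            for s in subjects:
                for o in objects:
                    if s.i in concepts and o.i in concepts:
                        triples.append((concepts[s.i], token.lemma_, concepts[o.i]))
    return triples

# e.g. [('The customer', 'cancel', 'the mobile cloud subscription')]
print(extract_triples("The customer cancelled the mobile cloud subscription."))
```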
Table 1. Statistics on the data sets used, i.e., ComQA, ParaLex, ProductServiceQA, and ATIS.
| | ProductServiceQA | ComQA | ParaLex | ATIS |
|---|---|---|---|---|
| # Total samples | 7611 | 1829 | 21,306 | 5632 |
| # Samples (train) | 5328 | 1463 | 17,045 | 4833 |
| # Samples (test) | 2283 | 366 | 4261 | 799 |
| # Classes | 338 | 272 | 275 | 8 |
Table 2. Information on different KGs and statistics on the benchmarks and the automatically generated KGs of the ProductServiceQA data set.
| | Benchmark KG_t | Benchmark KG_tr | Benchmark KG_tre | KG_t | KG_tr | KG_tre |
|---|---|---|---|---|---|---|
| Taxonomy | Y | Y | Y | Y | Y | Y |
| Semantic Relations | N | Y | Y | N | Y | Y |
| Named Entities | N | N | Y | N | N | Y |
| Unique Concepts | 84 | 84 | 97 | 100 | 100 | 908 |
| Unique Relations | 1 | 22 | 122 | 11 | 230 | 259 |
| Vocabulary | 60 | 190 | 392 | 36 | 166 | 468 |
Table 3. Flair results for different embedding types on the ProductServiceQA data set.
| Embedding | Precision | Recall | F1 |
|---|---|---|---|
| Flair (Forward+Backward) | 0.94 | 0.92 | 0.93 |
| Flair (Forward+Backward)+GloVe | 0.95 | 0.92 | 0.93 |
| Flair (Forward)+GloVe | 0.94 | 0.92 | 0.93 |
| GloVe | 0.92 | 0.91 | 0.91 |
| BERT | 0.93 | 0.91 | 0.93 |
| ELMo | 0.94 | 0.91 | 0.93 |
Table 4. Intent classification evaluation for the targeted data sets using an RNN (bold numbers indicate the best results for each setting).
ComQA Data Set

| SOTA Embeddings | Dimension | Precision |
|---|---|---|
| SBERT | 768 | 98.36 |
| LASER | 1024 | 96.75 |
| MPNet | 768 | 98.63 |
| LASER+SBERT | 1792 | 98.28 |
| LASER+SBERT+GloVe | 2092 | 98.63 |

| Best Embeddings with KG | Dimension | Precision |
|---|---|---|
| LASER+SBERT+KG_t (750) | 2092 | 99.45 |
| LASER+MPNet+KG_t (500) | 2092 | 99.45 |
| LASER+SBERT+KG_tr (750)/GloVe | 2092 | 99.45 |

ParaLex Data Set

| SOTA Embeddings | Dimension | Precision |
|---|---|---|
| SBERT | 768 | 54.06 |
| LASER | 1024 | 52.92 |
| MPNet | 768 | 53.80 |
| LASER+SBERT | 1792 | 54.07 |
| LASER+SBERT+GloVe | 2092 | 54.41 |

| Best Embeddings with KG | Dimension | Precision |
|---|---|---|
| LASER+MPNet+KG_tr (750)/GloVe | 2092 | 55.42 |

ProductServiceQA Data Set

| SOTA Embeddings | Dimension | Precision |
|---|---|---|
| SBERT | 768 | 68.02 |
| LASER | 1024 | 62.68 |
| MPNet | 768 | 69.25 |
| LASER+SBERT | 1792 | 68.60 |
| LASER+SBERT+GloVe | 2092 | 68.40 |

| Best Embeddings with KG | Dimension | Precision |
|---|---|---|
| LASER+MPNet+KG (DBpedia) | 2092 | 70.00 |

ATIS Data Set

| SOTA Embeddings | Dimension | Precision |
|---|---|---|
| SBERT | 768 | 98.67 |
| LASER | 1024 | 98.87 |
| MPNet | 768 | 98.43 |
| LASER+SBERT | 1792 | 98.50 |
| LASER+SBERT+GloVe | 2092 | 98.62 |

| Best Embeddings with KG | Dimension | Precision |
|---|---|---|
| LASER+KG_t (100) | 1324 | 99.25 |
| LASER+SBERT+KG_t (100) | 2092 | 99.25 |
| LASER+MPNet+KG_tr (100) | 2092 | 99.25 |
| LASER+SBERT+KG_tre (100)/GloVe | 2092 | 99.25 |
| LASER+MPNet+KG_tre (100)/GloVe | 2092 | 99.25 |
Table 5. Statistics on the automatically generated KG t r e with different thresholds of terms.
| Terms | 100 | 200 | 300 | 500 | 1000 |
|---|---|---|---|---|---|
| Unique Concepts | 908 | 1008 | 1108 | 1308 | 1808 |
| Unique Relations | 259 | 279 | 299 | 305 | 324 |
| Vocabulary | 468 | 494 | 529 | 553 | 653 |
Table 6. Impact of different sets of terms within the KG t r e for intent classification, based on ProductServiceQA (bold numbers indicate the best results for each setting).
| SOTA Embeddings | Dimension | Precision |
|---|---|---|
| SBERT | 768 | 68.02 |
| LASER | 1024 | 62.68 |
| MPNet | 768 | 69.25 |
| LASER+SBERT | 1792 | 68.60 |
| LASER+SBERT+GloVe | 2092 | 68.40 |

| Best Embeddings with KG | Dimension | Precision |
|---|---|---|
| LASER+MPNet+KG_tre (100) | 2092 | 69.99 |

| Embeddings with KG | Dimension | 100 Terms | 200 Terms | 300 Terms | 500 Terms | 1000 Terms |
|---|---|---|---|---|---|---|
| KG | 300 | 40.34 | 40.34 | 41.61 | 42.14 | 44.20 |
| Concat. LASER+KG | 1324 | 62.15 | 62.15 | 61.94 | 62.85 | 52.91 |
| Concat. LASER+SBERT+KG | 2092 | 68.24 | 68.24 | 67.89 | 67.85 | 67.85 |
| Concat. LASER+MPNet+KG | 2092 | 69.99 | 68.37 | 68.77 | 68.29 | 68.46 |
| Substit. LASER+KG/GloVe | 1324 | 62.51 | 60.58 | 61.54 | 62.64 | 60.36 |
| Substit. LASER+SBERT+KG/GloVe | 2092 | 68.20 | 68.37 | 68.20 | 67.81 | 67.41 |
| Substit. LASER+MPNet+KG/GloVe | 2092 | 67.89 | 67.90 | 67.19 | 67.76 | 67.24 |
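The "Concat." and "Substit." rows reflect two ways of adding KG information, and the dimension column follows from simple arithmetic (e.g. 1024-dim LASER + 300-dim KGE = 1324; 1024 + 768 + 300 = 2092). The sketch below shows one plausible reading of the two strategies; the per-token GloVe fallback in the substitution variant is an assumption inferred from the "KG/GloVe" notation, not a detail confirmed in this excerpt.

```python
import numpy as np

EMB_DIM = 300  # GloVe and KG embeddings share this width (2092 - 1792 = 300)

def concat_features(laser_vec, sbert_vec, kg_vec):
    # Concatenation: 1024 + 768 + 300 = 2092 dimensions, as in Table 6.
    return np.concatenate([laser_vec, sbert_vec, kg_vec])

def substitute_features(tokens, glove, kge):
    # Substitution (KG/GloVe): per token, use the KG embedding when the token
    # is a KG concept, otherwise fall back to its GloVe vector.
    vecs = [kge.get(t, glove.get(t, np.zeros(EMB_DIM))) for t in tokens]
    return np.mean(vecs, axis=0)  # fills the 300-dim slot GloVe occupied

glove = {"cancel": np.random.rand(EMB_DIM)}        # toy lookup tables
kge = {"subscription": np.random.rand(EMB_DIM)}
print(substitute_features(["cancel", "subscription"], glove, kge).shape)  # (300,)
```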
Table 7. Intent classification evaluation for the targeted data sets using a Siamese network (bold numbers indicate the best results for each setting; * denotes statistical significance at p = 0.05).
ComQA Data Set

| SOTA Embeddings | Dimension | Precision |
|---|---|---|
| SBERT | 768 | 95.18 |
| SBERT+LASER | 1792 | 94.66 |
| MPNet | 768 | 94.37 |
| MPNet+LASER | 1792 | 94.14 |

| Best Embeddings with KG | Dimension | Precision |
|---|---|---|
| SBERT+LASER+KG_tre (100) | 2092 | 95.18 |

ParaLex Data Set

| SOTA Embeddings | Dimension | Precision |
|---|---|---|
| SBERT | 768 | 48.81 |
| SBERT+LASER | 1792 | 49.75 |
| MPNet | 768 | 50.33 |
| MPNet+LASER | 1792 | 50.47 |

| Best Embeddings with KG | Dimension | Precision |
|---|---|---|
| MPNet+KG_t (100) | 1068 | 52.29 * |

ProductServiceQA Data Set

| SOTA Embeddings | Dimension | Precision |
|---|---|---|
| SBERT | 768 | 73.94 |
| SBERT+LASER | 1792 | 73.77 |
| MPNet | 768 | 73.55 |
| MPNet+LASER | 1792 | 73.51 |

| Best Embeddings with KG | Dimension | Precision |
|---|---|---|
| MPNet+LASER+KG_tr (100) | 2092 | 74.69 |

ATIS Data Set

| SOTA Embeddings | Dimension | Precision |
|---|---|---|
| SBERT | 768 | 99.37 |
| SBERT+LASER | 1792 | 99.00 |
| MPNet | 768 | 98.62 |
| MPNet+LASER | 1792 | 98.62 |

| Best Embeddings with KG | Dimension | Precision |
|---|---|---|
| MPNet+KG_tre (100) | 1068 | 99.50 |
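The asterisk in the ParaLex column comes from McNemar's test [25] on paired predictions. A small sketch, assuming boolean per-sample correctness arrays for the two compared models and using statsmodels (an assumed tool, not necessarily the authors' implementation):

```python
import numpy as np
from statsmodels.stats.contingency_tables import mcnemar

def mcnemar_pvalue(correct_a, correct_b):
    """correct_a/correct_b: boolean arrays of per-sample correctness."""
    # Build the 2x2 contingency table of agreements and disagreements.
    table = [
        [np.sum(correct_a & correct_b),  np.sum(correct_a & ~correct_b)],
        [np.sum(~correct_a & correct_b), np.sum(~correct_a & ~correct_b)],
    ]
    return mcnemar(table, exact=False, correction=True).pvalue

# A difference is marked * when the p-value falls below the 0.05 level.
```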
Table 8. Intent classification evaluation for the ComQA data set, filtered by questions with a frequency between two and five, using a Siamese network (bold numbers indicate the best results for each setting).
| SOTA Embeddings | Dimension | Precision |
|---|---|---|
| SBERT | 768 | 83.31 |
| SBERT+LASER | 1792 | 84.12 |
| MPNet | 768 | 83.25 |
| MPNet+LASER | 1792 | 84.23 |

| Best Embeddings with KG | Dimension | Precision |
|---|---|---|
| SBERT+LASER+KG_t (100) | 2092 | 84.87 |
Table 11. Intent classification evaluation for the Spanish ProductServiceQA data set translated into English using a Siamese network (bold numbers indicate the best results for each setting).
| SOTA Embeddings | Dimension | Precision |
|---|---|---|
| SBERT | 768 | 62.24 |
| MPNet | 768 | 64.34 |
| SBERT+LASER | 1792 | 61.62 |
| MPNet+LASER | 1792 | 65.00 |

| Best Embeddings with KG | Dimension | Precision |
|---|---|---|
| MPNet+LASER+KG_t (100) | 2092 | 65.57 |
Table 12. Intent classification evaluation for the Chinese ProductServiceQA data set translated into English using a Siamese network (bold numbers indicate the best results for each setting).
| SOTA Embeddings | Dimension | Precision |
|---|---|---|
| SBERT | 768 | 58.12 |
| MPNet | 768 | 59.30 |
| SBERT+LASER | 1792 | 59.01 |
| MPNet+LASER | 1792 | 59.70 |

| Best Embeddings with KG | Dimension | Precision |
|---|---|---|
| MPNet+KG_t (100) | 1068 | 60.66 |