Information | Article | Open Access | 15 February 2019

MOLI: Smart Conversation Agent for Mobile Customer Service

This article belongs to the Special Issue Artificial Intelligence—Methodology, Systems, and Applications

Abstract

Human agents in technical customer support provide users with instructional answers to solve tasks that would otherwise require a lot of time, money, energy, and physical effort. Developing a dialogue system in this domain is challenging due to the broad variety of user questions. Moreover, user questions are noisy (containing, for example, spelling mistakes), redundant, and phrased in many different ways. In this work, we introduce MOLI, a conversational system that solves customer questions by providing instructional answers from a knowledge base. Our approach combines models for question type and intent category classification with slot filling and a back-end knowledge base for filtering and ranking answers, and uses a dialog framework to actively query the user for missing information. For answer-ranking we find that sequential matching networks and neural multi-perspective sentence similarity networks clearly outperform baseline models, achieving a 43% error reduction. The end-to-end P@1 (Precision at top 1) of MOLI was 0.69 and the customers’ satisfaction was 0.73.

1. Introduction

For many companies, customers can seek support through multiple channels such as web pages, Facebook, or mobile apps. According to research by the China Information Industry Network (CNII), customer service and support is a sizable and growing market globally as well as in China. In response to the tremendous demand within our company and the market, we developed MOLI, our smart customer service agent for mobile devices.
“My Wi-Fi is not working anymore!”—most mobile device users have probably faced this or a similar question in the past. Solving such questions is the task of customer support agents (CSAs). For frequent questions and user intents, for which solutions often exist in the form of user guides and a question-answer knowledge base (QA-KB), this is a repetitive and time-consuming process. Automating such conversations would significantly reduce the time CSAs have to invest in solving common questions, which they could then spend on more complex or previously unseen customer problems [1].
Recent advances in dialog systems have led to successful applications in domains such as restaurant [2] and flight booking [3], providing a convenient way for users to interact with backend services and knowledge bases in natural language, via speech or text-based input. Developing a dialog system for technical customer support presents additional challenges due to the broad variety of topics and tasks that need to be handled. The task is made even more challenging by the fact that the dialogs are often noisy and contain grammatical errors and incomplete user turns. They also refer to concepts and entities not recognized by standard NER tools (e.g., devices, components). Due to the non-technical background of most customers, problem descriptions can be ambiguous, too colloquial compared with the more formal, technical QA knowledge base texts, and may miss information that is necessary to identify a unique and correct solution. CSAs therefore often query customers for contextual information, such as device model and mobile carrier, in order to identify the exact issue and to select a good answer.
With the work described in this paper, we aim to automate the task of matching instructional answers from a QA-KB to the questions users describe in online support chats. We describe an approach to conversational question answering in the little-explored domain of technical customer support. Our approach selects the best answer from a QA-KB in a dialog-oriented fashion, using intent classification to narrow down answer candidates and, in particular, to proactively query the user for missing information.

3. Conversational Question Answering

3.1. Problem Formalization

The goal of our approach was to identify the answer a_i from a corpus of N QA pairs QA = {(q_1, a_1), ..., (q_N, a_N)} that best matched the user’s question as expressed by a sequence of user turns T = (t_1, ..., t_T). q_i is a representative question, such as “How to setup email”, which prototypically stands for other questions that can be answered by a_i. As Figure 1 illustrates, QA pairs as well as user turns are expressed as free text.
Figure 1. Example chat about an “email account setup”.
For each pair (q_i, a_i), there exists metadata, recorded in the form of properties p_{i,1}, ..., p_{i,m}. Properties describe the context to which a given QA pair applies, such as a specific device name and operating system version. They typically take a single value from a limited set of possible values.
At each turn t_k during the conversation, our system estimates the utility u_{i,k} ∈ R_{≥0} of answer a_i given the current context C = <t_1, ..., t_k, P_k>, where P_k is the list of relevant properties that have already been identified (filled ‘slots’).
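To make the formalization concrete, the following minimal Python sketch shows one possible way to represent QA pairs with their properties, the dialog context, and the selection of the highest-utility answer; all class, field and function names are illustrative, not the paper’s actual implementation.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass
class QAPair:
    question: str                    # representative question q_i, e.g. "How to setup email"
    answer: str                      # instructional answer a_i
    properties: Dict[str, str]       # p_{i,1}, ..., p_{i,m}, e.g. {"device": "moto g3"}

@dataclass
class DialogContext:
    turns: List[str] = field(default_factory=list)               # t_1, ..., t_k
    filled_slots: Dict[str, str] = field(default_factory=dict)   # P_k

def best_answer(qa_corpus: List[QAPair],
                context: DialogContext,
                utility: Callable[[QAPair, DialogContext], float]) -> QAPair:
    # utility(qa, context) plays the role of u_{i,k}; the selected answer maximizes it.
    return max(qa_corpus, key=lambda qa: utility(qa, context))
```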

3.2. Dialog System Overview

Figure 2 gives an overview of the overall architecture of the dialog system. Customers can interact with the system either by entering free text or by selecting one of a set of predefined choices suggested by the system. Choices can be of different types: frequently asked questions at the start of a conversation, or, for instance, product names if the system queried for the customer’s device in the previous turn. Upon receiving a user turn t_k, the natural language understanding (NLU) component transforms t_k into a structured representation. It first performs sentence segmentation and tokenization of the input using the Stanford CoreNLP toolkit [23], and then determines the question type, namely how-to or others. In addition, the NLU component performs intent classification to identify the user intent, which represents a set of similar questions. The intent is used to narrow down candidate standard QA pairs from the QA-KB. The slot filling (SF) component then identifies entities such as product names and attributes, which are linked to concepts in a product knowledge graph (KG). SF is based on a combination of template and sequence classification approaches. After question type classification, intent classification, and slot filling, the system has determined the specific question the user is concerned with. Since the SF component is based on a combination of template and sequence classification approaches, it achieves high accuracy but low recall. We therefore add a semantic matching component to improve recall without decreasing accuracy.
Figure 2. Dialog system architecture.
The main task of the dialog manager (DM) is to maintain the dialogue context (semantic frame), which encodes all available structured information, and to decide on the next system action. If the semantic frame is not complete, that is, it lacks a slot value such as product information, the DM can re-ask, confirm, or clarify to update the dialogue context and keep the semantic frame unambiguous and complete. If a product name is not detected, the DM may also query the customer database for the most recent product purchased by this customer. Given the question type, the intent category and the values of already filled slots, the DM retrieves the list of potential QA pairs by querying the QA-KB. It then either asks the customer for more information to fill empty slots (which are specified by the properties of the QA pairs) to narrow down this list, or it runs the semantic matching component to rank the remaining QA pairs. The DM can also ask for confirmation or disambiguation of a slot filler extracted by the NLU component. DM action choices are passed through a template-based natural language generation (NLG) component or converted into option choices represented by buttons in the user interface. At any time during the conversation, the system or the user may choose to refer the conversation to a human support agent.

4. The Proposed Model

4.1. Question Type Classification

Question type classification is the entry point of the NLU component and is important for the performance of the whole system. Here, we regard n-gram information as a local semantic feature and long-distance dependency relationships between words or phrases as a global structure feature. Most existing question type classification models either learn little structure information or rely on pre-defined structures, leading to degraded performance and generalization capability. To address this issue, we propose a sandwich neural network (SNN) that learns semantic and structure representations automatically.
SNN contains four parts: a first LSTM layer, a CNN and pooling layer, a second LSTM layer, and a concatenation and loss layer. The first LSTM layer is inspired by DSCNN [24] and adjusts the word representations by taking context into account. The CNN and pooling layer learns local n-gram semantic representations. The second LSTM layer is inspired by C-LSTM [25]: the feature maps after convolution serve as high-level phrase representations and are fed into this LSTM to learn long-dependency structure representations. The final concatenation and loss layer concatenates these two representations into a single one and computes the loss via cross-entropy. Figure 3 shows the architecture of our model, where the CNN sits between the two LSTM layers like a sandwich. We describe SNN in detail below.
Figure 3. Sandwich neural network (SNN) architecture.

4.1.1. First LSTM Layer

Let our model’s input be a sentence of length s, [w_1, w_2, ..., w_s], let c be the number of word embedding versions, and let x_i^(j) be the embedding of the i-th word in the j-th version. Common word embedding versions are Word2vec and GloVe.
Our model’s first layer consists of LSTM networks that process the different versions of word embeddings. For every version of word embeddings, there is a corresponding LSTM network whose input x_t ∈ R^d is the d-dimensional word embedding for w_t. The LSTM layer produces a hidden state representation h_t ∈ R^d at each time step. These hidden state representations form the output of the LSTM layer:

h^(i) = [h_1^(i), h_2^(i), ..., h_t^(i), ..., h_s^(i)],   for i = 1, 2, ..., c.

4.1.2. CNN Layer

The second layer is a CNN. To utilize multiple kinds of word embeddings, we apply a filter F ∈ R^{c×d×l}, where l is the size of the convolution window. The i-th version of word embeddings produces the hidden state sequence h^(i), which forms one channel of the feature map; these channels are stacked into a c-channel feature map X ∈ R^{c×d×s}.
Afterwards, the filter F is convolved with the window vectors (l-grams) at each position to generate a feature map c ∈ R^{s-l+1}; c_k, the element of c for the window vector X_{k:k+l-1} at position k, is produced as follows:

c_k = f( Σ_{i,j,r} (F ⊙ X_{k:k+l-1})_{i,j,r} ),

where ⊙ denotes element-wise multiplication and f is a non-linear activation function.
The n feature maps generated by the n filters are rearranged by column-vector concatenation to form a new representation,

W = [c^1; c^2; ...; c^n].

Each row W_j of W ∈ R^{(s-l+1)×n} is the feature generated by the n filters for the window vector at position j. These successive higher-level representations are then fed into the last LSTM layer.
Here, a max-over-time pooling layer is added after the convolutional network. The pooling result for the feature map c is:

p = max(c_1, c_2, ..., c_{s-l+1}).

These pooling results form our local semantic representation se ∈ R^n:

se = [p_1, p_2, ..., p_n].
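As a small numeric illustration of the convolution and max-over-time pooling described above, the following NumPy sketch computes one feature map and its pooled value for a single filter; tanh stands in for the non-linearity f, and all shapes are toy values.

```python
import numpy as np

# Toy shapes: c embedding versions (channels), embedding dimension d,
# sentence length s, convolution window l.
c, d, s, l = 2, 4, 7, 3
X = np.random.randn(c, d, s)        # stacked c-channel feature map
F = np.random.randn(c, d, l)        # one convolution filter

feature_map = np.array([
    np.tanh(np.sum(F * X[:, :, k:k + l]))   # c_k: element-wise product, summed, non-linearity f
    for k in range(s - l + 1)
])
p = feature_map.max()               # max-over-time pooling result for this filter
print(feature_map.shape, p)         # (5,) and a single scalar
```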

4.1.3. Second LSTM Layer

We set the hidden dimension of this LSTM layer equal to the number of filters n, for easy and fair fusion later, and use the last hidden state of the LSTM as the global structure representation st ∈ R^n.

4.1.4. Concatenation and Loss Layer

Thus, we obtain the local semantic representation se and the global structure representation st. We then concatenate se and st to form the sentence representation and compute the loss via cross-entropy.
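Putting the four parts together, the following PyTorch sketch outlines one possible SNN implementation, assuming a single word-embedding version; the layer sizes and the ReLU after the convolution are illustrative assumptions, not the paper’s exact configuration.

```python
import torch
import torch.nn as nn

class SandwichNN(nn.Module):
    """LSTM -> CNN + max-over-time pooling -> LSTM, then concatenation of the
    local semantic representation se and the global structure representation st."""
    def __init__(self, vocab_size, emb_dim=300, n_filters=128, window=3, n_classes=2):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.lstm1 = nn.LSTM(emb_dim, emb_dim, batch_first=True)       # context-adjusted word states
        self.conv = nn.Conv1d(emb_dim, n_filters, kernel_size=window)  # local n-gram features
        self.lstm2 = nn.LSTM(n_filters, n_filters, batch_first=True)   # long-dependency structure
        self.fc = nn.Linear(2 * n_filters, n_classes)

    def forward(self, token_ids):                       # (batch, seq_len)
        h, _ = self.lstm1(self.emb(token_ids))          # (batch, seq_len, emb_dim)
        f = torch.relu(self.conv(h.transpose(1, 2)))    # (batch, n_filters, seq_len - window + 1)
        se = f.max(dim=2).values                        # max-over-time pooling -> local semantics
        _, (st, _) = self.lstm2(f.transpose(1, 2))      # last hidden state -> global structure
        return self.fc(torch.cat([se, st.squeeze(0)], dim=1))   # logits for cross-entropy loss
```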

4.2. Intent Category Classification

The correct intent can reduce the number of candidate QA pairs significantly. Currently, the data set contains 60 intents such as “Bluetooth”, “Screen Unlock”, “Google Account”, etc.
The intent category classifier estimates the probability p(I | t_k), where I represents the intent. Our baseline approaches are GBDT (Gradient Boosting Decision Tree) and a linear SVM (Support Vector Machine). For feature extraction, t_k was first tokenized, followed by stop-word removal and transformation into a bag-of-words representation. The features were term frequency-inverse document frequency (TF-IDF) weighted unigram and bigram features. We also implemented a bidirectional LSTM model (BiLSTM). In this model, each w_i ∈ t_k was represented by an embedding e_i ∈ R^d obtained from pre-trained distributed word representations, E = [e_1, ..., e_W]. The BiLSTM output was passed to a fully-connected layer followed by a ReLU (Rectified Linear Unit) non-linearity and softmax normalization, such that p(I | t_k) was computed as follows:
p(I | t_k) = softmax(ReLU(FC(BiLSTM(E))))
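The following PyTorch sketch illustrates such a BiLSTM intent classifier; the 60-way output follows the data set described above, while the hidden size and the use of the final time step as the sentence representation are assumptions.

```python
import torch
import torch.nn as nn

class BiLSTMIntentClassifier(nn.Module):
    """BiLSTM -> fully-connected layer -> ReLU -> softmax over intent categories."""
    def __init__(self, vocab_size, emb_dim=300, hidden=128, n_intents=60):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)   # initialized from pre-trained embeddings in practice
        self.bilstm = nn.LSTM(emb_dim, hidden, batch_first=True, bidirectional=True)
        self.fc = nn.Linear(2 * hidden, n_intents)

    def forward(self, token_ids):                      # (batch, seq_len)
        out, _ = self.bilstm(self.emb(token_ids))      # (batch, seq_len, 2 * hidden)
        sent = out[:, -1, :]                           # final time step as sentence representation (assumed)
        return torch.softmax(torch.relu(self.fc(sent)), dim=-1)   # p(I | t_k)
```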

4.3. Semantic Matching

We assume that the QA pair with the highest semantic similarity to the question expressed in turn t_k (and previous turns) will be of the highest utility to the user. After question type and intent category classification, we obtain an initial set of candidates, QA_init, by retrieving all QA pairs from the knowledge base that are relevant to the question type and intent category. Following a common information retrieval approach, we then used a pairwise scoring function S(q_i, a_i, C) to sort QA_init by utility, where (q_i, a_i) ∈ QA_init.
TF-IDF
TF-IDF stands for term frequency-inverse document frequency. Our first baseline computes S with TF-IDF weighted bag-of-words representations of q_i, a_i and t_k, estimating semantic relatedness by the cosine similarity cos(v_i, v_k) between the feature vector of the QA pair, v_i, and that of the user turn, v_k.
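A minimal sketch of such a TF-IDF baseline with scikit-learn is shown below; the example texts are placeholders, and concatenating q_i and a_i into one document is only one possible choice.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

qa_texts = ["how to setup email ...", "how to reset the device ..."]   # q_i (+ a_i) placeholders
user_turn = "i cannot add my gmail account on my phone"                # t_k placeholder

vectorizer = TfidfVectorizer(ngram_range=(1, 2))
qa_vectors = vectorizer.fit_transform(qa_texts)
turn_vector = vectorizer.transform([user_turn])

scores = cosine_similarity(turn_vector, qa_vectors)[0]   # S(q_i, a_i, t_k) for every candidate
best_index = scores.argmax()                             # highest-ranked QA pair
```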
WMD
The second baseline leverages the semantic information of distributed word representations [26]. To this end, we replace the tokens in q_i, a_i and t_k with their respective embeddings and then compute the word mover distance [27] between them.
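A sketch of the WMD baseline using gensim is shown below; the embedding file path and the whitespace tokenization are placeholders, and computing WMD additionally requires an optimal-transport backend such as POT to be installed.

```python
from gensim.models import KeyedVectors

# Placeholder path to pre-trained word vectors.
vectors = KeyedVectors.load_word2vec_format("embeddings.bin", binary=True)

qa_tokens = "how to setup email".split()
turn_tokens = "i cannot add my gmail account".split()

distance = vectors.wmdistance(turn_tokens, qa_tokens)   # smaller distance = closer match [27]
```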
SMN
In addition to the baselines, we use a sequential matching network (SMN) [20], which treats semantic matching as a classification task. The SMN first represents q_i, a_i and t_k by their respective sequences of word embeddings E_i and E_k, before encoding both separately with a recurrent network, in this case a gated recurrent unit (GRU) [28]. A word-word similarity matrix M_w and a sequence-sequence similarity matrix M_s are constructed from E_i and E_k, and important matching information is distilled into a matching vector v_m via a convolutional layer followed by max-pooling. v_m is further projected using a fully connected layer followed by a softmax.
MPCNN
In this section, we present our approach, which incorporates multi-info and context information of user questions into a multi-perspective CNN (MPCNN) for question paraphrase identification. The architecture of the model is shown in Figure 4. Our model has two identical subnetworks that process t_k and q_i a_i in parallel after obtaining context via a GRU.
Figure 4. Multi-perspective sentence similarity network with gated recurrent unit (GRU).
(1) Multi-info
In our data, t_k is quite long, but q_i in the QA-KB is short and contains little information. The answer a_i, on the other hand, is quite long and contains information related to t_k. In this work, we therefore concatenate q_i and a_i from the QA-KB and compute S(q_i a_i, t_k). User queries are always concerned with a specific product, but related standard questions for different products may be identical in the QA-KB. For example, in Figure 1 “moto g3” is a mobile name. For the same question, a different product can influence the matching result. We therefore replace specific mobile names directly with the same word “Mobile”. In this paper, we use the product-KB and a CRF (Conditional Random Field) to recognize the mobile name in t_k. The left part of Figure 5 shows the structure of the product-KB. In the product-KB, every mobile has surface names, which are mined from the chat logs.
Figure 5. Structure of the product-knowledge base (KB) and question-answer (QA)-KB.
The product-KB can hardly contain all mobiles and their surface names, so we use a CRF to recognize mobile names in the input user question as a supplement. The CRF uses two levels of features: character-level n-grams (up to 6-grams) and word-level n-grams (up to 3-grams). By using the multi-info from the product-KB and the answer information, the precision of semantic matching is improved.
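The dictionary-based part of this normalization can be sketched as follows; the surface-name list is a toy stand-in for the product-KB, and the CRF supplement is omitted.

```python
import re

# Toy stand-in for the surface names stored in the product-KB.
surface_names = ["moto g3", "moto z play", "moto g"]

def normalize_products(text: str) -> str:
    # Replace every known surface name with the placeholder token "Mobile",
    # matching longer names first so "moto g3" is not cut short by "moto g".
    for name in sorted(surface_names, key=len, reverse=True):
        text = re.sub(re.escape(name), "Mobile", text, flags=re.IGNORECASE)
    return text

print(normalize_products("Hi, my moto g3 cannot connect to wifi"))
# -> "Hi, my Mobile cannot connect to wifi"
```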
(2) Context Multi-Perspective CNN
After obtaining the multi-info, the inputs to our neural model are t_k and q_i a_i. Given a user query t_k and a response candidate q_i a_i, the model looks up an embedding table and represents t_k and q_i a_i as t_k = [e_{u,1}, e_{u,2}, ..., e_{u,L}] and q_i a_i = [e_{s,1}, e_{s,2}, ..., e_{s,L}] respectively, where e_{u,j}, e_{s,j} ∈ R^d are the embeddings of the j-th word of t_k and q_i a_i respectively, and L is the maximum length of the two input sequences. Before being fed into the multi-perspective CNN, t_k is transformed into hidden vectors conM_{t_k} by a GRU. Suppose conM_{t_k} = [h_{u,1}, h_{u,2}, ..., h_{u,L}] are the hidden vectors of t_k; then h_{u,i} is defined by
z_i = σ(W_z e_{u,i} + U_z h_{u,i-1})
r_i = σ(W_r e_{u,i} + U_r h_{u,i-1})
h̄_{u,i} = tanh(W_h e_{u,i} + U_h (r_i ⊙ h_{u,i-1}))
h_{u,i} = z_i ⊙ h̄_{u,i} + (1 - z_i) ⊙ h_{u,i-1},
where h_{u,0} = 0, z_i and r_i are an update gate and a reset gate respectively, σ(·) is the sigmoid function, and W_z, W_r, W_h, U_z, U_r, U_h are parameters.
Because q_i a_i is not a sequential sentence, the model only obtains context information for t_k and learns its long-term dependencies with the GRU. conM_{t_k} and q_i a_i are then processed by the same neural networks. The model applies both word-level convolutional filters and embedding-level convolutional filters. Word-level filters operate over sliding windows while considering the full dimensionality of the word embeddings, like typical temporal convolutional filters. Embedding-level filters focus on information at a finer granularity and operate over sliding windows of each dimension of the word embeddings. Embedding-level filters can find and extract information from individual dimensions, while word-level filters can discover broader patterns of contextual information. Using both kinds of filters allows the model to extract richer information.
The model converts each convolutional filter’s output vector to a scalar with a pooling layer. Pooling helps a convolutional model retain the most prominent and prevalent features, which improves robustness. Max pooling, the most widely used pooling operation, applies the max operation over the input vector and returns the maximum value. In addition to max pooling, our model also uses min pooling and mean pooling.
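The combination of max, min and mean pooling over a filter’s output vectors can be sketched as follows; the tensor sizes are illustrative.

```python
import torch

feature_maps = torch.randn(8, 20)          # (filters, positions), illustrative sizes
pooled = torch.cat([feature_maps.max(dim=1).values,    # max pooling
                    feature_maps.min(dim=1).values,    # min pooling
                    feature_maps.mean(dim=1)], dim=0)  # mean pooling -> one scalar per filter and type
```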

4.4. Dialogue Manager

We adopted a finite state machine (FSM) to manage the conversation. Besides the “start” and “close” states, we set up eight intermediate states: “Init”, “SlotFull”, “SlotNotFull”, “SlotClarify”, “IntentVerify”, “DeliverAnswer”, “WaitUserInput” and “ErrorHandling”, as shown in Figure 6. The meaning of each state is explained in Table 1. To make the transition logic between states clear, the arrow lines in Figure 6 indicate the next state to jump to.
Figure 6. Finite state machine for task-oriented dialog.
Table 1. Definition of states.
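A minimal sketch of such an FSM-based dialog manager is shown below; the state names follow Figure 6 and Table 1, but the transition table and event names are illustrative assumptions rather than the exact jump logic of the paper.

```python
STATES = {"Start", "Init", "SlotFull", "SlotNotFull", "SlotClarify",
          "IntentVerify", "DeliverAnswer", "WaitUserInput", "ErrorHandling", "Close"}

TRANSITIONS = {                                   # (state, event) -> next state (illustrative)
    ("Start", "user_turn"): "Init",
    ("Init", "slots_complete"): "SlotFull",
    ("Init", "slots_missing"): "SlotNotFull",
    ("SlotNotFull", "ask_user"): "WaitUserInput",
    ("WaitUserInput", "slot_filled"): "SlotFull",
    ("SlotFull", "answer_ranked"): "DeliverAnswer",
    ("DeliverAnswer", "user_satisfied"): "Close",
    ("DeliverAnswer", "user_unsatisfied"): "ErrorHandling",
}

def step(state: str, event: str) -> str:
    # Unknown (state, event) combinations fall back to error handling.
    return TRANSITIONS.get((state, event), "ErrorHandling")
```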

5. Experiments and Discussion

In this section, we evaluate our approaches for question type classification, intent category classification, and semantic matching, as well as the end-to-end performance of MOLI. We first introduce the dataset, the QA-KB, and the product-KB, and then present the experimental results.

5.1. Data Set, QA-KB and Product-KB

The chat transcript data set mainly consists of first contact transcripts, in which the customer’s question or intent is explicitly stated. Each transcript includes the full text of the chat, speaker ids for each message, a product id, and, optionally, a question type and an intent category assigned by the customer service agent.
From this corpus, we extracted a dataset of 80,216 user turns, which are manually labeled with question type information by the CSAs. Table 2 shows the distribution over question types contained in the dataset. Out of the 30,593 how-to questions, 6808 have an intent category assigned, for which the distribution over the top 30 categories (out of 60) is shown in Figure 7.
Table 2. Question type statistics.
Figure 7. Distribution of intent categories (top 30) for user question.
The KB stores the QA pairs and their relevant products. Figure 5 shows the structure of our KB: the left part contains the parameters of the mobile products and the right part contains the QA pairs. In the current version, the KB includes 20 mobile products and 242 standard QA pairs, and comprises more than 150,000 triples in total.

5.2. Question Type Classification

We first split the data set into 80%/20% training and test sets. We use 300-dimensional GloVe word embeddings [29]. Hyper-parameter selection was done on the training set via five-fold cross validation, and results averaged over multiple runs on the test set are reported in Table 3. As the table shows, SNN clearly outperforms the baseline models. For example, in the sentence “I want to get support on the steps of factory model setting”, the structure “support on ... steps” contributes a lot to the classification; relying on only part of this structure (“support” or “steps”) can lead to mistakes.
Table 3. Question type classification results.

5.3. Intent Category Classification

We again split the dataset into 80%/20% training and test sets. Hyper-parameter selection is done on the training set via five-fold cross validation, and results averaged over multiple runs are reported on the test set. For the BiLSTM baseline we use 300-dimensional GloVe word embeddings [29]. Table 4 shows the evaluation results on the dataset. The baseline SVM model performs well, even outperforming the BiLSTM model.
Table 4. Intent category classification results for user question.
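The evaluation protocol for the SVM baseline (80%/20% split, TF-IDF unigram and bigram features, five-fold cross validation for hyper-parameters) can be sketched with scikit-learn as follows; texts and intent_labels stand for the labeled user turns, and the hyper-parameter grid is an assumption.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

def evaluate_svm_baseline(texts, intent_labels):
    """texts / intent_labels: the labeled user turns and their intent categories."""
    X_train, X_test, y_train, y_test = train_test_split(
        texts, intent_labels, test_size=0.2, stratify=intent_labels)
    pipeline = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
    search = GridSearchCV(pipeline, {"linearsvc__C": [0.1, 1, 10]}, cv=5)  # 5-fold CV on training set
    search.fit(X_train, y_train)
    return search.score(X_test, y_test)                                    # accuracy on held-out 20%
```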
From the per-category results of the SVM in Table 5, we find that some categories (e.g., “Google account” and “transfer from previous device”) achieve disproportionately lower performance. For example, “Google account” is often confused with “reset”, as a Google account is generally a main topic when trying to reset a device (e.g., an Android smartphone). It is also noteworthy that “subsidy unlock”, “bootloader unlock” and “screen lock” are frequently confused. This is best illustrated by the example “Hi i need pin for unlock red to my moto g”, which has the true category “Subsidy Unlock” but is categorized as “screen lock”. Without knowledge about the mobile phone and contracts domain, it is very difficult to understand that the customer is referring to a “pin” (subsidy unlock code) for “red” (a mobile service provider) and not the actual PIN code for unlocking the phone. This example also illustrates a common problem in smart customer support, where users unfamiliar with the domain are not able to describe their information need in the domain-specific terminology.
Table 5. Intent category classification results for user question, top 10 categories.

5.4. Semantic Matching

For all models except TF-IDF, we use 300-dimensional GloVe word embeddings [29]. To obtain negative samples, for each t_k we randomly selected five standard queries with the same intent and five standard queries with different intents. To alleviate the impact of unbalanced training data, we oversampled positive samples. As the standard questions q_i of most QA pairs (q_i, a_i) are usually shorter than 10 tokens, we also evaluate the impact on model performance of adding the answer a_i as additional context (up to 500 characters) to q_i.
Table 6 shows the P@1 of each model on our data. We see that MPCNN and MPCNN_GRU (MPCNN with a Gated Recurrent Unit) outperform the unsupervised baseline approaches, with a 43% error reduction achieved by the MPCNN_GRU model. Intuitively, it makes sense to provide the models with additional context that can be used to learn a better representation of semantic similarity. The SMN’s P@1 is much lower than that of the MPCNN models, and only slightly higher than that of the unsupervised models.
Table 6. Semantic matching results for user question.

5.5. The Importance of Intent Classification for Semantic Matching

Question intent classification is an important step to narrow down candidate answers. In this section, we compare against baseline models to highlight the effectiveness of intent classification. The baseline models use the same networks as MPCNN and MPCNN_GRU but without intent classification, so they match against all QA pairs directly. Table 7 shows that the precision of semantic matching with intent classification outperforms the baseline models.
Table 7. Semantic matching results on baseline for user question.

5.6. End-to-End Performance of MOLI

In this section, we report the end-to-end performance of MOLI and compare it with baseline systems to highlight the effectiveness of each component. In detail, we list the baseline performance, and then the performance with the improved question type classification, intent category recognition, and semantic matching components described in Sections 5.2–5.4, respectively. The corresponding methods are SNN, SVM, and MPCNN_GRU, so we name the baselines baseline_QT_SNN, baseline_IC_SVM, and baseline_SM_MPCNN_GRU. Finally, we show the performance with all of the above components improved together. In addition, to demonstrate the effectiveness of semantic matching, we designed the MOLI-SM model, which is MOLI with the semantic matching component removed. Table 8 shows the P@1 and feedback score of each system. The feedback score is calculated from user actions: at the end of a session, a feedback mechanism lets the user grade the recommended answer on a five-level scale, and if the score is four or five we consider the answer useful for the user. The results in the table show that our improvements were effective.
Table 8. End-to-end performance.
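Under the interpretation above (a rating of four or five counts as useful), the feedback score could be computed as in the following sketch; the ratings are toy values and the exact aggregation used in the paper may differ.

```python
ratings = [5, 4, 2, 5, 3, 4, 5, 1]                            # illustrative end-of-session ratings
feedback_score = sum(r >= 4 for r in ratings) / len(ratings)  # fraction of "useful" answers
print(round(feedback_score, 2))                               # 0.62 for this toy example
```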

6. Conclusions

In this paper, we described our smart customer service system MOLI and the NLP techniques it builds on in detail. We presented a first approach to conversational question answering in the complex and little-explored domain of technical customer support. Our approach matches a user’s question with the most relevant answer from a knowledge base. It does so in a conversational manner, by asking for and clarifying required information when necessary. Our approach incorporates several separate models to determine an answer. Most notably, it performs question type and intent classification for a dataset with 60 intent categories, slot filling, and semantic answer matching. We observe that while supervised models, both neural and standard ones such as decision trees and SVMs, perform reasonably well on the individual tasks, there is still room for improvement. As many previous authors have shown in other domains, such models can benefit from joint training and end-to-end task modeling.
Our experiments were conducted with a dataset of noisy, real-world chat transcripts, which we plan to make available to the community in the near future. Future research directions include end-to-end, joint modeling of the question type and intent classification, slot filling and semantic matching subtasks, as well as updating the dialogue manager to account for nested, non-linear conversations, and maintaining multiple dialog hypotheses.

Author Contributions

G.Z., J.Z. and Y.L. wrote the manuscript and helped to revise it; C.A., R.S., L.H., S.S. (Stefan Schaffer) and S.S. (Sven Schmeier) helped to revise the manuscript; C.H. and F.X. provided funding for this paper.

Funding

This work was funded by Lenovo Research.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Li, Y.; Miao, Q.; Geng, J. Question Answering for Technical Customer Support. In 7th CCF International Conference; Springer: Cham, Switzerland, 2018; pp. 3–15. [Google Scholar]
  2. Wen, T.H.; Vandyke, D.; Mrksic, N. A Network-based End-to-End Trainable Task-oriented Dialogue System. arXiv, 2016; arXiv:1604.04562. [Google Scholar]
  3. Kahaduwa, H.; Pathirana, D.; Arachchi, P.L. Question Answering system for the travel domain. In Proceedings of the Moratuwa Engineering Research Conference (MERCon), Moratuwa, Sri Lanka, 29–31 May 2017; pp. 449–454. [Google Scholar]
  4. Lowe, R.T.; Pow, N.; Serban, I.V. Training end-to-end dialogue systems with the ubuntu dialogue corpus. Dialogue Discourse 2017, 8, 31–65. [Google Scholar]
  5. Feng, M.; Xiang, B.; Glass, M.R. Applying deep learning to answer selection: A study and an open task. In Proceedings of the 3rd International Proceedings on Automatic Speech Recognition and Understanding (ASRU); IEEE: Piscataway NJ, USA, 2015; pp. 813–820. [Google Scholar]
  6. Wan, S.; Dras, M.; Dale, R.; Paris, C. Using Dependency-based Features to Take the Para-farce out of Paraphrase. In Proceedings of the Australasian Language Technology Workshop, Sydney, Australia, 30 November–1 December 2006; pp. 131–138. [Google Scholar]
  7. Madnani, N.; Tetreault, J.; Chodorow, M. Re-examining machine translation metrics for paraphrase identification. In Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Montreal, QC, Canada, 3–8 June 2012; pp. 182–190. [Google Scholar]
  8. Fernando, S.; Stevenson, M. A Semantic Similarity Approach to Paraphrase Detection. Available online: https://pdfs.semanticscholar.org/d020/eb83f03a9f9c97e728355c4a9010fa65d8ef.pdf (accessed on 14 February 2019).
  9. Das, D.; Smith, N.A. Paraphrase identification as probabilistic quasi-synchronous recognition. In Proceedings of the Joint Conference of the 47th Annual Meeting of the Association for Computational Linguistics and the 4th International Joint Conference on Natural Language Processing of the AFNLP, Suntec, Singapore, 2–7 August 2009; pp. 468–476. [Google Scholar]
  10. Guo, W.W.; Diab, M. Modeling sentences in the latent space. In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics, Jeju Island, Korea, 8–14 July 2012; pp. 864–872. [Google Scholar]
  11. Collobert, R.; Weston, J. A unified architecture for natural language processing: Deep neural networks with multitask learning. In Proceedings of the 25th International Conference on Machine Learning, Helsinki, Finland, 5–9 July 2008; pp. 160–167. [Google Scholar]
  12. Kalchbrenner, N.; Grefenstette, E.; Blunsom, P. A convolutional neural network for modelling sentences. arXiv, 2014; arXiv:1404.2188. [Google Scholar]
  13. Kim, Y. Convolutional neural networks for sentence classification. In Proceedings of the 2014 Conference on Empirical Methods for Natural Language Processing, Doha, Qatar, 25–29 October 2014; pp. 1746–1751. [Google Scholar]
  14. Hu, B.; Lu, Z.; Li, H.; Chen, Q. Convolutional neural network architectures for matching natural language sentences. arXiv, 2015; arXiv:1503.03244. [Google Scholar]
  15. Weston, J.; Bengio, S.; Usunier, N. Wsabie: Scaling up to large vocabulary image annotation. In Proceedings of the International Joint Conference on Artificial Intelligence, Barcelona, Spain, 16–22 July 2011; pp. 2764–2770. [Google Scholar]
  16. Huang, P.S.; He, X.; Gao, J.; Deng, L.; Acero, A.; Heck, L. Learning deep structured semantic models for web search using clickthrough data. In Proceedings of the 22nd ACM International Conference on Information Knowledge Management, San Francisco, CA, USA, 27 October–1 November 2013; pp. 2333–2338. [Google Scholar]
  17. Andrew, G.; Arora, R.; Bilmes, J.; Livescu, K. Deep canonical correlation analysis. In Proceedings of the 30th International Conference on Machine Learning, Atlanta, GA, USA, 16–21 June 2013; pp. 1247–1255. [Google Scholar]
  18. Tai, K.S.; Socher, R.; Manning, C.D. Improved semantic representations from tree-structured long short-term memory networks. arXiv, 2015; arXiv:1503.00075. [Google Scholar]
  19. He, H.; Gimpel, K.; Lin, J. Multi-perspective sentence similarity modeling with convolutional neural networks. In Proceedings of the 20th International Proceedings on Empirical Methods in Natural Language Processing, Lisbon, Portugal, 17–21 September 2015; pp. 1576–1586. [Google Scholar]
  20. Wu, Y.; Wu, W.; Xing, C. Sequential matching network: A new architecture for multi-turn response selection in retrieval-based chatbots. In 55th Annual Meeting of the Association for Computational Linguistics; ACL: Stroudsburg, PA, USA, 2017; pp. 496–505. [Google Scholar]
  21. Rajpurkar, P.; Zhang, J.; Lopyrev, K. SQuAD: 100,000+ Questions for Machine Comprehension of Text. In 21st International Proceedings on Empirical Methods in Natural Language Processing; ACL: Stroudsburg, PA, USA, 2016; pp. 2383–2392. [Google Scholar]
  22. Yang, Y.; Yih, S.W.; Meek, C. WikiQA: A Challenge Dataset for Open-Domain Question Answering. In 20th International Proceedings on Empirical Methods in Natural Language Processing; ACL: Stroudsburg, PA, USA, 2017; pp. 2013–2018. [Google Scholar]
  23. Manning, C.; Surdeanu, M.; Bauer, J. The Stanford CoreNLP natural language processing toolkit. In 52nd Annual Meeting of the Association for Computational Linguistics; ACL: Stroudsburg, PA, USA, 2014; pp. 55–60. [Google Scholar]
  24. Zhang, R.; Lee, H.; Radev, D. Dependency sensitive convolutional neural networks for modeling sentences and documents. arXiv, 2016; arXiv:1611.02361. [Google Scholar]
  25. Zhou, C.; Sun, C.; Liu, Z. A C-LSTM Neural Network for Text Classification. Comput. Sci. 2015, 1, 39–44. [Google Scholar]
  26. Mikolov, T.; Sutskever, I.; Chen, K. Distributed representations of words and phrases and their compositionality. In 9th International Proceedings on Advances in Neural Information Processing System; MIT Press: Cambridge, MA, USA, 2014; pp. 3111–3119. [Google Scholar]
  27. Kusner, M.; Sun, Y.; Kolkin, N. From word embeddings to document distances. In 32nd International Proceedings on International Conference on Machine Learning; ACM: New York, NY, USA, 2015; pp. 957–966. [Google Scholar]
  28. Chung, J.; Gulcehre, C.; Cho, K.H. Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv, 2014; arXiv:1412.3555. [Google Scholar]
  29. Pennington, J.; Socher, R.; Manning, C. Glove: Global vectors for word representation. In 19th International Proceedings on Empirical Methods in Natural Language Processing; ACL: Stroudsburg, PA, USA, 2014; pp. 1532–1543. [Google Scholar]
