CLASSY: A Conversational Aware Suggestion System †

: Over the last few years, pervasive systems have seen some interesting development. Nevertheless, human–human interaction can also take advantage of those systems by using their ability to perceive the surrounding environment. In this work, we have developed a pervasive system – named CLASSY – that is aware of the conversational context and suggests documents potentially useful to the users based on an Information Retrieval system, and proposed a new scoring approach that uses semantics and distance based on proximity data in order to classify the relationship between tokens. architecture, that is essentially divided into three blocks – context processing , responsible to ﬁnd the most relevant tokens and manage the context; IR system , which has previously indexed the documents that are considered relevant to the users or conversational topics; and suggestions ﬁlter and cache , are responsible for analyzing the document list retrieved by the IR system, applying some ﬁlters to it, and cacheing the suggestions.


Introduction
Over the last few years, discussions regarding pervasive computing have become more common, and it has seen some great development [1][2][3][4], due to the overall innovations and possibilities inherent to this concept and mainly to the fact that our surrounding environment is becoming even more monitored every day [5][6][7].
Unlike traditional computing systems, a pervasive system requires a feed of constant data about its surrounding environment, acquired by sensors of the most various types, ranging from hardware sensors, like temperature and location, to software, such as agents that parse tweets or messages on a chat room. These sensors capture and provide data in a non-invasive way, enabling the "ability to adapt to changing circumstances and respond based on the context of use" [8].
This sense of context can be used in many fields, such as a conversational one, which will be addressed in this paper. Here, context is somewhat related to the topic on which the whole conversation is unfolding. By perceiving this context, it allows the proposed application to enhance the communications through useful suggestions, as the conversational system takes some actions towards helping users regarding that specific context. This is useful in many situations, such as developer forums or technical support services, since it introduces an intermediate layer between the sources (the user and the developer, respectively) and the target (the technician and the issues/code, respectively) that enhances and facilitates their relationships by using the suggestions to the contexts the source is perceiving.
Along with the direct interaction with the users, the problem of conversational intrusion comes up. A system that faces constant human interaction should have conversational intelligence, enabling it "to actively participate in the conversation and to demonstrate awareness of the topic discussed, They intended to "improve the current state of the art related to the interaction between nutritional coaching software systems and their users" by introducing a syntactic and semantic analysis of sentences instead of keyword spotting. To attain that, they analyzed and decomposed the sentences to find a common structure between the users' messages that are based on some general entities like actions, quantifiers, ingredients, etc. These entities and their dependencies allowed them to use a rule-based approach to interpret the meaning of the sentence, so that, by looking at the information associated with each valuable syntagma, through a knowledge database, the nutritional information associated with the sentence could be computed. Even though this approach works well in a specific domain, such as the demonstrated nutritional context, where the authors achieved a 96% accuracy compared to the 44% of the keyword extraction, in a general context it is impractical, since it would require the definition of a set of semantic rules to interpret the meaning of the input and that are able to interpret every imaginable situation, which is infeasible.

Document Recommendation Problem and the CLASSY Solution
The specific problem addressed by this paper is the suggestion of information in a conversational environment. The suggested documents can be of the most various formats, from Portable Document Format (PDF) to simple text files; however, they must be text-based, since the techniques that will be described in the next sections were developed to handle text. The conversational environment can also take many forms, ranging from tweets, usually with a limited number of characters, to chat rooms, where messages can reach thousands of characters. The proposed CLASSY system follows the rules of a pervasive system, by being based on data that is collected without human interaction and by being robust and useful so that it only makes suggestions when it has a certain degree of certainty regarding its recommendation, therefore, preventing the user from constantly being interrupted by system interactions. Its overall architecture, illustrated in Figure 1, is essentially divided into three blocks -context processing, IR system, and suggestions filter and cache.

Information
Retrieval System Indexed documents

Users messages
Suggestions List Figure 1. CLASSY architecture, that is essentially divided into three blockscontext processing, responsible to find the most relevant tokens and manage the context; IR system, which has previously indexed the documents that are considered relevant to the users or conversational topics; and suggestions filter and cache, are responsible for analyzing the document list retrieved by the IR system, applying some filters to it, and cacheing the suggestions.

Context Processing
The context processing block is where user messages are processed in order to find the most relevant tokens -the keywords. This block is composed of the following steps: • Relevant token extraction, which uses an analyzer defined in the IR engine to apply a set of processing techniques to the considered text. This analyzer is used both in the indexing phase (the documents list) and in the query phase, to force the query tokens to follow the same rules and have a similar syntax to the documents indexed tokens. The analyzer pipeline is the following: 1. UAX29 URL Email Tokenizer, which splits the text field into tokens, using white spaces and punctuation as delimiters. Delimiter characters are discarded, with the exception of dots that are not followed by white space, words with hyphens that contain numbers, internet domain names containing top-level domains validated against the white list in the IANA Root Zone Database, URLs and email, IPv4 and IPv6 addresses; 2.
Length filter, that removes words whose length is smaller than 2 and exceeds 12. These values are a fair compromise considering the average word lengths of the various existing languages (http://www.ravi.io/language-word-lengths) to filter small irrelevant words and also the bigger ones, which are atypical; 3.
Lower case filter, that transforms any uppercase letters tokens to the equivalent lowercase one; 4.
Stop word filter, that discards or stops the analysis of tokens that are on a given provided stop words list; 5.
Snowball Porter Stemmer, which is a language-specific stemmer generated by Snowball, a software package that runs pattern-based word stemmers; 6.
Length filter, which removes words whose length is smaller than 2 and exceeds 10. This filter follows the same logic as the previous one but now considering smaller words resulting from the stemmer process.
For every token set of each analyzed query, its Inverse Document Score (IDF) is computed, discarding tokens whose value sited under a certain threshold (a system parameter that, for now, should be defined manually by finding the value that attains a good compromise in terms of filtering the most irrelevant tokens), essentially those that do not show sufficient informative interest; • Context update, a stage where the context is updated with the new tokens resulting from the previous step. This context has a specific size, and it is defined by the relevant tokens of the last n queries; • Implicit query formation, where the query is built with the disjunction of the context entries, boosted with its recency (the position in the context). This implicit query is then sent to the IR system in order to get the result list that will contain documents likely relevant to the user. An example of this stage is shown in Figure 2, where Q * are the queries, RTQ * are the relevant tokens from each query, and IQ is the resulting implicit query: 2. Length filter, that removes words whose length is smaller than 2 and exceeds 12. These 6. Length filter, that removes words whose length is smaller than 2 and exceeds 10. This filter 126 follows the same logic as the previous one but now considering smaller words resulting from 127 the stemmer process.

128
For every token set of each analyzed query, its Inverse Document Score (IDF) is computed, 129 discarding tokens whose value sited under a certain threshold (a system parameter that, for  Additionally, the use of a large set of topics and context in the conversational scenario will result in an increase of noise perceived by the system. For that, it is necessary to create a negative class document that would contain a rather large historical registry of exchanged messages. Implicit queries, including the mentioned negative class in its result list, are seen as having more noise associated with its context, thus not adding value as a suggestion source. This negative class is put into practice in the form of a noise file system. This system has a maximum size, which defines the number of noise files that can be created and used as the negative class. As mentioned, these files containing the historical registry of exchanged messages are populated via a round-robin approach so that after the last possibly created file exceeds a specific size, the first one is emptied and filled with the most recent messages.
Moreover, as a user, it is not very pleasant to be constantly bombarded with suggestions after every single message, especially if some of them may not be as useful as one would like. In that regard, the suggestion system should only suggest new documents with a distance of n messages. This behavior not only reduces the invasive presence of the suggestion system but also refines its suggestions, since every suggestion is able to take into account more contextual information provided by the message window.
However, sometimes, instead of talking about a specific topic, people explicitly refer to some related keys. In this field, a topic can take a lot of forms, by being the title of a document, the author of a tweet, the key of an issue, and so on. This type of interactions are not context but specific topic matches, thus it is useful to perform a direct search for them. With that in mind, a simple query is built by using all tokens in the received message and trying to find a direct match between the query and the document, based on the field that we expect to find (the title, author, key, etc.) If a hit occurs, then the result of this stage is favored over the context results.

IR System
The IR system will previously index the documents that are considered relevant to the users or conversational topics. Assuming that, its job is to receive the queries built from the Context Processing block and then set up a list of relevant documents according to those queries.

Suggestions Filter and Cache
The suggestions filter and cache blocks are responsible for analyzing the document list retrieved by the IR system, on which some filters are applied and ultimately cache the suggestions.
The first one, and the most logical, is filtering the list of documents by its own score, since only documents with a score above a certain threshold are considered to have enough value to be used as a suggestion to the user. This threshold, as the IDF one, is a system parameter that, for now, should be defined manually by finding the value that attains a good compromise in terms of the usefulness of the results.
To increase the value of the suggestions and minimize the conversational intrusion of the system, a suggestion cache must be created, whose goal is to avoid making the same suggestion in consecutive or almost consecutive time-stamped messages, since it does not add any value to the user. This cache also works by establishing a maximum time constraint in which a document is considered to be valuable again; thus, it can be discarded from the suggestion cache when it steps behind that limit. This maximum time constraint is defined manually since it is a subjective parameter that is domain-dependent.

Neighbourhood Approach
To include a more pervasive behavior, a different approach -named neighbourhood approachwas taken, which tries to use semantic similarity related to each word as a mean to get similar contexts.
Semantic distance/similarity is a property of lexical units, typically between words, but this notion can be generalized to larger units such as phrases, sentences, etc. Two words are considered semantically close if there is a lexical-semantic relation between them. There are two types of lexical relations: classical relations (such as synonyms, antonyms, and hypernyms) and ad-hoc non-classical relations (such as cause-and-effect). If the closeness in meaning is due to a certain classical relation, then the terms are said to be semantically similar. On the other hand, semantic relatedness is the term used to describe the more general form of semantical closeness caused by any semantic relation.
For instance the nouns "liquid" and "water" are both semantically similar and related, whereas the nouns "water" and "boat" are semantically related but not similar.
There are a lot of types of semantic measures, but we are interested only in ones -corpus-basedthat are based only on co-occurrence statistics from large corpora. They rely on the hypothesis that words with similar contexts tend to be semantically close [17]. The set of contexts of each target word u is represented by its distributional profile, the set of words that tend to co-occur with u within a certain distance, along with numeric scores signifying this co-occurrence tendency with u. This profile, introduced by M. Antunes, D. Gomes, and R. Aguiar [18] is defined as: where u is the target word and o(u, w) the number of times that w occurs near to u. The building process of those vectors is described in Figure 3, where D * are the contents of the documents, RTD * the relevant tokens from each document, and the neighborhood representation is given by the vector with each neighbor and its magnitude. There are a lot of types of semantic measures, but we are interested only in one -corpus-based - 195 that are based only on co-occurrence statistics from large corpora. They rely on the hypothesis that 196 words with similar contexts tend to be semantically close [17]. The set of contexts of each target word 197 u is represented by its distributional profile, the set of words that tend to co-occur with u within a 198 certain distance, along with numeric scores signifying this co-occurrence tendency with u. This profile, 199 introduced by M. Antunes, D. Gomes and R. Aguiar [18] is defined as: where u is the target word, and o(u, w) the number of times that w occurs near to u.

201
The building process of those vectors is described in Figure 3, where D * are the contents of the   Having built the neighborhood of all relevant tokens of a document, it is possible to obtain a 206 neighborhood score between any two relevant tokens, by computing the inner product, and not only the 207 cosine similarity, since we are interested in the magnitude of the similarity, of its neighborhood vectors: This score is a good indicator of how related two tokens are and inherently how can they refer to 209 the same context.

210
The process of building neighborhoods and computing its scores was implemented inside the IR Having built the neighborhood of all relevant tokens of a document, it is possible to obtain a neighborhood score between any two relevant tokens, by computing the inner product, and not only the cosine similarity -since we are interested in the magnitude of the similarity -of its neighborhood vectors: This score is a good indicator of how related two tokens are, and inherently how can they refer to the same context.
The process of building neighborhoods and computing their scores was implemented inside the IR engine, resorting to all the potential and efficiency of the Lucene's internal structures that enabled the handling of individual token analysis. This allowed us to use the neighborhood score, based on the implicit query, as the value used to re-rank the top n documents retrieved by the Lucene engine to improve the contextual analysis of each document. The steps towards computing the neighborhood score, illustrated in Figure 4, are: 1. Neighbourhood pair computing -the neighborhood vector of each document token will be compared with the neighborhood vector of each query token. The score associated with each individual document token results of averaging all comparisons; 2. Intermediate sub-query value computation -the score associated between each document and each sub-query will be the average of the neighborhood scores computed as above described; 3. Recency boost -each sub-query has a boost that is associated with its recency. The final score associated with each document and each sub-query will come from the multiplication between the recency value and the intermediate value above presented; 4. Final score -The final document score will be the sum of all neighborhood values computed between each document and each sub-query.

Evaluation
Since the main goal of this work was to build a recommendation system running in a pervasive way, a mature and robust framework was used that serves as the cornerstone of the IR block presented in the last section -Solr (https://lucene.apache.org/solr/). Solr is an open-source search platform from the Apache Lucene project, which features full-text search and real-time indexing and provides distributed search and index replication, whose design was made with scalability and fault tolerance in mind. Furthermore, Solr allows developers to add plugins that can use the indexed data and its structures to modify and enhance the retrieved results.
In order to create a comparative benchmark to evaluate the proposed CLASSY system, we implemented a base approach, that uses the blocks described in the Section 3 without making any enhancements to the IR block nor its score method, whose goal is to add value to the system by developing a valuable pre-processing step of the messages and post-processing of the result list.
Then, we implemented the full CLASSY system, by using the neighborhood approach (described in Section 3.4) as part of our system in order to fully potentiate its pervasive behavior.
To assess the performance of every presented strategy in Section 4, a series of tests were conducted on a dataset containing approximately 2330 messages, originally from an internal chat used by a team of developers of a specific company working with contact center services. The indexed documents were the content of JIRA (https://www.atlassian.com/software/jira) issues created by the team, with the goal of, given the conversational context, suggesting the issue they might be talking about. The objective was to suggest a single issue, rather than a list; therefore, in all conducted tests, the suggestion that was made always respects the top document on the results list.

Results
As already mentioned in the above section, besides taking different approaches, other small improvements and features were added.
Regarding the base approach, the first thing to analyze is the minimum document score and its impact on the results. This study is shown in Table 1. As one might observe, there is a big difference between the total number of performed suggestions and those which are considered useful. This contradicts what was mentioned about the amount of noise resulting from the messages. Taking into account the proposed solution to overcome this issue, the noise files system was then implemented. The next step was to study the impact of the number of noise files in the final result. These results are shown in Table 2. On those tests, the minimum score was maintained at 20, since it was the one showing the best results (and the most logical ones). As it is possible to see, the use of the noise files system not only reduced the total suggestions but also increased the proportion of useful ones. Here, the setup with three noise files also showed to be the best compromise in terms of accuracy. Table 2. Base approach + Noise files system -number of noise files impact. Using the above approach as the baseline, it is interesting to tune the system in order to make it more pervasive. With that in mind, we investigated the impact of the number of messages between consecutive suggestions in the results, as showed in Table 3. As it is possible to observe, these results do not show a systematic evolution, since they are very dependent on the position of the useful messages along with the conversation. However, for the analyzed dataset, one and five messages between suggestions showed to be the best compromise in terms of accuracy, with the first one being more intrusive, as it is logical. Finally, the neighborhood approach, which is slightly different from the base one when it comes to scoring the documents, since the latter uses the Lucene score, which is a variant of the TF-IDF scoring method, while the first computes the score based on the neighborhood vector comparison, as shown in Section 3.4.

Number of Noise Files Minimum Document Score Total Suggestions Useful Suggestions Accuracy
Regarding this approach, the first thing to investigate is the neighborhood window, which defines the maximum distance between two tokens that makes them neighbors. The results in Table 4 showed that using a window size bigger than one will add a considerable amount of noise, thus resulting in much more invasive and useless suggestions. Having set the neighborhood window size, it is time to analyze again the impact of the minimum document score in this approach. The results obtained with this method, presented in Table 5, are very interesting. As we increase the minimum document score, the accuracy also increases, with a notorious value of 80.65% of accuracy for a score of 23. Although these results seem optimistic, the reason for this sudden increase in accuracy is the decrease of contextual suggestions in relation to the direct matches performed by the system, which essentially is a behavior to avoid. In order to mitigate the mentioned behavior but still use its inherent advantages, a new hybrid approach has been proposed that merges the scoring method of both the base and neighborhood approaches, by computing both their values and multiplying them, to make the general scoring method more robust. The results in Table 6 showed that the hybrid approach performance is a good compromise between being pervasive and making useful suggestions (contextual and direct). However, it was observed that when the minimum document score exceeds 160 the behavior the above approach showed starts to rise again, and the contextual suggestions start to decrease.

Conclusions and Future Work
Users demand systems that are pervasive and not invasive. In this paper, we proposed a systemnamed CLASSY -that follows those rules so that it is able to be aware of the context where the users are integrated and make suggestions that prove to be useful to them.
Along with the development of CLASSY, we faced several problems related to the acquisition of context as well as dealing with all uncertainty and noise associated with human speech. Those problems led to the introduction of specific features, such as the noise files system, that made the system more robust and useful.
The use of a mature information retrieval engine and all its features potentiated the development and results of a system like that, and the alliance of the IR engine used with the new proposed neighborhood score provided a way to enhance the behavior and the results of this system.
As future work, we plan on adding intelligence to this system, by allowing it to learn from user feedback. For that, we should associate each document-query pair with a specific weight which will reward good suggestions and punish the opposite. The variations in this weight will result from users' implicit feedback, and they will make the system smarter as they increasingly interact with it. To accomplish this, the system will have to be executed in a live conversational environment, which can make it possible to ignore the logical disadvantages of using offline datasets.