A Conversation History-Based Q&A Cache Mechanism for Multi-Layered Chatbot Services

Abstract: Chatbot technologies have made our lives easier. To create a chatbot with high intelligence, a significant amount of knowledge processing is required. However, this can slow down the reaction time; hence, a mechanism to enable a quick response is needed. This paper proposes a cache mechanism to improve the response time of the chatbot service; while the cache in a CPU utilizes the locality of references within binary code executions, our cache mechanism for chatbots uses the frequency and relevance information that potentially exists within the set of Q&A pairs. The proposed idea is to enable the broker in a multi-layered structure to analyze and store the keyword-wise relevance of the set of Q&A pairs from chatbots. In addition, the cache mechanism accumulates the frequency of the input questions by monitoring the conversation history. When a cache miss occurs, the broker selects a chatbot according to the frequency and relevance and then delivers the query to the selected chatbot to obtain an answer. This mechanism showed a significant increase in the cache hit ratio as well as an improvement in the average response time.


Introduction
Due to developments in Internet technologies, online learning has grown from Internet-propelled distance education to online higher education enrollments. Online education offers benefits such as saving money and time and letting learners individually choose an area of interest and a course of study [1]. One way of learning online is through video-based online courses (MOOCs such as Coursera, edX, and others) [2][3][4]. The demand for online lectures is increasing, and accordingly, the combination of MOOCs and chatbots is becoming more useful [5]. It helps overcome one of the limitations and weaknesses of MOOCs, which is the long time it takes to receive feedback [6]. One example of a combined MOOC and chatbot system is a video tutoring assistant for online video tutoring [7]: a chatbot service that plays the role of an assistant and provides real-time feedback to user questions. Thus, the main goal of this chatbot is to answer specific user questions and thereby shorten the time it takes to receive feedback from a lecturer.
In this paper, a multi-layered, multi-chatbot system architecture is proposed for the video tutoring assistant to improve the coverage of Q&A that the system handles. It uses multiple chatbots for each video instead of a single chatbot. This strategy, however, may decrease the speed of the system, so a cache mechanism is required to compensate for this slowdown. The concept of layers comes from edge computing in cloud technologies, which lightens the system and its processing; the cache mechanism is one way of applying edge computing.
The cache in a CPU utilizes the locality of references within binary code executions [8]; web cache policies are created based on the location of documents [9]. The cache mechanism proposed in this paper uses the frequency and relevance information that potentially exists within the set of Q&A pairs. The remainder of this paper is organized as follows. Section 2 introduces the problem definition and previous approaches to solving this problem. Section 3 introduces the architecture and algorithm of the proposed caching mechanism. Section 4 discusses the experiments and experimental results. Section 5 concludes the paper with a summary of our main findings and contributions.

Related Work
Recent studies on chatbot technologies show the use of different algorithms and models. M. Boyanov and I. Koychev introduced a chatbot that uses the seq2seq algorithm to answer users of Web forums [10]. The authors used web forum data and preprocessed them before training the model. The chatbot showed good results in correctly answering both questions asked in the forum and conversationally styled questions. The main contribution of that paper is the selection of sentences from Q&A pairs by a word-by-word product over averaged word-embedding representations. In the case of [11], the chatbot system was created with DialogFlow and the BERT algorithm. This chatbot was created for the Georgia Southern University homepage so that students can ask questions related to the university and its staff.
Nowadays, the use of chatbot technologies and online learning is growing. As mentioned before, the main goals of the video tutoring assistant are to shorten the time it takes to get feedback from the lecturer and to increase students' satisfaction and attendance until the end of the course. A. Espasa and J. Meneses presented work on feedback in an online learning environment [12]. The authors show an association between feedback and student performance, which points to the relevance of feedback to self-regulatory competencies and learning progress in online learning. This is the reason the authors of [7] considered combining online video learning with chatbot technologies. Another combination of an online video learning platform and chatbots is the chatbot XiaoMu in XuetangX [13,14]. J. Tang presents a chatbot system for XuetangX with three main functions: course recommendation, question answering, and automated video navigation. This chatbot uses a knowledge graph to build its recommendation system and question graph.
We suggest using the multi-layered structure in the video tutoring assistant [7] because it uses multiple chatbots for one video and must connect to the chatbots every time the user asks a question. This proposal stems from the idea of edge computing in cloud technologies. Edge computing plays a big role in reducing networking pressure and service response time and in improving the performance of low-power devices [15]. The authors of [16] introduce response time as one of the attributes that influence users' satisfaction. A Bangla chatbot uses a knowledge-based tree structure to reduce the steps of finding an answer within the whole conversation [17]. Our approach reduces the steps of getting an answer from the chatbot through the concept of edge computing.

Problem Definition and Our Approach
Generally, the cache mechanism of a CPU is optimized for binary code execution based on the fact that binary code execution exhibits high locality of reference [18]. A web cache uses the locality of documents and replaces the least recently used data [19]. In other words, code has an address, and code execution is local with respect to those addresses; thus, it is assumed that caching nearby code will increase the hit ratio [20,21]. The hit ratio shows how successfully the cache serves requested data and content, and the miss ratio is the fraction of requests for data and content that are not cached. The purpose of the cache is to increase the hit ratio and decrease the miss ratio (1): HitRatio = N_hit / (N_hit + N_miss), MissRatio = 1 - HitRatio.
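The hit and miss ratios in (1) can be sketched as follows; the function names are ours, for illustration only:

```python
def hit_ratio(hits: int, misses: int) -> float:
    """Fraction of requests served directly from the cache."""
    total = hits + misses
    return hits / total if total else 0.0

def miss_ratio(hits: int, misses: int) -> float:
    """Fraction of requests that had to go to the backing store."""
    return 1.0 - hit_ratio(hits, misses)
```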
To achieve the highest performance, cache misses should occur as seldom as possible. Considering that the cache memory is much smaller than the RAM, it is not easy to achieve a high cache hit ratio. This is the reason the main task of the cache is to load the cache memory with the necessary data and promptly remove data that are no longer required. The simplest algorithm for loading data into the cache is based on the assumption that data from the main memory are processed sequentially, in ascending order of the addresses.
The video tutoring assistant consists of three main parts in different layers. The first layer of the system is the broker. This layer handles the cache mechanism and chatbot management. The next layer is the chatbot layer, which connects multiple chatbots that are registered in the system. Various chatbots may be prepared by different developers with differentiated question and answer (Q&A) sets for the video. The broker connects with chatbots and plays the role of a bridge between the user and chatbots.
This broker can manage the conversation history of the users and provide a personalized cache memory. In the case of an LMS (learning management system), one subject has a fixed number of students, and the information about these students is advantageous for personalizing brokers. Having as many brokers as there are students in the subject makes managing the conversation history easier and more convenient, and using a broker for each student helps personalize the conversation history and the cache memory. These personalized brokers can share frequency information across the whole conversation history of the class. This is one idea that shows the extensibility of the broker-based architecture.
The last layer is a deep learning-based chatbot. In addition, the user terminal is a webpage for the video tutoring assistant that allows the user to converse with the system; it allows users to ask the broker questions that may arise while watching the video. Figure 1 shows our multi-layered, multi-chatbot system architecture for a video tutoring assistant. In this paper, we propose a caching mechanism for the broker in a multi-chatbot video tutoring system that uses frequency and relevance based on the conversation history and keywords. The purpose of the proposed caching mechanism is to enhance the system's responsiveness by caching the relevant Q&A sets most likely to be asked. The cache mechanism analyzes the relevance between Q&A pairs; then, the frequency information of these questions is accumulated continuously as they arrive. In other words, the Q&A sets that are relevant to, and frequently follow, the question asked at this moment are cached.
The mechanism in (2) finds the set of Q&A pairs with the maximum cache hit ratio; in this study, the frequency and relevance between Q&A sets help achieve this maximum:

k* = argmax_k (HitRatio(S_k)) (2)

where S_k denotes a set of Q&A pairs that could be cached and k* is the id of the set with the maximum hit ratio. The cache mechanism in (3) chooses the most relevant Q&A pairs for the current question to maximize the hit ratio:

RelQs_i = [RQ_i,1, RQ_i,2, ..., RQ_i,j, ...] (3)

where i is the index of the current question, j is the index of the relevant Q&A sets, RQ_i,j is the j-th relevant Q&A pair, and RelQs_i is the list of relevant Q&A pairs. Then, the cache mechanism in (4) selects the most frequently asked Q&A pairs from this list:

FreqQs_i = [FrQ_i,1, FrQ_i,2, ..., FrQ_i,j] (4)

where i is the index of the currently asked question, j is the number of relevant questions, FrQ_i,j is the frequency of the chosen relevant question, and FreqQs_i is the list of frequencies of the relevant Q&A pairs.
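The selection described in (2)-(4) can be sketched as follows; the data layout (dictionaries keyed by question id) and the names are our assumptions, not the authors' implementation:

```python
def select_cache_set(current_q, relevance, frequency, size):
    """Pick the Q&A ids worth caching after `current_q` is asked.

    relevance: {question_id: [relevant question ids]}   (keyword-based)
    frequency: {(prev_q, next_q): times next_q followed prev_q}
    size: cache capacity in Q&A pairs
    """
    candidates = relevance.get(current_q, [])
    # Rank relevant questions by how often they followed current_q.
    ranked = sorted(candidates,
                    key=lambda q: frequency.get((current_q, q), 0),
                    reverse=True)
    return ranked[:size]
```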

Cache Mechanism
The cache mechanism aims to bring Q&A sets often asked of chatbots to the cache layer. This approach is intended to improve the response time of the broker and ease the system processes.

Cache Mechanism Architecture
The cache mechanism has three main parts. The first part is the reply module, which is responsible for replying to the query of the user if it is already cached. The second part is the keyword extraction module [22], which extracts keywords for all Q&A sets of the chatbots and sends this information to the third module. The third module, which manages frequency and relevance, creates a relevance graph of the Q&A sets based on the extracted keywords. In addition, this module is responsible for creating a frequency graph of the Q&A sets (Figure 2).

Reply Module
The reply module of the mechanism receives user queries and returns matched answers to the user. It uses the cache memory to decrease response time. The module uses the AIML engine [23] to match the question from the user with the cached Q&A sets. The AIML engine is old, but it is lightweight, which is a good characteristic for a cache-matching engine. In this system, the cache is a micro-chatbot and must have only the minimum functionality of a chatbot; this is why AIML is sufficient for the cache. AIML provides tags and has specific rules to match questions. However, increasing matching flexibility by generalizing questions using wildcards in AIML should be a separate study. If the query is cached, the module replies to the user. If the query is not in the cache, the module sends the question to a predicted chatbot to obtain an answer to return to the user.
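The reply flow can be sketched as a lookup with a fallback. The actual AIML matching is richer (wildcards, normalization rules), so the exact-match normalization below is only a stand-in of our own:

```python
def reply(query, cache, ask_chatbot):
    """Answer from the cache on a hit; otherwise forward to a chatbot.

    cache: {normalized question: answer}
    ask_chatbot: callable invoked on a cache miss.
    Returns (answer, hit) where hit is True on a cache hit.
    """
    # Crude normalization standing in for AIML pattern matching.
    key = " ".join(query.lower().split())
    if key in cache:
        return cache[key], True
    return ask_chatbot(query), False
```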

Keyword Extraction Module
The keyword extraction module uses the TextRank [24] algorithm to extract the top keywords for all Q&A pairs in order to create a relevance graph of the Q&A sets. This graph indicates the relevance among Q&A pairs by ranking them based on the number of keywords they share. The nodes of the graph are Q&A pairs, and the edges carry the relevance scores of the Q&A pairs based on the number of shared keywords. Table 1 shows examples of top keywords extracted by the TextRank algorithm from Q&A sets about computers and AI. One limitation of the relevance graph is that it requires the whole Q&A sets of all chatbots, which forces chatbot developers to send their Q&A datasets to the system. This request might force developers to open their knowledge, which they may not want to give away for free. One possible way to overcome this limitation is to ask each chatbot for a local relevance graph and merge these pieces into a whole relevance graph. Such a local graph does not contain Q&A texts but consists only of Q&A ids and the chatbot id with the name of the chatbot.
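The shared-keyword relevance graph described above can be sketched as follows. Keyword extraction itself (TextRank) is omitted; the function assumes keywords have already been extracted per Q&A pair:

```python
from itertools import combinations

def build_relevance_graph(keywords):
    """Build edges weighted by the number of shared keywords.

    keywords: {qa_id: set of extracted keywords}
    Returns {(qa_id_a, qa_id_b): shared keyword count} for pairs
    sharing at least one keyword.
    """
    graph = {}
    for a, b in combinations(sorted(keywords), 2):
        shared = len(keywords[a] & keywords[b])
        if shared:  # only relevant pairs get an edge
            graph[(a, b)] = shared
    return graph
```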

Frequency and Relevance Management Module
As mentioned above, the keywords extracted by the keyword extraction module are used to create the Q&A graph based on the mutual relevance of all Q&A pairs. The use of keywords helps to calculate the relevance of the Q&A pairs: in this study, Q&A pairs with the same keywords are considered relevant to each other. Figure 3 shows an example of Q&A relevance. The meaning of this graph is as follows: the keyword of Q1A1C1 is "computer," and this Q&A pair is relevant to the questions in the related-questions list RelQs = [Q2A2C1, Q3A3C1, Q5A5C3]. The Q2A2C1 pair has the keywords "computer," "program," and "software," and its relevant Q&A pairs are RelQs = [Q1A1C1, Q3A3C1, Q4A4C2, Q5A5C3]. In the case of Q3A3C1, the keywords are "computer" and "internet," and RelQs = [Q1A1C1, Q2A2C1, Q5A5C3]. The Q4A4C2 pair's keywords are "debugger," "software," and "program," and RelQs = [Q2A2C1, Q5A5C3]. The Q5A5C3 pair's keywords are "software," "computer," and "program," and RelQs = [Q1A1C1, Q2A2C1, Q3A3C1, Q4A4C2].

The concept of relevance proposed in this paper depends on the number of shared keywords between Q&A pairs. In the case of [25], the policy of the cache mechanism depends on document semantics: the algorithm caches documents semantically close to the current page and removes the least frequently used documents. The cache proposed in this paper instead uses shared keywords in Q&A pairs to establish relevance. This method might have limitations, but it is one of the simplest ways to make the cache work. Figure 4 shows the representation of the relevance graph. This graph holds the ids of the Q&A pairs (Q1A1, Q2A2, Q3A3, Q4A4, Q5A5) and the ids of the chatbots (C1, C2, C3) that sent this information to the cache.

The frequency in this mechanism depends on the conversation history of the chatbot. Here, the frequency calculation is based on the previous and present questions. In other words, the frequency of a question is the number of times each question in the Q&A dataset is followed immediately by this question.
This concept is different from the notion of context in NLP; in fact, handling context in the NLP sense is beyond the scope of this paper. Nevertheless, the method used in this paper, which is based on the occurrences of previous questions in the conversation history, is considered suitable for the cache mechanism from an engineering standpoint.
The frequency of Q&A pairs is computed using conversation history. Table 2 presents an example of a Q&A set frequency graph showing how many times each question was called after other questions from the Q&A dataset.
F(i, j) = the number of times Q_i appears immediately after Q_j in the conversation history (5)

where Q_i is the current question, Q_j = Prev(Q_i) is the previous question, and F(i, j) counts how often the current question (Q_i) was asked immediately after the previous question (Q_j). The table can be represented by a graph, as shown in Figure 5. The number next to a dashed arrow ("-->") shows the number of times a specific question was asked after Q5 (Q1: three times, Q2: eight times, Q4: two times), and the number next to a solid arrow ("→") shows how many times a specific question was asked after Q1 (Q2: one time, Q3: five times, Q5: two times).
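Computing F(i, j) from a conversation log can be sketched as follows; the list-of-question-ids representation of the history is our assumption:

```python
from collections import Counter

def build_frequency(history):
    """Count F(i, j): times question Q_i was asked right after Q_j.

    history: ordered list of question ids from the conversation log.
    Returns a Counter keyed by (previous question, current question).
    """
    freq = Counter()
    for prev_q, cur_q in zip(history, history[1:]):
        freq[(prev_q, cur_q)] += 1
    return freq
```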

Cache Mechanism Algorithm
The proposed algorithm saves some of the Q&A sets from the chatbot data so that responses can be generated directly by the broker without connecting to the chatbots. The use of frequency and relevance enables the algorithm to cache effectively. Algorithm 1 shows the cache mechanism. The input of this algorithm is a question from the chatbot service (Qi). The algorithm uses the keyword-based relevance graph (RelQ) to obtain the next Q&A pairs and saves them in RelQs_i (Ck is the id of the chatbot from which the related questions are obtained). As Q&A sets are cached based on relevance and frequency, it is necessary to obtain the frequency of the relevant Q&A pairs (FreqQ_i) based on the conversation history. In other words, the frequency graph (FreqQ) stores how many times each relevant question was asked after Qi. The frequency lets the algorithm sort the relevant questions in descending order to obtain the top Q&A sets that must be cached. The same is done with the previously cached Q&A sets (CQ): the algorithm obtains the frequency of the previously cached Q&A sets (FreqCQ_i) after Qi from FreqQ and unites it with FreqQ_i. Then, only the top number (size) of Q&A sets are cached.
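Using the variable names from the text, Algorithm 1 can be sketched roughly as follows; the concrete data structures are our assumptions, not the authors' code:

```python
def update_cache(q_i, rel_graph, freq_graph, cached, size):
    """Refresh the cache after question q_i is asked (Algorithm 1 sketch).

    rel_graph: {question_id: [relevant question ids]}  (keyword relevance, RelQ)
    freq_graph: {(prev, next): count}                  (conversation history, FreqQ)
    cached: list of currently cached question ids (CQ)
    size: cache capacity
    """
    # Candidates: questions relevant to q_i, united with what is already cached.
    candidates = set(rel_graph.get(q_i, [])) | set(cached)
    # Sort by how often each candidate followed q_i, highest first.
    ranked = sorted(candidates,
                    key=lambda q: freq_graph.get((q_i, q), 0),
                    reverse=True)
    return ranked[:size]  # only the top `size` entries stay cached
```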

Handling Cache-Miss
The cache does not always contain the asked question; cache misses occur as well. In this case, the system requires an answer from the chatbots. The corresponding chatbot is selected based on the relevance graph: the system finds the node of the Q&A pair with the highest number of keywords registered on the edge, then connects to the chatbot named on that node and requests the answer from it.
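The cache-miss chatbot selection can be sketched as follows; the edge representation and the ownership map are our assumptions:

```python
def select_chatbot(query_qa_id, relevance_edges, qa_owner):
    """On a cache miss, pick the chatbot owning the most relevant Q&A pair.

    relevance_edges: {(qa_a, qa_b): shared keyword count}
    qa_owner: {qa_id: chatbot_id}  (which chatbot registered each pair)
    """
    best_qa, best_score = None, -1
    for (a, b), score in relevance_edges.items():
        if query_qa_id in (a, b) and score > best_score:
            # Take the neighbor on the highest-weight edge.
            best_qa = b if a == query_qa_id else a
            best_score = score
    return qa_owner.get(best_qa)
```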

Data Preparation
The data required for the experiments needed some modification to suit the form of the experiments, which evaluated the caching mechanism by measuring the hit ratio. The data in this experiment were obtained from the AIML archive for "Computers" and "Artificial intelligence (AI)" [25].
The data consisted of 200 Q&A pairs in the AIML file. The data needed to be modified before they could be used to obtain relevant Q&A pairs and frequencies. Figure 6a shows an example of a Q&A pair in an AIML file: the question is enclosed by the tags <pattern> and </pattern>, and the answer by <template> and </template>. Figure 6b shows an example of a modified Q&A pair in JSON format. This example was created from the AIML file data, with keywords extracted by the TextRank algorithm. The cache mechanism uses data in this structure, which saves the answer in "ans," the id of the Q&A pair in "id," the keywords of the Q&A pair extracted with the TextRank algorithm in "keywords," and the relevant Q&A pairs in "rel.ques." The "keywords" part helps to find the "rel.ques" information, which in turn is used to find the Q&A sets to cache.
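Based on the field names described above, a modified Q&A entry presumably resembles the following; the values here are illustrative, not taken from the dataset:

```python
import json

# Hypothetical entry mirroring the fields named in the text:
# "ans", "id", "keywords", and "rel.ques".
entry = {
    "id": "Q1A1C1",
    "ans": "A computer is a machine that processes data.",
    "keywords": ["computer"],
    "rel.ques": ["Q2A2C1", "Q3A3C1", "Q5A5C3"],
}
print(json.dumps(entry, indent=2))
```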

Experimental Setup
The cache mechanism proposed in this paper was implemented in Python. All experiments were conducted on a desktop with the following specifications: CPU: Intel® Core™ i5-8500 @ 3.00 GHz; RAM: 8.00 GB; OS: Windows; Python: 3.6.7.

Comparison Experiment and Results
The experiment included three types of sequences, namely random, mixed, and related, and five cache sizes: 10, 15, 20, 25, and 30. A random sequence asks random questions with no relevance or frequency structure. A mixed sequence consists of 4000 random questions followed by 6000 questions relevant to each other. A related sequence is a set of Q&A pairs asked in order of relevance to each other, imitating humanlike, content-based conversations. The cache size is the number of Q&A pairs that can be saved in the cache. There were four experimental cases with the same cache size parameters and sequences but different numbers of Q&A pairs: Case 1 had 50 Q&A pairs, Case 2 had 100, Case 3 had 150, and Case 4 had 200. Table 3 shows all the parameters of the experiment. The advantage of the proposed method is that it analyzes the relevance among keywords in Q&A sets and the frequency based on the conversation history in place of code locality. As mentioned before, the CPU cache uses code locality to add instructions and data to the cache. However, the Q&A sets of chatbots carry no locality information, so using frequency and relevance to cache questions is one possible alternative.
This experiment aimed to show how well the proposed cache mechanism worked and the relationship between cache size and Q&A set size. A lower number of Q&A pairs with a large cache size might show a high hit ratio, but a large Q&A set with a small cache size would produce a lower hit ratio. This experiment was intended to show the consistency between Q&A set number and cache size and the reduction in the response time.

Q&A Set Size and Cache Size Relationship
Case 1 (C1) had 50 Q&A pairs; the cache size was 10, 15, 20, 25, or 30, and the caching sequence was random, mixed, or related. Figure 7a-c shows the results of C1 for the random, mixed, and related sequences, respectively. These results show how the cache size and question order affect the hit ratio. The random sequence does not show good hit ratios, regardless of the cache size (Figure 7a). The C1 results for the mixed-question sequence show how the hit ratio became higher after related questions were asked following the first 4000 steps (Figure 7b). The related-question sequence in C1 produced the best results with cache size 30 (Figure 7c). However, this cannot be considered a good result because the cache size is more than 50% of the entire dataset. The bar graph of all cache sizes in Figure 7d shows how much the results differ depending on the question sequence. The results for C2 with all cache sizes and random-sequence questions are almost the same as for C1. The mixed-sequence results show improvement after the related questions were asked. The sequence of related questions had a high hit ratio, particularly with cache sizes of 25 and 30. The random sequence had a hit ratio below 30%, and the hit ratio of the mixed-question sequence was below 40%. Cache size 30 produced the highest hit ratio, and cache size 10 the lowest.
In the case of C3, the experiment produced approximately the same results, with an insignificant difference between the hit ratio averages. As the number of questions grew, the hit ratio decreased. With the mixed-question sequence, the hit ratio increased after step 4000, and with the related-question sequence, the hit ratio also increased. In the related-sequence test of C3, all cache sizes produced a hit ratio greater than 80%.
The random sequence in C4 produced the same low hit ratio as in the other cases (Figure 8a). The mixed-question sequence produced a low hit ratio before the related questions were asked, after which the hit ratio increased. The related-question sequence of C4 produced an increasing hit ratio for all cache sizes, as shown in Figure 8c. The comparison of all cache sizes in C4 shows a significant difference between the related and random sequences (Figure 8d): the former produced an average hit ratio slightly below 80%, whereas the latter produced a hit ratio below 30%. The four experimental cases used the same parameters, with the only difference being the number of questions. The experimental results showed that the sequence of the questions affects the hit ratio: a logical conversation with a user creates a better frequency graph of Q&A pairs than random conversations do. The experiment also demonstrated the relationship between cache size and the number of Q&A sets. Table 4 shows the average hit ratio of all cases and all cache sizes for the random and related question sequences. The results in the table show how the sequence of questions changes the hit ratio. As a random sequence has no meaning, its hit ratio is low. In the cases of C2, C3, and C4, the hit ratio in the related sequence was more than double that in the random sequence. This high average in the related sequence means the system will show a high hit ratio when users have meaningful, context-based conversations. Table 4 also shows a high average hit ratio for cache size 30. The average hit ratio decreases as the number of Q&A pairs increases: in Case 1, cache size 30 covered 60% of the Q&A pairs, while in Case 2 it covered only 30%; in Case 3, 20%; and in Case 4, only 15%.

Response Time
Response time in this study is the round-trip time from the user terminal to the system and back to the user. Round-trip time (RTT) is the length of time it takes for a data packet to be sent to a destination plus the time it takes for an acknowledgment of that packet to be received back at the origin. The aim of this study was to use the cache mechanism to shorten the RTT from "user → broker → chatbot → broker → user" to "user → broker (cache) → user" (Figure 9). Figure 9 shows the RTT of a system without a cache mechanism, which is calculated by (6):

RTT_chatbot = T(user → broker) + T(broker → chatbot) + T(chatbot → broker) + T(broker → user) (6)

In the case of a system with a cache mechanism, the RTT is calculated by (7):

RTT_cache = T(user → broker) + T(broker → user) (7)

Figure 10 shows the difference between response times for a question in the cache and a question answered by the chatbot itself. These results show that the main goal of this study was achieved: shortening the response time of chatbot services by caching Q&A sets. The environment for measuring response times consisted of three different terminals. The first was the terminal of the user, who sent questions to the broker. The broker terminal was responsible for caching Q&A sets and answering the user's question from the cache or through the chatbot. The final terminal was the chatbot terminal, which answered the user's questions that were not cached. Two questions were asked 500 times each: one was already cached, and the other was not. In the case of a cached question, the user's question was sent to the broker, and the answer was shown to the user immediately from the cache. In the case of a question that was not cached, the system sent the user's question to the broker, the broker sent it to the chatbot, where it was matched with the Q&A pairs in the chatbot database, and the answer was returned to the user from the chatbot through the broker. In Figure 10, the blue line shows the response times of the system with answers from the chatbot (RTT_chatbot).
The red line shows the response times of the system with answers from the cache (RTT_cache). The response times differ by a factor of 2-3: the average response time for questions that are not cached is RTT_chatbot = 2.9 ms, whereas the average response time for cached questions is RTT_cache = 0.9 ms. The experimental results showed what we expected to see. The experiment showed that the Q&A set size and cache size must be consistent with each other: as the Q&A set size increases, the hit ratio decreases, and an increase in cache size produces an increase in the average hit ratio. As people ask questions based on the context of the conversation, the results for related sequences show that the proposed cache mechanism is effective for this system. The presented cache mechanism can lighten system processes and reduce the response time of the chatbot system. Saving all relevant Q&A pairs in the cache would be ideal; however, cache size is limited, which is why the Q&A sets with higher frequencies are cached. In the results, because Q&A pairs with high frequencies were more likely to be asked, the hit ratio did not decrease significantly.
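An RTT comparison of this kind can be reproduced with a small timing harness; the handler and repeat count here are our illustrative choices, not the paper's actual terminals or network:

```python
import time

def measure_rtt(handler, query, repeats=500):
    """Average round-trip time of `handler(query)` in milliseconds."""
    start = time.perf_counter()
    for _ in range(repeats):
        handler(query)
    elapsed = time.perf_counter() - start
    return elapsed / repeats * 1000.0  # per-call average in ms
```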

Conclusions
The proposed cache mechanism is responsible for improving the response time of a multi-layered, multi-chatbot system. It is included in the multi-layered broker system to help the broker provide faster answers to user questions. The caching mechanism aims to store often-asked relevant Q&A sets based on conversational history and keywords. Using the cache mechanism lightens system processes and improves response time by sending immediate answers to questions from the cache memory in the broker.
Experiments with four cases having different numbers of Q&A pairs were conducted to show the performance of the presented cache mechanism. The results of this experiment show that the cache mechanism performed well. The mechanism works based on the relevance of Q&A pairs sharing the same keywords and the frequency derived from the conversation history. The experiment achieved up to a 94.00% hit ratio in contextual-question sequences with 100 Q&A sets and cache size 30, which was better than the 66.70% achieved in random-question sequences with 50 Q&A sets and cache size 30.
As was mentioned, in a situation where cloud computing is mainstream, it is a general trend to use edge computing to improve the response speed of the cloud. In the case of chatbots, having a broker with a cache function on the edge has an advantage in terms of reaction speed. Regarding the knowledge coverage of the chatbot system, it is natural that a multi-chatbot system with several exposed chatbots is more helpful, from the point of view of collective intelligence, than a single chatbot. However, in a multi-chatbot system, if the client uses a structure in which it asks a question to each chatbot and selects an answer, the slowdown becomes more serious. The use of the cache mechanism improves the speed of the broker.
We used the concepts of relevance and frequency in the cache mechanism in this paper. As one of our future works, we plan to use the concept of likes in addition to relevance and frequency. This idea will promote competition among chatbots for the advancement of knowledge. In addition, to make the cache mechanism more intelligent, one needs to use an ontology or knowledge graph. The knowledge graph allows the cache mechanism to consider the meaning of words based on conversation history and ontology.

Data Availability Statement:
Publicly available datasets were analyzed in this study. This data can be found here: https://code.google.com/archive/p/aiml-en-us-foundation-alice/downloads.