Article | Open Access | 20 December 2024

Design and Implementation of an Intelligent Web Service Agent Based on Seq2Seq and Website Crawler

1 Department of Product Design, School of Arts and Design, Sanming University, Sanming 365004, China
2 QSAN Technology, Inc., Taipei City 114, Taiwan
3 Department of Computer Science and Engineering, Tatung University, Taipei City 104, Taiwan
* Author to whom correspondence should be addressed.
This article belongs to the Special Issue Natural Language Processing (NLP) with Applications and Natural Language Understanding (NLU)

Abstract

This paper proposes using a web crawler to organize the content of websites in a given domain into a dialogue tree, and builds an intelligent customer service agent on this dialogue tree for general usage. The encoder-decoder architecture Seq2Seq is used to understand natural language; its encoder is replaced with a bi-directional LSTM to increase accuracy on polysemous sentences, and an attention mechanism is added to the decoder to mitigate the drop in accuracy as sentences grow longer. We conducted four experiments. The first is an ablation experiment demonstrating that Seq2Seq + bi-directional LSTM + attention mechanism is superior to LSTM, Seq2Seq, and Seq2Seq + attention mechanism in natural language processing. Tested on an open-source Chinese corpus, the accuracy was 82.1%, 63.4%, 69.2%, and 76.1%, respectively. The second experiment asks questions using knowledge of the target domain: five thousand records from the Taiwan Water Supply Company were used as training data, and one thousand water-related questions that differed from the training data were used for testing. The accuracy of RasaNLU and of this study was 86.4% and 87.1%, respectively. The third experiment asks questions using knowledge from non-target domains and compares the answers of RasaNLU with the proposed neural network model. Five thousand questions were extracted as training data from chat databases of eight public sources, including Weibo, Tieba, Douban, and other well-known social networking sites in mainland China as well as PTT in Taiwan; one thousand questions from the same corpus that differed from the training data were extracted for testing. The accuracy of this study was 83.2%, far better than RasaNLU, confirming that the proposed model is more accurate in the general field. The last experiment compares this study with voice assistants such as Xiao Ai, Google Assistant, Siri, and Samsung Bixby. Although this study cannot answer vague questions accurately, it is more accurate in the trained application fields.

1. Introduction

In daily life, there are always questions that need answers. We usually ask others for advice, look up books, or search the Internet, and in the 21st century searching online has become the most common way. As times change, human needs have grown increasingly diverse and complex, extending beyond basic food, clothing, housing, and transportation into medical care, entertainment, finance, and logistics, and the resulting questions have increased accordingly. Traditional human customer service is slow or unable to answer these questions accurately and is progressively unable to keep up with the pace of modern life in the speed, quality, or professionalism of its responses. Coupled with the continuous improvement of computer hardware, artificial intelligence, which was initially impossible to achieve, began to flourish; hence the emergence of intelligent customer service agents.
An intelligent customer service agent is a question-answering system built on a large body of knowledge. Most current agents provide services in a target knowledge domain, using technologies such as natural language understanding (NLU), big data, and knowledge management. Because the system runs on servers, it needs no breaks or shifts and can provide professional answers and services 24 hours a day.
Research on intelligent customer service dates back to the last century. In 1950, Turing [1] proposed the famous Turing test [2] in his paper “Computing Machinery and Intelligence”, as in Figure 1; this standard remains the ultimate goal of every natural language research effort. In 1966, Weizenbaum [3] published the world’s first chatbot, ELIZA, which imitated a psychologist’s interaction with a patient; although it used only simple keyword matching and reply rules, the bot still exceeded the development team’s expectations. In 1988, the University of California, Berkeley, developed “UC” to help users learn to use the UNIX system. An intelligent customer service agent can now analyze the input language, understand the user’s intention, and select appropriate dialogue content to answer the user. In 1995, Wallace [4] developed the ALICE system, and the AIML language was released along with it. With the rapid development of AI technology, intelligent customer service has flourished in recent years, and companies such as Google, Amazon, and Microsoft have successively invested in research in this field.
Figure 1. Turing test. C asks A and B whether they are human [2].
In addition to rapid technological advancement, commercial needs have significantly increased the demand for intelligent customer service. With the development of the Internet, the way people communicate constantly evolves, from traditional phone calls and faxes to e-mail, 3/4/5G communication, and current messaging software such as Line and Skype. Communication patterns are becoming ever faster, so traditional customer service has gradually failed to keep up with the times and has become a service in need of transformation.
Two main problems need to be solved when developing intelligent customer services. The first concerns the enormous resources on the Internet: almost every product or service has multiple websites, yet using them currently requires users to browse and click on their own, so once a website’s content and structure grow somewhat large, users often spend a great deal of time finding an answer. The second is to create a general natural language processing agent. Most current intelligent customer service agents are developed for specific fields rather than general usage, and each field may require corresponding training of the intelligent customer service before it can be used. For example, the intelligent customer service agent of a bank must be trained on finance-related data, and that of the Centers for Disease Control and Prevention must be trained on disease-related data. Whenever a domain boundary is crossed, the agent must be retrained, so the service cannot be universal.
Therefore, our main contributions address these two problems. The first is to deploy a web crawler that fetches web pages and uses the knowledge on each website to build a knowledge base in the form of a dialogue tree for the intelligent customer service agent. The second is to adopt a Seq2Seq model consisting of Bi-LSTM layers with an attention mechanism to create the intelligent customer service agent itself. In the proposed model, the training target shifts from the knowledge base of a specific field to the grammar of the language, which improves the versatility of the service. The ultimate goal is to target similar web services without retraining: as long as the relevant websites in a field are converted into a dialogue tree (knowledge base), the knowledge of the intelligent customer service agent can be rapidly expanded.
This paper is divided into five sections. The first is the introduction, which explains the motivation and goal of the study. The second section has two parts: the first introduces current standard methods of natural language understanding, including rule-based approaches, traditional machine learning, Recurrent Neural Networks (RNN), Long Short-Term Memory networks (LSTM), and work related to the encoder-decoder framework; the second introduces several practices of using crawlers to build knowledge bases. The third section presents the implementation details of this research: the Seq2Seq model extended with the attention mechanism and bidirectional LSTM, combined with a crawler and dialogue tree so that the knowledge base can be rapidly expanded, realizing a cross-domain intelligent customer service. The fourth section discusses the system’s functional testing and experimental results. The last section is the conclusion, which presents this study’s contributions, achievements, and prospects.

3. Proposed Methodology

3.1. System Architecture

This research is mainly divided into two parts. The first is natural language processing: using the Seq2Seq framework with an attention mechanism and a bi-directional LSTM added to address the existing shortcomings of Seq2Seq, the input is a question and the output is the intent and entities of that question. The second is the knowledge base, which uses a web crawler to extract the structure and content of the target web pages; after adjustment, the content is organized into a dialogue tree. The intent and entities output by the natural language processing part are then used to perform a tree search, as shown in Figure 2.
Figure 2. Proposed system architecture.

3.2. Natural Language Processing

Seq2Seq, as in Figure 3a, consists of an encoder and a decoder. Given the input sentence sequence x_1, ..., x_n, the encoder converts it into a fixed-length context vector c, and the decoder then converts c into the desired sentence or words y_1, ..., y_n. In this study, the decoder is used to generate the intent and entities.
Figure 3. Seq2Seq framework (a) without an attention mechanism and (b) with an attention mechanism.

3.2.1. Seq2Seq

In Seq2Seq, the decoder is usually a simple RNN/LSTM [29] that parses the context vector generated by the encoder, as shown in Figure 3a. The context vector, which is the last hidden state of the encoder, contains the information input by the user. Seq2Seq removes the LSTM constraint that the input and output must have the same length. However, compressing the input sentence into a fixed-length context vector creates another problem: if the input sentence is very long, a fixed-length context vector cannot express the meaning of each word in the sentence well. The attention mechanism [30] proposed by Luong et al. in 2015 addresses this problem.
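As a concrete illustration of this encoder-decoder structure, the following is a minimal Keras sketch of a plain Seq2Seq model; the vocabulary sizes, dimensions, and layer choices are illustrative only and do not reproduce the exact configuration used in this study.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

vocab_in, vocab_out, dim = 8000, 200, 128   # illustrative sizes; the output vocabulary holds intent/entity tags

# Encoder: compress the question x_1..x_n into a fixed-length state (the context vector c)
enc_in = layers.Input(shape=(None,))
enc_emb = layers.Embedding(vocab_in, dim)(enc_in)
_, state_h, state_c = layers.LSTM(dim, return_state=True)(enc_emb)

# Decoder: unroll from c to produce y_1..y_m (here, the intent and entity labels)
dec_in = layers.Input(shape=(None,))
dec_emb = layers.Embedding(vocab_out, dim)(dec_in)
dec_out, _, _ = layers.LSTM(dim, return_sequences=True, return_state=True)(
    dec_emb, initial_state=[state_h, state_c])
probs = layers.Dense(vocab_out, activation="softmax")(dec_out)

model = Model([enc_in, dec_in], probs)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```

Trained with teacher forcing on (question, label-sequence) pairs, such a model exhibits exactly the fixed-length bottleneck described above, which motivates the attention mechanism that follows.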

3.2.2. Attention Mechanism

The attention mechanism [30], as shown in Figure 3b, was initially proposed to address the significant performance degradation of machine translation as sentence length increases. The authors treat machine translation as an encoding-decoding problem, encoding sentences into vectors and decoding them into the content to be translated. However, in the Seq2Seq model the encoder compresses the entire sentence into a fixed-length vector, which makes it difficult to retain enough semantic information when the sentence is long. The role of the attention mechanism is to create a context vector for each word of the input sentence rather than a single context vector for the whole sentence. The advantage is that the context vector generated for each word can be decoded more accurately.
Early natural language processing had similar properties: the common practice was to split sentences into many individual words and process them separately, then build a large neural network model that learned the words of the related field and produced results. Results from this approach could only be applied to that target domain. With the attention mechanism there is no such limitation, which significantly improves the accuracy of extracting the intent and entities in this study.
The encoder with the attention mechanism is conceptually the same as the encoder in Seq2Seq: it likewise generates [h_1, h_2, h_3, ..., h_n] from the input sentence [x_1, x_2, x_3, ..., x_n]. The difference lies in how the context vector is calculated. Here, we first denote the context vector by c_i.
The context vector c_i in Equation (1) is the weighted sum of the encoder hidden states, with the weights given by the attention scores α_ij. The attention score is an essential concept of the attention mechanism: it measures how much each word in the input sentence contributes to each word in the target sentence. In Equation (2), the attention score α_ij is calculated from the score e_ij, so we next explain what e_ij is.
c_i = \sum_{j=1}^{T_x} \alpha_{ij} h_j \quad (1)
\alpha_{ij} = \frac{\exp(e_{ij})}{\sum_{k=1}^{T_x} \exp(e_{ik})} \quad (2)
e_{ij} = a(s_{i-1}, h_j) \quad (3)
In Equation (3), a is an alignment model that assigns a score e_ij to the pair consisting of the input at position j and the output at position i, based on how well they fit. The weights e_ij define how much of the RNN/LSTM decoder hidden state s_{i-1} and the j-th annotation h_j of the source sentence should be considered for each output. With the score e_ij, the attention score is obtained by a softmax, and then the context vector c_i can be calculated. Listing the attention scores as a matrix shows the correspondence between the input and output text.
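To make Equations (1)-(3) concrete, the following NumPy sketch computes e_ij, α_ij, and c_i for one decoder step; the weight matrices W_s, W_h and vector v are illustrative parameters of the alignment model a and are not specified in the paper.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def alignment_score(s_prev, h_j, W_s, W_h, v):
    # Eq. (3): e_ij = a(s_{i-1}, h_j), here a small additive (feed-forward) alignment model
    return float(v @ np.tanh(W_s @ s_prev + W_h @ h_j))

def context_vector(s_prev, H, W_s, W_h, v):
    # H holds the encoder annotations h_1..h_Tx, one per row
    e = np.array([alignment_score(s_prev, h, W_s, W_h, v) for h in H])  # scores e_ij
    alpha = softmax(e)                                                  # Eq. (2): attention scores
    return alpha @ H, alpha                                             # Eq. (1): c_i = sum_j alpha_ij * h_j

# toy usage with random weights
rng = np.random.default_rng(0)
d, Tx = 4, 3
H, s_prev = rng.normal(size=(Tx, d)), rng.normal(size=d)
W_s, W_h, v = rng.normal(size=(d, d)), rng.normal(size=(d, d)), rng.normal(size=d)
c_i, alpha = context_vector(s_prev, H, W_s, W_h, v)
```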
As the name suggests, the decoder uses the attention mechanism to simulate human attention. There are two standard attention mechanisms: soft attention and hard attention. When soft attention calculates the attention probabilities, it computes a probability for every word input to the encoder; the advantage is that the weight of each word is known more accurately, but it is also less efficient for the same reason. Hard attention matches the input words against the expected words and directly sets the probability of words below a set threshold to 0. This method works well and quickly in image processing, but in text processing the resulting accuracy drops significantly.
Finally, this study uses the static attention mechanism, an extension of the soft attention mechanism. The difference from soft attention is that static attention calculates only a single attention probability distribution for the entire sentence input to the encoder, instead of recomputing it for every word as soft attention does. Although the accuracy drops slightly, this study adopts static attention for performance reasons. We adopted Luong attention [30], as shown in Figure 4, implemented in TensorFlow; a minimal code sketch follows Figure 4. The calculation proceeds h_t → a_t → c_t → h̃_t. The attention weights are computed as in Equation (4), where h_t is the hidden state of the target word, h̄_s is the hidden state of the source word, and a_t is the attention weight.
a_t(s) = \mathrm{align}(h_t, \bar{h}_s) = \frac{\exp(\mathrm{score}(h_t, \bar{h}_s))}{\sum_{s'} \exp(\mathrm{score}(h_t, \bar{h}_{s'}))} \quad (4)
Figure 4. Luong Attention.
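The sketch below illustrates one Luong-style attention step (h_t → a_t → c_t → h̃_t) using the “general” score function; the weight matrices W_a and W_c are illustrative, and the study’s own implementation is in TensorFlow.

```python
import numpy as np

def luong_attention_step(h_t, H_s, W_a, W_c):
    """One attention step: decoder state h_t, encoder states H_s (one per row).
    W_a and W_c are illustrative weights for the 'general' score and the output layer."""
    scores = H_s @ (W_a @ h_t)                              # score(h_t, h_s) for every source position
    a_t = np.exp(scores - scores.max()); a_t /= a_t.sum()   # Eq. (4): alignment weights
    c_t = a_t @ H_s                                         # context vector
    h_tilde = np.tanh(W_c @ np.concatenate([c_t, h_t]))     # attentional hidden state
    return h_tilde, a_t
```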

3.2.3. Bidirectional LSTM with Luong Attention Mechanism

The problem with unidirectional RNNs/LSTMs is that they can make predictions only from information before the current time t, but in practice a sentence sometimes needs future information for a correct prediction. A bidirectional RNN/LSTM therefore runs one hidden layer from left to right and another from right to left, allowing better predictions about words. For example, compare “I want to buy an Apple computer” with “I love eating apples”. If only “I want to buy an Apple” has been read, it is unclear whether Apple refers to the fruit or the computer brand, but the word that follows makes the answer obvious. Figure 5a,b show the improved bidirectional RNN [31] and LSTM [32], respectively.
Figure 5. Bi-directional (a) RNN and (b) LSTM framework.
As the name suggests, a bidirectional LSTM splits the LSTM into two directions, as shown in Figure 5b. The forward LSTM adjusts the model using the words already seen as reference, while the backward LSTM adjusts the model using the words that come later. The bidirectional LSTM produces three variables: Output, State_FW, and State_BW. Output represents the final output and is computed from State_FW and State_BW. Training is divided into three steps: the first calculates State_FW from front to back, State_BW from back to front, and the Output; the second calculates the forward gradients from back to front and then the backward gradients from front to back; the last updates the model parameters based on the gradients from the previous step.
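A minimal sketch of such a bidirectional layer is shown below, assuming the Keras API of TensorFlow; the layer returns the merged output together with the forward and backward final states (the Output, State_FW, and State_BW described above). The sizes are illustrative.

```python
import tensorflow as tf
from tensorflow.keras import layers

dim = 128
bi_lstm = layers.Bidirectional(
    layers.LSTM(dim, return_sequences=True, return_state=True))

x = tf.random.normal([1, 10, dim])              # dummy batch: (batch, time steps, features)
output, fw_h, fw_c, bw_h, bw_c = bi_lstm(x)     # merged output plus forward/backward states
```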
This study uses the encoder-decoder architecture (Seq2Seq) for natural language processing. A bi-directional LSTM is used in the encoder to increase accuracy on polysemous sentences, and the Luong attention mechanism [30] is added to the decoder, as shown in Figure 6 [32], which mitigates the drop in accuracy as sentence length increases.
Figure 6. The model architecture used for natural language processing.

3.3. Knowledge Representation

3.3.1. Implementation of the Web Crawler

Beautiful Soup is a Python library whose functions include parsing HTML or XML files and repairing files with errors such as unclosed tags (often called tag soup). The package builds a tree structure from the parsed page so the data can be accessed. It allows developers to quickly and easily parse web pages and find the information users are interested in, lowering the development threshold of web crawler programs and speeding up development.
The crawler works in three steps. The first step is to analyze the structure of the web page: use the requests.get() function to fetch the HTML of the target web page, use BeautifulSoup to convert the retrieved HTML into text, perform structural analysis, and find the rules. The second step is to extract the target text: according to the rules, use the find() function to query tags by class, id, HTML tag, etc., extract the target text, and store it in the dialogue tree. The last step uses the Depth-First Search algorithm to explore the website in depth and store all the target text in the dialogue tree.
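The first two steps can be sketched as follows; the tag and class rules (here, li tags with class="child") are illustrative and must be adapted to each target website through the structural analysis described above.

```python
import requests
from bs4 import BeautifulSoup

def fetch_catalog_links(url):
    """Step 1: fetch and parse the page. Step 2: extract the catalogue links
    according to the rules found during structural analysis (illustrative here)."""
    html = requests.get(url, timeout=10).text
    soup = BeautifulSoup(html, "html.parser")
    links = []
    for li in soup.find_all("li", class_="child"):      # rule discovered by analyzing the page
        a = li.find("a", href=True)
        if a:
            links.append((a.get_text(strip=True), a["href"]))
    return links
```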

3.3.2. Depth-First Search

There are two standard methods of web crawling: Breadth-First Search (BFS) and Depth-First Search (DFS) [33]. BFS first crawls pages with a shallow directory structure and then moves to deeper levels. This approach is suitable for crawling complete web content, since more complete page information can be saved at the same time. The DFS algorithm traverses a tree or graph structure starting from the root of the tree (or a vertex of the graph): it explores an unvisited vertex along an edge and searches as deep as possible, and when all nodes along the edges of a node have been visited, it backtracks to the previous node. Unsearched nodes are explored repeatedly until the destination node is found or all nodes have been visited.
DFS visits the links of the next layer in order of depth until it can go no deeper. This method is more suitable for this study because the dialogue tree can be built from nodes consisting only of keywords, so using DFS produces a complete dialogue tree faster. This study uses the algorithm to perform a comprehensive scan of the target website and then converts and stores the scanned structure as a tree.
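A simplified sketch of the depth-first crawl is given below; it reuses the hypothetical fetch_catalog_links() helper from Section 3.3.1 and attaches each page title as a keyword node of the dialogue tree.

```python
from urllib.parse import urljoin

def crawl_dfs(url, node, visited, depth=0, max_depth=3):
    """Depth-first crawl: follow each unvisited link as deep as possible,
    backtrack, and attach every page title as a child node of the tree."""
    if depth > max_depth or url in visited:
        return
    visited.add(url)
    for text, href in fetch_catalog_links(url):
        child_url = urljoin(url, href)
        if child_url in visited:
            continue
        child = {"keyword": text, "url": child_url, "children": []}
        node["children"].append(child)
        crawl_dfs(child_url, child, visited, depth + 1, max_depth)

# root = {"keyword": "home", "url": start_url, "children": []}
# crawl_dfs(start_url, root, set())
```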

3.3.3. Dialogue Tree

The concept of the Dialogue Tree [34] has flourished since the advent of computer games, but dialogue trees existed long before them. The earliest known dialogue tree appears in Jorge Luis Borges’ 1941 short story “The Garden of Forking Paths” [35], which allows branching routes from events to enter other branches or the main story via specified conditions. The story branches as it progresses (the possible outcomes will be close to n × m, where n is the number of options and m is the depth of the tree). Players advance the story by speaking to a non-player character and selecting a pre-written line from a menu; the non-player character responds according to the player’s choice and guides the player to the specified plot. This cycle continues until the goal is achieved. When the player chooses to leave the game, the dialogue ends and the current state is remembered. Usually there is more than one tree in a game, and the game switches among different trees according to the player’s choices. In addition to the basic dialogue tree, some games are designed with a unique score system: the score is adjusted according to the player’s decisions to predict the player’s likely thoughts and steer the plot toward the player’s expectations.
The mechanics described above allow players to converse with non-player characters. This study uses this dialogue mode to construct a knowledge-based dialogue tree: players are replaced by users, and non-player characters are replaced by a knowledge-tree system built by a web crawler. There can of course be more than one knowledge tree, and the trees can be connected in series to give users a better experience. A set of statistical formulas could further be established to predict what users are interested in and push it to them.
Taking the Taiwan Water Company [36] as an example, the web page must first be analyzed. The analysis showed that most of the company’s catalogue entries are marked with <ul> and <li> tags, as in Figure 7, together with obvious attributes such as class=“child”. All hyperlinks <a href> matching these rules are therefore collected, and the DFS algorithm crawls down layer by layer. In this way a preliminary tree is generated, and after manual trimming the final dialogue tree is produced, as in Figure 8; a simplified sketch of searching such a tree follows Figure 8.
Figure 7. A part of the web content of the Taiwan Water Company.
Figure 8. An example dialogue tree about the payment by credit card for the Taiwan Water Company.
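Continuing the Taiwan Water Company example, the sketch below shows one simple way the intent and entities produced by the NLU stage could be matched against such a tree; the matching rule (exact keyword membership, deepest match preferred) is a simplification of the actual system.

```python
def search_tree(node, entities):
    """Depth-first search of the dialogue tree: return the deepest node whose
    keyword appears among the entities extracted by the NLU model."""
    for child in node["children"]:
        hit = search_tree(child, entities)
        if hit is not None:
            return hit
        if child["keyword"] in entities:
            return child
    return None

# For "Can I pay the water bill by credit card?" the NLU stage might yield
# entities such as ["credit card", "payment"], and the search would land on
# the credit-card payment node of Figure 8.
```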

4. Experimental Results

The experiments in this study were divided into four parts. The first part aims to demonstrate that the neural network model adopted in this study outperforms traditional neural network models in natural language processing. The second part asks and answers questions with knowledge of the target domain to test RasaNLU [37] and the model proposed in this study; the purpose is to verify whether the accuracy of the proposed neural network model is better than the popular open-source Rasa conversational AI platform in the target domain. The third part asks and answers questions with knowledge from non-target domains and again tests RasaNLU and the proposed model; the purpose is to verify whether the generality of the adopted neural network model is better than RasaNLU’s machine learning method across domains. The last part compares this study with Xiao Ai, Google Assistant, Siri, and Samsung Bixby on the market to gauge the gap between this study and commercial products.

4.1. Public Dataset

The experiment uses the open-source Chinese chat data compiled by codemayq [38]; some examples are shown in Figure 9. The data include dialogue material from Chatterbot, Douban, PTT, Qingyun, TV drama dialogue, the Tieba forum, Weibo, Xiaohuangji, and other websites. Many of these sentences contain internet slang (“Martian text”), such as ㄏㄏ and QQ, which must be filtered out during pre-processing.
Figure 9. Chinese chat corpus.

4.2. Experimental Environment and the Tested Models

The same computer is used for training and testing, equipped with an Intel i5-8600K processor and an Nvidia GeForce GTX 1080 Ti with 11 GB of memory and 3584 CUDA cores. The batch size is set to 32, and each training group takes an average of 2 days.

4.2.1. RasaNLU

In this experiment, the Sklearn + Jieba + MITIE packages are used to train RasaNLU. First of all, the source data must be annotated. Taking the Taiwan Water Company as an example, the text is the input sentence, the intent is its purpose, and the entities are the keywords of the input sentence, as shown in Figure 10.
Figure 10. Annotation of the training data for RasaNLU.
We used MITIE, which is trained in an unsupervised manner and requires a large amount of Chinese data; we use the Chinese version of Wikipedia as the data source. The file total_word_feature_extractor_zh.dat in Figure 11 is the file trained by MITIE. After this training, the annotated data and the Jieba tokenizer can be imported into the RasaNLU model to start training (an illustrative training sketch follows Figure 12). Validation is required after training is completed: after starting RasaNLU, curl is used to query the model, and if the intent and the entities are correctly identified and each candidate is scored as in Figure 12, the experiment is successful.
Figure 11. RasaNLU pipeline.
Figure 12. A test example of the trained RasaNLU (read from left to right) for water fare query.
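As an illustration of this pipeline, the sketch below trains and queries the model through the pre-1.0 rasa_nlu Python API (module and component names changed in later Rasa releases); the file names and the example sentence are illustrative.

```python
from rasa_nlu import config
from rasa_nlu.model import Trainer
from rasa_nlu.training_data import load_data

# Annotated examples in the style of Figure 10 (text / intent / entities)
training_data = load_data("data/water_company_nlu.json")

# Illustrative pipeline file: points MITIE at total_word_feature_extractor_zh.dat
# and chains the Jieba tokenizer with the sklearn intent classifier (Figure 11)
trainer = Trainer(config.load("config_jieba_mitie_sklearn.yml"))
interpreter = trainer.train(training_data)

# Equivalent to the curl test of Figure 12: parse a question and inspect
# the predicted intent, entities, and their confidence scores
print(interpreter.parse("How much is my water bill this month?"))
```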

4.2.2. LSTM

Unlike the other three encoder-decoder models compared here, LSTM cannot convert the input sentence into a context vector and then extract the Intent and Entities with a decoder. Therefore, during training the sentences are first segmented, and training is performed word by word. This study’s LSTM and deep-learning experiments use the TFLearn API. TFLearn is an API built on TensorFlow that lets each neural layer, activation function, and filter be set more intuitively and quickly.
The data need to be pre-processed before training. After the sentences are segmented, features such as the part of speech of each word are converted into vectors that the model can understand. Taking Figure 13a as an example, the feature is a two-dimensional vector: the first dimension is the part of speech, with each part of speech numbered (noun is 0, verb is 1, and so on), and the second dimension assigns each word to Intent or Entity according to its role (an illustrative encoding sketch follows Figure 13). After the data are pre-processed and basic steps such as batch generation and parameter setting are completed, the LSTM model can be trained as in Figure 13b.
Figure 13. (a) Training sample data. (b) LSTM training process.
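The two-dimensional feature encoding of Figure 13a can be sketched as follows; the particular id numbering and role labels are illustrative assumptions, not the exact mapping used in the study.

```python
# Assumed numbering for the two feature dimensions described above
POS_IDS = {"noun": 0, "verb": 1, "adjective": 2, "particle": 3}
ROLE_IDS = {"entity": 0, "intent": 1, "other": 2}

def encode(tokens):
    """tokens: list of (word, pos, role) triples produced by word segmentation.
    Returns one [part-of-speech id, role id] vector per word."""
    return [[POS_IDS[pos], ROLE_IDS[role]] for _, pos, role in tokens]

# e.g. encode([("water fee", "noun", "entity"), ("query", "verb", "intent")])
# -> [[0, 0], [1, 1]]
```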

4.2.3. Seq2Seq

We adopt Google’s open-source Seq2Seq model [39] as the base. To allow a fair comparison with the other neural network models, the improvements included in the example code are removed first, and only the plain Seq2Seq part is retained. Training is divided into three parts: first collecting the data, then setting the model parameters, and finally training the model. The trained model reaches an accuracy of nearly 70%, as shown in Figure 14.
Figure 14. Some test results of Seq2Seq.

4.3. Ablation Test of the Proposed Neural Network Model

The first experiment tested four groups of neural network models: LSTM, Seq2Seq, Seq2Seq + Attention, and Seq2Seq + Bi-directional LSTM + Attention. Using the open-source Chinese corpus [38], 5000 questions were extracted as training data, and 1000 questions different from the training data were extracted from the same corpus as test data. If the output is the expected intent and entities, the result is judged correct; otherwise it is not. The accuracy rate is the number of correct answers divided by 1000, times 100%. In the order LSTM, Seq2Seq, Seq2Seq + Attention, Seq2Seq + Bi-directional LSTM + Attention, the accuracy rates are 63.4%, 69.2%, 76.1%, and 82.1%, respectively. Table 1 shows that using Seq2Seq and adding the attention mechanism and bidirectional LSTM can indeed significantly improve the accuracy rate.
Table 1. Ablation test of the proposed neural network model.

4.4. Test in the Target Domain and Non-Target Domain

Using 5000 Taiwan Water Company records [37] as training data and 1000 water-related questions that differ from the training data for testing, the accuracy rates of RasaNLU and this study are 86.4% and 87.1%, respectively. Table 2 shows that the results of RasaNLU and this study in the target domain are not much different, and the accuracy rates are close.
Table 2. Comparison between RasaNLU and this study in the target and non-target domains.
Five thousand questions were extracted as training data from chat databases [38] of eight public sources, including Weibo, Tieba, Douban, and other well-known social websites in mainland China and the PTT Gossip board in Taiwan, and 1000 questions from the same corpus that differ from the training data were extracted for testing. The accuracy rates of RasaNLU and this study are 46.3% and 83.2%, respectively. Because of the different training mechanisms, this study can still identify the correct intent and entities in areas the system does not target. Table 2 shows that in non-target domains RasaNLU often misjudges.

4.5. Comparisons with Chatbots on the Market

This experiment verifies the difference between this research and commercial products in professional fields. Because the products on the market cannot expose intent and entities for the experiment, we use Trip.com, Taipei MRT, the Taiwan Water Company, Wikipedia, and other websites with relatively standardized answers as the sources of test data. The training data of this study are 5000 questions extracted from the Chinese corpus [38], and the test data are 500 groups of questions randomly selected from the above websites. Each question has a single answer, for example: Who founded Microsoft? Answer: Bill Gates. The accuracy rates are shown in Table 3. The experiment indicates that the commercial products achieve higher comprehension accuracy only on Wikipedia, which suggests they have already been optimized for it; in the other application fields, our study has the advantage.
Table 3. Accuracy comparison of this study with some commercial products in selected application fields.

5. Conclusions and Remarks

5.1. Conclusions

Among the previously surveyed related works, Refs. [25,26,27,28] are the most recently published works closely related to our adopted Bi-LSTM model with the Luong attention mechanism. However, their datasets and target domains differ from ours, so a direct comparison was not possible. Here, the proposed Bi-LSTM model works as an intelligent agent and is integrated with a web crawler for web service.
The above experimental results show that there is little difference between this study and RasaNLU in the target domain. Our advantage is that when applying the proposed Seq2Seq model based on Bi-LSTM to other similar websites, only the knowledge base (dialogue tree) needs to be expanded; the model itself does not need to be retrained. In the general domain, the difference is evident: for RasaNLU, if no relevant information from a field was available during training, the trained model cannot find the correct answer, whereas the system proposed in this study makes judgments based on grammar, such as part of speech and sentence structure, so its accuracy is much higher than that of RasaNLU. In the comparison with products on the market, this research can provide better answers than those products if the dialogue tree is appropriately designed for the application field. However, when the question is more general, it is less likely to give an acceptable answer, as a human would.
As for the contributions, the first is to demonstrate what the memory module of LSTM brings: the computer no longer reads sentences only through the words it has learned, but in a more human-like way, grammatically determining which words in the sentence matter. This is similar to how humans read a foreign language: an article may contain some unknown words, but the context still lets us read it. Adding the attention mechanism and bidirectional LSTM also remedies some of the shortcomings of Seq2Seq and provides higher accuracy.
The second is using web crawlers to build the knowledge base (dialogue tree). The experiments show that most websites are not optimized for this. If a website uses database-like tables or embeds many pictures and PDF files, forming the dialogue tree becomes more difficult, and more manual effort is needed for correction. Subsequent systems should therefore collect statistics on these exceptions and handle them individually.

5.2. Future Works

At present, the system has two shortcomings. The first is speed: after a question is asked, it often takes about five or six seconds to respond. This can be improved by refining the dialogue tree algorithm and optimizing the neural network parameters. The other is that the accuracy rate in the general domain is about 80%, which should be improved through further experimentation. In addition, other attention mechanisms such as self-attention and multi-head attention [40] not only allow parallelization but also address some defects of the current attention mechanism; the Transformer is such a model and is worth trying and studying.

Author Contributions

Methodology, C.-C.H.; software, J.-X.Y.; validation, C.-C.H.; data curation, J.-X.Y.; writing—original draft preparation, M.-H.H.; writing—review and editing, C.-C.H.; supervision, J.-X.Y. and M.-H.H.; project administration, M.-H.H.; funding acquisition, M.-H.H. All authors have read and agreed to the published version of the manuscript.

Funding

Sanming University Research Foundation for Advanced Talents, Grant Number 22YG11S.

Institutional Review Board Statement

Not applicable.

Data Availability Statement

Data are available at https://github.com/codemayq/chinese_chatbot_corpus (accessed on 18 June 2021), Ref. [38].

Conflicts of Interest

Author Jian-Xin Yang was employed by the company QSAN Technology, Inc. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

  1. Turing, A.M. Computing Machinery and Intelligence. Mind 1950, LIX, 433–460. [Google Scholar] [CrossRef]
  2. Wallace, R.S. The Anatomy of A.L.I.C.E.; Epstein, R., Roberts, G., Beber, G., Eds.; Parsing the Turing Test; Springer: Dordrecht, The Netherlands, 2009. [Google Scholar] [CrossRef]
  3. Kirrane, S. Intelligent Software Web Agents: A Gap Analysis. J. Web Semant. 2021, 71, 100659. [Google Scholar] [CrossRef]
  4. Mashaabi, M.; Alotaibi, A.; Qudaih, H.; Alnashwan, R.; Al-Khalifa, H. Natural Language Processing in Customer Service: A Systematic Review. arXiv 2022, arXiv:2212.09523. [Google Scholar]
  5. Huang, C. The Intelligent Agent NLP-based Customer Service System. In Proceedings of the 2021 2nd International Conference on Artificial Intelligence in Electronics Engineering (AIEE ‘21), Phuket, Thailand, 15–17 January 2021; ACM: New York, NY, USA, 2021; pp. 41–50. [Google Scholar] [CrossRef]
  6. Winograd, T. SHRDLU; MIT AI Technical Report 235; MIT: Cambridge, MA, USA, 1971. [Google Scholar]
  7. Weizenbaum, J. ELIZA—A Computer Program for the Study of Natural Language Communication between Man and Machine. Commun. ACM 1966, 9, 35–36. [Google Scholar] [CrossRef]
  8. Chen, G.; Liu, R.; Chen, R.; Fu, C.C. A Historical Review of the Key Technologies for Enterprise Brand Impact Assessment. In Proceedings of the 2024 International Conference on Applied Economics, Management Science and Social Development (AEMSS 2024), Luoyang, China, 22–24 March 2024; pp. 240–246. [Google Scholar]
  9. LeCun, Y. Deep Learning Hardware: Past, Present, and Future. In Proceedings of the 2019 IEEE International Solid-State Circuits Conference—(ISSCC), San Francisco, CA, USA, 11–17 February 2019; pp. 12–19. [Google Scholar] [CrossRef]
  10. Rasa: Open Source Conversational AI—Rasa. 2019. Available online: https://rasa.com (accessed on 18 June 2021).
  11. Elman, J.L. Finding Structure in Time. Cogn. Sci. 1990, 14, 179–211. [Google Scholar] [CrossRef]
  12. Hochreiter, S.; Schmidhuber, J. Long Short-Term Memory. Neural Comput. 1997, 9, 1735–1780. [Google Scholar] [CrossRef]
  13. Sutskever, I.; Vinyals, O.; Le, Q.V. Sequence to Sequence Learning with Neural Networks. arXiv 2014, arXiv:1409.3215. [Google Scholar]
  14. Xiao, T.; Zhu, J. Introduction to Transformers: An NLP Perspective. arXiv 2023, arXiv:2311.17633. [Google Scholar]
  15. Devlin, J.; Chang, M.W.; Lee, K.; Toutanova, K. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. arXiv 2019, arXiv:1810.04805. [Google Scholar]
  16. Brown, T.; Mann, B.; Ryder, N.; Subbiah, M.; Kaplan, J.D.; Dhariwal, P.; Neelakantan, A.; Shyam, P.; Sastry, G.; Askell, A.; et al. Language Models Are Few-Shot Learners. In Proceedings of the 34th International Conference on Neural Information Processing Systems (NIPS ‘20), Vancouver, BC, Canada, 6–12 December 2020; Article 159. pp. 1877–1901. [Google Scholar]
  17. AIML.com. What Are the Limitations of Transformer Models? Available online: https://aiml.com/what-are-the-drawbacks-of-transformer-models/ (accessed on 18 June 2021).
  18. Rungsawang, A.; Angkawattanawit, N. Learnable Topic-Specific Web Crawler. J. Netw. Comput. Appl. 2005, 28, 97–114. [Google Scholar] [CrossRef]
  19. Baeza-Yates, R.; Ribeiro-Neto, B. Modern Information Retrieval; ACM Press: New York, NY, USA, 1999. [Google Scholar]
  20. Kleinberg, J.M. Authoritative Sources in a Hyperlinked Environment. J. ACM (JACM) 1999, 46, 604–632. [Google Scholar] [CrossRef]
  21. Lee, T.B. Semantic Web; World Wide Web Consortium: Cambridge, MA, USA, 1998. [Google Scholar]
  22. Kim, S.M.; Ha, Y.G. Automated Discovery of Small Business Domain Knowledge Using Web Crawling and Data Mining. In Proceedings of the 2016 International Conference on Big Data and Smart Computing (BigComp), Hong Kong, China, 18–20 January 2016; pp. 481–484. [Google Scholar] [CrossRef]
  23. W3C. RDF—Semantic Web Standards. Available online: http://www.w3.org/RDF/ (accessed on 18 June 2021).
  24. W3C. OWL—Semantic Web Standards. Available online: http://www.w3.org/2001/sw/wiki/OWL (accessed on 18 June 2021).
  25. Choudhary, P.; Chauhan, S. An Intelligent Chatbot Design and Implementation Model Using Long Short-Term Memory with Recurrent Neural Networks and Attention Mechanism. Decis. Anal. J. 2023, 9, 100359. [Google Scholar] [CrossRef]
  26. Budaev, E.S. Development of a Web Application for Intelligent Analysis of Customer Reviews Using a Modified seq2seq Model with an Attention Mechanism. Comput. Nanotechnol. 2024, 11, 151–161. [Google Scholar] [CrossRef]
  27. Jiang, J.W.; Zhang, H.; Dai, C.; Zhao, Q.; Feng, H.; Ji, Z.; Ganchev, I. Enhancements of Attention-Based Bidirectional LSTM for Hybrid Automatic Text Summarization. IEEE Access 2021, 9, 123660–123671. [Google Scholar] [CrossRef]
  28. Xie, T.; Ding, W.; Zhang, J.; Wan, X.; Wang, J. Bi-LS-AttM: A Bidirectional LSTM and Attention Mechanism Model for Improving Image Captioning. Appl. Sci. 2023, 13, 7916. [Google Scholar] [CrossRef]
  29. Su, G. Seq2seq Pay Attention to Self-Attention. 3 October 2018. Available online: https://bgg.medium.com/seq2seq-pay-attention-to-self-attention-part-1-%E4%B8%AD%E6%96%87%E7%89%88-2714bbd92727 (accessed on 18 June 2021).
  30. Luong, M.T.; Pham, H.; Manning, C.D. Effective Approaches to Attention-based Neural Machine Translation. arXiv 2015, arXiv:1508.04025. [Google Scholar]
  31. Graves, A. Supervised Sequence Labelling with Recurrent Neural Networks; Computer Science; University of Toronto: Toronto, ON, Canada, 2012. [Google Scholar]
  32. Alfattni, G.; Peek, N.; Nenadic, G. Attention-based Bidirectional Long Short-Term Memory Networks for Extracting Temporal Relationships from Clinical Discharge Summaries. J. Biomed. Inform. 2021, 123, 103915. [Google Scholar] [CrossRef] [PubMed]
  33. Wikipedia. Depth-First-Search. 2019. Available online: https://zh.wikipedia.org/wiki/Depth-First-Search (accessed on 18 June 2021).
  34. Wikipedia. Dialogue Tree. 2019. Available online: https://en.wikipedia.org/wiki/Dialogue_tree (accessed on 18 June 2021).
  35. Borges, J.L. Garden of Forking Paths; Penguin Books: London, UK, 2018; ISBN-13: 9780241339053. [Google Scholar]
  36. Taiwan Water Company. 2016. Available online: https://www.water.gov.tw/mp.aspx?mp=1 (accessed on 18 June 2021).
  37. Introduction to Rasa Open Source & Rasa Pro, RasaNLU. Available online: https://rasa.com/docs/rasa/nlu-training-data/ (accessed on 18 June 2021).
  38. GitHub. Chinese_Chatbot_Corpus. 2018. Available online: https://github.com/codemayq/chinese_chatbot_corpus (accessed on 18 June 2021).
  39. GitHub. Seq2Seq. 2017. Available online: https://github.com/google/seq2seq (accessed on 18 June 2021).
  40. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention Is All You Need. arXiv 2017, arXiv:1706.03762. Available online: https://arxiv.org/pdf/1706.03762.pdf (accessed on 18 June 2021). [Google Scholar]
