Article

Refined Answer Selection Method with Attentive Bidirectional Long Short-Term Memory Network and Self-Attention Mechanism for Intelligent Medical Service Robot

1
School of Software, Dalian Jiaotong University, Dalian 116028, China
2
School of Computer Communication and Engineering, Dalian Jiaotong University, Dalian 116028, China
*
Author to whom correspondence should be addressed.
Appl. Sci. 2023, 13(5), 3016; https://doi.org/10.3390/app13053016
Submission received: 31 January 2023 / Revised: 21 February 2023 / Accepted: 23 February 2023 / Published: 26 February 2023
(This article belongs to the Special Issue Natural Language Processing (NLP) and Applications)

Abstract

Answer selection, as a crucial method for intelligent medical service robots, has become more and more important in natural language processing (NLP). However, there are still some critical issues in answer selection models. On the one hand, the models lack semantic understanding of long questions because of noise information in a question–answer (QA) pair. On the other hand, some researchers combine two or more neural network models to improve the quality of answer selection, but these models focus on the similarity between questions and answers without considering background information. To this end, this paper proposes a novel refined answer selection method, which uses an attentive bidirectional long short-term memory (Bi-LSTM) network and a self-attention mechanism to solve these issues. First, this paper constructs the required knowledge-based text as background information and converts the questions and answers from words to vectors, respectively. Furthermore, the self-attention mechanism is adopted to extract the global features from the vectors. Finally, an attentive Bi-LSTM network is designed to address long-distance dependency learning problems and calculate the similarity between the question and answer with consideration of the background knowledge information. To verify the effectiveness of the proposed method, this paper constructs a knowledge-based QA dataset including multiple medical QA pairs and conducts a series of experiments on it. The experimental results reveal that the proposed approach could achieve impressive performance on the answer selection task and reach an accuracy of 71.4%, MAP of 68.8%, and decrease the BLEU indicator to 3.10.

1. Introduction

With the development of artificial intelligence technology, answer selection has become a hot and urgent research topic in natural language processing (NLP) because it is the foundational condition for intelligent voice assistants and intelligent medical service robots to understand the question and realize intelligent answering service. Answer selection is designed to select the correct answer based on a given question and corresponding candidate answers and reflects the matching relationship between question and candidate answers by extracting deep features of the question–answer (QA) pair. It is the basis for developing more efficient intelligent Q&A systems, such as medical voice Q&A service robots, which are of significant interest in the medical field [1,2,3,4].
Answer selection is of great importance in intelligent question answering (Q&A) tasks and draws increasing attention from researchers. A broad range of studies have been proposed in recent years. Traditional answer selection models mostly use lexical or syntactic analysis and manual marking to select answers, which makes it difficult to capture the semantic connection between QA pairs. With the development of deep learning, several high-performing frameworks have been introduced to answer selection tasks, among which are convolutional neural networks (CNN), recurrent neural networks (RNN), long short-term memory networks (LSTM) [5], etc. These neural networks can better understand the relationship between QA pairs.
Most of the existing models of answer selection are generative models. The generative model learns a Q&A mapping to generate responses by maximizing P(a|q), where q is the input question and a is the answer. The most popular generative model is the sequence-to-sequence (Seq2Seq) model [6], which consists of an encoder and a decoder. Seq2Seq can achieve high coherence between questions and answers. Reference [7] first used a Bi-LSTM encoder to encode the question and candidate answers, and then adopted feed-forward attention to encode the question sentence. A growing number of researchers connect deep learning models with other advanced technologies to improve the ability of answer selection models, and good results have been achieved by utilizing these neural networks. Although they pay much attention to the semantic understanding of natural language and the global information of context, answer selection models still face the following challenges due to the complexity of natural language. Firstly, questions and answers are semantically connected; thus, it is not enough to just utilize the similarity as an indicator. Secondly, some noise and information not related to the question can interfere with the process of choosing the correct answer. Therefore, we need to design a refined answer selection model to address these problems.
This paper proposes a novel refined answer selection method with an attentive bidirectional long short-term memory network and a self-attention mechanism to address these challenges based on the observations above. Specifically, it adopts self-attention to obtain global information and designs an attentive Bi-LSTM network to gain a deeper understanding of the semantic information of the dataset. The experimental results reveal that the proposed method could achieve better performance on intelligent answer selection tasks. It is of great importance for a medical service robot to realize an intelligent question answering service. The main contributions of this article are summarized as follows:
(1)
A refined answer selection method is proposed for a medical service robot to realize an intelligent question-answering service. The required knowledge-based text is constructed as background information to match the question and answer.
(2)
Self-attention is adopted to extract global features of the input data before passing the data to the circulating layer, in order to address long-distance dependency learning. An attentive Bi-LSTM network is designed to measure the similarity between the QA pair more precisely, with consideration of the background knowledge information.
(3)
A knowledge-based QA dataset is built to verify the effectiveness of the proposed approach. The proposed method could achieve impressive performance on the answer selection task with an accuracy of 71.4%, MAP of 68.8%, and a BLEU indicator of 3.10.
The rest of the paper elaborates this project in the following sequence. The related work for answer selection and the materials and methods are listed in Section 2 and Section 3, respectively. Section 4 outlines the experiments and results. Finally, our conclusions and our future work are expounded in Section 5.

2. Related Work

2.1. Natural Language Processing (NLP)

Natural language processing (NLP) is the field concerned with using computers to understand and analyze language. It aims to design algorithms that help computers understand and process language as a human does. Its basic task is to segment the corpus to be processed based on ontology dictionaries, word frequency statistics, and contextual semantic analysis, in order to obtain the smallest lexical units that are semantically rich [8]. It is widely used in text classification, sentiment analysis, machine translation, question answering, and other fields. Deep learning (DL) has become a mainstream technology in the field of natural language processing due to its powerful feature extraction and learning capabilities; this combination is called DL-NLP. Reference [9] proposed a neural network language model (NNLM) for learning distributed representations of words in 2003. Reference [10] extended previous research and proposed a general CNN-based framework to solve a large number of NLP tasks in 2011. From 2015 to 2018, references [11,12,13] focused on text summarization, image description, and sentiment analysis, which promoted the use of attention mechanisms in natural language processing. From 2019 to the present, improved models based on BERT have been proposed one after another, which have advanced the field of NLP [14,15,16,17].

2.2. Bidirectional Long Short-Term Memory (Bi-LSTM) Network

A Bi-LSTM neural network can be seen as a combination of a forward LSTM and a backward LSTM. At a finer granularity of classification, Bi-LSTM can better capture the semantic dependencies in both directions. It controls the stored contents of memory units mainly through three gating devices: the input gate ($i_t$), the forget gate ($f_t$), and the output gate ($o_t$), which operate on the input $x_t$ and the hidden state $h_t$, as shown in Figure 1.
The Bi-LSTM neural network is now widely used in many scenarios in NLP and good results have been achieved. Reference [18] proposed a multi-scale deformable CNN to capture the non-consecutive n-gram features by adding an offset to the convolutional kernel for answer selection. Reference [19] used Bi-LSTM for Q&A on marriage law. Reference [20] built a model by LSTM to achieve answer selection for non-factual questions in different scenario contexts. Reference [21] used CNN-LSTM to act as the encoder to predict the most useful pair as the final output from multiple QA pairs by conditional random fields.
LSTM and the gated recurrent unit (GRU) are two variants of RNN that can alleviate the vanishing gradient problem. The performance of GRU and LSTM is similar in many tasks. Compared with GRU, LSTM has more parameters and stronger expressive ability. However, a unidirectional LSTM is unable to encode information from back to front, whereas Bi-LSTM can capture semantics in both directions. In this paper, the QA pair is trained with the Bi-LSTM neural network, which improves the accuracy of the answer selection by enhancing the memory function through three gating devices.

2.3. Attention Mechanism

In recent years, the attention mechanism [22] has been quite successful in the fields of speech recognition, image processing, and natural language processing. It is a technique worth investigating in deep learning. Attention assigns different weight parameters to each element of the input, thus focusing more on the relevant parts and suppressing other useless information [23,24]. Most of the current attention models are attached to the encoder–decoder framework to solve the problem that traditional LSTM/Bi-LSTM neural network models encode input sentences into a fixed-length vector representation regardless of how long they are. Reference [25] was the first to introduce an attention mechanism into an NLP task, trying to match the output of the target with the input of the source in the machine translation process. Reference [26] found that the attention mechanism can capture both the long-term and short-term interests of users in a recommendation system to improve accuracy. In the answer selection task, reference [27] investigated answer selection ranking based on the attention mechanism, which solved the answer selection ranking problem by extracting lexical features of question-and-answer sentences. Reference [28] proposed a gated group self-attention (GGSA) answer selection model to solve the problem that global attention and local attention cannot be distinguished accurately. Reference [29] proposed a co-attention fusion-based network for Chinese medical answer selection. In this paper, self-attention is used as a feature extraction tool, and its outputs include global information for each word of the vector, which addresses the problem of long-distance dependency learning. Attention is also added to the Bi-LSTM phase to ignore noise information and improve accuracy.

3. Materials and Methods

In this section, a novel refined answer selection method is proposed with an attentive bidirectional long short-term memory network and a self-attention mechanism for an intelligent medical service robot, and each part of the model is introduced in detail. As shown in Figure 2, our model comprises four parts: (1) Word embedding: the question and answer are converted from words to vectors, separately. (2) Self-attention-based feature extraction: the self-attention mechanism is employed to extract important information in QA pairs. (3) Attentive Bi-LSTM circulating: Bi-LSTM is adopted as the encoder and decoder to train on the datasets and is redesigned by introducing the attention mechanism to reinforce features. The representation of the QA pair is the output of this phase after max pooling. (4) Similarity calculation: the cosine similarity is computed to measure the similarity of the question and answer.

3.1. Word Embedding

In natural language processing, words are converted to vectors for machine understanding, which is called word embedding. This study utilized word2vec [30] pre-training data to obtain the fixed vector corresponding to each word. The vector of each word in a sentence is initialized with a pre-trained word vector. Given an input question [q1, q2, …, qi, …, qn] of length n and a candidate answer [a1, a2, …, ai, …, am] of length m, each sentence is encoded into a matrix with one d-dimensional row per word (n × d for the question), where d is the dimension of the pre-trained word vector. The output matrices are expressed as [Q1, Q2, …, Qi, …, Qn] and [A1, A2, …, Ai, …, Am]. In order to better understand the semantic relationship, the background information ks with length Lk is also embedded in the same way. The output is Ks, which is input to the attentive Bi-LSTM circulating module.
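As a minimal sketch of the lookup step described above, the following Python snippet maps a tokenized question to an n × d matrix. The tiny vocabulary, the dimension d = 4, and the function names `build_embedding_table` and `embed` are illustrative assumptions; the paper uses pre-trained word2vec vectors, which are replaced here by random vectors so that the example stays self-contained.

```python
import random

def build_embedding_table(vocab, d, seed=0):
    """Assign each word a fixed d-dimensional vector (a stand-in for word2vec)."""
    rng = random.Random(seed)
    return {w: [rng.uniform(-1.0, 1.0) for _ in range(d)] for w in vocab}

def embed(tokens, table, d):
    """Return an n x d matrix (list of rows); unknown words map to a zero vector."""
    return [table.get(tok, [0.0] * d) for tok in tokens]

d = 4  # hypothetical embedding dimension
vocab = ["throat", "inflammation", "relieve", "how"]
table = build_embedding_table(vocab, d)

question = ["how", "relieve", "throat", "inflammation"]
Q = embed(question, table, d)
print(len(Q), len(Q[0]))  # n rows, d columns
```

The same function could be applied to the candidate answers and to the background knowledge text, so that all three inputs share one vector space.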

3.2. Self-Attention Based Feature Extraction

BERT-like models show good performance in natural language processing. However, in this paper, self-attention has been chosen as the feature extractor, for two reasons. On the one hand, the training data and model parameters required by BERT are enormous. On the other hand, its high demand for computing resources makes it unsuitable for the model and method in this paper.
After obtaining the sentence vector embedding of the Q&A, this paper adopts the self-attention mechanism to extract global features from the vectors and remove noise information from the sentence embedding. As shown in Figure 2, the self-attention structure can directly capture the relationship between two tokens regardless of their distance. It is simpler and faster than a traditional attention mechanism. The outputs of this layer are new vectors with contextual information. They are formatted as [Q1q, Q2q, …, Qiq, …, Qnq] and [A1a, A2a, …, Aia, …, Ama].
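To illustrate how self-attention relates every token to every other token regardless of distance, the sketch below implements one common variant, scaled dot-product self-attention, in plain Python. The paper does not specify the exact formulation, so the scaling by the square root of d and the omission of learned query, key, and value projections are assumptions made for brevity.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(X):
    """X: list of n token vectors of dimension d. Returns n context vectors."""
    d = len(X[0])
    out = []
    for q in X:
        # Score the current token against every token in the sentence.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in X]
        weights = softmax(scores)
        # Each output vector is a weighted average of all token vectors.
        out.append([sum(w * v[j] for w, v in zip(weights, X)) for j in range(d)])
    return out

X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
H = self_attention(X)
print(len(H), len(H[0]))  # same shape as the input
```

Because each output row mixes information from all positions, the first and last tokens of a long question can influence each other in a single step.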

3.3. Attentive Bi-LSTM Circulating

Bi-LSTM, which can remember information from both directions, was adopted. The dropout operation [31] randomly ignores a part of a layer's neurons during training. Thus, this paper introduces dropout to reduce complex co-adaptation relationships between neurons. Randomly dropping neurons has an effect similar to model fusion, in addition to allowing the network to reach better parameters. The dropout process is shown in Figure 3.
The vectors obtained from the previous layer are processed forward and backward to obtain the hidden state vector ht. ht is the output of the last time step and represents the meaning of the inputs. The Bi-LSTM circulating process is shown in Figure 4. The calculation process can be expressed by the following equations.
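The dropout operation can be sketched in a few lines of Python. This is an illustrative version using the common "inverted dropout" scaling (kept activations are divided by the keep probability), which is an assumption and not taken from the paper; the rate of 0.35 is the value reported in Section 4.4.1, and the function name `dropout` is hypothetical.

```python
import random

def dropout(values, rate, training=True, rng=None):
    """Zero each activation with probability `rate` during training.

    Kept activations are scaled by 1 / (1 - rate) so that the expected
    value of each activation is unchanged (inverted dropout).
    """
    if not training or rate == 0.0:
        return list(values)
    if rng is None:
        rng = random.Random()
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in values]

activations = [0.5, -1.2, 0.8, 0.1]
print(dropout(activations, rate=0.35, rng=random.Random(0)))
print(dropout(activations, rate=0.35, training=False))  # unchanged at test time
```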
$$ f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f), $$
$$ i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i), $$
$$ g_t = \tanh(W_c \cdot [h_{t-1}, x_t] + b_c), $$
$$ c_t = i_t \odot g_t + f_t \odot c_{t-1}, $$
$$ o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o), $$
$$ h_t = o_t \odot \tanh(c_t). $$
As shown in the equations above, the output of the model at moment t is noted as ht. W and b are denoted as weight and bias, respectively. A sigmoid function is used in various gates to forget and retain information. A tanh function is used to process data for the state and output.
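The gate equations can be traced with a small, dependency-free Python sketch of one LSTM step and a bidirectional pass. The parameter layout (each weight matrix acting on the concatenation of the previous hidden state and the current input), the random initialization, and the helper names `init_params`, `lstm_step`, `run`, and `bilstm` are assumptions for illustration; the paper's model is implemented in TensorFlow.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def affine(W, b, v):
    """Compute W . v + b for a list-of-rows matrix W."""
    return [sum(w * x for w, x in zip(row, v)) + bias for row, bias in zip(W, b)]

def init_params(input_size, hidden, seed=0):
    """Random weights for the forget, input, candidate, and output transforms."""
    rng = random.Random(seed)
    cols = hidden + input_size
    params = {}
    for gate in ("f", "i", "c", "o"):
        params["W_" + gate] = [[rng.uniform(-0.5, 0.5) for _ in range(cols)]
                               for _ in range(hidden)]
        params["b_" + gate] = [0.0] * hidden
    return params

def lstm_step(x_t, h_prev, c_prev, p):
    v = h_prev + x_t  # concatenation [h_{t-1}, x_t]
    f = [sigmoid(z) for z in affine(p["W_f"], p["b_f"], v)]    # forget gate
    i = [sigmoid(z) for z in affine(p["W_i"], p["b_i"], v)]    # input gate
    g = [math.tanh(z) for z in affine(p["W_c"], p["b_c"], v)]  # candidate state
    o = [sigmoid(z) for z in affine(p["W_o"], p["b_o"], v)]    # output gate
    c = [ik * gk + fk * ck for ik, gk, fk, ck in zip(i, g, f, c_prev)]
    h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
    return h, c

def run(seq, p, hidden):
    """Run one direction and collect the hidden state at every time step."""
    h, c = [0.0] * hidden, [0.0] * hidden
    outs = []
    for x in seq:
        h, c = lstm_step(x, h, c, p)
        outs.append(h)
    return outs

def bilstm(seq, fwd, bwd, hidden):
    """Concatenate forward and backward hidden states per time step."""
    forward = run(seq, fwd, hidden)
    backward = run(seq[::-1], bwd, hidden)[::-1]
    return [hf + hb for hf, hb in zip(forward, backward)]

seq = [[0.1], [0.4], [-0.3]]  # three time steps, input size 1
outputs = bilstm(seq, init_params(1, 2, seed=1), init_params(1, 2, seed=2), hidden=2)
print(len(outputs), len(outputs[0]))  # time steps, 2 * hidden
```

The backward pass reads the sequence in reverse and its outputs are flipped back, so position t in the result combines the context to the left and to the right of that word.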
We redesigned the Bi-LSTM encoder–decoder in a circulating layer by constructing an attention mechanism, which fuses with the Bi-LSTM. The attention dependent on the Bi-LSTM part is shown in Figure 5. Before the hidden state vector ht passes to the next layer, it is weighted by attention. Finally, we obtain hta.
The attention mechanism includes an attention weight layer and a weight sum layer. First, the attention weight layer calculates the weights of each word in the hidden state vector ht that is obtained from the Bi-LSTM encoder–decoder. The similarity s between ht and h is calculated by the inner product and regularized by the Softmax function to obtain the weights of each word. Then, the weight sum layer calculates the weighted sum of a and ht and outputs the context vector c, as is shown in Figure 6.
The attention mechanism calculation process can be divided into three stages. Firstly, the similarity of the query and key is computed as follows:
$$ \mathrm{Similarity}(\mathrm{Query}, \mathrm{Key}_i) = \frac{\mathrm{Query} \cdot \mathrm{Key}_i}{\|\mathrm{Query}\| \, \|\mathrm{Key}_i\|}. $$
Then, normalization of the obtained weights is formulated as follows:
$$ a_i = \mathrm{Softmax}(\mathrm{Sim}_i) = \frac{e^{\mathrm{Sim}_i}}{\sum_{j=1}^{L_x} e^{\mathrm{Sim}_j}}. $$
Finally, the attention output is the weighted sum of the sequence elements, using the weights ai obtained above:
$$ \mathrm{Attention}(\mathrm{Query}, \mathrm{Source}) = \sum_{i=1}^{L_x} a_i \cdot \mathrm{Value}_i. $$
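The three stages (similarity, normalization, weighted summation) can be sketched as follows. The example vectors and the function names `cosine` and `attention` are illustrative assumptions; in the model, the keys and values would come from the Bi-LSTM hidden states.

```python
import math

def cosine(u, v):
    """Stage 1: cosine similarity between a query and one key."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def attention(query, keys, values):
    sims = [cosine(query, k) for k in keys]
    # Stage 2: softmax normalization of the similarities into weights a_i.
    m = max(sims)
    exps = [math.exp(s - m) for s in sims]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Stage 3: weighted sum of the values gives the context vector.
    d = len(values[0])
    context = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(d)]
    return weights, context

query = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
values = [[1.0, 1.0], [2.0, 0.0], [0.0, 3.0]]
weights, context = attention(query, keys, values)
print(weights)   # the key aligned with the query receives the largest weight
print(context)
```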
Max pooling: To reduce information redundancy and prevent overfitting, the max pooling method is utilized to represent the questions and answers. The max pooling formula is as follows:
$$ r_j = \max_{1 \le i \le l} P_{i,j}, $$
where $r_j$ denotes the jth element of the fixed-size vector representation and l denotes the input length.
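As a small worked example of the pooling step, the sketch below takes an l × d matrix P (one row per time step) and keeps the maximum of each column, producing a vector whose size does not depend on the input length. The matrix values are made up for illustration.

```python
def max_pool(P):
    """P: l x d matrix (list of rows). Returns r with r[j] = max over i of P[i][j]."""
    return [max(column) for column in zip(*P)]

P = [
    [0.1, -0.5, 0.3],   # time step 1
    [0.7,  0.2, -0.1],  # time step 2
]
r = max_pool(P)
print(r)  # [0.7, 0.2, 0.3]
```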

3.4. Similarity Calculation

The context vector c obtained above is utilized to calculate the results for answer selection. The LSI topic model is usually used for small-scale problems that seek a quick, coarse estimate of the relationship between topic distributions. In this study, the LSI model was used to calculate the similarity of the training data. The similarity is generally computed as the cosine similarity, which was adopted to determine the relevance of questions and answers. The formula is as follows:
$$ \cos(V_Q, V_A) = \frac{V_Q \cdot V_A}{\|V_Q\| \, \|V_A\|}, $$
where VQ is the question vector representation and VA is the answer vector representation.
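A hedged sketch of how the cosine score could be used to pick an answer: each candidate vector is scored against the question vector and the highest-scoring candidate is selected. The vectors and the function name `select_answer` are hypothetical; in the paper the vectors come from the attentive Bi-LSTM and max pooling stages.

```python
import math

def cosine_similarity(v_q, v_a):
    dot = sum(q * a for q, a in zip(v_q, v_a))
    norm_q = math.sqrt(sum(q * q for q in v_q))
    norm_a = math.sqrt(sum(a * a for a in v_a))
    return dot / (norm_q * norm_a) if norm_q and norm_a else 0.0

def select_answer(v_q, candidates):
    """Return the index of the best candidate and all similarity scores."""
    scores = [cosine_similarity(v_q, v_a) for v_a in candidates]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores

v_q = [1.0, 2.0, 0.0]
candidates = [
    [0.0, 0.0, 1.0],    # orthogonal to the question
    [2.0, 4.0, 0.0],    # same direction as the question
    [-1.0, -2.0, 0.0],  # opposite direction
]
best, scores = select_answer(v_q, candidates)
print(best, scores)
```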

4. Experiments and Results

In this section, we describe the experiments conducted to validate the feasibility of the proposed model on answer selection. Section 4.1 describes the experimental setup. Then, we introduce the evaluation metrics and baselines in Section 4.2 and Section 4.3. The results and detailed analysis are shown in Section 4.4. Ablation analysis and an analysis of background knowledge information are given in Section 4.5 and Section 4.6. Finally, error analysis is presented in Section 4.7.

4.1. Experimental Setup

4.1.1. Dataset Building

To verify the effectiveness of the proposed method, we built a new knowledge-based QA dataset and evaluated the proposed approach on it. It is described briefly as follows:
Knowledge-based QA: The dataset is a self-built Chinese knowledge-based QA corpus with 24k QA pairs extracted from multiple resources, including various websites. The question words employed are mainly “what”, “why”, and “how” for non-factoid questions. We also added multiple QA pairs about healthcare from cMedQA, which is collected from an online Chinese medical question answering service. Since the knowledge-based QA dataset contains questions and answers on numerous aspects of knowledge, it can be used to verify the feasibility of our model in answer selection. The dataset is characterized as shown in Table 1, where the correct answer is marked in bold. Of the data, 80% are training data and 20% are test data. The medical QA pair is shown in Table 2.
The knowledge-based text is named knowledge.txt and is shown in Table 3. Before testing, the knowledge-based text was used by our model to learn the background of the QA pairs.

4.1.2. Data Pre-Processing

In this part, pre-processing was carried out on the created knowledge-based QA dataset according to a pre-constructed Chinese stop word text that includes “!”, “#”, “3”, and other common Chinese words. The paper uses the jieba.cut() function for word splitting. The stop-words.txt file is shown in Table 4.
Remove stop words: Jieba is a Chinese open-source word splitting tool with high performance, accuracy, scalability, and other features. In addition to word splitting, Jieba also can perform word annotation, keyword extraction, and word frequency statistics. Custom dictionaries: words that appear only once in the dataset are removed and stored in the text. The knowledge text and the sliced training text are used as the lexicon and corpus for subsequent operations.
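The two filtering steps described above can be sketched without external dependencies. In the paper the token lists would be produced by jieba.cut(); here the sentences are already tokenized (with English placeholders), and the stop word set, the sample sentences, and the function names are illustrative assumptions.

```python
from collections import Counter

def remove_stop_words(tokens, stop_words):
    """Drop every token that appears in the stop word list."""
    return [t for t in tokens if t not in stop_words]

def drop_rare_words(sentences, min_count=2):
    """Drop words whose count in the whole dataset is below min_count.

    With min_count=2 this removes words that appear only once.
    """
    counts = Counter(t for sentence in sentences for t in sentence)
    return [[t for t in sentence if counts[t] >= min_count] for sentence in sentences]

stop_words = {"!", "#", "3"}  # a few entries mentioned for stop-words.txt
raw = [
    ["throat", "inflammation", "!", "relieve"],
    ["throat", "#", "pain", "relieve"],
]
filtered = [remove_stop_words(s, stop_words) for s in raw]
cleaned = drop_rare_words(filtered)
print(filtered)
print(cleaned)
```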

4.1.3. Implementation Details

The experiments were performed on the Windows 10 operating system with 4 GiB of video memory, an NVIDIA GeForce MX450 graphics card, an Intel(R) Core(TM) i7-1165G7 CPU @ 2.80 GHz, and the TensorFlow 1.18 framework. The performance of the proposed model on the answer selection task was analyzed by conducting multiple sets of comparative experiments. The parameter settings are described in Section 4.4, along with the results of different parameter settings.

4.2. Evaluation Metric

Evaluation is a necessary task in natural language processing. The following evaluation metrics are used in this paper.
Accuracy is a metric widely used in the fields of information retrieval and statistical classification for evaluating the quality of results. In natural language processing, accuracy is defined as the ratio of the number of samples correctly classified by the classifier to the total number of samples in a given test dataset, that is, the accuracy on the test dataset under the 0–1 loss function. The calculation formula is as follows:
$$ \mathrm{Accuracy} = \frac{\text{Number of correct results}}{\text{Total number of test samples}} \times 100\% $$
MAP refers to the mean average precision, that is, the mean over all questions of the average precision of the ranked candidate answers; it is also the main evaluation metric for object detection algorithms. A higher MAP value indicates that the model performs better on a given dataset.
$$ \mathrm{MAP} = \frac{1}{|Q|} \sum_{j=1}^{|Q|} \frac{1}{m_j} \sum_{k=1}^{m_j} \mathrm{Precision}(R_{jk}) $$
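The accuracy and MAP formulas can be checked on a toy example. In this sketch each question is represented by the relevance labels of its candidate answers in ranked order (1 for a correct answer, 0 otherwise); the label lists and function names are illustrative assumptions, not data from the paper.

```python
def accuracy(predicted, gold):
    """Percentage of questions whose selected answer matches the gold answer."""
    correct = sum(1 for p, g in zip(predicted, gold) if p == g)
    return 100.0 * correct / len(gold)

def average_precision(ranked_labels):
    """Mean of the precision values at the rank of each correct answer."""
    hits, total = 0, 0.0
    for rank, relevant in enumerate(ranked_labels, start=1):
        if relevant:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0

def mean_average_precision(all_ranked_labels):
    """Average of the per-question average precision over all questions."""
    aps = [average_precision(labels) for labels in all_ranked_labels]
    return sum(aps) / len(aps)

print(accuracy([0, 1, 2, 1], [0, 1, 1, 1]))  # 75.0
rankings = [
    [1, 0, 0],  # correct answer ranked first: AP = 1
    [0, 1, 0],  # correct answer ranked second: AP = 1/2
    [0, 1, 1],  # two correct answers: AP = (1/2 + 2/3) / 2
]
print(mean_average_precision(rankings))
```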
In a Q&A task, the evaluation metric BLEU (bilingual evaluation understudy), which is usually used for machine translation and text summarization [32], is also adopted. In this paper, this metric is used to measure the degree of matching between the knowledge-based text, the question text, and the candidate answers.
$$ \mathrm{BLEU} = BP \times \exp\left( \sum_{n=1}^{N} W_n \log P_n \right), $$
where BP is the brevity penalty factor, Wn is the weight of the n-gram precision, and Pn is the n-gram precision.
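A minimal sentence-level sketch of the BLEU formula is given below, assuming a single reference, uniform weights Wn = 1/N, clipped n-gram counts, and the standard brevity penalty. These choices follow the usual definition of BLEU and are assumptions about the setup, since the paper does not state how its BLEU values are computed or scaled.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def modified_precision(candidate, reference, n):
    """Clipped n-gram precision P_n of the candidate against one reference."""
    cand = ngrams(candidate, n)
    ref = ngrams(reference, n)
    total = sum(cand.values())
    if total == 0:
        return 0.0
    clipped = sum(min(count, ref[gram]) for gram, count in cand.items())
    return clipped / total

def bleu(candidate, reference, max_n=4):
    precisions = [modified_precision(candidate, reference, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0.0:
        return 0.0  # log(0) is undefined; the score is conventionally zero
    c, r = len(candidate), len(reference)
    bp = 1.0 if c > r else math.exp(1.0 - r / c)  # brevity penalty BP
    w = 1.0 / max_n                               # uniform weights W_n
    return bp * math.exp(sum(w * math.log(p) for p in precisions))

reference = ["the", "cat", "sat", "down"]
print(bleu(reference, reference))                        # identical sentences
print(bleu(["the", "cat", "sat"], reference, max_n=2))   # shorter candidate is penalized
```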

4.3. Baseline Methods

Multiple answer selection models are employed as baselines on the knowledge-based QA dataset in order to evaluate the effectiveness of our introduced model.
  • Bi-LSTM: A Bi-LSTM neural network is employed for feature extraction from QA pairs. The model is one of the baselines on the knowledge-based dataset.
  • Double Bi-LSTM: This model uses a double Bi-LSTM to learn the features of given questions and candidate answers. The model is one of the baselines on the knowledge-based dataset.
  • Attentive Bi-LSTM: The model integrates an attention mechanism into the Bi-LSTM model for the purpose of improving the semantic understanding of questions. The cosine similarity is employed for calculation. The model is one of the baselines on the knowledge-based dataset.
  • Multi-scale CNN [33]: The multi-scale CNN model constructs different sizes of feature maps to extract the semantic information from the text. This model is one of the baselines on the medical QA data.
  • QA-CLWR [34]: The method proposes a collaborative learning-based answer selection model (QA-CL), where a parallel training architecture is deployed to collaboratively learn the initial word vector matrix of the sentence by CNN and bidirectional LSTM (Bi-LSTM) at the same time.
  • Attentive LSTM [35]: The method combines attention and LSTM to extract the contextual representation for the question and answer separately. It is a baseline for accuracy.
  • Multi-Level Composite CNNs (MCCNN) [36]: This method stacks CNNs with different sizes to capture multi-level semantic information. The output of each CNN layer is generated by max pooling, and, finally, the outputs are concatenated for prediction. It is a baseline for accuracy.
  • BiCNN [37]: This method proposed average pooling with a bigram CNN model. It is a baseline for MAP.
  • ABCNN [38]: The authors propose an attention-based CNN. It is a baseline for MAP.
  • Reference [39]: The authors utilized LSTM-based deep learning models for non-factoid answer selection. It is a baseline for MAP.

4.4. Results and Analysis

4.4.1. Experimental Results with Different Parameter Settings

In the proposed model, two layers of self-attention were used to obtain the global sentence vectors. One layer of convolution and one layer of attention were combined to circulate. If the length of the question and answer are beyond 100, the output is also a vector of fixed length. When the training of the model is finished, the accuracy and BLEU values of the model are output. Table 5 shows the results of accuracy of our model with different parameters.
Rows A to D are the results of our model with different parameters. The first parameter refers to the learning rate and the next is the hidden size. The learning rate for the parameter update is determined by trying different learning rates. Different learning rates were tried in steps of 10 until the cost function converged to the lowest position. The hidden size was gradually increased to reach the operating range of the computer. The more neurons, the better the answer selection. The best results are obtained when the value of similar knowledge k is set to 5 and the dropout is set to 0.35. It can be seen from Table 5 that compared to rows A, B, and C, row D achieved the best result when the learning rate was set to 0.3 and the hidden size was 130.
Comparative experiments were performed on the datasets so as to compare the effects of different self-attention layers. The performance of different self-attention layers on our model is shown in Figure 7. As is revealed by Figure 7, when the number of layers was two, we observed the greatest impact on our answer selection model. As the number of layers increased, there was not a significant impact on our model. Therefore, two layers of self-attention is the best choice in the model.

4.4.2. Experimental Results of a Knowledge-Based QA Dataset

In order to further validate the effectiveness of the proposed model, comparison models were selected and compared with our model on the knowledge-based QA dataset. The experimental results are shown in Table 6. For rows A to D, the learning rate was 0.3 and the hidden size was 130. The results indicate that the proposed method improves accuracy by 6% and that BLEU declines to 3.10.
The Bi-LSTM model received the state of the previous layer and maintained the sequential features to make semantic features more accurate. The double Bi-LSTM model adopted one Bi-LSTM to extract the semantic features and then a second Bi-LSTM to extract contextual information. Its results are slightly better than those of the single Bi-LSTM. The results of the attentive Bi-LSTM are significantly better than those of rows A and B. This shows that calculating the weight of each word helps to improve question–answer matching. Row D uses multiple CNNs to extract the semantic information from the text in medical QA pairs.
On the basis of row C, the proposed method adopts self-attention to extract global features of a sentence before continuing to next phase. It can better capture the semantic association between answers and questions. The experiments illustrate that self-attention outperforms basic neural networks in answer selection tasks.

4.4.3. Experimental Results of Different Datasets

For the accuracy metric, we compare our model with three baselines using different datasets in different areas, such as insurance, laws, and so on. The experimental results are shown in Figure 8. Compared with these baselines, which use a neural network or multi-networks for answer selection, our model could achieve better performance with an accuracy of 71.4%.
For the MAP metric, we compared our method with three more baselines. As shown in Figure 9, the improved model is more accurate in extracting location information. The Bi-LSTM gating mechanism incorporates the features extracted by self-attention and the feature representation extracted by convolutional attention, which contains both global information and local features and improves the feature extraction capability. Therefore, the semantic feature extraction ability of the model in this paper is stronger in building a better answer selection model. The feasibility of the model is verified.

4.5. Ablation Analysis

In order to analyze the effectiveness of each component of the proposed method on answer selection, we removed one component of our model and evaluated the rest of the model on knowledge-based QA datasets.
Table 7 shows the performance of our model and its ablation, where “Ours without Bi-LSTM” means the answer selection model uses the attention mechanism to process the data. “Ours without attention” means the answer selection model uses self-attention to extract the global features, then uses the Bi-LSTM neural network to circulate the data. “Ours without self-attention” means the answer selection model uses an attentive Bi-LSTM to process the data.
Bi-LSTM can better capture the semantic dependencies in both directions. When an attention mechanism is added, attention can ignore the noise information to improve the performance of Bi-LSTM. Self-attention is used as a feature extraction tool before data circulating, and the outputs include global information for each word of the vector. The proposed model uses self-attention to weigh each word before circulating. The faster the convergence speed is, the more accurate the extraction of location information. We can see from Table 7 that each component of our model contributes to its performance.

4.6. Analysis of Background Knowledge Information

We randomly chose a case from the knowledge-based QA dataset to validate the effectiveness of background knowledge information. The question, candidate answers, and knowledge are shown in Figure 10. The shade of color marks the importance of the word: the darker the color, the more important the word.
The question is “I have inflammation in my throat, how do I relieve it?”. The difference between the selected candidate answers is that one has background knowledge and the other does not. It can be observed that our model with knowledge not only pays much attention to these words that are related to the question, but also takes knowledge information into account. The result indicates that background knowledge can help the model learn semantic information better.

4.7. Error Analysis

To guide further improvement of the proposed method, the failure cases were analyzed and classified into the following categories, which can serve as references for improving the model in the long run.
Incomplete knowledge (≈35%): The given knowledge contains no entry that corresponds to the question, resulting in a mismatch between the question and the knowledge. This is common in knowledge-based datasets and reduces the accuracy of the model.
Unbalanced information (≈25%): It is difficult to match question–answer pairs when their information is imbalanced. Some answers in the dataset are short and simple, such as “Of course” or “In the west” in knowledge-based QA, while some questions and answers are too long to understand. In these cases, the model cannot determine the right answer.
Wrong spelling or labeling (≈20%): Some questions have two correct answers, but only one is labeled correct. In other cases, misspellings prevent the model from understanding the text. Both problems are common in datasets, which indicates that accurate construction of the knowledge and the datasets is essential.
Other issues (≈20%): The remaining errors stem from issues that have not yet been analyzed, such as the equipment setup, the training process, and noise.

5. Conclusions

This paper proposes a novel refined answer selection method with an attentive bidirectional long short-term memory network and a self-attention mechanism for an intelligent medical service robot. Firstly, the required knowledge-based text was constructed as background information, and the questions and answers were converted from words to vectors separately. Secondly, a self-attention mechanism was adopted to extract the global features from the vectors. Thirdly, an attentive Bi-LSTM network was designed to better capture the semantic information of the dataset and to calculate the similarity between the question and answer while considering background knowledge information. To evaluate the proposed approach, this study constructed a knowledge-based QA dataset including multiple medical QA pairs and conducted a series of experiments on it. In the experiments, the proposed method improved accuracy by 6%, decreased the BLEU indicator to 3.10, and reached a MAP of 68.8%. These results support its feasibility for answer selection in intelligent Q&A systems, which is crucial for a medical service robot to provide an intelligent question answering service.
In future work, we will use transfer learning to test whether the proposed method can be applied to other lower-level applications in the natural language processing domain, such as machine translation and sentiment analysis. We will try to construct a knowledge graph (KG) as our background information to mine hidden relations beyond the question and answer. We will also explore the impact of existing pre-trained models, such as GPT and BERT, on answer selection. Finally, we will expand the medical service data with Python crawlers and build a dedicated medical dataset to study answer selection for a medical service robot in greater depth.
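For readers who want to reproduce the ranking metrics quoted in the conclusions, the following sketch computes top-1 accuracy and mean average precision (MAP) from ranked candidate lists. It assumes that accuracy means the top-ranked candidate is correct, which is a common convention but is not spelled out here; the function names and the toy rankings are hypothetical.

```python
def average_precision(labels):
    """Average precision for one question.

    `labels` lists the candidate answers in ranked order (best first);
    1 marks a correct answer and 0 an incorrect one.
    """
    hits, precisions = 0, []
    for rank, label in enumerate(labels, start=1):
        if label:
            hits += 1
            precisions.append(hits / rank)  # precision at this rank
    return sum(precisions) / len(precisions) if precisions else 0.0


def mean_average_precision(ranked_labels):
    """Mean of the per-question average precision values."""
    return sum(average_precision(l) for l in ranked_labels) / len(ranked_labels)


def top1_accuracy(ranked_labels):
    """Fraction of questions whose top-ranked candidate is correct (assumed definition)."""
    return sum(l[0] for l in ranked_labels) / len(ranked_labels)


# Hypothetical rankings for three questions.
ranked = [
    [1, 0, 0, 0],  # correct answer ranked first
    [0, 1, 0, 0],  # correct answer ranked second
    [0, 0, 1, 1],  # two correct answers ranked third and fourth
]
map_score = mean_average_precision(ranked)
accuracy = top1_accuracy(ranked)
```

MAP rewards a model for placing correct answers near the top even when the first candidate is wrong, so it can differ noticeably from top-1 accuracy on the same rankings.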

Author Contributions

Conceptualization, D.W. and Y.L.; methodology, D.W.; software, Y.L.; validation, H.M. and F.X.; formal analysis, F.X.; investigation, Y.L.; resources, H.M.; data curation, D.W.; writing—original draft preparation, D.W. and Y.L.; writing—review and editing, H.M. and F.X.; visualization, H.M.; supervision, F.X.; project administration, D.W.; funding acquisition, F.X. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported in part by the National Natural Science Foundation of China (No. 61902051), by the International Science and Technology Cooperation Program of Liaoning Province (No. 2022JH2/10700012), by the Applied Basic Research Program of Liaoning Province (No. 2022JH2/101300269), by the Dalian Science and Technology Innovation Fund (No. 2021JJ12GX013, No. 2021JJ12GX014), and by the Research Foundation of Liaoning Province (No. LJKQZ20222447).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Ahmed, S.T.; Kumar, V.; Kim, J. AITel: eHealth Augmented Intelligence based Telemedicine Resource Recommendation Framework for IoT devices in Smart cities. IEEE Internet Things J. 2023, 1. [Google Scholar] [CrossRef]
  2. Ahmed, S.T.; Thouheed, S.; Kumar, V. 6G enabled federated learning for secure IoMT resource recommendation and propagation analysis. Comput. Electr. Eng. 2022, 102, 108210. [Google Scholar] [CrossRef]
  3. Zhang, L.; Li, F.; Wang, P. A blockchain-assisted massive IoT data collection intelligent framework. IEEE Internet Things J. 2021, 9, 14708–14722. [Google Scholar]
  4. Li, F.; Liu, K.; Zhang, L. EHRChain: A Blockchain-based EHR System Using Attribute-Based and Homomorphic Cryptosystem. IEEE Trans. Serv. Comput. 2021, 15, 2755–2765. [Google Scholar]
  5. Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comput. 1997, 9, 1735–1780. [Google Scholar] [CrossRef] [PubMed]
  6. Sutskever, I.; Vinyals, O.; Le, Q.V. Sequence to Sequence Learning with Neural Networks. In Proceedings of the 28th Conference on Neural Information Processing Systems (NIPS), Montreal, QC, Canada, 8–13 December 2014; Volume 27, pp. 3104–3112. [Google Scholar]
  7. Hou, J.Z.; Zhang, S.Y.; Yu, K. Algorithm of answer sentence selection based on Q and A interaction. Comput. Mod. 2021, 1, 120–126. [Google Scholar]
  8. Maulud, D.H.; Ameen, S.Y.; Omar, N. Review on natural language processing based on different techniques. Asian J. Res. Comput. Sci. 2021, 10, 1–17. [Google Scholar] [CrossRef]
  9. Bengio, Y.; Ducharme, R.; Vincent, P. A neural probabilistic language model. J. Mach. Learn. Res. 2003, 3, 1137–1155. [Google Scholar]
  10. Collobert, R.; Weston, J.; Bottou, L. Natural language processing from scratch. J. Mach. Learn. Res. 2011, 12, 2493–2537. [Google Scholar]
  11. Rush, A.M.; Chopra, S. A neural attention model for abstractive sentence summarization. arXiv 2015, arXiv:1509.00685. Available online: https://arxiv.org/pdf/150900685.pdf (accessed on 14 March 2022).
  12. Xu, K.; Ba, J.; Kiros, R. Show, attend and tell: Neural image caption generation with visual attention. In Proceedings of the 32nd International Conference on Machine Learning, Lille, France, 6–11 July 2015; pp. 2048–2057. [Google Scholar]
  13. Wang, Y.; Huang, M.; Zhao, L. Attention-based LSTM for aspect-level sentiment classification. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, Austin, TX, USA, 1–5 November 2016; pp. 606–615. [Google Scholar]
  14. Liu, Z.; Lin, W.; Shi, Y. A Robustly Optimized BERT Pre-training Approach with Post-training. In Proceedings of the China National Conference on Chinese Computational Linguistics, Hohhot, China, 12–15 August 2021; pp. 471–484. [Google Scholar]
  15. Özçift, A.; Akarsu, K.; Yumuk, F. Advancing natural language processing (NLP) applications of morphologically rich languages with bidirectional encoder representations from transformers (BERT): An empirical case study for Turkish. Autom. Časopis Autom. Mjer. Elektron. Računarstvo Komun. 2021, 62, 226–238. [Google Scholar] [CrossRef]
  16. Neutel, S.; de Boer, M.H.T. Towards Automatic Ontology Alignment using BERT. In Proceedings of the AAAI Spring Symposium: Combining Machine Learning with Knowledge Engineering, Palo Alto, CA, USA, 22–24 March 2021. [Google Scholar]
  17. Grail, Q.; Perez, J.; Gaussier, E. Globalizing BERT-based transformer architectures for long document summarization. In Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics, Kiev, Ukraine, 21–23 April 2021; pp. 1792–1810. [Google Scholar]
  18. Liu, D.L.; Niu, Z.D. Multi-Scale Deformable CNN for Answer Selection. IEEE Access 2019, 7, 164986–164995. [Google Scholar] [CrossRef]
  19. Liu, Y.H.; Yang, B. Bi-LSTM-based Natural Language Q&A for Marriage Law. Comput. Eng. Des. 2019, 1000–7024. [Google Scholar]
  20. Hanifah, A.F.; Kusumaningrum, R. Non-Factoid Answer Selection in Indonesian Science Question Answering System using Long Short-Term Memory (LSTM). Procedia Comput. Sci. 2021, 179, 736–746. [Google Scholar] [CrossRef]
  21. Wakchaure, M.; Kulkarni, P. A Scheme of Answer Selection in Community Question Answering Using Machine Learning Techniques. In Proceedings of the International Conference on Intelligent Computing and Control Systems (ICCS), Madurai, India, 15–17 May 2019; pp. 879–883. [Google Scholar]
  22. Vaswani, A.; Shaxeer, N.; Parmar, N. Attention Is All You Need. In Proceedings of the 31st Annual Conference on Neural Information Processing Systems (NIPS), Long Beach, CA, USA, 4–9 December 2017; Volume 30. [Google Scholar]
  23. Liu, Y.; Bao, Z.; Zhang, Z. Information cascades prediction with attention neural network. Hum.-Cent. Comput. Inf. Sci. 2020, 10, 13. [Google Scholar] [CrossRef]
  24. Niu, Z.; Zhong, G.; Yu, H. A review on the attention mechanism of deep learning. Neurocomputing 2021, 452, 48–62. [Google Scholar] [CrossRef]
  25. Bahdanau, D.; Cho, K.H.; Bengio, Y. Neural Machine Transition by Jointly Learning to Align and Translate. In Proceedings of the International Conference on Learning Representation, San Diego, CA, USA, 7–9 May 2015; Volume 1049, p. 0473. Available online: https://arxiv.org/pdf/1409.0473.pdf (accessed on 20 February 2023).
  26. Yu, S.; Wang, Y.B. NAIRS: A Neural Attentive Interpretable Recommendation System. In Proceedings of the 12th ACM International Conference on Web Search and Data Mining (WSDM), Melbourne, Australia, 11–15 February 2019; pp. 786–789. [Google Scholar] [CrossRef]
  27. Cu, J.Y. A Study of Answer Selection Ranking Based on Attention Mechanism; Northern Polytechnic University: Grande Prairie, AB, Canada, 2020. [Google Scholar]
  28. Xu, D.; Ji, J.H.; Huang, H.K. Gated Group Self-attention for Answer Selection. IEICE Trans. Fundam. Electron. Commun. Comput. Sci. 2019, 1905, 10720. [Google Scholar]
  29. Chen, X.; Yang, Z.; Liang, N. Co-attention fusion based deep neural network for Chinese medical answer selection. Appl. Intell. 2021, 51, 6633–6646. [Google Scholar] [CrossRef]
  30. Mikolov, T.; Sutskever, I.; Chen, K. Distributed representations g words and phrases and their compositionality. In Proceedings of the 26th International Conference on Neural Information Processing System, Lake Tahoe, NV, USA, 5–10 December 2013; Volume 2, pp. 3111–3119. [Google Scholar]
  31. Srivastava, N.; Geoffrey, E. Dropout: A Simple Way to Prevent Neural Networks from Overfitting. J. Mach. Learn. Res. 2014, 15, 1929–1958. [Google Scholar]
  32. Papineni, K.; Roukos, S. BLEU: A Method for Automatic Evaluation of Machine Translation. In Proceedings of the 40th Annual Meeting of the Association-for-Computational-Linguistics, Philadelphia, PA, USA, 7–12 July 2002; pp. 311–318. [Google Scholar]
  33. Zhang, S.; Zhang, X.; Wang, H.; Cheng, J.; Li, P.; Ding, Z. Chinese medical question answer matching using end-to-end character-level multiscale CNNs. Appl. Sci. 2017, 7, 767. [Google Scholar] [CrossRef]
  34. Shao, T.H.; Kui, X.Y.; Zhang, P.E.; Chen, H.H. Collaborative Learning for Answer Selection in Question Answering. IEEE Access 2019, 7, 7337–7347. [Google Scholar] [CrossRef]
  35. Tan, M.; dos Santos, C.N.; Xiang, B. Improved representation learning for question answer matching. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, Berlin, Germany, 7–12 August 2016; pp. 7–12. [Google Scholar]
  36. Ye, D.; Zhang, S.; Wang, H. Multi-level composite neural networks for medical question answer matching. In Proceedings of the Third IEEE International Conference on Data Science in Cyberspace, Guangzhou, China, 18–21 June 2018; pp. 18–21. [Google Scholar]
  37. Yang, Y.; Yih, W.T.; Meek, C. WikiQA: A challenge dataset for open domain question answering. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), Lisbon, Portugal, 17–21 September 2015; pp. 2013–2018. [Google Scholar]
  38. Yin, W.; Schütze, H.; Xiang, B.; Zhou, B. ABCNN: Attention-based convolutional neural network for modeling sentence pairs. Trans. Assoc. Comput. Linguist. 2016, 4, 259–272. [Google Scholar] [CrossRef]
  39. Tan, M.; Xiang, B.; Zhou, B. LSTM-based deep learning models for non-factoid answer selection. In Proceedings of the International Conference on Learning Representations, San Juan, Puerto Rico, 2–4 May 2016; Available online: https://arxiv.org/pdf/1511.04108.pdf (accessed on 20 February 2023).
Figure 1. One-way Bi-LSTM structure diagram.
Figure 2. The architecture of proposed answer selection method.
Figure 3. The dropout operation in the neural network. The dashed lines indicate randomly deleted neurons to prevent overfitting.
Figure 4. The process of Bi-LSTM circulation.
Figure 5. Attention operation based on Bi-LSTM.
Figure 6. Attention mechanism.
Figure 7. Accuracy changes with the number of self-attention layers.
Figure 8. Accuracy performance of datasets.
Figure 9. MAP performance of datasets.
Figure 10. The attentive visualization of background knowledge effectiveness in answer selection. The shade of color marks the importance of the word. The darker the color, the more important.
Table 1. Example of knowledge-based QA.
| Question | Answers |
| --- | --- |
| What is the main reason why fog tends to form on clear nights from late fall to early spring of the following year? | Right: Atmospheric inverse radiation is weak on clear nights, and cooling near the ground is rapid. |
| | False: The amount of water vapor in the atmosphere is high on clear days. |
| | False: Evaporation of water vapor from the ground is strong on clear nights. |
| | False: There is less condensed nuclei material in the clear atmosphere. |
Table 2. Example of a medical QA pair in knowledge-based QA.
| Question | Answers |
| --- | --- |
| After touching the small animals at the zoo yesterday, I had itchy skin and an allergic reaction. What should I do? | Right: Hello. Your situation is a manifestation of a skin allergy. The most common treatment is anti-allergy treatment, such as taking loratadine, vitamin C, and ketotifen. You also need to drink more water and do not eat spicy food. |
| | False: Measles and chickenpox are two different diseases. For example, if measles is prevalent in the area, vaccination is recommended. |
Table 3. Example of knowledge.
Knowledge
The original Earth’s atmosphere was composed mainly of carbon dioxide, carbon monoxide, methane, and ammonia, and lacked oxygen, which was not suitable for biological survival.
After a long evolutionary process, the Earth’s atmosphere was transformed into an atmosphere suitable for biological respiration, with nitrogen and oxygen as the main components.
Oxygen at high altitudes in the Earth’s atmosphere is synthesized into ozone under the sun’s ultraviolet light, forming the ozone layer.
Table 4. Examples of stop words. (一些: some 下: below).
Stop Words
!     #     +     &     a]     B     W     R     1     3     5     9     一些     下
Table 5. Experimental results with different parameters.
| Row | Learning Rate | Hidden Size | Accuracy | BLEU |
| --- | --- | --- | --- | --- |
| A | 0.7 | 90 | 63.6% | 5.70 |
| B | 0.6 | 110 | 69.4% | 4.80 |
| C | 0.3 | 120 | 70.6% | 3.00 |
| D | 0.3 | 130 | 71.4% | 3.10 |
Table 6. Experimental results on the knowledge-based QA dataset.
| Row | Method | Accuracy | BLEU |
| --- | --- | --- | --- |
| A | Bi-LSTM | 59.6% | 5.35 |
| B | Double Bi-LSTM | 57.4% | 5.60 |
| C | Attentive Bi-LSTM | 62.7% | 3.22 |
| D | Multi-scale CNN | 64.7% | -- |
| E | Ours | 71.4% | 3.10 |
Table 7. Ablation experimental results.
| Row | Method | Accuracy | BLEU |
| --- | --- | --- | --- |
| A | Ours without Bi-LSTM | 59.6% | 5.35 |
| B | Ours without attention | 61.6% | 4.20 |
| C | Ours without self-attention | 62.7% | 3.22 |
| D | Ours | 71.4% | 3.10 |

Share and Cite

MDPI and ACS Style

Wang, D.; Liang, Y.; Ma, H.; Xu, F. Refined Answer Selection Method with Attentive Bidirectional Long Short-Term Memory Network and Self-Attention Mechanism for Intelligent Medical Service Robot. Appl. Sci. 2023, 13, 3016. https://doi.org/10.3390/app13053016