
Educational QA System-Oriented Answer Selection Model Based on Focus Fusion of Multi-Perspective Word Matching

Guangxi Key Lab of Trusted Software, Guilin University of Electronic Technology, Guilin 541004, China
* Author to whom correspondence should be addressed.
Computers 2025, 14(9), 399; https://doi.org/10.3390/computers14090399
Submission received: 25 June 2025 / Revised: 26 August 2025 / Accepted: 8 September 2025 / Published: 19 September 2025

Abstract

Question-answering systems have become an important tool for learning and knowledge acquisition. However, current answer selection models often represent features at the level of whole sentences, which neglects individual words and loses important information. To address this challenge, this paper proposes a novel answer selection model based on focus fusion of multi-perspective word matching. First, according to the different combination relationships between sentences, a word-level focus distribution is obtained from the serial, parallel, and transformational matching perspectives. Then, the sentence's key position information is inferred from its focus distribution. Finally, a method of aligning key information points is designed to fuse the focus distributions of the perspectives, yielding a match score for each candidate answer to the question. Experimental results show that the proposed model significantly outperforms a fine-tuned Transformer encoder based on contextual embeddings, achieving increases of 4.07% and 5.51% in MAP and of 1.63% and 4.86% in MRR on the TREC-QA and Wiki-QA datasets, respectively.

1. Introduction

Community question answering (CQA) platforms are popular for learning due to their flexibility and convenience. However, low-quality questions and irrelevant answers often plague these platforms, leading to a frustrating experience for users, who spend a significant amount of time retrieving useful information. Answer selection techniques have been developed for CQA platforms to filter out low-quality or poorly matching answers. For example, Stack Overflow employs a recently optimized answer selection algorithm that rapidly recommends high-quality answers to users [1]. However, QA requires intent recognition and logical reasoning capabilities, which makes it more difficult than other natural language processing tasks such as text classification, machine translation, and sequence labeling [2].
The current educational model is shifting toward openness and diversity, which drives the continuous iterative updating of CQA. These open and diverse teaching characteristics require answer selection technology to pay more attention to interactive effects and to the tracing of complex causal relationships, which further tests a model's intent recognition and reasoning abilities [3]. One of the most effective ways to enhance these abilities is interaction, which builds communication between sentences, ensures the consistency of information, and improves the model's ability to represent text [4]. However, the advantage of interaction is limited to expanding the scope of information from a single sentence to multiple sentences; it cannot endow the model with general semantic understanding. To address this issue, pre-trained models such as ELMo [5], BERT [6], and GPT [7], along with the general-purpose large language models derived from them, have emerged; they store a considerable amount of general language information in their parameters before any downstream task is given. A question-answering system built on a pre-trained model therefore has stronger sentence comprehension ability. However, most QA statements, especially in open-domain QA [8], contain multiple, correlated features that interaction techniques and pre-training methods alone cannot address. In recent years, many scholars have proposed multi-process and multi-level analysis models for answer selection, inspired by human-like reasoning that proceeds across multiple points, aspects, and stages. These models incorporate additional processes for sentence information extraction and feature recognition, naturally enhancing the models' reasoning abilities. However, existing answer selection models typically rely on sentence-level features, which can obscure the information carried by key words within the sentence. For example, the question "When and where did Lucy and Bob meet in London and for how long did they chat?" contains three key points: "when", "where", and "how long". If the analysis is conducted on the entire sentence, the model may overlook one or more of these key points.
This paper proposes an answer selection model called the Fusing Multi-perspective Word Matching Focus (FMWMF) model, which operates at the level of sentence words. Drawing on the current multi-process, multi-level analysis approach, FMWMF computes focus distribution results for the question and answer from three matching perspectives between QA sentences, serial, parallel, and transformational, and then calculates the matching degree between questions and answers from the word-level information focus distributions. The model not only complements answer selection research from the word level, but also makes it easier to focus on the key information inside a sentence and to trace complex causal relationships. Section 2 introduces related work, Section 3 describes the model structure in detail, Section 4 verifies the model's effectiveness, and Section 5 concludes the paper.

2. Related Work

QA systems have gained popularity across various fields, with educational QA systems being a prime example that has attracted extensive research attention. For instance, study [9] designed a QA system for high school education by integrating knowledge graphs, intelligent QA, and big data technology; the system not only provides timely and accurate answers but also offers feedback on students' learning progress. Study [10] proposes an intelligent QA model based on user search behavior, which enhances keyword extraction by exploiting the regularities between user cognition and library elements, thereby improving the quality of the generated answers. In addition, some scholars have applied QA models to automatic grading systems for subjective questions [11]. The foundation for implementing these QA systems is answer selection. For example, study [12] proposed a deep learning-based intelligent QA system for railway technical specifications, which accurately understands user intent in a specific domain and improves the success rate of semantic matching. Study [13] points out that, while answer selection models based on deep neural networks outperform most machine learning methods, machine learning methods offer strong interpretability [14,15,16], which can help deep learning models improve the rationality of their answers. Early deep learning models lacked interactivity: two studies [17,18] proposed answer selection models based on LSTM and CNN, respectively, both of which focused on representing sentence features while ignoring the role of interaction in sentence matching. With the introduction of compare-aggregate frameworks into sentence matching models [4], many answer selection models began to use interaction mechanisms to strengthen sentence representation. For instance, study [19] proposes a bidirectional matching model for questions and answers, enabling the model to identify sentence matching levels from different sentences, and study [20] uses self-attention mechanisms and the RoBERTa model to build a health education system, achieving impressive results in non-descriptive answer generation. The idea of interaction also directly influences the training methods of today's large models [21] and their API usage [22]. Apart from interaction, adding pre-training structures is an effective way to enhance sentence representation. For example, study [23] trains language models with sample replacement instead of masking, achieving results similar to BERT's at less than a quarter of the computational cost. Study [24] models sentence similarity in answer selection with contextualized word embeddings and a Transformer encoder (CETE); the authors fine-tuned BERT-style models and achieved their best answer selection performance with RoBERTa. At present, the most popular approach is to use general-purpose large language models with appropriate prompts to automatically judge and output answers [25].
Answer selection models currently employ multi-process and multi-level strategies to mimic human reasoning from shallow to deep levels. For instance, study [26] proposed a staged reading comprehension model that first reads roughly to obtain key information points, then reads intensively to verify the answer and make the final prediction. In most cases, a question has more than one key information point, and the answer must contain multiple related points. To address this, study [8] used a multi-hop network that iteratively scans each possible information point in the sentence using attention mechanisms and sums the matching results to obtain the final QA matching score. Study [27] proposed a multi-segment interaction matching model that interacts over different segment combinations, capturing richer semantics than earlier models that interact over isolated words. Furthermore, study [28] proposed a sentence matching model with multi-turn reasoning that focuses on matching features in each turn and uses a memory component to connect the reasoning results across rounds, providing the ability to reason through complex problems at multiple levels. At present, the most prominent multi-round reasoning method is chain-of-thought (CoT) reasoning with large language models, which can complete very challenging tasks through role-playing among multiple large models [30]. Alternatively, a single MoE-structured model trained with a tailored reinforcement-learning regimen can attain exceptional long-horizon reasoning capabilities [29].

3. Model Architecture

Figure 1 illustrates the overall framework of FMWMF. The input question $Q = \{w_1^q, w_2^q, w_3^q, \ldots, w_n^q\}$ is assumed to have length n, and the input answer $A = \{w_1^a, w_2^a, w_3^a, \ldots, w_m^a\}$ length m; here, $w_n^q$ and $w_m^a$ denote the n-th word of the question and the m-th word of the answer, respectively. The first step is to feed the (Q, A) pair into three different models: the serial matching model, the parallel matching model, and the transformational matching model. These three models map sentence-to-sentence coherence, relevance, and logical information onto the words of each sentence, forming an information focus. We then extract the key information points and their positions in the sentence from the focus distribution results. Finally, we fuse the word focus matching results from the different matching perspectives to obtain the matching score of the input question–answer pair; based on this score, the candidate answers can be compared and ranked, ultimately arriving at the best answer. The detailed implementation of the entire model is given in Algorithm 1 and the corresponding model design schematic in Figure 2.
Algorithm 1: Pseudo-code for implementing the FMWMF
Input: train set $T = \{(q, a, lab)\}$, test set $V = \{(q, a, lab)\}$, hyperparameters of each matching model $H_s$, $H_p$, $H_t$, and the number of epochs $e$.
1:  $\theta_s, \theta_p, \theta_t \leftarrow$ initialize model parameters
2:  # Step 1: train the matching models separately
3:  for epoch $\leq e$ do
4:    for $\{(q, a, lab)\}_{mini\_batch} \in T$ do
5:      $\{o_s\}_{mini\_batch} = f_{\theta_s}(\{q, a\}_{mini\_batch})$,  $l = loss_s(\{o_s\}_{mini\_batch}, \{lab\}_{mini\_batch})$;
6:      $\{o_p\}_{mini\_batch} = f_{\theta_p}(\{q, a\}_{mini\_batch})$,  $l = loss_p(\{o_p\}_{mini\_batch}, \{lab\}_{mini\_batch})$;
7:      $\{o_t\}_{mini\_batch} = f_{\theta_t}(\{q, a\}_{mini\_batch})$,  $l = loss_t(\{o_t\}_{mini\_batch}, \{lab\}_{mini\_batch})$;
8:      Update $\theta_s, \theta_p, \theta_t$ by backpropagation
9:    end for
10:   $\{(q, a, lab)\}_{test} \in V$;
11:   $\{o_s\}_{test} = f_{\theta_s}(\{q, a\}_{test})$,  $acc_s = comp(\{o_s\}_{test}, \{lab\}_{test})$;
12:   $\{o_p\}_{test} = f_{\theta_p}(\{q, a\}_{test})$,  $acc_p = comp(\{o_p\}_{test}, \{lab\}_{test})$;
13:   $\{o_t\}_{test} = f_{\theta_t}(\{q, a\}_{test})$,  $acc_t = comp(\{o_t\}_{test}, \{lab\}_{test})$;
14:   $acc, \theta_s = \max_{\theta_s}(acc, acc_s)$;
15:   $acc, \theta_p = \max_{\theta_p}(acc, acc_p)$;
16:   $acc, \theta_t = \max_{\theta_t}(acc, acc_t)$;
17: end for
18: Save $\theta_s, \theta_p, \theta_t$;
19: # Step 2: use each trained matching model to extract the focus distribution of words in a sentence, and output the question–answer matching score
20: $dis_s = d_{\theta_s}(q, a)$,  $dis_p = d_{\theta_p}(q, a)$,  $dis_t = d_{\theta_t}(q, a)$;
21: $x_s, x_p, x_t \in \mathbb{R}^K = topK(dis_s, dis_p, dis_t)$;
22: $s = \sum_i \sum_{m_1, m_2 \in \{s, p, t\}} \frac{|x_{m_1}[i] - x_{m_2}[i]|}{3K}$
23: Output $s$
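For readers who prefer runnable code, the following is a minimal PyTorch skeleton of Step 1 of Algorithm 1: the three matching models are trained independently on the same (q, a, label) mini-batches, and the best-performing parameters of each are kept. The model classes, loss functions, and data loader are placeholders standing in for the structures of Sections 3.1, 3.2 and 3.3, not the authors' released code.

```python
import torch

def train_matching_models(models, losses, train_loader, val_fn, epochs):
    """Train the serial/parallel/transformational matchers independently."""
    opts = [torch.optim.Adam(m.parameters(), lr=1e-3) for m in models]
    best_acc = [0.0] * len(models)
    best_state = [None] * len(models)
    for _ in range(epochs):
        for q, a, lab in train_loader:
            # Lines 5-8: one forward/backward pass per matching model.
            for model, loss_fn, opt in zip(models, losses, opts):
                opt.zero_grad()
                loss_fn(model(q, a), lab).backward()
                opt.step()
        # Lines 11-16: keep the checkpoint with the best test accuracy.
        for i, model in enumerate(models):
            acc = val_fn(model)
            if acc > best_acc[i]:
                best_acc[i], best_state[i] = acc, model.state_dict()
    return best_state  # Line 18: saved parameters of the three models
```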

3.1. Serial Matching Structure

Serial matching is a technique used in QA that involves combining a question and its corresponding answer into a single word sequence. This concatenated sequence is then fed into a classifier model to determine whether the two sentences form a valid QA pair. Figure 3 illustrates the architecture of the serial matching model used for QA.
In this approach, a classifier model based on BERT is utilized. The input consists of a question $Q = \{w_1^q, w_2^q, \ldots, w_n^q\}$ of length n and an answer $A = \{w_1^a, w_2^a, \ldots, w_m^a\}$ of length m, where $w_n^q$ and $w_m^a$ denote the n-th word of the question and the m-th word of the answer, respectively. The question and answer are concatenated with [CLS] and [SEP] tokens to create a token sequence, which is passed through the BERT model to generate a feature vector $c \in \mathbb{R}^{1 \times d}$ for the classifier. Additionally, dynamic word vectors $q_c \in \mathbb{R}^{n \times d}$ and $a_c \in \mathbb{R}^{m \times d}$ are obtained for the question and answer sentences, as shown in Equation (1):
$$c, q_c, a_c = \mathrm{BERT}([\mathrm{CLS}], w_1^q, w_2^q, \ldots, w_n^q, [\mathrm{SEP}], w_1^a, w_2^a, \ldots, w_m^a) \quad (1)$$
After obtaining the classification feature vector, the next step is to use a multilayer perceptron (MLP) model to map it to a binary classification label. This is represented by Equation (2):
$$y = c \cdot W_1 \cdot W_2 \quad (2)$$
where y R 1 × 2 represents the prediction vector, in which the first element of the vector represents the probability that there is no correlation between sentences, and the second element represents the probability that there is a correlation. W 1 R d × k and W 2 R k × 2 are parameter matrices that need to be learned. The loss function for training the serial matching model is shown in Equation (3):
$$l = -\big((1 - y_{label}) \log y_1 + y_{label} \log y_2\big) \quad (3)$$
where $y_1$ and $y_2$ denote the first and second elements of the prediction vector, and $y_{label} \in \{0, 1\}$ is the true label indicating whether a relationship exists between the question–answer pair. $W_1$ and $W_2$ record the correlation information between the question–answer sentences. We can then calculate the focus of each word in a sentence with $W_1$, $W_2$ and $q_c$, $a_c$, mapping each word to a classification-related result, as shown in Equations (4) and (5):
$$y^q = q_c \cdot W_1 \cdot W_2 \quad (4)$$
$$y^a = a_c \cdot W_1 \cdot W_2 \quad (5)$$
where $y^q \in \mathbb{R}^{n \times 2}$ and $y^a \in \mathbb{R}^{m \times 2}$. Because the matching focus reflects the contribution of each word to the relationship, the final matching focus distributions of the question and answer are taken from the correlation column, as shown in Equations (6) and (7):
$$s_q = y^q[:, 1] \quad (6)$$
$$s_a = y^a[:, 1] \quad (7)$$
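As a concrete illustration, the following PyTorch sketch extracts a serial matching focus distribution, assuming the bert-base-uncased checkpoint and the MLP sizes (768, 100) and (100, 2) reported in Section 4.2; the function and variable names are illustrative, not the authors' released code, and the linear layers are shown untrained.

```python
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
bert = BertModel.from_pretrained("bert-base-uncased")
W1 = torch.nn.Linear(768, 100, bias=False)  # W_1 of Equation (2)
W2 = torch.nn.Linear(100, 2, bias=False)    # W_2 of Equation (2)

@torch.no_grad()
def serial_focus(question: str, answer: str) -> torch.Tensor:
    # Build "[CLS] question [SEP] answer [SEP]" as in Equation (1).
    enc = tokenizer(question, answer, return_tensors="pt")
    tokens = bert(**enc).last_hidden_state        # (1, seq_len, 768)
    logits = W2(W1(tokens))                       # Equations (4)-(5), per token
    # Column 1 = each word's contribution to "a QA relation exists",
    # i.e. the focus distributions of Equations (6)-(7); splitting the
    # sequence at the [SEP] positions yields s_q and s_a separately.
    return logits[0, :, 1]
```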

3.2. Parallel Matching Structure

Parallel matching is a technique that uses word embedding models to obtain embeddings for both the question and answer sentences, as well as for individual words. This approach allows us to calculate the cosine similarity between the sentence embedding of the question and each vector of words in the answer, resulting in the focus matching distribution of the answer. Similarly, we can derive the distribution of the word focus of the question. Figure 4 illustrates the structure of the parallel matching approach.
The embedding model is based on SBERT, which is typically trained with either a Siamese network or a triplet network [31]. Here, we directly utilize pre-trained open-source models and their corresponding parameters. Given the question $Q = \{w_1^q, w_2^q, \ldots, w_n^q\}$ and the answer $A = \{w_1^a, w_2^a, \ldots, w_m^a\}$, inputting them into SBERT yields the sentence embeddings of both the question and the answer, as shown in Equations (8) and (9):
$$e_q = \mathrm{SBERT}(w_1^q, w_2^q, \ldots, w_n^q) \quad (8)$$
$$e_a = \mathrm{SBERT}(w_1^a, w_2^a, \ldots, w_m^a) \quad (9)$$
where $e_q \in \mathbb{R}^{1 \times d}$ and $e_a \in \mathbb{R}^{1 \times d}$ are the sentence embeddings, and d is the dimension of the sentence embedding vector. Inputting the question and answer into the BERT model yields the dynamic word embedding of each word, as shown in Equations (10) and (11):
$$o_q = \mathrm{BERT}(w_1^q, w_2^q, \ldots, w_n^q) \quad (10)$$
$$o_a = \mathrm{BERT}(w_1^a, w_2^a, \ldots, w_m^a) \quad (11)$$
where $o_q \in \mathbb{R}^{d \times n}$ contains the dynamic word embedding of each word in the question, and $o_a \in \mathbb{R}^{d \times m}$ those of the answer. With $e_q$, $e_a$, $o_q$, and $o_a$ in hand, the word matching focus distributions under the parallel structure are obtained through Equations (12) and (13):
$$p_q = (e_a \cdot o_q) \odot |o_q| \quad (12)$$
$$p_a = (e_q \cdot o_a) \odot |o_a| \quad (13)$$
where $p_q$ and $p_a$ are the word focus distribution results of the question and the answer, respectively; $|o_q| \in \mathbb{R}^{1 \times n}$, whose n-th element is $|o_q|_n = \frac{1}{\|o_q[:, n]\| \, \|e_a\|}$, and the elements of $|o_a|$ are obtained in the same way; $\odot$ denotes element-wise multiplication. Equations (12) and (13) therefore amount to the cosine similarity between one sentence's embedding and each word vector of the other sentence.
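A compact sketch of this computation follows, assuming the all-distilroberta-v1 SBERT checkpoint of Section 4.2 and using SBERT's own token embeddings as a stand-in for the BERT word vectors of Equations (10) and (11); the names are illustrative.

```python
import torch
from sentence_transformers import SentenceTransformer

sbert = SentenceTransformer("all-distilroberta-v1")

def parallel_focus(sentence: str, other_sentence: str) -> torch.Tensor:
    # Sentence embedding of the *other* sentence, Equations (8)-(9).
    e_other = torch.as_tensor(sbert.encode(other_sentence))
    # Per-token embeddings of this sentence (stand-in for Eqs. (10)-(11)).
    o = sbert.encode(sentence, output_value="token_embeddings").cpu()
    # Cosine similarity between e_other and every word vector: the
    # normalization plays the role of |o| in Equations (12)-(13).
    return torch.nn.functional.cosine_similarity(e_other.unsqueeze(0), o, dim=1)

# Usage: p_q = parallel_focus(question, answer); p_a = parallel_focus(answer, question)
```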

3.3. Transformational Matching Structure

Transformational matching is a technique that achieves question–answer matching through translation between questions and answers. The general process of this matching is shown in Figure 5.
The matching structure shown in Figure 5 includes two seq2seq models, each of which contains an encoder and a decoder. Under this structure, computing the focus distribution of the question is a process of answer-to-question transformation. The first step is to encode the answer $A = \{w_1^a, w_2^a, \ldots, w_m^a\}$; the encoding result is shown in Equation (14):
$$o_a^{en}, h_a^{en} = \mathrm{encoder}(w_1^a, w_2^a, \ldots, w_m^a) \quad (14)$$
where $o_a^{en}$ represents the encoding matrix of the answer and $h_a^{en}$ denotes the feature vector of the answer. The encoding matrix $o_a^{en}$ and the feature vector $h_a^{en}$ output by the encoder are transformed into decoding states through two fully connected layers, as shown in Equation (15):
$$o_a^{de} = W_o^{de} \cdot o_a^{en}, \qquad h_a^{de} = W_h^{de} \cdot h_a^{en} \quad (15)$$
where $o_a^{de}$ and $h_a^{de}$ are the encoding matrix and feature vector of the answer in the decoding state; $W_o^{de} \in \mathbb{R}^{h_d \times h_e}$ and $W_h^{de} \in \mathbb{R}^{h_d \times h_e}$ are parameter matrices to be learned; and $h_d$ and $h_e$ are the hidden-layer sizes of the decoder and encoder, respectively. After the answer text is encoded, $h_a^{de}$ is fed into the decoder layer, and the input question $Q = \{w_1^q, w_2^q, \ldots, w_n^q\}$ is encoded by the GRU component of the decoder, producing the decoding matrix and feature vector $o_q^{de}$, $h_q^{de}$ of the question, as shown in Equation (16):
$$o_q^{de}, h_q^{de} = \mathrm{decoder}(w_1^q, w_2^q, \ldots, w_n^q, h_a^{de}) \quad (16)$$
Afterwards, the outputs $o_q^{de}$ and $o_a^{de}$ from the decoder and encoder sides are used to calculate the attention of each word in the question over the answer, as shown in Equation (17):
$$att = o_q^{de} \cdot (o_a^{de})^{\top} \quad (17)$$
The encoding matrix $o_a^{en}$ of the answer in the encoding state is transformed into the encoding matrix $o_q^{en}$ of the question by the attention layer, as shown in Equation (18):
$$o_q^{en} = att \cdot o_a^{en} \quad (18)$$
Then, using Equation (19), we can obtain the focus distribution of the question:
$$t_q = o_a^{en} \cdot o_q^{en} \quad (19)$$
where $t_q$ represents the word matching focus distribution of the question under the transformational matching structure. Finally, to train the parameters $W_o^{de}$ and $W_h^{de}$, a loss is computed between the decoder's output and the true question. The encoding-state and decoding-state matrices of the question, $o_q^{en}$ and $o_q^{de}$, are concatenated and passed through two fully connected layers to obtain the decoding output $w^{out}$, as shown in Equation (20):
$$w^{out} = W^{out} \cdot \big(W^{de} \cdot (o_q^{en}; o_q^{de})\big) \quad (20)$$
where $w^{out} \in \mathbb{R}^{1 \times |V|}$ ($|V|$ denotes the size of the vocabulary); $W^{out} \in \mathbb{R}^{|V| \times |h|}$ and $W^{de}$ are parameter matrices to be learned; and $|h|$ is the size of the hidden layer output by the first fully connected layer. The sentence-level word-pair matching loss function L used for training is shown in Equation (21):
$$L = -\frac{1}{n} \sum_{i=0}^{n} \log \mathrm{softmax}(w_i^{out})[p_i] \quad (21)$$
where $w_i^{out}$ is the vector of predicted scores over all words in the vocabulary at the i-th decoding step, and $p_i$ is the vocabulary position of the i-th word of the ground-truth question.
Similarly, for the focus distribution of the answer, the second seq2seq model converts the question into the answer, and the word matching focus distribution $t_a$ is obtained during that conversion process.
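The following is a condensed PyTorch sketch of the answer-to-question direction, under the assumption of GRU encoder/decoder components with equal hidden sizes; all module and variable names are illustrative, the softmax is an added normalization, and the final scoring line is a stand-in for Equation (19), not the authors' exact formulation.

```python
import torch
import torch.nn as nn

class AnswerToQuestionFocus(nn.Module):
    # Condensed answer -> question transformational matcher (Section 3.3).
    def __init__(self, vocab_size: int, emb: int = 128, hidden: int = 128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb)
        self.encoder = nn.GRU(emb, hidden, batch_first=True)
        self.to_dec = nn.Linear(hidden, hidden, bias=False)  # W_o^de / W_h^de, Eq. (15)
        self.decoder = nn.GRU(emb, hidden, batch_first=True)

    def forward(self, answer_ids: torch.Tensor, question_ids: torch.Tensor):
        o_a_en, h_a_en = self.encoder(self.embed(answer_ids))  # Eq. (14)
        o_a_de = self.to_dec(o_a_en)                           # Eq. (15)
        h_a_de = self.to_dec(h_a_en)
        # Decode while consuming the question, seeded by the answer state (Eq. (16)).
        o_q_de, _ = self.decoder(self.embed(question_ids), h_a_de)
        # Attention of each question word over the answer words (Eq. (17));
        # the softmax normalization is an assumption of this sketch.
        att = torch.softmax(o_q_de @ o_a_de.transpose(1, 2), dim=-1)
        o_q_en = att @ o_a_en                                  # Eq. (18)
        # Stand-in for Eq. (19): agreement between each question position's
        # decoding state and its attended answer encoding.
        t_q = (o_q_de * o_q_en).sum(dim=-1)                    # (batch, n)
        return t_q

# Usage sketch: t_q = AnswerToQuestionFocus(vocab_size=30000)(answer_ids, question_ids)
```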

3.4. Information Extraction and Fusion

After the word matching focus distributions have been obtained from the various matching models, the focus distribution results for the question and answer are $s_q, p_q, t_q$ and $s_a, p_a, t_a$, respectively. The positions of key information points in each sentence are then extracted from the focus distribution results, as shown in Equation (22):
$$x = \mathrm{topK}(f_x) \quad (22)$$
where $\mathrm{topK}(\cdot)$ extracts the K largest values from a vector and outputs their positions, $x \in \mathbb{R}^K$, and $f_x \in \{s_q, p_q, t_q, s_a, p_a, t_a\}$ is one of the focus distributions obtained above. Different models have different matching characteristics: the serial matching model tends to look for words that indicate whether a question–answer relationship exists, the parallel matching model pays more attention to similar expressions in the question and answer, and the transformational matching model is better at finding words that indicate a logical connection between the question–answer sentences. However, regardless of which matching model extracts the key information, the positions of that information should lie close to one another, reflecting the stability of language expression. Based on Equation (22), we obtain the position distributions of key information points in the question under the three matching modes, $x_s^q, x_p^q, x_t^q$, and in the answer, $x_s^a, x_p^a, x_t^a$, and then calculate the distances between the corresponding key-point positions in the question and in the answer, respectively, as shown in Equations (23) and (24):
$$d_q = \frac{\sum_{i=1}^{K} \big(|x_s^q[i] - x_p^q[i]| + |x_s^q[i] - x_t^q[i]| + |x_p^q[i] - x_t^q[i]|\big)}{3K} \quad (23)$$
$$d_a = \frac{\sum_{i=1}^{K} \big(|x_s^a[i] - x_p^a[i]| + |x_s^a[i] - x_t^a[i]| + |x_p^a[i] - x_t^a[i]|\big)}{3K} \quad (24)$$
Finally, $d = d_q + d_a$ is used as the deviation between the two sentences with respect to the existence of a question–answer relationship: the smaller the value of d, the higher the possibility that a question–answer relationship holds between the two sentences.
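A short sketch of Equations (22) through (24), assuming the six focus vectors have already been computed by the three matching models; the function names are illustrative.

```python
import torch

def key_positions(focus: torch.Tensor, k: int) -> torch.Tensor:
    # Equation (22): positions of the K largest focus values.
    return torch.topk(focus, k).indices.sort().values.float()

def deviation(x1, x2, x3, k: int) -> torch.Tensor:
    # Equations (23)-(24): mean pairwise offset of key-point positions.
    return ((x1 - x2).abs() + (x1 - x3).abs() + (x2 - x3).abs()).sum() / (3 * k)

def qa_deviation(s_q, p_q, t_q, s_a, p_a, t_a, k_q=2, k_a=4):
    d_q = deviation(key_positions(s_q, k_q), key_positions(p_q, k_q),
                    key_positions(t_q, k_q), k_q)
    d_a = deviation(key_positions(s_a, k_a), key_positions(p_a, k_a),
                    key_positions(t_a, k_a), k_a)
    return d_q + d_a  # smaller d => stronger question-answer relationship
```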

4. Experiments and Results Analysis

4.1. Experimental Dataset and Evaluation Metrics

Experimental Dataset. We selected two typical QA datasets for testing: TREC-QA [32] and Wiki-QA [33]. Basic information about the two datasets is shown in Table 1; all values were obtained after removing questions whose candidate answers are entirely negative.
Evaluation Metrics. The evaluation metrics for the experiment are mean average precision (MAP) and mean reciprocal rank (MRR).
MAP. This is the mean of the average precision over all questions. The average precision for a single question is given by Equation (25):
$$P_{ave}^q = \frac{\sum_{k=1}^{n} p_k \times r_k}{m} \quad (25)$$
where $p_k$ is calculated using Equation (26):
$$p_k = \frac{\sum_{j=1}^{k} r_j}{k} \quad (26)$$
where q denotes a single question, n the number of candidate answers for that question, and m the number of correct answers. The value $r_k$ is either 0 or 1: $r_k = 1$ when the k-th candidate answer is correct, and $r_k = 0$ otherwise. The final expression for MAP is shown in Equation (27):
$$MAP = \frac{\sum_{q=1}^{Q} P_{ave}^q}{Q} \quad (27)$$
When all the correct answers are ranked before any incorrect answers, the MAP approaches 1. When all the correct answers for a question are ranked after all the incorrect answers, the MAP approaches 0.
MRR. This is the average, over all questions, of the reciprocal rank of the position at which the first correct answer appears in the predicted list of candidate answers. The calculation is shown in Equation (28):
$$MRR = \frac{\sum_{q=1}^{Q} \frac{1}{rank_q}}{Q} \quad (28)$$
where $rank_q$ is the rank of the first correct answer for question q among all candidate answers. The MRR value also ranges from 0 to 1 and measures the position of the first correct answer in the list of candidate answers.
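As a reference for Equations (25) through (28), the following Python implementation computes both metrics; each question is represented by a list of 0/1 relevance labels ordered by the model's predicted ranking (index 0 = top-ranked candidate).

```python
def average_precision(relevance):
    """P_ave^q of Equation (25), with p_k as in Equation (26)."""
    hits, precisions = 0, []
    for k, r in enumerate(relevance, start=1):
        if r:                                  # r_k = 1: correct answer at rank k
            hits += 1
            precisions.append(hits / k)        # p_k
    return sum(precisions) / max(hits, 1)      # guard: no correct answer

def mean_average_precision(ranked_labels):
    # Equation (27): mean over all Q questions.
    return sum(average_precision(r) for r in ranked_labels) / len(ranked_labels)

def mean_reciprocal_rank(ranked_labels):
    # Equation (28): 1 / rank of the first correct answer, averaged over Q.
    total = 0.0
    for relevance in ranked_labels:
        for k, r in enumerate(relevance, start=1):
            if r:
                total += 1.0 / k
                break
    return total / len(ranked_labels)

# Example: two questions, each with three ranked candidates.
print(mean_average_precision([[1, 0, 1], [0, 1, 0]]))  # ~0.667
print(mean_reciprocal_rank([[1, 0, 1], [0, 1, 0]]))    # 0.75
```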
Models. We compare four classical answer selection algorithms: PQSG, a classical probability-based statistical learning method; Finetune-RoBERTa, a fine-tuned model based on RoBERTa; MANS (Multihop Attention Networks), a multi-hop inference model based on the attention mechanism; and HR (Hierarchical Ranking), which divides the relationship between Q&A sentence pairs into sentence, sentence-pair, and sentence-list levels and then matches them hierarchically.

4.2. Experimental Environment

The experiments were performed on a system equipped with an Intel Core i5-9400 @ 2.90 GHz six-core CPU, 24 GB of RAM, and an Nvidia GeForce GTX 1660 Ti GPU with 6 GB of memory. For serial matching, we employed the pre-trained bert-base-uncased version of BERT with two fully connected layers of sizes (768, 100) and (100, 2). For parallel matching, we utilized the all-distilroberta-v1 SBERT checkpoint available on Hugging Face. For the encoder–decoder model in transformational matching, we used the Adam optimizer with a learning rate of 0.001 to update the parameters. We trained the word embedding representations on the question–answer pairs with a batch size of 128 and the sentence-level matching module with a batch size of 64. For the recurrent neural network components, we set the hidden layer size to 128 and kept all other parameters at their default values.
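For convenience, the hyperparameters above can be gathered in a single configuration; the following dict is an illustrative summary of Section 4.2 (and the best K values from Section 4.4), not the authors' released configuration file.

```python
CONFIG = {
    "serial":    {"backbone": "bert-base-uncased",
                  "mlp": [(768, 100), (100, 2)]},
    "parallel":  {"backbone": "all-distilroberta-v1"},   # SBERT checkpoint
    "transform": {"optimizer": "Adam", "lr": 1e-3, "hidden_size": 128},
    "batch_size": {"word_embedding": 128, "sentence_matching": 64},
    "top_k": {"question": 2, "answer": 4},               # best values, Section 4.4
}
```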

4.3. Comparison of Matching Focus Distribution Results Under Different Question–Answer Matching Models

To verify that the focus of key information should yield similar distribution results under different matching structures, we selected two representative questions from the two datasets. We then identified two answer sentences with similar lengths based on the questions, one of which provided a positive answer that was relevant to the question, while the other presented a negative answer that was unrelated to the question. Table 2 provides detailed descriptions of the sentences used in our study. By comparing the distribution results obtained using different matching structures, we can assess the effectiveness of each method in capturing the key information of the answer relevant to the question.
Regarding Example 1, the word lengths are as follows: 8 for the question, 19 for the positive answer, and 20 for the negative answer. The results of the word matching focus distribution for the question and the positive/negative answers in different matching structures are illustrated in Figure 6 and Figure 7.
Looking at Figure 6, it is evident that, when the question is matched with the positive answer, two key information points are present in the question: one near "How much" and the other near "worth". This implies that the key points extracted from the question revolve around the price or value of something, and the matching trends of the three models are largely similar at these crucial points. However, when the question is paired with the negative answer to form a sentence pair, the focus distribution of the question varies significantly across the matching modes; the resulting distribution cannot isolate the key information points, leading to large offsets in the focus positions produced by the different models. Figure 7 shows the focus distribution results of the answer sentences. When the answer is a positive response to the question, similar trends are obtained across the matching structures, especially at the key information positions, whereas the distribution for the negative answer is comparatively chaotic; for instance, the focus trend under the serial matching structure appears random and is less able to locate the key information points. Similarly, for the second example, the question is 8 words long, the positive answer 19 words, and the negative answer 20 words. The focus distribution results obtained after combining the question with the positive and negative answers under the different matching modes are shown in Figure 8 and Figure 9.
Figure 8 and Figure 9 suggest a pattern similar to the first example, indicating that the distribution trends and variation rules of the key information positions corresponding to each matching focus under different matching structures are either similar or closely related. To further verify this pattern, we divided the question–answer data in the test set of the two datasets into two categories: positive answers with questions and negative answers with questions. We then calculated the correlation between the distribution of matching focus in different matching perspectives for the two categories of question–answer data, using the Pearson correlation coefficient as the measure. The experimental results are presented in Table 3.
Based on the experimental results in Table 3, we can conclude that the distribution of word matching focus for positive answers exhibits a similar trend across different perspectives, whereas the distribution of word matching focus for negative answers appears to be more random. This finding confirms that the location distribution of key information points for two sentences with question–answer relationships under different perspectives is relatively stable.

4.4. The Impact of the Number of Key Information K on the Effectiveness of Answer Selection

A sentence may contain more than one key information point, and it is not always the case that all words in a sentence are key information points. Hence, it is necessary to assess the ability of questions and answers to include a certain number of information points and the effectiveness of answer selection under different numbers of extracted information points. In this regard, the value K is used to represent the number of extracted key information points in the sentence. Initially, the K value in the answer is fixed to 4, based on previous research findings [34]. Then, the K value in the question is varied to observe the model’s effectiveness. Subsequently, the K value in the question is fixed at K = 2 (based on the best effectiveness obtained in the previous experiment), and the K value in the answer is changed to observe the model’s effectiveness. The experimental results are depicted in Figure 10 and Figure 11.
Based on the relationship between the model's answer selection performance and the K value shown in Figure 10 and Figure 11, we can infer that, for the TREC-QA and Wiki-QA datasets, questions usually have two key information points, whereas the number of key information points in answers depends on the specific category of the question-answering dataset. The optimal performance is generally achieved around K = 4 for answers, indicating that answers usually require about twice as much information as questions: two key information points in questions and four in answers. Furthermore, the effectiveness of answer selection first rises and then falls as K increases. Once the optimum is passed, increasing K in the question leads to a monotonic decrease in answer selection performance, while increasing K in the answer leads to a fluctuating decrease; this again suggests that the distribution of information in answers is more complex.

4.5. Comparison of the Effect of Different Answer Selection Models

4.5.1. Comparison of the Classical Answer Selection Models

Based on the aforementioned findings, it can be inferred that different matching models have similar patterns in extracting word focus for sentences with question–answer relationships. Hence, any combination of two matching models can be employed for answer selection. To verify the complementary roles of different matching models in extracting sentence information, we combined the three matching models in pairs, denoted as S for serial matching, P for parallel matching, and T for transformational matching mode. For this purpose, we fixed K = 2 for questions and K = 4 for answers. The answer selection performances of the proposed models on the two question-answering datasets under different combination forms are summarized in Table 4 and Table 5, respectively. These tables also include comparison results between our proposed model and some typical answer selection models used in the past.
Table 4 and Table 5 present the comparison results of FMWMF with other models and different matching perspectives combinations on the TREC-QA and Wiki-QA datasets. Compared to the CETE model, FMWMF has a significant improvement in MAP indicators on both datasets, with an increase of 4.07% on the TREC-QA dataset and 5.51% on the Wiki-QA dataset. The performance improvement on the Wiki-QA dataset is particularly noteworthy, as it covers a wider range of domains and answer formats, indicating that FMWMF can distinguish more subtle differences between sentences in identifying different answers to the same question and reasonably extend and expand based on the question. This allows the model to still make a question–answer relationship judgment, even when the question and answer have fewer similar and connecting words. For the MRR indicator, FMWMF achieved an improvement of 1.63% on the TREC-QA dataset and 4.86% on the Wiki-QA dataset, demonstrating its enhanced ability to select the best answer. The word-based modeling approach can extract more comprehensive sentence information, and the model also has stronger understanding and reasoning capabilities due to the introduction of the multi-perspective matching mechanism. Moreover, none of the models resulting from the arbitrary combination of the three matching modes can surpass the effect of the simultaneous combination of the three matching models, indicating that the three matching models complement each other in extracting the question–answer relationship between sentences. Looking at the results of the combined experiments, the transformational matching has the strongest ability to extract the question–answer relationship, followed by the serial matching, and finally the parallel matching.

4.5.2. Comparison of the Large Language Models

To verify the model's behavior in a realistic environment, we selected a recent question-answering benchmark in natural language processing, MMLU-Pro, which contains questions from 14 different fields, each question having at least 10 candidate answers. Answer selection on such a complex task is difficult for a general model to handle well. We selected DeepSeek-R1 and Qwen3-MoE, two models with reasoning ability, used the first 100 questions of each subject, and set the model output length to 4096 tokens. Table 6 reports the results of the controlled comparison, including answer selection accuracy, the time spent, and the optimal K values of the FMWMF model.
As Table 6 shows, the optimal K values differ across fields, which is clearly related to the subject matter of the data, but for most fields the selected K values are the combination (2, 4). In terms of accuracy, the Qwen2.5-1.5B model, which is at the same parameter scale as FMWMF, performs very poorly in every discipline, indicating that LLMs with small parameter counts cannot cope with such complex tasks. FMWMF, in contrast, achieves results comparable to DeepSeek-R1 and Qwen3-MoE in the text-centric reasoning disciplines and even surpasses both reasoning LLMs in Law. Finally, in terms of computation time, FMWMF is the fastest because it processes only the input question and the answer options and needs no prompting or analysis stage, so the time it spends on each subject is essentially constant.

5. Conclusions

This paper proposes a novel answer selection model, FMWMF, that utilizes word focus from multiple perspectives. FMWMF can capture more detailed information in the sentence and solve the problem of information omission. This method enables rapid and accurate retrieval of the target answer from a rich list of candidates while simultaneously extracting the key information points embedded in each question–answer pair. Empirically, the identified answers consistently contain multiple of the information points present in the original query. Deploying FMWMF in community-based and educational Q&A platforms would therefore drastically accelerate the curation of high-quality answers and mitigate the inefficiencies caused by information redundancy. Nevertheless, we observe that FMWMF’s efficacy is highly domain-dependent. In text-centric disciplines such as philosophy, law, and political science, its performance rivals that of state-of-the-art open-source reasoning models like DeepSeek-R1 and Qwen3-MoE. Conversely, in mathematics, physics, and chemistry, its results are statistically indistinguishable from random selection. Moreover, FMWMF is a pure answer-selection architecture; it offers no transparent rationale for why a given question–answer pair is matched and, unlike DeepSeek-R1, cannot articulate this alignment in natural language. An intriguing and forward-looking direction for future work is to embed the FMWMF framework directly within large language models. In particular, for complex problem-solving scenarios, integrating FMWMF with an attention mechanism could direct the model’s focus onto the most salient information points, offering a principled way to alleviate hallucinations, response incoherence, and training instabilities currently observed in LLMs.

Author Contributions

Methodology, Z.S.; Validation, Z.L.; Writing—original draft, J.H.; Writing—review & editing, X.H.; Supervision, H.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This work is supported by the National Natural Science Foundation of China (62267003, 62177012).

Data Availability Statement

Data is contained within the article.

Conflicts of Interest

The authors declare that there are no conflicts of interest.

References

  1. Amancio, L.; Dorneles, C.F.; Dalip, D.H. Recency and quality-based ranking question in CQAs: A Stack Overflow case study. Inf. Process. Manag. 2021, 58, 102552. [Google Scholar] [CrossRef]
  2. Minaee, S.; Kalchbrenner, N.; Cambria, E. Deep Learning Based Text Classification: A Comprehensive Review. ACM Comput. Surv. 2020, 54, 1–40. [Google Scholar] [CrossRef]
  3. Xu, J.; Zhang, W. Study of Intelligent Question Answering System Based on Ontology. In Proceedings of the 2020 IEEE 11th International Conference on Software Engineering and Service Science (ICSESS), Beijing, China, 16–18 October 2020; pp. 428–431. [Google Scholar] [CrossRef]
  4. Wang, S.; Jiang, J. A compare-aggregate model for matching text sequences. arXiv 2016, arXiv:1611.01747. [Google Scholar] [CrossRef]
  5. Peters, M.E.; Neumann, M.; Iyyer, M.; Gardner, M. Deep Contextualized Word Representations. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, New Orleans, LA, USA, 2–4 June 2018; Volume 1, pp. 2227–2237. [Google Scholar] [CrossRef]
  6. Devlin, J.; Chang, M.-W.; Lee, K. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Minneapolis, MN, USA, 2–7 June 2019; Volume 1, pp. 4171–4186. [Google Scholar] [CrossRef]
  7. Radford, A.; Narasimhan, K.; Salimans, T.; Sutskever, I. Improving Language Understanding by Generative Pre-Training. Available online: https://www.mikecaptain.com/resources/pdf/GPT-1.pdf (accessed on 25 August 2025).
  8. Tran, N.K.; Claudia, N. Multihop attention networks for question answer matching. In Proceedings of the 41st International ACM SIGIR Conference on Research & Development in Information Retrieval (SIGIR ‘18), Ann Arbor, MI, USA, 8–12 July 2018; Association for Computing Machinery: New York, NY, USA, 2018; pp. 325–334. [Google Scholar] [CrossRef]
  9. Yang, Z.; Wang, Y.; Gan, J. Design and research of intelligent question-answering (Q&A) system based on high school course knowledge graph. Mob. Netw. Appl. 2021, 26, 1884–1890. [Google Scholar] [CrossRef]
  10. Qian, Y. The Semantic Framework of Library Intelligent Question Answering System Based on Exploratory Search Behavior. In Proceedings of the 2022 IEEE 2nd International Conference on Computer Communication and Artificial Intelligence (CCAI), Beijing, China, 6–8 May 2022; pp. 65–70. [Google Scholar] [CrossRef]
  11. Wildan, A.H.; Aji, R.F. Transformer and Large Language Models for Automatic Multiple-Choice Question Generation: A Systematic Literature Review. IEEE Access 2025, 13, 127100–127112. [Google Scholar] [CrossRef]
  12. Hu, Z. Research and implementation of railway technical specification question answering system based on deep learning. In Proceedings of the 2020 IEEE 5th Information Technology and Mechatronics Engineering Conference (ITOEC), Chongqing, China, 12–14 June 2020; pp. 5–9. [Google Scholar] [CrossRef]
  13. Yu, L.; Hermann, K.M.; Blunsom, P.; Pulman, S. Deep learning for answer sentence selection. arXiv 2014, arXiv:1412.1632. [Google Scholar] [CrossRef]
  14. Punyakanok, V.; Roth, D.; Yih, W.-T. Mapping dependencies trees: An application to question answering. In Proceedings of the 8th International Symposium on Artificial Intelligence and Mathematics, Chongqing, China, 7–9 April 2023; Available online: https://api.semanticscholar.org/CorpusID:8214465 (accessed on 25 August 2025).
  15. Yih, S.W.-T.; Chang, M.-W.; Meek, C.; Pastusiak, A. Question answering using enhanced lexical semantic models. In Proceedings of the 51st Annual Meeting of the Association for Computational linguistics, Sofia, Bulgaria, 3 June 2013; Available online: https://www.microsoft.com/en-us/research/publication/question-answering-using-enhanced-lexical-semantic-models/ (accessed on 25 August 2025).
  16. Adebayo, I.M.; Kim, B.-S. Question-answering system powered by knowledge graph and generative pretrained transformer to support risk identification in tunnel projects. J. Constr. Eng. Manag. 2025, 151, 04024193. [Google Scholar] [CrossRef]
  17. Wang, D.; Nyberg, E. A long short-term memory model for answer sentence selection in question answering. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing, Beijing, China, 26–31 July 2015; Volume 2, pp. 707–712. [Google Scholar] [CrossRef]
  18. He, H.; Gimpel, K.; Lin, J. Multi-perspective sentence similarity modeling with convolutional neural networks. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, Lisbon, Portugal, 17–21 September 2015; pp. 1576–1586. [Google Scholar] [CrossRef]
  19. Wang, Z.; Hamza, W.; Florian, R. Bilateral multi-perspective matching for natural language sentences. In Proceedings of the 26th International Joint Conference on Artificial Intelligence, Melbourne, Australia, 19 August 2017; pp. 4144–4150. [Google Scholar] [CrossRef]
  20. Wiwin, S.; Pratama, R.A.; Rahadika, F.Y.; Purnomo, M.H.A. Self-Attention Mechanism of RoBERTa to Improve QAS for e-health Education. In Proceedings of the 2021 4th International Conference of Computer and Informatics Engineering (IC2IE), Depok, Indonesia, 14–15 September 2021; pp. 221–225. [Google Scholar] [CrossRef]
  21. Guo, Y.; Zhang, J.; Chen, X.; Ji, X.; Wang, Y.-J.; Hu, Y.; Chen, J. Improving vision-language-action model with online reinforcement learning. arXiv 2025, arXiv:2501.16664. [Google Scholar] [CrossRef]
  22. Wang, R.; Zhang, Z.; Rossetto, L.; Ruosch, F.; Bernstein, A. NLQxform-UI: An Interactive and Intuitive Scholarly Question Answering System. In Proceedings of the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval, Padua, Italy, 13–18 July 2025; pp. 3990–3993. [Google Scholar] [CrossRef]
  23. Clark, K.; Luong, M.-T.; Le, Q.V.; Manning, C.D. Electra: Pre-training text encoders as discriminators rather than generators. arXiv 2020, arXiv:2003.10555. [Google Scholar] [CrossRef]
  24. Laskar, M.T.R.; Huang, J.X.; Hoque, E. Contextualized embeddings based transformer encoder for sentence similarity modeling in answer selection task. In Proceedings of the Twelfth Language Resources and Evaluation Conference, Marseille, France, 11–16 May 2020; pp. 5505–5514. Available online: https://aclanthology.org/2020.lrec-1.676/ (accessed on 25 August 2025).
  25. Song, J.; Ashktorab, Z.; Pan, Q.; Dugan, C.; Geyer, W.; Malone, T.W. Interaction Configurations and Prompt Guidance in Conversational AI for Question Answering in Human-AI Teams. arXiv 2025, arXiv:2505.01648. [Google Scholar] [CrossRef]
  26. Zhang, Z.; Yang, J.; Zhao, H. Retrospective reader for machine reading comprehension. In Proceedings of the AAAI Conference on Artificial Intelligence, Singapore, 20–27 January 2021; Volume 35, pp. 14506–14514. [Google Scholar] [CrossRef]
  27. Li, L.; Zhou, A.; Zhang, B.; Xiao, F. Multiple fragment-level interactive networks for answer selection. Neurocomputing 2020, 420, 80–88. [Google Scholar] [CrossRef]
  28. Liu, C.; Jiang, S.; Yu, H.; Yu, D. Multi-turn Inference Matching Network for Natural Language Inference. In Proceedings of the CCF International Conference on Natural Language Processing and Chinese Computing, Hohhot, China, 26–30 August 2018; pp. 131–143. [Google Scholar] [CrossRef]
  29. Guo, D.; Yang, D.; Zhang, H. Deepseek-r1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning. arXiv 2025, arXiv:2501.12948. [Google Scholar] [CrossRef]
  30. Wang, A.; Shu, D.; Wang, Y.; Ma, Y.; Du, M. Improving LLM Reasoning through Interpretable Role-Playing Steering. arXiv 2025, arXiv:2506.07335. [Google Scholar] [CrossRef]
  31. Hoffer, E.; Ailon, N. Deep metric learning using triplet network. In Proceedings of the Similarity-Based Pattern Recognition: Third International Workshop, SIMBAD 2015, Copenhagen, Denmark, 12–14 October 2015; pp. 84–92. [Google Scholar] [CrossRef]
  32. Wang, M.; Smith, N.A.; Mitamura, T. What is the Jeopardy model? A quasi-synchronous grammar for QA. In Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL), Prague, Czech Republic, 28–30 June 2007; pp. 22–32. Available online: https://aclanthology.org/D07-1003/ (accessed on 25 August 2025).
  33. Yang, Y.; Yih, W.-T.; Meek, C. WikiQA: A Challenge Dataset for Open-Domain Question Answering. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, Lisbon, Portugal, 17–21 September 2015; pp. 2013–2018. [Google Scholar] [CrossRef]
  34. Gao, H.; Hu, M.; Cheng, R.; Gao, T. Hierarchical ranking for answer selection. arXiv 2021, arXiv:2102.00677. [Google Scholar] [CrossRef]
  35. He, J.; Zhang, H.; Hu, X. Open domain answer selection model fusing double matching-focus. Comput. Eng. 2023, 49, 303–310. [Google Scholar] [CrossRef]
Figure 1. Overall structure of FMWMF.
Figure 2. Model design schematic diagram.
Figure 3. Structure of the serial matching model.
Figure 4. Structure of the parallel matching model.
Figure 5. Structure of the transformational matching model.
Figure 6. The Q distribution obtained by matching question 1 with positive and negative answers based on word focus.
Figure 7. The A distribution obtained by matching question 1 with positive and negative answers based on word focus.
Figure 8. The Q distribution obtained by matching question 2 with positive and negative answers based on word focus.
Figure 9. The A distribution obtained by matching question 2 with positive and negative answers based on word focus.
Figure 10. The relationship between different K values in the question and the effectiveness of answer selection.
Figure 11. The relationship between different K values in the answer and the effectiveness of answer selection.
Table 1. Information on the dataset.

| Dataset      | TREC-QA | Wiki-QA |
|--------------|---------|---------|
| Train Set    | 1162    | 873     |
| Validate Set | 65      | 126     |
| Test Set     | 68      | 243     |
Table 2. Examples of question–answer sentence pairs.

| Question | Positive Answer | Negative Answer |
|---|---|---|
| How much are the Harry Potter movies worth? | The series also originated many types of tie-in merchandise, making the Harry Potter brand worth in excess of USD 15 billion. | The initial major publishers of the books were Bloomsbury in the United Kingdom and Scholastic Press in the United States. |
| How deep can deep underwater drilling go? | Deepwater drilling is the process of oil and gas exploration and production at depths of more than 500 feet. | It has been economically infeasible for many years, but with rising oil prices, more companies are investing in this area. |
Table 3. The correlation between the distribution of match focus of positive and negative answers under different matching perspectives.

| Dataset | Answer Type | Relevance |
|---------|-------------|-----------|
| TREC-QA | Positive    | 0.7929    |
| TREC-QA | Negative    | −0.1926   |
| Wiki-QA | Positive    | 0.6322    |
| Wiki-QA | Negative    | 0.2437    |
Table 4. Comparison of experimental results on the TREC-QA dataset.

| Model        | MAP    | MRR    |
|--------------|--------|--------|
| Study [8]    | 0.8130 | 0.8930 |
| Study [17]   | 0.7134 | 0.7913 |
| Study [24]   | 0.8910 | 0.9250 |
| Study [35]   | 0.8420 | 0.9040 |
| SP           | 0.8497 | 0.8738 |
| ST           | 0.9062 | 0.9246 |
| TP           | 0.8805 | 0.9027 |
| SPT (FMWMF)  | 0.9273 | 0.9401 |
Table 5. Comparison of experimental results on the Wiki-QA dataset.

| Model        | MAP    | MRR    |
|--------------|--------|--------|
| Study [8]    | 0.7220 | 0.7380 |
| Study [24]   | 0.8290 | 0.8430 |
| Study [32]   | 0.6520 | 0.6652 |
| Study [34]   | 0.7420 | 0.7540 |
| SP           | 0.7934 | 0.7977 |
| ST           | 0.8306 | 0.8361 |
| TP           | 0.8147 | 0.8213 |
| SPT (FMWMF)  | 0.8773 | 0.8840 |
Table 6. Comparison of the large language models.

| Subject | DeepSeek-R1 (16×H20) acc / time | Qwen_MoE-think (8×H20) acc / time | Qwen2.5-1.5B (1×H20) acc / time | FMWMF (1×H20) acc / time | K (Question) | K (Answer) |
|---|---|---|---|---|---|---|
| Biology | 0.86 / 40 min 21 s | 0.92 / 30 min 36 s | 0.219 / 33 min 50 s | 0.67 / 67 s | 2 | 4 |
| Business | 0.83 / 9 min 07 s | 0.83 / 26 min 48 s | 0.25 / 3 min 07 s | 0.73 / 62 s | 2 | 5 |
| Chemistry | 0.76 / 55 min 00 s | 0.86 / 39 min 54 s | 0.28 / 5 min 27 s | 0.27 / 55 s | 4 | 7 |
| Computer Science | 0.79 / 23 min 09 s | 0.81 / 21 min 56 s | 0.11 / 2 min 34 s | 0.76 / 68 s | 2 | 4 |
| Economics | 0.85 / 35 min 13 s | 0.87 / 25 min 41 s | 0.31 / 3 min 27 s | 0.64 / 77 s | 2 | 4 |
| Engineering | 0.64 / 19 min 18 s | 0.71 / 15 min 00 s | 0.26 / 1 min 45 s | 0.68 / 59 s | 2 | 4 |
| Health | 0.69 / 12 min 04 s | 0.69 / 10 min 30 s | 0.19 / 1 min 11 s | 0.58 / 60 s | 2 | 4 |
| History | 0.67 / 36 min 17 s | 0.68 / 25 min 54 s | 0.21 / 3 min 30 s | 0.61 / 78 s | 3 | 6 |
| Law | 0.59 / 38 min 05 s | 0.65 / 29 min 42 s | 0.13 / 3 min 40 s | 0.72 / 93 s | 2 | 4 |
| Math | 0.92 / 34 min 48 s | 0.92 / 7 min 45 s | 0.13 / 3 min 38 s | 0.15 / 51 s | 2 | 2 |
| Other | 0.75 / 12 min 11 s | 0.65 / 9 min 51 s | 0.34 / 1 min 09 s | 0.70 / 61 s | 2 | 4 |
| Philosophy | 0.81 / 28 min 35 s | 0.72 / 2 min 19 s | 0.14 / 2 min 55 s | 0.81 / 84 s | 3 | 6 |
| Physics | 0.87 / 34 min 19 s | 0.92 / 9 min 23 s | 0.24 / 3 min 35 s | 0.22 / 56 s | 2 | 6 |
| Psychology | 0.72 / 11 min 26 s | 0.79 / 9 min 59 s | 0.19 / 1 min 10 s | 0.75 / 63 s | 3 | 6 |

