Research on the Identification of a Winter Olympic Multi‑Intent Chinese Problem Based on Multi‑Model Fusion

Abstract: Traditional web technologies cannot respond effectively to Winter-Olympics-related user questions that contain multiple intentions. To solve this problem, this paper explores BCNBLMATT, a multi-intent recognition model based on multi-model fusion. The model is designed for the complex semantics, strong contextual relevance, and large number of informative features of Chinese question text related to the Winter Olympics, and for the limitations of traditional word vector models, such as insufficient textual representation and the relatively limited feature expression of the attention mechanism. The BCNBLMATT model first obtains a comprehensive feature vector representation of the question text through BERT. Then, a multi-scale text convolutional neural network model and a BiLSTM-Multi-heads attention model (a joint model combining a bidirectional long short-term memory network with a multi-head attention mechanism) are used to capture local features at more scales and contextually critical information features at more levels. Finally, the two kinds of features are concatenated and fused to obtain richer and more comprehensive feature information about the question text, which improves the model's performance in the multi-intent recognition task. Comparative experiments on the Winter Olympics Chinese question dataset and the MixATIS question dataset show that the BCNBLMATT model significantly improves the three metrics of macro-averaged precision, macro-averaged recall, and macro-averaged F1 value, and exhibits better generalization. This study provides an effective solution to the multi-intent recognition task for Winter Olympics questions, overcomes the limitations of traditional models, and provides new ideas for improving the performance of multi-intent recognition.


Introduction
The successful conclusion of the 2022 Beijing Winter Olympics has triggered widespread interest in issues related to the Winter Olympics. The Winter Olympics is held only once every four years and has been held 24 times so far, accumulating rich information resources. Although widely used, general-purpose search engines suffer from heavy information interference, and the quality of their results is difficult to guarantee. Especially when dealing with queries containing multiple question intents, general-purpose search engines often fail to identify all of the user's intents, which makes it difficult to answer the user's questions effectively. Therefore, accurately identifying the user's complete intention has become a core problem that needs to be solved. Against this background, this paper is dedicated to exploring a key problem, the "Winter Olympics Multi-intent Chinese Problem", and describes in detail the "Multi-model Fusion" approach that we adopt.
Winter Olympics multi-intent Chinese questions involve multiple aspects that users may care about at the same time, such as an athlete's age, nationality, and career. Such multi-intent contexts are often characterized by complex semantics, strong contextual associations, and rich information features. Therefore, the goal of our research is to extract the information features of user question texts more comprehensively in order to improve the performance of multi-intent recognition. To achieve this goal, we constructed a multi-intent recognition model, BCNBLMATT, which is an improved model based on BERT and the multi-head attention mechanism. By exploring our problem domain in detail, we show how user inputs can be parsed more accurately and comprehensively under the "multi-model fusion" approach, providing new possibilities for a deep understanding of Winter-Olympics-related questions. The main contributions of this article are as follows:
1. To address the scarcity of corpora for Winter-Olympics-related questions, crawler technology is used to obtain Winter-Olympics-related information, extracting information such as the basic information of the athletes being inquired about, the achievements they have received, their careers, and their competitions. A user question dataset for the Winter Olympics domain, containing single-, two-, and three-intent question data, is automatically generated through customized templates.
2. The Chinese pre-trained language model Bert-base-chinese is used to obtain dynamic text semantic vector representations containing richer semantic information, improving the semantic representation ability of the text.
3. To address the limited feature expression of a single-head attention mechanism, the multi-head attention mechanism is introduced so that the model can obtain more information about the question text from different perspectives, improving the feature expression ability of the model.
4. The improved multi-intent recognition model BCNBLMATT, based on BERT and the multi-head attention mechanism, is proposed. The question text is encoded through Bert-base-chinese to obtain a dynamic text semantic vector representation; the local feature extraction of TextCNN and the context-dependent feature extraction of BiLSTM-Multi-heads attention are then combined to obtain the local and contextual feature information of the question text. Fusing these two kinds of features solves the problem of incomplete feature extraction, and the superiority of this model in multi-intent recognition is verified by comparing it with other models on the Winter Olympics Chinese question dataset and the MixATIS question dataset.

Winter Olympics Field
General-purpose search engines on the internet contain a large amount of data resources, which can lead to data redundancy and clutter and thus reduce the efficiency with which people access information. To address this problem, Luo Ling et al. [1] proposed three Winter Olympics knowledge Q&A system models based on a knowledge graph, TF-IDF, and BERT, and demonstrated through experiments that the overall performance of the BERT model was slightly better than that of the other two. The dataset used in this experiment consisted of factual information crawled by Luo Ling et al. [1] using web crawler technology as the answers, with simple questions generated through templates. The dataset contained mostly simple, single-intention question-and-answer pairs. For the TF-IDF and BERT models, the method was to return the answer most similar to the question. For the knowledge graph method, since the dataset consisted entirely of simple questions, it only needed to predict the head entity and predicate of the question and obtain the answer through a triplet, without considering the complex sentence structure and multiple intentions of the question text.

Multi-Intention Recognition Task
For multi-intention recognition tasks, Xu et al. [2] were among the first to use feature-based log-linear models and perceptron training methods, exploiting intention information shared between different intention combinations. However, when faced with a large number of intention combinations, the problem of sparse training data may arise. Kim et al. [3] proposed a multi-intention recognition method based on single-intention training data, which divides the problem text into single-intention problems, two-intention problems with conjunctions, and multi-intention problems with two clauses without conjunctions. The method utilizes maximum entropy and conditional random field models to perform multi-intention recognition in two stages. However, the model assumes that a user expresses at most two intentions. Later, with the rapid development of deep learning technology, researchers began to use this technology for multi-intention recognition tasks. Yang Chunni et al. [4] first used dependency syntax analysis to determine whether a text was multi-intention, and calculated the distance between the words in the sentence and the keywords of each intention category through term frequency-inverse document frequency (TF-IDF) and trained word vectors to determine the number of intentions. Then, CNN (Convolutional Neural Network) models were used for intention classification. Finally, the true intention of the user was obtained by determining the polarity of the positive and negative emotions of the intention. This method relies too heavily on the results of dependency parsing; because the complexity of Chinese is high, errors in dependency parsing can have a significant impact on the results. Liu Jiao et al. [5] further extracted the deep semantics of text by adding convolutional capsule layers to the capsule network [6] to improve the performance of multi-intention recognition. However, the capsule network is currently immature and only performs well on the MNIST dataset. Some scholars believe that intention and semantic slots are interrelated, so they have proposed models that combine intention detection and semantic slot filling [7,8] to improve the accuracy of both tasks. However, this method requires a lot of manual labeling, and the cost of manually implementing feature representation is high.

Multi-Label Text Classification Task Based on Deep Learning Technology
At present, there is relatively little research on multi-intention recognition, a task that is similar to multi-label text classification. With the rapid development of deep learning technology in recent years, using deep learning methods to solve multi-label classification tasks has become a research hotspot. Deep learning methods automatically extract features through neural network structures such as Convolutional Neural Networks (CNN) and Recurrent Neural Networks (RNN), which reduces labor costs and enhances feature expression ability [9]. CNNs [10] were initially applied in the field of computer vision. Later, Kim [11] proposed the TextCNN model for text classification tasks by modifying the input layer of a CNN. It has only one convolutional layer and one max pooling layer, so it has advantages such as a simple network structure and a fast training speed, and it is widely used in multi-label text classification tasks [12,13]. It can effectively extract the local features of sentences, but it requires fixed-size windows. Therefore, when facing long texts, it is not suitable for capturing the long-distance dependencies of text sequences and is prone to losing key information. RNNs [14] can capture long-distance dependencies and are therefore applied to multi-label text classification tasks [15]. However, RNNs may experience vanishing or exploding gradients when processing text. To address this issue, the RNN variant BiLSTM [16] has emerged, which can effectively capture contextual information, extract the global features of sentences, and improve the classification accuracy of multi-label texts [17].
In text classification tasks, in order to extract key information that is more effective for classification, Zhou et al. [18] proposed using the BiLSTM model to extract key features globally, introducing an attention mechanism to assign higher weights to key information and thereby improving the effectiveness of text classification. Some scholars have combined a CNN with BiLSTM. Some have first obtained the local features of text through a CNN and then input them into an LSTM for classification [19,20], while others have first extracted contextual information through BiLSTM-attention and then input it into a CNN to extract the features for classification [21,22]. Song Zhongshan et al. [23] concatenated the output of a CNN with that of a BiLSTM that introduced the attention mechanism. They argued that the feature extraction capabilities of the BiLSTM and CNN models are individually limited, resulting in a low classification accuracy, and that concatenating the two improves the classification performance of the model by exploiting their complementary advantages. They also verified through comparative experiments that this concatenation method is better than the serial CNN-LSTM and LSTM-CNN combinations.
The above research provides some reference ideas for the Winter Olympics multi-intention problem recognition task. (1) Currently, most classification models use word vector models such as Word2vec when extracting features, which still have limitations in terms of text representation. The vocabulary of Word2vec is fixed from the beginning of training, so it cannot effectively handle the impact of out-of-vocabulary words on text feature learning, and it also ignores the polysemy of words. The BERT [24] pre-trained language model has strong text representation and semantic understanding abilities, which can better solve the above problems, and it also achieves good results in text classification [25][26][27]. Given the particularities of Chinese, this article uses Bert-base-chinese as the pre-trained language model for text representation. (2) Since TextCNN and BiLSTM each extract only limited features, combining their complementary advantages can extract richer and more complete text semantic feature information. (3) Previous models have mostly used single-layer attention mechanisms with relatively limited feature expression. For multi-intention recognition, this article introduces a multi-head attention mechanism [28], aiming to enable the model to obtain more sentence-level information from different representation spaces and improve its feature expression ability.

Overall Architecture of the Model
The BCNBLMATT model framework proposed in this paper is shown in Figure 1 and is divided into three parts. The first part is the text representation layer, which uses Bert-base-chinese to encode the problem text and obtain a feature representation containing richer semantic information. The second part is the feature extraction part, comprising the TextCNN layer and the BiLSTM-Multi-heads attention layer. The TextCNN layer captures the local key features (F1) of the problem text via convolution, while the BiLSTM-Multi-heads attention layer extracts the contextual key features (F2) of the problem text using the multi-head attention mechanism. The third part is the feature fusion and intent classification layer, which concatenates and fuses the two kinds of semantic features obtained in the second part to obtain a richer feature representation and performs intent classification.

Text Representation Layer
Because this article focuses on Chinese datasets, the Bert-base-chinese pre-trained model is adopted. This model is trained on a corpus from Chinese Wikipedia and is used as the baseline. The model uses a multi-layer bidirectional Transformer encoder as the feature extractor. As shown in Figure 2, the question text is first converted into a word vector embedded representation En, and then the Transformer encoder converts it into a text vector Tn rich in semantic features, which serves as the input for the downstream BiLSTM-Multi-heads attention model and TextCNN model.

TextCNN Layer
TextCNN is a convolutional neural network for text classification that applies multiple sliding windows of different sizes to perform convolution and pooling operations on text vectors. It extracts the key information from sentences by capturing the local features of text sequences. As shown in Figure 3, it consists of four parts: an input layer, a convolutional layer, a pooling layer, and an output layer. The text vector Tn produced by Bert-base-chinese is used as the word vector input, and the corresponding feature vectors are obtained by convolution with sliding windows of different sizes. The max pooling operation then selects the largest feature from the feature vector generated by each sliding window, and these features are concatenated to obtain the local feature vector F1 of the text.

[Figure 3: TextCNN structure, comprising word embedding, convolution layer, max pooling, and feature output.]

The convolution is computed as follows:

$$c_i = f\left(w \cdot T_{i:i+h-1} + b\right)$$

where $h$ represents the size of the convolutional kernel, $k$ represents the word vector dimension of each word in the text sequence, $w$ represents the $h \times k$-dimensional weight matrix, $T_{i:i+h-1}$ represents the sliding window of size $h \times k$ formed by concatenating rows $T_i, T_{i+1}, \ldots, T_{i+h-1}$ of the input matrix, $b$ is the bias parameter, and $f$ is the nonlinear activation function. The dot product of $w$ with each successive window $T_{i:i+h-1}$ yields the corresponding feature vector:

$$c = \left[c_1, c_2, \ldots, c_{n-h+1}\right]$$

The pooling layer adopts the max pooling strategy to select the maximum feature value from the feature vector of each sliding window, as follows:

$$\hat{c} = \max(c)$$

where $n$ represents the number of words in the text. Finally, all the pooled feature values are concatenated to obtain the high-level feature vector F1 of the text.
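The convolution and max pooling steps can be traced on a toy input. The following is a minimal pure-Python sketch, not the implementation used in the paper: the ReLU activation, the function names, and all numeric values are assumptions chosen for illustration, and each window size has a single kernel.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def relu(x: float) -> float:
    """Nonlinear activation f (ReLU is an assumption of this sketch)."""
    return max(0.0, x)


def conv_feature_map(T: Matrix, w: Matrix, b: float) -> List[float]:
    """Slide an h x k kernel w over the n x k input T: c_i = f(w . T[i:i+h] + b)."""
    n, h, k = len(T), len(w), len(w[0])
    feature_map = []
    for i in range(n - h + 1):
        dot = sum(w[r][c] * T[i + r][c] for r in range(h) for c in range(k))
        feature_map.append(relu(dot + b))
    return feature_map


def textcnn_features(T: Matrix, kernels: List[Tuple[Matrix, float]]) -> List[float]:
    """Max-pool each kernel's feature map and concatenate the maxima into F1."""
    return [max(conv_feature_map(T, w, b)) for w, b in kernels]


# Toy input: n = 4 words, k = 2 dimensions per word (hypothetical values).
T = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 0.0]]
kernel_h2 = ([[1.0, 0.0], [0.0, 1.0]], 0.0)               # window size h = 2
kernel_h3 = ([[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]], -1.0)  # window size h = 3

F1 = textcnn_features(T, [kernel_h2, kernel_h3])
print(F1)  # [2.0, 4.0]
```

With h = 2 the feature map has n - h + 1 = 3 entries and with h = 3 it has 2, which is why longer kernels see fewer windows of the same sentence.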

BiLSTM-Multi-Heads Attention Layer
The Bidirectional Long Short-Term Memory neural network model (BiLSTM) is a combination of a forward LSTM and a backward LSTM. Although the LSTM model [29] can capture long-distance dependencies, it cannot encode information from back to front. Therefore, the BiLSTM model was proposed to process sequence data in both directions, capturing the bidirectional semantic dependencies of sentences, and it can be used to extract contextual information from text sequences. In order to capture the more important features in sentences, this article adopts a multi-head attention mechanism for key information extraction. The multi-head attention mechanism divides the model into multiple heads, performs multiple independent attention calculations [28], focuses on different aspects of information, and learns more key feature information. It can attend to more positional information and improve the model's expressive ability. As shown in Figure 4, the text vector Tn produced by Bert-base-chinese is input into the BiLSTM, which processes the input sequence in the forward and reverse directions. The two processed feature vectors are concatenated as the output sentence feature vector hn, which contains the information of the sentence in both directions. Then, hn is input into the multi-head attention model, where multiple sets of attention are computed to obtain key feature information from different angles. Finally, the features obtained from each head are concatenated and a linear transformation is applied to obtain the final high-level feature vector F2 of the text. The formulas are as follows:

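$$\overrightarrow{h_i} = \overrightarrow{\mathrm{LSTM}}(T_i), \qquad \overleftarrow{h_i} = \overleftarrow{\mathrm{LSTM}}(T_i), \qquad h_i = \left[\overrightarrow{h_i}; \overleftarrow{h_i}\right]$$

where $T_i$ is the vector of the i-th word, $\overrightarrow{h_i}$ represents the forward LSTM output for the i-th word, and $\overleftarrow{h_i}$ represents the reverse LSTM output for the i-th word. The two are concatenated as the BiLSTM output $h_i$ for the i-th word.

$$\mathrm{head}_i = \mathrm{softmax}\left(\frac{\left(Q W_i^Q\right)\left(K W_i^K\right)^{\top}}{\sqrt{d_k}}\right) V W_i^V$$

$$F_2 = \mathrm{concat}\left(\mathrm{head}_1, \ldots, \mathrm{head}_n\right) W^o$$

where $i$ denotes the i-th attention head, with $n$ attention heads in total ($i = 1, \ldots, n$), $l$ is the word vector dimension, and $d_k = l/n$. The scaling by $\sqrt{d_k}$ shrinks the scores to avoid an excessively large inner product; $W_i^Q$, $W_i^K$, $W_i^V$, and $W^o$ are the weight matrices; and concat represents the concatenation operation. Following [28], $Q$, $K$, and $V$ denote the query, key, and value matrices, which here are all taken from the BiLSTM output.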
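The multi-head attention computation can be sketched in pure Python as follows. This is an illustrative sketch of the standard scaled dot-product formulation of [28] applied as self-attention, not the paper's code: the helper names, the slice-selecting projection matrices, and the identity output matrix are assumptions made so that the result is easy to check by hand.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of a (m x p) and b (p x q)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax over one row of attention scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Scaled dot-product attention: softmax(q k^T / sqrt(d_k)) v."""
    d_k = len(k[0])
    k_t = [list(col) for col in zip(*k)]
    scores = matmul(q, k_t)
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v)


def multi_head_attention(h: Matrix, w_q: List[Matrix], w_k: List[Matrix],
                         w_v: List[Matrix], w_o: Matrix) -> Matrix:
    """Self-attention over h with one (W_Q, W_K, W_V) triple per head."""
    heads = [attention(matmul(h, wq), matmul(h, wk), matmul(h, wv))
             for wq, wk, wv in zip(w_q, w_k, w_v)]
    # Concatenate the heads position by position, then apply W_o.
    concat = [sum((head[t] for head in heads), []) for t in range(len(h))]
    return matmul(concat, w_o)


# Toy setting: 3 positions, l = 4 features, n = 2 heads, so d_k = l / n = 2.
H = [[1.0, 2.0, 3.0, 4.0] for _ in range(3)]
# Hypothetical projections: head 0 reads features 0-1, head 1 reads features 2-3.
W_Q = W_K = W_V = [
    [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0], [0.0, 0.0]],
    [[0.0, 0.0], [0.0, 0.0], [1.0, 0.0], [0.0, 1.0]],
]
W_O = [[1.0 if r == c else 0.0 for c in range(4)] for r in range(4)]

F2 = multi_head_attention(H, W_Q, W_K, W_V, W_O)
```

Because every position in this toy input holds the same vector, each head averages identical rows, so every row of the output is approximately [1, 2, 3, 4]. With learned projections, each head would instead weight positions differently.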

Feature Fusion and Intention Classification
The features extracted from the TextCNN layer and the BiLSTM-Multi-heads attention layer are concatenated and fused to obtain the final feature vector F, which is passed into the fully connected network. The output dimension of the fully connected network is equal to the number of intention categories, and the prediction results obtained through the fully connected network are normalized. Since the intentions predicted in this paper are treated as independent of one another, the sigmoid function is used for processing. An intention is selected when its probability is greater than or equal to 0.5. The formulas are as follows:

$$F = \mathrm{concat}(F_1, F_2)$$

$$P = \mathrm{sigmoid}\left(W_f F\right), \qquad P = (P_1, \ldots, P_s)$$

$$P_e = \begin{cases} 1, & P_e \ge 0.5 \\ 0, & \text{otherwise} \end{cases}$$

where $W_f$ is the weight matrix, sigmoid is the nonlinear activation function, $s$ is the total number of intention categories, $P$ represents the probability of each intention category for the problem text, and $P_e$ represents the intention category to which the problem text belongs. That is, each intention category with a probability greater than or equal to 0.5 is set to 1, and the others are set to 0, indicating that the problem text does not belong to those categories.
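The fusion and thresholding step can be illustrated with a small pure-Python sketch. The function name, the weight values, and the feature values are hypothetical, and no bias term is used because the text names only the weight matrix.

```python
import math
from typing import List, Tuple


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def classify_intents(f1: List[float], f2: List[float], w_f: List[List[float]],
                     threshold: float = 0.5) -> Tuple[List[float], List[int]]:
    """Concatenate F1 and F2, apply one linear layer, and threshold each sigmoid output."""
    f = f1 + f2  # feature fusion by concatenation
    probs = [sigmoid(sum(w * x for w, x in zip(row, f))) for row in w_f]
    labels = [1 if p >= threshold else 0 for p in probs]
    return probs, labels


# Hypothetical features and weights for s = 3 intention categories.
F1 = [1.0, -1.0]   # local features from the TextCNN layer
F2 = [0.5]         # contextual features from the BiLSTM-Multi-heads attention layer
W_f = [
    [2.0, 0.0, 0.0],   # logit  2.0 -> probability about 0.88
    [0.0, 1.0, 0.0],   # logit -1.0 -> probability about 0.27
    [0.0, 0.0, 0.0],   # logit  0.0 -> probability exactly 0.5
]

probs, labels = classify_intents(F1, F2, W_f)
print(labels)  # [1, 0, 1]
```

Each category is decided independently, so any number of intentions (including none) can be selected for one question; the third category shows that a probability of exactly 0.5 is still accepted.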

Experiment and Result Analysis
In this paper, we first divided the dataset into training, validation, and test sets. During training, we used the AdamW optimizer, a learning rate scheduler, and the binary cross-entropy loss function (BCEWithLogitsLoss). Through iterative training on the training set, we gradually adjusted the model parameters to improve the performance. At the end of each training epoch, we evaluated the model on the validation set, mainly using the Macro_F1 metric, and retained the model that performed best on the validation set. We chose Macro_F1 as the selection metric because it balances the precision and recall of the model, thus providing a more comprehensive picture of its performance. Finally, we used the test set for the final evaluation of the models using the three metrics Macro_P, Macro_R, and Macro_F1. To demonstrate the effectiveness and accuracy of the proposed model, four models were selected for comparison: BERT, BERT+TextCNN (Bert-textcnn for short), BERT+BiLSTM+Multi-head attention (Bert-blmatt), and BERT+TextCNN+BiLSTM-attention (Bert-cnn+blatt). To strengthen the comparison, we also evaluated the models on a second dataset.
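For readers unfamiliar with the loss named above, the following pure-Python sketch computes binary cross-entropy directly from logits with mean reduction, which is the quantity PyTorch's BCEWithLogitsLoss returns by default. The function name and the example values are illustrative assumptions; the sketch is not the training code used in the paper.

```python
import math
from typing import Sequence


def bce_with_logits(logits: Sequence[float], targets: Sequence[float]) -> float:
    """Mean binary cross-entropy over all labels, computed stably from raw logits.

    Per element: max(x, 0) - x * y + log(1 + exp(-|x|)),
    which equals -[y * log(sigmoid(x)) + (1 - y) * log(1 - sigmoid(x))].
    """
    total = 0.0
    for x, y in zip(logits, targets):
        total += max(x, 0.0) - x * y + math.log1p(math.exp(-abs(x)))
    return total / len(logits)


# One question with three intention labels (hypothetical logits and targets).
loss = bce_with_logits([2.0, -1.0, 0.0], [1.0, 0.0, 1.0])
```

Working from logits rather than from sigmoid outputs avoids taking the logarithm of a probability that has rounded to 0 or 1.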

Dataset
Question data about the Winter Olympics are sparse, and the questions mainly ask about the basic information, achievements, and careers of athletes and about the competitions. There are relatively few question patterns in this kind of vertical domain, so this paper first collected information containing the names of athletes, the competition results, and the names of the events of previous Winter Olympics. Specific templates were then set up, and entity replacement, synonym replacement, sentence splicing, and other methods were used to automatically generate, in batches, a user question dataset for the Winter Olympics domain, in which the numbers of questions containing three intentions, two intentions, and a single intention were 2000, 1483, and 1494, respectively. To validate the generalization of the present model, this paper also adopted the multi-intention dataset MixATIS provided by Qin [30]. Since it is an English dataset, this paper translated it into a Chinese question dataset through Baidu Translate; in the translated dataset, the numbers of questions containing three intentions, two intentions, and one intention were 3983, 9371, and 1366, respectively. Each dataset is divided into a training set, a validation set, and a test set, the sizes of which are shown in Table 1. Examples from the Winter Olympics dataset are shown in Table 2, and examples from the MixATIS dataset are shown in Table 3.
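The template-based generation can be sketched as follows. This is only an illustration of entity replacement and sentence splicing under stated assumptions: the intent names, the English templates, the placeholder athlete names, and the "and" conjunction are all hypothetical, whereas the actual dataset is in Chinese and also uses synonym replacement.

```python
from itertools import combinations
from typing import Dict, List, Tuple

# Hypothetical single-intent templates; {name} is replaced by an athlete entity.
INTENT_TEMPLATES: Dict[str, str] = {
    "age": "how old is {name}",
    "nationality": "which country does {name} represent",
    "medals": "which medals has {name} won",
}


def generate_questions(names: List[str], max_intents: int = 3) -> List[Tuple[str, List[int]]]:
    """Build questions with 1..max_intents intents and a multi-hot label for each."""
    intents = sorted(INTENT_TEMPLATES)
    samples = []
    for name in names:
        for k in range(1, max_intents + 1):
            for combo in combinations(intents, k):
                # Sentence splicing: join the single-intent clauses with a conjunction.
                text = " and ".join(INTENT_TEMPLATES[i].format(name=name) for i in combo) + "?"
                label = [1 if i in combo else 0 for i in intents]
                samples.append((text, label))
    return samples


samples = generate_questions(["Athlete A", "Athlete B"])
```

Each entity yields 3 single-intent, 3 two-intent, and 1 three-intent question here, and the multi-hot label records which intents appear in the spliced question.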

Experimental Parameters
The model and benchmark experiments were conducted on the Windows 10 operating system and implemented using the PyTorch deep learning framework. The parameter settings of this model are shown in Table 4.

Evaluating Indicator
Because this article is aimed at identifying multiple intentions, the macro-averaged precision (Macro_P), macro-averaged recall (Macro_R), and macro-averaged F1 value (Macro_F1) were used as evaluation indicators. The confusion matrix is shown in Table 5. In the definitions of the evaluation indicators, i represents the i-th intention category and N represents the total number of intention categories.
Macro_P is the macro-averaged precision: the precision of the model's predictions is computed for each label and averaged over the N labels. The closer this value is to 1, the better the model's precision.
Macro_R is the macro-averaged recall: the recall, which measures the coverage of the model's predictions, is computed for each label and averaged over the N labels to obtain the overall coverage. The closer this value is to 1, the better the model's recall.
Macro_F1 is the macro-averaged F1 value, which reflects both the precision and the coverage of the model's predictions for each label. It is the harmonic mean of precision and recall. The closer this value is to 1, the better the model's performance.
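The three indicators can be computed from multi-hot labels as in the sketch below. It is a hedged illustration rather than the paper's evaluation code: it assumes Macro_F1 is the mean of the per-category F1 values, which is one common convention (another takes the harmonic mean of Macro_P and Macro_R), and the function name and example labels are hypothetical.

```python
from typing import List, Tuple


def macro_metrics(y_true: List[List[int]], y_pred: List[List[int]]) -> Tuple[float, float, float]:
    """Return (Macro_P, Macro_R, Macro_F1) for multi-hot true and predicted labels."""
    n_classes = len(y_true[0])
    precisions, recalls, f1s = [], [], []
    for i in range(n_classes):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t[i] == 1 and p[i] == 1)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t[i] == 0 and p[i] == 1)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t[i] == 1 and p[i] == 0)
        p_i = tp / (tp + fp) if tp + fp else 0.0
        r_i = tp / (tp + fn) if tp + fn else 0.0
        f_i = 2 * p_i * r_i / (p_i + r_i) if p_i + r_i else 0.0
        precisions.append(p_i)
        recalls.append(r_i)
        f1s.append(f_i)
    return (sum(precisions) / n_classes,
            sum(recalls) / n_classes,
            sum(f1s) / n_classes)


# Four questions, three intention categories (hypothetical labels).
y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0], [0, 0, 1]]
y_pred = [[1, 0, 1], [0, 1, 1], [1, 0, 0], [0, 0, 1]]
macro_p, macro_r, macro_f1 = macro_metrics(y_true, y_pred)
```

In this example the per-category precisions are 1, 1, and 2/3 and the recalls are 1, 1/2, and 1, so Macro_P is 8/9 and Macro_R is 5/6. Every category counts equally regardless of how many questions carry it, which is what distinguishes macro averaging from micro averaging.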

The Influence of the Number of Attention Heads and the Size of Convolutional Kernels on the Model's Intention Recognition Performance
To verify the effect of the number of attention heads and the size of the convolutional kernels on intention recognition, this article varied one factor at a time while holding the other fixed. First, the convolutional kernel size was fixed at [3,4,5], and the number of attention heads was set to 4, 8, 12, and 16 in turn. The results are shown in the first row of Figure 5: the model performed best when the number of attention heads was 12. Therefore, in the subsequent experiments, the number of attention heads was fixed at 12, and the convolutional kernel sizes were set to [2,3,4], [3,4,5], and [4,5,6] for comparative experiments. The results are shown in the second row of Figure 5: the model had the best intention recognition effect when the convolutional kernel size was [3,4,5] or [3,4,5,6]. Because the two results were very close, this model set the convolutional kernel size to [3,4,5] in order to reduce computational costs.

Comparative Experiment
This article uses comparisons with the following models:

Experimental Results and Analysis
This article tested the macro average precision (Macro_P), macro average recall (Macro_R), and macro average F1 value (Macro_F1) of each model on the test set of the Winter Olympics Chinese question dataset, as shown in Table 6.
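As an illustration of how such macro-averaged scores can be computed for multi-intent predictions, here is a minimal sketch. The function name `macro_prf`, the set-of-labels input format, and the example intent names are assumptions made for this example, and Macro_F1 is taken as the mean of per-category F1 scores.

```python
from typing import Iterable, List, Set, Tuple


def macro_prf(
    y_true: List[Set[str]],
    y_pred: List[Set[str]],
    labels: Iterable[str],
) -> Tuple[float, float, float]:
    """Return (Macro_P, Macro_R, Macro_F1) for multi-label predictions."""
    labels = list(labels)
    precisions, recalls, f1s = [], [], []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if label in t and label in p)
        fp = sum(1 for t, p in pairs if label not in t and label in p)
        fn = sum(1 for t, p in pairs if label in t and label not in p)
        # Undefined ratios (empty denominators) are counted as 0.
        p_i = tp / (tp + fp) if tp + fp else 0.0
        r_i = tp / (tp + fn) if tp + fn else 0.0
        f_i = 2 * p_i * r_i / (p_i + r_i) if p_i + r_i else 0.0
        precisions.append(p_i)
        recalls.append(r_i)
        f1s.append(f_i)
    n = len(labels)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


if __name__ == "__main__":
    # Hypothetical intent names, for illustration only.
    truth = [{"schedule", "venue"}, {"ticket"}, {"venue"}]
    preds = [{"schedule"}, {"ticket", "venue"}, {"venue"}]
    print(macro_prf(truth, preds, ["schedule", "venue", "ticket"]))
```

Each category contributes equally to the average regardless of how many questions carry it, which is why macro averaging is sensitive to rare intents.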

By comparing the performances of the different models on the Winter Olympics multi-intent recognition task in Table 6, we gained a deeper understanding of their strengths and limitations. First, we used Bert as the baseline model; it performed well on Macro_P (98.40%) and Macro_F1 (95.51%), highlighting Bert's strong ability to deal with multi-intent problems.
Further comparison experiments showed that introducing either the TextCNN model or the BiLSTM-Multi-heads attention model alone effectively improved the performance of the multi-intent recognition model, with improvements of 0.96% and 1.14% in the Macro_F1 value, respectively. However, our in-depth analysis revealed that this enhancement may be affected by the characteristics of the problem dataset in real tasks. Because the problem dataset contains longer texts with more contextual associations, we paid special attention to the performances of the TextCNN and BiLSTM-Multi-heads attention models when introduced separately.
The TextCNN model focuses on local key features of the text, which suits short texts and situations where local features matter most. However, in multi-intent recognition tasks with long texts and strong contextual relevance, TextCNN may be limited by its local focus, resulting in a relatively weak ability to capture overall contextual features. In contrast, the BiLSTM-Multi-heads attention model performs better in this case, as it is good at capturing long-range semantic features and is more suitable for tasks with high contextual relevance.
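The local character of TextCNN features can be illustrated with a small pure-Python sketch of multi-scale convolution followed by max-over-time pooling. The function names, the ReLU activation, and the one-filter-per-width setup are assumptions for illustration; an actual TextCNN uses many learned filters per width on top of the BERT token vectors.

```python
from typing import Dict, List

Vector = List[float]


def conv1d_max_pool(tokens: List[Vector], kernel: List[Vector]) -> float:
    """Slide one filter spanning len(kernel) tokens and keep its max response."""
    width = len(kernel)
    if len(tokens) < width:
        raise ValueError("sequence is shorter than the kernel width")
    responses = []
    for start in range(len(tokens) - width + 1):
        window = tokens[start:start + width]
        # Dot product between the filter and the current window of tokens.
        score = sum(
            w * x
            for k_row, t_row in zip(kernel, window)
            for w, x in zip(k_row, t_row)
        )
        responses.append(max(0.0, score))  # ReLU (assumed activation)
    # Max-over-time pooling: only the strongest local match survives.
    return max(responses)


def multi_scale_features(
    tokens: List[Vector], kernels_by_width: Dict[int, List[List[Vector]]]
) -> List[float]:
    """Concatenate pooled responses of all filters, ordered by kernel width."""
    return [
        conv1d_max_pool(tokens, kernel)
        for width in sorted(kernels_by_width)
        for kernel in kernels_by_width[width]
    ]
```

Because each filter only ever sees a window of 3, 4, or 5 tokens, dependencies that span a longer distance are invisible to it, which is the limitation discussed above.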
In terms of model fusion, by combining the BiLSTM-Multi-heads attention and TextCNN models, the BCNBLMATT model improved the Macro_P, Macro_R, and Macro_F1 values by 1.2%, 2.52%, and 2.61%, respectively, compared to the single model. This confirms the advantage of multi-model fusion, which makes full use of the strengths of each model to extract the semantic features of the problem text more comprehensively, thus improving the overall performance.
We further validated the effectiveness of the Multi-head Attention mechanism in the BCNBLMATT model. Compared to the regular Attention mechanism, the Multi-head Attention mechanism improved the Macro_F1 value by 1.9%. This provides empirical support for the choice of attention mechanism, and in particular for the effectiveness of Multi-head Attention in dealing with multi-intention problems. The result underlines the importance of selecting the Multi-head Attention mechanism when solving complex tasks.
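To make the difference between single-head and multi-head attention concrete, the following is a simplified pure-Python sketch of multi-head scaled dot-product self-attention. It assumes identity projections: the learned query, key, value, and output matrices of a real multi-head layer are omitted, and each head simply attends over its own slice of the feature dimension. All names are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Scaled dot-product attention: softmax(q k^T / sqrt(d)) v."""
    d = len(q[0])
    out = []
    for qi in q:
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) for kj in k]
        weights = softmax(scores)
        out.append([
            sum(w * vj[c] for w, vj in zip(weights, v))
            for c in range(len(v[0]))
        ])
    return out


def multi_head_self_attention(x: Matrix, num_heads: int) -> Matrix:
    """Split features into num_heads slices, attend per slice, concatenate."""
    d_model = len(x[0])
    if d_model % num_heads != 0:
        raise ValueError("feature size must be divisible by num_heads")
    d_head = d_model // num_heads
    head_outputs = []
    for h in range(num_heads):
        lo, hi = h * d_head, (h + 1) * d_head
        part = [row[lo:hi] for row in x]  # this head's view of every token
        head_outputs.append(attention(part, part, part))
    # Concatenate the heads back to the original feature size per token.
    return [
        [value for head in head_outputs for value in head[t]]
        for t in range(len(x))
    ]
```

With `num_heads=1` this reduces to ordinary self-attention with one weight distribution per token; with several heads, each feature slice gets its own distribution, so different heads can focus on different parts of the question.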
In the comparative experiments on the MixATIS dataset, we first focused on the performance of Bert as a benchmark model. Even on a multi-intent recognition task in a different domain, Bert still performed very well, reaching 97.62%, 94.74%, and 95.78% in terms of the Macro_P, Macro_R, and Macro_F1 values, respectively. This indicates that Bert has a strong generalization ability and can handle natural language understanding tasks in different domains.
Subsequently, we introduced the BCNBLMATT model and observed its performance on the MixATIS dataset. The results showed that the BCNBLMATT model achieved 97.70%, 97.04%, and 97.20% in terms of the Macro_P, Macro_R, and Macro_F1 values, respectively. This further validates its advantage in solving multi-intention problems.
Taken together, the BCNBLMATT model maintains an excellent performance on multi-intent recognition tasks in different domains, laying a solid foundation for its generalization in practical applications and making it a reliable solution with wide applicability.

Conclusions and Outlook
In this study, we conducted an in-depth performance comparison and detailed analysis of a multi-intent recognition task for Winter Olympics problems. By examining the performances of different models, we found that each model had different advantages and disadvantages in dealing with long text and contextual relevance. In particular, for long, context-dependent text such as Winter Olympics multi-intent questions, the TextCNN model focused more on local key features, while the BiLSTM-Multi-heads attention model was better at capturing long-distance semantic features. In terms of model fusion, we used the self-constructed BCNBLMATT model, which combines the advantages of the BiLSTM-Multi-heads attention and TextCNN models; compared to a single model, it improved the Macro_P, Macro_R, and Macro_F1 values by 1.2%, 2.52%, and 2.61%, respectively. This highlights the advantage of the multi-model fusion strategy, which extracts the semantic features of the problem text more comprehensively and meets the challenges of complex tasks. In addition, we verified that the Multi-head Attention mechanism improved the Macro_F1 value by 1.9% compared to the conventional Attention mechanism, highlighting the importance of choosing the Multi-head Attention mechanism when solving multi-intent problems.
Overall, by exploring the performance of the BCNBLMATT model and integrating the advantages of different models, this study provides a more comprehensive understanding of the multi-intent recognition task for Winter Olympic problems. These findings offer guidance for academic research and insights for practical applications. Future research can consider integrating the BCNBLMATT model into a Winter Olympics Q&A model to solve more complex problems, provide faster, more convenient, and more accurate information query services for the Winter Olympics, and support the development of China's sports industry.

Figure 5. The impact of different parameters on indicator values.

(1) Bert: the BERT pre-trained language model alone was used for feature extraction and intention classification.
(2) Bert-textcnn: BERT was used as the pre-trained language model, and TextCNN was used for local feature extraction and intention classification.
(3) Bert-blmatt: To verify the effectiveness of the proposed model, a local decomposition (ablation) validation was performed, with Bert-blmatt as a comparative experiment. BERT was used as the pre-trained language model, global semantic features were extracted through BiLSTM, noise was reduced through Multi-heads attention, and key features were extracted for intention recognition.
(4) Bert-cnn+blatt: To verify the effectiveness of Multi-heads attention, this article replaced Multi-heads attention with a regular Attention mechanism for comparative experiments. BERT was used as the pre-trained language model, and the local features of the text and the key features in the global semantic information were extracted through TextCNN and BiLSTM-attention. Finally, the two were concatenated and fused for intention recognition.
(5) BCNBLMATT: Using BERT as the pre-trained language model, the problem text was transformed into text semantic features. These features were then used to obtain the local features of the text and the key features in the global semantic information through TextCNN and BiLSTM-Multi-heads attention. Finally, the two were concatenated and fused for intention recognition.
