Sentence Representation Method Based on Multi-Layer Semantic Network

Abstract: With the development of artificial intelligence, more and more people hope that computers can understand human language through natural language technology, learn to think like human beings, and finally replace human beings in highly difficult tasks that require cognitive ability. As a key technology of natural language understanding, sentence representation and reasoning technology focuses mainly on the sentence representation method and the reasoning model. Although performance has improved, problems remain, such as incomplete expression of sentence semantics, insufficient depth of the reasoning model, and lack of interpretability of the reasoning process. In this paper, a multi-layer semantic representation network is designed for sentence representation. A multi-attention mechanism obtains the semantic information of different levels of a sentence, and word order information is integrated by adding a relative position mask between words to reduce the uncertainty caused by word order. Finally, the method is verified on the tasks of text implication recognition and emotion classification. The experimental results show that the multi-layer semantic representation network can improve the accuracy and comprehensiveness of sentence representation.


Introduction
Natural language inference (NLI) has become one of the most important benchmark tasks in the field of natural language understanding because of the complex language understanding and in-depth reasoning it involves [1]. Natural language reasoning technology is widely used in automatic reasoning [2], machine translation [3], question answering systems [4], and large-scale content analysis. Compared with computer vision and speech recognition, natural language reasoning has not yet reached a high level of maturity because of its technical difficulty and complex application scenarios [5,6]. Once natural language reasoning achieves a breakthrough and realizes truly barrier-free communication between humans and machines, human quality of life will be greatly improved.
Research on sentence representation methods focuses on improving the performance of the sentence representation module to obtain complete and accurate semantic coding. The sentence representation module aims to map natural language sentences into a dense vector space while keeping the expressed semantics unchanged. It transforms the complex logical reasoning process into computing the similarity between sentences and solving the relationship between them [7-9].
Before the emergence of sentence-level representation technology, sentence semantic representation used Continuous Bag of Words (CBOW) embedded distributed representation technology based on word coding to represent the text as a fixed-length sentence vector.

Sample data of the SNLI dataset:
Premise: A man selling donuts to a customer during a world exhibition event held in the city of Angeles. Label: contradiction (annotator labels: C C C C C). Hypothesis: A woman drinks her coffee in a small café.
Premise: A man in a blue shirt standing in front of a garage-like structure painted with geometric designs. Label: neutral (annotator labels: N E N N N). Hypothesis: A man is repainting a garage.

Multi-NLI Dataset
The Multi-NLI dataset [30] contains 433 k text pairs. Unlike the SNLI dataset [31], it covers data closer to real life, such as novels and telephone speech. Sample data are shown in Table 2. The dataset contains 10 categories of data; according to whether the same category appears in both the training set and the test set, it is divided into matched and mismatched sets. Table 2. Sample data of Multi-NLI dataset.

Category: Novel. Premise: The Old One always comforted Ca'daan, except today. Label: neutral. Hypothesis: Ca'daan knew the Old One very well.
Category: Message. Premise: Your gift is appreciated by each and every student who will benefit from your generosity. Label: neutral. Hypothesis: Hundreds of students will benefit from your generosity.
Category: Cell. Premise: yes now you know if everybody like in August when everybody's on vacation or something we can dress a little more casual or. Label: contradiction. Hypothesis: August is a black out month for vacations in the company.
The text implication task is carried out on both the matched and mismatched sets. The data are divided into a training set (392,702 samples) and matched/mismatched validation sets (9815/9832 samples). Since the test set labels cannot be obtained, this paper uses the validation sets in place of test sets.

Yelp Dataset
The Yelp dataset is public data provided by Yelp, the largest review website in the United States. It is mainly used for recommendation systems and sentiment analysis tasks and is provided in JSON and SQL formats, which can be used directly in applications. This paper randomly selects 500,000 samples from the Yelp dataset as the training set, 2000 samples as the validation set, and 2000 samples as the test set.

Methods
Natural language semantics refers to the meaning contained in natural language sentences. Semantics expresses the meaning of words and covers logical relationships, such as causality and transition, between things. In short, semantics is the description and logical representation of things [32,33].

Semantic Extraction Based on Bidirectional Long-Short Memory Network
The core of the sentence representation method is the design of the representation network. Sentence representation based on neural networks has gradually become the main approach. Although a recurrent neural network can retain information between adjacent words, it cannot learn dependencies between words that are far apart in a sentence. The emergence of long short-term memory (LSTM) solves the problem that recurrent neural networks cannot learn long-term information. LSTM and RNN both have the chain shape of repeating neural network modules; the difference lies in the unit module. As shown in Figure 1, three gates are added to the LSTM unit: the input gate, the forget gate, and the output gate. LSTM first determines what information should be discarded at time t through the forget gate, then determines what information should be saved at time t through the input gate, then updates the cell state according to the forget gate and input gate, and finally outputs the result at time t and the hidden layer state h_t. The hidden state h_t serves as input to the next step. Suppose the input at time t is x_t, the candidate cell information is C̃_t, and the output is o_t. The calculation formula of each gate is shown in Formulas (1)-(6).
(1) Forget gate: f_t = σ(W_f [h_{t−1}, x_t] + b_f)
(2) Input gate: i_t = σ(W_i [h_{t−1}, x_t] + b_i)
(3) Candidate cell state: C̃_t = tanh(W_C [h_{t−1}, x_t] + b_C)
(4) Cell state update: C_t = f_t ∗ C_{t−1} + i_t ∗ C̃_t
(5) Output gate: o_t = σ(W_o [h_{t−1}, x_t] + b_o)
(6) Hidden state: h_t = o_t ∗ tanh(C_t)
Among them, W_f, W_i, W_o, and W_C are the weight parameters of each gate in the unit module; b_f, b_i, b_o, and b_C are the corresponding bias variables; σ and tanh are the activation functions of the network; h_{t−1} and C_{t−1} are the hidden layer and cell states at the previous time step; and C̃_t represents the candidate information to be written to the cell at time t.
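As a concrete illustration, the gate equations above can be sketched in NumPy. This is a minimal sketch, not the paper's implementation; the weight shapes, parameter ordering, and function names are illustrative assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM step following Formulas (1)-(6): forget gate, input gate,
    candidate cell state, cell update, output gate, hidden state."""
    W_f, b_f, W_i, b_i, W_C, b_C, W_o, b_o = params
    z = np.concatenate([h_prev, x_t])        # [h_{t-1}, x_t]
    f_t = sigmoid(W_f @ z + b_f)             # (1) forget gate
    i_t = sigmoid(W_i @ z + b_i)             # (2) input gate
    c_tilde = np.tanh(W_C @ z + b_C)         # (3) candidate cell state
    c_t = f_t * c_prev + i_t * c_tilde       # (4) cell state update
    o_t = sigmoid(W_o @ z + b_o)             # (5) output gate
    h_t = o_t * np.tanh(c_t)                 # (6) hidden state
    return h_t, c_t
```

Each weight matrix acts on the spliced vector [h_{t−1}, x_t], so its shape is (hidden_dim, hidden_dim + input_dim); the returned h_t is passed to the next time step.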
The disadvantage of LSTM in sentence representation modeling is that it considers word order in only a single direction, which is insufficient when more fine-grained classification is needed.
The bi-directional long short-term memory network (BiLSTM) considers the hidden layer states of both directions of the LSTM at the same time, h_t = [h_Lt, h_Rt], and solves the problem of bidirectional semantic dependency. The structure of BiLSTM is shown in Figure 2. The forward and backward network structures can be identical or can differ from each other.

Design of Semantic Representation Network Based on Multi Attention
The multi-layer semantic representation network consists of three steps: sentence vectorization, semantic information extraction and reinforcement, and sentence embedding representation. Specifically, bidirectional long short-term memory networks are used as the basic framework. The multi-attention mechanism is used to construct a multi-layer semantic representation network to obtain semantic information at different levels of a sentence. Relative position information is added to the network to strengthen the core semantic relationships in the sentence and reduce the interference of redundant information.

Natural Language Sentence Vectorization
The function of sentence vectorization is to obtain the initial embedded representation of the sentence; it serves as the embedding layer of the semantic representation network. This part is composed of the Seq2Word module, the Word2Char module, and the embedding layer. The network structure is shown in Figure 3.

Suppose the input sentence is S = (W_1, W_2, ..., W_L), where W_i (i = 1, 2, ..., L) is the ith word of sentence S and L is the length of the sentence. Each word W_i is fed into the embedding layer of the multi-layer semantic representation network to calculate the initial embedded representation H of sentence S. The calculation process of each module is as follows:
1. Seq2Word module
This module obtains the word-vector embedding corresponding to each word in sentence S, i.e., word-level vectorization. The sentence is mapped from natural language space to vector space by splicing all of its word vector representations. The specific steps are as follows: (1) Word2vec technology [10] combined with GloVe-840B-300D [34] is used to train the word vector embedding representation of the dictionary; (2) according to the word's position in the dictionary, the embedding vector w_i ∈ R^(d_w×1) corresponding to each word in the sentence is obtained.
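The Seq2Word lookup can be sketched as follows. This is a toy illustration under assumed names (`seq2word`, `vocab`); the Gaussian initialization for out-of-vocabulary words mirrors the parameter settings described later in the paper:

```python
import numpy as np

def seq2word(sentence, vocab, embeddings, seed=0):
    """Seq2Word sketch: look up each word's pretrained vector by its
    dictionary position; out-of-vocabulary words get a Gaussian vector."""
    rng = np.random.default_rng(seed)
    rows = []
    for w in sentence.split():
        if w in vocab:
            rows.append(embeddings[vocab[w]])                  # dictionary lookup
        else:
            rows.append(rng.normal(size=embeddings.shape[1]))  # Gaussian OOV init
    return np.stack(rows)                                      # shape (L, d_w)
```

For example, with a 3-word vocabulary and identity embeddings, `seq2word("a b a", {"a": 0, "b": 1}, np.eye(3))` returns a (3, 3) matrix whose first and third rows are the vector for "a".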

Word2Char module
This module uses convolution networks of different sizes to extract character features in words [35] and obtain word embedding representations at the character level. The specific steps are as follows: (1) According to a given character dictionary (including 24 letters and common punctuation and special characters), the word W_i is split into a list of characters; (2) after a single-layer convolution network and a pooling layer, the character vector c_i ∈ R^(m×d_c) of the word W_i is obtained, where d_c is the dimension of each character mapping and m is the number of pooling outputs. The convolution layer uses an n × h convolution kernel, where n is the dimension of the character embedding vector and h is the size of the convolution window.
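A minimal sketch of one character-level convolution with max pooling, under assumed names (`word2char`, `char_to_id`) and a single assumed kernel width h, follows; the paper's actual network uses several window sizes in parallel:

```python
import numpy as np

def word2char(word, char_to_id, char_emb, kernel, h=3):
    """Word2Char sketch: embed characters, slide one width-h convolution
    over the character sequence, then max-pool over positions."""
    ids = [char_to_id.get(ch, 0) for ch in word]     # id 0 for unknown chars
    E = char_emb[ids]                                # (len(word), n)
    if len(ids) < h:                                 # zero-pad short words
        E = np.vstack([E, np.zeros((h - len(ids), E.shape[1]))])
    windows = [E[i:i + h].reshape(-1) for i in range(E.shape[0] - h + 1)]
    conv = np.stack([kernel @ w for w in windows])   # (positions, channels)
    return conv.max(axis=0)                          # (channels,) char vector
```

The kernel has shape (channels, h × n), so each window of h character vectors is flattened before the linear map, and max pooling removes the dependence on word length.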

Embedded layer
First, the word vector w_i obtained from the Seq2Word module and the character vector c_i obtained from the Word2Char module are spliced to form the semantic vector e_i ∈ R^(d_e) of the word W_i. Finally, the word vector sequence x = {e_1, ..., e_t, ..., e_L} of sentence S is obtained, where x ∈ R^(L×d_e), d_e is the dimension of the semantic vector e_i, and d_e is the sum of d_w and d_c.
An N-layer BiLSTM network is used to further process the word vector sequence x of the sentence and obtain its contextual information. For each layer of the network, the hidden layer states of the forward and backward networks, →h_t and ←h_t, are spliced to obtain the hidden layer state h_t = [→h_t, ←h_t], where u is the number of neurons in the forward (backward) network of each layer. After N layers, the hidden layer state of the top layer is output as the embedded representation H ∈ R^(N×2u) of sentence S.

Multi-Layer Semantic Information Extraction and Enhancement
This part is the core of the multi-layer semantic representation network. Its purpose is to obtain as many layers of sentence semantic information as possible. As shown in Figure 4, this paper constructs a multi-layer attention network using the multi-head attention mechanism and emphasizes sentence word order by adding a position mask. Finally, the fused multi-layer semantic vector is used as the input of the representation layer.

Multi-layer semantic information extraction
The single-layer attention network can only capture the semantic information of a single layer. Influenced by Ling [36], we build a semantic extraction network based on the multi-attention mechanism to capture more semantic information. For layer τ in the multilayer attention network, the attention weight vector a_τ ∈ R^(N×2u) of the layer is calculated first; the calculation method of a_τ is shown in Formula (8).
where τ = 1, 2, ..., λ; λ denotes the number of layers of the multilayer attention network; α_τ ∈ R^(d_a×2u), β_τ ∈ R^(2u×d_a), α'_τ ∈ R^(d_a×1), and β'_τ ∈ R^(2u×1) are the parameters of layer τ; T denotes transposition; σ is the activation function; and H ∈ R^(N×2u) is the embedded representation of sentence S. Although the attention mechanism can capture important semantic information in a sentence, its calculation is based on the bag-of-words model and does not attend to word order. Suppose the self-attention layer mainly captures the phrase components of the sentence. As long as the phrases retain strong internal relations, the results obtained from a shuffled sentence are consistent with those obtained before shuffling, even though, based on the attention weight vector, the two sentences are understood very differently. Therefore, word order information is added to the attention weight vector a_τ to strengthen the core semantic relationships of sentences.
2. Word order information enhancement
Word order information can be a deterministic function of position [37] or can be obtained by representation learning. Considering the network structure, this paper obtains word order information by calculating the relative distance between words, which does not increase the network complexity. First, the absolute distance D_ij = |i − j| between any two words in the word vector sequence x corresponding to sentence S is calculated, where i, j = 1, 2, ..., L and L is the number of words contained in the sentence.
Then, min-max normalization is applied to the distance D_ij to obtain the word order information P_ij, ensuring that even in complex sentences (containing more words) D_ij does not grow so large that it slows down the calculation. The calculation formula of P_ij is shown in Formula (9):
P_ij = (D_ij − D_min) / (D_max − D_min), (9)
where D_max is the maximum absolute distance between any two words, and D_min is the minimum absolute distance between any two words.
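The relative position mask of Formula (9) can be computed directly. A minimal sketch (the function name is hypothetical; it assumes a sentence with L > 1 words):

```python
import numpy as np

def position_mask(L):
    """Relative position mask: D_ij = |i - j|, then min-max normalization
    P_ij = (D_ij - D_min) / (D_max - D_min). Assumes L > 1."""
    idx = np.arange(L)
    D = np.abs(idx[:, None] - idx[None, :])    # pairwise absolute distances
    return (D - D.min()) / (D.max() - D.min())
```

Since D_min = 0 and D_max = L − 1, this reduces to P_ij = |i − j| / (L − 1): adjacent words in a 5-word sentence get 0.25, and the two endpoints get 1.0.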
Then, the word order information is used to weight the attention weight vector a_τ; that is, Formula (9) is substituted into Formula (8) to calculate the weighted semantic weight a_τ. The calculation formula of a_τ is shown in Formula (10).
where τ = 1, 2, ..., λ; λ denotes the number of layers of the multilayer attention network; and α_τ, β_τ, α'_τ, β'_τ are the parameters of layer τ in the multi-layer semantic network. The parameter α regulates the influence of word order information on hierarchical semantic information and is adjusted according to the specific task.
Finally, the multi-layer semantic weight vector A ∈ R^(λ×N×2u) is formed by splicing the semantic information extracted from the multi-layer attention network:
A = concat(a_1, a_2, ..., a_λ), (11)
where concat(·) is the splicing operation.

Sentence Embedding Representation Generation
Using the multi-layer semantic weight A obtained from the multi-layer attention network, the sentence embedded representation H obtained by the embedding layer is weighted to obtain the multi-layer semantic information M, as shown in Formula (12).
where ⊙ denotes element-wise (point) multiplication. Due to differences in sentence length, the length of the multi-layer semantic information obtained also differs. For sentence representation, the final result should be transferable, so the size of the sentence embedded representation needs to be unified before output. In this paper, a maximum pooling operation is applied, and the sentence embedding representation V of the representation layer is output, as shown below.
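The weighting and pooling steps can be sketched as follows. This is a minimal sketch under assumptions: the function name is hypothetical, and the axis over which max pooling is applied (the sequence axis here, so that V no longer depends on sentence length) is an interpretation of the text:

```python
import numpy as np

def sentence_representation(A, H):
    """Sketch of Formulas (12)-(13): weight the embedded representation H
    with the multi-layer semantic weights A element-wise, then max-pool
    over the sequence axis to get a fixed-size representation V."""
    M = A * H[None, :, :]    # (lambda, N, 2u): per-layer weighted semantics
    return M.max(axis=1)     # V: (lambda, 2u), independent of sentence length
```

With all-ones weights, each layer of V reduces to the column-wise maximum of H, which makes the length-normalizing effect of the pooling easy to check.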

Experiment
Experiments are carried out on the text implication task and the emotion classification task and compared with existing sentence representation methods to verify the performance of the proposed representation method based on the multi-layer semantic network.

Experimental Steps
For the task of text implication recognition, to avoid interference from the reasoning process when judging the performance of the multi-layer semantic network, a complete sentence representation reasoning model is formed by combining the general reasoning model [38] with the multi-layer semantic representation network. The inference module uses a simple heuristic matching method. The inference information includes the embedded representations of the premise sentence and the hypothetical sentence, the difference between them, and their product.
The prediction module comprises a fully connected neural network and a classifier used to predict and classify the implication relationship. The specific experimental steps are as follows: (1) First, the multi-layer semantic representation network is used to obtain the representations of the premise sentence and the hypothetical sentence as u and v; (2) then, the inference information |u − v| and u ∗ v between the sentences is calculated by the inference module; (3) finally, the inference information |u − v| and u ∗ v, the premise representation u, and the hypothesis representation v are spliced and input into the fully connected network [26] to predict the implication relationship.
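The heuristic matching features assembled in steps (2)-(3) can be sketched in a few lines (the function name is hypothetical):

```python
import numpy as np

def heuristic_matching(u, v):
    """Inference-module features: splice [u; v; |u - v|; u * v] before
    feeding the fully connected classifier."""
    return np.concatenate([u, v, np.abs(u - v), u * v])
```

For d-dimensional sentence representations, the classifier therefore receives a 4d-dimensional feature vector combining both representations, their element-wise difference magnitude, and their element-wise product.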
This paper uses the Yelp data for an emotion classification task [29]: users' star ratings, from one star (negative evaluation) to five stars (positive evaluation), are predicted from their comments. The basic experimental steps are the same as above; the only difference is that the final three-way classification is replaced by a five-way classification.

Evaluation Index
The evaluation of sentence representation in this paper is divided into objective evaluation and subjective evaluation. The higher the correlation, the closer the representation is to the original meaning of the sentence.
The objective evaluation uses the accuracy index to evaluate the performance of the model. The formula of accuracy is as follows:
Accuracy = N_t / N_all,
where N_t is the number of samples with a correct prediction, i.e., the number of samples whose prediction is consistent with the target label, and N_all is the total number of samples.
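The accuracy index amounts to a one-line computation; a minimal sketch (the function name is hypothetical):

```python
def accuracy(predictions, labels):
    """Accuracy = N_t / N_all: the fraction of predictions that match
    the target labels."""
    n_correct = sum(p == y for p, y in zip(predictions, labels))
    return n_correct / len(labels)
```

For example, three correct predictions out of four samples give an accuracy of 0.75.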

Parameter Setting
This paper's experiments are based on the Theano 1.0 deep learning platform, using NVIDIA GeForce GTX 1070 hardware acceleration. The general model parameter settings follow the work of Zhu [29]; the specific parameters are as follows: (1) Word vector pre-training is initialized by GloVe-840B-300D [34]. For words not in this table, a Gaussian distribution is used for random initialization. The final word vector dimension is 300. (2) The dimension of the character vector in the Word2Char module is set to 15. The CNN adopts a single-layer network with 100 channels, and the window size of each network is 1/3/5. (3) To prevent overfitting, Dropout [39] is selected from (0.5, 0.65, 0.7, 0.75, 0.8) according to the task to achieve the best effect, and an early stopping strategy is adopted. (4) The BiLSTM network dimension of the model on the SNLI dataset is 600, the batch size is 32, and the initial learning rate is 0.0004; the BiLSTM network dimension on the Multi-NLI and Yelp datasets is 300, the batch size is 8, and the initial learning rates are 0.0002 and 0.0001, respectively. (5) Model training adopts the Adam [40] optimization algorithm. Table 3 shows the performance of different models on the SNLI dataset, including neural-network-based models, attention-mechanism-based models, and the sentence representation method proposed in this paper. The models based on the attention mechanism include (1) Multi-head [23]: using only an eight-layer multi-head attention mechanism; (2) DiSAN [24]: a 300-dimensional bidirectional self-attention network, including forward and backward self-attention networks.
The models based on the neural network include (1) 300D LSTM [41]: composed of a 300-dimensional unidirectional LSTM network; (2) BiLSTM [42]: composed of a 600-dimensional BiLSTM, including a 300-dimensional forward LSTM and a 300-dimensional backward LSTM; (3) BiLSTM Intra-attention [41]: combining a single-layer self-attention mechanism with the BiLSTM network; and (4) BiLSTM Deep-Gated attention [22]: based on the BiLSTM network combined with a deep gated attention mechanism.

Experimental Result on SNLI Dataset
The experimental results show that the multi-layer semantic representation network's accuracy reaches 91.7% on the SNLI training set and 86.1% on the test set, while the highest accuracy of the other two classes of models is only 85.6%. The proposed model performs better than the other two classes of models, which shows that adding multi-layer semantics to the representation effectively improves the accuracy of sentence representation and thus the reasoning result. Table 4 shows the performance of each model on the Multi-NLI dataset. The CBOW and BiLSTM models [30] use simple single-layer semantics to generate sentence embedded representations; their accuracies are 64.8% and 66.9% on the matched test set, and 64.5% and 66.9% on the mismatched test set, respectively. Compared with the other three models, the proposed multi-layer semantic representation network reaches 73.6% on the Multi-NLI matched set and 73.8% on the mismatched set, 0.2% higher than the best existing result. This indicates that the proposed sentence representation method is more conducive to capturing complex semantic relations.

Experimental Results on Yelp Dataset
The performance of each model on the Yelp dataset is shown in Table 5. The baseline models adopt a neural network structure (BiLSTM/CNN) and a pooling layer (maximum pooling/self-attention pooling). Among them, the model of BiLSTM combined with self-attention pooling proposed by Zhu [29] performs best, with an accuracy of 64.2% on the Yelp test set. In contrast, the accuracy of the multi-layer semantic representation network proposed in this paper is 63.8% on the Yelp test set, which, although relatively high, does not exceed Lin's model. Table 5. The accuracy of each model in the emotion classification task on the Yelp dataset.

The reason for this situation is that, compared with a general classification problem, emotion analysis must divide samples into several types according to the author's attitude, and these types cannot be completely distinguished. For example, "very good" and "good" have an inclusive relationship and cannot be completely separated. Therefore, this requires deeper mining of the latent emotional information in the sentence representation. Although the model proposed in this paper can obtain more information and make up for the deficiency of single-layer attention, its depth is not sufficient to capture the deeper information required by emotion analysis tasks, so the accuracy on emotion tasks is not improved.

Semantic Relevance Analysis
In addition to quantitative analysis of the model, to test whether the model can capture multi-layer semantic information, qualitative analysis, namely semantic correlation analysis, is carried out. Figures 5 and 6 show the results of visualizing the semantic relevance of the sentence "a person is training his horse for a competition" by using the BiLSTM model [42] and our sentence representation method based on the multi-layer semantic network.
The darker the color, the more important the correlation between the two words is to the representation of the sentence. Comparing Figures 5 and 6, the multi-layer semantic representation network not only focuses on the relationship within "a person", but also pays attention to other levels of information, such as "training his horse" and "for a competition", due to the multi-attention mechanism. This shows that the sentence representation method based on the multi-layer semantic network can express more comprehensive and accurate sentence information.

Ablation Analysis
Two ablation experiments were designed to explore the influence of the semantic levels and of the word order information on the representation effect, so as to examine each module's contribution to the sentence representation method.

The Influence of Semantic Levels
In this chapter, semantic-layer ablation experiments are carried out on the SNLI dataset for different numbers of layers of the multi-layer semantic network. The experimental results are shown in Figure 7, which plots the representation network's accuracy on the SNLI test set; each curve in the figure corresponds to a different number of semantic layers (vectors). When the number of semantic layers is less than five, the accuracy on the SNLI dataset increases with the number of layers. When the number of semantic layers exceeds five, as shown by the curves with the vector count equal to 7 or 9, the accuracy decreases as the number of layers grows. To sum up, both too many and too few layers in the multi-layer semantic network lead to a poor model and cannot improve the sentence representation ability. The reason is that different semantic levels represent specific semantic information of a sentence (word meaning, word order, phrases). Too few semantic levels cause the loss of important semantics when obtaining sentence semantics; on the contrary, too many semantic levels may make the semantic information of the sentence redundant. Both have a negative impact on sentence representation.
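The semantic-layer ablation can be sketched in a few lines. The paper's exact architecture is not restated here; as an illustrative assumption, each semantic layer is taken to be one row of a structured self-attention matrix (in the spirit of self-attentive sentence embeddings), so varying r varies how many semantic vectors are extracted and concatenated into the sentence representation.

```python
import numpy as np

def softmax(x, axis):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_layer_representation(H, W1, W2):
    """Structured self-attention: each of the r attention rows yields one
    semantic 'layer' vector; their concatenation represents the sentence.
    H: (n, d) token states; W1: (da, d); W2: (r, da)."""
    A = softmax(W2 @ np.tanh(W1 @ H.T), axis=1)  # (r, n) attention weights
    M = A @ H                                     # (r, d): one vector per layer
    return M.reshape(-1)                          # sentence embedding, r*d dims

rng = np.random.default_rng(0)
n, d, da = 9, 32, 16
H = rng.normal(size=(n, d))          # stand-in encoder outputs for 9 tokens
for r in (1, 3, 5, 7):               # ablating the number of semantic layers
    W1 = rng.normal(size=(da, d)) * 0.1
    W2 = rng.normal(size=(r, da)) * 0.1
    s = multi_layer_representation(H, W1, W2)
    print(r, s.shape)                # embedding size grows linearly with r
```

The ablation trains one model per value of r and compares test accuracy; the sketch only shows how the representation changes with the layer count.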

The Influence of Word Order Information
The parameter alpha is the proportion of the word order information P_ij in the multi-layer semantic network. Figure 8 shows the learning curve of the sentence representation method in this chapter on the SNLI dataset as alpha changes. When the number of iterations equals 1, the smaller the alpha value, the higher the corresponding accuracy on the SNLI test set. As the number of iterations increases, the model's overall learning curve first rises and then falls with increasing alpha. When the learning curve stabilizes and alpha equals 1.5, the model performs best on the SNLI test set.

From the above analysis, we can see that alpha has a great influence on sentence representation. When alpha equals 2, the accuracy rate is 85.4%, which is 0.7% lower than the best performance of 86.1%. This also shows that word order information is of great significance to the semantic representation.
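The role of alpha can be illustrated with a toy attention computation. The exact form of the relative position mask P_ij is not restated here, so the sketch assumes a simple normalized distance penalty |i - j| / n; alpha then scales how strongly distant word pairs are down-weighted before the softmax, injecting word order into otherwise order-blind attention.

```python
import numpy as np

def attention_with_position_mask(H, alpha):
    """Self-attention scores augmented with a relative-position mask P,
    weighted by alpha: score_ij = (h_i . h_j) / sqrt(d) - alpha * |i - j| / n.
    (Assumed distance-based form of P_ij, for illustration only.)"""
    n, d = H.shape
    scores = H @ H.T / np.sqrt(d)
    idx = np.arange(n)
    P = np.abs(idx[:, None] - idx[None, :]) / n   # relative distance mask
    scores = scores - alpha * P                   # alpha = word-order proportion
    e = np.exp(scores - scores.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
H = rng.normal(size=(9, 32))                      # stand-in token states
for alpha in (0.0, 1.5, 2.0):
    A = attention_with_position_mask(H, alpha)
    # larger alpha concentrates attention mass near the diagonal (nearby words)
    print(alpha, float(np.diag(A).mean()))
```

This makes the trade-off in Figure 8 concrete: alpha = 0 ignores word order entirely, while a very large alpha suppresses long-range word relations that the multi-attention mechanism needs.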


Conclusions
This paper first analyzes the characteristics of sentence semantics and then reviews the key technologies of existing sentence representation methods. Summarizing the advantages and disadvantages of existing techniques, a semantic representation network based on a multi-attention mechanism is designed for sentence representation. The method obtains semantic information of the same sentence at different levels through the multi-attention mechanism, enriching the semantic representation of the sentence, and at the same time integrates the word order information of the sentence by adding a relative position mask between words, reducing the uncertainty caused by word order. Finally, quantitative and qualitative experiments are carried out on the text implication recognition task and the emotion classification task. The experimental results show that, compared with some traditional networks, the multi-layer semantic representation network designed in this paper significantly improves accuracy on both the SNLI and MultiNLI datasets, so the model can promote the accuracy and comprehensiveness of sentence meaning representation. However, the accuracy on the Yelp dataset did not improve significantly, possibly because the model is not deep enough to capture the deeper information needed for emotion analysis tasks.
Although this paper explores sentence representation and reasoning from the perspective of sentence representation and improves reasoning accuracy to some extent, it is far from achieving the best possible effect. Future research can therefore proceed along the following line. In this paper, sentence representation uses multiple levels of semantic and word order information to represent the meaning of sentences; future work can consider handling the redundancy among these multiple levels of semantic information. This paper directly concatenates the semantic information extracted at multiple levels to form the sentence embedding, without considering the redundancy between levels. If the semantics extracted by different layers are similar, the concatenation not only fails to enhance the comprehensiveness of the sentence meaning but also accumulates redundant information, reducing the proportion of core semantic information and thus decreasing the accuracy of sentence representation.