A Deep Fusion Matching Network Semantic Reasoning Model

Abstract: As a key technology of natural language understanding, sentence representation reasoning mainly concerns sentence representation methods and reasoning models. Although performance has improved, several problems remain, such as incomplete expression of sentence semantics, insufficient depth of the reasoning model, and lack of interpretability of the reasoning process. To address the reasoning model's lack of depth and interpretability, this paper designs a deep fusion matching network, which mainly includes a coding layer, a matching layer, a dependency convolution layer, an information aggregation layer, and an inference prediction layer. The matching layer is improved on the basis of a deep matching network: a heuristic matching algorithm replaces the bidirectional long short-term memory network to simplify the interactive fusion, which improves the reasoning depth and reduces the complexity of the model. The dependency convolution layer uses a tree-based convolution network to extract sentence structure information along the dependency tree of the sentence, which improves the interpretability of the reasoning process. Finally, the performance of the model is verified on several datasets. The results show that the model reasons better than shallow reasoning models, and its accuracy on the SNLI test set reaches 89.0%. In addition, the semantic correlation analysis shows that the dependency convolution layer helps improve the interpretability of the reasoning process.


Introduction
Natural language inference (NLI) maps pairs of natural language texts to abstract vector-space representations and learns the latent relationship between the two texts. NLI has become one of the most critical benchmark tasks in natural language understanding because it requires complex language understanding and in-depth reasoning. At present, NLI technology mainly consists of three parts: encoding, sentence representation (understanding), and reasoning learning. Among them, reasoning learning, that is, the construction and optimization of the reasoning model, is still a long way from practical application. Research on deep learning for NLI models is still in its infancy. Although existing deep learning models, such as recurrent neural networks and convolutional neural networks, have achieved initial results in the construction of reasoning models, they have not yet achieved a breakthrough. Therefore, there is broad room for research on the construction and optimization of NLI models. The construction and optimization of sentence semantic representations and of reasoning models have become the two core problems in NLI. An improvement in either aspect affects the performance of the whole NLI method, so it is of great significance to study the influence of both on NLI.
In existing studies, in order to focus on the construction and optimization of the semantic representation and inference models, the input is usually simplified to sentence pairs to avoid the interference caused by miscellaneous data. Therefore, NLI technology is also known as sentence representation reasoning technology. Before dedicated sentence-level representation techniques appeared, sentences were represented as fixed-length vectors using CBOW-based distributed word embeddings. With the development of neural networks and deep learning, sentence representation has gradually moved from combinations of simple word embedding models to more complex architectures, such as the convolutional network [1], the recurrent neural network [2,3], and its variants [4], which have been applied to improve the performance of sentence representation. Inspired by these works, this paper uses a tree-based convolutional network to extract sentence structural information.
Besides sentence representation, semantic reasoning infers the logical relationship of a given natural language text pair by analyzing the information within each text and the relationship between the two texts. Early NLI mainly adopted methods based on formal logical reasoning [5,6], which transformed sentences expressed in natural language into logical expressions that computers can process and then performed semantic reasoning with a logic interpreter. Moldovan [7] proposed COGEX, a logic-based reasoning method that represents the relationships within the text pair, such as syntactic objects, syntactic subjects, and causal relationships. In addition to the logical representation of the input text pair, the method also uses knowledge base content in logical form. Raina [8] proposed a dependency syntactic logic reasoning method, which parses the syntactic relationships of the text, constructs a syntax tree, and then completes semantic reasoning through the relationships between parent and child nodes. Logic-based reasoning has achieved good results on small-scale data. However, as the data volume and the complexity of sentence structure increase, the applicability and accuracy of such models become limited [9]. With the development of deep learning in natural language processing, semantic reasoning has gradually shifted from logic-based reasoning to deep learning-based reasoning [10]. The core of deep learning-based reasoning is to calculate the similarity of two semantic objects and to model the latent correspondence between "semantic objects" at different levels of abstraction and with different properties [11].
To address the lack of reasoning depth and interpretability in the reasoning model, this paper designs a deep fusion matching network, which mainly includes a coding layer, a matching layer, a dependency convolution layer, an information aggregation layer, and an inference prediction layer. First, we improve the matching layer of the deep matching network by using a heuristic matching algorithm, instead of a complex neural network, as the interactive fusion mode for matching information. Second, the dependency convolution layer uses a tree-based convolutional network (TBCNN) to extract the structural information of sentences. Finally, we evaluate the model on multiple datasets in terms of prediction accuracy, semantic correlation analysis, and ablation analysis.

SNLI Dataset
The SNLI dataset is a textual entailment recognition dataset published by Stanford University. SNLI is manually annotated and contains 570 k text pairs. There are three labels: entailment, contradiction, and neutral. In this paper, the data are divided into a training set (549,367 samples), a validation set (9842 samples), and a test set (9824 samples), according to Zhu's [12] data partition rules. A sample of the SNLI data is shown in Table 1.
Table 1. Sample of the SNLI data.
Premise: A man in a blue shirt standing in front of a garage-like structure painted with geometric designs.
Label: Neutral (annotator labels: N E N N N)
Hypothesis: A man is repainting a garage.

Multi-NLI Dataset
The Multi-NLI dataset, published by Adina Williams, Nikita Nangia, and Sam Bowman [13], contains 433 k text pairs. Unlike the SNLI dataset, it covers data closer to real life, such as novels and telephone speech. The dataset contains 10 categories of data. Depending on whether a category in the evaluation data also appears in the training set, the evaluation data are divided into a matched set and an unmatched set. The sample data are shown in Table 2.

Table 2. Sample of the Multi-NLI data.
Category: Message
Premise: Your gift is appreciated by each and every student who will benefit from your generosity.
Label: neutral
Hypothesis: Hundreds of students will benefit from your generosity.
Category: Cell
Premise: yes now you know if everybody like in August when everybody's on vacation or something we can dress a little more casual or
Label: contradiction
Hypothesis: August is a black out month for vacations in the company.
In this paper, the textual entailment task is performed on both the matched and unmatched sets. The data are divided into a training set (392,702 samples) and matched/unmatched validation sets (9815/9832 samples). Since the test set data are not available, this paper uses the validation sets in place of a test set.

Methods
Matching-based semantic reasoning models [14][15][16][17][18] comprise a coding layer, a matching layer, and a prediction layer. Sentence representation and semantic reasoning are explained in detail in the following sections. The information extraction method includes the matching model and the syntactic structure extraction model. The semantic reasoning part, based on the deep fusion matching network, includes the sentence coding layer, local reasoning, syntactic structure modeling, global reasoning, and result reasoning and prediction [19]. Duan [20] proposed an attention-fused deep matching network, referred to as AF-DMN, based on the matching reasoning model. However, AF-DMN has a relatively complex matching layer, which is composed of T identical calculation blocks. Each calculation block contains four sub-modules: (1) a cross attention layer; (2) a cross attention fusion layer; (3) a self-focus layer; (4) a self-focus fusion layer. To capture the syntactic structure of a sentence, Mou [21] proposed a tree-based convolutional neural network (TBCNN), which captures syntactic information more effectively than a conventional convolutional neural network.

Reasoning Information Extraction Method
TBCNN works in three steps. First, sentences are converted into parse trees. Then, the structural information of a sentence is extracted by sliding a window along the tree structure. Finally, the syntactic information is captured through the hidden and output layers. TBCNN thus contains a syntactic parsing layer and convolution layers, and it comes in two variants: the dependency convolution network (d-TBCNN) and the constituency convolution network (c-TBCNN).

Design of Reasoning Model Based on Deep Fusion Matching Network
This paper proposes a semantic fusion deep matching network, referred to as SCF-DMN. The core ideas of the model are as follows. First, an improved AF-DMN model is used to obtain the local inference information between sentences, which helps capture deep reasoning information. Second, d-TBCNN is used to model the syntactic structure information of the sentence, which improves the interpretability of the reasoning process. Finally, a control gate fuses the local inference information and the syntactic structure information of the sentences into the global reasoning information of the model, thereby increasing the reasoning depth and interpretability of the model.
As shown in Figure 1, the whole matching network consists of five parts: the coding layer, the matching layer, the dependency convolution layer, the information aggregation layer, and the inference prediction layer. The specific functions of each part are as follows:
(1) Coding layer: it transforms the natural language representation into the sentence embedding representation, including sentence preprocessing, vectorization, semantic information coding, and embedded representation generation.
(2) Matching layer and dependency convolution layer: they extract the local inference information between sentences and the syntactic structure inference information. Moreover, by extracting the interactive information between sentences, implicit logic is introduced into the reasoning process to improve its interpretability.
(3) Information aggregation layer: it integrates the representation information, the interactive reasoning information, and the syntactic structure reasoning information. All information is integrated into fixed-length semantic information using recurrent neural networks and pooling.
(4) Reasoning and prediction layer: it outputs the prediction results of the specific reasoning task. In general, a linear function and a multi-layer fully connected network operate on the fused global reasoning information to predict the entailment relationship of a given sentence pair.
The detailed structure and function of the sub-networks are given below.

Sentence Coding
To prevent the quality of the sentence representation from interfering with the evaluation of the reasoning model, the coding layer of the deep fusion matching network uses a bidirectional long short-term memory network to obtain the embedded representation of the sentence pair (p, q). Unless otherwise stated, p denotes the premise sentence and q denotes the hypothesis sentence in the following text.
First, the sentences p and q are preprocessed, including English word segmentation and the removal of stop words, to obtain the word lists $p = (p_1, \ldots, p_i, \ldots, p_m)$ and $q = (q_1, \ldots, q_j, \ldots, q_n)$, where m and n are the numbers of words in p and q, respectively.
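The preprocessing step can be illustrated with a minimal sketch. The regular expression and the tiny stop-word set below are assumptions made for illustration only; the paper does not state which tokenizer or stop-word list it uses.

```python
import re

# Hypothetical, deliberately tiny stop-word list (not the list used in the paper).
STOP_WORDS = {"a", "an", "the", "is", "of", "in", "with"}

def preprocess(sentence):
    """Lowercase the sentence, split it into word tokens, and drop stop words."""
    tokens = re.findall(r"[a-z0-9]+(?:'[a-z]+)?", sentence.lower())
    return [tok for tok in tokens if tok not in STOP_WORDS]

# Word lists for a premise p and a hypothesis q.
p = preprocess("A man in a blue shirt standing in front of a garage-like structure.")
q = preprocess("A man is repainting a garage.")
m, n = len(p), len(q)  # number of words kept in p and in q
```

Each remaining word would then be mapped to its embedding vector before entering the coding layer.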

Then, the bidirectional long short-term memory network, which incorporates sentence context information, encodes the semantic information of the sentence. The hidden layer state $h_i$ of the i-th word in the sentence is obtained as shown in Formula (1):
$$h_i = \mathrm{BiLSTM}(e_i, h_{i-1}, h_{i+1}) \tag{1}$$
Here, $e_i$ is the $n_e$-dimensional word vector of the i-th word, which is generated by word2vec technology; $h_{i-1}$ and $h_{i+1}$ are the hidden layer states of the previous and the next word of the i-th word, respectively.
Combining the hidden layer states of all words in the sentence gives the sentence embedding representation H, as shown in Formula (2):
$$H = (h_1, h_2, \ldots, h_L) \tag{2}$$
where $H \in \mathbb{R}^{1 \times L}$ and L is the length of the sentence. After the sentences p and q pass through the coding layer, their embedded representations are $H_p = (h_{p_1}, h_{p_2}, \ldots, h_{p_m})$ and $H_q = (h_{q_1}, h_{q_2}, \ldots, h_{q_n})$.

Local Reasoning Based on Improved AF-DMN
The matching layer of the deep fusion matching network model refers to the chain structure of the AF-DMN model. It then passes through T identical matching modules to collect the local interactive reasoning information based on the sequence. The reasoning information specifically includes the internal context information of sentences p and q, and the interaction information between sentences p and q.
As shown in Figure 2, each matching module is divided into four sub-layers: an interaction layer, an interaction fusion layer, a self-focus layer, and a self-focus fusion layer. The interaction layer obtains the interactive information between sentences p and q. The interaction fusion layer enhances the extraction of interactive information. The self-focus layer obtains the context information within each sentence to address the long-term dependence problem. Finally, the self-focus fusion layer enhances the effect of content extraction.
Before the interactive information between the premise and the hypothesis can be obtained, it is necessary to obtain the alignment information of the related sub-components of the two sentences, namely, the interactive attention matrix. Alignment methods are divided into hard alignment and soft alignment. Hard alignment [5] requires one-to-one correspondence between words, while soft alignment [22] is closer to semantic alignment: words or phrases with consistent semantics receive a higher weight in the attention matrix. For example, "near" is aligned with "be close to". Therefore, the interaction layer of the semantic fusion deep matching network uses the soft alignment proposed by Chen [23] and calculates the inner product between sentences p and q to obtain the correlation between them.
First, the correlation sub-component weight $e^t$ between the sentences in the t-th matching module is calculated, where $e^t_{ij}$ represents the correlation between the i-th word in sentence p and the j-th word in sentence q. The calculation method is shown in Formula (3).
where $W^t \in \mathbb{R}^{2h \times 2h}$, $U^t_l \in \mathbb{R}^{2h}$, and $U^t_r \in \mathbb{R}^{2h}$ are the parameters of the t-th matching module and $\odot$ represents the point multiplication operation.
Then, the weights $e^t_{ij}$ are substituted into Formulas (4) and (5) to calculate the correlation matrix $a^t_{p_i}$ of the premise sentence p with respect to the hypothesis sentence q and the correlation matrix $a^t_{q_j}$ of the hypothesis sentence q with respect to the premise sentence p.
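The soft alignment step can be sketched as follows. This is a simplified illustration, not the paper's exact Formulas (3) to (5): it assumes that the score between two words is a plain inner product of their hidden states (the trainable parameters of the matching module are omitted) and that the scores are normalized with a softmax in each direction. The function names are hypothetical.

```python
import math

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    mx = max(xs)
    exps = [math.exp(x - mx) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def soft_align(H_p, H_q):
    """Return (A_p, A_q): weights of each p word over q, and of each q word over p."""
    E = [[dot(h_p, h_q) for h_q in H_q] for h_p in H_p]   # m x n score matrix e_ij
    A_p = [softmax(row) for row in E]                      # normalize over the words of q
    A_q = [softmax([E[i][j] for i in range(len(H_p))])     # normalize over the words of p
           for j in range(len(H_q))]
    return A_p, A_q

# Toy hidden states: m = 2 words in p, n = 3 words in q, dimension 2.
H_p = [[1.0, 0.0], [0.0, 1.0]]
H_q = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
A_p, A_q = soft_align(H_p, H_q)
```

In this toy case the first word of p puts more weight on the first and third words of q, whose states overlap with it, than on the second.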
where m and n denote the numbers of words in sentences p and q, respectively, and exp(·) denotes the exponential function with the natural constant e as its base.
The interaction information $\tilde{h}^t_{p_i}$ between the i-th word in sentence p and sentence q is obtained from the correlation matrix $a^t_{p_i}$ and the output $H^{t-1}_p$ of the previous matching module, as shown in Formula (6), where $H^{t-1}_p$ is the sentence embedding representation passed from the (t−1)-th matching module to the t-th matching module. Similarly, the interaction information $\tilde{h}^t_{q_j}$ between the j-th word in sentence q and sentence p can be obtained, as shown in Formula (7).
To further enhance the interaction between sentences p and q, SCF-DMN places a fusion layer after the interaction layer. Because the interactive information between sentences does not depend on the previous state of a single sentence, a bidirectional long short-term memory network cannot significantly improve the correlation between sentences and only adds unnecessary computation to the reasoning model. Therefore, the interaction fusion layer of SCF-DMN uses only the heuristic matching method to fuse the interactive information $\tilde{h}^t_{p_i}$ and $\tilde{h}^t_{q_j}$ of sentences p and q. The cross fusion representations $F^t_{p_i}$ and $F^t_{q_j}$ of sentences p and q in the t-th matching module are calculated as shown in Formulas (8) and (9).
where $h^t_{p_i} - \tilde{h}^t_{p_i}$ is the similarity difference between the word representation $h^t_{p_i}$ of the i-th word of sentence p in the t-th matching module and the corresponding interactive information $\tilde{h}^t_{p_i}$. In the same way, $h^t_{q_j} - \tilde{h}^t_{q_j}$ is the similarity difference between the word representation $h^t_{q_j}$ of the j-th word of sentence q in the t-th matching module and the corresponding interactive information $\tilde{h}^t_{q_j}$. $\odot$ represents the point multiplication operation, and [. . . ; . . . ; . . .] represents the splicing operation.
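Heuristic matching can be sketched in a few lines. The sketch assumes the commonly used four-part form, in which the word representation, its interactive information, their element-wise difference, and their element-wise product are spliced together; whether Formulas (8) and (9) contain exactly these four parts is an assumption, and `heuristic_match` is a hypothetical name.

```python
def heuristic_match(h, h_tilde):
    """Splice [h ; h_tilde ; h - h_tilde ; h * h_tilde] into one vector."""
    diff = [a - b for a, b in zip(h, h_tilde)]   # element-wise difference
    prod = [a * b for a, b in zip(h, h_tilde)]   # element-wise (point) multiplication
    return list(h) + list(h_tilde) + diff + prod

# Word representation and interactive information for one word (toy values).
F = heuristic_match([1.0, 2.0], [0.5, 2.0])
# F == [1.0, 2.0, 0.5, 2.0, 0.5, 0.0, 0.5, 4.0]
```

The fusion has no trainable parameters and no recurrence, which is why it is cheaper than passing the interactive information through a bidirectional long short-term memory network.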
The self-focus mechanism is introduced in the self-focus layer to address the long-term dependence problem in the reasoning process. Long-term dependence means that the current state of the system may be affected by a state from long before, which matters especially for long sentences (sentence length greater than or equal to 17).
For the premise sentence p, the internal correlation degree $s^t_{ij}$ between the words in the sentence is first calculated from the cross fusion representation $F^t_{p_i}$ obtained from the interaction fusion layer, as shown in Formula (10), where $F^t_{p_i}$ and $F^t_{p_j}$ are the cross fusion representations of the i-th and j-th words of sentence p in the t-th matching module, m is the number of words in sentence p, and $\lVert \cdot \rVert$ denotes the Euclidean distance. Then, the self-focus matrix $S^t_{p_i}$ of the word is calculated using Formula (11).
Finally, the self-focus vector $\bar{h}^t_{p_i}$ of the i-th word in sentence p is obtained by multiplying the self-focus matrix with the cross fusion representation, as shown in Formula (12).
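The three self-focus steps (correlation degree, normalized weights, weighted sum) can be sketched as below. The exact form of the correlation degree is not recoverable from the text, so the sketch assumes that the score is the negative Euclidean distance between two cross fusion representations (closer words receive larger weights) and that the weights are obtained with a softmax. Both choices, and the name `self_focus`, are assumptions.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def self_focus(F):
    """For each word i, return a weighted sum of all fused representations F_j."""
    out = []
    for f_i in F:
        # Assumed score: closer representations get a larger (less negative) score.
        scores = [-euclidean(f_i, f_j) for f_j in F]
        mx = max(scores)
        exps = [math.exp(s - mx) for s in scores]
        z = sum(exps)
        weights = [e / z for e in exps]            # one row of the self-focus matrix
        out.append([sum(w * f_j[d] for w, f_j in zip(weights, F))
                    for d in range(len(f_i))])
    return out

# Two nearly identical words and one distant word (toy values).
F_p = [[0.0, 0.0], [0.0, 0.0], [10.0, 10.0]]
H_bar = self_focus(F_p)
```

Because every word attends to every other word directly, the distance between two positions in the sentence does not weaken their interaction, which is the property the layer relies on for long sentences.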
Similarly, the self-focus vector $\bar{h}^t_{q_j}$ of the j-th word in sentence q can be obtained, as shown in Formula (13).
In the self-focus fusion layer, in addition to using the heuristic matching method to model the high-order information between the words in a sentence, a bidirectional long short-term memory network is used to strengthen the internal information dependence of the self-focus vectors $\bar{h}^t_{p_i}$ and $\bar{h}^t_{q_j}$. For the premise sentence p, the heuristic matching method is used to obtain the self-focus fusion information $\hat{h}^t_{p_i}$ of the word, as shown in Formula (14).
Then, the self-focus fusion information $\hat{h}^t_{p_i}$ is passed through an activation function and fed into the bidirectional long short-term memory network to further enhance the capture of internal information dependency, and the top-level hidden layer state is taken as the enhanced local interactive reasoning information $\hat{h}^t_{p_i}$. The whole calculation process is shown in Formulas (15) and (16).
where σ(·) represents the activation function, $W^t_h$ and $b^t_h$ represent the parameters of the activation function, and $h^t_{p_{i-1}}$ and $h^t_{p_{i+1}}$ represent the context of the i-th word in the sentence. Then, the local interactive inference information $H^t_p$ of the premise sentence p and the local interactive inference information $H^t_q$ of the sentence q obtained by the t-th matching module in the matching layer are shown in Formulas (17) and (18).
The matching layer of SCF-DMN adopts a chain structure: the whole matching layer consists of T identical matching modules, and the above calculation process is repeated in each module in turn. Finally, the output of the T-th matching module is used as the local interactive inference information $v_p$ and $v_q$.

Syntactic Structure Modeling Based on d-TBCNN
The semantic fusion deep matching network designed in this paper uses a dependency tree convolution network (d-TBCNN) to collect the syntactic structure reasoning information of sentences and thereby enrich the inference information.
As shown in Figure 3, the dependency convolution network performs a convolution operation on each subtree of the dependency parse tree of the sentence, extracts the syntactic structure features of the subtree, and then splices all the syntactic structure features to form the syntactic structure inference information of the sentence. The specific calculation steps are as follows:
(1). First, the natural language parser [24] proposed by Stanford University is used to transform sentences into dependency syntax trees. Taking the premise sentence as an example, each node in the syntax tree corresponds to a word in the sentence. An arc between two nodes indicates that the child node and the parent node have a dependency relationship, and the arc is marked with the syntactic relationship between the two nodes [25]. There are many dependency types between words, and some are meaningless for inferring sentence structure information. Therefore, following the work of Mou [26], the dependency convolution layer retains only 34 grammatical relationships that are frequently used and more important. Some of the dependencies are shown in Table 3.
(2). Then, the syntactic structure features of each dependency subtree are extracted along the subtree. The feature extractor adopts a double-layer convolution layer [27]. Suppose that the child nodes connected to the parent node $n_{p_d}$ are $n_{c_i}$ $(i = 1, 2, \ldots, m_c)$, where $m_c$ is the total number of child nodes.
For each subtree, the extracted local sentence structure feature $y_c$ is given by Formula (19). Here, the structural feature $y_c \in \mathbb{R}^{m_c}$, $p_d$ is the word embedding vector of the parent node, and $c_j$ is the word embedding vector of the j-th child node. The word vector representations in the dependency tree are obtained by pretraining in the coding layer. $W^d_p \in \mathbb{R}^{m_c \times n_e}$ is the weight of the parent node, $W^d_{r[c_j]} \in \mathbb{R}^{m_c \times n_e}$ is the weight assigned according to the dependency type between words, $b_d \in \mathbb{R}^{m_c}$ is the offset vector, and $r[c_j]$ represents the dependency relationship between nodes p and $c_j$.
(3). By pooling the structural features of each subtree in sentence p, the syntactic structure features $u_p$ of sentence p are obtained as shown in Formula (20), and the syntactic structure features $u_q$ of sentence q are shown in Formula (21).
$$u_p = (y_{c_{p_1}}, \ldots, y_{c_{p_i}}, \ldots, y_{c_{p_m}}) \tag{20}$$
$$u_q = (y_{c_{q_1}}, \ldots, y_{c_{q_j}}, \ldots, y_{c_{q_n}}) \tag{21}$$
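The subtree convolution of step (2) can be sketched as below. The sketch assumes that the feature of a subtree has the form y_c = f(W_p p_d + sum_j W_r[c_j] c_j + b_d), with one weight matrix per dependency relation, as in the tree-based convolution of Mou; the tanh activation, the toy two-dimensional embeddings, the identity weights, and all function names are illustrative assumptions rather than the settings of the paper.

```python
import math

def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def subtree_feature(parent_vec, children, W_parent, W_rel, bias):
    """Feature of one dependency subtree.

    children is a list of (relation, child_vec) pairs; W_rel maps each
    dependency relation to its own weight matrix. tanh is an assumed activation.
    """
    total = [a + b for a, b in zip(matvec(W_parent, parent_vec), bias)]
    for relation, child_vec in children:
        contrib = matvec(W_rel[relation], child_vec)
        total = [t + c for t, c in zip(total, contrib)]
    return [math.tanh(t) for t in total]

# Toy subtree for "man repainting garage": "repainting" governs "man" and "garage".
identity = [[1.0, 0.0], [0.0, 1.0]]
W_rel = {"nsubj": identity, "dobj": identity}   # hypothetical relation weights
y_c = subtree_feature(
    parent_vec=[0.1, 0.2],
    children=[("nsubj", [0.3, 0.0]), ("dobj", [0.0, 0.4])],
    W_parent=identity,
    W_rel=W_rel,
    bias=[0.0, 0.0],
)
# The sentence-level features would collect y_c over all subtrees of the sentence.
```

Tying the weight matrix to the dependency relation is what lets the layer treat a subject and an object differently even when their word vectors are similar.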

Global Reasoning Information
The purpose of the information aggregation layer is to combine the matching semantics $v_p$ and $v_q$ extracted by the matching layer with the syntactic structure features $u_p$ and $u_q$ extracted by the tree convolution layer, so as to construct the input of the final inference prediction layer.
For the premise sentence p, the local interactive inference information $v_p$ of the sentence is first connected with the corresponding syntactic structure features $u_p$. The fusion proportion of each part is determined by a control gate. For the i-th word in the premise sentence p, the calculation of the fusion reasoning information is shown in Formula (22).
where $g_{p_i}$ is the control gate, [; ] represents the splicing operation, $\odot$ represents the point multiplication operation, and $W_{g_p}$ represents the trainable parameters. Then, the global inference information $V_p = (h_{p_1}, \ldots, h_{p_i}, \ldots, h_{p_m})$ of sentence p is generated by the bidirectional long short-term memory network. The calculation of the global inference information of each word is shown in Formula (23).
Similarly, the global inference information $V_q = (h_{q_1}, \ldots, h_{q_j}, \ldots, h_{q_n})$ of the sentence q can be obtained.
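A control gate of this kind can be sketched as follows. The sketch assumes a common gating form in which a sigmoid gate is computed from the spliced vectors and then interpolates between them element by element; it further assumes that the two inputs have the same dimension. The exact form of Formula (22) may differ, and `gated_fusion` is a hypothetical name.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gated_fusion(v, u, W_g):
    """Fuse local inference information v with syntactic features u.

    gate  = sigmoid(W_g [v ; u])
    fused = gate * v + (1 - gate) * u      (element-wise)
    W_g has len(v) rows and len(v) + len(u) columns.
    """
    concat = list(v) + list(u)                                  # splicing [v ; u]
    gate = [sigmoid(sum(w * c for w, c in zip(row, concat))) for row in W_g]
    return [g * vi + (1.0 - g) * ui for g, vi, ui in zip(gate, v, u)]

# With all-zero gate weights the gate is 0.5 everywhere, so the result is the average.
W_zero = [[0.0] * 4 for _ in range(2)]
fused = gated_fusion([1.0, 3.0], [3.0, 1.0], W_zero)
```

During training the gate weights would move away from zero, letting the model decide per dimension how much to trust the matching information versus the syntactic structure.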

Result Reasoning and Prediction
The length of the global reasoning information generated by the information aggregation layer is the same as the original length of each sentence. Because the two sentences of a pair generally differ in length, their representations cannot be used for reasoning directly. Therefore, in order to unify the sentence dimension without changing the inference information, the inference prediction layer of SCF-DMN uses pooling operations to convert the inference information $V_p$ and $V_q$ into fixed-length input.
There are two common pooling operations: average pooling and maximum pooling. Maximum pooling retains only the strongest features of the fused semantic information and discards the weaker ones, which reduces the impact of noise and improves the robustness of the model. The disadvantage of maximum pooling is that it easily loses the location information of features; however, this can be compensated by combining it with average pooling. Therefore, following the work of Chen [23], the SCF-DMN model applies both average pooling and maximum pooling and then splices the results to form the final fixed-length sentence pair vector V. The calculation process is shown in Formulas (24) and (25).
where [; ] is the splicing operation. The sentence pair vector V is input into a multi-layer perceptron classifier to calculate the probability $P_i$ of each label in the corresponding task. For all tasks, the training objective is to minimize the cross-entropy, as shown in Formula (26).
where $y_i$ is the relation label and N is the total number of sentence pairs.
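The pooling and the training objective can be sketched as below. The order in which the four pooled vectors are spliced and the averaging of the loss over the N sentence pairs are assumptions; the function names are hypothetical, and the classifier that produces the probabilities is omitted.

```python
import math

def avg_pool(V):
    """Average over the words of a sentence, one value per dimension."""
    return [sum(row[d] for row in V) / len(V) for d in range(len(V[0]))]

def max_pool(V):
    """Maximum over the words of a sentence, one value per dimension."""
    return [max(row[d] for row in V) for d in range(len(V[0]))]

def sentence_pair_vector(V_p, V_q):
    """Fixed-length vector [avg(V_p) ; max(V_p) ; avg(V_q) ; max(V_q)]."""
    return avg_pool(V_p) + max_pool(V_p) + avg_pool(V_q) + max_pool(V_q)

def cross_entropy(probs, labels):
    """Mean negative log-probability of the correct label over all pairs."""
    return -sum(math.log(p[y]) for p, y in zip(probs, labels)) / len(labels)

# Sentences of different lengths (2 and 3 words) give a vector of the same size.
V_p = [[1.0, 2.0], [3.0, 4.0]]
V_q = [[0.0, 0.0], [2.0, 2.0], [4.0, 4.0]]
V = sentence_pair_vector(V_p, V_q)
loss = cross_entropy([[0.5, 0.25, 0.25]], [0])   # one pair, correct label has P = 0.5
```

The length of V depends only on the hidden dimension, not on m or n, which is what allows a single classifier to handle sentence pairs of any length.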

Results
The experiments are run on an NVIDIA GeForce GTX 1070 graphics card, and all experimental code is built on the Theano framework. Based on the parameters used in previous studies on the same datasets, the parameters of the semantic fusion deep matching network are set as follows:

• The maximum sentence length is set to 100. The model uses word2vec technology [28] to obtain word embedding vectors of dimension 300, and GloVe-840B-300D is used to initialize the pretrained word vectors. Words not included in the dictionary are initialized randomly with values in [−0.1, 0.1], and the word vectors are updated during training.

• The dimension of all LSTM networks in the model is 300, the activation function is ReLU, and the weight parameters in the network are initialized randomly [29].

• The model is optimized with the Adam algorithm [30]; the parameters β1 and β2 are set to 0.9 and 0.99, respectively, and the initial learning rate of the network is set to 0.0002.

• To prevent overfitting, the Dropout strategy is used during training [31]. A Dropout layer is added to the input and output of each layer of the network, and the dropout is set to 0.8.

• For the SNLI dataset, the training batch size and the validation batch size are both set to 32; for the Multi-NLI dataset, both are set to 8.

• For the SNLI dataset, the number of matching modules T in the matching layer is set to 3; for the Multi-NLI dataset, T is set to 2 [20].

• The models are tested on the two datasets to see whether they produce the correct answer. Accuracy is then calculated as the main evaluation index.
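For reference, the settings listed above can be collected in a small configuration object, together with the accuracy metric. This is only an illustrative sketch: the class and field names are hypothetical, the "number of training batches" is read here as the batch size, and the two Adam parameters are read as its exponential decay rates; none of this is taken from the authors' Theano code.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class TrainingConfig:
    """Hyperparameters as reported in the text; field names are hypothetical."""
    dataset: str
    batch_size: int               # assumed reading of "number of training batches"
    num_matching_modules: int     # T in the text
    max_sentence_length: int = 100
    embedding_dim: int = 300
    lstm_dim: int = 300
    oov_init_range: float = 0.1   # unknown words drawn from [-0.1, 0.1]
    adam_beta1: float = 0.9       # assumed to be Adam's first decay rate
    adam_beta2: float = 0.99      # assumed to be Adam's second decay rate
    learning_rate: float = 0.0002
    dropout: float = 0.8          # value as reported in the text


SNLI = TrainingConfig(dataset="SNLI", batch_size=32, num_matching_modules=3)
MULTI_NLI = TrainingConfig(dataset="Multi-NLI", batch_size=8, num_matching_modules=2)


def accuracy(predicted: Sequence[str], gold: Sequence[str]) -> float:
    """Fraction of sentence pairs whose predicted label equals the gold label."""
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)
```

Keeping the two dataset configurations as separate frozen instances makes the differences between them (batch size and T) explicit while sharing every other value.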

Experimental Results on SNLI Dataset
The reasoning models based on the model of Li and the model of this paper are compared. The experimental results of each model on the SNLI dataset are shown in Table 4. The coding-based reasoning models include (1) the TBCNN model, which combines sentence structure information into the sentence representation [26]; (2) the memory-enhancing neural network NES proposed by Munkhdalai [32].
The matching-based reasoning models include (1) the matching reasoning model based on the attention mechanism [33]; (2) the matching LSTM model, which uses match LSTM instead of the traditional LSTM network [34]; (3) the re-read LSTM model, which focuses on the interaction of attention vectors in sentences [35]; (4) the deep fusion LSTM model, which pays more attention to text-to-text interaction [36]; (5) the decomposable attention model, which uses the attention mechanism to decompose the problem into subproblems that can be solved independently [22]; (6) the ESIM model, which includes chain LSTM and tree LSTM [23].

Results on Multi-NLI Datasets
On the Multi-NLI dataset, the comparison models can be divided into baseline models and attention-based reasoning models, as shown in Table 5.

Table 5. Accuracy of each model on Multi-NLI dataset.

The baseline models CBOW and BiLSTM [13] reach 64.8% and 66.9% accuracy on the matched set and 64.5% and 66.9% on the mismatched set, respectively. The attention-based ESIM [37] and AF-DMN [20] models perform similarly: 76.8% and 76.9% on the matched test set and 75.8% and 76.3% on the mismatched set, respectively. The SCF-DMN model designed in this paper reaches 77.1% on the matched set and 75.3% on the mismatched set.

Analysis of Prediction Accuracy
The SNLI results show that the matching-based reasoning models are more accurate than the sentence coding-based reasoning models on the SNLI test set. Among the comparison models, the highest accuracy is 88.6%, achieved by the AF-DMN model. The reasoning model based on the semantic fusion deep matching network (SCF-DMN) designed in this paper achieves 95.8% and 89.0% accuracy on the SNLI training set and test set, respectively, which is 1.3 and 0.4 percentage points higher than the AF-DMN model. This shows that the SCF-DMN model has greater reasoning depth and captures the interactive information between sentences better than the AF-DMN model. It also indirectly indicates that the syntactic structure information added to the reasoning process of the SCF-DMN model is effective and can improve the inference result.
The Multi-NLI results show that the SCF-DMN model outperforms AF-DMN on the matched set but falls below it on the mismatched set. This may be because the data in the mismatched set does not appear in the training data at all, so some relations absent from the training data may not be learned. In addition, the coding layer uses only a simple bidirectional long short-term memory network (BiLSTM), which may distort the representation of complex sentences and weaken the learning effect.

Analysis of Semantic Relevance
Figure 4 shows the visual results of semantic correlation between the premise sentence "a person is training his horse for a competition." and the hypothesis sentence "a person on a horse jumps over a broken-down airplane." The darker the color, the stronger the correlation between the words. The SCF-DMN model focuses on the close relationship between the core word "training" in the premise and the core word "jumps" in the hypothesis, and "competition" is closely related to "airplane". At the same time, the correlation between "person" and "horse" in the premise and "person" and "horse" in the hypothesis is significantly higher than their correlation with other words.
Figure 5a,b, respectively, show the dependency syntactic relationships of the premise and hypothesis sentences. They are consistent with the sentence pair correlation results. This shows that adding syntactic structure information to the SCF-DMN model promotes the capture of sentence structure information and the interpretation of the reasoning process.
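A word-by-word correlation map of the kind visualized in Figure 4 can be sketched as a matrix of pairwise similarities between premise and hypothesis token vectors. The source does not state how the correlation scores are computed, so the cosine similarity used here, the function names, and the toy vectors are all assumptions made for illustration.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity of two vectors; returns 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def correlation_matrix(premise: Sequence[Vector],
                       hypothesis: Sequence[Vector]) -> List[List[float]]:
    """Rows index premise tokens, columns index hypothesis tokens."""
    return [[cosine(p, h) for h in hypothesis] for p in premise]


# Hypothetical 2-dimensional token vectors, not taken from the model.
premise_vectors = [[1.0, 0.0], [1.0, 1.0]]
hypothesis_vectors = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
matrix = correlation_matrix(premise_vectors, hypothesis_vectors)
for row in matrix:
    print(["%.2f" % score for score in row])
```

Rendering such a matrix as a heat map, with darker cells for larger values, gives a plot in the style described for Figure 4.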

Ablation Analysis
To verify the influence of each module of the SCF-DMN model, ablation tests were carried out on the SNLI dataset. The specific test results and analysis are as follows:

The Influence of Interactive Fusion Mode
To explore the impact of interactive fusion on the semantic fusion deep matching network, this paper compares two interactive fusion methods in the SCF-DMN model: heuristic matching alone, and heuristic matching + BiLSTM network. The experimental results are shown in Table 6. Heuristic matching alone improves the accuracy by 0.3 percentage points and reduces the number of parameters by 8%. Moreover, under the same training parameters, the training time is reduced by 4 h. Since the interactive information of sentences in the interaction fusion layer is only weakly correlated, the BiLSTM network does not improve the model's performance. On the contrary, it introduces redundant information, which reduces the reasoning accuracy and increases the number of parameters. Figure 6 shows the learning curves on the SNLI dataset of the SCF-DMN model with heuristic matching and of the SCF-DMN model with heuristic matching + BiLSTM network. The learning curve of the model with heuristic matching alone stabilizes faster, and its accuracy in the final stable state is higher than that of the latter.
This comparison shows that the information of the fusion layer has a certain impact on the model's accuracy, because the heuristic matching method emphasizes the similarities and differences between sentence pairs. However, if the fusion information is strengthened further, it leads to information redundancy and reduces the model's performance.
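The way heuristic matching emphasizes similarities and differences can be illustrated with a short sketch. It assumes the formulation common in earlier NLI work, in which two aligned vectors are concatenated with their element-wise difference and element-wise product; the exact form used in the SCF-DMN matching layer may differ, and the function name is hypothetical.

```python
from typing import List


def heuristic_match(a: List[float], b: List[float]) -> List[float]:
    """Return [a; b; a - b; a * b] with element-wise difference and product."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    diff = [x - y for x, y in zip(a, b)]   # highlights differences
    prod = [x * y for x, y in zip(a, b)]   # highlights similarities
    return a + b + diff + prod


# Toy example: a token vector and its attention-aligned counterpart.
fused = heuristic_match([1.0, 2.0], [3.0, 0.5])
print(fused)
```

The operation has no trainable weights, which is consistent with the reduction in parameters and training time reported when the additional BiLSTM fusion network is removed.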

The Influence of Syntactic Structure Information
To analyze the influence of syntactic structure information on the semantic fusion deep matching network, this paper compares the SCF-DMN model with a variant that excludes the dependency convolution layer on the SNLI dataset. The experimental results are shown in Table 7. Compared with the full SCF-DMN model, the training time of the variant without the dependency convolution layer is reduced by 38.1%, but its accuracy on the SNLI dataset is 0.7 percentage points lower, and 0.3 percentage points lower than that of the AF-DMN model.
Figure 7 shows the learning curves of each model on the SNLI dataset. The SCF-DMN model without the dependency convolution layer converges faster than the full SCF-DMN model. However, once training is stable, the accuracy of the full SCF-DMN model on SNLI is significantly higher than that of the variant without the dependency convolution layer. This is consistent with the results in Table 7 and indicates that the dependency convolution layer has an essential promoting effect on the inference process between sentences.
Although this paper explores sentence representation and reasoning methods from the perspective of the reasoning model and improves the reasoning accuracy to some extent, it is far from achieving the best effect. Given this, future research can address the limitations of this study, for example by improving the hardware with a custom ASIC or FPGA [38], considering the heavy computational requirements of the proposed model.

Conclusions
At present, NLI has become another research hotspot after the field of images, and many scholars have carried out research in this field. This paper first introduces some basic reasoning models and the challenges they face. It then introduces the principles, advantages, and disadvantages of the AF-DMN and tree convolution networks. Finally, it proposes a deep fusion matching network to address the reasoning model's lack of reasoning depth and interpretability. The network consists of the coding, matching, dependency convolutional, information aggregation, and inference prediction layers. The matching layer is improved based on the deep matching network: the heuristic matching algorithm replaces the complex neural network as the interactive fusion mode of matching information, which improves the reasoning depth and reduces the complexity of the model. The dependency convolutional layer uses a tree convolutional network to extract the structural information of sentences along their dependency trees, which compensates for the lack of interpretability of the reasoning process. The experimental results show that the reasoning effect of the network is superior to that of the shallow matching reasoning model, and the accuracy on the SNLI test set reaches 89.0%. The visualization results also show that the dependency convolutional layer significantly improves the interpretability of the reasoning process.
However, because it depends on cognition and understanding, sentence representation reasoning remains both a focus and a difficulty in this field. The inference model in this paper is designed considering only the characteristics of the inference domain of sentence representation, so its design and performance are limited to this specific idea and purpose. With the development of transfer learning, it has been found that introducing other natural language processing tasks whose semantic characteristics are similar to those of the target domain can help improve performance in the target domain.