A Densely Connected GRU Neural Network Based on Coattention Mechanism for Chinese Rice-Related Question Similarity Matching

Abstract: In the question-and-answer (Q&A) communities of the “China Agricultural Technology Extension Information Platform”, thousands of rice-related Chinese questions are added every day. Rapidly detecting questions with the same semantics is key to the success of a rice-related intelligent Q&A system. To detect semantically identical rice-related questions quickly and automatically, we propose a new method based on Coattention-DenseGRU (Gated Recurrent Unit). According to the characteristics of rice-related questions, we applied Word2vec with the TF-IDF (Term Frequency–Inverse Document Frequency) method to process and analyze the text data and compared it with the Word2vec, GloVe, and TF-IDF methods. Combined with an agricultural word segmentation dictionary, Word2vec with TF-IDF effectively solved the problem of high-dimensional and sparse data in the rice-related text. Each network layer employed the concatenation of the input features and the hidden features of all previous recurrent layers. To alleviate the growth in feature vector size caused by dense concatenation, an autoencoder was used after dense concatenation. The experimental results show that rice-related question similarity matching based on Coattention-DenseGRU can improve the utilization of text features, reduce the loss of features, and achieve fast and accurate similarity matching on the rice-related question dataset. The precision and F1 values of the proposed model were 96.3% and 96.9%, respectively. Compared with seven other question similarity matching models, our method achieves state-of-the-art results on our rice-related question dataset.


Introduction
Question-and-answer (Q&A) communities [1] are knowledge service communities based on the Internet, allowing users to ask, answer, and discuss questions. These can meet the users' needs to obtain information and exchange knowledge. They can be used for research with broad development prospects in natural language processing [2] and information retrieval [3]. The China Agricultural Technology Extension Information Platform is a professional platform for agricultural technicians, in which Q&A communities play a vital role in helping farmers find solutions to their problems. As rice is one of the most widely cultivated grain crops in China, users submit more than a thousand questions in the rice-related question-and-answer module every day, and agricultural experts answer the questions quickly. However, due to the complexity of Chinese semantic expression, there are many questions with different description methods but the same semantics; the experts might answer the same questions repeatedly, which is a waste of human resources. The sparse [4], real-time, and nonstandard text data aggravate the sparseness of keyword features, making it challenging to mine the correlation between features fully. This has become one of the main tasks of text mining in agricultural information classification: finding a method to easily and quickly mine questions with the same semantics from a rice-related text dataset and provide higher quality and intelligent agricultural information services [5]. It is challenging to complete the data processing of classifying similar questions [6] with manual screening using the traditional methods. 
At present, the traditional and commonly used keyword query and shallow classification models [7] can assist in judging whether questions are similar; however, they cannot automatically extract and organize features from the data, and their excessive reliance on manually selected features and on classifier performance makes classic text analysis methods inapplicable in the short term. Therefore, a significant problem to be solved by the China Agricultural Technology Extension Information Platform is finding an intelligent method to classify rice-related questions automatically. Neural network models, which are flexible and diverse, show good performance in natural language processing tasks such as text classification [8], text similarity calculation [9], and emotion analysis [10]. This kind of model can be trained end-to-end, automatically learns specific tasks, and mines many semantic relations in the text; it effectively reduces the manual effort of traditional statistical machine learning [11], where researchers set a large number of features by hand.
With the rapid development of computer technology, deep learning techniques such as deep convolutional neural networks and recurrent neural networks have become the mainstream text similarity calculation methods. This technology can automatically extract the key features of images and text without complex feature engineering and combine them with the classification process. These models have excellent adaptability and can be migrated easily. Nowadays, many scholars have researched the similarity calculation of English and Chinese texts by using deep learning technology.
The DSSM (Deep Structured Semantic Models) algorithm proposed by Huang [12] was designed to apply a Siamese network architecture for semantic text similarity calculation. The DSSM model achieved outstanding performance in a text-matching task but ignored word order information and contextual information. Shen et al. [13] introduced CNN (Convolutional Neural Networks) networks into the DSSM model to retain more contextual information. The improvement of this method to DSSM occurred mainly in the representation layer, where convolutional and pooling layers were added so that the contextual information was effectively retained. However, the contextual information was still lost at longer distances due to the limitation of convolutional kernels. To retain more contextual information, Palangi et al. [14] introduced the LSTM (Long Short-Term Memory) [15] network, which took into account more distant contextual information and some discourse order information to make the algorithm more practical. Mueller et al. [16] also encoded sentences using Siamese-LSTM based on pre-trained word vectors. It was experimentally demonstrated that the combination of this method and a SVM (Support Vector Machines) [17] for sentiment classification resulted in significant improvement.
With the application of self-attention technology in the field of image and natural language processing, Lin et al. [18] combined BiLSTM (Bi-directional Long Short-Term Memory) with self-attention [19] technology to obtain sentence vector representation and kept a Siamese architecture in the training network, which improved the precision of text matching. Pontes et al. [20] applied the CNN and LSTM models to calculate semantic text similarity, which improved the text similarity calculation. The method based on the Siamese network is independent when coding sentences at the coding layer, and there is no interaction between sentence pairs, which would constrain the model's capability to calculate the semantic similarity of sentence pairs. This limitation can be eliminated by the interaction model, which adds interaction between two parallel networks based on twin networks; thus, more abundant interactive information is extracted between sentence pairs. Yin et al. [21] proposed the ABCNN (Attention-Based Convolutional Neural Network) model based on word vectors to process sentences through CNN. While convolution and pooling of sentences in sentence pairs are independently carried out, an attention mechanism is used to connect two intermediate steps. Wang et al. [22] proposed a BiMPM (Bilateral Multi-perspective Matching) model based on the BiLSTM network. The input layer utilizes word vector and character vector splicing. Gong et al. [23] proposed the DIIN (Densely Interactive Inference Network) model. The input layer utilizes word embedding, character feature, and syntactic features concatenation; the coding layer utilizes a self-attention mechanism; the interaction layer utilizes dot-product operation to obtain the interaction matrix and then utilizes DenseNet [24] to extract the features. The extracted features are incorporated into a multi-layer perceptron model to obtain the final results. 
This simple and effective method achieved good performance in the NLI task. The above research shows that interaction models perform better in text-similarity matching. Compared with the convolutional neural network, the recurrent neural network performs better on sequence problems. Among recurrent neural networks, the GRU (Gated Recurrent Unit) has the advantages of fewer parameters, a simple structure, ease of calculation, and easy convergence. We utilized multi-layer densely connected GRUs to extract text features and used a concatenation operation to merge the attention information from the interaction between the two question sentences into the recurrent features of the dense connection. However, because no large-scale dataset is available in the agricultural field, there is little research on similarity calculation for agricultural texts. The main contributions of this paper are as follows.
(1) A dataset of 21,300 rice-related questions and answers was constructed; 8000 common, high-quality rice-related questions were extracted, and 32,000 rice-related question pairs were divided into five categories.
(2) Combined with an agricultural word segmentation dictionary, we utilized Word2vec [25] with the TF-IDF (Term Frequency-Inverse Document Frequency) [26] method, which can effectively solve the problems of high dimensionality and data sparsity in rice-related texts.
(3) The vector representation of input sentences is obtained using a stacked GRU neural network, which can effectively capture the sentences' semantics. The model is combined with an attention mechanism when encoding the sentences to capture the interaction and influence between rice-related question pairs.

Corpus Preparation
The data in this study were derived from the Q&A community of China Agricultural Technology Extension Information Platform. We applied Python's Regular Expressions to clean and filter the obtained text data to remove useless information. More than 20,000 pairs of Q&A community data related to rice cultivation, fertilization, weeding, pest control, and other aspects were captured; among them 8000 high-quality pairs were selected for our dataset to be used as our FAQs. Moreover, these 8000 rice-related questions were classified into five categories: diseases and pests, weeds and pesticides, cultivation management, storage and transportation, and other.
The input of the model comprised two sentences and their similarity tags. Firstly, 8000 rice-related frequently asked questions were manually combined, and their similarities were calculated.
(1) Question classification. We used the set QS = {q_1, q_2, q_3, ..., q_8000} to represent the 8000 FAQs, where q_n (1 ≤ n ≤ 8000) is a specific question. Each question in QS was classified into one of the five categories. After that, we grouped similar questions within each category; in total, the 8000 questions were grouped into 1200 classes, giving QS′ = {Q_1, Q_2, ..., Q_1200}, where Q_m = {q_m1, q_m2, q_m3, ..., q_mk} (1 ≤ m ≤ 1200, k ≥ 1) is a set of similar questions expressed in one or more different ways, and q_mk is the k-th question of class Q_m.
(2) Question combination. For a specific question q_11 in the subset Q_1 = {q_11, q_12, ..., q_1k} of QS′, the questions similar to q_11 are {q_12, q_13, ..., q_1k}, and their number is k − 1. The questions that are not similar to q_11 lie in the complement ∁_QS′ Q_1 of Q_1. The dissimilar questions paired with q_11 consist of two parts: (k − 1)/2 questions extracted at random from ∁_QS′ Q_1, and the (k − 1)/2 questions in ∁_QS′ Q_1 that share the highest number of identical keywords with q_11. In this way, the neural network is prevented from simply learning that more shared keywords means greater similarity, and it better learns the features of the two sentences at the semantic level. The specific process is shown in Figure 1.
After the above processing, 32,000 question pairs were obtained. Based on the first question of each pair, we classified the question pairs into five categories, with 11,650, 2773, 10,767, 3658, and 5152 pairs regarding diseases and pests, weeds and pesticides, cultivation management, storage and transportation, and other, respectively. Question 1 and question 2 are the two questions after word segmentation; a label of 1 indicates that they are similar, and 0 that they are dissimilar. Examples of training set samples are shown in Table 1.
Table 1. Sample of training set.

Question 1 | Question 2 | Similarity
How to control rice blast? | What are the control methods of rice blast? | 1
How to treat rice bacterial streak? | What are the current integrated control measures of rice bacterial streak? | 1
What are the characteristics of rice bakanae? | What are the most effective methods to control rice seedling disease? | 0
What should be focused on when raising ducks in rice fields? | What should be focused on during fish farming in rice? | 0
Are genetic factors responsible for the formation and rate of rice blight? | Do genetic factors account for the formation of rice empty grain rate? | 1
What problems should be paid attention to before rice sowing? | What are the conditions of the whole rice field before sowing? | 1
How to use imipramine to control rice seedling disease? | The cause of rice bakanae? | 0
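The question combination step described in this section can be sketched as follows. This is a minimal illustration rather than the authors' code: the function name `build_pairs`, the toy question ids, and the toy keyword sets are hypothetical, and rounding (k − 1)/2 down is an assumption, since the text does not say how odd values are handled.

```python
import random

def build_pairs(classes, keywords, seed=0):
    """Build (question, other_question, label) triples.

    classes:  list of lists of question ids, one inner list per class Q_m
    keywords: dict mapping a question id to its set of keywords
    """
    rng = random.Random(seed)
    pairs = []
    for m, cls in enumerate(classes):
        # Complement of Q_m: every question that belongs to another class.
        complement = [q for n, other in enumerate(classes) if n != m for q in other]
        for q in cls:
            positives = [p for p in cls if p != q]           # the k - 1 similar questions
            pairs += [(q, p, 1) for p in positives]
            half = len(positives) // 2                       # (k - 1) / 2, rounded down (assumption)
            # Hard negatives: dissimilar questions sharing the most keywords with q.
            ranked = sorted(complement,
                            key=lambda c: len(keywords[q] & keywords[c]),
                            reverse=True)
            hard = ranked[:half]
            # Random negatives drawn from the rest of the complement.
            rest = [c for c in complement if c not in hard]
            easy = rng.sample(rest, min(half, len(rest)))
            pairs += [(q, n, 0) for n in hard + easy]
    return pairs

# Toy data (hypothetical): two classes of three questions each.
classes = [["a1", "a2", "a3"], ["b1", "b2", "b3"]]
keywords = {
    "a1": {"rice", "blast", "control"},
    "a2": {"rice", "blast", "prevent"},
    "a3": {"blast", "treatment"},
    "b1": {"rice", "blast", "cause"},
    "b2": {"rice", "storage"},
    "b3": {"duck", "farming"},
}
pairs = build_pairs(classes, keywords)
```

With this construction each question contributes as many dissimilar pairs as similar ones whenever k − 1 is even, and half of the dissimilar pairs are deliberately keyword-heavy.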

Coattention-DenseGRU Model
This paper utilized the Coattention-DenseGRU model, shown in Figure 2. The model consists of four parts: the text preprocessing layer, DenseGRU layer, coattention layer, and interactive classification layer. Compared with the traditional deep learning classification model, Coattention-DenseGRU added weighted preprocessing to the text. In this study we utilized Word2vec with the TF-IDF algorithm to expand the text feature words and calculated the weighted word vector according to its importance. A variety of methods were used to extract text features, and DenseGRU and coattention were used to extract local features of different granularities of text. Finally, the extracted feature vectors were input into the interactive classification layer.

Text Preprocessing Layer
A computer cannot take raw text directly as model input, so the text must be converted into numerical vectors. To keep the text features and semantic information as complete as possible, we first preprocessed the question text, for example by noise removal and word segmentation, using Python's jieba to segment the text. The segmentation of Chinese is strongly influenced by semantics and context. To improve segmentation precision, a stop-word list was loaded before segmentation, which removes noise words, special characters, and spaces that are not conducive to feature extraction and reduces redundant information in the text. Based on the characteristics of the rice-related Q&A dataset, we loaded the Sogou agricultural vocabulary as the word segmentation dictionary instead of the default vocabulary, which improved the recognition of agricultural terms. A word vector transformation tool was then used to convert the segmentation results into word vectors.
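A minimal sketch of the cleaning and stop-word filtering step follows. The function name `preprocess`, the stop-word set, and the dictionary file name in the comments are hypothetical. The tokenizer is passed in as a parameter so that the example runs without third-party packages; in the setting described here it would be jieba's `lcut` after loading a user dictionary with `jieba.load_userdict`.

```python
import re

def preprocess(text, tokenize, stop_words):
    """Remove non-word characters, segment the text, and drop stop words."""
    # Replace punctuation, symbols, and runs of whitespace with a single space.
    cleaned = re.sub(r"\W+", " ", text)
    return [tok for tok in tokenize(cleaned)
            if tok.strip() and tok not in stop_words]

# Demo on pre-segmented text with a whitespace tokenizer.
stop_words = {"怎么"}                       # hypothetical stop-word list
tokens = preprocess("水稻 稻瘟病 怎么 防治 ？", str.split, stop_words)

# With jieba installed, the same function could be used like this
# (the dictionary path is a placeholder):
#   import jieba
#   jieba.load_userdict("agricultural_dict.txt")
#   tokens = preprocess(question, jieba.lcut, stop_words)
```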
Word2vec has become a popular distributed representation method for text in recent years. It can predict contextual information from the input target word and map words with similar meanings to nearby positions in the vector space, which effectively addresses the isolation between word vectors and their high dimensionality. In this study, the skip-gram model of Word2vec was trained on the segmentation results, and words were transformed into low-dimensional, continuous word vectors. To further highlight the contribution of representative feature words to the text, the Word2vec vector of each word was weighted by the word's TF-IDF value.
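The weighting of word vectors by TF-IDF can be illustrated with a small pure-Python sketch. The exact TF-IDF variant is not specified in the text; the version below, tf = count / document length and idf = log(N / df), is an assumption, and the two-dimensional embeddings are toy values standing in for trained Word2vec vectors.

```python
import math
from collections import Counter

def tfidf(docs):
    """Return one {word: tf-idf} dict per tokenized document."""
    n_docs = len(docs)
    # Document frequency: number of documents containing each word.
    df = Counter(word for doc in docs for word in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({w: (c / len(doc)) * math.log(n_docs / df[w])
                        for w, c in tf.items()})
    return weights

def weight_vectors(doc, doc_weights, embeddings):
    """Scale each word's embedding by that word's TF-IDF value."""
    return [[doc_weights[w] * x for x in embeddings[w]] for w in doc]

docs = [["rice", "blast", "control"], ["rice", "storage"]]
embeddings = {                      # toy vectors (hypothetical)
    "rice": [1.0, 0.0], "blast": [0.0, 2.0],
    "control": [1.0, 1.0], "storage": [0.5, 0.5],
}
weights = tfidf(docs)
weighted = weight_vectors(docs[0], weights[0], embeddings)
```

In this toy corpus "rice" occurs in every document, so its idf is zero and its vector is scaled to zero, while the rarer word "blast" keeps a nonzero weighted vector.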
After obtaining the weighted word vector of each word, each word in the text was replaced by its corresponding word vector to form a weighted text vector group. Questions of different lengths must be unified to a fixed length before they are input into the neural network model for training. According to the statistics of our rice-related question data, 99.9% of the questions contained fewer than 100 words, so we set the question length to 100. Questions shorter than 100 words were padded with zeros, and questions longer than 100 words were truncated to their first 100 words.
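The length unification can be sketched in a few lines. The helper name `pad_or_truncate` is hypothetical; the fixed length of 100 comes from the text.

```python
MAX_LEN = 100  # fixed question length stated in the text

def pad_or_truncate(vectors, dim, max_len=MAX_LEN):
    """Return exactly max_len word vectors: truncate long inputs, zero-pad short ones."""
    clipped = vectors[:max_len]
    padding = [[0.0] * dim for _ in range(max_len - len(clipped))]
    return clipped + padding

short = pad_or_truncate([[1.0, 2.0]] * 3, dim=2)
long_input = [[float(i), 0.0] for i in range(120)]
long = pad_or_truncate(long_input, dim=2)
```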

DenseGRU Layer
GRU is a special kind of recurrent neural network that can effectively alleviate the long-term dependency and gradient problems that arise during backpropagation in recurrent neural networks. Compared with LSTM, GRU [27] has fewer parameters, a simpler structure, easier calculation, and easier convergence.
The GRU structure includes two states and two control gates: the hidden state h, the candidate state, the reset gate r, and the update gate z. The update gate decides how much of the previous information is carried into the current state, and the reset gate decides how much is ignored. At time t, r_t is computed from the input word vector x_t and h_{t−1}; r_t then acts on h_{t−1} and controls, according to its importance, how much of the past hidden state is preserved. The greater r_t is, the greater the influence of h_{t−1} on the candidate state. The GRU is calculated as described next.
x_t is the input vector at time step t, which is multiplied by the weight matrix W_z (W_r) in a linear transformation. h_{t−1} stores the information of the previous time step t − 1 and also goes through a linear transformation; the reset gate and the update gate each add these two parts and pass the sum through a sigmoid activation function. For the candidate state, x_t is multiplied by the matrix W, and the Hadamard product of the reset gate r_t and h_{t−1} is multiplied by the matrix U. z_t is the activation result of the update gate. The Hadamard product of z_t and h_{t−1} represents the information from the previous time step that is retained in the final memory.
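The gate computations described above can be written compactly. The block below is a reconstruction under standard GRU conventions, not a copy of the original equations: the matrices U_z and U_r and the symbol \tilde{h}_t for the candidate state are assumed names, bias terms are omitted because the text does not mention them, and the final line follows the statement that z_t ⊙ h_{t−1} is the part retained from the previous step.

```latex
\begin{align}
r_t &= \sigma\left(W_r x_t + U_r h_{t-1}\right) \\
z_t &= \sigma\left(W_z x_t + U_z h_{t-1}\right) \\
\tilde{h}_t &= \tanh\left(W x_t + U \left(r_t \odot h_{t-1}\right)\right) \\
h_t &= z_t \odot h_{t-1} + \left(1 - z_t\right) \odot \tilde{h}_t
\end{align}
```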
A GRU neural network produces one-way output from front to back. This differs from the structure of Chinese semantics, which depends on the context on both sides. In question similarity calculation, if the output at the current moment can be related to the states of both the previous and subsequent moments, it is easier to extract high-level features of the text and highlight its critical information. Based on the characteristics of Chinese semantic understanding, we utilized the BiGRU model to extract the feature vectors of questions. The BiGRU model is a neural network model composed of two unidirectional GRUs with opposite directions. The word vector of the j-th word of the i-th sentence input at time t is c_tij, and the hidden state h_t is a weighted combination of the forward hidden state and the reverse hidden state, where GRU(·) is the nonlinear transformation of the word vector, w_t is the forward weight matrix, v_t is the reverse weight matrix, and b_t is the bias.
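Using the symbols defined in this paragraph, the BiGRU combination can be written as below. This is a hedged reconstruction: the arrows marking the forward and reverse states are assumed notation, and indexing the reverse recurrence with t + 1 assumes that the reverse GRU reads the sentence from its last word to its first.

```latex
\begin{align}
\overrightarrow{h}_t &= \mathrm{GRU}\left(c_{tij},\, \overrightarrow{h}_{t-1}\right) \\
\overleftarrow{h}_t &= \mathrm{GRU}\left(c_{tij},\, \overleftarrow{h}_{t+1}\right) \\
h_t &= w_t \overrightarrow{h}_t + v_t \overleftarrow{h}_t + b_t
\end{align}
```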
This layer is the key to the model. It adopts a structure of multi-layer GRUs stacked together with DenseNet-style connections, repeated four times. We employed the bidirectional GRU (BiGRU) as the base block H_l, where L represents the number of GRU layers and t represents the time; in a plain stacked network, the hidden state of a layer at time t is computed from the output of the layer below at time t and the layer's own hidden state at time t − 1. However, this stacked structure has a disadvantage: it hinders the transmission of information between layers. Therefore, DenseNet-style connections are used to solve this problem. The output of each layer is concatenated with its input rather than added to it. In this way, the transmission of information is not hindered and the original information is retained; that is, the output of the first layer can be effectively transmitted to the last layer, avoiding vanishing gradients. The hidden state of a densely connected layer is therefore computed from the concatenation of the outputs of the preceding layers.
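The difference between the stacked and the densely connected recurrence can be made explicit. The formulation below is an assumed DenseNet-style sketch consistent with the description here, not the original equations: x_t^l denotes the input of layer l at time t, and [· ; ·] denotes concatenation.

```latex
% Plain stacked GRU: each layer sees only the output of the layer below.
\begin{align}
x_t^{l} &= h_t^{l-1}, &
h_t^{l} &= H_l\left(x_t^{l},\, h_{t-1}^{l}\right)
\end{align}
% Densely connected GRU: each layer's input is the concatenation of the
% previous layer's output and the previous layer's input.
\begin{align}
x_t^{l} &= \left[h_t^{l-1};\, x_t^{l-1}\right], &
h_t^{l} &= H_l\left(x_t^{l},\, h_{t-1}^{l}\right)
\end{align}
```

Unrolling the second recurrence shows that the input of layer l contains the outputs of all earlier layers together with the original word vectors, which is why the feature size grows with depth.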

Coattention Layer
The attention mechanism has achieved great success in many fields. It is an effective technique for learning a context vector matched to a specific sequence. Given two sentences, in each GRU layer the context vector is determined using an attention mechanism that focuses on the related parts of the two sentences. The calculated attention information represents the soft alignment between the two sentences. We used a concatenation operation to merge the coattention information into the recurrent features of DenseGRU. The densely connected recurrent features and the coattention features were obtained from the bottom layer to the top, enriching the collective knowledge of lexical and compositional semantics. For the i-th word p_i ∈ P, the attention information σ_pi against sentence Q is a weighted sum of the hidden states of Q, where the weights are Softmax weights determined by the hidden state h_pi. The attention context vector was then concatenated with the hidden-state vector h_pi, so that the attention information is kept as input to the next layer.
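One way to realize the soft alignment described above is sketched below in pure Python. The scoring function is not given in the text; cosine similarity between hidden states is an assumption, as are the function names. The sketch computes, for each hidden state of one sentence, Softmax weights over the hidden states of the other sentence, forms the weighted sum, and concatenates it with the original hidden state.

```python
import math

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros (e.g. padding)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def softmax(scores):
    peak = max(scores)                      # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def coattend(h_p, h_q):
    """For each hidden state in h_p, return [state ; attended context over h_q]."""
    dim = len(h_q[0])
    outputs = []
    for hp in h_p:
        weights = softmax([cosine(hp, hq) for hq in h_q])
        context = [sum(w * hq[d] for w, hq in zip(weights, h_q)) for d in range(dim)]
        outputs.append(list(hp) + context)  # concatenation, not addition
    return outputs

h_p = [[1.0, 0.0]]                          # toy hidden states of sentence P
h_q = [[1.0, 0.0], [0.0, 1.0]]              # toy hidden states of sentence Q
out = coattend(h_p, h_q)
```

Applying the same function with the arguments swapped gives the attended features for the second sentence, so both sentences attend to each other.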

Interactive Classification Layer
The model presented in this paper treats the outputs of all layers as a collection of semantic information. However, the number of input features grows as the layer depth increases, so the network has too many parameters, especially in the fully connected layer. To solve this problem, an autoencoder was used to reduce the number of features while preserving the original information structure. Additionally, this component acted as a regularizer in the experiments, which improved the test performance.
To extract an appropriate representation of each sentence, we pooled the densely connected GRU and attention features over the time steps. Specifically, if the final output of the GRU layer was a 100-dimensional vector for each of the 30 words in a sentence, a 30 × 100 matrix was obtained, and pooling over the 30 steps produced a vector m or n of size 100. Then, the representations m and n of the two sentences were aggregated in various ways in the interaction layer, and finally the feature vector D for semantic sentence matching was obtained as follows: D = [m; n; m + n; m − n; |m − n|]. The operations +, −, and |·| are performed element-wise to infer the relationship between the two sentences. The element-wise subtraction m − n is an asymmetric operator for one-way tasks. Two fully connected layers with the ReLU activation function were used after extracting the feature D, followed by a fully connected output layer. Finally, the probability distribution over the classes was calculated using the Softmax function.
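The construction of D can be checked on a toy example. The sketch below assumes max pooling over time steps (the text later mentions a max-pooling layer); the function names and the two-dimensional values are illustrative only.

```python
def max_pool(states):
    """Max over time steps for each hidden dimension (states: steps x dims)."""
    return [max(column) for column in zip(*states)]

def match_features(m, n):
    """Build D = [m; n; m + n; m - n; |m - n|] with element-wise operations."""
    total = [a + b for a, b in zip(m, n)]
    diff = [a - b for a, b in zip(m, n)]
    abs_diff = [abs(d) for d in diff]
    return list(m) + list(n) + total + diff + abs_diff

m = max_pool([[1.0, 5.0], [3.0, 2.0]])   # two time steps, two dimensions
n = [4.0, 1.0]
D = match_features(m, n)
```

For 100-dimensional sentence vectors, D has 500 entries; the m − n block changes sign when the two sentences are swapped, while the other blocks do not.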

Model Training
In this paper, stochastic gradient descent (SGD) [28] was used to optimize the model parameters. The stochastic gradient descent algorithm uses one sample with its category label at a time to update the parameters. In the update rule, J(ϕ) is the objective function with parameters ϕ, η is the learning rate, x^(i) is a sample, y^(i) is its category label, and ∇_ϕ J is the gradient of the objective with respect to the parameters.
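With the symbols listed in this paragraph, the per-sample update of stochastic gradient descent takes its standard form, reconstructed here because the equation itself is not reproduced in the text:

```latex
\begin{equation}
\phi \leftarrow \phi - \eta \, \nabla_{\phi} J\left(\phi;\, x^{(i)},\, y^{(i)}\right)
\end{equation}
```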
The discrepancy between the probability distribution predicted by the current model and the real distribution was evaluated using the cross-entropy loss function. The label was 1 if the semantics of the question pair were the same; otherwise, it was 0. In the cross-entropy loss, M is the number of categories, y is an indicator variable (1 if the sample belongs to category c, and 0 otherwise), and p is the predicted probability that the sample belongs to category c.
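For a single sample, the cross-entropy loss is the negative sum over the M categories of y_c · log(p_c). A minimal sketch, with a hypothetical function name and a small clipping constant added only to avoid log(0):

```python
import math

def cross_entropy(y_true, probs, eps=1e-12):
    """Cross-entropy for one sample: -sum_c y_c * log(p_c).

    y_true: one-hot indicator list over the M categories
    probs:  predicted probabilities over the same categories
    """
    return -sum(y * math.log(max(p, eps)) for y, p in zip(y_true, probs))

# A similar pair (label 1) predicted as similar with probability 0.8.
loss = cross_entropy([0, 1], [0.2, 0.8])
```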

Hardware, Software Environment, and Evaluation Indicators
The experimental software environment was Python 3.6.2 and TensorFlow 1.13.1, and the server's hardware environment was an NVIDIA Corporation device 1e04 (Rev A1); the GPU was an NVIDIA GeForce RTX 2080 Ti. The TensorFlow framework was used to construct the neural network. A total of 32,000 question pairs were divided into training, validation, and test sets in a 7:2:1 ratio, giving 22,400 training pairs, 6400 validation pairs, and 3200 test pairs. The stochastic gradient descent algorithm was used to update the model weights. Precision (P), recall (R), and F1-score (F1) were used as evaluation indexes.
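The three evaluation indexes have their usual definitions: P = TP / (TP + FP), R = TP / (TP + FN), and F1 = 2PR / (P + R), where TP, FP, and FN are the numbers of true positives, false positives, and false negatives. The sketch below computes them for binary similarity labels; the function name is hypothetical, and treating label 1 (similar) as the positive class is an assumption.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall, and F1 for one positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Toy labels: 2 true positives, 1 false positive, 1 false negative.
p, r, f1 = precision_recall_f1([1, 1, 1, 0, 0], [1, 1, 0, 1, 0])
```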

Text Vectorization Processing and Analysis
In this paper, we applied Word2vec with TF-IDF to vectorize the rice-related question data. The word vector dimension was set to 300, and the training window size was set to 5. We compared the GloVe [29], TF-IDF, Word2vec, and Word2vec with TF-IDF methods. The results for precision, recall, and F1 values are shown in Table 2. It can be seen from Table 2 that, among the four text vectorization methods, Word2vec with TF-IDF had the highest precision and F1 values, with a precision of 86.3% and an F1 value of 81.6%. The TF-IDF method had the worst results. Although TF-IDF considers the semantic information between words, it does not solve the problems of high vector dimensions and sparse data; as more consecutive words are extracted, the dimension becomes higher. Compared with Word2vec, the precision and F1 values of Word2vec weighted by TF-IDF improved by 1.7% and 2.5%, respectively, which shows that the precision and F1 values of the neural network can be improved by combining Word2vec with TF-IDF weighting of representative feature words.

Parameter Setting
We set the number of training epochs to 50 and the learning rate to 0.01. The densely connected recurrent neural network had 5 layers, each with 100 hidden units, and the fully connected layer had 1000 hidden units. After the word and character embedding layer, dropout was set to 0.5. For the autoencoder, 200 hidden units were used as the encoding features, and dropout was set to 0.2. We applied the RMSProp optimizer with an initial learning rate of 0.001. We tested Coattention-DenseGRU, BiLSTM [30], Selfattention-BiLSTM [31], TextCNN [32], ABCNN, BiGRU [33], Attention-BiGRU [34], and DenseGRU on the rice-related question similarity pair dataset. Table 3 shows the comparison of precision, recall, and F1 values of the eight deep learning models. The proposed model, Coattention-DenseGRU, achieved the highest precision and F1 value, 96.3% and 96.9%, respectively, which shows that densely connected GRUs can enhance the transmission and extraction of features, reduce feature loss, and benefit the final matching result. Compared with the traditional BiLSTM, BiLSTM with the self-attention mechanism had better precision and F1 values, which indicates that the attention mechanism can better express feature information by reweighting during training; however, it performed slightly worse than DenseGRU. The DenseGRU model also outperformed the BiLSTM model. In the DenseGRU model, the features were extracted through five densely connected GRUs. In the other models, only the previous layer's output is used as the input of the current layer when features are transferred. DenseGRU instead takes the outputs of all previous layers, which effectively reduces the loss of text features.
Through densely connected GRUs, text features can be better transferred and expressed, which improves the text matching effect. Figure 3 shows the text matching precision of the eight experimental models under the Word2vec text representation and the TF-IDF-weighted Word2vec text representation. As shown in Figure 3, the precision of the TF-IDF + Word2vec text representation used in this paper is significantly higher than that of the Word2vec text representation for all eight experimental models. The Coattention-DenseGRU model achieved the best results under both the TF-IDF + Word2vec weighted text representation and the Word2vec text representation, with precisions of 96.3% and 91.5%, respectively. Compared with the other seven models, the Coattention-DenseGRU model has significant advantages. It can be seen from Figure 3 that the TF-IDF + Word2vec weighted text representation improved the precision in each group of comparative experiments. Therefore, the TF-IDF + Word2vec weighting can increase the importance of keywords and the precision of question similarity matching.
As can be seen from Table 4, compared with BiLSTM, Selfattention-BiLSTM, TextCNN, ABCNN, BiGRU, Attention-BiGRU, and DenseGRU, Coattention-DenseGRU had the highest matching performance on all five categories of rice-related question pairs: diseases and pests, weeds and pesticides, cultivation management, storage and transportation, and other. Its precision, recall, and F1 value in matching rice-related question pairs were greater than 93.6%, 92.7%, and 94.9%, respectively, and its overall classification effect was better than that of the other models. The F1 value of this model was slightly higher than that of the other models in the categories with sufficient data (diseases and pests; cultivation management) and significantly higher in the three categories with fewer data (weeds and pesticides; storage and transportation; other), which indicates that the Coattention-DenseGRU model can still effectively extract the features of short text when data are insufficient.
Note: 1, 2, 3, 4 and 5 represent the data (five categories) of diseases and pests, weeds and pesticides, cultivation management, storage and transportation, and other, respectively.
Table 5 reports a set of experiments undertaken to investigate the effectiveness of each module in the Coattention-DenseGRU model. First, model 2 was obtained by deleting the autoencoder from the model. The precision and recall of model 2 decreased, which verified the effectiveness of the autoencoder.
Then, we deleted the dense connection between GRUs and the collaborative attention mechanism, obtaining models 3 and 4. The precision and F1 value of models 3 and 4 decreased by 0.6% and 0.7%; the results show that the dense connection between GRUs contributes more to the effectiveness of the model than the collaborative attention mechanism. Models 5 and 6 are a five-layer GRU model based on the attention mechanism and a five-layer GRU model without the attention mechanism. Table 5 shows that the attention mechanism improves the model's effect by attending to keywords in question similarity matching.
Figure 4 shows the classification effect of models 1-7 on the rice-related question dataset with different numbers of GRU layers. Coattention-DenseGRU had the highest precision with a five-layer GRU, which shows that increasing the number of layers together with dense connection can effectively improve text feature extraction, reduce feature loss, and improve classification efficiency. Models 6 and 7 had the highest precision at the second layer and then gradually declined, which indicates that multi-layer feature extraction without dense connection of the GRUs causes feature loss.
Table 6 shows the response time and precision of four attention-based neural network models on a test set of 3200 question pairs; the response times meet the requirements for quick classification of rice-related question pairs. ABCNN has the fastest response time because of its simple structure, fewer training layers, and fewer model parameters.
The Coattention-DenseGRU model proposed in this paper was able to accurately judge the similarity of the rice-related question sentences in the test set of 3200 question pairs in 12 s, with a precision of 93.6%.
The features obtained through the GRU network's dense connection and collaborative attention mechanism are linked to the classification layer via the max-pooling layer, so that the loss function is affected by the features of each layer and deep supervised learning is carried out. Therefore, we used the attention weights and the max-pooling positions to explain the classification results. The attention weight includes the information relating to the question pair and the max-pooling position information in each dimension, and it plays a vital role in classification. Figure 5 visualizes the attention weights of the model in different layers. Except for "duck" and "fish", most of the words appear in both question 1 and question 2. In the first layer of the attention weight graph, identical or similar words in the two sentences correspond most strongly. As the layer depth increases, however, the attention weights of "duck" and "fish" also increase; in the fifth layer, the attention weights of all other words in question 1 and question 2 become very small. As there are noticeable semantic differences between "fish" and "duck", the model judged that the question pair was semantically dissimilar; that is, the label was 0.
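The two quantities used for this explanation, the attention weights between the words of a question pair and the position selected by max-pooling in each feature dimension, can be sketched as follows. This is a minimal illustration only: it assumes cosine-similarity scores normalised with a softmax, which is one common way to form co-attention weights and is not confirmed by this section to be the paper's exact formula. The function names and the two-dimensional toy hidden states are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two hidden-state vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def coattention_weights(h1, h2):
    """Row i: attention of word i in question 1 over the words of question 2."""
    return [softmax([cosine(u, v) for v in h2]) for u in h1]


def max_pool_positions(h):
    """Per feature dimension: the max-pooled value and the word index it came from."""
    dims = range(len(h[0]))
    pooled = [max(row[d] for row in h) for d in dims]
    positions = [max(range(len(h)), key=lambda t: h[t][d]) for d in dims]
    return pooled, positions


# Hypothetical hidden states (3 words x 2 dimensions) for two questions that
# share their first two words and differ in the third.
h1 = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
h2 = [[1.0, 0.0], [0.0, 1.0], [-0.7, 0.7]]

weights = coattention_weights(h1, h2)
pooled, positions = max_pool_positions(h1)
for row in weights:
    print([round(w, 3) for w in row])
print(pooled, positions)
```

Plotting `weights` as a heat map for each layer would give a view comparable in spirit to Figure 5, and `positions` indicates which word supplied each pooled feature.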

Conclusions
To address the inability of the Chinese Agricultural Technology Extension Q&A communities to automatically and accurately detect semantically repeated questions, a corpus with five categories and 32,000 pairs of rice-related questions was constructed. A densely connected GRU model based on the coattention mechanism was introduced to carry out rapid and automatic detection of semantically repeated questions in rice-related Q&A community data. We introduced the agricultural word segmentation dictionary into word segmentation and word vector representation. We utilized the DenseGRU network to extract the texts' emotional expression as the text feature vector used for question similarity matching. Furthermore, we optimized its important structural parameters and training strategies and built a rice-related text similarity matching algorithm based on Coattention-DenseGRU to realize the precise and efficient identification of rice-related questions in a question-and-answer community. The proposed model achieved the best performance on the rice-related question similarity dataset compared to the other models. Future work will focus on the following three aspects:
(1) Noise errors in the rice-related question data will be corrected, and the limited amount of data will be expanded.
(2) On the China Agricultural Technology Extension Information Platform, there are corresponding pictures uploaded with all question-answering data. Multimodal fusion question-answering systems that integrate image and text representations have recently achieved good results, so using a multimodal question-answering model of images and text will be our focus.
(3) Some useful features and advanced pre-trained models, such as BERT, will be used to further improve the model outcomes.