Article

BERNN: A Transformer-BiLSTM Hybrid Model for Cross-Domain Short Text Classification in Agricultural Expert Systems

School of Computer Science and Technology, Henan Institute of Science and Technology, Xinxiang 453003, China
* Author to whom correspondence should be addressed.
Symmetry 2025, 17(9), 1374; https://doi.org/10.3390/sym17091374
Submission received: 24 June 2025 / Revised: 28 July 2025 / Accepted: 30 July 2025 / Published: 22 August 2025
(This article belongs to the Section Computer)

Abstract

With the advancement of artificial intelligence, Agricultural Expert Systems (AESs) show great potential in enhancing agricultural management efficiency and resource utilization. Accurate extraction of semantic features from agricultural short texts is fundamental to enabling key functions such as intelligent question answering, semantic retrieval, and decision support. However, existing single-structure deep neural networks struggle to capture the hierarchical linguistic patterns and contextual dependencies inherent in domain-specific texts. To address this limitation, we propose a hybrid deep learning model—Bidirectional Encoder Recurrent Neural Network (BERNN)—which combines a domain-specific pre-trained Transformer encoder (AgQsBERT) with a Bidirectional Long Short-Term Memory (BiLSTM) network. AgQsBERT generates contextualized word embeddings by leveraging domain-specific pretraining, effectively capturing the semantics of agricultural terminology. These embeddings are then passed to the BiLSTM, which models sequential dependencies in both directions, enhancing the model’s understanding of contextual flow and word disambiguation. Importantly, the bidirectional nature of the BiLSTM introduces a form of architectural symmetry, allowing the model to process input in both forward and backward directions. This symmetric design enables balanced context modeling, which improves the understanding of fragmented and ambiguous phrases frequently encountered in agricultural texts. The synergy between semantic abstraction from AgQsBERT and symmetric contextual modeling from BiLSTM significantly enhances the expressiveness and generalizability of the model. Evaluated on a self-constructed agricultural question dataset with 110,647 annotated samples, BERNN achieved a classification accuracy of 97.19%, surpassing the baseline by 3.2%. Cross-domain validation on the Tsinghua News dataset further demonstrates its robust generalization capability. This architecture provides a powerful foundation for intelligent agricultural question-answering systems, semantic retrieval, and decision support within smart agriculture applications.

1. Introduction

With the rapid advancement of smart agriculture, information technologies represented by the Internet, big data, and artificial intelligence have profoundly transformed the ways agricultural knowledge is acquired. Despite the advantages of current knowledge platforms, such as abundant resources and rapid updates, the fragmentation of agricultural information remains prominent. This fragmentation results in delayed knowledge acquisition, inaccurate resource matching, and inconsistent service standards, severely hindering farmers’ ability to efficiently access and utilize agricultural knowledge through digital means [1].
The Agricultural Expert System (AES), as an intelligent management tool, offers precise support for production decision making by integrating multi-source agricultural domain knowledge with advanced information technologies [2]. AES primarily takes unstructured agricultural text data as its core input, leveraging natural language processing (NLP) techniques to extract deep semantic information and support agricultural production management decisions [3].
The core challenge of deep semantic extraction lies in the intelligent representation and classification of text, which directly impacts the reliability of AES’s key functions, including intelligent question answering, knowledge retrieval, and diagnostic reasoning. However, agricultural text data presents unique challenges. It often contains extensive fragmented data, specialized terminology, and domain-specific expressions; moreover, text structures vary widely, ranging from lengthy technical documents to concise agricultural records. These characteristics limit the effectiveness of traditional rule-based and statistical methods in fully capturing deep semantic associations within texts [4,5].
Although machine learning approaches have achieved notable progress in text processing, they still face significant challenges in feature extraction when applied to the agricultural domain [6]. In contrast, recent advances in deep learning have demonstrated superior capabilities in semantic understanding through automatic feature learning [7]. Nevertheless, there remains a lack of fusion neural network models tailored to the unique needs of agricultural text processing.
To address these issues, this paper proposes a novel intelligent framework for text analysis—the Bidirectional Encoder Recurrent Neural Network (BERNN). It is designed to overcome key technical bottlenecks in agricultural text classification and aims to enhance the performance of intelligent agricultural question-answering systems by accurately classifying user-submitted short agricultural questions into predefined categories. While the model can be integrated into broader Agricultural Expert Systems, its core application focuses on semantic understanding and intent classification within automated Q&A systems. BERNN is a fusion architecture composed of two complementary components: a domain-specific pre-trained Transformer encoder (AgQsBERT) and a bidirectional recurrent module (BiLSTM). AgQsBERT generates contextualized word embeddings by encoding domain-specific semantics, including specialized terminology and implicit expressions. These embeddings are then fed into the BiLSTM, whose bidirectional design captures sequential dependencies in both the forward and backward directions, enhancing contextual reasoning. Together, these components form a hybrid framework that combines semantic abstraction with symmetric contextual modeling, enabling robust understanding of fragmented and irregular agricultural expressions.
The principal contributions of this research are listed below.
(1)
A large-scale agricultural question dataset is constructed, containing 110,647 entries categorized into seven classes. This dataset is carefully designed to provide a generalized benchmark for model training and evaluation.
(2)
A fusion semantic feature extraction and classification architecture is proposed for the BERNN model. Specifically, we integrate the Agricultural Question Bidirectional Encoder Representations from Transformers (AgQsBERT) with a BiLSTM network, enabling the model to simultaneously capture sentence-level attributes and contextual dependencies, effectively addressing challenges such as word polysemy. AgQsBERT contributes domain-sensitive semantic understanding by leveraging pretraining on agricultural texts, while the bidirectional nature of BiLSTM reflects a symmetric structure that plays a crucial role in modeling contextual interactions. Their integration yields a synergistic semantic framework tailored to the linguistic characteristics of agricultural short texts.
(3)
Extensive and comprehensive experiments are conducted. The results demonstrate the superior performance of BERNN in handling diverse and complex agricultural text data, establishing its advantages in the field of agricultural short-text classification.
The rest of this paper is structured as follows. Section 2 discusses the related studies. Section 3 details the materials and methods used. Section 4 provides the experimental results and analysis. Section 5 concludes the paper.

2. Related Studies

2.1. Machine Learning-Based Agricultural Text Classification Methods

When applying traditional machine learning techniques to text classification, the processes of feature extraction and classification are generally treated as distinct. Feature extraction often requires manual effort, making the procedure both intricate and less precise. Commonly utilized methods for text classification include the K-means algorithm, Naive Bayes, and support vector machines [8,9,10]. Researchers have made notable progress in agricultural text classification through the optimization and enhancement of traditional machine learning methods. For instance, Wei et al. [11] developed an agricultural text classification model based on SVM by creating a specialized keyword library for agriculture. Cui et al. [12] integrated the Spark computing framework with the distributed gradient boosting algorithm on forestry text datasets, resulting in real-time and accurate forestry text classification. Du et al. [13] used the Naive Bayes algorithm to classify agricultural scientific and technological literature, achieving an average accuracy rate of 94%. Lu et al. [14] proposed an algorithm called the Centered Kernel Alignment-based Fuzzy Support Vector Machine (CKA-FSVM) for classifying Chinese agricultural texts. Experimental results demonstrated that the model achieved an average F1 score of 94.64%.
Although traditional machine learning-based text classification methods have demonstrated improved results, they continue to rely heavily on labor-intensive feature engineering to achieve their effectiveness [15]. Moreover, classifiers based on traditional machine learning methods are restricted to identifying only superficial semantic features in the text, neglecting the logical connections and contextual nuances present in the dataset. These models are incapable of comprehensively or intelligently learning the semantic relationships between words or achieving complex mappings from input to output variables.

2.2. Deep Learning-Based Agricultural Text Classification Approaches

In recent times, deep learning has made considerable progress in the field of intelligent agricultural text processing, owing to its powerful feature learning and pattern recognition capabilities. In agricultural text classification tasks, researchers have substantially improved classification performance by introducing deep learning models.
Shi et al. [16] employed the pre-trained language model Bidirectional Encoder Representations from Transformers (BERT) to achieve accurate recognition of agriculture-related texts on a large-scale news corpus, with its attention mechanism effectively capturing key agricultural terms. Zhao et al. [17] developed a Pest and Disease Interrogative Classification System (PDCS) based on a Bidirectional Gated Recurrent Unit (BiGRU) network, which demonstrated strong performance in fine-grained classification tasks by integrating word order features and contextual information. Wang et al. [18] enhanced agricultural short text classification performance by 1–4% through the use of FastText’s bag-of-words features combined with a subword information strategy. To tackle the issues arising from the imbalanced distribution of agricultural data, Bao et al. [19] incorporated an attention mechanism into a Text Convolutional Neural Network (TextCNN) model and introduced a focal loss function, enabling the model to maintain over 93% classification accuracy even for minority classes.
In the field of agricultural intelligent Question and Answer (Q&A), Tang et al. [20] proposed an improved BiLSTM network utilizing a hierarchical attention mechanism, which significantly enhanced the semantic understanding of complex agricultural Q&A tasks. Chen et al. [21] designed a novel cascaded word vector CNN method, jointly modeling the multidimensional features of questions, answers, and a synonym pool to construct a more robust classification framework. Rose Mary C. A. et al. [22] developed an RNN-based intelligent answering system that integrates agricultural knowledge graphs to deliver accurate, real-time advisory services to farmers.
These studies have not only driven the innovative application of natural language processing technologies in agriculture but have also established a strong technical basis for the advancement of intelligent agricultural knowledge service systems.
However, despite notable achievements in extracting local features, contextual dependencies, and key information, existing methods exhibit significant limitations due to their reliance on single-model architectures. Especially when faced with the complex data characteristics inherent to agricultural issue datasets, traditional single-feature extraction paradigms struggle to comprehensively capture the multi-level semantic information embedded in texts. In particular, they lack robustness in recognizing low-frequency but critical agricultural terminology and capturing intricate semantic associations within specific contexts.

2.3. Fusion-Based Agricultural Text Classification Strategies

As deep learning continues to advance, there are increasingly higher requirements for agricultural text understanding and classification performance. In response to these demands, researchers have developed a variety of hybrid models that integrate multidimensional feature extraction capabilities to enhance text classification and comprehension.
Yang et al. [23] proposed a hybrid model combining Enhanced Representation through kNowledge IntEgration (ERNIE), Deep Pyramidal Convolutional Neural Network (DPCNN), and BiGRU, leveraging ERNIE’s strengths in semantic modeling alongside the local feature extraction and sequence dependency modeling capabilities of DPCNN and BiGRU. Compared with the use of ERNIE alone, the hybrid model achieved improvements of 1.47, 1.29, and 1.42 percentage points in precision, recall, and F1 score, respectively. To address challenges such as high sparsity, strong noise, and poor format specification commonly found in massive agricultural text corpora, Jin et al. [24] proposed a model combining BiGRU and a Multi-scale Convolutional Neural Network (Multi-CNN), which extracts features at different convolutional scales and achieved a classification accuracy of 95.9%.
Chen et al. [25] explored a fusion strategy combining Word2Vec word embeddings with a BiLSTM network, demonstrating its effectiveness in agricultural short text classification tasks. Zhao [26] introduced a hybrid model based on CNN and BiLSTM for tackling agricultural text classification and extraction in online media scenarios, achieving promising classification results.
Agricultural query texts are typically characterized by short length and sparse domain-specific terminology, making it difficult for traditional static word embedding models to effectively capture semantic information. With the widespread adoption of pre-trained language models such as BERT, researchers have achieved more accurate text representations using dynamic contextualized word embeddings, leading to significant improvements in classification performance.
Wang et al. [27] combined the lightweight pre-trained model ALBERT with a match-LSTM structure, designing a model well suited for agricultural short text classification and achieving excellent performance in question intent recognition tasks. Xiang et al. [28] proposed a fusion model of ALBERT and a Seq2Seq structure to address the shortcomings of static word embeddings in agricultural multi-label classification tasks, further enhancing classification accuracy and robustness. Guo et al. [29] developed a BERT-DPCNN hybrid model, which achieved an accuracy of 99.07% in short agricultural text classification tasks, demonstrating strong feature extraction and classification capabilities.

2.4. Advances in Large Language Models for Agricultural Text Understanding

In recent years, Large Language Models (LLMs) such as GPT-3, T5, BART, and LLaMA have revolutionized natural language processing by leveraging massive corpora and deep transformer-based architectures to learn powerful and generalizable language representations [30,31,32,33]. Unlike traditional models that require extensive task-specific engineering, LLMs enable zero-shot and few-shot learning through in-context understanding, which significantly improves performance across a variety of downstream tasks.
Although most LLMs are trained on general-domain corpora, their ability to capture complex linguistic patterns has led to increasing interest in their application to agriculture-specific NLP tasks. For instance, ChatGPT (powered by GPT-4, accessed in March 2024) has shown preliminary success in answering agricultural questions, while fine-tuned variants of T5 and BERT have been employed for agricultural named entity recognition and relation extraction. However, challenges remain in domain adaptation, knowledge grounding, and the high resource cost of fine-tuning these models on specialized corpora.
In the context of agricultural expert systems and intelligent Q&A, LLMs hold great promise for enhancing semantic understanding, enabling context-aware reasoning, and supporting knowledge-intensive dialogue systems [34]. Incorporating LLM-based architectures into agricultural text classification frameworks could further improve the model’s ability to capture deep semantic relationships, especially in low-resource or ambiguous scenarios. Therefore, future work may explore integrating instruction-tuned LLMs or domain-adapted transformers into agricultural NLP pipelines to leverage their full potential.
In summary, although considerable progress has been made in agricultural text classification research, several challenges remain:
(1)
The lack of large-scale, standardized agricultural text corpora limits the development of downstream research and applications.
(2)
The prevalence of synonymous and heterogeneous expressions in agricultural texts complicates feature extraction and causes category imbalance, ultimately affecting classification performance.
To address these challenges, this paper systematically collects and organizes online agricultural Q&A resources to construct a high-quality agricultural question dataset, providing a solid foundation of data for subsequent model training and evaluation. Building on this, we introduce a model that incorporates multiple feature extraction mechanisms to effectively capture complex semantic information in agricultural texts and further improve classification performance.
The proposed method not only alleviates the performance bottlenecks faced by traditional models in dealing with expression diversity and category imbalance but also provides strong technical support for the construction of intelligent agricultural application systems.

3. Materials and Methods

3.1. Data Sources

3.1.1. Corpus Construction

To address the issue of insufficient public agricultural corpora for text classification, this paper constructs an agricultural Q&A corpus. Figure 1 outlines the process flow for acquiring the Q&A corpus data.
In this paper, we utilize the China Agricultural Technology Extension Information Service Platform (https://njtg.nercita.org.cn/user/index.shtml, accessed on 30 July 2025) and the China Agricultural Network (http://www.agronet.com.cn/, accessed on 30 July 2025) as our data sources. By employing web crawler technology, we automatically collected a 257 MB corpus of Q&A data from their agricultural community domains. The Q&A corpus was collected over 49 months, between April 2020 and April 2024. The corpus covers seven categories of agricultural Q&A pairs. Table 1 presents several examples of Q&A pairs along with their corresponding categories.

3.1.2. Dataset Preprocessing

The acquired agricultural Q&A pair corpus was preprocessed with data cleaning, data tagging, and data labeling. Figure 2 illustrates the data preprocessing steps used in this study. An agricultural question dataset comprising 110,647 question texts is thus constructed.
Agricultural Q&A text is interspersed with considerable noise, such as special web page tags, carriage returns, and special characters. These useless characters may affect the results of model feature extraction and need to be removed before building the corpus to ensure its data quality. Data cleaning mainly covers preprocessing operations such as denoising, de-duplication, and missing-information handling [35]. The cleaned data are then tagged with "[CLS]" and "[SEP]" markers to delimit questions and answers: the text between "[CLS]" and the first "[SEP]" is the question, and the text between the first and second "[SEP]" is the answer to that question [36]. Finally, the text is labeled and categorized into seven distinct classes: breeding, planting, processing, fishery, edible fungi, technology consulting, and disease and insects. The distribution of data and theme words across these categories is detailed in Table 2.
Table 2 presents seven categories of agricultural question datasets, along with the number of entries in each category and the relevant theme words. The dataset is divided into an 8:1:1 configuration. In each category, 80% of the questions are allocated to the training set (88,577 questions), 10% to the test set (11,035 questions), and the remaining 10% to the validation set (11,035 questions).
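For readers who want to reproduce this partition, the following is a minimal sketch assuming the cleaned corpus is held in two parallel Python lists (`samples` and `labels`); the variable and function names are illustrative and not taken from the authors' code. A two-step stratified split yields the per-category 8:1:1 configuration described above.

```python
from sklearn.model_selection import train_test_split

def split_8_1_1(samples, labels, seed=42):
    """Stratified 8:1:1 split: 80% train, 10% validation, 10% test per category."""
    # Hold out 20% of each category while preserving the class proportions.
    x_train, x_rest, y_train, y_rest = train_test_split(
        samples, labels, test_size=0.2, stratify=labels, random_state=seed)
    # Split the held-out 20% evenly into validation and test sets.
    x_val, x_test, y_val, y_test = train_test_split(
        x_rest, y_rest, test_size=0.5, stratify=y_rest, random_state=seed)
    return (x_train, y_train), (x_val, y_val), (x_test, y_test)
```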

3.1.3. Dataset Analysis

To ensure data quality and consistency, all samples in the agricultural question dataset were manually reviewed and labeled by domain experts. During construction, at least two annotators independently classified each question, and disagreements were resolved through discussion. This validation process enhances the reliability of the dataset and ensures accurate model training. The characteristics of the dataset are as follows.
(1)
High Specialization—All questions are situated within the agricultural domain, with fuzzy boundaries between sentence categories. For example, questions like “Preventing rickets in chickens during winter” and “Diagnosis and prevention of wheat nutrient deficiency” highlight the challenges in transferring general domain models to the agricultural domain for detailed question categorization.
(2)
Imbalanced category distribution—As illustrated in Figure 2, the distribution of question categories is uneven. The largest category is planting, which makes up 45.6% of the entire dataset. In contrast, categories such as processing, edible fungi, and technology consulting are underrepresented, each comprising less than 5% of the dataset. To address the class imbalance, particularly the overrepresentation of the "Planting" category and the underrepresentation of categories like "Edible Fungi", we employed a weighted loss during model training. Specifically, we used class weights inversely proportional to the class frequencies to penalize misclassification of minority classes more heavily (a minimal sketch of this weighting scheme is given after this list). Additionally, performance metrics such as precision, recall, and F1 score were computed using weighted averages to ensure a fair evaluation across all categories. These strategies effectively mitigated the imbalance and improved classification performance for low-resource categories.
(3)
Shorter text—Most question texts in the agricultural datasets are under 30 characters and often around 10 characters, such as “Sheep rumen acidosis” and “How to treat hemorrhagic disease in grass carp?” This brevity complicates semantic extraction and model classification.
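As referenced in point (2) above, the inverse-frequency class weighting can be sketched as follows. This is a hedged illustration assuming integer class ids 0–6 and PyTorch's CrossEntropyLoss; it is not the authors' released training code.

```python
import torch
import torch.nn as nn
from collections import Counter

def inverse_frequency_weights(train_labels, num_classes=7):
    """Class weights inversely proportional to class frequency (illustrative)."""
    counts = Counter(train_labels)
    total = len(train_labels)
    return torch.tensor(
        [total / (num_classes * counts[c]) for c in range(num_classes)],
        dtype=torch.float)

# Usage sketch: pass the weights to the loss so minority-class errors cost more.
# criterion = nn.CrossEntropyLoss(weight=inverse_frequency_weights(train_labels))
```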

3.2. Model of BERNN

3.2.1. The Architecture of BERNN

BERNN is a hybrid semantic extraction and classification model that integrates Agricultural Questions Bidirectional Encoder Representations from Transformers (AgQsBERT) and Bidirectional Long Short-Term Memory (BiLSTM). This architecture is specifically designed to address the distinctive linguistic characteristics of agricultural short texts, which are typically domain-specific, context-poor, and semantically ambiguous. Such texts often contain technical jargon, abbreviated expressions, and implicit overlaps between categories.
AgQsBERT generates deep contextualized embeddings that capture fine-grained semantic relationships within agricultural terminology, while BiLSTM enhances the model’s ability to model bidirectional dependencies in short, ambiguous sentences. By combining these components, BERNN improves both classification accuracy and generalization across a wide range of agricultural question categories. The overall architecture of the proposed BERNN model is illustrated in Figure 3.
The core of BERNN is complex semantic extraction, which is mainly performed by AgQsBERT and BiLSTM. AgQsBERT learns the contextual dependencies of the text and provides rich contextual features for the subsequent module. BiLSTM, a bidirectional architecture based on recurrent neural networks, is well suited for handling sequence data because it captures the sequential information within the text. BERNN treats each word as the fundamental unit and feeds the output sequences from the six-layer Transformer encoder into the BiLSTM layer. The integration of contextual understanding and BiLSTM's sequential modeling enables the model to learn semantic features of text at multiple levels. The model receives an agricultural question as input, such as "What are the advantages of deep plowing in wheat". After preprocessing, $n$ initial expression vectors $W_i$ $(0 \le i \le n)$ are generated as inputs. Following the AgQsBERT layer, these vectors are transformed into contextualized word representation vectors. Finally, the structurally symmetric BiLSTM layer ensures balanced processing of contextual information from both directions, and the model produces the final classification outcome.

3.2.2. Layers of BERNN

As illustrated in Figure 3, the BERNN model is composed of an input layer, an AgQsBERT layer, a BiLSTM layer, a fully connected layer, and an output layer. The specific functions of each layer are described below.
The input layer is structured to handle preprocessing tasks for text data, including preprocessing steps such as data cleaning, noise reduction, and tokenizing text into words; adding [CLS] and [SEP] tags; padding or truncating text to a uniform length; and converting text data into matrix or tensor forms.
The AgQsBERT layer contains three types of embeddings and a six-layer Transformer encoder. Taking the question "What are the advantages of deep plowing in wheat" as input, the input vector $e = (e_1, e_2, e_3, \ldots, e_n)$ is summed over the Token, Segment, and Position embeddings to obtain the output vector $E = (E_1, E_2, E_3, \ldots, E_n)$. The vector $E$ is then passed through the six Transformer encoder layers to obtain the final output vector $T = (t_1, t_2, t_3, \ldots, t_n)$ of AgQsBERT.
The BiLSTM layer is made up of two LSTM layers: one for the forward direction and the other for the backward direction. By merging the outputs of the forward and backward layers, it captures contextual information in sequence data more thoroughly [37]. For the output vector T from AgQsBERT, the BiLSTM layer concatenates the outputs of the forward and backward LSTM layers at each time step, forming a new feature representation. This new representation is the output vector H of the BiLSTM layer.
The output layer takes the feature vectors H from the BiLSTM model and feeds them into a fully connected layer for dimensionality reduction. The fully connected layer serves as a linear transformation layer that maps the high-dimensional contextual representations obtained from the BiLSTM layer to a fixed-size vector corresponding to the number of classes. This transformation reduces dimensionality and acts as a classifier, with the output logits normalized by the Softmax function to produce category probabilities.
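The layer stack described above can be outlined in PyTorch as follows. This is a minimal sketch only: the AgQsBERT checkpoint path, the BiLSTM hidden size, and the pooling of the BiLSTM output are assumptions for illustration, not details released with the paper.

```python
import torch
import torch.nn as nn
from transformers import BertModel

class BERNN(nn.Module):
    """Sketch of the BERNN stack: BERT-style encoder -> BiLSTM -> fully connected -> softmax."""

    def __init__(self, encoder_name="path/to/agqsbert",  # hypothetical checkpoint path
                 lstm_hidden=256, num_classes=7, dropout=0.4):
        super().__init__()
        self.encoder = BertModel.from_pretrained(encoder_name)      # AgQsBERT-style encoder
        self.bilstm = nn.LSTM(input_size=self.encoder.config.hidden_size,
                              hidden_size=lstm_hidden,
                              batch_first=True,
                              bidirectional=True)                    # forward + backward context
        self.dropout = nn.Dropout(dropout)
        self.fc = nn.Linear(2 * lstm_hidden, num_classes)            # dimensionality reduction / classifier

    def forward(self, input_ids, attention_mask):
        # Token-level contextual embeddings T = (t_1, ..., t_n) from the encoder.
        token_states = self.encoder(input_ids=input_ids,
                                    attention_mask=attention_mask).last_hidden_state
        # BiLSTM output H concatenates forward and backward hidden states per time step.
        lstm_out, _ = self.bilstm(token_states)
        sentence_repr = lstm_out[:, -1, :]        # simple last-step pooling (one possible choice)
        logits = self.fc(self.dropout(sentence_repr))
        return torch.softmax(logits, dim=-1)      # category probabilities
```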

3.2.3. Agricultural Question Bidirectional Encoder Representations from Transformers

To fully leverage the semantic information contained in the word-level sequences of question text, this research introduces AgQsBERT, a feature representation method for agricultural questions built on a six-layer Transformer encoder. AgQsBERT is used to generate hidden vectors for the individual words in each agricultural text. Let the input sentence be denoted $e = (e_1, e_2, e_3, \ldots, e_n)$, where $e_i$ is the i-th character of the input sentence and $n$ is the sentence length.
Within the embedding layer, the input sentence $e = (e_1, e_2, e_3, \ldots, e_n)$ is converted into an input sequence $E = (E_1, E_2, E_3, \ldots, E_n)$ by summing the token embeddings, segment embeddings, and position embeddings. The token embedding vector is derived from the word vector table; the segment embedding vector indicates the sentence the word belongs to; and the position embedding vector encodes the word's position in the sequence. Here, $E$ is an $n \times m$-dimensional matrix, and $E_i$ is the $m$-dimensional word embedding vector corresponding to $e_i$. The embedding process is illustrated in Figure 4.
After embedding, the input sequence is processed by a six-layer Transformer encoder. Each encoder layer includes multi-head self-attention, feed-forward networks, residual connections, and layer normalization (as shown in Figure 5). The input sequence is first linearly projected into query (Q), key (K), and value (V) matrices, which are used to compute attention scores. These scores help capture contextual associations between words in the sequence, enabling the extraction of rich semantic features. The output is a contextualized sequence representation, which serves as the sentence encoding for the joint extraction of entity relationships.
As illustrated in Figure 5, each encoder block consists of four sublayers. The first sublayer employs a multi-head attention mechanism [38], which allows the model to jointly attend to information from different representation subspaces at different positions. Instead of performing a single attention operation, it runs multiple attention operations (heads) in parallel. This enhances the model's capacity to capture various aspects of the semantic relationships between words, such as synonymy, syntax, or topic similarity, making it especially useful for processing short agricultural sentences with complex context. The second and fourth sublayers are residual connections with Add and Normalize operations, and the third sublayer is a feed-forward neural network. The multi-head attention mechanism, which is central to the Transformer layer, extracts features by adjusting the weight coefficient matrix according to the associations between words within the same sentence.
First, the input sequence $E = (E_1, E_2, E_3, \ldots, E_n)$ is fed into the encoder and linearly transformed to obtain the $Q$, $K$, and $V$ matrices, defined as follows in (1)–(3):
Q = \mathrm{Linear}(E) = EW^{Q}
K = \mathrm{Linear}(E) = EW^{K}
V = \mathrm{Linear}(E) = EW^{V}
where $Q$, $K$, and $V$ are the linear mappings of $E$, and $W^{Q}$, $W^{K}$, and $W^{V}$ are the corresponding weight matrices.
Second, the self-attention scores are obtained by calculating the scaled dot product, which determines the degree of attention the model assigns to other words in the input sentence when encoding a particular word. This calculation is defined as shown in (4).
\mathrm{Attention}(Q, K, V) = \mathrm{Softmax}\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V
Next, the self-attention scores are concatenated and linearly transformed after being computed i times. This process results in an enhanced semantic vector that retains the same length as the original word vector, serving as the output of the multi-head attention layer. The calculations are defined in Equations (5) and (6).
\mathrm{MultiHead}(Q, K, V) = \mathrm{Concat}(\mathrm{head}_1, \mathrm{head}_2, \ldots, \mathrm{head}_h)W^{0}
\mathrm{head}_i = \mathrm{Attention}(QW_i^{Q}, KW_i^{K}, VW_i^{V})
In this context, $Q$, $K$, and $V$ represent the input word vector matrices, $d_k$ denotes the input dimension, $W_i^{Q}$, $W_i^{K}$, and $W_i^{V}$ are the weight matrices for $\mathrm{head}_i$, and $W^{0}$ is an additional weight matrix.
Ultimately, a new word representation vector T is derived via residual connections and normalization.
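A compact PyTorch sketch of Equations (1)–(6) is given below. The number of heads is an assumed value (with d = 768, twelve heads is a common choice), and the module covers only the multi-head attention sublayer, not the residual connections or normalization.

```python
import math
import torch
import torch.nn as nn

class MultiHeadAttention(nn.Module):
    """Sketch of Equations (1)-(6): Q/K/V projections, scaled dot-product attention
    per head, concatenation of heads, and a final linear map W_0."""

    def __init__(self, d_model=768, num_heads=12):
        super().__init__()
        assert d_model % num_heads == 0
        self.d_k = d_model // num_heads
        self.num_heads = num_heads
        self.w_q = nn.Linear(d_model, d_model)   # W^Q
        self.w_k = nn.Linear(d_model, d_model)   # W^K
        self.w_v = nn.Linear(d_model, d_model)   # W^V
        self.w_o = nn.Linear(d_model, d_model)   # W_0

    def forward(self, e):                         # e: (batch, n, d_model)
        b, n, _ = e.shape
        # Equations (1)-(3): linear mappings, then split into heads.
        q = self.w_q(e).view(b, n, self.num_heads, self.d_k).transpose(1, 2)
        k = self.w_k(e).view(b, n, self.num_heads, self.d_k).transpose(1, 2)
        v = self.w_v(e).view(b, n, self.num_heads, self.d_k).transpose(1, 2)
        # Equation (4): scaled dot-product attention per head.
        scores = torch.softmax(q @ k.transpose(-2, -1) / math.sqrt(self.d_k), dim=-1)
        heads = scores @ v                        # (batch, heads, n, d_k)
        # Equations (5)-(6): concatenate heads and apply W_0.
        concat = heads.transpose(1, 2).reshape(b, n, -1)
        return self.w_o(concat)
```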

3.2.4. Bidirectional Long Short-Term Memory

The BiLSTM model is primarily employed to extract hidden semantic information from agricultural question texts, effectively addressing challenges such as sparse features and limited semantic context in short sentences.
The BiLSTM architecture extends the traditional LSTM by adding a second LSTM layer that processes the input sequence in reverse. Specifically, a forward LSTM reads the sequence from left to right, while a backward LSTM reads it from right to left. The outputs from both directions are concatenated to form a comprehensive representation, enabling the model to leverage both past and future contextual information. This bidirectional structure is especially useful for capturing complex semantic dependencies in agricultural question understanding.
The Long Short-Term Memory (LSTM) model is an improved version of the standard Recurrent Neural Network (RNN), designed to address issues such as vanishing or exploding gradients. While RNNs capture sequential information through recursive connections, they often suffer from memory degradation over long sequences. In contrast, LSTM introduces memory cells and three types of gates—input gate, forgetting gate, and output gate—that regulate information flow. This gate mechanism allows the network to retain useful information while filtering out irrelevant details, thus maintaining long-term dependencies.
To overcome the limitations of standard RNNs, such as information loss and unstable gradient updates, LSTM adopts a gating mechanism as shown in Figure 6. This structure improves the stability and performance of sequential data processing.
As illustrated in Figure 6, the LSTM unit comprises an input gate i, a forgetting gate f, an output gate o, and a memory cell c. The forgetting gate regulates which information is discarded from the current memory cell, while the input gate controls which new information is added. Additionally, the output gate specifies which features the current memory unit should output. The calculations for these processes are presented in Equations (7)–(13).
f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f)
i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i)
\tilde{c}_t = \tanh(W_c \cdot [h_{t-1}, x_t] + b_c)
c_t = f_t \times c_{t-1} + i_t \times \tilde{c}_t
o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o)
\tilde{h}_t = \tanh(c_t)
h_t = o_t \times \tilde{h}_t
Equations (7)–(13) represent the internal operations of the LSTM cell. Specifically, Equation (7) computes the forgetting gate, determining which information from the previous cell state should be discarded. Equation (8) calculates the input gate, deciding which new information will be added. Equation (9) generates a candidate memory cell update. Equation (10) updates the memory cell by combining the forgetting and input gates. Equation (11) computes the output gate, and Equations (12) and (13) generate the final hidden state output. These gates collectively allow the LSTM to regulate long-term and short-term dependencies during sequence modeling.
Here, $\sigma$ denotes the sigmoid function and $\tanh(\cdot)$ the hyperbolic tangent activation. The variable $x_t$ is the input at time $t$. The terms $f_t$, $i_t$, and $o_t$ are the outputs of the forgetting gate, input gate, and output gate, respectively. The variables $c_{t-1}$ and $c_t$ denote the memory cell states at times $t-1$ and $t$, while $h_{t-1}$ and $h_t$ denote the hidden states at times $t-1$ and $t$. The matrices $W_f$, $W_i$, $W_o$, and $W_c$ are the weight matrices of the forgetting gate, input gate, output gate, and memory cell, respectively, and $b_f$, $b_i$, $b_o$, and $b_c$ are the corresponding bias terms.
In order to make the most of the contextual information in the text, we extract data from both the forward and backward sequences. This is achieved by merging two distinct hidden layers into a bi-directional layer. The outputs from both directions are then processed through an activation function, which merges them to produce the final results. The calculations involved in this process are defined in Equations (14)–(16).
\overrightarrow{h}_t = \mathrm{LSTM}(\overrightarrow{h}_{t-1}, W_t, c_{t-1}), \quad t \in [1, T]
\overleftarrow{h}_t = \mathrm{LSTM}(\overleftarrow{h}_{t+1}, W_t, c_{t+1}), \quad t \in [T, 1]
H_t = [\overrightarrow{h}_t, \overleftarrow{h}_t]
$H_t$ is the output text feature vector. The BiLSTM algorithm is presented in Algorithm 1.
Algorithm 1 The Bidirectional Long Short-Term Memory Algorithm
1: input: X = (x_1, x_2, x_3, …, x_T)
// Define the LSTM cell procedure
2: function LSTM_cell(h_prev, c_prev, x_t):
3:   f_t ← σ(W_f · [h_prev, x_t] + b_f)
4:   i_t ← σ(W_i · [h_prev, x_t] + b_i)
5:   c̃_t ← tanh(W_c · [h_prev, x_t] + b_c)
6:   c_t ← f_t × c_prev + i_t × c̃_t
7:   o_t ← σ(W_o · [h_prev, x_t] + b_o)
8:   h_t ← o_t × tanh(c_t)
9:   return h_t, c_t
10: end function
// Perform bidirectional semantic analysis
// Initialize hidden and cell states
11: h_forward[0] ← 0, c_forward[0] ← 0
12: h_backward[T + 1] ← 0, c_backward[T + 1] ← 0
13: for (t = 1; t ≤ T; t++)        // Forward feature extraction
14:   h_forward[t], c_forward[t] ← LSTM_cell(h_forward[t − 1], c_forward[t − 1], x_t)
15: end for
16: for (t = T; t ≥ 1; t−−)        // Backward feature extraction
17:   h_backward[t], c_backward[t] ← LSTM_cell(h_backward[t + 1], c_backward[t + 1], x_t)
18: end for
19: for (t = 1; t ≤ T; t++)        // Concatenate forward and backward outputs
20:   H_t ← [h_forward[t], h_backward[t]]
21: end for
22: output: H ← (H_1, H_2, …, H_T)
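A direct Python transcription of Algorithm 1, using torch.nn.LSTMCell for the per-step gate computations, might look as follows; the hidden size is an assumed value and the sketch is not the authors' implementation.

```python
import torch
import torch.nn as nn

def bilstm_features(x, hidden_size=256):
    """Run one LSTM cell forward and one backward over x (seq_len, input_dim)
    and concatenate the hidden states at each step, as in Algorithm 1."""
    seq_len, input_dim = x.shape
    fwd_cell = nn.LSTMCell(input_dim, hidden_size)
    bwd_cell = nn.LSTMCell(input_dim, hidden_size)

    # Steps 11-12: zero-initialize hidden and cell states.
    h_f = torch.zeros(1, hidden_size); c_f = torch.zeros(1, hidden_size)
    h_b = torch.zeros(1, hidden_size); c_b = torch.zeros(1, hidden_size)

    forward_states, backward_states = [], [None] * seq_len
    # Steps 13-15: forward feature extraction.
    for t in range(seq_len):
        h_f, c_f = fwd_cell(x[t].unsqueeze(0), (h_f, c_f))
        forward_states.append(h_f)
    # Steps 16-18: backward feature extraction.
    for t in reversed(range(seq_len)):
        h_b, c_b = bwd_cell(x[t].unsqueeze(0), (h_b, c_b))
        backward_states[t] = h_b
    # Steps 19-21: concatenate forward and backward states at each time step.
    return torch.stack([torch.cat([f.squeeze(0), b.squeeze(0)], dim=-1)
                        for f, b in zip(forward_states, backward_states)])
```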

3.2.5. Computational Complexity of BERNN

The computational complexity of the BERNN model can be analyzed in terms of both the AgQsBERT and the BiLSTM component.
(1)
Computational Complexity of AgQsBERT
AgQsBERT is based on the Transformer architecture and includes a six-layer encoder; its computational complexity can be analyzed as follows.
(1)
Input Representation: AgQsBERT takes a sequence of tokens as input, which are transformed into embeddings. The complexity of this step is typically $O(n)$, where $n$ is the number of tokens in the input sequence.
(2)
Transformer Layers: AgQsBERT consists of six layers, each integrating a self-attention mechanism and a feed-forward neural network. The self-attention mechanism has a complexity of $O(n^2 \cdot d)$ per layer, where $n$ is the sequence length and $d$ is the dimensionality of the embeddings (768 for AgQsBERT). The feed-forward networks have a complexity of $O(n \cdot d^2)$ per layer.
Overall, for six layers, the complexity of the AgQsBERT component can be approximated as follows (17):
C_{AgQsBERT} = O\big(6 \times (n^2 \cdot d + n \cdot d^2)\big) = O(n^2 \cdot d + n \cdot d^2)
(2)
Computational Complexity of BiLSTM
BiLSTM networks are recurrent neural networks that process sequences in both the forward and backward directions. The complexity of processing a sequence of length $n$ with a hidden state size of $h$ is $O(n \cdot h^2)$ per direction. Therefore, for a BiLSTM, the overall complexity is given in (18):
C_{BiLSTM} = O(2 \cdot n \cdot h^2) = O(n \cdot h^2)
(3)
Computational Complexity of BERNN
When combining AgQsBERT and BiLSTM in a model, the overall complexity can be thought of as the sum of the complexities of both components. The overall complexity of the BERNN model can be approximated as follows (19):
C_{BERNN} = O(n^2 \cdot d + n \cdot d^2 + n \cdot h^2)
The complexity of BERNN depends on the dimensionality of the embeddings, the sequence length, and the hidden state size of the BiLSTM.
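To make the dominant terms concrete, the following back-of-the-envelope calculation plugs in illustrative values; d = 768 matches the AgQsBERT hidden size, while the sequence length and the BiLSTM hidden size are assumptions for illustration only.

```python
# Illustrative term counts for the complexity expressions above (values are assumptions).
n, d, h = 30, 768, 256          # sequence length, embedding dim, BiLSTM hidden size

attention_term   = 6 * n**2 * d         # six layers of O(n^2 * d) self-attention
feedforward_term = 6 * n * d**2         # six layers of O(n * d^2) feed-forward
bilstm_term      = 2 * n * h**2         # forward + backward O(n * h^2)

print(attention_term, feedforward_term, bilstm_term)
# ~4.1e6, ~1.1e8, ~3.9e6: for short agricultural questions (small n),
# the feed-forward term n * d^2 dominates the overall cost.
```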

4. Experiments and Analysis

We developed our model using Python 3.7 and CUDA 12.0 within the PyCharm 2024 environment, running on a 64-bit CentOS operating system. Our hardware setup includes an Intel(R) Xeon(R) Gold 6248R CPU and four Tesla V100S GPUs with 32 GB HBM2 memory. The experimental datasets included 110,647 agricultural questions gathered specifically for this research, along with the publicly available Tsinghua News dataset. Both datasets were divided into training, validation, and testing sets with an 8:1:1 ratio.

4.1. Parameter Settings

In this research, we utilized AgQsBERT to capture the semantic features of agricultural short texts, with a hidden layer size of 768. To optimize hyperparameter selection, we manually tuned key parameters using grid search over a range of learning rates and dropout values, with the validation F1 score serving as the performance criterion. Early stopping was applied to prevent overfitting during training. Other parameters, such as batch size and sequence length, were selected based on prior empirical studies of BERT-based architectures. The final settings, as presented in Table 3, represent the best-performing configuration identified through repeated three-fold cross-validation. The tuning process for the learning rate and dropout is detailed in Section 4.3.
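The grid search and early stopping described above can be outlined as follows; the candidate values and the train_and_evaluate helper are placeholders for illustration only, not the actual search space or training code used in this study.

```python
from itertools import product

def train_and_evaluate(lr, dropout):
    """Placeholder: train BERNN with early stopping and return the validation F1 score."""
    raise NotImplementedError  # the real training loop is omitted here

learning_rates = [1e-5, 2e-5, 5e-5, 1e-4]   # illustrative candidates
dropouts = [0.1, 0.2, 0.3, 0.4, 0.5]        # illustrative candidates

best_f1, best_config = float("-inf"), None
for lr, p in product(learning_rates, dropouts):
    f1 = train_and_evaluate(lr=lr, dropout=p)  # validation F1 is the selection criterion
    if f1 > best_f1:
        best_f1, best_config = f1, (lr, p)
```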

4.2. Evaluation Metrics

To evaluate the model’s performance, we utilize precision, recall, F1 score, and accuracy as the main evaluation metrics. Precision measures the ratio of correctly predicted positive instances to the total number of predicted positive instances. It reflects the accuracy of the model’s positive predictions. Recall indicates the ratio of correctly predicted positive instances to the total number of actual positive instances in the dataset. It highlights the model’s ability to correctly detect all pertinent instances within the dataset. The F1 score, being the harmonic average of precision and recall, offers an equitable assessment by considering both false positives and false negatives. This metric is particularly valuable when dealing with imbalanced datasets. Accuracy indicates the percentage of correctly predicted samples out of the total samples. It provides a general measure of the model’s overall performance. The formulas are defined as shown in (20)–(23).
\mathrm{precision} = \frac{TP}{TP + FP}
\mathrm{recall} = \frac{TP}{TP + FN}
F1 = \frac{2 \times \mathrm{precision} \times \mathrm{recall}}{\mathrm{precision} + \mathrm{recall}}
\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FN + FP}
In multi-class classification problems, it is important to evaluate the model’s performance using metrics such as macro average, micro average, and weighted average. These metrics provide different perspectives on the model’s effectiveness across all categories. Macro averaging evaluates the model’s classification performance for all categories by computing the arithmetic mean of precision, recall, and F1 score across each category. This method gives equal weight to all categories, independent of their sample sizes. Micro averaging combines the prediction results of all categories to calculate the overall evaluation metrics. It computes the metrics globally by aggregating the total number of true positives, false negatives, and false positives. The weighted average accounts for the imbalance in the number of samples across categories by multiplying each category’s metrics by the proportion of its sample size relative to the total sample size, followed by weighted averaging. This approach provides a balanced evaluation that considers the varying importance of each category. Given the imbalance in category distribution within the agricultural question dataset constructed in this study, weighted averages are used for the evaluation metrics of multiple comparative models [39]. This ensures that the evaluation is fair and reflects the performance of the model across all categories, considering their respective sample sizes. The formulas for weighted average precision (Precisionw), weighted average recall (Recallw), and weighted average F1 score (F1w) are provided in Equations (24)–(27).
\mathrm{Precision}_w = \sum_i w_i P_i
\mathrm{Recall}_w = \sum_i w_i R_i
F1_w = \frac{2 \times \mathrm{Precision}_w \times \mathrm{Recall}_w}{\mathrm{Precision}_w + \mathrm{Recall}_w}
w_i = \frac{S_i}{S}
where $S_i$ is the number of samples in the i-th category, $S$ is the total number of samples, and $P_i$ and $R_i$ denote the precision and recall of the i-th category, respectively.
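In practice, these weighted metrics can be computed with scikit-learn, as sketched below for integer-labeled predictions; note that scikit-learn's weighted F1 is the support-weighted mean of per-class F1 scores, which can differ slightly from the F1 derived from weighted precision and recall in Equation (26).

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

def weighted_metrics(y_true, y_pred):
    """Weighted-average precision, recall, F1, and overall accuracy (illustrative helper)."""
    precision_w, recall_w, f1_w, _ = precision_recall_fscore_support(
        y_true, y_pred, average="weighted", zero_division=0)
    return {"precision_w": precision_w,
            "recall_w": recall_w,
            "f1_w": f1_w,
            "accuracy": accuracy_score(y_true, y_pred)}
```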

4.3. Hyperparameter Selection of the BERNN

To improve the performance of BERNN, multiple learning rates (LR) and dropout configurations were evaluated to determine the most suitable parameter combination.
(1)
Comparison of learning rates
Table 4 presents the weighted average precision (Precisionw), weighted average recall (Recallw), weighted average F1 score (F1w), and accuracy across various learning rates. The highest values are highlighted in bold for better comparison.
As illustrated in Table 4, selecting an appropriate learning rate significantly improves the model’s effectiveness. When the learning rate was set to 5 × 10−5, BERNN achieved the best accuracy and F1 score.
(2)
Comparison of dropout rates
Table 5 presents BERNN’s performance indicators across various dropout configurations. Dropout works by reducing the number of intermediate features and increasing the orthogonality of individual features, which improves the model’s generalization and prevents overfitting [40]. Table 5 demonstrates that the model’s effectiveness improved with higher dropout rates. The optimal results for both F1 score and accuracy were achieved at a dropout rate of 0.4.

4.4. Ablation Experiments

An ablation study was carried out to analyze the effect of each component in our agricultural text classifier. The specific settings of the experiment are as follows: (1) AgQsBERT model, (2) BiLSTM model, (3) AgQsBERT-BiLSTM. The outcomes of the ablation study on the agricultural question test sets are shown in Table 6.
As shown in Table 6, the BERNN model demonstrates that integrating AgQsBERT with BiLSTM often leads to better performance than using either model independently. AgQsBERT’s ability to generate contextual embeddings enables the model to grasp the meanings of words based on their surrounding context within a sentence, thereby enhancing overall text comprehension. Additionally, BiLSTM is effective at managing long sequences of text, as it can retain information over longer distances compared to traditional LSTMs.

4.5. Comparative Experiments

To provide a comprehensive evaluation of different models used for text classification tasks, we can categorize them into single models and fusion models. Below is a summary of the models and their respective characteristics.
(1)
Single models
DPCNN [41] employs a sophisticated architecture that consists of a series of convolutional and pooling layers, systematically reducing the dimensionality of the input while meticulously preserving essential information. This design enables the model to effectively capture long-range dependencies within the text. BiLSTM [42], in contrast, is adept at processing text sequence data, emphasizing contextual feature information by analyzing the input in both forward and backward directions. BiGRU [43], being a more streamlined alternative, offers quicker training times while still proficiently capturing contextual information. CNN performs multi-kernel convolution operations over sequence blocks, which makes it applicable to a wide range of NLP problems [44].
(2)
Fusion models
The BiLSTM-CNN [45] synergistically merges the strengths of BiLSTM and CNNs, enabling it to capture both sequential and local features within the text, which leads to enhanced performance in text classification tasks. Similarly, the BiGRU-CNN [46] fuses BiGRU with CNNs, harnessing the advantages of both architectures to achieve improved results in text classification. The BERT-CNN [47] model can learn the relationship between semantic and syntactic structures both via BERT in the pre-training phase and via CNN to capture the features in the text with good classification results. Additionally, the BERT-DPCNN integrates BERT’s contextual embeddings with the hierarchical feature extraction capabilities of DPCNN, rendering this model exceptionally effective for tackling complex text classification challenges.
The effectiveness of these models in various text classification scenarios is underscored by a comprehensive comparison. Table 7 provides a detailed overview of the parameters used for the comparison models. All listed parameters were obtained from the actual experiments conducted in this study. To ensure a fair comparison, all models were trained with the same experimental settings and computational environment. Table 8 presents the experimental results on the test dataset, reporting precision, recall, F1 score, and accuracy as weighted averages to account for the imbalanced category distribution in the agricultural question dataset.
Table 8 clearly illustrates that the BERNN model outperforms all other models, achieving an impressive classification accuracy of 97.19% and a loss of 0.12 on the agricultural question test dataset. In comparison to BiLSTM, BiGRU, BiLSTM-CNN, and BiGRU-CNN, the DPCNN model demonstrates superior classification performance due to its enhanced spatial modeling capabilities. Specifically, the F1 scores of the DPCNN exceed those of BiLSTM-CNN and BiGRU-CNN by 1.64 and 1.66 percentage points, respectively, indicating that spatial modeling is more effective than the temporal modeling employed by BiLSTM for question classification.
Furthermore, the accuracy of the fusion models BiLSTM-CNN and BiGRU-CNN is higher by 1 and 1.11 percentage points than that of BiLSTM and BiGRU, respectively, highlighting the positive impact of incorporating CNNs into classification performance. The F1 score of the BERT-DPCNN model reaches 96.90%, surpassing the two fusion models and demonstrating that pre-trained models with word-embedding vectors significantly enhance classification performance.
Our proposed BERNN model generates embedded hidden vectors for words using AgQsBERT, which are then processed in temporal order, thereby improving the model’s spatial modeling capabilities. This approach effectively tackles the challenges of insufficient features in short texts and the difficulties in learning semantic information. The F1 score of the BERNN model reached 97.19% on the test dataset, which is 2.25, 2.27, 0.44, and 0.29 percentage points higher than those of the fusion models BiLSTM-CNN, BiGRU-CNN, BERT-CNN, and BERT-DPCNN, respectively.
To compare the convergence efficiency of the BERNN model with other models, we thoroughly analyzed the fluctuations in training loss and accuracy across the training epochs. Figure 7 provides a comprehensive comparison of accuracy and loss across 10 epochs on the agricultural question training set. As illustrated in Figure 7, the BERNN model exhibited a significantly faster convergence speed than the other models, while achieving the same level of classification accuracy. Furthermore, at equivalent convergence rates, BERNN achieved higher classification accuracy. These findings underscore the exceptional convergence efficiency of BERNN in the classification of question text.

4.6. Classification Experiments

To further explore the distinctions and effects among the models, we analyzed the precision, recall, and F1 scores for each model across different categories. The results of this analysis are presented in Figure 8. All models achieve strong performance metrics in the areas of breeding, fishery, planting, and technology consulting. However, performance is reduced in the processing and edible fungi categories, likely due to their limited sample sizes. It is important to mention that all models perform better in the technology consulting category, particularly the BERT pre-trained models. This is attributed to the richer vocabulary and better semantic support in these models, resulting in superior categorization. Figure 8 demonstrates that even with small sample sizes, such as those of the processing and edible fungi categories, robust classification performance can still be attained. This indicates that the BERNN model effectively enhances the classification accuracy of short agricultural texts, highlighting its potential for applications in situations where data availability is limited.
To better understand the classification performance of each model across various categories, we created a confusion matrix using the test set results. Figure 9 presents this matrix, with rows representing predicted categories and columns denoting true categories. Diagonal elements correspond to correctly classified questions, whereas off-diagonal elements represent misclassifications. The analysis of the confusion matrix shows that the BERNN model outperforms other models in accurately classifying a diverse range of question categories.
Figure 9 illustrates that the classification results for “technology consulting,” “fishery,” and “edible fungi” are relatively concentrated, with a lower frequency of misclassification. In comparison, the categories of “breeding,” “processing,” “planting,” and “diseases and insects” exhibit more cross-category confusion. An in-depth analysis of the question texts reveals that the boundaries between these categories are sometimes semantically ambiguous. For example, the question “the cultivation and processing methods of sweet potatoes” contains keywords related to both “planting” and “processing,” making it difficult for the model to assign it to a single category.
To address these shortcomings, future work could explore several strategies. One approach is to enrich the training data through targeted data augmentation for overlapping categories. Another is to incorporate external domain knowledge (e.g., agricultural ontologies or knowledge graphs) to help disambiguate category-specific terms. Additionally, introducing a hierarchical classification framework or multi-label classification scheme could allow the model to better handle questions that naturally span multiple categories.

4.7. Generalization Experiments

To examine the generalization performance of the proposed model, we evaluated it on the Tsinghua News dataset together with the comparative models. This assessment measures how well the models perform on a different dataset, providing insight into their robustness and applicability across contexts. The Tsinghua News dataset comprises 10 categories: realty, education, finance, science, society, politics, games, sports, stocks, and entertainment. Each category includes 20,000 texts, guaranteeing an equal distribution across all categories. The dataset has a maximum text length of 38 words and an average text length of 18.8 words.
For the purpose of the experiment, the dataset was split into three subsets—180,000 texts for training, 10,000 texts for validation, and 10,000 texts for testing—resulting in a total of 200,000 texts. The experimental results of the BERNN model on the Tsinghua News test dataset are detailed in Table 9, showcasing the model’s performance across the various categories.
Table 9 shows that BERNN achieves a classification accuracy of 90.76% with a low loss of 0.3 in the generalization experiment. While its performance is slightly inferior to BERT-CNN, it still demonstrates a relatively strong generalization ability across different text lengths and semantic densities, indicating that BERNN can effectively transfer its learned representations to new domains to a certain extent.

5. Discussion

This study proposes the Bidirectional Encoder Recurrent Neural Network (BERNN), a novel hybrid deep learning model that demonstrates superior performance in agricultural short text classification. By constructing a large-scale agricultural question dataset containing 110,647 entries and designing a model that integrates domain-specific semantic modeling with contextual dependency capture, BERNN effectively addresses key challenges such as short text length, semantic ambiguity, and class imbalance in agricultural texts. This section discusses the effectiveness and research significance of BERNN from three perspectives: an ablation study, comparative performance analysis, and limitations with future directions.

5.1. Ablation Study

To assess the individual contributions of each component within the BERNN architecture, we conducted ablation experiments comparing three model variants on the agricultural short text classification task:
(1) a model utilizing only the AgQsBERT encoder;
(2) a baseline model using only the BiLSTM module;
(3) the full BERNN model integrating both AgQsBERT and BiLSTM.
The results show that AgQsBERT alone (without contextual modeling) achieved an F1 score of 96.84%, indicating that domain-specific pretrained language models can effectively capture sentence-level semantic representations and improve the understanding of agricultural terminology. The BiLSTM-only model (without pretrained embeddings) obtained an F1 score of 93.88%, revealing limited capability in handling ambiguous terms and complex syntactic patterns due to its reliance on static word embeddings and sequential modeling.
The complete BERNN model outperformed both variants across all evaluation metrics, achieving an F1 score of 97.19%, improvements of 0.35 and 3.31 percentage points over AgQsBERT and BiLSTM, respectively. Notably, the loss value of BERNN (0.12) remains comparable to that of AgQsBERT (0.11), suggesting that integrating the BiLSTM does not introduce significant overfitting or training instability despite the increased model complexity. These findings confirm the effectiveness of the BERNN architecture: AgQsBERT provides a strong semantic foundation, BiLSTM enhances contextual understanding, and their combination offers synergistic benefits in capturing the semantic and contextual features of agricultural texts.
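For readers who wish to reproduce the ablated variants, the following is a minimal sketch of the encoder-plus-BiLSTM pipeline in PyTorch. It assumes the Hugging Face transformers library and uses the public bert-base-chinese checkpoint as a stand-in for the domain-pretrained AgQsBERT (an assumption, not the authors' released weights); the sequence length, dropout, and hidden sizes echo Table 3.

```python
# Minimal sketch of the Transformer-encoder + BiLSTM pipeline (not the authors' code).
# "bert-base-chinese" is a stand-in for the domain-pretrained AgQsBERT.
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class BertBiLSTMClassifier(nn.Module):
    def __init__(self, num_classes=7, rnn_hidden=768, dropout=0.4):
        super().__init__()
        self.encoder = AutoModel.from_pretrained("bert-base-chinese")
        self.bilstm = nn.LSTM(self.encoder.config.hidden_size, rnn_hidden,
                              batch_first=True, bidirectional=True)
        self.dropout = nn.Dropout(dropout)
        self.fc = nn.Linear(rnn_hidden * 2, num_classes)   # 2x for the two directions

    def forward(self, input_ids, attention_mask):
        # Contextual token embeddings from the Transformer encoder.
        tokens = self.encoder(input_ids=input_ids,
                              attention_mask=attention_mask).last_hidden_state
        # Symmetric forward/backward pass over the token sequence.
        rnn_out, _ = self.bilstm(tokens)
        # Classify from the final time step's concatenated hidden states.
        return self.fc(self.dropout(rnn_out[:, -1, :]))

tokenizer = AutoTokenizer.from_pretrained("bert-base-chinese")
batch = tokenizer(["What are the advantages of deep cultivation of wheat?"],
                  padding="max_length", max_length=32, truncation=True,
                  return_tensors="pt")
logits = BertBiLSTMClassifier()(batch["input_ids"], batch["attention_mask"])
```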

5.2. Comparative Model Performance Analysis

To further validate the classification capability of BERNN, we conducted a comprehensive comparison with eight baseline models, including traditional neural networks (CNN, BiLSTM, BiGRU, DPCNN), hybrid architectures (BiLSTM-CNN, BiGRU-CNN), and BERT-based models (BERT-CNN, BERT-DPCNN).
The experimental results show that BERNN consistently delivers superior performance across five key evaluation metrics. Specifically, it achieved an accuracy of 97.19%, outperforming BERT-DPCNN (96.90%) and the best-performing non-BERT model, DPCNN (96.62%), by 0.29 and 0.57 percentage points, respectively. The model also attained a precision of 97.22% and a recall of 97.19%, indicating strong class discrimination and stable predictive performance. BERNN’s F1 score reached 97.19%, significantly higher than that of BiLSTM (93.88%) and BiGRU (93.80%), highlighting its capability to model the inherent semantic complexity of agricultural texts.
In addition, BERNN maintained a low loss value (0.12), which is comparable to those of BERT-CNN and BERT-DPCNN (both 0.11), suggesting stable convergence without signs of overfitting. Traditional neural models generally underperformed compared to BERT-based hybrid models, reaffirming the effectiveness of pretrained language models in domain-specific NLP tasks. However, purely BERT-based architectures exhibited slight limitations in certain metrics, underscoring the value of incorporating contextual modeling layers. By integrating sentence-level semantics with contextual representations, BERNN enhances both discriminative power and robustness, demonstrating its effectiveness in agricultural short-text classification tasks.
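For clarity on how the weighted precision, recall, and F1 figures reported above (and in Tables 8 and 9) can be obtained, the snippet below shows one standard way to compute them with scikit-learn. The label arrays are hypothetical placeholders, not the paper's data.

```python
# Weighted-average classification metrics, as commonly computed with scikit-learn.
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

y_true = [0, 1, 2, 2, 3, 1]   # hypothetical gold labels
y_pred = [0, 1, 2, 1, 3, 1]   # hypothetical model predictions

accuracy = accuracy_score(y_true, y_pred)
precision_w, recall_w, f1_w, _ = precision_recall_fscore_support(
    y_true, y_pred, average="weighted", zero_division=0)
print(f"Acc={accuracy:.4f}  P_w={precision_w:.4f}  R_w={recall_w:.4f}  F1_w={f1_w:.4f}")
```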

5.3. Limitations and Future Work

Despite its promising performance, BERNN still presents several limitations that warrant further investigation, detailed as follows.
The model demonstrates limited generalization on minority classes, indicating sensitivity to class imbalance and insufficient adaptability in low-resource scenarios.
The combined use of BERT and BiLSTM increases computational overhead during inference, posing challenges for deployment on resource-constrained edge devices and mobile platforms in agricultural settings.
Currently, the BERNN framework is trained in a supervised setting where all categories must be known in advance. As such, it cannot directly identify or handle completely new categories that were not present during training.
To improve BERNN’s practical applicability and scalability, future work may consider the following directions:
Incorporating data augmentation strategies and few-shot learning techniques to enhance robustness on rare or underrepresented categories;
Exploring model compression and lightweight design approaches to reduce inference latency and enable flexible deployment in edge computing environments;
Adopting open-set recognition or zero-shot learning techniques so that BERNN can detect and adapt to novel categories in real-time agricultural question-answering scenarios (a minimal confidence-thresholding illustration follows this list).
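As one hedged illustration of the open-set direction, the sketch below rejects predictions whose maximum softmax confidence falls under a threshold and routes them to an “unknown” bucket. The threshold value is an assumption that would need tuning on validation data, and this is only one of several possible open-set strategies.

```python
# Simple open-set heuristic: treat low-confidence predictions as an unknown category.
import torch
import torch.nn.functional as F

UNKNOWN = -1        # label index reserved for questions outside the known categories
THRESHOLD = 0.7     # assumption: would be tuned on a held-out validation set

def predict_open_set(logits: torch.Tensor) -> torch.Tensor:
    probs = F.softmax(logits, dim=-1)
    conf, pred = probs.max(dim=-1)
    pred[conf < THRESHOLD] = UNKNOWN   # low-confidence samples flagged as novel
    return pred

# First sample is confidently class 0; second is near-uniform and gets flagged.
print(predict_open_set(torch.tensor([[4.0, 0.1, 0.2], [1.0, 0.9, 1.1]])))
```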
In conclusion, BERNN presents a novel and effective modeling strategy for agricultural natural language processing, offering a solid foundation for intelligent applications in agricultural expert systems and exhibiting strong potential for broader adoption in diverse smart agriculture scenarios.

6. Conclusions

This study addresses the limitations of existing models in extracting semantic information from agricultural texts, such as shallow model depth and inadequate feature representation, by proposing a fusion semantic extraction model named BERNN. First, a large-scale agricultural Q&A corpus was constructed, containing 110,647 text pairs across 7 categories, effectively solving the problem of the lack of publicly available datasets in the agricultural domain and providing valuable data support for downstream tasks. Then, by integrating AgQsBERT with a BiLSTM network, a deep semantic feature extraction and classification approach was designed to handle the characteristics of agricultural question texts, such as short length, sparse features, and implicit semantics, thereby significantly enhancing feature representation and classification performance.
A notable strength of the BERNN architecture lies in the symmetric design of its BiLSTM component, which processes input sequences in both forward and backward directions. This bidirectional symmetry enables the model to capture contextual dependencies in a balanced and comprehensive manner, which is particularly beneficial for fragmented or ambiguous expressions common in agricultural texts. In parallel, the AgQsBERT encoder leverages domain-specific pretraining to extract deep semantic features, enabling precise understanding of specialized terminology and subtle linguistic variations in the agricultural domain.
Experimental findings reveal that the BERNN model achieved an F1 score of 97.19% on the self-constructed agricultural Q&A dataset, outperforming mainstream models such as DPCNN, BiLSTM, BiGRU, CNN, BERT-CNN, BiLSTM-CNN, BiGRU-CNN, and BERT-DPCNN. Moreover, the model demonstrated strong generalization ability on the Tsinghua News dataset. BERNN significantly improves the utilization of semantic information in texts, providing a solid foundation for intelligent agricultural question answering systems, semantic retrieval, and decision-making support in smart agriculture. The incorporation of structural symmetry and domain-aware semantic modeling not only enhances performance but also offers a generalizable strategy for future research in domain-specific natural language processing tasks.

Author Contributions

Methodology, X.L., X.G. and J.Z.; Data curation, X.Y., L.Z. and H.Z.; Writing—original draft, M.Z.; Writing—review & editing, J.S., W.Z. and L.L. All authors have read and agreed to the published version of the manuscript.

Funding

This work was partly supported by the Key R&D projects in Henan Province (251111210800, 241111211800); the Key Scientific and Technological Project of Henan Province (252102210088, 252102210146, 232102111128, 232102210079, 222102210098, 222102320181, 212102210431, 212102310087); the Ministry of Education Supply and Demand Matching Employment Nurturing Project (2023122984581); the Ministry of Education Industry–University Cooperation Collaborative Education Project (220503372133344); the Humanities and Social Science Fund of Ministry of Education (22YJCZH091); in part by the Major Special Project of Xinxiang City (21ZD003); in part by the Key Scientific Research Projects of Colleges and Universities in Henan Province (23B520003, 21A520001, 20A520013); and in part by the Henan Province Postdoctoral Support Program (HN2022165).

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

The data used to support this study are available on GitHub; interested readers can search for files at https://github.com/guoxiaojuanhist/Agriculture-QuestionDataset (accessed on 31 July 2025) and download them. Any downloading, accessing, or use of this dataset for commercial or non-academic purposes is prohibited.

Acknowledgments

The authors approved the version of the manuscript to be published. They agreed to be accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
AES: Agricultural Expert Systems
BERNN: Bidirectional Encoder Recurrent Neural Network
BiLSTM: Bidirectional Long Short-Term Memory
NLP: Natural Language Processing
AgQsBERT: Agricultural Question Bidirectional Encoder Representations from Transformers
CKA-FSVM: Centered Kernel Alignment-based Fuzzy Support Vector Machine
BERT: Bidirectional Encoder Representations from Transformers
PDCS: Pest and Disease Interrogative Classification System
BiGRU: Bidirectional Gated Recurrent Unit
TextCNN: Text Convolutional Neural Network
Q&A: Question and Answer
ERNIE: Enhanced Representation through kNowledge IntEgration
DPCNN: Deep Pyramidal Convolutional Neural Network
Multi-CNN: Multi-scale Convolutional Neural Network
LLMs: Large Language Models

Figure 1. Flow of Q&A corpus generation.
Figure 2. Flow of data preprocessing.
Figure 3. Structure of BERNN. The arrows indicate the direction of information flow between components.
Figure 4. Process of embedding. The Chinese characters represent tokens of a sample sentence. [CLS] and [SEP] are special tokens used by the model.
Figure 5. Architecture of the Transformer encoder in AgQsBERT. The diagram illustrates the flow of data through multi-head attention and feed-forward layers in each encoder block (repeated ×6). Colored blocks represent key components: green for attention heads, blue for embedding and normalization operations, and yellow for feed-forward sublayers. Arrows indicate the direction of data flow across the architecture.
Figure 6. Structure of LSTM.
Figure 7. Training set model accuracy and loss.
Figure 8. Experimental comparison of model performance based on precision, recall, and F1.
Figure 9. Confusion matrix of comparative model.
Table 1. Example of corpus of Q&A.
ID | Question | Answer | Category
1 | How can Mount Huangshan squirrels be raised? | The diet for artificially raised squirrels is categorized into two types: green feed and concentrated feed. Green feed primarily consists of carrots, water spinach, various fresh fruits, and other fresh vegetables. Concentrated feed includes ingredients such as wheat bran, corn flour, soybean flour, and grains, supplemented with small amounts of salt, yeast powder, bone meal, honey, and trace elements. | Breeding
2 | Can clownfish be raised in fresh water? | Clownfish cannot be kept in freshwater because they are ornamental marine fish that require a saltwater environment. They can only thrive in natural seawater or in water made by adding sea salt to replicate seawater conditions. Attempting to raise clownfish in freshwater will result in their inability to adapt, ultimately leading to their death. | Fishery
3 | What are the advantages of deep cultivation of wheat? | Enhancing the depth of deep cultivation in the soil can encourage the extension and expansion of wheat roots, ensuring they receive adequate nutrition. Additionally, sufficiently deep cultivation helps prevent the loss of water and fertilizers, which significantly benefits the nutrient absorption of wheat roots. This practice improves the soil’s ability to retain water and nutrients, strengthens the topsoil, promotes wheat growth, and ultimately increases yield. | Planting
4 | How are potatoes processed into vinegar essence? | Potatoes will be cleaned, boiled, and added to amylase for saccharification; then, yeast will be added for fermentation into alcohol and bacteria fermentation into acetic acid. Finally, distillation and concentration are carried out to obtain a high concentration of vinasse. The whole process needs to be strictly controlled in terms of temperature, sanitary conditions, and good safety protections. | Processing
5 | What are the technical aspects of growing edibles in season? | Winter mushroom planting should be temperature-controlled, moisturized (8–20 °C/80–90% humidity for flat mushrooms), and ventilated in the morning and evening to prevent smothering of the bag. The bag must be covered with heat preservation and disinfected regularly to prevent mold and mildew. | Edible fungi
6 | How can AI optimize planting decisions? | AI analyzes data on weather, soil, and crop growth; combines machine learning to predict yields, pest and disease risks; optimal planting plans; and intelligently recommends sowing times, water and fertilizer dosages, etc. to help farmers accurately manage, reduce costs, and improve yields and quality. | Technology consulting
7 | How can peach tree pests and diseases be controlled? | Cleaning and disinfecting the garden, alternately spraying low-toxicity pesticides, hanging insect traps, strengthening pruning and ventilation, and scientific fertilizer management. | Disease and insects
Table 2. Textual data of agricultural questions.
Category | Number of Entries | Thematic Words
Breeding | 16,715 | Feeding, Breeding, Preventive measures, Treatment approaches
Planting | 50,412 | Cultivation methods, Field control, High output
Processing | 2733 | Manufacturing, Production, Preservation, Craft techniques
Fishery | 8065 | Fish farming, Reproduction, Ponds, and Aquatic goods
Edible Fungi | 3458 | Shiitake, Edible fungi, Oyster mushroom, Cultivation techniques
Technology Consulting | 5154 | Scientific advancements, Agriculture, Growth, Technology
Disease and Insects | 24,110 | Disease identification, Insect pest identification
Table 3. Model hyperparameter settings.
Parameters | Value
Batch_size | 128
Maximum sequence length | 32
Hidden_size | 768
Epoch | 10
Learning_rate | 5 × 10−5
Dropout | 0.4
Filter_sizes | (2, 3, 4)
Num_filters | 256
Optimizer | Adam
Loss function | CrossEntropy
Max_Position_Embedding | 512
Attention_Heads_Num | 6
Hidden_Layers_Num | 6
Pooler_FC_Size | 768
Pooler_Attention_Heads_Num | 6
Pooler_FC_Layers_Num | 3
Pooler_Perhead_Num | 128
Vocab_Size | 18,000
Rnn_Hidden | 768
Activation | Softmax
Table 4. Results of BERNN with different LRs. Bold values indicate the best result for each metric. Loss values are not bolded due to minimal or identical differences.
LR | Loss | Accuracy (%) | Precisionw (%) | Recallw (%) | F1w (%)
1 × 10−5 | 0.12 | 96.97 | 97.03 | 96.98 | 96.98
2 × 10−5 | 0.11 | 97.09 | 97.14 | 97.09 | 97.09
3 × 10−5 | 0.11 | 96.96 | 97.01 | 96.96 | 96.96
4 × 10−5 | 0.11 | 97.05 | 97.10 | 97.05 | 97.05
5 × 10−5 | 0.12 | 97.19 | 97.22 | 97.19 | 97.19
6 × 10−5 | 0.12 | 96.63 | 96.69 | 96.63 | 96.63
Table 5. Results of BERNN with different dropout configurations. Bold values indicate the best result for each metric. Loss values are not bolded due to minimal or identical differences.
Dropout | Loss | Accuracy (%) | Precisionw (%) | Recallw (%) | F1w (%)
0.1 | 0.12 | 96.77 | 96.89 | 96.77 | 96.79
0.2 | 0.12 | 96.95 | 97.01 | 96.95 | 96.95
0.3 | 0.12 | 97.04 | 97.07 | 97.04 | 97.04
0.4 | 0.12 | 97.19 | 97.22 | 97.19 | 97.19
0.5 | 0.12 | 96.95 | 96.99 | 96.95 | 96.95
0.6 | 0.12 | 96.93 | 97.01 | 96.93 | 96.94
Table 6. Ablation experiment results. Boldface highlights the performance of our proposed BERNN model.
Model | Loss | Accuracy (%) | Precisionw (%) | Recallw (%) | F1w (%)
AgQsBERT | 0.11 | 96.83 | 96.93 | 96.83 | 96.84
BiLSTM | 0.23 | 93.96 | 93.91 | 93.96 | 93.88
AgQsBERT-BiLSTM (BERNN) | 0.12 | 97.19 | 97.22 | 97.19 | 97.19
Table 7. Parameter settings for comparison models.
Model | Batch_Size | Maximum Sequence Length | Epoch | Learning_Rate | Dropout | Num_Filters | Hidden_Size
DPCNN | 128 | 32 | 10 | 3 × 10−5 | 0.4 | 250 | /
BiLSTM | 128 | 32 | 10 | 3 × 10−5 | 0.4 | / | 128
BiGRU | 128 | 32 | 10 | 3 × 10−5 | 0.4 | / | 128
CNN | 128 | 32 | 10 | 3 × 10−5 | 0.4 | 256 | /
BiLSTM-CNN | 128 | 32 | 10 | 3 × 10−5 | 0.4 | / | 256
BiGRU-CNN | 128 | 32 | 10 | 3 × 10−5 | 0.4 | / | 256
BERT-CNN | 128 | 32 | 10 | 3 × 10−5 | 0.4 | / | 768
BERT-DPCNN | 128 | 32 | 10 | 3 × 10−5 | / | 128 | 768
Table 8. Comparative model experimental results on the agriculture question test set. Boldface highlights the performance of our proposed BERNN model.
Model | Loss | Accuracy (%) | Precisionw (%) | Recallw (%) | F1w (%)
DPCNN | 0.12 | 96.62 | 96.66 | 96.62 | 96.58
BiLSTM | 0.23 | 93.96 | 93.91 | 93.96 | 93.88
BiGRU | 0.22 | 93.86 | 93.85 | 93.86 | 93.80
CNN | 0.17 | 95.23 | 95.22 | 95.23 | 95.20
BiLSTM-CNN | 0.19 | 94.96 | 94.97 | 94.96 | 94.94
BiGRU-CNN | 0.18 | 94.97 | 94.96 | 94.97 | 94.92
BERT-CNN | 0.11 | 96.74 | 96.84 | 96.74 | 96.75
BERT-DPCNN | 0.11 | 96.90 | 96.92 | 96.90 | 96.90
BERNN | 0.12 | 97.19 | 97.22 | 97.19 | 97.19
Table 9. Experimental comparison results on the Tsinghua News test set. Boldface highlights the performance of our proposed BERNN model.
Model | Loss | Accuracy (%) | Precisionw (%) | Recallw (%) | F1w (%)
DPCNN | 0.32 | 89.93 | 89.97 | 89.93 | 89.92
BiLSTM | 0.43 | 86.44 | 86.51 | 86.44 | 86.42
BiGRU | 0.45 | 85.32 | 85.28 | 85.32 | 85.27
CNN | 0.35 | 88.81 | 88.84 | 88.81 | 88.81
BERT-CNN | 0.28 | 91.05 | 91.01 | 91.01 | 91.02
BiLSTM-CNN | 0.36 | 88.05 | 88.10 | 88.05 | 88.03
BiGRU-CNN | 0.34 | 89.00 | 89.01 | 89.00 | 88.97
BERT-DPCNN | 0.31 | 90.74 | 90.92 | 90.74 | 90.76
BERNN | 0.30 | 90.76 | 90.81 | 90.76 | 90.77


