Potato Disease and Pest Question Classification Based on Prompt Engineering and Gated Convolution

Tang, Wentao; Hu, Zelin

doi:10.3390/agriculture15050493


by Wentao Tang and Zelin Hu *

School of Physics and Electronic Information, Gannan Normal University, Ganzhou 341000, China

* Author to whom correspondence should be addressed.

Agriculture 2025, 15(5), 493; https://doi.org/10.3390/agriculture15050493

Submission received: 15 January 2025 / Revised: 23 February 2025 / Accepted: 24 February 2025 / Published: 25 February 2025

(This article belongs to the Special Issue Computational, AI and IT Solutions Helping Agriculture)


Abstract

Currently, there is no publicly available dataset for classifying questions about potato pests and diseases. Moreover, traditional question classification models generally rely on max-pooling alone for down-sampling. This mechanism extracts only the extreme responses within each local receptive field, which degrades fine-grained feature representation and significantly amplifies text noise. To address these issues, this paper proposes a dataset construction method based on prompt engineering, along with a question classification method based on a gated fusion–convolutional neural network (GF-CNN). Prompts are used to guide large language models (LLMs) in generating potato disease and pest question templates, and the Potato Pest and Disease Question Classification Dataset (PDPQCD) is then efficiently constructed by batch-importing named entities into these templates. GF-CNN combines the outputs of convolutional kernels of different sizes, applies both max-pooling and average-pooling, and then uses a gating mechanism to regulate the flow of information, thereby optimizing text feature extraction. In experiments on the PDPQCD, Subj, and THUCNews datasets, GF-CNN achieved F1 scores of 100.00%, 96.70%, and 93.55%, respectively, outperforming the other models evaluated. The prompt engineering-based method provides a new paradigm for constructing question classification datasets, and GF-CNN can also be extended to other domains.

Keywords:

potato pests and diseases; question classification; prompt engineering; GF-CNN; large language model

1. Introduction

Potatoes are an important multipurpose crop, widely cultivated around the world and one of the four major staple crops globally [1]. In agricultural practice, however, potatoes face a variety of disease threats. Among these, early blight and late blight, caused by the fungus Alternaria solani and the oomycete Phytophthora infestans, respectively, are considered the most destructive because of their rapid spread and severe impact [2,3]. Pests are also a significant problem. Pests such as Epilachna vigintioctomaculata and Acanthoscelides obtectus cause compound damage by feeding on leaves and spreading pathogens, leading to yield losses of 15–30%; in severe cases, entire fields may be lost, significantly reducing the economic returns of the industry. Therefore, establishing a comprehensive potato pest and disease control system is crucial for ensuring stable potato production [4].

With the development of agricultural informatization, intelligent monitoring and decision-making systems based on agricultural data services are gradually replacing traditional experience-based production models [5]. In this context, question-answering systems have become convenient query tools because they are portable and efficient. Users interact with the system in natural language, and the system analyzes their input to generate useful answers, helping users obtain timely and accurate agricultural advice. Previous studies have shown that question-answering systems are effective in identifying diseases and pests, improving crop yields, and reducing pesticide use [6]. Developing a question-answering system that integrates potato pest and disease knowledge is therefore of great practical significance for reducing the threat of potato pests and diseases and minimizing farmers’ economic losses.

In building a question-answering system, question classification, also referred to as intent recognition, plays a crucial role. Its core function is to map the natural language questions entered by users into predefined semantic categories, laying the foundation for accurately interpreting user needs and generating targeted responses. In agriculture, text classification methods fall into two main groups: machine learning-based methods and deep learning-based methods [7]. Among classical machine learning approaches, algorithms such as support vector machines (SVM) and naive Bayes (NB) have been applied successfully. For example, Wei et al. [8] constructed a keyword classification library, performed feature word selection and weight optimization, and then used a linear SVM to classify Chinese agricultural texts. Cui et al. [9] used the parallelized XGBoost algorithm within the Spark framework to classify forestry texts accurately. Espejo-Garcia et al. [10], using pesticide usage documents from the Spanish Official Plant Health Products Registration Office as their data source, compared the accuracy of SVM, NB, logistic regression, and random forest models for classifying agricultural regulations. Through training, machine learning models can adapt to different text data and classification tasks, demonstrating good flexibility and scalability. However, although these methods learn automatically from data, they rely heavily on manually defined features, and the quality of these features directly affects the model’s learning efficiency and final performance [11]. Careful feature design and selection are therefore crucial to the performance of machine learning models.

Deep learning-based methods are currently the mainstream approach to question classification. Deep learning models such as convolutional neural networks (CNN), gated recurrent units (GRU), and long short-term memory networks (LSTM) can automatically learn complex feature representations from raw text, eliminating the need for tedious manual feature engineering. Jin et al. [12] proposed the BiGRU-MulCNN agricultural question classification model, which achieves good classification performance even on datasets with limited data and indistinct features. Wang et al. [13] constructed the Attention-DenseCNN model, which establishes dense connections between upstream and downstream convolutional blocks (each layer receives the outputs of all preceding layers as additional inputs, which avoids information loss). This effectively enhances feature propagation and gradient flow. In addition, the attention mechanism assigns higher weights to key features, further improving the model’s focus on important information, and the model significantly improved the accuracy of rice question classification. Feng et al. [14] further improved performance by optimizing the system architecture. Their RIC-Net model deeply integrates a 4-layer residual network with a capsule network, which both alleviates the vanishing gradient problem through skip connections and models hierarchical semantic relationships through dynamic routing. The model achieved an accuracy of 98.62% on a rice knowledge classification dataset.

Pre-trained models, such as BERT (Bidirectional Encoder Representations from Transformers) [15] and ERNIE (Enhanced Representation through Knowledge Integration) [16], have proven effective in various natural language processing tasks and represent a significant breakthrough in deep learning. Agriculture, as a highly specialized domain, involves a vast array of complex terms and domain-specific vocabulary. Traditional static word embedding models, such as Word2Vec and GloVe, have two major drawbacks: first, the word vectors they generate are fixed and cannot capture contextual information dynamically; second, they may fail to represent agricultural terminology accurately, which hurts task performance. In contrast, pre-trained models trained on large-scale corpora generate word vectors dynamically according to context and have stronger semantic understanding, effectively overcoming both limitations. The BERT-Stacked LSTM model proposed by Li et al. [17] uses multiple stacked LSTMs to learn complex semantic information from text and shows significant advantages over six other models in agricultural pest and disease question classification. Li et al. [18] improved the DPCNN model by integrating word vectors from the ERNIE pre-trained model, enabling efficient identification of cotton pest and disease questions. Duan et al. [19] proposed a multimodal agricultural news text classification method that extracts text and image features with ERNIE and Vision Transformer, respectively. An interactive attention mechanism then computes attention weights between the text and image features, extracting shared features and enhancing cross-modal synergy. Experimental results indicate that this multimodal model outperforms single-modality text or visual models.

In response to the lack of publicly available question-answer datasets for potato pests and diseases and the limited representational capacity of traditional question classification methods, this paper proposes a dataset construction method based on prompt engineering and a question classification model based on gated fusion convolution. Specifically, a structured prompt is used to guide a large language model (LLM) in generating potato pest and disease question templates. These templates are then populated in bulk with entities extracted in previous named entity recognition (NER) work [20], efficiently producing the Potato Pest and Disease Question Classification Dataset (PDPQCD). In addition, this paper introduces the gated fusion–convolutional neural network (GF-CNN) to optimize text feature extraction. The model captures local features through parallel multi-scale convolutions and then applies both max-pooling and average-pooling to extract salient and global statistical features. A gating unit dynamically fuses these two feature representations, enhancing feature robustness while preserving critical information.

The main contributions of this paper are as follows:

(a): The construction of the first question classification dataset for the potato pest and disease domain, which achieves effective integration of LLM with domain knowledge through prompt engineering, outperforming existing datasets in both scale and granularity of categories;
(b): The proposal of the GF-CNN question classification model with dynamic feature fusion capability. This model adapts the feature weights of max-pooling and average-pooling through a gating mechanism, enhancing the model’s representational power while maintaining good interpretability;
(c): Experimental results on the PDPQCD, Subj, and THUCNews datasets show that GF-CNN not only accurately classifies potato pest and disease questions into predefined categories but also demonstrates strong generalization ability in cross-domain classification tasks.

2. Materials and Methods

2.1. Prompt Engineering

Prompt engineering aims to improve the quality and relevance of a model’s output by carefully designing the textual prompts input into the LLM. When conducting prompt engineering, users must consider how to precisely express their needs, select appropriate keywords, and structure sentences to guide the model in generating more accurate and relevant responses. For example, if the goal is to have the LLM analyze the sentiment of an article, a simple and vague prompt might be “Analyze this article”, whereas an optimized prompt might be more specific, such as “Please describe the primary sentiment of this article and provide specific examples to support your viewpoint”. Prompt engineering is not only about the skill of asking questions but also involves an understanding of the model’s responses. Through precise prompts, one can better leverage the potential of the LLM.

When writing effective prompts, keep the following principles in mind:

(a): Clarity and specificity: Avoid using ambiguous terms. For instance, when dealing with numerical issues, specify exact numbers for the desired output instead of using vague terms like “some” or “a few”;
(b): Role-playing: Specify the roles of the LLM and the user. For example, if you are seeking weight loss advice, using the prompt “Assume you are a professional personal fitness trainer, and I am your client. Please use your expertise and consider my physical condition to create a detailed weight loss plan for me” will often yield more professional advice and suggestions than simply saying, “Give me a weight loss plan”;
Of course, if the role definition is not clear enough, it may lead to ambiguous or irrelevant outputs from the model. Additionally, when an LLM is assigned a specific role, its responses may become overly constrained by stereotypical assumptions associated with that role. These issues also require the user to make flexible adjustments based on the specific context;
(c): Output formatting: To achieve more consistent output, specify the format in which you would like the LLM to deliver the response. LLMs can handle various common formats, such as plain text and JSON.

2.2. Classification of Intentions

After preliminarily organizing the data from the team’s previous NER work, we divided the questions into 11 categories. In other words, once the question-answering system is built, it will be able to answer the 11 types of questions shown in Table 1.

2.3. Dataset Template Generation Based on Prompt Engineering

After defining the question categories, we used an LLM to generate templates for the question classification dataset. In this study, ChatGPT 4 was used; taking the “disease symptoms” category as an example, the prompt and the model’s response are shown in Figure 1.

The prompt is divided into four parts: Role Definition, Task Description, Task Requirements, and Output Format. These section titles are shown only to make the structure easier for readers to follow; they are omitted from the text actually submitted to the model. In the Role Definition section, the LLM is cast as an experienced NLP engineer. The implicit intention is to steer the model toward NLP-related processing and to convey the real need behind the task, namely constructing the PDPQCD. The Task Description section refines the request, asking the LLM to generate training templates specifically for the “Disease Symptoms” category. The Task Requirements section constrains the generated content: the questions should simulate real-world scenarios and the way people naturally ask questions, including both short, conversational queries and longer, descriptive sentences. Placeholders such as [Disease] are used in place of specific disease names so that each template can later be filled with any particular disease. The label “disease symptoms” is appended to the end of each question, tagging it directly for the subsequent classification task. Finally, the Output Format section provides two examples to standardize the model’s output and keep all generated content consistent. This step helps ensure that the results are structured and ready for use, so that the training data are clean, organized, and effective for building the intent recognition model.
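The four-part structure can be mirrored in a small helper that assembles the prompt text and drops the section titles before submission. This is a sketch only: the wording of each part below is hypothetical (the actual prompt appears in Figure 1), `build_prompt` is an illustrative name, and no LLM API is called.

```python
from __future__ import annotations

# Hypothetical wording for the four prompt parts; the prompt actually used in
# the study appears in Figure 1 and is not reproduced here.
ROLE = ("You are an experienced NLP engineer building a question "
        "classification dataset for potato pests and diseases.")
TASK = "Generate question templates for the category \"{category}\"."
REQUIREMENTS = ("Imitate how real users ask: mix short conversational queries "
                "with longer descriptive sentences. Write the placeholder "
                "[Disease] instead of any specific disease name, and end each "
                "question with the label \"{category}\".")
OUTPUT_FORMAT = "Return one template per line, following these examples:\n{examples}"


def build_prompt(category: str, examples: list[str]) -> str:
    """Assemble the four parts in order, leaving out their section titles."""
    parts = [
        ROLE,
        TASK.format(category=category),
        REQUIREMENTS.format(category=category),
        OUTPUT_FORMAT.format(examples="\n".join(examples)),
    ]
    return "\n\n".join(parts)


if __name__ == "__main__":
    print(build_prompt(
        "disease symptoms",
        ["What are the symptoms of [Disease]? disease symptoms",
         "My potato leaves have brown spots; is that [Disease]? disease symptoms"],
    ))
```

Keeping the parts as separate strings makes it easy to reuse the role and requirements while swapping in the category name for each of the 11 question types.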

Using prompt engineering to generate dataset templates offers several distinct advantages over manual construction: First, in terms of efficiency and speed, LLMs can generate large volumes of data in a very short time, whereas manual writing may take several hours or even days. Second, in terms of diversity and creativity, manual construction is often limited by an individual’s imagination and linguistic habits, whereas LLMs, trained on large-scale text corpora, are capable of producing more diverse and innovative sentence structures, thereby increasing the coverage and complexity of the corpus. Lastly, in terms of customizability, LLMs can flexibly adjust the output based on the input prompts, allowing for the generation of text in specific styles or formats as required.

2.4. Construction of a PDPQCD

After generating various question templates using the LLM, a manual review of each template was conducted to assess alignment with question topics and human linguistic conventions. Templates that were deemed irrelevant were excluded. Finally, we bulk-imported the potato pest and disease entities extracted from the previous NER work to replace the placeholders in the templates. Using the data from the “disease symptoms” category as an example, the algorithmic logic is shown in Figure 2.

Three lists, Diseases, Templates, and Questions, were initialized. The Diseases and Templates lists store potato disease entities and question templates, respectively, while the Questions list starts empty and stores the generated questions. The algorithm iterates over the Diseases list in the outer loop and the Templates list in the inner loop. In each iteration, it locates the placeholder in the current template and replaces it with the current disease entity, producing one concrete question. At the same time, a counter assigns a unique identifier to each question, which makes it easy to count the questions in that category. Finally, iterating through the Questions list yields all the data for the “disease symptoms” category.
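The nested-loop procedure described above can be sketched in Python as follows. The function name `fill_templates`, the sample templates, and the choice to return `(id, question)` pairs are illustrative assumptions; only the `[Disease]` placeholder convention comes from Section 2.3.

```python
from __future__ import annotations

PLACEHOLDER = "[Disease]"


def fill_templates(diseases: list[str], templates: list[str]) -> list[tuple[int, str]]:
    """Fill every template with every disease entity.

    The outer loop runs over entities and the inner loop over templates; a
    running counter gives each generated question a unique id.
    """
    questions: list[tuple[int, str]] = []
    counter = 0
    for disease in diseases:
        for template in templates:
            counter += 1
            questions.append((counter, template.replace(PLACEHOLDER, disease)))
    return questions


if __name__ == "__main__":
    diseases = ["early blight", "late blight"]           # sample entities
    templates = ["What are the symptoms of [Disease]?",   # hypothetical templates
                 "How do I recognize [Disease] on potato leaves?"]
    for qid, question in fill_templates(diseases, templates):
        print(qid, question)
```

If the templates already end with their category label, as the prompt in Section 2.3 requires, the label passes through `str.replace` unchanged and each generated question stays tagged.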

Using this method, we constructed a dataset of 20,260 entries, whose distribution is shown in Figure 3. As Figure 3 shows, the dataset is somewhat imbalanced across categories. This is because both the number of entities and the number of retained templates differ from category to category, and the two factors together produce the discrepancy.

3. Model Structure

This study utilizes BERT as the pre-trained model, upon which the proposed GF-CNN is employed for feature extraction. Finally, the softmax function is applied for question classification. The overall architecture of the system is illustrated in Figure 4.

3.1. BERT

BERT, proposed by Devlin et al. [15], is a pre-trained language model. Unlike static word embedding models such as FastText and GloVe, BERT produces dynamic word embeddings: the vector generated for a word is adjusted according to the context in which it appears. This allows the model to distinguish the different meanings of the same word in different contexts.

The core innovation of BERT lies in its two pre-training tasks: the masked language model (MLM) and next-sentence prediction (NSP). In the MLM task, BERT randomly masks some tokens in the input and then predicts the original tokens at the masked positions, which requires the model to understand and retain contextual information. The NSP task enhances the model’s understanding of sentence relationships by predicting whether the second sentence fragment actually follows the first.
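For readers unfamiliar with MLM, the corruption step can be sketched as below. It follows the recipe published with the original BERT model (select about 15% of tokens; replace 80% of those with [MASK], 10% with a random token, and leave 10% unchanged). It illustrates that recipe only and is not part of this study, which uses the already pre-trained bert-base-chinese. Tokens are treated as plain strings, and special tokens are ignored for simplicity.

```python
import random

MASK = "[MASK]"


def mask_tokens(tokens, vocab, rng, mask_prob=0.15):
    """BERT-style MLM corruption of one token sequence.

    Each position is selected with probability mask_prob. A selected token is
    replaced by [MASK] 80% of the time, by a random vocabulary token 10% of the
    time, and left unchanged 10% of the time. labels[i] holds the original token
    at selected positions (the prediction targets) and None elsewhere.
    """
    corrupted = list(tokens)
    labels = [None] * len(tokens)
    for i, tok in enumerate(tokens):
        if rng.random() >= mask_prob:
            continue                      # not selected: no prediction target
        labels[i] = tok
        r = rng.random()
        if r < 0.8:
            corrupted[i] = MASK
        elif r < 0.9:
            corrupted[i] = rng.choice(vocab)
        # otherwise keep the original token
    return corrupted, labels


if __name__ == "__main__":
    rng = random.Random(0)
    sentence = list("马铃薯晚疫病的症状是什么")  # character-level tokens
    print(mask_tokens(sentence, vocab=list("病虫害叶片"), rng=rng))
```

Leaving some selected tokens unchanged or randomized means the model cannot assume that only [MASK] positions need predicting, which matters because [MASK] never appears during fine-tuning.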

3.2. GF-CNN

Since max-pooling efficiently reduces the dimensionality of feature maps without significantly increasing computational cost, most question classification models, such as TextCNN [21] and DPCNN [22], utilize max-pooling for downsampling. Max-pooling is highly effective in scenarios where it is necessary to highlight strong features. However, relying solely on max-pooling has some drawbacks. First, there is the issue of information loss. Max-pooling only retains the maximum value, and any non-maximum values within the feature region, regardless of their importance, are ignored. This can make the model less sensitive to subtle changes in the input data, causing it to lose information that, although not reflected in the local maximum, is important at a global level. Second, max-pooling is particularly sensitive to noise. Since max-pooling focuses on the maximum value, incidental high values in the data might be mistakenly interpreted as important features, which could affect the overall performance and reliability of the model.
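A toy calculation makes the noise argument concrete. The two activation sequences below are invented for illustration: one is weak overall but contains a single isolated spike, and the other is consistently strong.

```python
def max_pool(xs):
    """Keep only the largest activation."""
    return max(xs)


def avg_pool(xs):
    """Average all activations."""
    return sum(xs) / len(xs)


# Hypothetical activations of one convolutional filter at four positions.
noisy = [0.1, 0.2, 0.1, 0.9]   # weak overall, with one isolated spike
steady = [0.6, 0.7, 0.6, 0.5]  # consistently strong response

if __name__ == "__main__":
    # Max-pooling ranks the spike-driven map higher (0.9 vs 0.7) ...
    print(max_pool(noisy), max_pool(steady))
    # ... while average-pooling ranks the steady map higher (about 0.33 vs 0.6).
    print(avg_pool(noisy), avg_pool(steady))
```

The two pooling operators disagree on which map carries the stronger signal, which is the motivation for letting the model weigh both rather than committing to one.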

To address these issues, this paper proposes the GF-CNN model, which simultaneously incorporates both max-pooling and average-pooling strategies. Through a gating mechanism, it dynamically adjusts the weights of the outputs from the different pooling layers, allowing the model to leverage both local important features and global distributional information, thus achieving a richer data representation. The structure of the GF-CNN model is shown in Figure 5.

For the original input features, X, the model first applies three convolutional kernels of different sizes to extract multi-scale information, resulting in feature maps U₁, U₂, and U₃. Next, U₁, U₂, and U₃ are concatenated together.

U = \mathrm{Concat}(U_{1}, U_{2}, U_{3})

(1)

After the fused features pass through a ReLU activation, max-pooling and average-pooling are applied to them separately.

S_{1} = P_{\max}(\mathrm{ReLU}(U))

(2)

S_{2} = P_{\mathrm{avg}}(\mathrm{ReLU}(U))

(3)
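Equations (1)–(3) leave open how the three branch outputs are aligned before concatenation. The pure-Python sketch below assumes TextCNN-style kernels that span the full embedding width (consistent with the kernel sizes 2 × 768, 3 × 768, and 4 × 768 reported in the ablation study), the same number of filters C in every branch, no padding, and concatenation along the sequence axis. This reading yields pooled features of length C, matching the shape [Batch_size, Embedding_dim] given in Section 5.2.3 when C equals Embedding_dim, but it is our assumption and not the authors’ code. The weights are random, the sizes are toy values, and `random_branch` is a hypothetical helper.

```python
import random


def conv1d_full_width(x, filters, bias):
    """Valid (unpadded) 1-D convolution whose kernels span the whole embedding.

    x: n token vectors of length d; filters: C kernels, each k rows of length d.
    Returns C channels, each holding n - k + 1 responses.
    """
    k, n = len(filters[0]), len(x)
    out = []
    for kernel, b in zip(filters, bias):
        channel = []
        for t in range(n - k + 1):
            s = b
            for i in range(k):  # dot product of kernel row i with token t + i
                s += sum(w * v for w, v in zip(kernel[i], x[t + i]))
            channel.append(s)
        out.append(channel)
    return out


def multiscale_pool(x, branches):
    """Eqs. (1)-(3): concatenate branch outputs, apply ReLU, then pool.

    Assumes every branch has the same number of filters C, so the branch
    outputs (of different lengths) are concatenated along the sequence axis.
    Returns (S1, S2), each of length C.
    """
    u = None
    for filters, bias in branches:
        ui = conv1d_full_width(x, filters, bias)
        u = ui if u is None else [a + b for a, b in zip(u, ui)]  # per-channel list concat
    relu = [[max(0.0, v) for v in ch] for ch in u]
    s1 = [max(ch) for ch in relu]               # P_max, Eq. (2)
    s2 = [sum(ch) / len(ch) for ch in relu]     # P_avg, Eq. (3)
    return s1, s2


def random_branch(rng, channels, k, d):
    """Hypothetical random weights for one branch with kernel height k."""
    filters = [[[rng.uniform(-1, 1) for _ in range(d)] for _ in range(k)]
               for _ in range(channels)]
    return filters, [0.0] * channels


if __name__ == "__main__":
    rng = random.Random(0)
    n, d, channels = 6, 4, 3                    # toy sizes, not the paper's
    x = [[rng.uniform(-1, 1) for _ in range(d)] for _ in range(n)]
    branches = [random_branch(rng, channels, k, d) for k in (2, 3, 4)]
    print(multiscale_pool(x, branches))
```

Because ReLU outputs are non-negative, S₁ is always at least as large as S₂ in every channel; the gate that follows decides how much of each to keep.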

The gating fusion unit assigns weights to S₁ and S₂ and performs a weighted fusion.

V = \mathrm{Gated\_fusion}(S_{1}, S_{2})

(4)

The structure of the gating fusion unit is illustrated in Figure 6. Suppose the gating fusion unit receives two inputs, A₁ and A₂. First, two learnable parameters, w₁ and w₂, are initialized for these inputs. Then, the sigmoid function is applied to generate the weight representation B.

B = \mathrm{Sigmoid}(w_{1} \cdot A_{1} + w_{2} \cdot A_{2})

(5)

Subsequently, the two parts are weighted and summed according to the weight representation B, resulting in the final output O.

O = B \odot A_{1} + (1 - B) \odot A_{2}

(6)

In summary, given inputs A₁ and A₂, the two are first linearly transformed by the learnable parameter matrices w₁ and w₂, respectively. The transformed results are added element-wise to produce interaction features, which are then mapped to the interval (0, 1) by the sigmoid function to form the weight matrix B. B reflects the importance of A₁ in each feature dimension, and the weight of A₂ is automatically set to 1 − B. The final output O is the weighted sum of A₁ and A₂ with weights B and 1 − B, respectively. The gated fusion unit thus models the interaction between its inputs in a parametric manner, enabling fine-grained adaptive feature fusion, and the constraint that the two weights sum to 1 prevents redundant or conflicting weight allocation.
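Equations (5) and (6) can be written out directly. The sketch below treats w₁ and w₂ as square d × d matrices so that B has the same size as A₁ and A₂; the source calls them learnable parameters but does not give their shapes, so this is an assumption, and the batch dimension is omitted.

```python
import math


def sigmoid(v):
    """Numerically stable logistic function."""
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)


def matvec(w, a):
    """Multiply matrix w (list of rows) by vector a."""
    return [sum(wij * aj for wij, aj in zip(row, a)) for row in w]


def gated_fusion(a1, a2, w1, w2):
    """Eqs. (5)-(6): B = sigmoid(w1 A1 + w2 A2), O = B * A1 + (1 - B) * A2."""
    z = [p + q for p, q in zip(matvec(w1, a1), matvec(w2, a2))]
    b = [sigmoid(v) for v in z]
    return [bi * x + (1.0 - bi) * y for bi, x, y in zip(b, a1, a2)]


if __name__ == "__main__":
    zeros = [[0.0, 0.0], [0.0, 0.0]]
    # With all-zero weights, B = 0.5 everywhere and the unit simply averages.
    print(gated_fusion([1.0, 2.0], [3.0, 0.0], zeros, zeros))
```

With all-zero weights the unit reduces to plain averaging, one of the alternatives compared in Section 5.2.3; learned weights let B depart from 0.5 independently in each feature dimension.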

By extracting and integrating features at several levels (multi-scale convolution, two complementary pooling operations, and a learned gate), GF-CNN can respond more flexibly to different kinds of input text, thereby enhancing its generalization ability.

3.3. Softmax Function

The softmax function is an activation function commonly used in multi-class classification tasks. It transforms a real-valued vector into a probability distribution. Softmax is primarily applied to the final layer of a neural network to ensure that the output values represent probabilities, meaning that the sum of all output values equals 1, and each value falls within the range (0, 1).

For an input vector z = [z₁, z₂, …, z_n], the calculation formula of softmax is as follows:

\mathrm{Softmax}(z_{i}) = \frac{e^{z_{i}}}{\sum_{j=1}^{n} e^{z_{j}}}

(7)

In the formula, e^{z_{i}} is the exponential of the i-th input, and the denominator is the sum of the exponentials of all the inputs.
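A direct implementation of Equation (7) can overflow for large inputs, so the sketch below uses the standard max-subtraction trick. This is a general numerical-stability technique, not something the source specifies.

```python
import math


def softmax(z):
    """Eq. (7), computed after subtracting max(z).

    Subtracting a constant from every input leaves the result unchanged
    (it cancels between numerator and denominator) but keeps exp() from
    overflowing for large logits.
    """
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    print(softmax([1.0, 2.0, 3.0]))
    print(softmax([1000.0, 1000.0]))  # a naive exp(1000) would overflow
```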

4. Experiments and Analysis

4.1. Datasets

The PDPQCD was divided into training, testing, and validation sets at a ratio of 7:2:1. The validation set was used to tune the model’s hyperparameters and to monitor its performance during training. The three subsets are mutually exclusive, with no overlap between them, so the results on the testing set serve as a reliable benchmark for evaluating the model’s performance.
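A 7:2:1 split of this kind can be sketched as a seeded shuffle followed by slicing. The source does not say whether the split was random or stratified by category, so the plain random split, the seed, and the function name `split_721` are assumptions.

```python
import random


def split_721(items, seed=42):
    """Shuffle once, then slice into disjoint train/test/validation parts (7:2:1)."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    n_train = n * 7 // 10
    n_test = n * 2 // 10
    train = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    val = shuffled[n_train + n_test:]          # remainder goes to validation
    return train, test, val


if __name__ == "__main__":
    train, test, val = split_721(range(20260))
    print(len(train), len(test), len(val))
```

For 20,260 entries this code gives 14,182 training, 4052 testing, and 2026 validation samples. A stratified split would keep category proportions equal across subsets, which can matter for the imbalanced categories in Figure 3.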

4.2. Experimental Setup

The experiments were conducted on a system running Ubuntu 20.04 with an NVIDIA GeForce RTX 4090D GPU. The models were implemented in Python 3.8 with PyTorch 2.0.0. The pre-trained model was bert-base-chinese, downloaded from Hugging Face; it has 12 layers, a 768-dimensional hidden state, and 12 attention heads. Other parameter settings are detailed in Table 2.

Precision (P), recall (R), and the F1 score were selected as evaluation metrics for the task. The F1 score is the harmonic mean of P and R, and it reflects the overall performance of the model. The specific calculation process is shown in Equations (8)–(10):

P = \frac{T P}{T P + F P}

(8)

R = \frac{T P}{T P + F N}

(9)

F 1 = \frac{2 \times P \times R}{P + R}

(10)

where TP is the number of positive samples correctly predicted as positive, FP is the number of negative samples incorrectly predicted as positive, and FN is the number of positive samples incorrectly predicted as negative. In multi-class classification, each category is treated as the positive class in turn.
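Equations (8)–(10) can be computed per class from predicted and true labels, as sketched below. The source does not state how per-class scores are aggregated over the 11 categories; the macro average in `macro_f1` is one common choice and is an assumption here.

```python
from collections import Counter


def prf_per_class(y_true, y_pred):
    """Per-class (P, R, F1) following Eqs. (8)-(10), one class at a time."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1   # predicted as p, but the true class was different
            fn[t] += 1   # a sample of class t was missed
    scores = {}
    for c in labels:
        prec = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        rec = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        scores[c] = (prec, rec, f1)
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores (assumed aggregation)."""
    scores = prf_per_class(y_true, y_pred)
    return sum(f for _, _, f in scores.values()) / len(scores)


if __name__ == "__main__":
    print(prf_per_class([0, 0, 1, 1], [0, 1, 1, 1]))
    print(macro_f1([0, 0, 1, 1], [0, 1, 1, 1]))
```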

4.3. Experimental Results and Analysis

In this study, six models—TextCNN [21], DPCNN [22], RCNN [23], LSTM-Attention [24], Stacked-LSTM [17], and Co-LSTM [25]—were compared with the proposed GF-CNN through a series of experiments. All models utilized word embeddings provided by BERT. The experimental results are presented in Table 3.

As shown in Table 3, all models performed very well, with P, R, and F1 all exceeding 98%. We attribute this to two main factors:

(1): High-quality word embeddings: Because BERT was pre-trained on a large-scale corpus, it captures rich contextual and semantic features and therefore provides high-quality word vector representations for downstream tasks;
(2): High-quality dataset: After the training templates were generated by the LLM, the PDPQCD underwent thorough review and filtering, which contributed to its high quality. This result is consistent with our ultimate goal of building an intelligent question-answering system that accurately understands user intentions.

Because GF-CNN could not be clearly distinguished from the other models on the PDPQCD, the next section reports experiments on several public datasets to further validate its effectiveness and generalizability.

5. Experiments on Public Datasets

5.1. Public Dataset

This section uses the Subj and THUCNews datasets for further experiments. The Subj dataset contains 10,000 sentences for distinguishing subjective from objective statements: 5000 are labeled subjective and 5000 objective. THUCNews, provided by Tsinghua University, consists of 10 topic categories, such as education, entertainment, and finance. Detailed information about both datasets is shown in Table 4.

5.2. Experimental Results

5.2.1. Performance Comparison of Different Models

Table 5 compares the performance of seven text classification models on the Subj and THUCNews datasets. GF-CNN shows a clear classification advantage, leading all models with F1 scores of 96.70% and 93.55% on the two datasets, respectively. TextCNN and LSTM-Attention are stable, consistently ranking in the second tier on both datasets despite their different natures. RCNN and Stacked-LSTM, by contrast, show clear performance limitations: their F1 scores are only 95.15% and 94.60%, respectively, on Subj, and 92.23% and 91.82% on THUCNews, where they fall even below the baseline. Notably, Co-LSTM and DPCNN are highly sensitive to the dataset: the former performs well on Subj but poorly on THUCNews, while the latter shows the opposite trend.

5.2.2. Confusion Matrix

Figure 7a and Figure 7b present the confusion matrices on the Subj and THUCNews datasets, respectively. The confusion matrix is an important visualization tool for classification tasks because it shows clearly how the model performs on each category. In these matrices, each row represents a true category and each column a predicted category; each cell gives the number of samples of the row’s category that were predicted as the column’s category. Darker colors indicate higher counts, and lighter colors lower counts. On Subj, the “Objective” category is classified slightly better than “Subjective”. On THUCNews, categories such as “Sports” and “Education” are classified with relatively high accuracy, whereas categories such as “Science” and “Stocks” perform slightly worse.
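The row/column convention described above can be reproduced with a few lines of Python. The labels and predictions in the example are made up, and reading “classified with relatively high accuracy” as the per-class ratio of the diagonal entry to its row total (per-class recall) is our interpretation.

```python
def confusion_matrix(y_true, y_pred, labels):
    """Rows are true categories, columns are predicted categories."""
    index = {lab: i for i, lab in enumerate(labels)}
    m = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        m[index[t]][index[p]] += 1
    return m


def per_class_recall(m):
    """Diagonal entry divided by its row total, for each true category."""
    return [row[i] / sum(row) if sum(row) else 0.0 for i, row in enumerate(m)]


if __name__ == "__main__":
    labels = ["Subjective", "Objective"]
    y_true = ["Objective", "Objective", "Subjective", "Subjective"]  # made-up data
    y_pred = ["Objective", "Objective", "Objective", "Subjective"]
    m = confusion_matrix(y_true, y_pred, labels)
    print(m, per_class_recall(m))
```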

5.2.3. Performance Comparison of Different Feature Fusion Methods

This subsection compares the gating mechanism with three alternative methods for fusing the max-pooled and average-pooled features. The results on the Subj and THUCNews datasets are presented in Figure 8a and Figure 8b, respectively.

Concatenation: The features processed by max-pooling and average-pooling are 2D tensors of the shape [Batch_size, Embedding_dim]. These tensors are concatenated along the last dimension, resulting in a tensor of the shape [Batch_size, 2×Embedding_dim].

Averaging: The two tensors are summed element-wise and the sum is divided by 2, which leaves the tensor shape unchanged.

Soft attention mechanism: First, the two tensors are stacked along a new dimension, giving a three-dimensional tensor of shape [2, Batch_size, Embedding_dim]. The softmax function is then applied along this new dimension to obtain a weight for each tensor. The stacked tensor is multiplied element-wise by these weights and summed over the new dimension, giving a fused tensor of shape [Batch_size, Embedding_dim].
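The three alternative fusion methods can be sketched on single feature vectors (the batch dimension is omitted), with `s1` and `s2` standing for the max-pooled and average-pooled features. The sketch follows the descriptions above and reads the soft attention step as a softmax taken directly over the two stacked feature values at each position; the function names are illustrative.

```python
import math


def fuse_concat(s1, s2):
    """Concatenation along the last dimension: length doubles."""
    return s1 + s2


def fuse_average(s1, s2):
    """Element-wise mean: length unchanged."""
    return [(a + b) / 2 for a, b in zip(s1, s2)]


def fuse_soft_attention(s1, s2):
    """Softmax over the two stacked values at each position, then weighted sum."""
    out = []
    for a, b in zip(s1, s2):
        m = max(a, b)                                  # for numerical stability
        ea, eb = math.exp(a - m), math.exp(b - m)
        wa, wb = ea / (ea + eb), eb / (ea + eb)
        out.append(wa * a + wb * b)
    return out


if __name__ == "__main__":
    s1, s2 = [1.0, 2.0], [3.0, 0.0]
    print(fuse_concat(s1, s2), fuse_average(s1, s2), fuse_soft_attention(s1, s2))
```

Concatenation doubles the feature size, so the classifier that follows must accept 2 × Embedding_dim inputs, whereas the other two methods keep the size unchanged.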

From Figure 8, it can be observed that the three alternative feature fusion methods—concatenation, averaging, and soft attention mechanism—all result in lower F1 scores compared to the gating fusion unit. Among them, the performance drop is most significant with the averaging method. This is likely because averaging simply takes the mean of the features obtained from max-pooling and average-pooling, disregarding the fact that different features may carry varying levels of importance and information. Despite the decline in performance, these feature fusion methods still outperform other baseline models in most cases. This could be attributed to the complementary nature of max-pooling and average-pooling, which together provide a more comprehensive representation of the data. By effectively combining these two pooling strategies, the extracted features are richer and more robust, even with simpler fusion methods.

5.2.4. Ablation Study

This section presents an ablation experiment to evaluate the impact of different model components on overall performance. The experimental results are shown in Table 6.

Table 6 supports three observations. (1) The word-embedding method has a decisive impact on performance. Replacing BERT with Word2Vec lowered the F1 scores on the three datasets by 1.43, 2.50, and 2.29 percentage points, respectively. This gap indicates that BERT, which produces contextual representations, captures semantic information considerably better than traditional static word embeddings. (2) Multi-scale feature extraction is effective. Using a single convolutional kernel (kernel_size = 3 × 768) instead of the multi-kernel combination (kernel_size = 2 × 768, 3 × 768, 4 × 768) lowered the F1 scores on all three datasets. This is likely because kernels of different sizes capture local features over spans of different lengths in parallel. (3) Among the pooling strategies, max-pooling alone performed slightly better than average-pooling alone on PDPQCD and THUCNews and tied with it on Subj, yet both fell short of the full model, indicating that either strategy used alone loses information. In practice, therefore, a dynamically weighted fusion of the two is recommended to balance the retention of key features with the consistency of global information.
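The percentage-point drops discussed above follow directly from Table 6. The short Python check that follows transcribes the F1 columns of Table 6 (the variable and function names are our own) and subtracts each ablated variant from the full model.

```python
from typing import Dict

Scores = Dict[str, float]

# F1 scores (%) transcribed from Table 6.
FULL_MODEL: Scores = {"PDPQCD": 100.00, "Subj": 96.70, "THUCNews": 93.55}

ABLATIONS: Dict[str, Scores] = {
    "Word2Vec word embeddings": {"PDPQCD": 98.57, "Subj": 94.20, "THUCNews": 91.26},
    "Single convolutional kernel": {"PDPQCD": 99.64, "Subj": 95.89, "THUCNews": 93.00},
    "Max-pooling only": {"PDPQCD": 99.95, "Subj": 95.44, "THUCNews": 92.55},
    "Average-pooling only": {"PDPQCD": 99.94, "Subj": 95.44, "THUCNews": 92.27},
}


def f1_drops(full: Scores, ablations: Dict[str, Scores]) -> Dict[str, Scores]:
    """F1 drop of each ablated variant relative to the full model, in percentage points."""
    return {
        name: {ds: round(full[ds] - scores[ds], 2) for ds in full}
        for name, scores in ablations.items()
    }


if __name__ == "__main__":
    for name, drops in f1_drops(FULL_MODEL, ABLATIONS).items():
        print(f"{name:30s} {drops}")
```

Because F1 is already a percentage, the differences are reported in percentage points rather than as relative percentages.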

6. Discussion

Although this study has made progress in classifying questions about potato pests and diseases, several limitations remain to be addressed in future research.

First, regarding data construction, the LLM-generated dataset is standardized and consistent but risks being homogeneous. Such highly standardized training samples may weaken the model’s ability to generalize to natural language variation, especially in real-world applications, where user inputs often contain non-standard phenomena such as spelling errors, grammatical errors, and semantic ambiguity. Moreover, the model, trained in a controlled environment, has not yet been rigorously tested against diverse types of input noise. In the future, we plan to collect natural language expressions from real-world scenarios and combine them with data augmentation strategies to construct a hybrid dataset with ecological validity.

Second, regarding model architecture, although the GF-CNN network extracts features well, its computational complexity still falls short of the real-time response requirements of agricultural production scenarios. The research team plans a two-pronged optimization: on the one hand, we will make the architecture lighter, reducing the parameter count through channel pruning and the model size through quantization; on the other hand, we will explore knowledge distillation to improve computational efficiency while maintaining classification accuracy.

7. Conclusions

This study addresses the current lack of high-quality question datasets for potato pests and diseases and makes two methodological contributions: it constructs what is, to our knowledge, the first domain-specific question classification dataset for this field, and it designs a classification model that exploits feature fusion. Specifically, this research proposes a prompt-engineering-based method for constructing question classification datasets. Prompts first guide the LLM to generate question templates; entities extracted in our previous NER work are then batch-inserted into these templates, yielding the high-quality PDPQCD. In addition, this study introduces the GF-CNN question classification model, which employs (a) a parallel dual-pooling structure, in which max-pooling captures key semantic features and average-pooling retains global statistical information, and (b) a learnable gating fusion module that dynamically adjusts the weights of the features from the two channels. Experimental results show that GF-CNN achieves F1 scores of 100.00%, 96.70%, and 93.55% on the PDPQCD, Subj, and THUCNews datasets, respectively, outperforming all the baseline models compared.
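Components (a) and (b) can be illustrated together on a single sample. The authors' gating fusion unit is defined in Figure 6, whose equations are not reproduced here, so this Python sketch assumes a common gate form: g = sigmoid(W [max; avg] + b) and fused = g * max + (1 - g) * avg, applied element-wise. The weight matrix W and bias b stand in for hypothetical learned parameters, and all function names are illustrative.

```python
import math
from typing import List, Tuple

Vector = List[float]


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def dual_pool(feature_map: List[Vector]) -> Tuple[Vector, Vector]:
    """Pool a [seq_len, channels] feature map over time (component (a))."""
    channels = list(zip(*feature_map))  # transpose to [channels][seq_len]
    max_vec = [max(c) for c in channels]           # key (extreme) responses
    avg_vec = [sum(c) / len(c) for c in channels]  # global statistics
    return max_vec, avg_vec


def gated_fusion(max_vec: Vector, avg_vec: Vector,
                 weight: List[Vector], bias: Vector) -> Vector:
    """Assumed gate (component (b)): g = sigmoid(W [max; avg] + b),
    fused = g * max + (1 - g) * avg.

    weight has shape [channels][2 * channels]; bias has shape [channels].
    """
    joint = max_vec + avg_vec  # concatenated [max; avg] input to the gate
    fused = []
    for i, (m, a) in enumerate(zip(max_vec, avg_vec)):
        g = sigmoid(sum(w * x for w, x in zip(weight[i], joint)) + bias[i])
        fused.append(g * m + (1.0 - g) * a)  # per-channel convex mix
    return fused


if __name__ == "__main__":
    fmap = [[1.0, 4.0], [3.0, 0.0], [2.0, 2.0]]  # toy [seq_len=3, channels=2]
    mx, av = dual_pool(fmap)
    w = [[0.1, 0.0, -0.1, 0.0], [0.0, 0.2, 0.0, -0.2]]  # toy gate weights
    print(gated_fusion(mx, av, w, [0.0, 0.0]))
```

Because g lies in (0, 1) and is computed from both pooled vectors, each channel can lean toward max-pooling or average-pooling depending on the input, unlike the fixed 0.5 weighting of plain averaging.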

Future work will complete an intelligent question-answering system for potato pests and diseases and explore integrating LLMs to enhance its text-generation capabilities.

Author Contributions

Conceptualization, W.T. and Z.H.; methodology, W.T. and Z.H.; software, W.T.; validation, W.T.; formal analysis, W.T.; investigation, W.T. and Z.H.; resources, W.T.; data curation, W.T. and Z.H.; writing—original draft, W.T.; writing—review and editing, Z.H.; supervision, Z.H.; project administration, Z.H. All authors have read and agreed to the published version of the manuscript.

Funding

Key Discipline Construction of Gannan Normal University (220108), Science and Technology Project of Jiangxi Provincial Department of Education (490164), National Key R&D Program of China (2017YFD0701600), and Gannan Normal University Talent Fund (13SJJ202130).

Institutional Review Board Statement

Not applicable.

Data Availability Statement

The data and code are available at https://github.com/Tracyyytao/Intent-Recognition/tree/master (accessed on 23 February 2025).

Conflicts of Interest

The authors declare no conflicts of interest.

References

Liu, Y.; Feng, H.; Yue, J.; Li, Z.; Yang, G.; Song, X.; Yang, X.; Zhao, Y. Remote-sensing estimation of potato above-ground biomass based on spectral and spatial features extracted from high-definition digital camera images. Comput. Electron. Agric. 2022, 198, 107089.
Meno, L.; Escuredo, O.; Rodríguez-Flores, M.S.; Seiji, M.C. Looking for a sustainable potato crop. Field assessment of early blight management. Agric. For. Meteorol. 2021, 308, 108617.
Saffer, A.; Tateosian, L.; Saville, A.C.; Yang, Y.; Ristaino, J.B. Reconstructing historic and modern potato late blight outbreaks using text analytics. Sci. Rep. 2024, 14, 2523.
Zhu, H.; Shi, W.; Guo, X.; Lyu, S.; Yang, R.; Han, Z. Potato disease detection and prevention using multimodal AI and large language model. Comput. Electron. Agric. 2025, 229, 109824.
Ferro, M.V.; Sørensen, C.G.; Catania, P. Comparison of different computer vision methods for vineyard canopy detection using UAV multispectral images. Comput. Electron. Agric. 2024, 225, 109277.
Yang, T.; Mei, Y.; Xu, L.; Yu, H.; Chen, Y. Application of question answering systems for intelligent agriculture production and sustainable management: A review. Resour. Conserv. Recycl. 2024, 204, 107497.
Guo, X.; Wang, J.; Gao, G.; Zhou, J.; Li, Y.; Cheng, Z.; Miao, G. Efficient Agricultural Question Classification with a BERT-Enhanced DPCNN Model. IEEE Access 2024, 12, 109255–109268.
Wei, F.; Duan, Q.; Xiao, X.; Zhang, L. Classification Technique of Chinese Agricultural Text Information Based on SVM. Trans. Chin. Soc. Agric. Mach. 2015, 46, 174–179.
Cui, X.; Shi, D.; Chen, Z.; Xu, F. Parallel forestry text classification technology based on XGBoost in spark framework. Trans. Chin. Soc. Agric. Mach. 2019, 50, 280–287.
Espejo-Garcia, B.; Martinez-Guanter, J.; Pérez-Ruiz, M.; Lopez-Pellicer, F.J.; Zarazaga-Soria, F.J. Machine learning for automatic rule classification of agricultural regulations: A case study in Spain. Comput. Electron. Agric. 2018, 150, 343–352.
Tang, W.; Hu, Z. Survey of agricultural knowledge graph. Comput. Eng. Appl. 2024, 60, 63–76.
Jin, N.; Zhao, C.; Wu, H.; Miao, Y.; Li, S.; Yang, B. Classification technology of agricultural questions based on BiGRU_MulCNN. Trans. Chin. Soc. Agric. Mach. 2020, 51, 199–206.
Wang, H.; Wu, H.; Feng, S.; Liu, Z.; Xu, T. Classification technology of rice questions in question answer system based on Attention_DenseCNN. Trans. Chin. Soc. Agric. Mach. 2021, 52, 237–243.
Feng, S.; Xu, T.; Zhou, Y.; Zhao, D.; Jing, N.; Wang, H. Rice Knowledge Text Classification Based on Deep Convolution Neural Network. Trans. Chin. Soc. Agric. Mach. 2021, 52, 257–264.
Devlin, J.; Chang, M.; Lee, K.; Toutanova, K. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. arXiv 2018, arXiv:1810.04805.
Sun, Y.; Wang, S.; Li, Y.; Feng, S.; Chen, X.; Zhang, H.; Tian, X.; Zhu, D.; Tian, H.; Wu, H. ERNIE: Enhanced Representation through Knowledge Integration. arXiv 2019, arXiv:1904.09223.
Li, L.; Diao, L.; Tang, Z.; Bai, Z.; Zhou, H.; Guo, X. Question classification method of agricultural diseases and pests based on BERT_Stacked LSTM. Trans. Chin. Soc. Agric. Mach. 2021, 52, 172–177.
Li, D.; Bai, T.; Xiang, H.; Dai, S.; Wang, Z. Intention recognition of cotton disease and pest questions based on ERNIE and improved DPCNN. Shandong Agric. Sci. 2024, 56, 143–151.
Duan, X.; Li, Z.; Liu, L.; Liu, Y. Multimodal Chinese agricultural news classification method based on interactive attention. IEEE Access 2024, 12, 161718–161731.
Tang, W.; Wen, X.; Li, M.; Chen, Y.; Hu, Z. ResiAdvNet: A named entity recognition model for potato diseases and pests based on progressive residual structures and adversarial training. Comput. Electron. Agric. 2024, 227, 109543.
Kim, Y. Convolutional neural networks for sentence classification. arXiv 2014, arXiv:1408.5882.
Johnson, R.; Zhang, T. Deep pyramid convolutional neural networks for text categorization. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, Vancouver, BC, Canada, 30 July–4 August 2017; pp. 562–570.
Lai, S.; Xu, L.; Liu, K.; Zhao, J. Recurrent convolutional neural networks for text classification. In Proceedings of the AAAI Conference on Artificial Intelligence, Austin, TX, USA, 25–30 January 2015; p. 29.
Bai, X. Text classification based on LSTM and attention. In Proceedings of the 2018 Thirteenth International Conference on Digital Information Management, Berlin, Germany, 24–26 September 2018; pp. 29–32.
Behera, R.K.; Jena, M.; Rath, S.K.; Misra, S. Co-LSTM: Convolutional LSTM model for sentiment analysis in social big data. Inf. Process. Manag. 2021, 58, 102435.

Figure 1. Prompt and model replies.

Figure 2. Entity import algorithm logic.

Figure 3. Data distribution.

Figure 4. Model structure.

Figure 5. GF-CNN model structure.

Figure 6. Gating fusion unit.

Figure 7. Confusion matrix.

Figure 8. Performance comparison of different feature fusion methods.

Table 1. Category of intent.

Categories	Examples
Disease Symptoms	What are the common symptoms of early blight?
Pest Symptoms	How does the cabbage looper damage potatoes?
Causes of Disease	What are the typical causes of late blight?
Affected Areas	Which parts of the potato are primarily affected by the beet armyworm?
Treatment Agents	What treatment agents can be used to cure potato leaf roll disease?
Alternate Names	What are the alternate names for Epicauta gorhami?
Distribution	In which regions is potato cancer disease primarily distributed?
Transmission Vectors	Through which pathways is leaf blight mainly transmitted?
Medication Uses	For which diseases and pests is Bordeaux mixture an effective treatment agent?
Overwintering Sites	Where does Cercospora solani-tuberosi overwinter?
Pest Categories	To which biological category does the pea leaf miner belong?
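The examples in Table 1 read like filled-in templates, which matches the construction method summarized in the Conclusions: LLM-generated question templates are filled by batch-importing named entities. The following minimal Python sketch illustrates that batch import. The template strings, the `{disease}`/`{agent}` slot syntax, and the tiny entity lists are illustrative assumptions (the second Disease Symptoms template is invented for the example); the authors' actual procedure is the one shown in Figure 2.

```python
from itertools import product
from typing import Dict, List, Tuple

# Hypothetical LLM-generated templates keyed by intent label (Table 1 categories).
TEMPLATES: Dict[str, List[str]] = {
    "Disease Symptoms": [
        "What are the common symptoms of {disease}?",
        "How can I recognize {disease} on potato plants?",  # invented example
    ],
    "Medication Uses": [
        "For which diseases and pests is {agent} an effective treatment agent?",
    ],
}

# Illustrative entity lists; in the paper, entities come from earlier NER work.
ENTITIES: Dict[str, List[str]] = {
    "disease": ["early blight", "late blight"],
    "agent": ["Bordeaux mixture"],
}


def build_dataset(templates: Dict[str, List[str]],
                  entities: Dict[str, List[str]]) -> List[Tuple[str, str]]:
    """Fill every template with every matching entity; return (question, label) pairs."""
    samples = []
    for label, patterns in templates.items():
        for pattern in patterns:
            # Slots that actually appear in this template, e.g. ["disease"].
            slots = [slot for slot in entities if "{" + slot + "}" in pattern]
            # One question per combination of entities for those slots.
            for values in product(*(entities[s] for s in slots)):
                samples.append((pattern.format(**dict(zip(slots, values))), label))
    return samples


if __name__ == "__main__":
    for question, label in build_dataset(TEMPLATES, ENTITIES):
        print(f"{label}\t{question}")
```

Because every question inherits the intent label of its template, labels come for free; the trade-off is the homogeneity risk raised in the Discussion, since all questions for a category share a handful of surface forms.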

Table 2. Experimental parameter settings.

Parameter	Value
learning rate	2 × 10⁻⁵
batch_size	32
epoch	20
optimizer	Adam
text_len	30
filters	256
filter_size	(2, 3, 4) × 768
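To make the convolution settings in Table 2 concrete, the following Python sketch counts, for one input of text_len = 30 tokens with 768-dimensional BERT vectors, how many positions each kernel covers and how many parameters each convolution holds. It assumes stride 1, no padding, one bias per filter, and TextCNN-style pooling of each kernel-size branch before concatenation; the paper may arrange these steps differently, and the function names are hypothetical.

```python
from typing import Dict, Tuple

TEXT_LEN = 30                       # Table 2: text_len
EMBED_DIM = 768                     # kernel width "× 768" in Table 2
NUM_FILTERS = 256                   # Table 2: filters
KERNEL_HEIGHTS: Tuple[int, ...] = (2, 3, 4)  # Table 2: filter_size (2, 3, 4) × 768


def conv_shapes(text_len: int = TEXT_LEN, embed_dim: int = EMBED_DIM,
                filters: int = NUM_FILTERS,
                heights: Tuple[int, ...] = KERNEL_HEIGHTS) -> Dict[int, Dict[str, int]]:
    """Per kernel height k: valid sliding positions and parameter count."""
    out = {}
    for k in heights:
        out[k] = {
            # A k × embed_dim kernel spans the full embedding width, so it
            # slides only along the token axis: text_len - k + 1 windows.
            "positions": text_len - k + 1,
            # Weights (k * embed_dim per filter) plus one bias per filter.
            "params": k * embed_dim * filters + filters,
        }
    return out


def pooled_feature_dim(filters: int = NUM_FILTERS,
                       heights: Tuple[int, ...] = KERNEL_HEIGHTS) -> int:
    """Features after pooling each branch over time and concatenating branches."""
    return filters * len(heights)


if __name__ == "__main__":
    for k, info in conv_shapes().items():
        print(f"kernel {k} x {EMBED_DIM}: {info}")
    print("pooled feature dim:", pooled_feature_dim())
```

Under these assumptions the three convolutions hold about 1.77 million parameters in total, a small fraction of the BERT encoder they sit on.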

Table 3. Experimental results on the PDPQCD.

Model	P (%)	R (%)	F1 (%)
TextCNN	99.97	99.95	99.96
DPCNN	99.81	99.63	99.72
RCNN	99.91	99.89	99.90
LSTM-Attention	98.42	100.00	99.20
Stacked-LSTM	99.88	99.84	99.86
Co-LSTM	99.83	99.66	99.74
GF-CNN	100.00	100.00	100.00

Table 4. Dataset Details.

Datasets	Categories	Training Set	Validation Set	Test Set	Language
Subj	2	8 k	-	2 k	English
THUCNews	10	180 k	10 k	10 k	Chinese

Table 5. Performance comparison of different models.

Model	Subj P (%)	Subj R (%)	Subj F1 (%)	THUCNews P (%)	THUCNews R (%)	THUCNews F1 (%)
TextCNN	95.79	95.84	95.80	92.52	92.49	92.49
DPCNN	94.46	94.21	94.28	92.74	92.72	92.72
RCNN	95.16	95.21	95.15	92.27	92.23	92.23
LSTM-Attention	95.45	95.50	95.45	92.60	92.56	92.56
Stacked-LSTM	94.59	94.63	94.60	91.89	91.83	91.82
Co-LSTM	95.72	95.60	95.64	92.19	92.15	92.15
GF-CNN	96.70	96.69	96.70	93.55	93.56	93.55

Table 6. Ablation study results.

Model components	PDPQCD P (%)	PDPQCD R (%)	PDPQCD F1 (%)	Subj P (%)	Subj R (%)	Subj F1 (%)	THUCNews P (%)	THUCNews R (%)	THUCNews F1 (%)
Word2Vec word embeddings	97.19	100.00	98.57	94.23	94.20	94.20	91.28	91.24	91.26
Using a single convolutional kernel	99.28	100.00	99.64	95.90	95.89	95.89	93.00	93.01	93.00
Using only max-pooling	99.95	99.95	99.95	95.61	95.37	95.44	92.58	92.55	92.55
Using only average-pooling	99.93	99.94	99.94	95.54	95.45	95.44	92.31	92.27	92.27
Our model	100.00	100.00	100.00	96.70	96.69	96.70	93.55	93.56	93.55

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

