Agricultural Knowledge-Enhanced Deep Learning for Joint Intent Detection and Slot Filling

Liu, Mingtang; Wu, Shanshan; Tian, Wenlong; Lei, Shuo; Miao, Jiahao

doi:10.3390/app152010932


by Mingtang Liu ¹,*, Shanshan Wu ², Wenlong Tian ¹, Shuo Lei ¹ and Jiahao Miao ¹

¹ School of Electronic Engineering, North China University of Water Resources and Electric Power, Zhengzhou 450046, China

² School of Information Engineering, North China University of Water Resources and Electric Power, Zhengzhou 450046, China

* Author to whom correspondence should be addressed.

Appl. Sci. 2025, 15(20), 10932; https://doi.org/10.3390/app152010932

Submission received: 23 September 2025 / Revised: 6 October 2025 / Accepted: 8 October 2025 / Published: 11 October 2025

(This article belongs to the Section Agricultural Science and Technology)


Abstract

Intent detection and slot filling are fundamental components of intelligent question-answering systems in agricultural domains. Existing approaches show notable limitations in semantic feature extraction and achieve relatively low accuracy on domain-specific agricultural queries with complex terminology and contextual dependencies. To address these challenges, this paper proposes an agricultural knowledge-enhanced deep learning approach that combines HanLP-based agricultural terminology processing with BERT contextual encoding, TextCNN feature extraction, and attention-based fusion. Experiments on a curated domain-specific dataset of 8041 melon cultivation queries show that the proposed model achieves an accuracy of 79.6%, a recall of 80.1%, and an F1-score of 79.8%, corresponding to performance gains of 7–22% over baseline methods including TextRNN, TextRCNN, TextCNN, and BERT-TextCNN. The results indicate significant potential for advancing intelligent agricultural advisory systems and domain-specific natural language understanding, particularly in precision agriculture.

Keywords: agricultural AI; deep learning; intent detection; natural language processing; slot filling; smart agriculture

1. Introduction

The rapid advancement of artificial intelligence in agriculture has created new opportunities for food security and sustainable farming. Modern precision agriculture relies on intelligent advisory systems to provide timely and context-specific recommendations, while their effectiveness depends on the ability to accurately understand farmers’ natural language queries, which often involve domain-specific terminology and context-dependent expressions. Intent recognition and slot filling play a crucial role in semantic modeling and in building efficient question-answering systems [1,2].

Agricultural question-answering systems face unique challenges that distinguish them from conventional natural language processing applications. First, the agricultural domain contains a large number of specialized vocabularies, such as crop varieties, growth stages, disease classifications, treatment methods, and environmental factors, which are rarely found in general corpora [3,4,5]. Second, farmers’ queries often exhibit complex semantic structures where intent and entity information are closely intertwined; for example, the query “What is the growth status during the fruiting stage?” requires both intent recognition and entity extraction [6]. Third, agricultural decision-making requires the integration of textual queries with environmental data such as soil conditions, climate patterns, and seasonal variations in order to provide actionable recommendations.

In natural language understanding research, methods for intent recognition and slot filling are typically categorized into pipeline approaches and joint modeling frameworks [7,8,9]. Pipeline methods treat the two tasks as independent, which can easily lead to error propagation and fail to exploit their inherent interdependencies [10,11]. Joint modeling frameworks alleviate this issue to some extent, yet most still rely on general-purpose language representations, which are insufficient to capture the specialized semantics and terminology of agricultural contexts [12,13]. Although pre-trained language models such as BERT have achieved remarkable success in NLP tasks in recent years [14,15], their performance often degrades in specialized domains like agriculture without domain adaptation strategies. For instance, on a certain agricultural knowledge platform, the daily number of user queries has exceeded several thousand, directly impacting production management and yield prediction, which highlights the urgent need for efficient agricultural question-answering systems. Previous studies have shown that hybrid architectures combining BERT with convolutional neural networks are effective in applications such as rumor detection and text classification [16], while attention mechanisms have proven valuable in selectively focusing on salient information [17]. However, research on systematically integrating these advanced architectures with agricultural domain knowledge remains relatively limited.

The primary contributions of this work are threefold: (1) We develop a comprehensive framework that systematically incorporates agricultural knowledge—including specialized terminology, growth stage classifications, and environmental parameters—into the neural natural language understanding process. (2) We design a hybrid architecture that combines the strengths of pre-trained language models (BERT) with convolutional feature extraction (TextCNN) and attention mechanisms to capture both global contextual semantics and fine-grained local patterns in agricultural queries. (3) We conduct extensive experiments on a curated agricultural dataset, achieving 79.6% accuracy, 80.1% recall, and 79.8% F1-score, with substantial gains (7–22%) over baseline methods.

To the best of our knowledge, this is one of the first studies to systematically integrate domain-specific agricultural knowledge and environmental parameters into a joint intent detection and slot filling model, thereby demonstrating the effectiveness of agricultural knowledge enhancement for practical deployment in precision agriculture applications.

The remainder of this paper is organized as follows: Section 2 presents the proposed framework and model architecture; Section 3 describes the dataset and experimental setup; Section 4 reports the experimental results and analysis; Section 5 discusses implications and potential future research; and Section 6 concludes the paper.

2. Materials and Methods

This study proposes a hybrid model integrating HanLP-based word segmentation with an enhanced BERT-TextCNN-AT architecture, whose overall structure is illustrated in Figure 1. The model comprises the following core modules: first, HanLP is employed to perform word segmentation for text representation; subsequently, the BERT model is utilized to generate deep semantic vectors; then, TextCNN is applied to extract multi-scale features; finally, an attention mechanism is introduced to achieve adaptive weighted feature fusion. This architecture provides a complete end-to-end processing pipeline from raw text input to final intent recognition.

2.1. HanLP Word Segmentation

In this study, the HanLP [18] natural language processing toolkit is employed to perform the slot filling task. For the input text, domain-specific vocabulary used in melon cultivation—such as growth stages, pest and disease terms, and growth status—is accurately mapped to the corresponding part-of-speech (POS) categories. During this process, HanLP leverages its built-in annotation model to transform the input text into POS-tagged output, ensuring that each component of the text is assigned an explicit POS label. This process ultimately completes the slot filling task and establishes a solid foundation for subsequent natural language processing tasks.

P(t_{1}, \dots, t_{n} \mid c_{1}, \dots, c_{n}) = \prod_{i=1}^{n} P(t_{i} \mid t_{1}, \dots, t_{i-1}, c_{1}, \dots, c_{n})

(1)

Here, t_{i} denotes the tag assigned at position i, where i is the positional index of the token, and c_{1}, …, c_{n} represents the input character sequence.
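The factorization in Equation (1) can be illustrated with a minimal sketch. The conditional model below is a hypothetical lookup table (`TOY_TABLE`, `cond_prob`), not HanLP's tagger, and for brevity it conditions only on the previous tag and the current character rather than on the full history in Equation (1). The product is accumulated in log space, which is the usual way to avoid underflow on longer sequences.

```python
import math

# Hypothetical conditional model: a toy table standing in for a trained tagger.
# Keys are (previous tag, current character); values map tag -> probability.
TOY_TABLE = {
    ("<s>", "a"): {"N": 0.8, "V": 0.2},
    ("N", "b"): {"N": 0.3, "V": 0.7},
    ("V", "b"): {"N": 0.6, "V": 0.4},
}


def cond_prob(tag, prev_tag, char):
    """Simplified P(t_i | t_{i-1}, c_i) read from the toy table."""
    return TOY_TABLE[(prev_tag, char)][tag]


def sequence_log_prob(tags, chars):
    """Sum of log conditionals, i.e. the log of the product in Equation (1)."""
    total = 0.0
    prev = "<s>"  # start-of-sequence marker
    for tag, char in zip(tags, chars):
        total += math.log(cond_prob(tag, prev, char))
        prev = tag
    return total


# Joint probability of tagging the characters "a", "b" as N, V.
joint = math.exp(sequence_log_prob(["N", "V"], ["a", "b"]))
```

Here `joint` is the product of the two table entries selected along the tag path.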

2.2. BERT Model

In the intent recognition and slot filling tasks within the domain of melon cultivation, the BERT model primarily serves as the core component for deep semantic representation learning. User queries or utterances related to melon cultivation are first tokenized and encoded into corresponding word vector sequences, with special tokens (such as [CLS] and [SEP]) added to explicitly define the input structure. During the semantic modeling process, BERT maps each token into a context-dependent vector representation, enabling domain-specific terms (e.g., “seedling cultivation,” “fertilization,” and “disease control”) to be represented more accurately in the semantic space. Based on this, the output vectors of BERT can be applied to intent recognition (using the [CLS] vector for classification to determine the user’s intent, such as “disease control inquiry” or “cultivation technique consultation”) and slot filling (using token-level outputs for sequence labeling to identify specific slot information, such as “disease name = downy mildew” and “cultivation stage = seedling period”). These feature vectors are then used as the input for the subsequent TextCNN model. The structure of the BERT model is shown in Figure 2.

2.3. TextCNN Model

TextCNN applies multiple convolution kernels of different sizes to the sequence feature vectors output by BERT, thereby extracting local semantic features at various granularities, such as phrase-level and clause-level representations. For example, it can capture key information fragments such as “flowering stage control” and “disease management.” Through max-pooling, TextCNN selects the most representative feature values from those extracted by different convolution kernels, enabling the aggregation of critical information and dimensionality reduction, thus enhancing the compactness and discriminative capability of the feature representation. The structure of the TextCNN model is illustrated in Figure 3.

In the architecture of the TextCNN model, the convolutional layer serves as the core feature extraction module, applying multi-dimensional convolution kernels over the text representation matrix through sliding window operations to capture local features. Specifically, given an input text that has been vectorized into a word embedding matrix V(w_{i}), the model employs a convolution kernel F_{k} of dimension k × d for the computation.

c_{i} = f(F_{k} * V(w_{i}) + b)

(2)

The feature activation value computed by a single convolution kernel at a specific position is denoted as c_{i}. After the convolution kernel completes the sliding scan over the entire input sequence, a complete feature mapping vector C is generated. To perform feature dimensionality reduction, the model applies a pooling operation following the convolutional layer. In practical implementation, the original text sequence is first transformed into a word embedding matrix representation through the embedding layer, after which multiple sets of convolution kernels with varying sizes are applied in parallel to perform convolution operations, thereby extracting multi-level local semantic features.
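The convolution and pooling steps described above can be sketched in plain Python. This is an illustration under stated assumptions, not the authors' implementation: the activation f is taken to be ReLU (the paper does not name it), the embedding values and kernel weights are toy numbers, and the function names are hypothetical.

```python
def relu(x):
    """Assumed activation f in Equation (2)."""
    return x if x > 0.0 else 0.0


def conv_feature_map(embeddings, kernel, bias=0.0):
    """Slide a k x d kernel over an n x d embedding matrix and return the feature map C."""
    k = len(kernel)
    n = len(embeddings)
    feature_map = []
    for i in range(n - k + 1):
        s = bias
        for r in range(k):
            for col, weight in enumerate(kernel[r]):
                s += weight * embeddings[i + r][col]
        feature_map.append(relu(s))  # c_i for the window starting at position i
    return feature_map


def textcnn_features(embeddings, kernels):
    """Max-over-time pooling: keep one value per kernel and concatenate the results."""
    return [max(conv_feature_map(embeddings, kern)) for kern in kernels]


# Toy input: 4 tokens with embedding dimension d = 2 (illustrative values only).
V = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
kernels = [
    [[1.0, 1.0], [1.0, 1.0]],              # window size k = 2
    [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]],  # window size k = 3
]
features = textcnn_features(V, kernels)
```

Each kernel contributes a single pooled value, so the length of `features` equals the number of kernels regardless of the input length.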

2.4. Attention Mechanism

In this study, the traditional TextCNN model is enhanced with an attention mechanism. The attention layer assigns different weights to the feature sequences obtained from the fusion of BERT and TextCNN, with the core idea being to “focus the model on more important parts.” For instance, in the sentence “How to control downy mildew during the flowering stage?”, higher weights are assigned to domain-specific key information such as “downy mildew,” “flowering stage,” and “control.” In terms of model architecture, the attention layer takes the multi-level features extracted by the convolutional neural network and generates an attention-weighted vector representation v. The computation formula is as follows:

v = \mathrm{Attention}(Q, K, V)

(3)

Here, Q denotes the query vector, while K and V denote the keys and values of the key-value pairs, respectively. Finally, the representation vector is passed through a Softmax classifier to output the probability distribution over intent categories. The computation is formulated as follows:

\mathrm{Softmax}(x_{i}) = \frac{e^{x_{i}}}{\sum_{j=1}^{n} e^{x_{j}}} \in (0, 1)

(4)
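A minimal sketch of Equations (3) and (4) follows. The paper does not state which scoring function the attention layer uses, so scaled dot-product attention is assumed here; the query, keys, and values are toy vectors, and the function names are hypothetical.

```python
import math


def softmax(xs):
    """Equation (4); the maximum is subtracted first for numerical stability."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(query, keys, values):
    """Scaled dot-product attention for a single query vector (assumed form)."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    # Weighted sum of the value vectors gives the attended representation v.
    context = [sum(w * val[j] for w, val in zip(weights, values)) for j in range(dim)]
    return context, weights


# Toy features: positions 0 and 2 match the query, position 1 does not.
query = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
values = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
context, weights = attention(query, keys, values)
```

Positions whose keys align with the query receive larger weights, which mirrors the idea of emphasizing terms such as "downy mildew" or "flowering stage" in a query.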

3. Data Source and Data Construction

3.1. Data Source

In this study, the corpus primarily originates from the “Encyclopedia Q&A” section of an agricultural knowledge platform. The raw corpus contains both structured and unstructured data, which were initially preprocessed through operations such as denoising and filtering. Additionally, time-series data of Cucumis melo var. inodorus at different growth stages were systematically monitored and recorded, specifically covering its complete growth cycle, detailed growth state manifestations, and closely related dynamic changes in soil temperature and humidity. Through data collection, a total of 8041 high-quality data records were accumulated, comprehensively and deeply reflecting critical information on the growth and development of Cucumis melo var. inodorus. The workflow of the data collection process is illustrated in Figure 4.

During the model training process, we divided the dataset into training, validation, and test sets with proportions of 70%, 15%, and 15%, corresponding to 5629 samples for training, 1206 samples for validation, and 1206 samples for testing. Table 1 provides the detailed sample distribution across the predefined intent categories. These valuable datasets provide a solid empirical foundation and rich feature resources for building an accurate and efficient named entity recognition model for Cucumis melo var. inodorus, thereby laying important data support for optimizing agricultural production management and improving the accuracy of Cucumis melo var. inodorus quality and yield predictions.
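The reported split sizes follow directly from the 70%/15%/15% proportions applied to 8041 records. The sketch below reproduces that arithmetic; the shuffling step and the seed value are assumptions, since the paper does not describe how samples were assigned to each subset.

```python
import random


def split_counts(total, train_frac=0.70, val_frac=0.15):
    """Round the training and validation sizes; the test set takes the remainder."""
    train = round(total * train_frac)
    val = round(total * val_frac)
    test = total - train - val
    return train, val, test


def split_dataset(samples, seed=42):
    """Shuffle a copy of the samples and slice it (seed is an arbitrary assumption)."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_train, n_val, _ = split_counts(len(shuffled))
    return (
        shuffled[:n_train],
        shuffled[n_train:n_train + n_val],
        shuffled[n_train + n_val:],
    )


counts = split_counts(8041)
train_set, val_set, test_set = split_dataset(range(8041))
```

Assigning the remainder to the test set guarantees that the three subsets always sum to the full dataset size.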

3.2. HanLP Slot Filling

This study encompasses two subtasks: intent recognition and slot filling, thus requiring the pre-definition of two types of labels: intents and slots. Based on a systematic analysis of domain-specific corpora and incorporating expertise in Cucumis melo var. inodorus cultivation, a multi-dimensional slot system was constructed. Through a combination of corpus statistics and domain knowledge integration, nine fundamental slots were identified, including “period,” “disease,” and “growth status.” Additionally, considering the critical role of environmental factors in Cucumis melo var. inodorus cultivation, the “soil temperature and humidity” slot was introduced. This slot effectively characterizes the soil environmental features essential for Cucumis melo var. inodorus growth by integrating historical monitoring data. Based on the ten predefined slot categories, the corpus was segmented using HanLP, resulting in a vocabulary of approximately 1800 unique tokens. The complete slot system and its semantic definitions are presented in Table 2.

The dataset was first manually annotated and then imported into the HanLP toolkit. For tokens without manual labels, HanLP automatically assigned part-of-speech tags. When the natural language input by the user is parsed by the HanLP toolkit, the system analyzes and replaces its word components. For example, “fruiting period” is replaced with “nm(period)”, and “root rot” is replaced with “nnt(disease)”. The complete part-of-speech categorization is shown in Table 3. After analysis and replacement, the system applies pre-defined syntactic structures in the program. Only when the meaning of the user’s natural language input is understood by the program can the intelligent Q&A system provide the correct response. With the support of database resources, the broader the range of pre-defined syntactic structures in the program, the more questions the system can answer, and the higher its level of intelligence.
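The replacement step can be sketched with a plain dictionary lookup. This is not HanLP's API: `LEXICON` and `replace_slots` are hypothetical stand-ins for the annotated custom dictionary, and only the two entries quoted in the text are taken from the paper.

```python
# Hypothetical lexicon: surface term -> (POS tag, slot name).
LEXICON = {
    "fruiting period": ("nm", "period"),
    "root rot": ("nnt", "disease"),
}


def replace_slots(query, lexicon=LEXICON):
    """Replace known terms with tag(slot) placeholders, trying longer terms first."""
    slots = {}
    for term in sorted(lexicon, key=len, reverse=True):
        if term in query:
            tag, slot = lexicon[term]
            query = query.replace(term, f"{tag}({slot})")
            slots[slot] = term  # remember the original value of the filled slot
    return query, slots


template, slots = replace_slots("How to treat root rot in the fruiting period?")
```

The resulting template is what would be matched against the pre-defined syntactic structures, while `slots` keeps the concrete values needed to query the database.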

Table 3 presents examples of slot labeling after part-of-speech annotation using the HanLP toolkit. A total of 8041 data entries were processed, resulting in 2356 unique tokens after stopword removal. Each token was mapped to its corresponding slot label to establish a consistent lexical–semantic alignment. Furthermore, all HanLP-generated annotations were manually reviewed to ensure semantic accuracy and labeling consistency.

3.3. Intent Recognition

For intent classification, this study ultimately derived the intent categories shown in Table 2 based on an analysis of the collected corpus. Through systematic analysis and annotation of the raw corpus, combined with the intent categories present in the data, we constructed a multi-dimensional intent labeling system. The specific classification results are presented in Table 4. This classification framework fully considers the intentionality features from speech act theory while incorporating domain adaptation principles, thereby ensuring the comprehensiveness of the intent categories.

4. Experimental Analysis

4.1. Experimental Environment

The specific hardware and software environment used for the experiments is shown in Table 5.

The training employed an H-Bert-TextCNN-AT algorithm, utilizing Bert-base-Chinese as the pre-trained model with 12 network layers and a hidden layer dimension of 768. During model training, the maximum number of epochs was set to 30. An early stopping strategy was applied, whereby training was terminated if the validation F1-score did not improve for six consecutive epochs, and the best weights were restored. For regularization, a dropout rate of 0.1 was introduced in the fully connected and feature fusion layers. Combined with early stopping, this effectively mitigated the risk of overfitting. Other model parameters are specified in Table 6. Under the specified hardware environment and parameter configuration, the complete training time of the model was approximately 4 h, and the average inference latency was about 6–12 ms per sample.
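The early stopping rule described above (at most 30 epochs, patience of six epochs on validation F1) can be sketched as follows. The function takes a precomputed list of per-epoch validation scores in place of a real training loop, and the score values in the example are invented for illustration.

```python
def train_with_early_stopping(val_f1_per_epoch, max_epochs=30, patience=6):
    """Return (best_epoch, best_f1, stopped_epoch) for a sequence of validation F1 scores."""
    best_f1 = float("-inf")
    best_epoch = 0
    epochs_without_improvement = 0
    stopped_epoch = 0
    for epoch, f1 in enumerate(val_f1_per_epoch[:max_epochs], start=1):
        stopped_epoch = epoch
        if f1 > best_f1:
            # A real loop would save the model weights here so they can be restored.
            best_f1, best_epoch = f1, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break
    return best_epoch, best_f1, stopped_epoch


# Illustrative scores: improvement for three epochs, then a plateau.
scores = [0.5, 0.6, 0.7] + [0.65] * 10
result = train_with_early_stopping(scores)
```

With these scores, the best epoch is the third, and training halts once six further epochs pass without improvement.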

The hyperparameters used in this study, including the learning rate, batch size, and dropout ratio (Table 6), were selected through preliminary experiments combined with reference to common settings reported in related literature. Specifically, we conducted a grid search within reasonable ranges (learning rate: [1 × 10⁻³, 1 × 10⁻⁵], batch size: [8, 16, 32], dropout: [0.1, 0.5]) and adopted early stopping based on validation performance to prevent overfitting. The chosen configuration was learning rate = 1 × 10⁻⁵, batch size = 16, and dropout = 0.1. To verify the stability of the training process, the loss convergence curves of the training and validation sets are presented as shown in Figure 5.

4.2. Evaluation Metrics

Evaluation metrics are commonly used parameters in the field of natural language processing, including Precision, Recall, and Macro-F1. Among them, Precision measures the proportion of true positive samples among those predicted as positive by the model, and Recall reflects the model’s ability to identify positive samples, while Macro-F1 comprehensively evaluates the precision and recall of different categories through harmonic mean, avoiding evaluation bias caused by class imbalance.

Precision represents the probability of samples being truly positive among those predicted as positive, as shown in Equation (5).

P = \frac{TP}{TP + FP}

(5)

Recall represents the probability of positive samples being correctly predicted as positive, and its calculation formula is given in Equation (6).

R = \frac{TP}{TP + FN}

(6)

The F1-score is the harmonic mean of precision and recall, and its calculation formula is given in Equation (7).

F1 = \frac{2 \cdot P \cdot R}{P + R}

(7)

TP (True Positive) denotes the number of samples correctly predicted as positive by the model. FP (False Positive) represents the number of samples incorrectly predicted as positive. FN (False Negative) represents the number of positive samples incorrectly predicted as negative.
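Equations (5) to (7) and the macro-averaged F1 can be computed per class as in the sketch below. The label names and predictions are invented for illustration, and returning 0.0 when a denominator is zero is a common convention assumed here rather than something stated in the paper.

```python
def per_class_prf(y_true, y_pred, label):
    """Precision, recall, and F1 for one class, following Equations (5) to (7)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores over the classes present in y_true."""
    labels = sorted(set(y_true))
    return sum(per_class_prf(y_true, y_pred, lab)[2] for lab in labels) / len(labels)


# Illustrative intent labels (hypothetical, not from the dataset).
y_true = ["disease", "disease", "period", "period"]
y_pred = ["disease", "period", "period", "period"]
disease_scores = per_class_prf(y_true, y_pred, "disease")
period_scores = per_class_prf(y_true, y_pred, "period")
overall = macro_f1(y_true, y_pred)
```

Because every class contributes equally to the average, a rare intent category with poor F1 lowers the macro score as much as a frequent one would.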

4.3. Model Comparison

To validate the effectiveness of the proposed model, this study conducts comparative experiments using the following four models: (1) TextRNN, (2) TextRCNN, (3) TextCNN, and (4) Bert-TextCNN. In classification tasks, gradient updates are typically determined by the learning rate and batch size, which govern the model’s weight adjustments as specified in Equation (8).

\omega_{t+1} = \omega_{t} - \eta \frac{1}{n} \sum_{x \in \beta} \nabla l(x, \omega_{t})

(8)

In the equation, n represents the batch size, β denotes the mini-batch of samples, and η denotes the learning rate. As shown in Equation (8), the model’s convergence capability is influenced by both the learning rate and the batch size. The model can be optimized by adjusting these parameters. By varying the learning rate and the batch_size parameter within 30 iterations and observing the changes in classification accuracy and loss values, the optimal model under the best parameters can be determined.
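The update rule in Equation (8) can be traced on a one-dimensional toy problem. The quadratic loss, the batch values, and the learning rate below are illustrative assumptions chosen so that the behavior is easy to verify by hand; they are unrelated to the model's actual loss.

```python
def sgd_step(w, batch, grad_fn, lr):
    """One update of Equation (8): subtract lr times the mean gradient over the batch."""
    n = len(batch)
    mean_grad = sum(grad_fn(x, w) for x in batch) / n
    return w - lr * mean_grad


def grad(x, w):
    """Gradient of the toy loss l(x, w) = 0.5 * (w - x)^2 with respect to w."""
    return w - x


batch = [1.0, 2.0, 3.0]
first = sgd_step(0.0, batch, grad, lr=0.1)  # mean gradient is -2.0 at w = 0

w = 0.0
for _ in range(100):
    w = sgd_step(w, batch, grad, lr=0.1)
```

For this loss the minimizer is the batch mean, so repeated updates move `w` toward 2.0, with each step shrinking the remaining distance by a factor of 1 - η.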

The experimental results of the four classification algorithm models are presented in Table 7, which records the precision, recall, and F1-score achieved by each algorithm in this study.

As shown in Table 7, the model proposed in this study outperforms the other four models in overall performance, achieving an F1-score of 79.8%, which demonstrates the effectiveness of the H-Bert-TextCNN-AT model. However, in practical short-text prediction tasks, models are prone to overfitting, whereas H-Bert-TextCNN-AT exhibits superior generalization capabilities and better data stability. Additionally, H-Bert-TextCNN-AT demonstrates higher efficiency and accuracy in processing short texts, enabling rapid extraction of salient features from the text. All comparative algorithms were executed over five independent runs, and the corresponding confidence intervals were computed for each, thereby providing a more intuitive reflection of the stability and reliability of model performance.

Figure 6a presents the ablation experiment, the results of which demonstrate that the proposed algorithm exhibits significant effectiveness in both intent recognition and slot filling tasks. The observed performance gains validate the rationality of the model architecture design and the necessity of each module; Figure 6b shows the hyperparameter experiment, where the results indicate that by tuning the input sequence length of the BERT model, the proposed H-BERT-TextCNN-AT model achieves optimal performance at a sequence length of 70, with its overall evaluation metrics substantially outperforming other parameter configurations.

5. Discussion

5.1. Performance Analysis and Model Contribution

The proposed H-Bert-TextCNN-AT model achieves an F1-score of 79.8%, outperforming all baseline methods and validating the effectiveness of integrating domain-specific preprocessing with hybrid contextual and local feature extraction. The performance gains can be attributed to three key architectural innovations working synergistically.

First, the integration of HanLP-based word segmentation provides domain-specific preprocessing that effectively addresses agricultural terminology recognition challenges. Unlike general-purpose tokenization methods, HanLP’s domain-aware segmentation enables accurate identification of compound agricultural terms such as “fruiting period” and specialized disease names, which are critical for effective slot filling in agricultural contexts.

Second, the hybrid BERT-TextCNN architecture combines the strengths of contextual understanding and local feature extraction. While BERT provides rich semantic representations that capture long-range dependencies in user queries, TextCNN effectively extracts multi-granularity local features that are particularly important for identifying specific agricultural concepts and entities. This combination addresses the limitation of using either approach independently, as evidenced by the substantial improvement over the individual BERT-TextCNN baseline (79.8% vs. 72.8% F1-score).

Third, the attention mechanism enables adaptive weighting of features, allowing the model to focus on domain-relevant terms such as growth stages, pest names, and control methods. This targeted attention proves particularly valuable in agricultural question-answering scenarios where certain keywords carry disproportionate semantic importance.

5.2. Comparison with Existing Approaches

When compared to recent work in agricultural natural language processing, our approach demonstrates notable advantages. The agricultural intent detection method proposed by Hao et al. (2023) [13] in “Joint agricultural intent detection and slot filling based on enhanced heterogeneous attention mechanism” achieved similar objectives but focused primarily on heterogeneous attention mechanisms without incorporating domain-specific preprocessing. Our integration of HanLP-based knowledge enhancement provides a more systematic approach to handling agricultural terminology, which directly contributes to the observed performance improvements.

Furthermore, compared to general-purpose joint models evaluated on standard datasets, our domain-specific approach highlights the complexity challenges inherent in specialized domains. The performance gap between general benchmark results and our agricultural domain performance (79.8% F1-score) underscores the necessity of domain-specific enhancements rather than direct application of general-purpose models.

The inclusion of soil temperature and humidity slots in our framework represents a novel contribution to agricultural NLP, extending beyond traditional linguistic features to incorporate environmental context. This multimodal consideration aligns with the practical requirements of precision agriculture systems, where environmental factors significantly influence crop management decisions.

5.3. Limitations and Challenges

Despite the encouraging results, several limitations warrant discussion. First, the dataset size of 8041 queries, while substantial for a domain-specific corpus, remains orders of magnitude smaller than standard benchmarks like SNIPS or ATIS. This limitation potentially affects the model’s ability to capture the full diversity of agricultural terminology and user query patterns.

Second, the current approach focuses specifically on melon cultivation, raising questions about generalizability to other crops and agricultural domains. While the framework architecture is designed to be adaptable, the domain-specific components would require reconfiguration for different agricultural contexts.

Third, the performance ceiling at approximately 80% F1-score suggests remaining challenges in semantic understanding. Error analysis reveals that the model struggles with ambiguous queries and novel terminology combinations not well-represented in the training data.

The computational complexity of the hybrid architecture also presents practical deployment considerations, potentially limiting real-time applications in resource-constrained agricultural environments.

5.4. Implications for Agricultural AI Systems

Despite these limitations, the demonstrated improvements provide valuable insights into the practical deployment potential of the model. The results have important implications for developing intelligent agricultural advisory systems. The effectiveness of domain-specific knowledge enhancement suggests that agricultural AI applications benefit significantly from specialized preprocessing and terminology handling, rather than relying solely on general-purpose language models.

The integration of environmental sensor data (soil temperature and humidity) with linguistic processing represents a step toward comprehensive agricultural decision support systems. This approach aligns with the broader trend toward precision agriculture, where multiple data sources inform farming decisions. Such multi-source integration can further enable predictive analytics for crop yield estimation and early warning systems for pests and diseases. Future agricultural AI systems could build upon this foundation to incorporate additional modalities such as weather data, soil composition, and crop imagery.

The performance improvements over baseline methods validate the potential for deploying such systems in practical agricultural advisory contexts, particularly for addressing common farmer queries about cultivation practices, pest management, and growth optimization. However, the current performance levels suggest that such systems would likely function best as decision support tools rather than autonomous advisory systems.

5.5. Future Research Directions

Several promising directions emerge from this work. First, since this experiment focuses on melon cultivation, its applicability to other crops has not yet been empirically validated. In future research, extending this method to multiple crops and agricultural domains should be a key priority, as it would enhance its practical utility and provide new insights into cross-domain transfer learning in agricultural contexts. Second, improving the model’s handling of multi-intent queries represents an important technical challenge, as real-world agricultural queries often contain multiple related questions.

Third, integration with broader agricultural information systems presents opportunities for enhanced functionality through connecting with comprehensive databases, weather services, and precision agriculture platforms. Fourth, addressing computational efficiency through model compression or edge computing deployment would enhance practical applicability in resource-constrained environments.

Finally, longitudinal evaluation in real agricultural settings would provide valuable insights into practical deployment challenges and system effectiveness in supporting actual farming decisions, crucial for transitioning from research prototypes to production-ready agricultural AI systems.

6. Conclusions

This study tackles the critical challenge of natural language understanding in agricultural question-answering systems by developing an agricultural knowledge-enhanced deep learning framework for joint intent detection and slot filling. Experimental validation on a curated dataset of 8041 melon cultivation queries demonstrates substantial performance improvements. The proposed H-Bert-TextCNN-AT model achieved 79.6% accuracy, 80.1% recall, and 79.8% F1-score, representing significant gains of 7–22% over baseline methods including TextRNN, TextRCNN, TextCNN, and BERT-TextCNN models, validating the effectiveness of integrating specialized agricultural knowledge with hybrid neural architectures for practical agricultural applications.

The key contributions of this work advance the field of agricultural artificial intelligence in three fundamental ways. First, we developed a comprehensive framework that systematically integrates agricultural domain knowledge, specialized terminology, and environmental parameters into neural language understanding processes. Second, we designed a novel hybrid architecture that synergistically combines BERT contextual understanding, TextCNN local feature extraction, and attention-based fusion mechanisms. Third, we demonstrated the superior effectiveness of knowledge enhancement over general-purpose approaches in agricultural contexts. These innovations collectively represent a significant step toward bridging the gap between complex agricultural expertise and accessible intelligent advisory systems.

While the current study focuses on melon cultivation with a domain-specific dataset, the demonstrated improvements provide valuable insights into the broader potential of agricultural language understanding systems. The research contributes to the growing field of precision agriculture by providing a practical framework for developing intelligent advisory systems. However, the applicability of the proposed approach to other crops has not yet been empirically validated. Looking ahead, extending this framework to multiple crop domains is both a critical priority for verifying its generalizability and robustness and an essential step toward broader practical deployment of intelligent agricultural question-answering systems. Further steps include integrating additional environmental modalities for context-aware multimodal agricultural NLP and evaluating deployment effectiveness in real-world settings. In addition, it is necessary to address the dynamic evolution of agricultural vocabularies and the potential impact of regional linguistic variations, and to explore corresponding mitigation strategies such as dynamic lexicon expansion or continual learning approaches. As agricultural digitalization continues to advance, domain-specific natural language understanding systems will play an increasingly critical role in supporting global food security and sustainable farming practices. This research underscores the importance of such systems for shaping the next generation of intelligent agricultural advisory systems at the intersection of artificial intelligence and agricultural science.

Author Contributions

Conceptualization, M.L. and S.W.; methodology, W.T., S.W., S.L. and J.M.; validation, S.L., W.T. and J.M.; formal analysis, S.W.; investigation, W.T., S.L. and J.M.; resources, M.L.; data curation, S.W.; writing—original draft preparation, S.W.; writing—review and editing, M.L., S.W., W.T., S.L. and J.M.; visualization, S.W.; supervision, M.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original contributions presented in the study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.


Figure 1. Model structure diagram.

Figure 2. BERT model structure diagram.

Figure 3. Architecture Diagram of the TextCNN Model.

Figure 4. Data Acquisition Flow Diagram.

Figure 5. Loss Convergence Curve. (a) Train-Loss. (b) Valid-Loss.

Figure 6. Experimental comparison. (a) Ablation Experiment. (b) Hyperparameter Experiment.

Table 1. Intent Category Distribution.

Intent Category	Number of Samples	Intent Category	Number of Samples
Period	252	Symptom	646
State	457	Path	528
Condition	962	DPeriod	400
Operate	1914	Method	691
Disease	725	Humid	691
Pest	489	Temperature	286
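
As a quick consistency check on Table 1, the per-intent counts can be summed and converted to shares. The snippet below is a minimal sketch and not part of the original study; the names `INTENT_COUNTS` and `class_shares` are illustrative, and the 400-sample entry is keyed as `DPeriod` on the assumption that it corresponds to category 9 in Table 4.

```python
# Sample counts transcribed from Table 1. The key "DPeriod" follows the
# category naming in Table 4 (an assumption for the 400-sample row).
INTENT_COUNTS = {
    "Period": 252, "State": 457, "Condition": 962, "Operate": 1914,
    "Disease": 725, "Pest": 489, "Symptom": 646, "Path": 528,
    "DPeriod": 400, "Method": 691, "Humid": 691, "Temperature": 286,
}


def class_shares(counts):
    """Return each intent's share of the dataset, in percent."""
    total = sum(counts.values())
    return {name: 100.0 * n / total for name, n in counts.items()}


total = sum(INTENT_COUNTS.values())
shares = class_shares(INTENT_COUNTS)
imbalance = max(INTENT_COUNTS.values()) / min(INTENT_COUNTS.values())

print(f"total samples: {total}")
for name, pct in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{name:<12}{pct:5.1f}%")
print(f"largest/smallest class ratio: {imbalance:.1f}")
```

The counts sum to 8041, the dataset size reported in the abstract. The largest class (Operate) is roughly 7.6 times the size of the smallest (Period), so per-class results may be worth inspecting alongside the aggregate scores.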

Table 2. Slot Information Table for Melon Cultivation.

Number	Category	English Abbreviation
1	Period	nm
2	Growth state	ns
3	Growth conditions	nc
4	Operation	no
5	Disease	nnt
6	Disease symptoms	nns
7	Onset conditions	nt
8	Control methods	nnm
9	Soil moisture	nh
10	Soil temperature	nt

Table 3. Examples of slot labeling after HanLP part-of-speech annotation.

Query	Slot Label
What is the state during the fruiting stage?	What/r is/v the state/ns during/v the fruiting stage/nm
How to treat downy mildew?	How/r to/v treat/v downy mildew/nnt?
Powdery mildew causes the appearance of white, powdery fungal layers.	Powdery mildew/nnt causes/v the appearance of white, powdery fungal layers/nns.
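
The annotated strings in Table 3 use a `text/tag` convention, with the slot tags listed in Table 2. To illustrate how slots could be read back out of such a string, the following is a minimal sketch that assumes the English rendering shown in Table 3. The regular expression, `SLOT_TAGS`, and `extract_slots` are hypothetical names introduced here, and the code does not call the HanLP API. Because Table 2 lists `nt` for two categories, the sketch keeps both labels for that tag.

```python
import re

# Slot tags from Table 2. "nt" appears twice there, so both categories are
# kept for it; disambiguation would need information not shown in the table.
SLOT_TAGS = {
    "nm": "Period",
    "ns": "Growth state",
    "nc": "Growth conditions",
    "no": "Operation",
    "nnt": "Disease",
    "nns": "Disease symptoms",
    "nt": "Onset conditions / Soil temperature",
    "nnm": "Control methods",
    "nh": "Soil moisture",
}

# A span is the shortest run of text followed by "/" and a lowercase tag.
SPAN = re.compile(r"\s*(.+?)/([a-z]+)")


def extract_slots(annotated):
    """Return (text, slot category) pairs for spans carrying a slot tag."""
    slots = []
    for text, tag in SPAN.findall(annotated):
        if tag in SLOT_TAGS:  # skip general POS tags such as r and v
            slots.append((text, SLOT_TAGS[tag]))
    return slots


examples = [
    "What/r is/v the state/ns during/v the fruiting stage/nm",
    "How/r to/v treat/v downy mildew/nnt?",
]
for line in examples:
    print(extract_slots(line))
```

General part-of-speech tags such as `r` and `v` are dropped, so only the spans that fill a slot from Table 2 remain.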

Table 4. Intent Categories.

Number	Category	Number	Category
1	Period	7	Symptom
2	State	8	Path
3	Condition	9	DPeriod
4	Operate	10	Method
5	Disease	11	Humid
6	Pest	12	Temperature

Table 5. Operating Environment.

Experimental Environment	Model and Version
Operating System	Windows 11
Deep Learning Framework	Keras 2.3
Programming Language	Python 3.6
GPU-accelerated library	CUDA 10.1
Central Processing Unit	Intel Core i7-13700H 2.40 GHz
Graphics Processing Unit	NVIDIA GeForce RTX 4060 Ti 24 GB

Table 6. Model Parameters.

Parameter Name	Parameter Value
Batch_size	16
Optimizer	Adam
Learning_rate	0.00001
dropout	0.1

Table 7. Model Performance Comparison.

Algorithm	Accuracy/%	Recall/%	F1 Value/%	F1 Value CI/%
TextRNN	62.4	55.5	58.0	[56.65, 59.35]
TextRCNN	64.7	54.0	57.3	[56.07, 58.53]
TextCNN	62.1	56.3	58.5	[57.35, 59.65]
Bert-TextCNN	72.3	73.4	72.8	[72.5, 73.1]
H-Bert-TextCNN-AT	79.6	80.1	79.8	[79.6, 80.0]
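
The improvements over the baselines can be read directly from Table 7 as differences in F1. The short calculation below is a sketch for illustration; `F1_SCORES` and `f1_gains` are names introduced here, and the gains are computed as absolute percentage-point differences, which is an assumption about how the 7–22% range in the abstract was obtained.

```python
# F1 values (%) transcribed from Table 7.
F1_SCORES = {
    "TextRNN": 58.0,
    "TextRCNN": 57.3,
    "TextCNN": 58.5,
    "Bert-TextCNN": 72.8,
    "H-Bert-TextCNN-AT": 79.8,
}


def f1_gains(scores, proposed="H-Bert-TextCNN-AT"):
    """Percentage-point F1 gain of the proposed model over each baseline."""
    best = scores[proposed]
    return {
        name: round(best - f1, 1)
        for name, f1 in scores.items()
        if name != proposed
    }


gains = f1_gains(F1_SCORES)
for name, gain in gains.items():
    print(f"{name:<14}+{gain:.1f} points")
```

Under this reading, the gain is 7.0 points over Bert-TextCNN and between 21.3 and 22.5 points over the three non-BERT baselines, consistent with the range stated in the abstract.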


© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
