Review

Intelligent Detection and Control of Crop Pests and Diseases: Current Status and Future Prospects

1 State Key Laboratory of Agricultural Equipment Technology, South China Agricultural University, Guangzhou 510642, China
2 College of Electronic Engineering (College of Artificial Intelligence), South China Agricultural University, Guangzhou 510642, China
3 Engineering Research Center for Monitoring Agricultural Information of Guangdong Province, Guangzhou 510642, China
4 Zhujiang College, South China Agricultural University, Guangzhou 510900, China
5 College of Engineering, South China Agricultural University, Guangzhou 510642, China
* Author to whom correspondence should be addressed.
Agronomy 2025, 15(6), 1416; https://doi.org/10.3390/agronomy15061416
Submission received: 29 April 2025 / Revised: 2 June 2025 / Accepted: 6 June 2025 / Published: 9 June 2025
(This article belongs to the Special Issue Smart Pest Control for Building Farm Resilience)

Abstract

Against the backdrop of a growing global population and intensifying climate change, crop pests and diseases have become significant challenges affecting agricultural production and food security. Efficient and precise detection and control of crop pests and diseases are crucial for ensuring yield and quality, reducing agricultural losses, and promoting sustainable agriculture. In recent years, intelligent diagnostic methods based on machine learning and deep learning have advanced rapidly, providing new technological means for the early detection and management of crop pests and diseases. Meanwhile, large language models have demonstrated potential advantages in information integration and knowledge inference, offering prospects for more scientific and efficient decision support in pest and disease control. This paper reviews the research progress in the application of machine learning, deep learning, and large language models in crop pest and disease detection and control, analyzes the challenges in current technological implementations, and explores future development directions.

1. Introduction

Since the mid-20th century, the global population has experienced rapid growth, increasing from approximately 2.5 billion in 1950 to 7.8 billion in 2020. It is projected to reach 8.5 billion by 2030, 9.7 billion by 2050, and 10.9 billion by 2100 [1]. The rapid growth of the global population implies a continuous increase in global food demand over the coming decades, placing greater demands on agricultural production [2]. However, crop pest and disease infestations have emerged as major constraints on improving both yield and quality. These threats not only cause direct reductions in crop productivity and quality but also pose a serious challenge to global food security. A study documented the impact of 137 pathogens and pests on five major staple crops across global hotspots, revealing that regions with rapid population growth and food shortages suffer the most severe agricultural losses [3]. The economic burden of crop pests and diseases is substantial: an estimated 20% to 40% of global crop yields are lost annually to pest infestations, translating to approximately $220 billion in economic losses worldwide [4]. With the intensification of climate change and the expansion of global trade, the geographical spread and frequency of pest and disease outbreaks are increasing, presenting an unprecedented challenge to global food security.
Pests and diseases also pose a serious threat to crop diversity. The whitefly, a globally distributed polyphagous pest, severely reduces agricultural productivity, especially in solanaceous, cucurbitaceous, and leguminous plants [5]. When outbreaks occur, farmers shrink the planting area or abandon susceptible varieties in favor of resistant but less diverse crops, eroding local crop diversity. Among diseases, wheat rust is a major fungal disease that frequently causes yield losses [6]; widely monocropped wheat varieties have been infected and have suffered reduced yields or even total losses, narrowing the diversity of wheat cultivars. Overall, pests and diseases undermine agricultural sustainability, so effective control measures are urgently needed.
Detection is a critical step in pest and disease control [7]. Traditional detection methods primarily rely on manual field inspection, but this approach has significant limitations. On one hand, manual inspection is inefficient and struggles to achieve comprehensive coverage of large-scale farmland. On the other hand, accurate identification of crop pests and diseases requires specialized expertise, yet there is a widespread shortage of agricultural technical personnel at the grassroots level, further constraining the accuracy and timeliness of detection [8]. Chemical control remains one of the most common methods for managing pests and diseases, with farmers often relying on personal experience to determine whether to apply pesticides, as well as the timing and dosage of application. However, excessive reliance on experience can lead to pesticide overuse or misuse, resulting in environmental pollution and contributing to the development of pest resistance, ultimately compromising the effectiveness of pest control measures [9].
To maintain high-quality and efficient agricultural production while rationally utilizing existing resources, automated crop pest and disease detection methods [10] and more precise, scientifically grounded control strategies are essential. Driven by advances in information technology, machine learning, a pivotal branch of artificial intelligence, builds data-trained models for task-specific applications [11]. This capability gives machine learning significant advantages in the agricultural intelligence revolution, and several studies have documented its applications in the agricultural sector [12,13,14,15].
As an advanced branch of machine learning, deep learning overcomes the limitations of traditional machine learning algorithms through architectures such as convolutional neural networks (CNNs) and recurrent neural networks (RNNs). These models enable the processing of more complex tasks and exhibit superior feature learning capabilities [16]. Deep learning has achieved breakthrough applications in complex agricultural scenarios. For instance, to address the challenge of detecting small-scale pests and diseases on litchi leaves, researchers proposed a real-time and precise identification method based on an improved fully convolutional one-stage object detection network (FCOS-FL) [17]. Additionally, for real-time and accurate detection of apple leaf diseases in natural environments, an improved lightweight deep learning model, YOLOX-ASSANano, was successfully developed [18]. Moreover, a pest and disease detection algorithm based on an enhanced Mask R-CNN model has been introduced, significantly improving the identification and counting of insects on yellow sticky traps in the field [19].
Notably, large language models (LLMs) have achieved remarkable progress in natural language processing in recent years [20]. In the domain of pest and disease detection, LLMs also demonstrate significant potential in information processing, data integration, and intelligent decision-making. For example, Zhao et al. [21] integrated LLMs, agricultural knowledge graphs, and graph neural networks to develop a detection system for Elaeagnus angustifolia disease. Similarly, Zhang [22] proposed a Chinese large language model named IPM-AgriGPT, specifically designed for agricultural pest and disease management. By leveraging the G-EA framework and Agricultural Contextual Reasoning Chain-of-Thought Distillation (ACR-CoTD), the model achieved outstanding performance in terms of specialization, safety, and effectiveness.
This paper comprehensively reviews three key technologies (machine learning, deep learning, and large language models), analyzing their applications in crop pest and disease detection and control. Furthermore, it delves into the major technical challenges currently faced in this field and explores potential future research directions.

2. Classic Machine Learning

Machine learning is a data-driven learning approach that extracts underlying patterns from existing data to make predictions or decisions [23]. Based on task types, machine learning is primarily categorized into two main types: supervised learning and unsupervised learning [24].
Supervised learning fundamentally aims to establish a mapping relationship between input features and target variables using labeled training datasets. Specifically, it requires training data to contain pairs of input samples and corresponding labels. The learning process optimizes model parameters by minimizing a loss function, thereby constructing a mapping function that accurately represents the relationship between inputs and outputs [25]. This function must possess generalization capability, ensuring reliable performance on unseen data. Common supervised learning algorithms include decision trees, support vector machines, and random forests [26].
Unsupervised learning is another major category of machine learning. Unlike supervised learning, its objective is to uncover underlying structures or patterns from unlabeled data without relying on predefined labels [27]. Although the application of unsupervised learning in crop pest and disease detection is relatively limited, it still holds significant value in specific scenarios. Classic unsupervised learning algorithms include K-means clustering, among others. A comparative analysis of supervised and unsupervised learning characteristics is summarized in Table 1.
In addition to traditional supervised and unsupervised learning, reinforcement learning (RL) stands as another crucial paradigm in machine learning. In the subsequent sections, we examine the principles of four representative algorithms from supervised and unsupervised learning (decision trees, support vector machines, random forests, and K-means clustering), together with reinforcement learning, and explore their practical applications in pest and disease detection and control. These discussions will enhance our understanding of how each algorithm functions in real-world scenarios.

2.1. Decision Tree

Decision tree is a widely used supervised learning algorithm for classification and regression tasks. Its fundamental idea is to recursively partition the dataset using a tree-like structure to construct the model. In a decision tree, internal nodes represent conditions for data splitting, branches indicate test outcomes, and leaf nodes correspond to the final decision results [28]. The core of the algorithm lies in selecting the optimal feature for data partitioning to maximize the purity of the subsets after each split, thereby improving the model’s predictive accuracy. Through this approach, decision trees can effectively capture patterns and relationships within the data, enabling reasonable predictions or classifications.
Quinlan proposed the ID3 algorithm, one of the earliest decision tree algorithms, which selects features based on the information gain criterion. Building on ID3, the C4.5 algorithm introduced the concept of gain ratio, addressing biases in feature selection and extending the algorithm to handle continuous features and missing values. The Classification and Regression Tree (CART) algorithm selects features using the Gini index and is primarily used to construct binary trees. The Chi-square Automatic Interaction Detection (CHAID) algorithm applies the chi-square test as a feature selection criterion, making it suitable for handling categorical variables and automatically detecting interactions between variables. These four algorithms have distinct characteristics in tree construction and pruning strategies. A detailed comparison is provided in Table 2.
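As an illustration of these splitting criteria, the following minimal scikit-learn sketch contrasts entropy-based (ID3/C4.5-style) and Gini-based (CART-style) splits; the symptom features, thresholds, and labels are hypothetical and serve only to show the mechanics, not to reproduce any study cited here.

```python
# Minimal sketch: entropy vs. Gini splitting on hypothetical symptom data.
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical observations: [leaf_spot_count, wilting_severity]
X = [[12, 0.8], [3, 0.1], [15, 0.9], [1, 0.0], [9, 0.6], [2, 0.2]]
y = ["blight", "healthy", "blight", "healthy", "blight", "healthy"]

# criterion="entropy" follows the information-gain family (ID3/C4.5);
# criterion="gini" follows CART's Gini index.
for criterion in ("entropy", "gini"):
    tree = DecisionTreeClassifier(criterion=criterion, max_depth=2).fit(X, y)
    print(criterion)
    print(export_text(tree, feature_names=["leaf_spots", "wilting"]))
```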
Dewanto and Lukas [33] developed an expert system based on a decision tree for diagnosing pest and disease problems in fruit plants. The system stores expert knowledge in a rule base and uses backward chaining to derive conclusions from observed symptoms. It effectively assists users, particularly those without specialized knowledge, in identifying pest and disease types. Similar studies [34,35] have demonstrated the strong potential of decision tree algorithms in agricultural pest and disease identification. By combining expert knowledge with data-driven approaches, decision trees can provide scientific support for agricultural pest and disease management.

2.2. Support Vector Machine

Support Vector Machine (SVM) is rooted in the statistical learning theory developed by Vapnik and Chervonenkis [36], which established a solid theoretical foundation for the method. Vapnik and colleagues later extended the approach with the kernel trick, enabling SVM to handle nonlinear classification problems effectively. In classification tasks, support vectors are the data points located near the decision boundary, and they play a decisive role in determining the hyperplane. The goal of SVM is to select the hyperplane that maximizes the margin between the two classes, which enhances classification accuracy and generalization ability [37]. For linearly inseparable data, SVM uses kernel functions to map the data into a higher-dimensional space where a separating hyperplane can be found, thus achieving nonlinear classification.
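The kernel trick can be made concrete with a small sketch: the concentric rings below are linearly inseparable in two dimensions, yet an RBF-kernel SVM separates them by implicitly mapping them into a higher-dimensional space. The data are synthetic and are not drawn from the studies cited in this section.

```python
# Minimal sketch: RBF-kernel SVM on synthetic, linearly inseparable rings.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
angles = rng.uniform(0.0, 2.0 * np.pi, 200)
radii = np.where(np.arange(200) < 100, 1.0, 3.0)   # inner vs. outer ring
X = np.c_[radii * np.cos(angles), radii * np.sin(angles)]
X += rng.normal(0.0, 0.1, X.shape)
y = (radii > 2.0).astype(int)

# The RBF kernel implicitly maps points into a higher-dimensional space
# where a separating hyperplane exists.
clf = SVC(kernel="rbf", C=1.0).fit(X, y)
print("support vectors:", len(clf.support_vectors_))
print("training accuracy:", clf.score(X, y))
```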
Mokhtar et al. [38] proposed an SVM-based method for detecting tomato leaf diseases. The study extracted texture and color features from leaf images and used SVM to classify the health status and disease type of tomato leaves. The experimental results showed that SVM performed excellently in detecting tomato leaf diseases, accurately distinguishing healthy leaves from diseased ones, and achieving high classification accuracy for various types of diseases. Additionally, Ebrahimi et al. [39] also used SVM for vision-based strawberry pest detection. They captured pest images and extracted their morphological features, applying SVM for classification and recognition. The study demonstrated that the system could detect target pests with less than 2.5% error, highlighting SVM’s ability to achieve high-precision classification and its robustness and generalization ability in pest detection.

2.3. Random Forest

Random forest is a machine learning algorithm based on ensemble learning, proposed by Breiman in 2001 [40]. The core mechanism of this algorithm involves constructing multiple decision trees and aggregating their prediction results to improve the model’s generalization ability and accuracy. In the specific implementation, random subsets of samples are drawn from the original dataset with replacement to provide different training data for each decision tree. When constructing each tree, only a random subset of features is considered for node splitting, reducing feature correlation and enhancing model robustness. Finally, the predictions of all decision trees are combined through majority voting or averaging to form a comprehensive conclusion. This approach effectively reduces the risk of overfitting and improves the model’s performance on unseen data.
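The two sources of randomness described above, bootstrap resampling and per-split feature subsampling, map directly onto standard library parameters, as the following minimal sketch with synthetic data illustrates; the dataset is a stand-in, not one from the cited studies.

```python
# Minimal sketch: the ensemble's two randomness sources as library parameters.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=300, n_features=20, random_state=0)
forest = RandomForestClassifier(
    n_estimators=100,     # number of decision trees to aggregate
    bootstrap=True,       # draw each tree's training set with replacement
    max_features="sqrt",  # random feature subset considered at each split
    oob_score=True,       # evaluate on the samples each tree never saw
    random_state=0,
).fit(X, y)
print("out-of-bag accuracy:", forest.oob_score_)  # majority vote across trees
```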
In the study of rapeseed pest image recognition, Zhu et al. [41] used random forests to classify the morphological features of pests after preprocessing, achieving an accuracy of 89.6%. This demonstrates the adaptability of random forests to unstructured image data, especially in field scenarios with uneven lighting or complex backgrounds, where it maintains high robustness. Sangeetha et al. [42] combined convolutional neural networks with random forests for banana leaf pest and disease detection. This method automatically learns the spatial distribution features of lesions through convolutional layers and then inputs the results into random forests for classification, significantly improving fine-grained recognition capability. Resti et al. [43] proposed a Bootstrap-Aggregating-based random forest model for classifying corn pests and diseases. This study improved the sampling method of random forests, enhancing the model’s generalization ability and stability.

2.4. K-Means Clustering

K-means is a classic unsupervised learning clustering method. Its basic principle is to iteratively optimize the process of dividing a dataset into K clusters, ensuring that the data points within each cluster are as similar as possible, while the similarity between data points from different clusters is minimized [44]. Specifically, the basic process of K-means includes the following key steps: First, K data points are randomly selected from the dataset as initial centroids, which will serve as the initial centers of the clusters. Then, the distance between each data point and the K centroids is calculated, and each data point is assigned to the cluster whose centroid is the closest. After all data points are assigned, the centroids of each cluster are updated by calculating the mean of all the data points within the cluster and using this mean as the new cluster center. Next, the algorithm checks whether the centroids have changed. If the changes in all centroids are smaller than a set threshold or the maximum number of iterations is reached, the algorithm terminates. Otherwise, the process is repeated until convergence. Finally, the algorithm outputs the K clusters and their corresponding centroid coordinates.
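The iterative procedure above can be written compactly in NumPy. The following sketch mirrors the four steps (initialization, assignment, centroid update, and convergence check) on synthetic two-dimensional points; it is illustrative only and not taken from the systems cited below.

```python
# Minimal NumPy sketch of the K-means loop on synthetic 2-D points.
import numpy as np

def kmeans(X, k, tol=1e-4, max_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]  # step 1: init
    for _ in range(max_iter):
        # Step 2: assign every point to its nearest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 3: recompute each centroid as the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # Step 4: stop once no centroid moves more than the threshold.
        if np.linalg.norm(new_centroids - centroids, axis=1).max() < tol:
            break
        centroids = new_centroids
    return labels, centroids

X = np.random.default_rng(1).normal(size=(150, 2))
labels, centers = kmeans(X, k=3)
```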
K-means has been widely used in agricultural pest and disease detection and classification. Faithpraise et al. [45] proposed an automatic plant pest detection and identification method based on the K-means clustering algorithm and corresponding filters. By extracting features from plant leaf images and applying K-means clustering, this method achieves efficient and accurate pest identification, reducing the time and errors associated with manual detection and improving detection efficiency. Additionally, K-means has also been applied to analyze the spectral data of crop pests and diseases. Ji et al. [46] studied a clustering method for crop pest and disease spectra, using K-means to classify the spectra of different types of pests and diseases. Experimental results indicate that the K-means algorithm effectively identifies different types of pests and diseases, providing technical support for precision agriculture.

2.5. Reinforcement Learning

RL is a learning paradigm in which an agent learns optimal behavior strategies through interaction with its environment [47]. Unlike supervised and unsupervised learning, reinforcement learning does not rely on pre-labeled data; instead, the agent receives rewards or penalties for its actions in the environment and continuously adjusts its strategy based on this feedback to maximize cumulative reward.
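The reward-driven update at the heart of this paradigm can be illustrated with a minimal tabular Q-learning sketch on a toy chain environment; the states, actions, and reward scheme are hypothetical and much simpler than the agricultural settings discussed below.

```python
# Minimal tabular Q-learning sketch on a toy 5-state chain environment.
import numpy as np

n_states, n_actions = 5, 2            # actions: 0 = move left, 1 = move right
Q = np.zeros((n_states, n_actions))   # action-value table
alpha, gamma, eps = 0.1, 0.9, 0.1     # learning rate, discount, exploration
rng = np.random.default_rng(0)

for episode in range(500):
    s = 0
    while s != n_states - 1:          # rightmost state is the rewarded goal
        # Epsilon-greedy: mostly exploit the best known action, sometimes explore.
        a = rng.integers(n_actions) if rng.random() < eps else int(Q[s].argmax())
        s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
        r = 1.0 if s_next == n_states - 1 else 0.0
        # Move Q(s, a) toward the reward plus the discounted best future value.
        Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
        s = s_next

print(Q.argmax(axis=1))               # learned policy favors moving right toward the goal
```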
In agricultural pest detection and management, reinforcement learning offers unique advantages. Lu et al. [48] proposed a dual-mode gray wolf optimization algorithm based on reinforcement learning to tune the hyperparameters of convolutional neural networks and improve pest identification accuracy. Experimental results show that the proposed algorithm significantly improves the recognition accuracy of the original CNN model, achieving maximum accuracies of 95.83% on a pest dataset and 96.51% on a corn disease dataset. Fu et al. [49] proposed an agricultural drone path planning method based on improved deep reinforcement learning. This method integrates a bidirectional Long Short-Term Memory structure with the deep Q-network (DQN) algorithm to form a new BL-DQN algorithm and designs a complete path planning framework for drone-based pest control. In simulation experiments, BL-DQN improved coverage performance by 41.68% compared with the traditional DQN algorithm, with a repeated coverage rate of 5.56%, lower than the 9.78% of DQN and the 31.29% of the depth-first search (DFS) algorithm.

3. Deep Learning

Deep learning is a significant branch of machine learning that focuses on using multi-layer neural networks to solve complex problems [16]. The core idea of deep learning is to simulate the structure and function of the human brain’s neural networks, enabling computers to automatically extract features from massive datasets and learn from them. This allows deep learning to effectively handle complex tasks that traditional machine learning methods struggle with [50]. CNNs have achieved remarkable success in image processing. By leveraging convolution and pooling operations, CNNs effectively extract spatial features from images, making them widely used as foundational models. Meanwhile, RNNs are particularly well-suited for sequential data processing due to their unique network architecture.
In the following sections, we will analyze the fundamental principles of CNNs and RNNs, systematically exploring classic pest and disease detection algorithms and their applications in crop pest and disease identification. For CNNs, we will categorize tasks into three main types: image classification, object detection, and image segmentation. We will introduce two representative algorithms for each category to demonstrate different applications of CNNs in pest and disease identification. For RNNs, we will focus on Long Short-Term Memory (LSTM) models, analyzing their advantages in sequential data modeling and their practical applications in pest and disease detection.

3.1. CNN

CNNs are one of the most fundamental deep learning architectures, primarily designed for processing data with a grid-like topology, such as images. The core components of CNNs include convolutional layers, pooling layers, and fully connected layers. The convolutional layer serves as the backbone of CNNs, performing convolution operations on input data. This operation involves a set of filters or convolutional kernels that slide over the input, computing weighted sums over local regions to generate feature maps. The pooling layer, typically placed after convolutional layers, is used to reduce the spatial dimensions of feature maps while preserving critical information. Pooling operations, such as max pooling or average pooling, downsample feature maps by selecting the maximum or average value from local regions. This process reduces computational complexity and prevents overfitting. The fully connected layer connects each neuron to all neurons in the previous layer, integrating extracted features and transforming them into final outputs through linear operations [51,52]. The overall CNN processing is illustrated in Figure 1.
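As a minimal illustration of this convolution, pooling, and fully connected pipeline, the following PyTorch sketch classifies hypothetical 3-channel 64 × 64 leaf images into an assumed four classes; the architecture and class count are illustrative, not a model from the literature reviewed here.

```python
# Minimal PyTorch sketch: convolution -> pooling -> fully connected classifier.
import torch
import torch.nn as nn

class TinyCNN(nn.Module):
    def __init__(self, n_classes=4):                     # assumed class count
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),  # kernels slide over input
            nn.ReLU(),
            nn.MaxPool2d(2),                             # halve spatial size
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
        )
        # Fully connected layer integrates the extracted feature maps.
        self.classifier = nn.Linear(32 * 16 * 16, n_classes)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

logits = TinyCNN()(torch.randn(1, 3, 64, 64))            # -> shape (1, 4)
```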
Building upon the foundational CNN architecture, various advanced network structures have been developed to address the increasing complexity of tasks in image classification, object detection, and segmentation. These enhancements often involve integrating novel techniques or architectural designs, enabling models to better capture intricate patterns and contextual information inherent in complex datasets.

3.1.1. Image Classification Algorithms

AlexNet
AlexNet is a classic deep convolutional neural network architecture proposed by Krizhevsky et al. in 2012 [53]. It consists of an eight-layer structure, including five convolutional layers followed by three fully connected layers. In this architecture, the convolutional layers primarily perform feature extraction, capturing local patterns such as edges and textures while providing a certain degree of spatial invariance, ensuring robustness to geometric transformations like translation and scaling. The fully connected layers, located at the end of the network, map the feature vectors extracted by the convolutional layers to the output space for final classification decisions. A key innovation of AlexNet lies in its use of the ReLU activation function and dropout technique, which accelerate training and mitigate overfitting. Additionally, AlexNet leverages GPU acceleration to significantly enhance training efficiency and employs data augmentation techniques to improve model generalization.
Morankar et al. [54] applied AlexNet for the identification of crop pests and diseases. By extracting texture, color, and shape features from images, they successfully achieved high-precision detection of various plant pests and diseases. In their study, AlexNet attained an accuracy of 92.5% on the test set, significantly outperforming traditional machine learning methods. This result demonstrates that AlexNet can effectively handle the complex backgrounds and diversity present in agricultural images, providing strong support for early disease diagnosis. Furthermore, Qiu et al. [55] proposed an improved version of AlexNet for tomato leaf disease identification. They optimized the network architecture by adjusting the number of filters in the convolutional layers and the number of neurons in the fully connected layers, making the model more suitable for processing tomato leaf disease images. The enhanced AlexNet model achieved an accuracy of 95.3% in tomato leaf disease classification and exhibited greater robustness in distinguishing different disease types.
Visual Geometry Group (VGG)
VGG was proposed by the Visual Geometry Group at the University of Oxford [56]. Compared to earlier architectures, VGG replaces large convolutional kernels with multiple small 3 × 3 kernels, reducing the number of parameters while enhancing the model’s representational capacity and classification performance. For instance, a combination of three 3 × 3 convolutional kernels achieves an equivalent receptive field to a 7 × 7 kernel but with fewer parameters and lower computational cost. Additionally, VGG networks typically consist of multiple convolutional layers, pooling layers, and fully connected layers, where max pooling is employed for downsampling between layers to progressively extract high-level image features. This deep architecture enables VGG to achieve outstanding performance in image classification tasks.
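The parameter saving can be verified with simple arithmetic: for C input and output channels, three stacked 3 × 3 convolutions use 27C² weights versus 49C² for a single 7 × 7 convolution (biases ignored), as the short sketch below shows.

```python
# Weight-count check: three stacked 3x3 convolutions vs. one 7x7 convolution,
# both covering a 7x7 receptive field (C channels in and out, biases ignored).
C = 64
stacked_3x3 = 3 * (3 * 3 * C * C)   # 27 * C**2 = 110,592 weights for C = 64
single_7x7 = 7 * 7 * C * C          # 49 * C**2 = 200,704 weights
print(stacked_3x3, single_7x7)
```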
Paymode and Malode [57] utilized the VGG16 model for crop leaf disease classification, applying transfer learning to adapt the pre-trained VGG16 model for grape and tomato leaf disease detection, achieving high accuracies of 98.40% and 95.71%, respectively. Another study proposed an automatic classification method for tobacco leaf pests and diseases based on VGG16 with transfer learning [58]. By optimizing trainable parameters and selecting appropriate optimizers, the model achieved high accuracy in disease detection. Furthermore, Ye et al. [59] enhanced the VGG architecture by optimizing the number of fully connected layers and replacing the original SoftMax classifier, improving the recognition accuracy of vegetable pest images. Experimental results demonstrated that the improved VGG16 and VGG19 models achieved test set accuracies of 99.90% and 99.99%, respectively.

3.1.2. Object Detection Algorithms

R-CNN Series
The Region-based Convolutional Neural Network (R-CNN) series represents a classic two-stage object detection framework. These algorithms first extract a set of candidate regions from the input image using a specific region proposal strategy. Then, based on the extracted features, a classifier is applied to each candidate region to determine whether it contains an object and to classify it accordingly. As the pioneering model in this series, R-CNN generates candidate regions using selective search and independently performs convolutional feature extraction and classification for each region [60]. However, its training process is computationally expensive and inefficient. Fast R-CNN improves upon this by introducing a Region of Interest (RoI) pooling layer, enabling the entire image’s feature map to be shared across all candidate regions [61], significantly enhancing computational efficiency. Faster R-CNN further advances the framework by incorporating a Region Proposal Network (RPN), which integrates region proposal generation and object detection into an end-to-end trainable model [62]. This innovation achieves a breakthrough in both speed and accuracy. A detailed comparison of the three algorithms is presented in Table 3.
The original R-CNN algorithm, due to its low computational efficiency, has fewer applications in crop pest and disease detection, with research leaning more towards its improved versions. For instance, Patel and Bhatt [63] optimized the Faster R-CNN model using data augmentation strategies, achieving an average precision of 89.7% in pest detection tasks, demonstrating its adaptability to complex field environments. In tomato crops, Faster R-CNN was employed to detect and classify insects on sticky traps, including the tomato whitefly and its natural predator, the predatory stink bug [64]. In this study, the model was trained using images captured with a camera under controlled lighting conditions and tested with images captured in uncontrolled environments using a smartphone camera. The results showed that Faster R-CNN achieved a classification accuracy of 87.4% on the validation set. This indicates that Faster R-CNN can effectively handle complex background images and accurately identify and count insects, providing an efficient and accurate tool for pest and disease monitoring.
You Only Look Once (YOLO) Series
The YOLO series, a benchmark algorithm for single-stage object detection, discards the region proposal mechanism used in traditional two-stage detectors. Instead, it directly predicts the bounding box coordinates and class probabilities through a unified network, significantly reducing computational redundancy. Since its introduction in 2015 [65], the YOLO algorithm has seen continuous iterations that have greatly enhanced its detection performance. With the release of YOLOv4, the series established a three-stage architecture paradigm, consisting of the Backbone, Neck, and Head components in a complete detection framework [66].
The Backbone is typically based on convolutional neural networks and is responsible for extracting hierarchical features at multiple scales from the input image. Shallow layers primarily capture low-level features such as edges, textures, and simple patterns, whereas deeper layers progressively extract higher-level features, including object parts, shapes, and semantic representations. The Neck, as an intermediate component, aggregates and optimizes the features extracted by the Backbone using mechanisms such as Feature Pyramid Networks (FPN) or Path Aggregation Networks (PAN), enhancing spatial and semantic information at different scales. The Head structure receives and processes the features provided by the Neck to generate predictions for each object candidate region. These predictions include the object’s position, class, and other attributes. After generating the predictions, post-processing steps, such as Non-Maximum Suppression (NMS), are typically applied to filter out overlapping predictions and retain the most confident detection results [67]. The YOLO object detection network architecture is shown in Figure 2.
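The NMS post-processing step can be made concrete with a short NumPy sketch, assuming boxes in [x1, y1, x2, y2] format with associated confidence scores: the highest-confidence box is kept and any remaining box overlapping it beyond an IoU threshold is discarded.

```python
# Minimal NumPy sketch of Non-Maximum Suppression (boxes as [x1, y1, x2, y2]).
import numpy as np

def nms(boxes, scores, iou_thresh=0.5):
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    order = scores.argsort()[::-1]               # highest confidence first
    keep = []
    while order.size > 0:
        i, rest = order[0], order[1:]
        keep.append(int(i))
        # Intersection of the kept box with every remaining box.
        x1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        y1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        x2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        y2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        iou = inter / (areas[i] + areas[rest] - inter)
        order = rest[iou < iou_thresh]           # drop heavily overlapping boxes
    return keep

boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30]], float)
scores = np.array([0.9, 0.8, 0.7])
print(nms(boxes, scores))                        # -> [0, 2]
```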
Liu and Wang [68] proposed an improved YOLOv3 model for detecting tomato diseases and pests. By incorporating multi-scale feature detection based on an image pyramid and optimizing anchor box sizes using the K-means clustering algorithm, the model achieved a detection accuracy of 92.39% with an inference time of only 20.39 milliseconds, significantly outperforming traditional methods. Sun et al. [69] further optimized the YOLOv8 model by introducing a progressive feature pyramid network and the SimAM attention mechanism, which substantially enhanced the model’s detection performance in complex environments. When applied to tobacco pest detection, the improved YOLOv8 model reduced the number of parameters by 52.66% while increasing detection accuracy (mAP@0.5) by 1%.
Detection Transformer (DETR)
DETR is an end-to-end object detection model based on the Transformer architecture. DETR leverages the self-attention mechanism to capture global features and contextual information within an image, allowing for precise object localization and classification. Specifically, DETR consists of a CNN backbone, an encoder, a decoder, and a feedforward network layer [70]. The CNN backbone is responsible for extracting low-resolution feature maps from the input image; the encoder processes the global information of the feature map using a multi-head self-attention mechanism and incorporates positional encoding to enhance spatial awareness; the decoder decodes object information through self-attention and encoder–decoder attention mechanisms to generate object representations; finally, the prediction layer, via a multi-layer perceptron, predicts the class and bounding box of each object, and target matching is performed using Hungarian loss, achieving end-to-end object detection. This Transformer-based architecture not only handles complex image features but also directly outputs detection results without the need for anchor boxes and non-maximum suppression, significantly improving detection efficiency and accuracy.
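The set-based matching that lets DETR dispense with NMS can be sketched with SciPy’s Hungarian solver. The cost matrix below is a toy stand-in; in the actual model, each entry combines classification probability and box-regression terms.

```python
# Toy sketch of DETR-style bipartite matching between predictions and targets.
import numpy as np
from scipy.optimize import linear_sum_assignment

# Hypothetical cost matrix: rows = 4 predicted queries, cols = 3 ground truths.
cost = np.array([[0.9, 0.2, 0.7],
                 [0.1, 0.8, 0.6],
                 [0.5, 0.4, 0.1],
                 [0.7, 0.9, 0.8]])
pred_idx, gt_idx = linear_sum_assignment(cost)   # Hungarian algorithm
print(list(zip(pred_idx, gt_idx)))               # one-to-one, minimum total cost
# Unmatched queries are trained to predict the "no object" class.
```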
The DS-DETR model proposed by Wu et al. [71] significantly enhances tomato leaf disease segmentation performance by introducing unsupervised pre-training, spatial modulation collaborative attention, and improved relative positional encoding. This model not only outperforms existing state-of-the-art methods in segmentation accuracy but also enables precise assessment of disease severity by calculating the ratio of lesion area to leaf area, providing strong technical support for early disease diagnosis and control. Similarly, the Skip-DETR model developed by Liu et al. [72] improves DETR’s performance in small object detection by incorporating skip connections and spatial pyramid pooling layers, significantly enhancing detection accuracy on forestry pest datasets and offering a new solution for early monitoring and precision management of agricultural pests and diseases. The successful application of these improved models demonstrates that the DETR architecture is highly adaptable and scalable for agricultural pest and disease detection. It effectively addresses small object detection and disease segmentation in complex backgrounds, providing new ideas and methods for intelligent monitoring and control of agricultural pests and diseases.

3.1.3. Image Segmentation Algorithms

Fully Convolutional Networks (FCNs)
FCNs are deep learning models used for image semantic segmentation. The core idea is to replace the fully connected layers in traditional convolutional neural networks with convolutional layers, enabling pixel-wise classification of the input image [73]. This model can handle input images of any size and uses transposed convolutional layers to upsample the feature maps generated by the last convolutional layer, restoring them to the size of the input image. During this process, the FCN generates prediction results for each pixel while preserving the spatial information of the input image. The network structure of an FCN consists mainly of convolutional layers and pooling layers in the first half, responsible for extracting multi-level features of the image and gradually reducing the spatial resolution. The second half performs upsampling and feature fusion through transposed convolutional layers and skip connections, gradually restoring the spatial resolution of the feature maps and combining high-level semantic information from low-resolution layers with low-level details from high-resolution layers. The final output is a segmentation map of the same size as the input image. The architecture of an FCN is shown in Figure 3.
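A minimal PyTorch sketch of this encode-then-upsample idea follows, with assumed shapes and channel counts: pooling shrinks the feature map, a 1 × 1 convolution replaces the fully connected layers, and a transposed convolution restores the input resolution so that every pixel receives a class score.

```python
# Minimal PyTorch sketch: encode, classify per location, upsample back.
import torch
import torch.nn as nn

n_classes = 3                                            # assumed class count
encoder = nn.Sequential(                                 # downsampling path
    nn.Conv2d(3, 16, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(4),                                     # 64x64 -> 16x16
)
head = nn.Conv2d(16, n_classes, kernel_size=1)           # 1x1 conv, no FC layers
upsample = nn.ConvTranspose2d(n_classes, n_classes,      # transposed convolution
                              kernel_size=4, stride=4)   # 16x16 -> 64x64

x = torch.randn(1, 3, 64, 64)
out = upsample(head(encoder(x)))
print(out.shape)                                         # (1, 3, 64, 64) scores
```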
The application of FCNs in agricultural pest and disease detection primarily focuses on the segmentation of diseased crop leaf images and pest detection. FCNs enable precise segmentation of diseased regions, laying the foundation for subsequent disease classification and severity assessment. For example, Gong et al. [74] proposed a rice pest detection method based on the FCN and DenseNet framework. By incorporating an encoder–decoder structure and a Conditional Random Field (CRF) module, their approach significantly improved both segmentation accuracy and classification performance of pest images. Experimental results demonstrated an identification accuracy of 98.28%, outperforming traditional models. Similarly, Wang et al. [75] introduced an improved Fully Convolutional Network model for the segmentation of crop disease leaf images. This model consists of an encoder network and a decoder network, where the encoder network is an optimized version of the conventional VGG-16 architecture. The decoder network mirrors the encoder and primarily performs deconvolution operations on the pooling layers of the encoder to restore the output features, thereby enhancing segmentation performance.
Mask R-CNN
Mask R-CNN is an instance segmentation algorithm based on deep learning that extends Faster R-CNN by adding a segmentation branch. It uses a Fully Convolutional Network to perform pixel-level segmentation for each candidate region, generating high-precision segmentation masks [76]. This improvement allows Mask R-CNN to not only detect objects but also accurately extract their contours, thereby enhancing the performance of instance segmentation.
Lin et al. [77] used Mask R-CNN to identify pests and diseases on sweet peppers. The results showed that the model performed excellently in both pest and disease detection and segmentation, effectively identifying the type and location of the pest and disease. Kasinathan et al. [78] applied Mask R-CNN to detect fall armyworm. Through extensive training data and an optimized network structure, the Mask R-CNN model accurately identified the position and quantity of fall armyworm in complex agricultural environments. The results showed that the model achieved a detection accuracy of 94.21% in complex backgrounds, significantly improving pest and disease detection efficiency.

3.2. RNN

RNNs use a cyclical connection mechanism, where the output at the current time step not only depends on the current input data but also on the previous state, enabling the network to capture the dynamic changes in time series. This feature makes RNNs widely used in tasks such as speech recognition, language modeling, and machine translation. However, traditional RNNs face the problem of vanishing or exploding gradients when dealing with long sequence data, which makes it difficult to effectively capture long-term dependencies during training. To overcome this limitation, LSTM networks were introduced to enhance the modeling ability of long-term dependencies through gating mechanisms.

LSTM

LSTM networks address the long-term dependency problem by introducing “cell states” and three gates to control what information can be remembered and what can be forgotten. The input gate controls whether new information can enter the memory unit, the forget gate determines whether the information in the memory unit should be forgotten, and the output gate determines how much information from the memory unit should be passed to the next time step of the network [79]. Through these gating mechanisms, LSTM is able to retain important historical information while ignoring irrelevant or noisy data during the processing of time series. As a result, LSTM networks have shown high performance and stability in tasks such as natural language processing, speech recognition, and time series prediction. The structure of an LSTM unit is shown in Figure 4.
Bidirectional Long Short-Term Memory (BiLSTM) networks further expand the capabilities of LSTM. The network structure is shown in Figure 5. A BiLSTM consists of two LSTM layers: one processes the forward time sequence, and the other processes the reverse time sequence. Specifically, the input sequence is first processed sequentially by the forward LSTM layer, which adjusts the cell state and hidden state dynamically through the forget gate, input gate, and output gate to retain key information and suppress irrelevant content. Meanwhile, the same input sequence is fed into the reverse LSTM layer in reverse order, where the cell states and hidden states are computed in the same manner but in reverse. Then, at each time step, the hidden states from both the forward and reverse LSTM layers are concatenated or fused in some other way to form the final hidden representation. This bidirectional structure enables the model to simultaneously utilize information from both the past and the future, allowing for a more comprehensive understanding of the contextual relationships in the sequence [80].
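A minimal PyTorch sketch of such a BiLSTM follows, with hypothetical dimensions (30 daily time steps of 8 weather variables); the forward and backward hidden states are concatenated before a prediction head, in the spirit of the forecasting studies discussed below.

```python
# Minimal PyTorch sketch: BiLSTM over a hypothetical weather sequence.
import torch
import torch.nn as nn

seq_len, n_features, hidden = 30, 8, 32    # 30 days, 8 weather variables
bilstm = nn.LSTM(input_size=n_features, hidden_size=hidden,
                 batch_first=True, bidirectional=True)
head = nn.Linear(2 * hidden, 1)            # forward + backward states concatenated

x = torch.randn(16, seq_len, n_features)   # batch of 16 sequences
outputs, _ = bilstm(x)                     # (16, 30, 2 * hidden)
risk = torch.sigmoid(head(outputs[:, -1])) # outbreak probability per sequence
```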
Xiao et al. [81] proposed a pest and disease prediction method based on LSTM networks. By analyzing the correlation rules between meteorological factors and cotton pest and disease outbreaks, they transformed the pest and disease prediction problem into a time series forecasting task. Experimental results showed that the LSTM model performed excellently in predicting cotton pest and disease outbreaks, with an AUC value reaching 0.97. Chen et al. [82] employed technologies such as wireless Internet of Things (IoT) to obtain real-time environmental meteorological data in the form of time series and used multi-layer LSTM and BiLSTM models to address temporal issues in meteorological forecasting. The LSTM model effectively captured long-term dependencies in weather data, predicting the occurrence and distribution of future litchi stink bugs based on historical data (including meteorological factors and pest surveys). Wahyono et al. [83] further proposed a deep LSTM-based climate anomaly model to predict crop pest and disease outbreaks. This study improved prediction accuracy by detecting anomalies in climate data and using these anomalies as features input into the LSTM model.

4. Large Language Models

Natural Language Processing (NLP) encompasses various tasks related to text understanding and generation, including text classification, entity recognition, and semantic understanding [84]. Early NLP methods were primarily based on rule-based systems and shallow statistical models. These methods performed well on small-scale datasets but faced limitations in terms of expressive power and generalization when handling large-scale corpora, making it difficult to effectively capture complex linguistic patterns and deep semantics.

4.1. Semantic Large Language Models

In 2017, Google introduced the groundbreaking Transformer architecture, with its core innovation being the self-attention mechanism [85]. This mechanism enables the dynamic computation of weighted representations based on the relationships between positions in the input sequence, allowing the model to effectively capture long-range dependencies and significantly enhance its ability to understand and process long texts. The Transformer model architecture and processing flow are shown in Figure 6.
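The mechanism reduces to a few matrix operations, as the following NumPy sketch of scaled dot-product self-attention shows on random toy embeddings; multi-head attention repeats this computation over several projected subspaces.

```python
# Minimal NumPy sketch of scaled dot-product self-attention on toy embeddings.
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])          # query-key similarity, scaled
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over all positions
    return weights @ V                               # weighted mix of value vectors

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 16))                         # 6 tokens, 16-dim embeddings
Wq, Wk, Wv = (rng.normal(size=(16, 16)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)                  # each output attends to all tokens
```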
The introduction of the Transformer architecture not only directly accelerated the development of large language models (LLMs) but also significantly improved the overall performance of natural language processing tasks. Building on this architecture, a series of groundbreaking pre-trained language models have emerged, such as Bidirectional Encoder Representations from Transformers (BERT) [86] and the Generative Pre-trained Transformer (GPT) [87] series. These models are pre-trained on large-scale text data using unsupervised learning and can be fine-tuned to adapt to specific tasks. The pre-training and fine-tuning framework offers the advantage of facilitating transfer learning across different NLP tasks, reducing the reliance on large-scale annotated data, while also enhancing the model’s generalization ability [88].
The introduction of the Transformer architecture has led to the development of various large language models. These models differ significantly in their applicability to agricultural pest and disease management and related applications due to variations in their technical approaches, training data, and optimization objectives. The performance of different models in the agricultural domain is not only influenced by their core algorithmic capabilities but also closely related to their understanding of agricultural terminology and their ability to adapt to local contexts. Table 4 compares the core advantages of current mainstream LLMs and their agricultural application scenarios.
In agricultural scenarios, LLMs, with their multimodal perception, knowledge integration, and dynamic decision-making capabilities, provide a new paradigm for solving complex problems such as pest and disease prediction, prevention, and knowledge services. LLMs have demonstrated exceptional performance in the automated synthesis of pest and disease information. Research by Scheepens et al. [95] showed that LLMs can efficiently extract key information on pest and disease control from vast amounts of academic literature and agricultural materials, integrating it into a high-quality knowledge base, providing solid scientific evidence for farmers and researchers. This capability not only accelerates the dissemination of knowledge but also enhances the scientific accuracy of agricultural decision-making.
In the field of agricultural extension services, Tzachor et al. [96] pointed out that LLMs can transform complex agricultural knowledge into easily understandable language and, with multilingual support, overcome language barriers, offering personalized and precise advice to farmers worldwide. This capability has significantly promoted the dissemination and application of agricultural technologies, particularly in resource-limited regions.
Furthermore, the integration of LLMs with emerging technologies has brought new breakthroughs to agricultural practices. The PestGPT model proposed by Yuan et al. [97] is a typical example. This model combines IoT technology to monitor field environment data and pest occurrences in real-time, generating customized pest management recommendations based on this data. This real-time, data-driven decision-making ability not only improves the efficiency of pest and disease control but also reduces pesticide usage, contributing to sustainable agricultural production.

4.2. Vision–Language Models

Current research is gradually focusing on exploring the integration of language with other modalities (such as images, videos, etc.) to advance the development of multimodal learning. Vision–language models (VLMs), as an extension of traditional language models, aim to integrate and understand visual and textual data to enhance the processing capabilities of cross-modal tasks. Early research in vision–language learning primarily focused on image–text alignment, achieving significant progress, with the introduction of vision–language pretraining models considered a major breakthrough in the field. These models are trained on large-scale image–text paired data, learning deep semantic correlations between images and text, significantly improving performance in cross-modal tasks, and providing new possibilities for vision–language interaction and understanding.
Among these models, OpenAI’s CLIP model [98] is one of the most influential vision–language pretraining models. CLIP uses contrastive learning on large-scale image–text pairs, allowing the model to learn semantic information about images from textual descriptions without explicit image labels. This ability enables CLIP to excel in tasks such as image retrieval and classification, even surpassing traditional computer vision models in some scenarios, thus laying a solid foundation for vision–language understanding and applications.
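At inference time, CLIP-style matching reduces to cosine similarity between normalized embeddings, as the following sketch shows; the random vectors stand in for the outputs of real image and text encoders and carry no semantic content.

```python
# Sketch of CLIP-style matching with random stand-in embeddings.
import numpy as np

rng = np.random.default_rng(0)
img = rng.normal(size=(4, 512))    # stand-ins for image-encoder outputs
txt = rng.normal(size=(4, 512))    # stand-ins for text-encoder outputs
img /= np.linalg.norm(img, axis=1, keepdims=True)    # L2-normalize rows
txt /= np.linalg.norm(txt, axis=1, keepdims=True)

logits = img @ txt.T               # cosine similarity of image i vs. caption j
print(logits.argmax(axis=1))       # best-matching caption index for each image
```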
Building on this, Liu et al. [99] proposed a cross-modal unified framework that integrates image and text information, significantly enhancing the accuracy and efficiency of pest and disease identification. Models can combine field images and meteorological data to generate more accurate pest and disease predictions and control recommendations. This cross-modal integration not only improves the model’s ability to identify pests and diseases but also provides strong support for intelligent decision-making in agriculture. Wang [100] proposed an agricultural vision–language dialogue system named Agri-LLaVA, which significantly enhances the model’s performance in agricultural pest and disease detection, diagnosis, and knowledge-based Q&A by constructing a large-scale agricultural pest and disease vision–language dataset and employing a two-stage training approach with feature alignment pretraining and instruction fine-tuning. This system provides a new intelligent solution for pest and disease management in agriculture.

5. Discussion and Conclusions

In the field of pest and disease control, technologies such as machine learning, deep learning, and large language models have not only improved the accuracy and efficiency of pest and disease identification but also provided new solutions for prediction and prevention. However, despite the immense potential of these technologies, several challenges remain in their practical application, including the complexity of data acquisition, the generalization ability of models, and barriers to widespread adoption. This section examines the main challenges currently faced in crop pest and disease detection and control and explores future development directions, with the aim of informing further research in pest and disease management.

5.1. Discussion

Despite notable advancements in intelligent crop pest and disease detection technologies in recent years, a significant gap remains between theoretical research and mature practical application. This gap is rooted in several key challenges: the difficulty of large-scale collection and annotation of high-quality data, the limited generalization of models in complex agricultural environments, and the tension between computational requirements and the energy constraints of field devices. These issues have become the core bottlenecks limiting the practical implementation of these technologies.
First, machine learning and deep learning techniques predominantly rely on supervised learning, which is highly dependent on high-quality annotated data. However, the complexity, diversity, and instability of agricultural image data make the acquisition of large-scale, high-quality annotated datasets extremely challenging [101]. At the same time, the application of LLMs in the agricultural field also faces challenges related to data quality [102]. While LLMs can generate a wealth of suggestions, the lack of high-quality agricultural data can result in the generation of fictitious or inaccurate content, potentially leading to misleading decisions in practical applications. Therefore, acquiring high-quality and accurate data is an urgent issue that needs to be addressed.
Second, model generalization ability is another significant challenge in crop pest and disease detection. The distribution and occurrence patterns of pests and diseases vary by region, influenced by multiple factors such as natural environment and climate conditions [103]. This regional difference not only increases the complexity of pest and disease management but also poses a severe challenge to the generalization ability of detection models. Specifically, the types, timing, severity, and transmission patterns of pests and diseases vary significantly across regions, meaning that models trained on data from one region may exhibit high false detection rates when applied to other regions. At the same time, although LLMs have made groundbreaking progress in NLP, their generalization ability is also limited in agricultural pest and disease detection. Due to the strong regional and dynamic nature of agricultural pests and diseases, their cross-regional performance is often unsatisfactory in the absence of localized data and professional knowledge support.
Lastly, the balance between computational power and energy consumption is another critical issue that needs to be resolved in the field of crop pest and disease detection. As image resolution improves and data volume increases, the required memory capacity also continues to grow. Deep learning models rely on large-scale image data for training and inference, which puts immense pressure on GPU memory. Large language models face the same problem. With an extremely large number of parameters, training large language models requires substantial computational resources, which increases memory demand. For example, GPT-3, with 175 billion parameters, requires not only significant time for training but also extensive high-performance GPU resources [104]. Even during the optimized inference stage, the model still demands very high computational capacity, leading to high costs and considerable energy consumption when deploying large language models in real-world applications.

5.2. Conclusions

Looking at the current state of research on the application of machine learning, deep learning, and large language models in agricultural pest and disease detection and control, the integration of new technologies and concepts in agriculture will continue to deepen. In addressing the challenges of applying artificial intelligence technologies to real-world scenarios, domain adaptation and transfer learning strategies will become key to enhancing the generalization ability of models. Through transfer learning, models can learn from training data specific to one region and then transfer the acquired knowledge to other regions, automatically adjusting and optimizing their parameters to adapt to new environmental conditions. Additionally, conducting domain-adaptive training for large language models is crucial. This requires incorporating the specialized language, terminology, and background knowledge from the agricultural field, improving the model’s ability to adapt to regional variations in pest and disease detection tasks, and providing more precise solutions for the complexities of pests and diseases in different regions.
At the same time, lightweight model design will be a critical direction for addressing resource-constrained environments. To improve model efficiency and adaptability, future research can focus on continuously optimizing model architecture to meet the demands of different agricultural environments and data types. For example, using techniques such as neural architecture search, more efficient and resource-friendly model architectures can be designed automatically, reducing the dependency on memory and computational resources. For large language models, knowledge distillation techniques can be used to transfer knowledge from large, pre-trained language models to lightweight, agriculture-specific models. These models not only significantly reduce the computational and storage resource requirements but can also be customized to meet the specific needs of the agricultural domain.
Moreover, the collaborative application of multiple technologies will become an important trend in disease detection and control. By combining machine learning, deep learning, and large language models to build multimodal data fusion systems, multiple types of data, such as images, text, and sensor data, can be processed simultaneously. For example, machine learning and deep learning models can efficiently analyze and process multidimensional data, extract key features, and recognize potential pest and disease patterns, while large language models, with their strong semantic understanding and generation capabilities, can deeply analyze the output results of the fused model and extract valuable information. This will provide farmers with precise decision support for pest and disease control. This multi-technology collaborative model aims to leverage the strengths of each technology to achieve accurate perception and intelligent decision-making in complex agricultural environments.
In summary, machine learning, deep learning, and large language models can analyze agricultural data for early pest and disease identification, prediction, and control, outperforming traditional methods. Integrating these technologies not only makes pest and disease management more scientific and precise but also drives agriculture toward more intelligent and sustainable development. As these technologies mature and converge, they are expected to progress from theoretical research to practical application in more areas, providing strong technical support for the transformation of agricultural production methods and contributing to the high-quality development of global agriculture.

Author Contributions

Conceptualization, J.X. and M.L.; formal analysis, M.L., Q.G. and L.C.; data curation and investigation, N.X., Y.Z. and Y.C.; project administration, J.X., J.L. and W.W.; funding acquisition, J.X. and J.L.; methodology and software, Q.G., L.C. and J.W.; writing—original draft, M.L.; writing—review and editing, J.X., M.L. and J.L. All authors have read and agreed to the published version of the manuscript.

Funding

The project was supported by the Open Fund of State Key Laboratory of Agricultural Equipment Technology, South China Agricultural University. It was also partly supported by the China Agriculture Research System of MOF and MARA, China (No. CARS-32-11); the Guangdong Provincial Special Fund for Modern Agriculture Industry Technology Innovation Teams, China (No. 2024CXTD19-11); the College Students’ Innovative Entrepreneurial Training Plan Program; and the Guangdong Science and Technology Innovation Cultivation Special Fund Project for College Students (“Climbing Program” Special Fund), China (No. pdjh2023a0074).

Data Availability Statement

Data are contained within the article.

Acknowledgments

The authors would like to thank the anonymous reviewers for their critical comments and suggestions for improving the manuscript.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Global Population Growth and Sustainable Development|Policy Commons. Available online: https://policycommons.net/artifacts/8983020/global-population-growth-and-sustainable-development/9868536/ (accessed on 19 March 2025).
  2. Godfray, H.C.J.; Beddington, J.R.; Crute, I.R.; Haddad, L.; Lawrence, D.; Muir, J.F.; Pretty, J.; Robinson, S.; Thomas, S.M.; Toulmin, C. Food Security: The Challenge of Feeding 9 Billion People. Science 2010, 327, 812–818. [Google Scholar] [CrossRef] [PubMed]
  3. Savary, S.; Willocquet, L.; Pethybridge, S.J.; Esker, P.; McRoberts, N.; Nelson, A. The Global Burden of Pathogens and Pests on Major Food Crops. Nat. Ecol. Evol. 2019, 3, 430–439. [Google Scholar] [CrossRef] [PubMed]
  4. Researchers Helping Protect Crops From Pests|NIFA. Available online: https://www.nifa.usda.gov/about-nifa/blogs/researchers-helping-protect-crops-pests (accessed on 20 March 2025).
  5. Abubakar, M.; Koul, B.; Chandrashekar, K.; Raut, A.; Yadav, D. Whitefly (Bemisia tabaci) Management (WFM) Strategies for Sustainable Agriculture: A Review. Agriculture 2022, 12, 1317. [Google Scholar] [CrossRef]
  6. Lidwell-Durnin, J.; Lapthorn, A. The Threat to Global Food Security from Wheat Rust: Ethical and Historical Issues in Fighting Crop Diseases and Preserving Genetic Diversity. Glob. Food Secur. 2020, 26, 100446. [Google Scholar] [CrossRef]
  7. Azfar, S.; Nadeem, A.; Alkhodre, A.B.; Ahsan, K.; Mehmood, N.; Alghmdi, T.; Alsaawy, Y. Monitoring, Detection and Control Techniques of Agriculture Pests and Diseases Using Wireless Sensor Network: A Review. Int. J. Adv. Comput. Sci. Appl. 2018, 9, 424–433. [Google Scholar] [CrossRef]
  8. Domingues, T.; Brandão, T.; Ferreira, J.C. Machine Learning for Detection and Prediction of Crop Diseases and Pests: A Comprehensive Survey. Agriculture 2022, 12, 1350. [Google Scholar] [CrossRef]
  9. Rwakipamba, E.; Sseremba, G.; Byalebeka, J.; Ssekandi, J.; Mwine, J. Over Reliance on Pesticides and Poor Handling Practices Characterize Intensive Vegetable Farming: Case of Selected Smallholders in Southwestern Uganda. Preprints 2020, 1–36. [Google Scholar] [CrossRef]
  10. Kartikeyan, P.; Shrivastava, G. Review on Emerging Trends in Detection of Plant Diseases Using Image Processing with Machine Learning. Int. J. Comput. Appl. 2021, 174, 39–48. [Google Scholar] [CrossRef]
  11. Cornuéjols, A.; Moulet, M. Machine Learning: A Survey. In Knowledge Based Systems. Advanced Concepts, Techniques and Applications; Tzafestas, S.G., Ed.; World Scientific: Singapore, 1997; pp. 61–86. [Google Scholar]
  12. Rajan, P.; Radhakrishnan, B.; Suresh, L.P. Detection and Classification of Pests from Crop Images Using Support Vector Machine. In Proceedings of the 2016 International Conference on Emerging Technological Trends (ICETT), Kollam, India, 21–22 October 2016; pp. 1–6. [Google Scholar]
  13. Pattnaik, G.; Parvathi, K. Machine Learning-Based Approaches for Tomato Pest Classification. TELKOMNIKA Telecommun. Comput. Electron. Control. 2022, 20, 321–328. [Google Scholar] [CrossRef]
  14. Reddy, D.T.K.; Ramesh, S. Identification of the Pest Detection Using Random Forest Algorithm and Support Vector Machine with Improved Accuracy. AIP Conf. Proc. 2024, 3193, 020185. [Google Scholar] [CrossRef]
  15. Revathy, R.; Lawrance, R. Classifying Crop Pest Data Using C4.5 Algorithm. In Proceedings of the 2017 IEEE International Conference on Intelligent Techniques in Control, Optimization and Signal Processing (INCOS), Krishnankoil, India, 23–25 March 2017; pp. 1–6. [Google Scholar]
  16. LeCun, Y.; Bengio, Y.; Hinton, G. Deep Learning. Nature 2015, 521, 436–444. [Google Scholar] [CrossRef]
  17. Xie, J.; Zhang, X.; Liu, Z.; Liao, F.; Wang, W.; Li, J. Detection of Litchi Leaf Diseases and Insect Pests Based on Improved FCOS. Agronomy 2023, 13, 1314. [Google Scholar] [CrossRef]
  18. Liu, S.; Qiao, Y.; Li, J.; Zhang, H.; Zhang, M.; Wang, M. An Improved Lightweight Network for Real-Time Detection of Apple Leaf Diseases in Natural Scenes. Agronomy 2022, 12, 2363. [Google Scholar] [CrossRef]
  19. Pest Identification and Counting of Yellow Plate in Field Based on Improved Mask R-CNN. Discrete Dyn. Nat. Soc. 2022. Available online: https://onlinelibrary.wiley.com/doi/full/10.1155/2022/1913577 (accessed on 20 March 2025).
  20. Roumeliotis, K.I.; Tselikas, N.D. ChatGPT and Open-AI Models: A Preliminary Review. Future Internet 2023, 15, 192. [Google Scholar] [CrossRef]
  21. Zhao, X.; Chen, B.; Ji, M.; Wang, X.; Yan, Y.; Zhang, J.; Liu, S.; Ye, M.; Lv, C. Implementation of Large Language Models and Agricultural Knowledge Graphs for Efficient Plant Disease Detection. Agriculture 2024, 14, 1359. [Google Scholar] [CrossRef]
  22. Zhang, Y.; Fan, Q.; Chen, X.; Li, M.; Zhao, Z.; Li, F.; Guo, L. IPM-AgriGPT: A Large Language Model for Pest and Disease Management with a G-EA Framework and Agricultural Contextual Reasoning. Mathematics 2025, 13, 566. [Google Scholar] [CrossRef]
  23. Mahesh, B. Machine Learning Algorithms—A Review. Int. J. Sci. Res. 2020, 9, 381–386. [Google Scholar] [CrossRef]
  24. Sharma, R. Study of Supervised Learning and Unsupervised Learning. Int. J. Res. Appl. Sci. Eng. Technol. 2020, 8, 588–593. [Google Scholar] [CrossRef]
  25. Nasteski, V. An Overview of the Supervised Machine Learning Methods. Horizons. B 2017, 4, 51–62. [Google Scholar] [CrossRef]
  26. Singh, A.; Thakur, N.; Sharma, A. A Review of Supervised Machine Learning Algorithms. In Proceedings of the 2016 3rd International Conference on Computing for Sustainable Global Development (INDIACom), New Delhi, India, 16–18 March 2016; pp. 1310–1315. [Google Scholar]
  27. Naeem, S.; Ali, A.; Anam, S.; Ahmed, M. An Unsupervised Machine Learning Algorithms: Comprehensive Review. IJCDS J. 2023, 13, 911–921. [Google Scholar] [CrossRef]
  28. Song, Y.; Lu, Y. Decision Tree Methods: Applications for Classification and Prediction. Shanghai Arch. Psychiatry 2015, 27, 130–135. [Google Scholar] [CrossRef] [PubMed]
  29. Quinlan, J.R. Induction of Decision Trees. Mach. Learn. 1986, 1, 81–106. [Google Scholar] [CrossRef]
  30. Quinlan, J.R. C4.5: Programs for Machine Learning; Elsevier: Amsterdam, The Netherlands, 2014; ISBN 978-0-08-050058-4. [Google Scholar]
  31. Breiman, L.; Friedman, J.; Olshen, R.A.; Stone, C.J. Classification and Regression Trees; Chapman and Hall/CRC: New York, NY, USA, 2017; ISBN 978-1-315-13947-0. [Google Scholar]
  32. Kass, G.V. An Exploratory Technique for Investigating Large Quantities of Categorical Data. J. R. Stat. Soc. Ser. C Appl. Stat. 1980, 29, 119–127. [Google Scholar] [CrossRef]
  33. Dewanto, S.; Lukas, J. Expert System For Diagnosis Pest And Disease In Fruit Plants. EPJ Web Conf. 2014, 68, 00024. [Google Scholar] [CrossRef]
  34. Carisse, O.; Fall, M.L. Decision Trees to Forecast Risks of Strawberry Powdery Mildew Caused by Podosphaera aphanis. Agriculture 2021, 11, 29. [Google Scholar] [CrossRef]
  35. Pratheepa, M.; Meena, K.; Subramaniam, K.R.; Venugopalan, R.; Bheemanna, H. A Decision Tree Analysis for Predicting the Occurrence of the Pest, Helicoverpa Armigera and Its Natural Enemies on Cotton Based on Economic Threshold Level. Curr. Sci. 2011, 100, 238–246. [Google Scholar]
  36. Vapnik, V.N.; Chervonenkis, A.Y. On the Uniform Convergence of Relative Frequencies of Events to Their Probabilities. In Measures of Complexity: Festschrift for Alexey Chervonenkis; Vovk, V., Papadopoulos, H., Gammerman, A., Eds.; Springer International Publishing: Cham, Switzerland, 2015; pp. 11–30. ISBN 978-3-319-21852-6. [Google Scholar]
  37. Boser, B.E.; Guyon, I.M.; Vapnik, V.N. A Training Algorithm for Optimal Margin Classifiers. In Proceedings of the Fifth Annual Workshop on Computational Learning Theory, Pittsburgh, PA, USA, 1 July 1992; Association for Computing Machinery: New York, NY, USA, 1992; pp. 144–152. [Google Scholar]
  38. Mokhtar, U.; El Bendary, N.; Hassenian, A.E.; Emary, E.; Mahmoud, M.A.; Hefny, H.; Tolba, M.F. SVM-Based Detection of Tomato Leaves Diseases. In Proceedings of the Intelligent Systems’2014; Filev, D., Jabłkowski, J., Kacprzyk, J., Krawczak, M., Popchev, I., Rutkowski, L., Sgurev, V., Sotirova, E., Szynkarczyk, P., Zadrozny, S., Eds.; Springer International Publishing: Cham, Switzerland, 2015; pp. 641–652. [Google Scholar]
  39. Ebrahimi, M.A.; Khoshtaghaza, M.H.; Minaei, S.; Jamshidi, B. Vision-Based Pest Detection Based on SVM Classification Method. Comput. Electron. Agric. 2017, 137, 52–58. [Google Scholar] [CrossRef]
  40. Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32. [Google Scholar] [CrossRef]
  41. Zhu, L.; Wu, M.; Wan, X.; Zhao, N.; Xiong, W. Image Recognition of Rapeseed Pests Based on Random Forest Classifier. Int. J. Inf. Technol. Web Eng. IJITWE 2017, 12, 1–10. [Google Scholar] [CrossRef]
  42. Thirumoorthy, S.; Govindarajan, L.; Kesavan, M.; Kumar, T.R. Detection of Pest and Disease in Banana Leaf Using Convolution Random Forest. Test Eng. Manag. 2020, 83, 3727–3735. [Google Scholar]
  43. Resti, Y.; Irsan, C.; Latif, J.F.; Yani, I.; Dewi, N.R. A Bootstrap-Aggregating in Random Forest Model for Classification of Corn Plant Diseases and Pests. Sci. Technol. Indones. Available online: https://sciencetechindonesia.com/index.php/jsti/article/view/695 (accessed on 20 March 2025).
  44. Ahmed, M.; Seraj, R.; Islam, S.M.S. The k-means Algorithm: A Comprehensive Survey and Performance Evaluation. Electronics 2020, 9, 1295. [Google Scholar] [CrossRef]
  45. Faithpraise, F.; Birch, P.; Young, R.; Obu, J.; Faithpraise, B.; Chatwin, C. Automatic Plant Pest Detection and Recognition Using k-Means Clustering Algorithm and Correspondence Filters. Int. J. Adv. Biotechnol. Res. 2013, 4, 189–199. [Google Scholar]
  46. Xia, J.; Yang, Y.; Cao, H.; Ke, Y.; Ge, D.; Zhang, W.; Ge, S.; Chen, G. Performance Analysis of Clustering Method Based on Crop Pest Spectrum. Eng. Agric. Environ. Food 2018, 11, 84–89. [Google Scholar] [CrossRef]
  47. Sutton, R.S.; Barto, A. Reinforcement Learning: An Introduction; Adaptive Computation and Machine Learning; Reprint; The MIT Press: Cambridge, MA, USA, 2014; ISBN 978-0-262-19398-6. [Google Scholar]
  48. Lu, Y.; Yu, X.; Hu, Z.; Wang, X. Convolutional Neural Network Combined with Reinforcement Learning-Based Dual-Mode Grey Wolf Optimizer to Identify Crop Diseases and Pests. Swarm Evol. Comput. 2025, 94, 101874. [Google Scholar] [CrossRef]
  49. Fu, H.; Li, Z.; Zhang, W.; Feng, Y.; Zhu, L.; Fang, X.; Li, J. Research on Path Planning of Agricultural UAV Based on Improved Deep Reinforcement Learning. Agronomy 2024, 14, 2669. [Google Scholar] [CrossRef]
  50. Kim, K.G. Book Review: Deep Learning. Healthc. Inform. Res. 2016, 22, 351. [Google Scholar] [CrossRef]
  51. Alzubaidi, L.; Zhang, J.; Humaidi, A.J.; Al-Dujaili, A.; Duan, Y.; Al-Shamma, O.; Santamaría, J.; Fadhel, M.A.; Al-Amidie, M.; Farhan, L. Review of Deep Learning: Concepts, CNN Architectures, Challenges, Applications, Future Directions. J. Big Data 2021, 8, 53. [Google Scholar] [CrossRef]
  52. Bhatt, D.; Patel, C.; Talsania, H.; Patel, J.; Vaghela, R.; Pandya, S.; Modi, K.; Ghayvat, H. CNN Variants for Computer Vision: History, Architecture, Application, Challenges and Future Scope. Electronics 2021, 10, 2470. [Google Scholar] [CrossRef]
  53. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet Classification with Deep Convolutional Neural Networks. In Proceedings of the Advances in Neural Information Processing Systems; Curran Associates, Inc.: Red Hook, NY, USA, 2012; Volume 25. [Google Scholar]
  54. Morankar, D.; Shinde, D.; Pawar, S.; Sabri, M. Identification of Pests and Diseases Using Alex-Net. SSRN Electron. J. 2020, 7, 53–62. [Google Scholar]
  55. Qiu, J.; Lu, X.; Wang, X.; Chen, C.; Chen, Y.; Yang, Y. Research on Image Recognition of Tomato Leaf Diseases Based on Improved AlexNet Model. Heliyon 2024, 10. [Google Scholar] [CrossRef]
  56. Simonyan, K.; Zisserman, A. Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv 2015, arXiv:1409.1556. [Google Scholar]
  57. Paymode, A.S.; Malode, V.B. Transfer Learning for Multi-Crop Leaf Disease Image Classification Using Convolutional Neural Network VGG. Artif. Intell. Agric. 2022, 6, 23–33. [Google Scholar] [CrossRef]
  58. Swasono, D.I.; Tjandrasa, H.; Fathicah, C. Classification of Tobacco Leaf Pests Using VGG16 Transfer Learning. In Proceedings of the 2019 12th International Conference on Information & Communication Technology and System (ICTS), Surabaya, Indonesia, 18 July 2019; pp. 176–181. [Google Scholar]
  59. Ye, H.; Han, H.; Zhu, L.; Duan, Q. Vegetable Pest Image Recognition Method Based on Improved VGG Convolution Neural Network. J. Phys. Conf. Ser. 2019, 1237, 032018. [Google Scholar] [CrossRef]
  60. Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 580–587. [Google Scholar]
  61. Girshick, R. Fast R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 1440–1448. [Google Scholar]
  62. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. In Proceedings of the Advances in Neural Information Processing Systems; Curran Associates, Inc.: Red Hook, NY, USA, 2015; Volume 28. [Google Scholar]
  63. Patel, D.; Bhatt, N. Improved Accuracy of Pest Detection Using Augmentation Approach with Faster R-CNN. IOP Conf. Ser. Mater. Sci. Eng. 2021, 1042, 012020. [Google Scholar] [CrossRef]
  64. Nieuwenhuizen, A.; Hemming, J.; Suh, H.K. Detection and Classification of Insects on Stick-Traps in a Tomato Crop Using Faster R-CNN. In Proceedings of the The Netherlands Conference on Computer Vision, Eindhoven, The Netherlands, 26–27 September 2018. [Google Scholar]
  65. Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You Only Look Once: Unified, Real-Time Object Detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788. [Google Scholar]
  66. Terven, J.; Córdova-Esparza, D.-M.; Romero-González, J.-A. A Comprehensive Review of YOLO Architectures in Computer Vision: From YOLOv1 to YOLOv8 and YOLO-NAS. Mach. Learn. Knowl. Extr. 2023, 5, 1680–1716. [Google Scholar] [CrossRef]
  67. Diwan, T.; Anirudh, G.; Tembhurne, J.V. Object Detection Using YOLO: Challenges, Architectural Successors, Datasets and Applications. Multimed. Tools Appl. 2023, 82, 9243–9275. [Google Scholar] [CrossRef]
  68. Liu, J.; Wang, X. Tomato Diseases and Pests Detection Based on Improved Yolo V3 Convolutional Neural Network. Front. Plant Sci. 2020, 11, 898. [Google Scholar] [CrossRef]
  69. Sun, D.; Zhang, K.; Zhong, H.; Xie, J.; Xue, X.; Yan, M.; Wu, W.; Li, J. Efficient Tobacco Pest Detection in Complex Environments Using an Enhanced YOLOv8 Model. Agriculture 2024, 14, 353. [Google Scholar] [CrossRef]
  70. Carion, N.; Massa, F.; Synnaeve, G.; Usunier, N.; Kirillov, A.; Zagoruyko, S. End-to-End Object Detection with Transformers. In Proceedings of the Computer Vision—ECCV 2020; Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M., Eds.; Springer International Publishing: Cham, Switzerland, 2020; pp. 213–229. [Google Scholar]
  71. Wu, J.; Wen, C.; Chen, H.; Ma, Z.; Zhang, T.; Su, H.; Yang, C. DS-DETR: A Model for Tomato Leaf Disease Segmentation and Damage Evaluation. Agronomy 2022, 12, 2023. [Google Scholar] [CrossRef]
  72. Liu, B.; Jia, Y.; Liu, L.; Dang, Y.; Song, S. Skip DETR: End-to-End Skip Connection Model for Small Object Detection in Forestry Pest Dataset. Front. Plant Sci. 2023, 14, 1219474. [Google Scholar] [CrossRef]
  73. Long, J.; Shelhamer, E.; Darrell, T. Fully Convolutional Networks for Semantic Segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 3431–3440. [Google Scholar]
  74. Gong, H.; Liu, T.; Luo, T.; Guo, J.; Feng, R.; Li, J.; Ma, X.; Mu, Y.; Hu, T.; Sun, Y.; et al. Based on FCN and DenseNet Framework for the Research of Rice Pest Identification Methods. Agronomy 2023, 13, 410. [Google Scholar] [CrossRef]
  75. Wang, X.; Wang, Z.; Zhang, S. Segmenting Crop Disease Leaf Image by Modified Fully-Convolutional Networks. In Proceedings of the Intelligent Computing Theories and Application; Huang, D.-S., Bevilacqua, V., Premaratne, P., Eds.; Springer International Publishing: Cham, Switzerland, 2019; pp. 646–652. [Google Scholar]
  76. He, K.; Gkioxari, G.; Dollar, P.; Girshick, R. Mask R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2961–2969. [Google Scholar]
  77. Lin, T.-L.; Chang, H.-Y.; Chen, K.-H. The Pest and Disease Identification in the Growth of Sweet Peppers Using Faster R-CNN and Mask R-CNN. J. Internet Technol. 2020, 21, 605–614. [Google Scholar]
  78. Kasinathan, T.; Uyyala, S.R. Detection of Fall Armyworm (Spodoptera Frugiperda) in Field Crops Based on Mask R-CNN. Signal Image Video Process. 2023, 17, 2689–2695. [Google Scholar] [CrossRef]
  79. Survey on Research of RNN-Based Spatio-Temporal Sequence Prediction Algorithms. Available online: https://www.proquest.com/openview/9ebe553918e3e43d67209a82d3243534/1?cbl=4585453&pq-origsite=gscholar (accessed on 20 March 2025).
  80. Graves, A.; Schmidhuber, J. Framewise Phoneme Classification with Bidirectional LSTM and Other Neural Network Architectures. Neural Netw. 2005, 18, 602–610. [Google Scholar] [CrossRef] [PubMed]
  81. Xiao, Q.; Li, W.; Kai, Y.; Chen, P.; Zhang, J.; Wang, B. Occurrence Prediction of Pests and Diseases in Cotton on the Basis of Weather Factors by Long Short Term Memory Network. BMC Bioinform. 2019, 20, 688. [Google Scholar] [CrossRef]
  82. Chen, C.-J.; Li, Y.-S.; Tai, C.-Y.; Chen, Y.-C.; Huang, Y.-M. Pest Incidence Forecasting Based on Internet of Things and Long Short-Term Memory Network. Appl. Soft Comput. 2022, 124, 108895. [Google Scholar] [CrossRef]
  83. Wahyono, T.; Heryadi, Y.; Soeparno, H.; Abbas, B.S. Crop Pest Prediction Using Climate Anomaly Model Based on Deep-LSTM Method. ICIC Express letters. Part B Appl. Int. J. Res. Surv. 2021, 12, 395–401. [Google Scholar]
  84. Fanni, S.C.; Febi, M.; Aghakhanyan, G.; Neri, E. Natural Language Processing. In Introduction to Artificial Intelligence; Klontzas, M.E., Fanni, S.C., Neri, E., Eds.; Springer International Publishing: Cham, Switzerland, 2023; pp. 87–99. ISBN 978-3-031-25928-9. [Google Scholar]
  85. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention Is All You Need. In Proceedings of the Advances in Neural Information Processing Systems; Curran Associates, Inc.: Red Hook, NY, USA, 2017; Volume 30. [Google Scholar]
  86. Devlin, J.; Chang, M.-W.; Lee, K.; Toutanova, K. BERT: Pre-Training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies; Burstein, J., Doran, C., Solorio, T., Eds.; Association for Computational Linguistics: Minneapolis, MN, USA, 2019; Volume 1 (Long and Short Papers), pp. 4171–4186. [Google Scholar]
  87. Radford, A.; Narasimhan, K.; Salimans, T.; Sutskever, I. Improving Language Understanding by Generative Pre-Training. 2018. Available online: https://www.mikecaptain.com/resources/pdf/GPT-1.pdf (accessed on 20 March 2025).
  88. Gururangan, S.; Marasović, A.; Swayamdipta, S.; Lo, K.; Beltagy, I.; Downey, D.; Smith, N.A. Don’t Stop Pretraining: Adapt Language Models to Domains and Tasks. arXiv 2020, arXiv:2004.10964. [Google Scholar]
  89. Bi, X.; Chen, D.; Chen, G.; Chen, S.; Dai, D.; Deng, C.; Ding, H.; Dong, K.; Du, Q.; Fu, Z.; et al. DeepSeek LLM: Scaling Open-Source Language Models with Longtermism. arXiv 2024, arXiv:2401.02954. [Google Scholar]
  90. Bai, J.; Bai, S.; Chu, Y.; Cui, Z.; Dang, K.; Deng, X.; Fan, Y.; Ge, W.; Han, Y.; Huang, F.; et al. Qwen Technical Report. arXiv 2023, arXiv:2309.16609. [Google Scholar]
  91. OpenAI; Achiam, J.; Adler, S.; Agarwal, S.; Ahmad, L.; Akkaya, I.; Aleman, F.L.; Almeida, D.; Altenschmidt, J.; Altman, S.; et al. GPT-4 Technical Report. arXiv 2023, arXiv:2303.08774. [Google Scholar]
  92. Brown, T.; Mann, B.; Ryder, N.; Subbiah, M.; Kaplan, J.D.; Dhariwal, P.; Neelakantan, A.; Shyam, P.; Sastry, G.; Askell, A.; et al. Language Models Are Few-Shot Learners. In Proceedings of the Advances in Neural Information Processing Systems; Curran Associates, Inc.: Red Hook, NY, USA, 2020; Volume 33, pp. 1877–1901. [Google Scholar]
  93. Touvron, H.; Martin, L.; Stone, K.; Albert, P.; Almahairi, A.; Babaei, Y.; Bashlykov, N.; Batra, S.; Bhargava, P.; Bhosale, S.; et al. Llama 2: Open Foundation and Fine-Tuned Chat Models. arXiv 2023, arXiv:2307.09288. [Google Scholar]
  94. Grattafiori, A.; Dubey, A.; Jauhri, A.; Pandey, A.; Kadian, A.; Al-Dahle, A.; Letman, A.; Mathur, A.; Schelten, A.; Vaughan, A.; et al. The Llama 3 Herd of Models. arXiv 2024, arXiv:2407.21783. [Google Scholar]
  95. Scheepens, D.; Millard, J.; Farrell, M.; Newbold, T. Large Language Models Help Facilitate the Automated Synthesis of Information on Potential Pest Controllers. Methods Ecol. Evol. 2024, 15, 1261–1273. [Google Scholar] [CrossRef]
  96. Tzachor, A.; Devare, M.; Richards, C.; Pypers, P.; Ghosh, A.; Koo, J.; Johal, S.; King, B. Large Language Models and Agricultural Extension Services. Nat. Food 2023, 4, 941–948. [Google Scholar] [CrossRef]
  97. Yuan, Z.; Liu, K.; Peng, R.; Li, S.; Leybourne, D.; Musa, N.; Huang, H.; Yang, P. PestGPT: Leveraging Large Language Models and IoT for Timely and Customized Recommendation Generation in Sustainable Pest Management. IEEE Internet Things Mag. 2025, 8, 26–33. [Google Scholar] [CrossRef]
  98. Radford, A.; Kim, J.W.; Hallacy, C.; Ramesh, A.; Goh, G.; Agarwal, S.; Sastry, G.; Askell, A.; Mishkin, P.; Clark, J.; et al. Learning Transferable Visual Models From Natural Language Supervision. In Proceedings of the 38th International Conference on Machine Learning, Virtual, 18–24 July 2021; PMLR; pp. 8748–8763.
  99. Liu, J.; Xing, J.; Zhou, G.; Wang, J.; Sun, L.; Chen, X. Transfer Large Models to Crop Pest Recognition—a Cross-Modal Unified Framework for Parameters Efficient Fine-Tuning. 2024. Available online: https://papers.ssrn.com/abstract=4999751 (accessed on 20 March 2025).
  100. Wang, L.; Jin, T.; Yang, J.; Leonardis, A.; Wang, F.; Zheng, F. Agri-LLaVA: Knowledge-Infused Large Multimodal Assistant on Agricultural Pests and Diseases. arXiv 2024, arXiv:2412.02158. [Google Scholar]
  101. Li, J.; Chen, D.; Qi, X.; Li, Z.; Huang, Y.; Morris, D.; Tan, X. Label-Efficient Learning in Agriculture: A Comprehensive Review. Comput. Electron. Agric. 2023, 215, 108412. [Google Scholar] [CrossRef]
  102. Sapkota, R.; Qureshi, R.; Hassan, S.Z.; Shutske, J.; Shoman, M.; Sajjad, M.; Dharejo, F.A.; Paudel, A.; Li, J.; Meng, Z.; et al. Multi-Modal LLMs in Agriculture: A Comprehensive Review. TechRxiv 2024, 1–28. [Google Scholar] [CrossRef]
  103. Wang, C.; Wang, X.; Jin, Z.; Müller, C.; Pugh, T.A.M.; Chen, A.; Wang, T.; Huang, L.; Zhang, Y.; Li, L.X.Z.; et al. Occurrence of Crop Pests and Diseases Has Largely Increased in China since 1970. Nat. Food 2022, 3, 57–65. [Google Scholar] [CrossRef]
  104. Zhao, W.X.; Zhou, K.; Li, J.; Tang, T.; Wang, X.; Hou, Y.; Min, Y.; Zhang, B.; Zhang, J.; Dong, Z.; et al. A Survey of Large Language Models. arXiv 2025, arXiv:2303.18223. [Google Scholar]
Figure 1. Schematic diagram of CNN processing.
Figure 2. YOLO object detection network architecture.
Figure 3. Schematic diagram of the Fully Convolutional Network architecture.
Figure 4. The structure diagram of LSTM.
Figure 5. The structure diagram of BiLSTM.
Figure 6. Transformer model architecture and processing diagram.
Table 1. Comparison of supervised learning and unsupervised learning.

| Category | Supervised Learning | Unsupervised Learning |
|---|---|---|
| Data labeling | Requires labeled data | No labeled data required |
| Task objectives | Prediction, classification | Discovering intrinsic data structures or patterns |
| Common tasks | Classification, regression | Clustering, dimensionality reduction |
| Classic algorithms | Decision trees, support vector machines, random forests | K-means clustering |
Table 2. Characteristics of classic decision tree algorithms.

| Algorithm | Feature Selection Criterion | Tree Structure | Handling Continuous Features | Pruning Strategy | Reference |
|---|---|---|---|---|---|
| ID3 | Information gain | Multi-way tree | Manual discretization required | None | [29] |
| C4.5 | Gain ratio | Multi-way tree | Automatic binary splitting | Pessimistic error post-pruning | [30] |
| CART | Gini index | Binary tree | Automatically finds optimal binary split | Cost-complexity post-pruning | [31] |
| CHAID | Chi-square, F-test | Multi-way tree | Manual binning and interval merging | Significance-based pre-pruning | [32] |
Table 3. Comparison of R-CNN, Fast R-CNN, and Faster R-CNN.

| Algorithm | Region Proposal Method | Feature Extraction Strategy |
|---|---|---|
| R-CNN | Selective Search | Extracts features independently for each region |
| Fast R-CNN | Selective Search | Shares feature maps across proposals |
| Faster R-CNN | Region Proposal Network | Shares feature maps across proposals |
Table 4. Comparison of large language model (LLM) applications in the agricultural sector.

| Model | Research Institution | Core Strengths | Potential Agricultural Applications | References |
|---|---|---|---|---|
| DeepSeek | DeepSeek | Strong Chinese–English reasoning capability, deep integration of agricultural knowledge | Cross-regional pest warning, precision agriculture decision support | [89] |
| Qwen | Alibaba | Strong Chinese adaptation, supports localized deployment | Agricultural technology dissemination, farmer training | [90] |
| ChatGPT | OpenAI | Powerful multimodal reasoning, extensive knowledge base | Global agricultural knowledge integration, intelligent agricultural Q&A | [91,92] |
| Llama | Meta | Open source with customizable local fine-tuning, suitable for agricultural optimization | Agricultural monitoring and customized model optimization | [93,94] |