With the application of convolutional neural networks to image recognition, multi-target recognition in complex scenarios has become increasingly feasible, and how to apply this technology to plant disease and pest recognition has become a topic of great interest among researchers [
5,
6]. In 2021, Zhao et al. [
7] designed PestNet, a saliency-detection-based pest classification model that simulates the main stages of object recognition in the human visual system. The model consists of a target localization module (OPM) and a multi-feature fusion module (MFFM). OPM integrates the shallow detail information and deep spatial information of pest images through a U-shaped network structure, coarsely delineates the salient regions, and outputs spatial semantic features. MFFM suppresses background information and enhances fine details by bilinearly pooling the spatial semantic features with abstract semantic features, as sketched below.
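As a rough illustration of this fusion style, the following PyTorch sketch bilinearly pools two feature maps; the module name BilinearFusion, the projection layer, and all dimensions are our illustrative assumptions, not PestNet's actual MFFM.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BilinearFusion(nn.Module):
    """Illustrative bilinear pooling of two feature maps (not PestNet's exact MFFM)."""
    def __init__(self, c_spatial: int, c_abstract: int, out_dim: int):
        super().__init__()
        self.proj = nn.Linear(c_spatial * c_abstract, out_dim)

    def forward(self, f_spatial: torch.Tensor, f_abstract: torch.Tensor) -> torch.Tensor:
        # f_spatial:  (B, C1, H, W) spatial semantic features
        # f_abstract: (B, C2, H, W) abstract semantic features
        B, C1, H, W = f_spatial.shape
        x = f_spatial.flatten(2)                     # (B, C1, H*W)
        y = f_abstract.flatten(2)                    # (B, C2, H*W)
        # Outer product averaged over spatial positions -> (B, C1, C2)
        z = torch.bmm(x, y.transpose(1, 2)) / (H * W)
        z = z.flatten(1)                             # (B, C1*C2)
        # Signed square-root and L2 normalization, as in standard bilinear CNNs
        z = torch.sign(z) * torch.sqrt(torch.abs(z) + 1e-8)
        return self.proj(F.normalize(z))

# Example: fuse a 256-channel and a 512-channel map into a 128-d descriptor
fuse = BilinearFusion(256, 512, 128)
out = fuse(torch.randn(2, 256, 14, 14), torch.randn(2, 512, 14, 14))  # (2, 128)
```

The signed square-root and L2 normalization follow common bilinear-CNN practice. Guo et al. [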
8] constructed an attention mechanism based on saliency map analysis to narrow down the trap regions to be detected. This focuses the network on the pest regions, which mitigates false detections and ultimately improves detection accuracy. Tang et al. [
9] proposed the Pest-YOLO network model, a real-time agricultural pest detection method based on an improved convolutional neural network (CNN) and YOLOv4. The model introduces the squeeze-and-excitation (SE) attention module (see the sketch below) and designs a cross-stage multi-feature fusion method that improves the structure of the feature pyramid and path aggregation networks, thus enhancing feature representation.
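For reference, a standard squeeze-and-excitation block can be sketched as follows; this is the generic SE design (Hu et al.), not necessarily Pest-YOLO's exact variant.

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Generic squeeze-and-excitation channel attention (Hu et al.)."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        w = x.mean(dim=(2, 3))           # squeeze: global average pool -> (B, C)
        w = self.fc(w).view(b, c, 1, 1)  # excitation: per-channel gate in (0, 1)
        return x * w                     # recalibrate channel responses
```

Ramazi et al. [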
10] employed machine learning models for short- and medium-term prediction of mountain pine beetle outbreaks in Canadian forests. They found that the gradient boosting machine (GBM) performed best for short-term forecasts, while ensemble models were superior for medium-to-long-term projections. This methodology provides reliable support for forest pest monitoring and control decisions. In 2022, Sun et al. [
11] proposed a forestry pest detection method based on an attention model and a lightweight YOLOv4. It contains three improvements: replacing the backbone network, introducing the CBAM attention mechanism, and adopting Focal Loss to optimize the loss function (a generic sketch follows). This achieved 93.7% mAP on a dataset containing seven forestry pests.
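As a generic illustration of the loss Sun et al. adopt, a binary focal loss can be written as below; the alpha and gamma values are the common defaults, and the function is our sketch rather than the paper's implementation.

```python
import torch
import torch.nn.functional as F

def focal_loss(logits: torch.Tensor, targets: torch.Tensor,
               alpha: float = 0.25, gamma: float = 2.0) -> torch.Tensor:
    """Binary focal loss (Lin et al.); targets in {0, 1}, same shape as logits."""
    bce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p = torch.sigmoid(logits)
    p_t = p * targets + (1 - p) * (1 - targets)             # prob. of the true class
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
    # (1 - p_t)^gamma down-weights easy examples, focusing training on hard ones
    return (alpha_t * (1 - p_t) ** gamma * bce).mean()
```

Dong et al. [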
12] proposed a multi-category pest detection network, MCPD-net, to address the difficulty of detecting small targets and the visual similarity of certain pests. It achieved 67.3% mAP and 89.3% average recall (AR) in experiments on the multi-category pest dataset MPD2021. Bjerge et al. [
13] proposed an automated real-time insect monitoring system based on computer vision and deep learning. By integrating the YOLOv3 model with an Insect Classification and Tracking (ICT) algorithm, the system achieves end-to-end insect detection, species classification, and individual tracking. In 2023, Zhao et al. [
14] proposed a pest identification method based on an improved YOLOv7 for complex farmland environments. Combining the CSP bottleneck with a shifted-window Transformer self-attention mechanism, adding a detection branch, introducing the CBAM attention mechanism (sketched below), and adopting the Focal-EIoU loss function, these four improvements raised the model's mAP to 88.2%.
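For context, a generic CBAM block applies channel attention followed by spatial attention; the sketch below follows the original CBAM design (Woo et al.) and is not the paper's exact module.

```python
import torch
import torch.nn as nn

class CBAM(nn.Module):
    """Generic convolutional block attention module: channel then spatial attention."""
    def __init__(self, channels: int, reduction: int = 16, kernel_size: int = 7):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )
        self.spatial = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Channel attention: shared MLP over avg- and max-pooled descriptors
        avg = self.mlp(x.mean(dim=(2, 3)))
        mx = self.mlp(x.amax(dim=(2, 3)))
        x = x * torch.sigmoid(avg + mx)[:, :, None, None]
        # Spatial attention: conv over channel-wise average and max maps
        s = torch.cat([x.mean(dim=1, keepdim=True),
                       x.amax(dim=1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.spatial(s))
```

Wang et al. [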
15] proposed a single-stage anchor-free detection network, OSAF-Net, that is robust to the challenges of distinguishing similarly shaped pests and handling multi-scale pests, which otherwise lead to large numbers of false-negative detections. It achieved good detection results on both the CropPest24 and MPD2018 datasets. Duan et al. [
16] introduced the SENet module and the Soft-NMS algorithm into YOLOv4 to address the difficulty of recognizing corn pests and to improve detection accuracy; a minimal Soft-NMS sketch follows.
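The sketch below shows Gaussian Soft-NMS (Bodla et al.), which decays the scores of overlapping boxes instead of discarding them outright; the parameter values and helper name are illustrative.

```python
import torch

def soft_nms(boxes: torch.Tensor, scores: torch.Tensor,
             sigma: float = 0.5, score_thresh: float = 0.001) -> list:
    """Gaussian Soft-NMS sketch. boxes: (N, 4) as (x1, y1, x2, y2); scores: (N,)."""
    boxes, scores = boxes.clone().float(), scores.clone().float()
    idxs = torch.arange(len(scores))
    keep = []
    while scores.numel() > 0:
        top = scores.argmax()
        keep.append(idxs[top].item())
        box = boxes[top]
        # Remove the selected box from the working set
        mask = torch.ones_like(scores, dtype=torch.bool)
        mask[top] = False
        boxes, scores, idxs = boxes[mask], scores[mask], idxs[mask]
        if scores.numel() == 0:
            break
        # IoU of the selected box with all remaining boxes
        lt = torch.maximum(boxes[:, :2], box[:2])
        rb = torch.minimum(boxes[:, 2:], box[2:])
        inter = (rb - lt).clamp(min=0).prod(dim=1)
        area = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
        area_sel = (box[2] - box[0]) * (box[3] - box[1])
        iou = inter / (area + area_sel - inter)
        # Gaussian decay: higher overlap -> stronger score suppression
        scores = scores * torch.exp(-iou ** 2 / sigma)
        live = scores > score_thresh
        boxes, scores, idxs = boxes[live], scores[live], idxs[live]
    return keep
```

Choiński et al. [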
17] developed a deep learning method to automatically detect insects in photographs, extracting features directly from raw images for insect detection and counting. In image tests conducted in Poland, Germany, Norway, and other locations, the method achieved a precision of 0.819 and a recall of 0.826. Badgujar et al. [
18] proposed a deep learning (YOLO-series) system for real-time detection and identification of storage pests, providing an end-to-end framework for automated insect detection and identification in stored-product environments. Salamut et al. [
19] achieved automated detection of cherry fruit flies on sticky traps using deep learning. Among the five models employed, the best reached an average detection accuracy of 0.9, addressing the inefficiency of traditional monitoring methods. Bjerge et al. [
20] constructed a large-scale image dataset of insect taxa. Their YOLOv5 model addressed the challenge of detecting small insects against complex vegetation backgrounds, advancing the field of insect monitoring. In 2024, Zhou et al. [
21] proposed an improved YOLOv5-based detection algorithm for small, multi-category farmland pest targets, addressing the inconspicuous appearance of farmland pests and the predominance of small targets. It achieved a mAP of 79.4% on a publicly available dataset containing 28 categories of farmland pests. Li et al. [
22] proposed a lightweight, location-aware fast R-CNN (LLA-RCNN) method. To reduce computation, the model replaces the original backbone with MobileNetV3 and introduces the coordinate attention (CA) module to augment location information. In addition, the generalized intersection over union (GIoU) loss function (sketched below) and the region-of-interest alignment (RoI Align) technique are used to improve pest detection accuracy.
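The GIoU loss (Rezatofighi et al.) penalizes the empty area of the smallest box enclosing prediction and target, which keeps gradients informative even when boxes do not overlap; a minimal sketch:

```python
import torch

def giou_loss(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """Generalized IoU loss; boxes as (x1, y1, x2, y2), shape (N, 4)."""
    # Intersection
    lt = torch.maximum(pred[:, :2], target[:, :2])
    rb = torch.minimum(pred[:, 2:], target[:, 2:])
    inter = (rb - lt).clamp(min=0).prod(dim=1)
    # Union
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    union = area_p + area_t - inter
    iou = inter / union
    # Smallest enclosing box; its "wasted" area is the GIoU penalty
    lt_c = torch.minimum(pred[:, :2], target[:, :2])
    rb_c = torch.maximum(pred[:, 2:], target[:, 2:])
    area_c = (rb_c - lt_c).clamp(min=0).prod(dim=1)
    giou = iou - (area_c - union) / area_c
    return (1 - giou).mean()
```

Elmagzoub et al. [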
23] proposed a rice pest identification method integrating deep learning feature extraction with feature optimization. The ResNet50 model, combined with feature vectors extracted via LR and PCA, achieved an accuracy of 99.28% in identifying rice pests; a pipeline in this spirit is sketched below.
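The following sketch shows one way such a pipeline could be assembled, where we read "LR" as logistic regression; the dataset, dimensions, and component count are dummy placeholders, and the paper's actual pipeline may differ.

```python
import numpy as np
import torch
import torchvision.models as models
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression

# Hypothetical pipeline: ResNet50 features -> PCA reduction -> classifier.
backbone = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
backbone.fc = torch.nn.Identity()               # expose the 2048-d pooled features
backbone.eval()

@torch.no_grad()
def extract_features(images: torch.Tensor) -> np.ndarray:
    """images: (N, 3, 224, 224), ImageNet-normalized."""
    return backbone(images).numpy()

# Dummy stand-ins for a real labeled rice-pest dataset
images = torch.randn(16, 3, 224, 224)
labels = np.random.randint(0, 4, size=16)       # four hypothetical pest classes

feats = extract_features(images)                # (16, 2048)
pca = PCA(n_components=8).fit(feats)            # small only because N is small here
clf = LogisticRegression(max_iter=1000).fit(pca.transform(feats), labels)
print(clf.score(pca.transform(feats), labels))  # training accuracy on dummy data
```

Hacinas et al. [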
24] developed an edge computing application for low-end mobile devices that enables automated counting of cocoa pod borers (CPB) on sticky traps using an optimized YOLOv8 model. This study provides a feasible solution for low-cost pest monitoring. In 2025, Xiong et al. [
25] proposed QMDF-YOLO11, an improved YOLO11-based detection model for farmland pests in complex scenarios. It achieved a mAP of 94.57% on the RicePests dataset, effectively addressing the low accuracy of pest detection in complex backgrounds and small-target scenarios. Zhang et al. [
3] proposed a lightweight YOLOv7-based farmland pest detection algorithm to address the large parameter counts and computational cost of current pest detection models. It achieved 72.1% detection accuracy for farmland pests while keeping computation and parameter counts low. Liu et al. [
26] proposed an end-to-end pest detection method based on feature representation compensation (FRC) and region-like shape self-attention (RPSA). They designed a CSWin-based FRC module to compensate for the feature information lost during downsampling and an RPSA-based Transformer encoder to capture global information while enhancing the local information of the feature map. Rajeswaran et al. [
27] proposed a method for identifying live insects in agricultural scenes through motion analysis of consecutive video frames and compared five deep learning object detection models. Experiments demonstrated that the SSD_MobileNet_V2 model delivered the best performance, providing a lightweight solution for precision agricultural insect management; a generic frame-differencing sketch follows.
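A common baseline for such motion analysis is frame differencing; the OpenCV sketch below is a generic illustration under our own parameter choices, not the paper's method.

```python
import cv2
import numpy as np

def moving_regions(prev_gray: np.ndarray, cur_gray: np.ndarray,
                   thresh: int = 25, min_area: int = 50) -> list:
    """Frame-differencing sketch for motion-based insect localization.
    Inputs are consecutive grayscale uint8 frames of the same size."""
    diff = cv2.absdiff(prev_gray, cur_gray)              # per-pixel change
    _, mask = cv2.threshold(diff, thresh, 255, cv2.THRESH_BINARY)
    mask = cv2.dilate(mask, None, iterations=2)          # join fragmented blobs
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    # Return bounding boxes of blobs large enough to be insects
    return [cv2.boundingRect(c) for c in contours
            if cv2.contourArea(c) >= min_area]
```

Kargar et al. [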
28] proposed SemiY-Net, a compact deep learning model for insect segmentation and related image tasks, achieving strong pest detection and counting performance on microcontroller (MCU)-based boards. This research provides an edge computing solution for precision agriculture. Ong et al. [
29] investigated how sticky trap color and imaging equipment affect the effectiveness of deep learning in automatically identifying insects on sticky traps. They propose that building a stable and reliable automated insect monitoring system requires jointly optimizing trap color selection and network architecture design.