Article

YOLO-LCE: A Lightweight YOLOv8 Model for Agricultural Pest Detection

1 College of Computer Science & Engineering, Guangxi Normal University, Guilin 541004, China
2 Key Laboratory of Education Blockchain and Intelligent Technology, Ministry of Education, Guangxi Normal University, Guilin 541004, China
3 Institute of Agricultural Science and Technology Information, Shanghai Academy of Agricultural Sciences, Shanghai 201403, China
* Authors to whom correspondence should be addressed.
Agronomy 2025, 15(9), 2022; https://doi.org/10.3390/agronomy15092022
Submission received: 11 July 2025 / Revised: 13 August 2025 / Accepted: 19 August 2025 / Published: 22 August 2025
(This article belongs to the Section Pest and Disease Management)

Abstract

Agricultural pest detection through image analysis is a key technology in automated pest-monitoring systems. However, some existing pest detection models suffer from excessive model complexity. This study proposes YOLO-LCE, a lightweight model based on the YOLOv8 architecture for agricultural pest detection. Firstly, a Lightweight Complementary Residual (LCR) module is proposed to extract complementary features through a dual-branch structure. It enhances detection performance and reduces model complexity. Additionally, Efficient Partial Convolution (EPConv) is proposed as a downsampling operator. It adopts an asymmetric channel splitting strategy to efficiently utilize features. Furthermore, the Ghost module is introduced into the detection head to reduce computational overhead. Finally, WIoUv3 is used to further improve detection performance. YOLO-LCE is evaluated on the Pest24 dataset. Compared to the baseline model, YOLO-LCE achieves an mAP50 improvement of 1.7 percentage points, an mAP50-95 improvement of 0.4 percentage points, and a precision improvement of 0.5 percentage points. For computational efficiency, parameters are reduced by 43.9%, and GFLOPs are reduced by 33.3%. These metrics demonstrate that YOLO-LCE improves detection accuracy while reducing computational complexity, providing an effective solution for lightweight pest detection.

1. Introduction

Agricultural production faces many challenges, and pest infestation, which threatens food security, is among the most serious. Crop losses caused by pest damage affect farmers’ incomes and regional food supplies. The timely and accurate monitoring of agricultural pests is a critical step in Integrated Pest Management (IPM) strategies [1]. Therefore, efficient pest-monitoring methods have become an important research direction in modern agriculture.
Traditional monitoring methods mainly rely on professional technicians conducting field inspections or observing crop damage. These approaches work to some extent, but they are labor-intensive and struggle to detect initial infestations in time, which delays pest control. To address these limitations, automated pest monitoring has emerged as a solution, realized by deploying pest detection devices in agricultural fields. These devices can automatically capture and regularly monitor pest species and quantities through various trapping methods [2,3,4], including attractant lures, light traps, or sticky boards. The captured pests are photographed at regular intervals, and automated image analysis is used to identify pest species. The identification process can be implemented in two ways: images can be transmitted via a network to centralized servers for processing, or detection can be performed locally on the device with results uploaded to cloud servers. Local detection offers advantages such as reduced server computational load and lower network transmission requirements.
The core challenge of automated pest monitoring lies in accurate image analysis for pest identification. Earlier work applied traditional machine learning methods to pest classification. For example, Ebrahimi et al. [5] used region index and color index features with an SVM classifier to detect thrips in crop canopy images. Such traditional approaches, especially those that relied on manual feature engineering, struggled with the diversity of pest appearances, which made generalization difficult. Modern CNN-based object detection models are more suitable for pest monitoring, as they can automatically extract features without relying on manual feature engineering and provide both the location and category of pests in images.
Early applications of CNNs in pest detection mainly adopted two-stage detection architectures. These techniques include R-CNN [6] and its subsequent developments such as Faster R-CNN [7], Cascade R-CNN [8], and Mask R-CNN [9]. They work by generating region proposals and then classifying the proposed regions. Jiao et al. [10] proposed AF-RCNN for multi-category pest detection. This model integrates Fast R-CNN with an Anchor-Free Region Proposal Network to eliminate anchor dependencies. Wang et al. [11] addressed the foreground–background imbalance challenge in small crop pest detection by proposing S-RPN (Sampling-balanced Region Proposal Network). This approach utilizes sparse sampling strategies and centerness-based sample selection mechanisms to improve detection performance. Teng et al. [12] proposed MSR-RCNN by incorporating a multi-scale super-resolution feature enhancement module. This model achieved good performance on LLPD-26. Gao et al. [4] proposed an enhanced Cascade R-CNN model to detect small pests in yellow sticky board images. Although two-stage detection approaches generally provide high accuracy, they have excessively high model complexity, which makes them unfavorable for deployment on resource-constrained devices such as field monitoring equipment.
Single-stage detection frameworks, such as the Single Shot Multibox Detector (SSD) [13] and the YOLO series [14,15,16,17,18,19,20,21], are widely used due to their computational efficiency and accuracy. These characteristics have made them popular choices for pest detection applications. However, existing single-stage pest detection methods face two primary limitations: limited pest coverage and high model complexity.
Many current studies of single-stage models suffer from limited pest coverage. Li et al. [22] proposed YOLO-TP as a lightweight model for Lasioderma serricorne counting, employing the GIoU loss function, GSConv, and the PC2f structure; however, it focuses on only a single species. Li et al. [23] proposed YOLO-JD for jute disease and pest detection, integrating SCFEM, DSCFEM, and SPPM for effective feature extraction. Although this model covers eight types of jute diseases, it includes only two pest species. Addressing grain pest detection challenges, Lyu et al. [24] proposed a feature fusion SSD algorithm. They implemented a top–down strategy to combine multi-layer features and removed the block unfavorable to small object detection, but the method targets only five types of grain pests. Wang et al. [25] proposed Insect-YOLO, augmented with CBAM for crop insect detection, but it covers only seven agricultural pest species. Wang and Wang [26] proposed GAS-YOLOv8 as a lightweight YOLOv8 variant for mango pest and disease detection, incorporating a GhostHGNetv2 backbone, an AsDDet detection head, and a C2f-SE module, targeting 10 mango-related pests. Zhao et al. [27] proposed AC-YOLO as a multi-class detection model for stored grain pest applications, integrating the ECIoU loss function, CBAM, and ACmix, covering 12 types of grain pests.
Many studies focus primarily on improving accuracy without fully considering the lightweight requirements for practical deployment, leaving even single-stage models quite complex. Zhang et al. [28] proposed AgriPest-YOLO. This model employed a coordination and local attention mechanism and a grouping spatial pyramid pooling fast module. Although it achieved 71.3% mAP50 on the Pest24 dataset, it requires 16.2 GFLOPs, making it still unsuitable for resource-constrained deployment environments. Tian et al. [29] proposed MD-YOLO. This model deployed a DenseNet block and an adaptive attention module to enhance feature utilization. Although it achieved 86.2% mAP50, it has 126.8 M parameters and targets only three species of lepidopteran pests, with limited coverage scope. Dai et al. [30] proposed an enhanced YOLOv5m-based pest detection method. This method integrated the C3TR and SWinTR modules to extract global features. Additionally, it incorporated ResSPP and WConcat components for improved feature extraction and fusion. Although it achieved 96.4% mAP50, the dataset used was not focused on small targets, potentially reducing detection difficulty, and the model has a size of 38.1 MB.
This study proposes a lightweight YOLOv8 model for agricultural pest detection. Our research uses the Pest24 dataset for evaluation, which covers 24 of the pest categories designated for detection by the Ministry of Agriculture of China, making our model applicable to broader pest detection applications. Furthermore, our research seeks a balance between accuracy and efficiency, making the model size more suitable for deployment on resource-constrained devices, such as field monitoring equipment. The primary technical contributions include the following:
(1) We propose a Lightweight Complementary Residual (LCR) module. This module has two branches that respectively focus on extracting different types of features, forming feature complementarity. Feature extraction is implemented through depthwise convolution, thus maintaining lightweight design. Furthermore, we integrate the LCR into the C2f module to form C2f-LCR.
(2) We propose Efficient Partial Convolution (EPConv) based on PConv [31], a downsampling operator that adopts an asymmetric channel splitting strategy to efficiently utilize features. Additionally, EPConv incorporates the shortcut in ResNet-D [32] and the SE module [33].
(3) We introduce the Ghost module [34] to the detection head, which generates more feature maps through cheap operations while reducing computational cost.
(4) We adopt WIoUv3 [35] as the localization loss function, which improves detection performance through the dynamic non-monotonic gradient allocation mechanism.
These integrated modifications aim to produce an efficient and lightweight pest detection model suitable for agricultural applications.

2. Materials and Methods

2.1. Dataset

This study uses the Pest24 dataset [36], which is a benchmark dataset for agricultural pest detection applications. The dataset encompasses 25,378 images across 24 distinct crop pest species. These 24 pest species are selected from the 38 field crop pest categories designated for detection by the Ministry of Agriculture of China. The remaining 14 categories are excluded from this dataset due to insufficient instances for reliable training and evaluation.
Table 1 presents details regarding the 24 pest categories within the Pest24 dataset, encompassing identifiers and category names.
Figure 1 illustrates the instance count distribution across the 24 pest categories. The dataset exhibits significant class imbalance, with Anomala corpulenta (ID: 10) having the highest number of instances at 53,347, while Holotrichia oblita (ID: 18) has the fewest instances with only 108 samples.
The Pest24 dataset exhibits the following characteristics:
(1) The dataset is characterized by extremely small target scales, with pest relative scales primarily distributed between 0 and 0.01 [36], much smaller than traditional object detection datasets. As shown in Figure 2a,b, individual pests occupy relatively few pixels in the images, making feature extraction particularly challenging.
(2) The dataset contains some imprecise annotations. As illustrated in Figure 2a, several bounding boxes (marked with red circles) are relatively loose and do not tightly fit the pest boundaries. Since evaluating bounding box annotation accuracy involves subjective factors, we present here only a few examples that we consider to have inaccurate annotations.
(3) The dataset employs an incomplete annotation strategy, labeling only the 24 target pest categories while leaving other pest species in the images unlabeled. These unlabeled pests are referred to as non-target pests, creating additional detection challenges.
(4) Many different pest classes exhibit high visual similarity, which manifests in two forms. Similarity among target pests is demonstrated in Figure 2b, where the blue circles highlight two visually similar but different pest species—Bollworm (ID: 0) and Armyworm (ID: 8). Additionally, similarity between non-target and target pests is shown in Figure 2a, where Agriotes fuscicollis Miwa (ID: 4) appears similar to many unlabeled black insects in the background.
(5) The dataset contains numerous instances of pest adhesion and overlap, as evidenced in Figure 2b, where multiple pests are clustered together, making individual detection difficult.
(6) The dataset includes various environmental disturbances, such as illumination reflections. As shown in Figure 2b, green circles mark areas where pests are partially obscured by lighting reflections.

2.2. YOLO-LCE Network Architecture

The standard YOLOv8 model can process 640 × 640 RGB images and is composed of three main parts: the backbone, neck, and head. The backbone is primarily responsible for feature extraction. It adopts CBS modules for downsampling and employs multiple C2f modules to capture rich features. At the end of the backbone, a spatial pyramid pooling fast (SPPF) module is used to expand the receptive field and fuse features from different receptive fields. The neck adopts the Path Aggregation Network (PANet) structure. This design fuses low-level spatial details with high-level semantic features from the backbone, enhancing the model’s ability to detect objects at various scales. Finally, the detection head is decoupled, which allows two separate branches to focus on the classification and regression tasks respectively, improving detection accuracy. YOLOv8 includes multiple variants (n, s, m, l) with varying complexity levels.
YOLO-LCE is designed based on the YOLOv8n architecture for multi-class agricultural pest detection, capable of simultaneously localizing pest objects and assigning category labels. As shown in Figure 3, the model introduces targeted improvements while maintaining the core architecture of YOLOv8n.
To enhance feature representation capability through complementary features and reduce model complexity, YOLO-LCE employs the C2f-LCR modules to replace some of the original C2f modules in the network. Specifically, the last C2f module in the backbone network is replaced with C2f-LCR for high-level semantic feature extraction; meanwhile, all C2f modules preceding the three detection heads in the neck network are also replaced with C2f-LCR.
Additionally, the last convolutional layer in the backbone network and all convolutional layers in the neck network employ EPConv to achieve efficient downsampling with low computational complexity.
The detection head incorporates the Ghost module [34], which generates more feature maps through cheap operations.
In terms of loss function design, YOLO-LCE adopts WIoUv3 [35] to improve bounding box regression for pest targets.
Through these components, YOLO-LCE constructs a lightweight and efficient pest detection network that reduces computational resource requirements.

2.3. Lightweight Complementary Residual Module

In pest detection tasks, unlabeled non-target pests in the dataset have similar appearances to target pests, interfering with target pest detection. The C2f module in YOLOv8 employs the Bottleneck module for feature extraction. However, this module has high computational overhead, and the single feature extraction path lacks feature diversity.
To address these issues, this study proposes a Lightweight Complementary Residual (LCR) module, which designs a complementary dual-branch structure. This structure has two branches that focus on extracting different types of complementary features respectively. The first branch focuses on extracting stable features of pests. The second branch focuses on extracting discriminative features of pests. This design enhances the discriminative capability between target and non-target pests through complementary feature extraction while reducing model complexity.
As shown in Figure 4, given an input feature map $X$ with dimensions $C \times H \times W$, the LCR module first splits it equally along the channel dimension into two branches:
$$X_1, X_2 = \text{Split}(X)$$
where $X_1$ and $X_2$ each have dimensions $\frac{C}{2} \times H \times W$.
The two branches employ different pooling strategies for feature abstraction. Pooling is beneficial for detecting dense and even occluded pests, as it expands the receptive field through aggregation, providing more context for the next layer. The first branch uses average pooling:
$$Y_1 = \text{AvgPool}_{3 \times 3}(X_1)$$
The second branch uses max pooling:
$$Y_2 = \text{MaxPool}_{3 \times 3}(X_2)$$
Both pooling operations use a kernel size of $3 \times 3$, a stride of 1, and padding of 1. This ensures that the output feature maps maintain the same spatial dimensions as the input.
This dual-branch pooling design provides complementary feature foundations. Average pooling preserves smooth local responses. Max pooling preserves maximum local responses, highlighting potential discriminative information of similar pests.
Both branches then employ depthwise convolution for lightweight feature learning. Depthwise convolution conducts feature extraction independently on each channel. Compared to standard convolution, this approach reduces parameter complexity.
The depthwise convolution processing in the LCR module is as follows:
$$Z_1 = \text{DWConv}_{3 \times 3}(Y_1)$$
$$Z_2 = \text{DWConv}_{3 \times 3}(Y_2)$$
where $\text{DWConv}_{3 \times 3}$ represents a depthwise convolution operation with kernel size $3 \times 3$.
Through these depthwise convolutions, the first branch extracts stable features based on smooth responses. The second branch extracts discriminative features based on maximum responses. This helps the network better distinguish target pests from non-target pests.
These two complementary features are then concatenated along the channel dimension:
$$Z_{\text{cat}} = \text{Concat}(Z_1, Z_2)$$
To enhance feature learning capability and alleviate gradient vanishing problems, LCR adopts residual connections:
$$Z_{\text{res}} = Z_{\text{cat}} + X$$
Since depthwise convolution lacks channel interaction capability, a final $1 \times 1$ convolution is applied for channel mixing [37]:
$$Z = \text{Conv}_{1 \times 1}(Z_{\text{res}})$$
To leverage the advantages of the LCR module, this study integrates it into the YOLOv8 C2f structure to form the C2f-LCR module. As shown in Figure 5, C2f-LCR replaces the original Bottleneck structure in C2f with the LCR module.
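For readers who prefer code to equations, the following is a minimal PyTorch sketch of the LCR data flow described above (channel split, dual pooling, depthwise convolutions, residual addition, and 1 × 1 channel mixing). Layer names and the absence of normalization layers are our own illustrative choices, not the released implementation.

```python
import torch
import torch.nn as nn

class LCR(nn.Module):
    """Sketch of the Lightweight Complementary Residual module.

    Split -> (AvgPool + DWConv | MaxPool + DWConv) -> Concat -> +X -> 1x1 Conv.
    Names are illustrative, not taken from the official code.
    """
    def __init__(self, channels: int):
        super().__init__()
        half = channels // 2
        self.avg_pool = nn.AvgPool2d(3, stride=1, padding=1)   # smooth local responses
        self.max_pool = nn.MaxPool2d(3, stride=1, padding=1)   # salient local responses
        # Depthwise 3x3 convolutions (groups == channels) keep both branches lightweight.
        self.dw1 = nn.Conv2d(half, half, 3, padding=1, groups=half, bias=False)
        self.dw2 = nn.Conv2d(half, half, 3, padding=1, groups=half, bias=False)
        # 1x1 convolution mixes the two complementary halves across channels.
        self.mix = nn.Conv2d(channels, channels, 1, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x1, x2 = torch.chunk(x, 2, dim=1)          # equal channel split
        z1 = self.dw1(self.avg_pool(x1))           # stable features
        z2 = self.dw2(self.max_pool(x2))           # discriminative features
        z = torch.cat([z1, z2], dim=1) + x         # concatenation plus residual connection
        return self.mix(z)                         # channel mixing


if __name__ == "__main__":
    y = LCR(64)(torch.randn(1, 64, 80, 80))
    print(y.shape)  # torch.Size([1, 64, 80, 80])
```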

2.4. Efficient Partial Convolution

FasterNet [31] introduced Partial Convolution (PConv) to meet lightweight requirements. In this method, only a small part of the feature map is processed with convolution, while most of the feature map is passed directly to the output without processing in order to reduce computational complexity. However, the large number of unprocessed features leads to insufficient feature utilization, limiting the feature representation capability of the network.
To address these problems, this study proposes Efficient Partial Convolution (EPConv), a downsampling operator built on PConv. As shown in Figure 6, EPConv implements an asymmetric channel splitting strategy: 1/8 of the input features are processed through the main convolution branch (branch1) to extract primary pest features, while the remaining 7/8 are processed through the parameter-efficient group convolution branch (branch2) to extract auxiliary features. This design ensures complete feature utilization while reducing computational cost and model parameters.
Given an input feature map $X$ with dimensions $C \times H \times W$, EPConv splits it along the channel dimension into two parts:
$$X_1, X_2 = \text{Split}(X)$$
where $X_1$ has dimensions $\frac{C}{8} \times H \times W$, representing features processed by the branch1 main convolution, and $X_2$ has dimensions $\frac{7C}{8} \times H \times W$, representing features processed by the branch2 group convolution.
When used for downsampling with stride $s$ (typically 2), the processing of both branches is as follows:
$$Y_1 = \text{BN}(\text{Conv}_{3 \times 3}(X_1))$$
$$Y_2 = \text{BN}(\text{GroupConv}_{3 \times 3}(X_2))$$
where $\text{Conv}_{3 \times 3}$ represents a standard convolution with kernel size $3 \times 3$ for extracting main features, and $\text{GroupConv}_{3 \times 3}$ applies a group convolution with kernel size $3 \times 3$ for extracting auxiliary features. Both operations use stride $s$ for spatial dimension reduction and are followed by batch normalization (BN) [38]. The application of group convolution reduces computational complexity, achieving the lightweight design goal.
The outputs are then concatenated:
$$Y_{\text{cat}} = \text{Concat}(Y_1, Y_2)$$
To match the target output dimensions, a $1 \times 1$ convolution followed by BN is applied for channel projection:
$$Y_{\text{proj}} = \text{BN}(\text{Conv}_{1 \times 1}(Y_{\text{cat}}))$$
To enable residual learning during downsampling, EPConv adopts the shortcut from ResNet-D [32]:
$$Y_{\text{shortcut}} = \text{BN}(\text{Conv}_{1 \times 1}(\text{AvgPool}_{s \times s}(X)))$$
where $\text{AvgPool}_{s \times s}$ represents the average pooling operation with kernel size $s \times s$ and stride $s$. Subsequently, a $1 \times 1$ convolution adjusts the channel dimensions, followed by BN. This method conducts spatial dimension reduction through average pooling. It preserves more information during spatial downsampling, which is beneficial for small target detection.
The features are then combined through this residual connection:
$$Y_{\text{res}} = Y_{\text{proj}} + Y_{\text{shortcut}}$$
To focus on important feature channels, EPConv introduces the SE (Squeeze-and-Excitation) module [33]. The SE module processes the features as follows:
$$Y_{\text{se}} = \text{SE}(Y_{\text{res}})$$
The final output of EPConv is
$$Y = \text{SiLU}(Y_{\text{se}})$$
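A minimal PyTorch sketch of this EPConv pipeline is given below. The group count of the auxiliary branch and the SE reduction ratio are illustrative assumptions, since they are not specified in the text, and the code is not the authors' released implementation.

```python
import torch
import torch.nn as nn

class EPConv(nn.Module):
    """Sketch of Efficient Partial Convolution as a stride-2 downsampling operator.

    1/8 of the channels pass through a standard 3x3 convolution, the remaining 7/8 through
    a 3x3 group convolution; a ResNet-D style avgpool shortcut and an SE block follow.
    The group count and SE ratio are assumptions for illustration.
    """
    def __init__(self, in_ch: int, out_ch: int, stride: int = 2, groups: int = 4, se_ratio: int = 16):
        super().__init__()
        self.c_main = max(in_ch // 8, 1)           # channels for the main branch
        c_aux = in_ch - self.c_main                # channels for the auxiliary branch
        self.main = nn.Sequential(
            nn.Conv2d(self.c_main, self.c_main, 3, stride, 1, bias=False),
            nn.BatchNorm2d(self.c_main))
        self.aux = nn.Sequential(
            nn.Conv2d(c_aux, c_aux, 3, stride, 1, groups=groups, bias=False),
            nn.BatchNorm2d(c_aux))
        self.proj = nn.Sequential(                 # 1x1 projection to the target channel width
            nn.Conv2d(in_ch, out_ch, 1, bias=False),
            nn.BatchNorm2d(out_ch))
        self.shortcut = nn.Sequential(             # ResNet-D shortcut: avgpool + 1x1 conv + BN
            nn.AvgPool2d(stride, stride),
            nn.Conv2d(in_ch, out_ch, 1, bias=False),
            nn.BatchNorm2d(out_ch))
        self.se = nn.Sequential(                   # squeeze-and-excitation channel attention
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(out_ch, out_ch // se_ratio, 1),
            nn.SiLU(),
            nn.Conv2d(out_ch // se_ratio, out_ch, 1),
            nn.Sigmoid())
        self.act = nn.SiLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x1, x2 = torch.split(x, [self.c_main, x.shape[1] - self.c_main], dim=1)
        y = self.proj(torch.cat([self.main(x1), self.aux(x2)], dim=1))
        y = y + self.shortcut(x)                   # residual connection during downsampling
        return self.act(y * self.se(y))            # channel reweighting, then SiLU


if __name__ == "__main__":
    y = EPConv(128, 256)(torch.randn(1, 128, 40, 40))
    print(y.shape)  # torch.Size([1, 256, 20, 20])
```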

2.5. Ghost Module

The detection head usually contains multiple standard convolutional layers. Because it processes high-dimensional features, it typically accounts for a concentrated share of the model's computational complexity.
To reduce computational overhead, this study introduces the Ghost module [34] in YOLOv8’s detection head. As shown in Figure 7, the standard convolutions are replaced with Ghost modules.
The idea of the Ghost module is to generate ghost features through cheap operations, reducing computational cost.
According to Figure 8, for an input feature map $X$ with dimensions $C_1 \times H \times W$, where $C_1$ denotes the input channel number, the Ghost module processing procedure comprises the following stages:
First, half of the output features are generated through a $3 \times 3$ standard convolution:
$$Y_1 = \text{Conv}_{3 \times 3}(X)$$
where $Y_1$ has dimensions $\frac{C_2}{2} \times H \times W$, $C_2$ is the output channel number, and $H \times W$ indicates the output spatial size.
Subsequently, ghost features are generated via a $5 \times 5$ depthwise convolution as the cheap operation:
$$Y_2 = \text{DWConv}_{5 \times 5}(Y_1)$$
where $Y_2$ has dimensions $\frac{C_2}{2} \times H \times W$.
Finally, these two feature parts are concatenated along the channel dimension to form the complete output:
$$Y = \text{Concat}(Y_1, Y_2)$$
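The Ghost module described above can be sketched in PyTorch as follows; the normalization and activation placed after each convolution are assumptions made for a self-contained example rather than details taken from the original design.

```python
import torch
import torch.nn as nn

class GhostModule(nn.Module):
    """Sketch of the Ghost module used in the detection head.

    Half of the output channels come from a 3x3 standard convolution; the other half are
    "ghost" features produced by a cheap 5x5 depthwise convolution on the first half.
    """
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        half = out_ch // 2
        self.primary = nn.Sequential(              # intrinsic features
            nn.Conv2d(in_ch, half, 3, padding=1, bias=False),
            nn.BatchNorm2d(half),
            nn.SiLU())
        self.cheap = nn.Sequential(                # depthwise 5x5 "cheap operation"
            nn.Conv2d(half, half, 5, padding=2, groups=half, bias=False),
            nn.BatchNorm2d(half),
            nn.SiLU())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        y1 = self.primary(x)                       # half the output via standard convolution
        y2 = self.cheap(y1)                        # ghost features from the first half
        return torch.cat([y1, y2], dim=1)


if __name__ == "__main__":
    print(GhostModule(256, 256)(torch.randn(1, 256, 20, 20)).shape)  # torch.Size([1, 256, 20, 20])
```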
To quantify the computational efficiency advantages of the Ghost module, this study analyzes the theoretical computational complexity. The analysis compares the standard convolution and the Ghost module. Note that the following computational complexity analysis mainly considers multiplication operations in convolution, ignoring the bias term computational overhead. The 3 × 3 standard convolution operation can be formulated as
$$Y = X * W + b$$
where $Y$ has dimensions $C_2 \times H \times W$ and represents the output feature map, $W$ has dimensions $C_2 \times C_1 \times 3 \times 3$ and contains the convolution kernel parameters, and $b$ is the bias term. The computational complexity of standard convolution is
$$C_{\text{std}} = C_1 \times C_2 \times 3^2 \times H \times W$$
The computational cost of the Ghost module consists of two parts. The first is the computational cost of the main path $3 \times 3$ convolution:
$$C_{\text{main}} = C_1 \times \frac{C_2}{2} \times 3^2 \times H \times W$$
The second is the computational cost of the ghost path $5 \times 5$ depthwise convolution:
$$C_{\text{ghost}} = \frac{C_2}{2} \times 5^2 \times H \times W$$
The total computational cost of the Ghost module is
$$C_{\text{total}} = C_{\text{main}} + C_{\text{ghost}}$$
The computational efficiency ratio $\eta$ can be expressed as
$$\eta = \frac{C_{\text{std}}}{C_{\text{total}}} = \frac{18 C_1}{9 C_1 + 25}$$
In the deep layers where detection heads are located, the input channel number $C_1$ is typically large. For theoretical analysis, as $C_1$ approaches infinity, the efficiency ratio approaches the limit value:
$$\lim_{C_1 \to \infty} \eta = \lim_{C_1 \to \infty} \frac{18}{9 + \frac{25}{C_1}} = 2$$
These theoretical analyses show that as the number of channels increases, the computational efficiency advantage of the Ghost module becomes greater.
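As a quick numeric illustration of this trend, the ratio can be evaluated for a few representative channel counts (the values below are rough, rounded figures):

```python
def ghost_efficiency_ratio(c1: int) -> float:
    """Efficiency ratio eta = 18*C1 / (9*C1 + 25) from the analysis above."""
    return 18 * c1 / (9 * c1 + 25)

for c1 in (64, 128, 256):
    print(c1, ghost_efficiency_ratio(c1))
# approximately 1.92, 1.96, and 1.98, approaching the theoretical limit of 2
```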

2.6. WIoUv3

The bounding box loss function plays an important role in training object detection models. YOLOv8 adopts CIoU [39] as the bounding box regression loss. The CIoU loss function considers the overlap area, the central point distance, and the consistency of aspect ratios, but it still has limitations. The pest dataset used in this study inevitably contains some low-quality samples, yet CIoU applies the same calculation to anchor boxes of all qualities and therefore cannot dynamically optimize bounding box regression.
According to Figure 9, CIoU loss is formulated as
$$L_{\text{CIoU}} = 1 - \text{IoU} + \frac{\rho^2(b_{p}, b_{gt})}{c^2} + \alpha v$$
where IoU denotes the overlap ratio between the predicted box and the ground truth box, $\rho^2(b_{p}, b_{gt})$ represents the squared distance between the box centers, $c$ is the diagonal length of the smallest enclosing rectangle, $v$ quantifies the consistency of aspect ratios, and $\alpha$ is a trade-off coefficient.
To reduce the impact of low-quality samples on training, this study introduces the WIoUv3 [35] loss function. Since WIoUv3 is improved based on WIoUv1, the details of WIoUv1 are first presented:
$$L_{\text{IoU}} = 1 - \text{IoU}$$
$$R_{\text{WIoU}} = \exp\!\left( \frac{(x_{p} - x_{gt})^2 + (y_{p} - y_{gt})^2}{\left( W_g^2 + H_g^2 \right)^{*}} \right)$$
$$L_{\text{WIoUv1}} = R_{\text{WIoU}} \, L_{\text{IoU}}$$
where $W_g$ and $H_g$ represent the width and height of the smallest enclosing box. The superscript $*$ indicates that this term does not participate in gradient calculation during backpropagation.
WIoUv3 is built on WIoUv1 and adds a dynamic non-monotonic gradient allocation mechanism:
$$\beta = \frac{L_{\text{IoU}}^{*}}{\overline{L}_{\text{IoU}}} \in [0, +\infty)$$
$$r = \frac{\beta}{\delta \, \alpha^{\beta - \delta}}$$
$$L_{\text{WIoUv3}} = r \, L_{\text{WIoUv1}}$$
where $\beta$ represents the sample anomaly degree, $\overline{L}_{\text{IoU}}$ is the running mean of the IoU loss, and $\delta$ and $\alpha$ are hyperparameters.
This non-monotonic design allows WIoUv3 to allocate small gradient gain to high-quality anchor boxes with low anomaly degrees. It also allocates small gradient gain to low-quality anchor boxes with high anomaly degrees. This makes the model focus on optimizing ordinary-quality anchor boxes instead of extreme cases, weakening the harmful gradients generated by low-quality samples. Therefore, the model can achieve better performance.
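A simplified, self-contained sketch of this WIoUv3 computation for axis-aligned boxes in (x1, y1, x2, y2) format is shown below. The running mean of the IoU loss is passed in as a plain number, and the detach() calls stand in for the starred (gradient-free) terms; this illustrates the mechanism and does not reproduce the exact training-time bookkeeping.

```python
import torch

def wiou_v3_loss(pred, target, iou_mean, alpha: float = 1.7, delta: float = 2.7):
    """Simplified WIoUv3 for boxes given as (x1, y1, x2, y2) tensors of shape (N, 4)."""
    # IoU between predicted and ground truth boxes.
    lt = torch.max(pred[:, :2], target[:, :2])
    rb = torch.min(pred[:, 2:], target[:, 2:])
    wh = (rb - lt).clamp(min=0)
    inter = wh[:, 0] * wh[:, 1]
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    iou = inter / (area_p + area_t - inter + 1e-7)
    l_iou = 1.0 - iou

    # Distance penalty R_WIoU using the smallest enclosing box (denominator detached).
    enc_wh = torch.max(pred[:, 2:], target[:, 2:]) - torch.min(pred[:, :2], target[:, :2])
    wg, hg = enc_wh[:, 0], enc_wh[:, 1]
    cxp, cyp = (pred[:, 0] + pred[:, 2]) / 2, (pred[:, 1] + pred[:, 3]) / 2
    cxt, cyt = (target[:, 0] + target[:, 2]) / 2, (target[:, 1] + target[:, 3]) / 2
    r_wiou = torch.exp(((cxp - cxt) ** 2 + (cyp - cyt) ** 2) /
                       (wg ** 2 + hg ** 2 + 1e-7).detach())

    # Dynamic non-monotonic focusing coefficient.
    beta = l_iou.detach() / iou_mean            # anomaly degree
    r = beta / (delta * alpha ** (beta - delta))
    return (r * r_wiou * l_iou).mean()


if __name__ == "__main__":
    p = torch.tensor([[10.0, 10.0, 50.0, 50.0]])
    t = torch.tensor([[12.0, 12.0, 48.0, 52.0]])
    print(wiou_v3_loss(p, t, iou_mean=0.3))
```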

3. Results

3.1. Experimental Setup

For this study, the Pest24 dataset was randomly partitioned into training, validation, and testing subsets using a 6:2:2 distribution ratio based on image count. All experimental work was conducted on a computational server. The server is equipped with Intel Xeon Gold 6430 CPU and NVIDIA GeForce RTX 4090 GPU, operating under Ubuntu 22.04.3 with CUDA version 12.1.
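A 6:2:2 random split over image files can be reproduced with a short script such as the following sketch; the directory layout and file extension are assumptions, and the exact split lists used in this study are provided in the Data Availability Statement.

```python
import random
from pathlib import Path

def split_dataset(image_dir: str, seed: int = 0):
    """Randomly split image files into train/val/test subsets with a 6:2:2 ratio."""
    images = sorted(Path(image_dir).glob("*.jpg"))   # assumes JPEG images in one directory
    random.Random(seed).shuffle(images)
    n_train, n_val = int(0.6 * len(images)), int(0.2 * len(images))
    return (images[:n_train],
            images[n_train:n_train + n_val],
            images[n_train + n_val:])

train, val, test = split_dataset("Pest24/images")
print(len(train), len(val), len(test))
```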
The Faster R-CNN model is implemented with MMDetection. Model training employed the hyperparameter settings detailed in Table 2. The “Other YOLO Series” column represents the settings for YOLOv4-tiny, YOLOv5n, YOLOv7-tiny, YOLOv8n, YOLOv10n, YOLOv11n, and YOLO-LCE.
For data augmentation, all models are trained using their respective default settings. The proposed YOLO-LCE follows the default data augmentation strategy [40] of YOLOv8, with the specific enabled techniques detailed in Table 3. Additionally, the WIoUv3 loss function employed in YOLO-LCE uses the following hyperparameters: α = 1.7 and δ = 2.7.

3.2. Evaluation Metrics

This study conducts a quantitative assessment of the enhanced model across two aspects: detection performance and model efficiency.
The detection performance metrics [41,42] include precision, recall, AP50, mAP50, and mAP50-95. Precision and recall are calculated based on the counts of true positives (TPs), false positives (FPs), and false negatives (FNs). First, only predictions with a confidence score above a specific threshold are considered for evaluation. Then, these filtered predictions are categorized as follows. A prediction is a TP if it correctly identifies a pest’s species and its bounding box achieves an Intersection over Union (IoU) of at least the predefined IoU threshold. A prediction is an FP if it is incorrect, either due to a class mismatch or an IoU below the IoU threshold. An FN represents a pest that was present but not detected by the model with sufficient confidence.
Based on these counts, precision and recall are defined as
$$\text{Precision} = \frac{\text{TP}}{\text{TP} + \text{FP}}$$
$$\text{Recall} = \frac{\text{TP}}{\text{TP} + \text{FN}}$$
For the $i$-th pest category, its average precision is defined as
$$\text{AP}_i = \int_0^1 P_i(r)\, dr$$
where $P_i(r)$ represents the precision at recall $r$ for that category. AP50 is calculated when the IoU threshold is set to 0.5.
The mean Average Precision (mAP) represents the overall detection performance across all categories. mAP is calculated as
$$\text{mAP} = \frac{1}{k} \sum_{i=1}^{k} \text{AP}_i$$
where $k = 24$ is the total number of pest categories. This study uses two primary mAP metrics. The first, mAP50, is calculated at a single IoU threshold of 0.5. The second, mAP50-95, is the average mAP across ten IoU thresholds, ranging from 0.5 to 0.95 in increments of 0.05.
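The following short sketch illustrates how the AP of one category can be computed from a precision–recall curve using all-point interpolation; it is an illustrative implementation, not the exact evaluator of the YOLO toolchain.

```python
import numpy as np

def average_precision(recall: np.ndarray, precision: np.ndarray) -> float:
    """All-point interpolated AP: area under the precision-recall curve."""
    r = np.concatenate(([0.0], recall, [1.0]))
    p = np.concatenate(([0.0], precision, [0.0]))
    # Make precision monotonically non-increasing from right to left.
    p = np.maximum.accumulate(p[::-1])[::-1]
    # Sum rectangle areas where recall changes.
    idx = np.where(r[1:] != r[:-1])[0]
    return float(np.sum((r[idx + 1] - r[idx]) * p[idx + 1]))

# Toy example: three detections for one pest class yield this PR curve.
recall = np.array([0.33, 0.66, 0.66])
precision = np.array([1.00, 1.00, 0.66])
print(average_precision(recall, precision))  # approximately 0.66
```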
The model efficiency metrics include three measures. Params (parameters) [43] reflect model complexity, Model Size (MB) indicates storage requirements, and GFLOPs (Giga Floating Point Operations) [43] represent computational complexity.

3.3. Comparison Experiments with Other Models

To evaluate the comprehensive performance of the improved algorithm, this study compares the proposed method with multiple advanced object detection models on the same dataset. The comparison includes the two-stage detector (Faster R-CNN) [7] and multiple lightweight versions of the YOLO series [17,18,19,21]. The comparison outcomes are shown in Table 4.
The experimental results show clear performance differences across detection architectures. Faster R-CNN, as a representative of traditional two-stage detectors, achieves only 43.2% mAP50 while requiring 41.47 M parameters and 91.0 GFLOPs. This indicates that the two-stage architecture not only fails to achieve high detection accuracy but also introduces substantial model parameters and computational cost.
Among early YOLO models, YOLOv4-tiny achieves 55.1% mAP50 with 5.93 M parameters and 16.3 GFLOPs. YOLOv5n outperforms it, achieving 57.8% mAP50 with only 1.79 M parameters and 4.2 GFLOPs. YOLOv6n exhibits slightly lower performance compared to YOLOv5n, achieving 57.5% mAP50 using 4.63 M parameters and 11.4 GFLOPs. YOLOv7-tiny shows further improvements over YOLOv5n, reaching 59.2% mAP50 but requiring 6.07 M parameters and 13.2 GFLOPs.
Recent YOLO models present a better balance between performance and efficiency. YOLOv8n achieves 62.2% mAP50 with 3.01 M parameters and 8.1 GFLOPs. YOLOv11n attains the same 62.2% mAP50 with fewer parameters of 2.59 M and 6.3 GFLOPs. YOLOv10n achieves 61.8% mAP50 with 2.70 M parameters and 8.3 GFLOPs.
YOLO-LCE achieves the highest mAP50 of 63.9% among all evaluated models. It uses only 1.69 M parameters, 3.69 MB model size, and 5.4 GFLOPs. Compared to YOLOv8n, YOLO-LCE improves mAP50 by 1.7 percentage points. It also reduces parameters by 43.9%, model size by 41.1%, and GFLOPs by 33.3%. When compared to YOLOv11n, YOLO-LCE achieves the same 1.7 percentage point improvement in mAP50. It also reduces parameters by 34.7%, model size by 32.7%, and GFLOPs by 14.3%.
On the stricter mAP50-95 metric, YOLO-LCE achieves the highest performance of 39.1% among all compared models, indicating that YOLO-LCE has better localization precision. Regarding precision and recall analysis, compared to YOLOv7-tiny, YOLO-LCE has lower precision by 4.0 percentage points, achieving 69.3% compared to YOLOv7-tiny’s 73.3%, but demonstrates higher recall by 4.2 percentage points with 60.3% compared to YOLOv7-tiny’s 56.1%. YOLOv5n demonstrates the highest precision at 76.2%, but its recall of 52.7% is 7.6 percentage points lower than YOLO-LCE’s 60.3%. YOLOv4-tiny achieves the highest recall at 65.5%, but its precision of 35.3% is significantly lower than YOLO-LCE’s 69.3% by 34.0 percentage points. YOLO-LCE surpasses Faster R-CNN, YOLOv10n, and YOLOv11n in precision and recall. Additionally, YOLO-LCE achieves the same recall as YOLOv8n while improving precision by 0.5 percentage points.

3.4. Ablation Experiments

To validate the effectiveness of each component, this study conducted ablation experiments by progressively integrating each component into the baseline YOLOv8n according to the YOLO-LCE design. The evaluation outcomes are shown in Table 5.
The introduction of the C2f-LCR module reduces parameters and GFLOPs by 19.9% and 11.1%, respectively. The reduction in parameters and GFLOPs is primarily due to the use of depthwise convolutions. These convolutions perform independent computations on each channel. Compared to standard convolutions, they require fewer parameters and lower computational overhead. Although recall decreases from 60.3% to 59.1%, the module significantly improves precision from 68.8% to 75.6%, an increase of 6.8 percentage points. More importantly, mAP50, as a comprehensive metric that considers both precision and recall, improves by 0.8 percentage points from 62.2% to 63.0%. This demonstrates that the comprehensive detection performance is enhanced. This improvement can be attributed to complementary features that enhance feature representation, thereby improving comprehensive detection capability.
The EPConv continues to reduce parameters to 2.05 M and GFLOPs to 6.8. This indicates that the asymmetric channel splitting strategy reduces both parameters and computational complexity while ensuring complete feature utilization. Additionally, the shortcut based on average pooling [32] preserves more pest details. SE attention [33] focuses on important channels. These strategies enable the model to maintain mAP50 at 63.1% and mAP50-95 at 38.8%. Although EPConv causes precision to decrease from 75.6% to 74.1%, recall improves from 59.1% to 60.1%.
The introduction of the Ghost module [34] generates more ghost features through cheap operations. This reduces model complexity with parameters reduced to 1.69 M and GFLOPs to 5.4. Although mAP50 slightly decreases to 62.8%, mAP50-95 remains at 38.8%. Moreover, the 0.3% mAP50 loss is exchanged for a 20.6% reduction in GFLOPs. Additionally, although recall drops to 58.4%, this module enables the model to achieve the highest precision of 76.3%.
The adoption of the WIoUv3 loss function [35] enables recall to recover to the same level as the baseline at 60.3%. Although precision decreases compared to the previous stage, it still improves by 0.5 percentage points compared to the baseline, reaching 69.3%. In terms of mAP metrics that comprehensively consider precision and recall, it enables the final model to achieve optimal mAP50 and mAP50-95 performance. mAP50 and mAP50-95 improve by 1.1 percentage points and 0.3 percentage points respectively compared to the previous stage, reaching 63.9% and 39.1%. This indicates that WIoUv3 can optimize overall detection performance through its dynamic non-monotonic gradient allocation mechanism.
The ablation experiments verify the effectiveness of the integrated components. Each component works synergistically to achieve the goals of detection performance improvement and lightweight design. Through integrating these components into YOLOv8n, YOLO-LCE is constructed. YOLO-LCE achieves 63.9% mAP50 and 39.1% mAP50-95 while reducing parameters by 43.9% and computational cost by 33.3% in GFLOPs compared to baseline YOLOv8n.

3.5. Per-Class AP50 Comparison

This study conducted per-class AP50 analysis comparing YOLOv8n and YOLO-LCE. Table 6 presents the AP50 values for each pest category, ranked by improvement magnitude in descending order.
The results show notable performance variations across different pest categories, with YOLO-LCE achieving the highest AP50 for Gryllotalpa orientalis (97.9%) and the lowest for Rice planthopper (1.49%). Holotrichia oblita achieves the largest improvement of 13.1 percentage points, followed by Nematode trench with 8.4 percentage points. However, both categories have limited test instances (29 and 30 respectively), which affects the reliability of these improvements.
Categories with moderate sample sizes demonstrate more reliable improvements. Rice Leaf Roller (243 instances) and Stem borer (384 instances) show consistent gains of 5.2% and 4.7% respectively. These improvements are more statistically meaningful due to adequate sample representation.
For pest categories with large test populations, the results show stable performance patterns. Anomala corpulenta (10,533 instances), Athetis lepigone (6000 instances), and Bollworm (5496 instances) maintain strong detection rates with modest improvements. These categories provide the most reliable evidence of model performance due to sufficient statistical power.
YOLO-LCE improves or maintains performance for 20 out of 24 categories. The exceptions include Eight-character tiger, which decreases by 8.51 percentage points; however, this category has only 30 test instances, so the decline may be attributable to the limited sample size rather than a genuine model weakness. Categories with substantial test instances show improvements, demonstrating that our model achieves enhanced detection accuracy while maintaining a lightweight design.

3.6. Comparison of EPConv Channel Splitting Ratios

To determine the optimal channel splitting ratio for EPConv, we conducted experiments with different splitting strategies. As shown in Table 7, this study evaluated several ratios under the same experimental framework.
As demonstrated in Table 7, the 1:7 channel splitting ratio achieves the highest mAP50 of 63.1% and mAP50-95 of 38.8% while maintaining the lowest parameter count of 2.05 M. It also attains the highest precision and recall, indicating that the 1:7 ratio delivers the best overall performance among the three experimental ratios. It should be noted that this study did not experiment with ratios smaller than 1:7. This is because excessively small ratios would cause the standard convolution component to be overwhelmed by group convolution, which violates the asymmetric design concept.

3.7. Detection Results Visualization Analysis

To validate the capability of the LCR module in enhancing discrimination between target pests and non-target pests, this study conducted visualization comparison analysis. Representative test images were selected for this analysis. Figure 10 shows the detection result comparisons between YOLOv8n and YOLOv8n integrated with C2f-LCR. Each row displays the same test image processed by different methods.
The visualization results demonstrate clear improvements after integrating the C2f-LCR module. In the first comparison group, the enhanced model reduces three false detections. In the second group, the enhanced model shows two fewer false detections. It also eliminates one missed detection and one classification error. In the third group, the enhanced model reduces three false detections and one missed detection. In the fourth group, the enhanced model produces one false detection that YOLOv8n does not have, but it still achieves a net reduction of two false detections and one missed detection compared to YOLOv8n. In the fifth group, the enhanced model reduces four false detections.
Overall, YOLOv8n integrated with the C2f-LCR module reduces false detections across different test scenarios. This performance is better than YOLOv8n. These improvements demonstrate that the introduction of the LCR module enhances the model’s capability to discriminate between target pests and non-target pests.

3.8. Heatmap Analysis

To intuitively observe the attention distribution of YOLO-LCE on pest targets, heatmap visualization analysis was conducted. The analysis was conducted on both YOLOv8n and YOLO-LCE models.
As shown in Figure 11, performance differences are revealed across three representative scenarios. Each row displays the same test image processed by different models. In the first comparison group, which represents a clustered pest detection scenario, YOLOv8n fails to focus on some target pests. In contrast, YOLO-LCE focuses on more target pests. The second comparison group shows that YOLO-LCE generates more intense attention regions compared to YOLOv8n. In the third comparison group, YOLOv8n incorrectly focuses on non-target pest regions. In contrast, YOLO-LCE partially avoids this erroneous focus. Overall, the heatmap visualization demonstrates that YOLO-LCE can better focus on pest targets.

4. Discussion

4.1. Review of Model Optimization

4.1.1. LCR Module Design Philosophy

Non-target similar pests in the dataset cause detection interference. These unlabeled pests are easily misidentified as target species. Lightweight modules have constrained feature extraction capabilities. This is due to their limited parameters and computational resources. This interference makes it difficult to design lightweight modules that maintain effective detection performance.
The proposed LCR module utilizes parameter-free pooling operations to provide different types of feature foundations. This is similar to CBAM [44], which utilizes max pooling and average pooling operations to generate attention weights; however, our LCR module employs these two pooling operations for a different purpose, namely to expand the receptive field and to provide two distinct feature foundations. This is followed by lightweight depthwise convolution for feature extraction. One branch focuses on extracting stable features of pests, while the other extracts discriminative features. Since depthwise convolution has no channel interaction, we introduce a 1 × 1 convolution at the end of the module specifically for mixing these two types of features. These designs achieve a lightweight implementation while enhancing feature representation capability through complementary feature extraction, thereby better distinguishing similar pests.

4.1.2. EPConv Design Philosophy and Shortcut Strategy Analysis

Considering the complexity of pest features, the proposed EPConv improves upon PConv [31]. Small pests typically occupy few pixels in images and exhibit complex features. While PConv reduces computational cost by processing only a subset of features, this approach leads to insufficient feature extraction capability. EPConv applies lightweight group convolution to the previously unprocessed branch, ensuring that all features are utilized. The SE module [33] is introduced to focus attention on important channels. Furthermore, the shortcut using average pooling [32] is adopted, and the reasons for this will be analyzed in detail below.
Since EPConv serves as a downsampling operator, input and output feature maps typically have different dimensions. To address the dimensional mismatch in residual connections, various shortcut connection strategies have been proposed.
ResNet [45] uses 1 × 1 strided convolution to construct a projection shortcut. ResNet-D [32] uses avgpool for spatial downsampling. iResNet [46] proposes an improved projection shortcut, which uses max pooling for spatial downsampling.
To evaluate the effectiveness of these shortcut strategies for pest target detection, this study designs targeted comparison experiments.
This experiment is built upon the YOLOv8n framework. With the introduction of the C2f-LCR module, this evaluation compares different shortcut strategies in EPConv; a construction sketch of these variants follows the list:
  • S1: No short connection.
  • S2 (ResNet [45]): Uses 1 × 1 convolution with stride 2 for direct spatial downsampling and channel dimension adjustment, followed by BN.
  • S3 (iResNet [46]): Uses 3 × 3 max pooling with stride 2 and padding 1 for spatial downsampling, followed by 1 × 1 convolution with stride 1 for channel dimension adjustment and BN.
  • S4 (iResNet [46]): Uses 2 × 2 max pooling with stride 2, followed by 1 × 1 convolution with stride 1 for channel dimension adjustment and BN.
  • S5 (ResNet-D [32]): Uses 2 × 2 average pooling with stride 2, followed by 1 × 1 convolution with stride 1 and BN.
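A construction sketch of these shortcut variants is given below; the helper name and channel arguments are illustrative, and S1 simply omits the shortcut.

```python
import torch.nn as nn

def make_shortcut(variant: str, in_ch: int, out_ch: int, stride: int = 2) -> nn.Module:
    """Illustrative constructions of the shortcut variants S2-S5 compared in Table 8."""
    if variant == "S2":      # ResNet: strided 1x1 convolution (skip sampling)
        return nn.Sequential(nn.Conv2d(in_ch, out_ch, 1, stride, bias=False),
                             nn.BatchNorm2d(out_ch))
    if variant == "S3":      # iResNet: 3x3 max pooling, then 1x1 convolution
        return nn.Sequential(nn.MaxPool2d(3, stride, 1),
                             nn.Conv2d(in_ch, out_ch, 1, bias=False),
                             nn.BatchNorm2d(out_ch))
    if variant == "S4":      # iResNet: 2x2 max pooling, then 1x1 convolution
        return nn.Sequential(nn.MaxPool2d(2, stride),
                             nn.Conv2d(in_ch, out_ch, 1, bias=False),
                             nn.BatchNorm2d(out_ch))
    if variant == "S5":      # ResNet-D: 2x2 average pooling, then 1x1 convolution
        return nn.Sequential(nn.AvgPool2d(2, stride),
                             nn.Conv2d(in_ch, out_ch, 1, bias=False),
                             nn.BatchNorm2d(out_ch))
    raise ValueError("S1 uses no shortcut; omit the residual addition entirely")
```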
As shown in Table 8, the evaluation outcomes demonstrate that S1 shows the lowest performance at 62.0% mAP50. This confirms the importance of residual connections. Among the projection shortcut strategies, S5 achieves the best performance with 63.1% mAP50. It improves by 0.5, 0.3, and 0.7 percentage points compared to S2, S3, and S4, respectively.
The performance differences can be attributed to varying degrees of information loss during spatial downsampling. S2 employs 1 × 1 convolution with stride 2. It conducts skip sampling and discards 75% of feature elements (for feature maps whose height and width are even numbers) [46]. S3 and S4 use max pooling that only retains regionally maximum values. This results in loss of information important for small pest targets.
In contrast, S5 employs average pooling. This utilizes all feature elements in each pooling window rather than selective sampling. Compared to other methods, this design reduces information loss from previous layer features. This is beneficial for small pest targets with limited pixel coverage, enabling the module to perform better residual learning.

4.1.3. Incorporating Ghost Module into Detection Head

Some pest detection works adopt GhostNet as the backbone. Qi et al. [47] proposed BF-YOLO, evaluated on the same Pest24 dataset as our study. This model used YOLOv5 as the baseline and adopted GhostNet as the lightweight backbone. In their backbone comparison experiments, GhostNet achieved a 57.2% reduction in GFLOPs compared to the original CSPDarknet-53 but at the cost of a 2.7% mAP50 decrease. Similarly, Xiao et al. [3] also incorporated GhostNet in YOLOv5 for agricultural pest detection. In their ablation studies, introducing GhostNet resulted in a 48.7% reduction in GFLOPs but caused a 4.1% drop in mAP. These studies demonstrate that generating feature maps through cheap operations can provide substantial computational efficiency gains, but directly introducing GhostNet as the backbone may lead to significant detection performance loss.
Considering these factors, we only introduce the Ghost module from GhostNet into the detection head. Unlike YOLOv5’s coupled detection head, YOLOv8’s detection head adopts a decoupled structure, which brings substantial parameters and computational overhead. We believe that applying the Ghost module in this computationally intensive region can also leverage the advantages of cheap operations. Our ablation experiments (Table 5) also demonstrate that introducing the Ghost module reduces GFLOPs by 20.6% while only decreasing mAP50 by 0.3%. Compared to directly replacing the backbone, our approach of incorporating the Ghost module in the detection head achieves a better trade-off between accuracy and efficiency.

4.1.4. WIoU Loss Function Version Selection

Since the CIoU [39] cannot dynamically allocate gradients according to anchor box quality, we adopt WIoUv3 [35] as the bounding box loss function. To demonstrate the superiority of WIoUv3, we compare three versions of WIoU.
As shown in Table 9, WIoUv1 has relatively limited detection performance. This is because WIoUv1 only uses an attention mechanism to adjust the loss, lacking an effective focusing mechanism to handle anchor boxes of different qualities. Specifically, WIoUv1 achieves 63.0% mAP50 and 38.6% mAP50-95, and the overall detection performance remains suboptimal. WIoUv2 adds a monotonic focusing mechanism, but it tends to excessively focus on low-quality anchor boxes. This potentially leads to harmful gradient amplification for low-quality anchor boxes, thus resulting in inconsistent performance across different IoU thresholds. The WIoUv2 mAP50 drops to 62.7%, a decrease of 0.3 percentage points compared to WIoUv1, while its mAP50-95 improves slightly to 38.8%, representing a 0.2 percentage point increase. This suggests that the monotonic focusing mechanism may benefit stricter IoU evaluations but harms performance at the commonly used 0.5 IoU threshold. The dynamic non-monotonic focusing mechanism of WIoUv3 prevents excessive focus on low-quality anchor boxes while reducing the competitiveness of high-quality anchor boxes. This enables the model to better optimize ordinary-quality anchor boxes and achieve overall detection performance improvement. WIoUv3 achieves 63.9% mAP50 and 39.1% mAP50-95, outperforming both WIoUv1 and WIoUv2.

4.2. Research Limitations and Future Work

In the comparative experiments, YOLO-LCE achieves the highest mAP50 of 63.9%. However, several limitations need to be addressed in future research.
The YOLOv8 detection head adopts a decoupled structure with separate branches for classification and regression tasks. In this study, the Ghost module is applied to both branches simultaneously, but its effectiveness when applied to individual branches remains unknown. Future research will investigate the performance impact of introducing the Ghost module to single branches, which may provide insights for more targeted optimization strategies in detection head design.
The Pest24 dataset we used contains only 24 pest categories, which is 14 categories fewer than the 38 pest types required by the Ministry of Agriculture of China. Future work will focus on collecting and annotating the remaining 14 pest categories as supplementary data. This expansion would enhance the model’s practical applicability in comprehensive pest-monitoring systems.
The dataset exhibits severe class imbalance, with some pest categories having very few instances that are difficult to detect accurately due to insufficient training samples. Future research could incorporate focal loss as the classification loss function to better handle class imbalance, following the approach of AgriPest-YOLO [28]. More importantly, few-shot learning techniques should be explored to enable effective detection with limited training samples.
While our model demonstrates effectiveness on the broad categorical coverage of Pest24, its performance on domain-specific pest detection tasks and larger-scale datasets remains unknown. Future work will evaluate the model using domain-specific pest datasets or larger-scale comprehensive pest datasets for generalization assessment.
Currently, this research remains theoretical without deployment validation on embedded devices. Future work will prioritize deploying the model on resource-constrained hardware platforms such as Jetson Nano using quantization techniques and inference acceleration frameworks like TensorRT. The lightweight nature of YOLO-LCE makes it suitable for integration into pest-monitoring devices that capture pests through various methods (such as light traps, sticky boards, or attractant lures), perform local inference, and transmit results to cloud platforms.
The ultimate validation requires deployment in real field environments to assess practical effectiveness under varying environmental conditions, lighting scenarios, and field deployment challenges.

5. Conclusions

Some existing pest detection models face challenges of limited pest species coverage and high computational complexity that make them unsuitable for deployment in resource-constrained environments. This study proposes YOLO-LCE, a lightweight detection model for agricultural pest detection. Evaluation outcomes on the Pest24 dataset containing 24 comprehensive agricultural pest categories demonstrate that YOLO-LCE attains 63.9% mAP50 and 39.1% mAP50-95. Compared to the YOLOv8n baseline, YOLO-LCE achieves mAP50 improvement of 1.7 percentage points, mAP50-95 improvement of 0.4 percentage points, and precision improvement of 0.5 percentage points. For computational efficiency, the model reduces parameters by 43.9% to 1.69 M and decreases computational cost by 33.3% to 5.4 GFLOPs. This lightweight characteristic makes it suitable for deployment in resource-constrained environments. Future work will evaluate the model’s generalization performance on domain-specific pest datasets or larger-scale datasets. Furthermore, the model’s performance will be validated on embedded devices and deployed in real field environments to test its practical effectiveness.

Author Contributions

X.C. conducted the experimental design, verified all experiments, and wrote the initial draft of the manuscript. S.L. contributed to the conceptualization design, provided supervision and guidance throughout the research process, and reviewed and edited the manuscript. T.Q. contributed to methodology refinement, revised the manuscript and provided funding. All authors have read and agreed to the published version of the manuscript.

Funding

This study was funded by the Shanghai Agriculture Applied Technology Development Program, China (Grant No. 2023-02-08-00-12-F04621) and the Shanghai Science and Technology Committee Program (Grant No. 21N21900700).

Data Availability Statement

This study uses the Pest24 dataset, which is publicly available at https://www.kaggle.com/datasets/boatshuai/pest24 (accessed on 13 August 2025). The train/validation/test split files created for this research are publicly available at https://github.com/abcxiaoq666/YOLO-LCE-Dataset-Split (accessed on 13 August 2025).

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this study:
IPM: Integrated Pest Management
SVM: Support Vector Machine
CNN: Convolutional Neural Network
R-CNN: Region-based Convolutional Network
SSD: Single Shot Multibox Detector
YOLO: You Only Look Once
PConv: Partial Convolution
SE: Squeeze-and-Excitation
WIoU: Wise-IoU
SPPF: Spatial Pyramid Pooling Fast
PANet: Path Aggregation Network
LCR: Lightweight Complementary Residual
EPConv: Efficient Partial Convolution
BN: Batch Normalization
ResNet: Residual Network
CIoU: Complete-IoU
IoU: Intersection over Union
TP: True Positive
FP: False Positive
FN: False Negative
AP: Average Precision
mAP: mean Average Precision
GFLOPs: Giga Floating Point Operations
SGD: Stochastic Gradient Descent
CBAM: Convolutional Block Attention Module
iResNet: improved Residual Network
YOLO-LCE: YOLO (Lightweight, Complementary, and Efficient)
FPN: Feature Pyramid Network
BiFPN: Bi-directional Feature Pyramid Network

References

  1. Dara, S.K. The New Integrated Pest Management Paradigm for the Modern Age. J. Integr. Pest Manage. 2019, 10, 12. [Google Scholar] [CrossRef]
  2. Guo, Q.; Wang, C.; Xiao, D.; Huang, Q. Automatic monitoring of flying vegetable insect pests using an RGB camera and YOLO-SIP detector. Precis. Agric. 2023, 24, 436–457. [Google Scholar] [CrossRef]
  3. Xiao, Q.; Zheng, W.; He, Y.; Chen, Z.; Meng, F.; Wu, L. Research on the Agricultural Pest Identification Mechanism Based on an Intelligent Algorithm. Agriculture 2023, 13, 1878. [Google Scholar] [CrossRef]
  4. Gao, Y.; Yin, F.; Hong, C.; Chen, X.; Deng, H.; Liu, Y.; Li, Z.; Yao, Q. Intelligent field monitoring system for cruciferous vegetable pests using yellow sticky board images and an improved Cascade R-CNN. J. Integr. Agric. 2025, 24, 220–234. [Google Scholar] [CrossRef]
  5. Ebrahimi, M.A.; Khoshtaghaza, M.H.; Minaei, S.; Jamshidi, B. Vision-based pest detection based on SVM classification method. Comput. Electron. Agric. 2017, 137, 52–58. [Google Scholar] [CrossRef]
  6. Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 580–587. [Google Scholar] [CrossRef]
  7. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1137–1149. [Google Scholar] [CrossRef]
  8. Cai, Z.; Vasconcelos, N. Cascade R-CNN: Delving Into High Quality Object Detection. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 6154–6162. [Google Scholar] [CrossRef]
  9. He, K.; Gkioxari, G.; Dollar, P.; Girshick, R. Mask R-CNN. In Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 2961–2969. [Google Scholar] [CrossRef]
  10. Jiao, L.; Dong, S.; Zhang, S.; Xie, C.; Wang, H. AF-RCNN: An anchor-free convolutional neural network for multi-categories agricultural pest detection. Comput. Electron. Agric. 2020, 174, 105522. [Google Scholar] [CrossRef]
  11. Wang, R.; Jiao, L.; Xie, C.; Chen, P.; Du, J.; Li, R. S-RPN: Sampling-balanced region proposal network for small crop pest detection. Comput. Electron. Agric. 2021, 187, 106290. [Google Scholar] [CrossRef]
  12. Teng, Y.; Zhang, J.; Dong, S.; Zheng, S.; Liu, L. MSR-RCNN: A Multi-Class Crop Pest Detection Network Based on a Multi-Scale Super-Resolution Feature Enhancement Module. Front. Plant Sci. 2022, 13, 810546. [Google Scholar] [CrossRef] [PubMed]
  13. Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.Y.; Berg, A.C. SSD: Single Shot MultiBox Detector. In Proceedings of the Computer Vision—ECCV 2016, Amsterdam, The Netherlands, 11–14 October 2016; Leibe, B., Matas, J., Sebe, N., Welling, M., Eds.; Springer International Publishing: Cham, Switzerland, 2016; pp. 21–37. [Google Scholar] [CrossRef]
  14. Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You Only Look Once: Unified, Real-Time Object Detection. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788. [Google Scholar] [CrossRef]
  15. Redmon, J.; Farhadi, A. YOLO9000: Better, Faster, Stronger. arXiv 2016, arXiv:1612.08242. [Google Scholar] [CrossRef]
  16. Redmon, J.; Farhadi, A. YOLOv3: An Incremental Improvement. arXiv 2018, arXiv:1804.02767. [Google Scholar] [CrossRef]
  17. Bochkovskiy, A.; Wang, C.Y.; Liao, H.Y.M. YOLOv4: Optimal Speed and Accuracy of Object Detection. arXiv 2020, arXiv:2004.10934. [Google Scholar] [CrossRef]
  18. Li, C.; Li, L.; Jiang, H.; Weng, K.; Geng, Y.; Li, L.; Ke, Z.; Li, Q.; Cheng, M.; Nie, W.; et al. YOLOv6: A Single-Stage Object Detection Framework for Industrial Applications. arXiv 2022, arXiv:2209.02976. [Google Scholar] [CrossRef]
  19. Wang, C.Y.; Bochkovskiy, A.; Liao, H.Y.M. YOLOv7: Trainable Bag-of-Freebies Sets New State-of-the-Art for Real-Time Object Detectors. In Proceedings of the 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 17–24 June 2023; pp. 7464–7475. [Google Scholar] [CrossRef]
  20. Wang, C.Y.; Yeh, I.H.; Mark Liao, H.Y. YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information. In Proceedings of the Computer Vision—ECCV 2024, Milan, Italy, 29 September–4 October 2024; Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G., Eds.; Springer International Publishing: Cham, Switzerland, 2025; pp. 1–21. [Google Scholar] [CrossRef]
  21. Wang, A.; Chen, H.; Liu, L.; Chen, K.; Lin, Z.; Han, J.; Ding, G. YOLOv10: Real-Time End-to-End Object Detection. arXiv 2024, arXiv:2405.14458. [Google Scholar]
  22. Li, B.; Liu, L.; Jia, H.; Zang, Z.; Fu, Z.; Xi, J. YOLO-TP: A lightweight model for individual counting of Lasioderma serricorne. J. Stored Prod. Res. 2024, 109, 102456. [Google Scholar] [CrossRef]
  23. Li, D.; Ahmed, F.; Wu, N.; Sethi, A.I. YOLO-JD: A Deep Learning Network for Jute Diseases and Pests Detection from Images. Plants 2022, 11, 937. [Google Scholar] [CrossRef]
  24. Lyu, Z.; Jin, H.; Zhen, T.; Sun, F.; Xu, H. Small Object Recognition Algorithm of Grain Pests Based on SSD Feature Fusion. IEEE Access 2021, 9, 43202–43213. [Google Scholar] [CrossRef]
  25. Wang, N.; Fu, S.; Rao, Q.; Zhang, G.; Ding, M. Insect-YOLO: A new method of crop insect detection. Comput. Electron. Agric. 2025, 232, 110085. [Google Scholar] [CrossRef]
  26. Wang, J.; Wang, J. A lightweight YOLOv8 based on attention mechanism for mango pest and disease detection. J. Real-Time Image Process. 2024, 21, 136. [Google Scholar] [CrossRef]
  27. Zhao, C.; Bai, C.; Yan, L.; Xiong, H.; Suthisut, D.; Pobsuk, P.; Wang, D. AC-YOLO: Multi-category and high-precision detection model for stored grain pests based on integrated multiple attention mechanisms. Expert Syst. Appl. 2024, 255, 124659. [Google Scholar] [CrossRef]
  28. Zhang, W.; Huang, H.; Sun, Y.; Wu, X. AgriPest-YOLO: A rapid light-trap agricultural pest detection method based on deep learning. Front. Plant Sci. 2022, 13, 1079384. [Google Scholar] [CrossRef]
  29. Tian, Y.; Wang, S.; Li, E.; Yang, G.; Liang, Z.; Tan, M. MD-YOLO: Multi-scale Dense YOLO for small target pest detection. Comput. Electron. Agric. 2023, 213, 108233. [Google Scholar] [CrossRef]
  30. Dai, M.; Dorjoy, M.M.H.; Miao, H.; Zhang, S. A New Pest Detection Method Based on Improved YOLOv5m. Insects 2023, 14, 54. [Google Scholar] [CrossRef] [PubMed]
  31. Chen, J.; Kao, S.H.; He, H.; Zhuo, W.; Wen, S.; Lee, C.H.; Chan, S.H.G. Run, Don’t Walk: Chasing Higher FLOPS for Faster Neural Networks. In Proceedings of the 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 17–24 June 2023; pp. 12021–12031. [Google Scholar] [CrossRef]
  32. He, T.; Zhang, Z.; Zhang, H.; Zhang, Z.; Xie, J.; Li, M. Bag of Tricks for Image Classification with Convolutional Neural Networks. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 558–567. [Google Scholar] [CrossRef]
  33. Hu, J.; Shen, L.; Sun, G. Squeeze-and-Excitation Networks. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 7132–7141. [Google Scholar] [CrossRef]
  34. Han, K.; Wang, Y.; Tian, Q.; Guo, J.; Xu, C.; Xu, C. GhostNet: More Features From Cheap Operations. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 1577–1586. [Google Scholar] [CrossRef]
  35. Tong, Z.; Chen, Y.; Xu, Z.; Yu, R. Wise-IoU: Bounding Box Regression Loss with Dynamic Focusing Mechanism. arXiv 2023, arXiv:2301.10051. [Google Scholar]
  36. Wang, Q.J.; Zhang, S.Y.; Dong, S.F.; Zhang, G.C.; Yang, J.; Li, R.; Wang, H.Q. Pest24: A large-scale very small object data set of agricultural pests for multi-target detection. Comput. Electron. Agric. 2020, 175, 105585. [Google Scholar] [CrossRef]
  37. Howard, A.G.; Zhu, M.; Chen, B.; Kalenichenko, D.; Wang, W.; Weyand, T.; Andreetto, M.; Adam, H. MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications. arXiv 2017, arXiv:1704.04861. [Google Scholar] [CrossRef]
  38. Ioffe, S.; Szegedy, C. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. In Proceedings of the 32nd International Conference on Machine Learning, PMLR, Lille, France, 6–11 July 2015; pp. 448–456. [Google Scholar]
  39. Zheng, Z.; Wang, P.; Ren, D.; Liu, W.; Ye, R.; Hu, Q.; Zuo, W. Enhancing Geometric Factors in Model Learning and Inference for Object Detection and Instance Segmentation. IEEE Trans. Cybern. 2022, 52, 8574–8586. [Google Scholar] [CrossRef]
  40. Shorten, C.; Khoshgoftaar, T.M. A survey on Image Data Augmentation for Deep Learning. J. Big Data 2019, 6, 60. [Google Scholar] [CrossRef]
  41. Padilla, R.; Netto, S.L.; da Silva, E.A.B. A Survey on Performance Metrics for Object-Detection Algorithms. In Proceedings of the 2020 International Conference on Systems, Signals and Image Processing (IWSSIP), Niteroi, Brazil, 1–3 July 2020; pp. 237–242. [Google Scholar] [CrossRef]
  42. Mirzaei, B.; Nezamabadi-pour, H.; Raoof, A.; Derakhshani, R. Small Object Detection and Tracking: A Comprehensive Review. Sensors 2023, 23, 6887. [Google Scholar] [CrossRef]
  43. Chen, H.; Su, L.; Shu, R.; Yin, F. EMB-YOLO: A Lightweight Object Detection Algorithm for Isolation Switch State Detection. Appl. Sci. 2024, 14, 9779. [Google Scholar] [CrossRef]
  44. Woo, S.; Park, J.; Lee, J.Y.; Kweon, I.S. CBAM: Convolutional Block Attention Module. In Proceedings of the Computer Vision—ECCV 2018, Munich, Germany, 8–14 September 2018; Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y., Eds.; Springer International Publishing: Cham, Switzerland, 2018; pp. 3–19. [Google Scholar] [CrossRef]
  45. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar] [CrossRef]
  46. Duta, I.C.; Liu, L.; Zhu, F.; Shao, L. Improved Residual Networks for Image and Video Recognition. In Proceedings of the 2020 25th International Conference on Pattern Recognition (ICPR), Milan, Italy, 10–15 January 2021; pp. 9415–9422. [Google Scholar] [CrossRef]
  47. Qi, F.; Wang, Y.; Tang, Z.; Chen, S. Real-time and effective detection of agricultural pest using an improved YOLOv5 network. J. Real-Time Image Process. 2023, 20, 33. [Google Scholar] [CrossRef]
Figure 1. Pest instance count distribution across categories (ID represents the pest category identifier).
Figure 2. Sample images from the Pest24 dataset showing detection challenges. Yellow boxes represent ground truth annotations with category identifiers as labels. (a) The red circles mark some ground truth annotations with slight inaccuracies. (b) The blue circles mark two similar pests, and the green circle marks an area where the pest target is affected by light reflection.
Figure 3. YOLO-LCE network architecture.
Figure 4. LCR module structure.
Figure 5. C2f-LCR structure. This module can be configured with one or more LCR modules. The ellipses in the figure represent omitted LCR modules.
Figure 6. EPConv architecture.
Figure 7. Improved detection head structure.
Figure 8. Ghost module structure.
Figure 9. Illustration of predicted and ground truth bounding boxes.
Figure 10. Visualization comparison of pest detection results. (a) Ground Truth: Yellow bounding boxes indicate pest target locations with category identifiers as labels; (b) YOLOv8n detection results; (c) YOLOv8n enhanced with C2f-LCR detection results. Color coding: Green boxes represent correct detections, red boxes represent missed targets, purple boxes represent detections with incorrect classification, and blue boxes represent false detections.
Figure 11. Model heatmap comparison. (a) Ground Truth: Yellow bounding boxes indicate true pest target locations; (b) YOLOv8n heatmap; (c) YOLO-LCE heatmap.
Table 1. Pest category identifiers and names in the Pest24 dataset [36].
ID | Category Name | ID | Category Name
0 | Bollworm | 12 | Plutella xylostella
1 | Meadow borer | 13 | Holotrichia parallela
2 | Gryllotalpa orientalis | 14 | Rice planthopper
3 | Little Gecko | 15 | Yellow tiger
4 | Agriotes fuscicollis Miwa | 16 | Land tiger
5 | Nematode trench | 17 | Eight-character tiger
6 | Athetis lepigone | 18 | Holotrichia oblita
7 | Scotogramma trifolii Rottemberg | 19 | Stem borer
8 | Armyworm | 20 | Striped rice borer
9 | Spodoptera cabbage | 21 | Rice Leaf Roller
10 | Anomala corpulenta | 22 | Spodoptera litura
11 | Spodoptera exigua | 23 | Melahotus
Table 2. Training hyperparameter configuration comparison.
Parameter | Faster R-CNN | YOLOv6n | Other YOLO Series
Image Size | 640 × 640 | 640 × 640 | 640 × 640
Batch Size | 16 | 32 | 32
Epochs | 100 | 100 | 100
Initial learning rate | 0.02 | 0.02 | 0.01
Optimizer | SGD | SGD | SGD
Momentum | 0.9 | 0.937 | 0.937
Weight Decay | 0.0001 | 0.0005 | 0.0005
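For reproducibility, the "Other YOLO Series" settings in Table 2 map directly onto an Ultralytics-style training call. The sketch below is illustrative only; the dataset file name (pest24.yaml) and the model configuration are assumptions, not artifacts released with this paper.

    # Minimal training sketch (assumes the Ultralytics Python package; the
    # dataset file name and model config are illustrative, not from the paper).
    from ultralytics import YOLO

    model = YOLO("yolov8n.yaml")      # nano-scale config, trained from scratch
    model.train(
        data="pest24.yaml",           # hypothetical dataset description file
        imgsz=640,                    # image size (Table 2)
        batch=32,                     # batch size used for the YOLO-series runs
        epochs=100,
        optimizer="SGD",
        lr0=0.01,                     # initial learning rate
        momentum=0.937,
        weight_decay=0.0005,
    )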
Table 3. Data augmentation settings enabled for YOLO-LCE training.
Category | Augmentation Technique | Parameter
Color Space | HSV-Hue (HSV-H) | ±0.015 (fraction)
 | HSV-Saturation (HSV-S) | ±0.7 (fraction)
 | HSV-Value (HSV-V) | ±0.4 (fraction)
Geometric | Translation | ±0.1 (fraction)
 | Scaling | [0.5, 1.5] (factor)
 | Horizontal Flip | 0.5 (probability)
 | Mosaic | 1.0 (probability)
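These augmentation settings likewise correspond to standard hyperparameter names in the Ultralytics interface. The mapping below is a hedged sketch; in particular, the assumption that a scale gain of 0.5 yields the [0.5, 1.5] scaling range follows the usual interpretation of that parameter.

    # Augmentation arguments mirroring Table 3 (sketch; names follow the
    # Ultralytics hyperparameter interface, values are taken from the table).
    augmentation = dict(
        hsv_h=0.015,    # hue jitter (fraction)
        hsv_s=0.7,      # saturation jitter (fraction)
        hsv_v=0.4,      # value/brightness jitter (fraction)
        translate=0.1,  # translation (fraction of image size)
        scale=0.5,      # scale gain, i.e. scaling factors roughly in [0.5, 1.5]
        fliplr=0.5,     # horizontal flip probability
        mosaic=1.0,     # mosaic probability
    )
    # Usage: model.train(data="pest24.yaml", **augmentation, ...) as in the sketch above.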
Table 4. Comparison of experimental results with other object detection models. Bold values indicate the best value for the corresponding metric.
Model | mAP50 | mAP50-95 | Precision | Recall | Params | Model Size | GFLOPs
Faster R-CNN | 43.2% | 24.0% | 55.5% | 44.2% | 41.47 M | 104.93 MB | 91.0
YOLOv4-tiny | 55.1% | 31.8% | 35.3% | 65.5% | 5.93 M | 11.91 MB | 16.3
YOLOv5n | 57.8% | 32.3% | 76.2% | 52.7% | 1.79 M | 3.92 MB | 4.2
YOLOv6n | 57.5% | 34.7% | 65.7% | 61.0% | 4.63 M | 10.47 MB | 11.4
YOLOv7-tiny | 59.2% | 34.2% | 73.3% | 56.1% | 6.07 M | 12.39 MB | 13.2
YOLOv8n | 62.2% | 38.7% | 68.8% | 60.3% | 3.01 M | 6.27 MB | 8.1
YOLOv10n | 61.8% | 38.1% | 67.8% | 60.0% | 2.70 M | 5.78 MB | 8.3
YOLOv11n | 62.2% | 38.5% | 67.1% | 60.0% | 2.59 M | 5.48 MB | 6.3
YOLO-LCE | 63.9% | 39.1% | 69.3% | 60.3% | 1.69 M | 3.69 MB | 5.4
Table 5. Ablation experiment results comparison.
YOLOv8n | C2f-LCR | EPConv | Ghost Module | WIoUv3 | mAP50 | mAP50-95 | Precision | Recall | Params | GFLOPs
✓ |  |  |  |  | 62.2% | 38.7% | 68.8% | 60.3% | 3.01 M | 8.1
✓ | ✓ |  |  |  | 63.0% | 38.7% | 75.6% | 59.1% | 2.41 M | 7.2
✓ | ✓ | ✓ |  |  | 63.1% | 38.8% | 74.1% | 60.1% | 2.05 M | 6.8
✓ | ✓ | ✓ | ✓ |  | 62.8% | 38.8% | 76.3% | 58.4% | 1.69 M | 5.4
✓ | ✓ | ✓ | ✓ | ✓ | 63.9% | 39.1% | 69.3% | 60.3% | 1.69 M | 5.4
The ✓ symbol indicates that the corresponding component is included in the model configuration.
Table 6. Per-class AP50 comparison between YOLOv8n and YOLO-LCE (ranked by improvement). Test instances represent the number of instances for each category in the test set.
Pest Category | Test Instances | YOLOv8n AP50 | YOLO-LCE AP50 | Improvement
Holotrichia oblita | 29 | 48.3% | 61.4% | +13.1%
Nematode trench | 30 | 55.9% | 64.3% | +8.4%
Rice Leaf Roller | 243 | 48.7% | 53.9% | +5.2%
Stem borer | 384 | 69.2% | 73.9% | +4.7%
Spodoptera cabbage | 436 | 53.0% | 56.4% | +3.4%
Striped rice borer | 242 | 66.7% | 69.6% | +2.9%
Scotogramma trifolii Rottemberg | 942 | 45.0% | 47.4% | +2.4%
Yellow tiger | 304 | 50.7% | 53.1% | +2.4%
Spodoptera exigua | 1408 | 49.7% | 51.8% | +2.1%
Spodoptera litura | 366 | 75.2% | 76.9% | +1.7%
Meadow borer | 3278 | 75.6% | 77.2% | +1.6%
Armyworm | 1615 | 75.9% | 76.8% | +0.9%
Land tiger | 101 | 73.2% | 74.1% | +0.9%
Little Gecko | 762 | 85.3% | 86.1% | +0.8%
Bollworm | 5496 | 89.1% | 89.7% | +0.6%
Athetis lepigone | 6000 | 69.9% | 70.5% | +0.6%
Gryllotalpa orientalis | 1322 | 97.7% | 97.9% | +0.2%
Rice planthopper | 293 | 1.35% | 1.49% | +0.14%
Plutella xylostella | 182 | 4.28% | 4.41% | +0.13%
Anomala corpulenta | 10,533 | 97.2% | 97.2% | 0.0%
Holotrichia parallela | 2406 | 90.7% | 90.4% | −0.3%
Agriotes fuscicollis Miwa | 1298 | 79.2% | 78.7% | −0.5%
Melahotus | 114 | 73.0% | 71.7% | −1.3%
Eight-character tiger | 30 | 17.5% | 8.99% | −8.51%
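As a consistency check, the overall mAP50 in Table 4 is the unweighted mean of these per-class AP50 values; the short calculation below (values copied from the YOLO-LCE column) recovers the reported 63.9% up to rounding.

    # mAP50 as the unweighted mean of per-class AP50 (YOLO-LCE column, Table 6).
    yolo_lce_ap50 = [
        61.4, 64.3, 53.9, 73.9, 56.4, 69.6, 47.4, 53.1, 51.8, 76.9, 77.2, 76.8,
        74.1, 86.1, 89.7, 70.5, 97.9, 1.49, 4.41, 97.2, 90.4, 78.7, 71.7, 8.99,
    ]
    map50 = sum(yolo_lce_ap50) / len(yolo_lce_ap50)
    print(round(map50, 1))  # -> 63.9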
Table 7. Comparison of EPConv with different channel splitting ratios.
Method | Precision | Recall | mAP50 | mAP50-95 | Params
YOLOv8n + C2f-LCR + EPConv (1:1) | 71.1% | 60.1% | 62.7% | 38.5% | 2.12 M
YOLOv8n + C2f-LCR + EPConv (1:3) | 71.6% | 57.2% | 62.5% | 38.3% | 2.07 M
YOLOv8n + C2f-LCR + EPConv (1:7) | 74.1% | 60.1% | 63.1% | 38.8% | 2.05 M
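The 1:7 ratio in the best-performing row can be illustrated with a small PyTorch-style module that convolves only the first eighth of the channels and passes the remaining seven eighths through unchanged. This is a hedged sketch of asymmetric channel splitting in the spirit of partial convolution, not the paper's exact EPConv definition (which also acts as a downsampling operator).

    # Hedged sketch of a 1:7 asymmetric channel split (illustrative only; not
    # the paper's EPConv, which additionally performs downsampling and fusion).
    import torch
    import torch.nn as nn

    class AsymmetricSplitConv(nn.Module):
        def __init__(self, channels: int, split_ratio: float = 1 / 8):
            super().__init__()
            self.conv_channels = max(1, int(channels * split_ratio))
            self.conv = nn.Conv2d(self.conv_channels, self.conv_channels,
                                  kernel_size=3, padding=1, bias=False)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # Convolve the small branch, keep the large branch untouched.
            x1, x2 = torch.split(
                x, [self.conv_channels, x.shape[1] - self.conv_channels], dim=1)
            return torch.cat([self.conv(x1), x2], dim=1)

    # e.g. AsymmetricSplitConv(64)(torch.randn(1, 64, 80, 80)) keeps shape (1, 64, 80, 80)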
Table 8. Performance comparison of different shortcut connection strategies in EPConv.
Method | mAP50
YOLOv8n + C2f-LCR + EPConv (S1) | 62.0%
YOLOv8n + C2f-LCR + EPConv (S2) | 62.6%
YOLOv8n + C2f-LCR + EPConv (S3) | 62.8%
YOLOv8n + C2f-LCR + EPConv (S4) | 62.4%
YOLOv8n + C2f-LCR + EPConv (S5) | 63.1%
Table 9. Performance comparison of different WIoU versions.
Loss Function | mAP50 | mAP50-95
WIoUv1 | 63.0% | 38.6%
WIoUv2 | 62.7% | 38.8%
WIoUv3 | 63.9% | 39.1%
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
