Article

YOLO-RP: A Lightweight and Efficient Detection Method for Small Rice Pests in Complex Field Environments

1 College of Computer Science and Engineering, Guilin University of Technology, Guilin 541006, China
2 Guangxi Key Laboratory of Embedded Technology and Intelligent System, Guilin University of Technology, Guilin 541004, China
3 College of Physics and Electronic Information Engineering, Guilin University of Technology, Guilin 541004, China
* Authors to whom correspondence should be addressed.
Symmetry 2025, 17(10), 1598; https://doi.org/10.3390/sym17101598
Submission received: 18 August 2025 / Revised: 13 September 2025 / Accepted: 15 September 2025 / Published: 25 September 2025
(This article belongs to the Section Computer)

Abstract

Accurate and efficient pest monitoring in complex rice field environments is vital for food security. Existing detection methods often struggle with small targets and high computational redundancy, limiting deployment on resource-constrained edge devices. To address these issues, we propose YOLO-RP, a lightweight and efficient rice pest detection method based on YOLO11n. YOLO-RP reduces model complexity while maintaining detection accuracy. The model removes the redundant P5 detection head and introduces a high-resolution P2 head to enhance small-object detection. A lightweight partial convolution detection head (LPCHead) decouples task branches and shares feature extraction, reducing redundancy and boosting performance. The re-parameterizable DBELCSP module strengthens feature representation and robustness while cutting parameters and computation. Wavelet pooling preserves essential edge and texture information during downsampling, improving accuracy under complex backgrounds. Experiments show that YOLO-RP achieves a precision of 90.62%, recall of 87.38%, mAP@0.5 of 90.99%, and mAP@0.5:0.95 of 63.84%, while reducing parameters, GFLOPs, and model size by 61.3%, 50.8%, and 49.1%, respectively, to 1.00 M, 3.1 GFLOPs, and 2.8 MB. Cross-dataset tests on Common Rice Pests (Philippines), IP102, and Pest24 confirm strong robustness and generalization. On NVIDIA Jetson Nano, YOLO-RP attains 20.8 FPS—66.4% faster than the baseline—validating its potential for edge deployment. These results indicate that YOLO-RP provides an effective solution for real-time rice pest detection in complex, resource-limited environments.

1. Introduction

As the staple food for more than half of the world’s population, rice plays a vital role in ensuring global food security and maintaining regional socio-economic stability. However, throughout its growth cycle, rice is highly susceptible to various pest infestations, which can result in significant yield losses, quality degradation, and substantial economic damage. Traditionally, rice pest detection has relied heavily on manual field inspection, which is time-consuming, labor-intensive, and highly dependent on the experience of individual inspectors. Such methods are inadequate for large-scale, real-time monitoring and precision management in modern agricultural production. Therefore, developing an automated rice pest detection method that is efficient, accurate, and deployable in real-world field environments is of great practical significance for ensuring the safe and sustainable production of rice.
Early studies primarily employed traditional machine learning methods that combined low-level visual features—such as color, texture, and shape—with classifiers such as Support Vector Machine (SVM) [1] and AdaBoost [2] for pest identification. For instance, Sun et al. [3] utilized Simple Linear Iterative Clustering (SLIC) in conjunction with an SVM classifier to detect tea leaf diseases, achieving 98.5% accuracy on a dataset of 261 images. Sahu et al. [4] extracted texture and color features and employed SVM for plant leaf disease classification, achieving over 95% accuracy. Qing et al. [5] adopted Histogram of Oriented Gradients (HOG) features combined with an AdaBoost-SVM classifier to detect white-backed planthoppers, yielding 73.1% accuracy. Pantazi et al. [6] integrated Local Binary Patterns (LBP) with machine learning classifiers to identify grape leaf diseases, attaining 95% accuracy. Although these methods are lightweight and relatively easy to implement, they suffer from poor generalization and are inadequate for handling challenges such as pest occlusion, morphological variation, and complex field backgrounds, thereby limiting their scalability and practical applicability in real-world agricultural scenarios.
With the rapid development of deep learning, convolutional neural network (CNN)-based object detection frameworks have been extensively employed in agricultural pest detection tasks. Among them, two-stage detectors such as Faster R-CNN have demonstrated strong capabilities in feature representation and object localization, achieving promising results in multi-class pest detection scenarios. For example, Zhang et al. [7] developed a rice leaf disease detection model based on Faster R-CNN, achieving 99% accuracy in identifying common rice diseases such as bacterial blight, rice blast, and cowpea disease. By leveraging a ResNet backbone, the detection accuracy for brown spot and leaf blast was further improved to 99.87%. Guan et al. [8] proposed an improved Mask R-CNN framework with enhanced feature extraction mechanisms to tackle challenges posed by complex field environments, significantly improving disease recognition performance. GC-Faster R-CNN integrated global contextual information, resulting in improvements of 4.5% and 16.6% in precision and recall, respectively. Ali et al. [9] further integrated a lightweight MobileNet backbone into the Faster R-CNN framework to build a model tailored for small pest detection, thereby reducing computational overhead. Wang et al. [10] introduced MPest-RCNN, which adopted small anchor box designs to enhance the detection and counting of apple pests across varying densities and object scales. While these approaches have significantly improved pest feature modeling and detection accuracy, two-stage detectors inherently suffer from slower inference speed and larger model complexity, limiting their applicability for real-time deployment on edge devices.
With the growing demand for efficient and deployable solutions in the agricultural domain, single-stage object detectors such as SSD and the YOLO family have garnered increasing attention due to their faster inference speeds and lower computational complexity. For example, Haruna et al. [11] developed a rice disease recognition system based on SSD, achieving a mean average precision (mAP) of 91% in detecting rice blast and brown spot. Yin et al. [12] proposed JujubeSSD, a specialized detector for jujube leaf spot, which achieved an mAP of 97.1%—approximately 6.35 percentage points higher than the baseline—although it suffered from relatively long inference times per image. Lyu et al. [13] enhanced the SSD architecture by introducing a top-down feature fusion strategy and K-means clustering, along with custom prior boxes and multi-scale feature aggregation, thereby improving small pest detection under complex agricultural environments. Hu et al. [14] proposed YOLO-GBS, which integrates global context attention, a bidirectional feature pyramid, and a Swin Transformer module into YOLOv5, along with additional detection heads to enlarge the receptive field. Nevertheless, the model achieved only 79.8% mAP on rice pest detection. Yang et al. [15] proposed YOLOv7-tiny for tea pest detection by incorporating deformable convolution and Soft-NMS, achieving a detection accuracy of 93.2%, while the parameters reached 26.4 M. Similarly, Maize-YOLO [16], tailored for corn pest detection, achieved 76.3% mAP on the IP13 dataset but required 38.9 GFLOPs and 33.4 M parameters. CLT-YOLOX [17] enhanced YOLOX using a cross-layer Transformer module, reaching an AP@50 of 57.7%, yet with 35.35 GFLOPs and 10 M parameters. Wei et al. [18] proposed AEC-YOLOv8n to improve multi-target detection in cluttered backgrounds. Although it outperformed the original YOLOv8n, its model size increased to 7.8 MB with 3.69 M parameters. Zhu et al. [19] introduced CBF-YOLO based on YOLOv7 for real-world soybean pest detection, achieving 86.9% mAP at the cost of 70.1 M parameters and 188 GFLOPs. Fang et al. [20] developed RLDD-YOLO11n by integrating the SCSA residual attention mechanism and the CARAFE upsampling module. This model achieved an average precision of 88.3%, improving by 2.8% over the baseline, albeit with increased model size and computational overhead. Yin et al. [21] proposed YOLO-RMD, which combines receptive field attention and dynamic multi-scale detection heads, achieving 98.2% accuracy across seven crop types. However, this improvement came at the cost of more than a twofold increase in model size and over a sixfold increase in computation. Huang et al. [22] introduced YOLO-YSTs, a compact model based on YOLOv10n for sticky trap pest detection. While it achieved 86.8% mAP@0.5, it still required 3 M parameters and 8.8 GFLOPs, suggesting room for further optimization.
Despite significant advancements in detection accuracy, feature representation, and multi-scale fusion, existing methods generally rely on highly complex architectures to achieve high precision. This results in substantial computational overhead, making it difficult to balance accuracy with real-time performance, and thereby limiting lightweight deployment in resource-constrained agricultural scenarios. Moreover, rice pests are characterized by tiny size, high-density aggregation, complex background interference, and morphological diversity, further increasing detection difficulty and imposing higher requirements on the anti-interference capability and generalization of detection models.
To address these challenges, this study presents YOLO-RP, a lightweight and efficient variant of YOLO11n tailored for rice pest detection. YOLO-RP incorporates several targeted architectural refinements—namely P2 head, LPCHead, DBELCSP, and WaveletPool—that collectively enhance small-object detection accuracy while reducing computational overhead, making it more suitable for deployment on resource-constrained devices. Table 1 summarizes representative YOLO-based pest detection models, including their performance metrics, parameters, GFLOPs, and datasets. Detailed innovations of each model are described in the footnotes.
The main contributions of this work are summarized as follows:
  • A lightweight and efficient YOLO11n-based detection framework, named YOLO-RP, is presented, specifically tailored for small-object rice pest detection in complex field environments. The model integrates multiple modular improvements to reduce computational complexity while maintaining high detection accuracy, providing an effective solution for deployment in resource-constrained agricultural scenarios.
  • A lightweight partial convolutional detection head (LPCHead) is designed, which reduces parameter redundancy and computational overhead by decoupling task-specific branches while sharing a common feature extraction module. This design not only enhances inference efficiency but also improves detection performance. Additionally, the original high-cost P5 detection layer is removed, and a high-resolution P2 detection head is introduced to strengthen the model’s ability to accurately detect and localize small pests.
  • A re-parameterizable multi-branch feature extraction module (DBELCSP) is proposed, which enhances feature representation capability and robustness in small-object detection, while simultaneously reducing the number of parameters and computational cost. Furthermore, a structure-aware wavelet pooling mechanism is incorporated to preserve edge and texture features during downsampling, thereby improving detection accuracy and stability in complex background scenarios.

2. Materials and Methods

Figure 1 illustrates the overall experimental workflow of this study, comprising five major stages: dataset preparation, data processing, YOLO-RP model construction, experimental setup, and performance evaluation. In the dataset preparation stage, representative images of rice pests were selected and curated from the publicly available IP102 dataset to construct training and testing samples. During the data processing stage, various data augmentation strategies—including cropping, noise injection, blurring, color jittering, horizontal and vertical flipping, contrast adjustment, rotation, saturation adjustment, and brightness variation—were applied to increase data diversity and enhance the model’s generalization capability. In the model construction stage, the YOLO-RP model was developed based on the YOLO11n framework, integrating the P2 detection head, LPCHead, DBELCSP, and WaveletPool modules to optimize feature representation, improve small object detection accuracy, and reduce computational complexity. In the experimental setup stage, the model was trained and tested under a defined hardware environment and specific hyperparameter configurations. Finally, model performance was quantitatively evaluated using metrics such as Precision, Recall, and mAP, and was further validated through ablation studies, comparative experiments, generalization tests, and deployment experiments, comprehensively demonstrating the effectiveness and robustness of YOLO-RP for rice pest detection tasks.

2.1. Datasets

This study employs a curated subset of the publicly available IP102 dataset [23], a large-scale benchmark for crop pest identification that contains 102 pest species and 75,222 annotated images. The dataset spans multiple developmental stages (egg, larva, pupa, adult) and poses significant challenges due to intra-class variation, inter-class similarity, and complex backgrounds, making it highly suitable for evaluating object detection algorithms.
For this study, we extracted rice-specific pest data from IP102, initially selecting 14 categories relevant to rice cultivation, resulting in 8416 images. To improve label accuracy and overall data quality, we performed manual data cleaning based on explicit rules. Specifically, 804 duplicate images were removed, identified as visually identical or nearly identical samples. In addition, 63 images with mislabeled instances or missing bounding boxes were re-annotated using the LabelImg (version 1.8.6) tool, strictly following the official IP102 category definitions. All re-annotations were independently verified by two researchers to ensure consistency. After refinement, the final dataset consists of 7612 images covering 12 rice pest categories, as illustrated in Figure 2.

2.2. Data Processing

To enhance the model’s generalization and robustness in complex field environments, we designed a series of data processing strategies to simulate diverse and challenging pest detection scenarios. Specifically, composite images were constructed by arranging original pest samples into 3 × 3 grid layouts, thereby generating synthetic scenes containing multiple targets within a single image. This strategy simulates realistic agricultural conditions, including the coexistence of multiple pest species, densely packed targets, and strong background interference within a single field of view.
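As a concrete illustration of this step, the following sketch composes nine single-pest images into one 3 × 3 scene; the tile size, file names, and use of PIL are illustrative assumptions, and in practice the corresponding bounding-box annotations must also be offset and rescaled to the tile positions.

```python
from PIL import Image

def make_grid_composite(image_paths, tile=224, grid=3):
    """Compose a grid x grid mosaic from single-pest images (illustrative sketch)."""
    canvas = Image.new("RGB", (tile * grid, tile * grid))
    for idx, path in enumerate(image_paths[: grid * grid]):
        img = Image.open(path).convert("RGB").resize((tile, tile))
        row, col = divmod(idx, grid)
        canvas.paste(img, (col * tile, row * tile))  # place each tile at its grid cell
    return canvas

# hypothetical usage: nine source images -> one multi-target training scene
# composite = make_grid_composite([f"pest_{i}.jpg" for i in range(9)])
# composite.save("grid_scene.jpg")
```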
Subsequently, a series of offline data augmentation techniques was applied to further increase dataset diversity. These techniques included cropping, noise injection, blurring, color jittering, horizontal flipping, vertical flipping, contrast adjustment, rotation, saturation adjustment, and brightness variation, as illustrated in Figure 3. Among these, horizontal and vertical flipping exploit the natural symmetry of many pest species, helping the model capture invariant morphological features across different orientations. Together with other augmentations, these strategies simulate practical challenges commonly encountered in field pest detection, such as motion blur from capture devices, illumination variation under natural conditions, geometric deformation from diverse viewing angles, and background clutter caused by inconsistent image quality.
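A minimal sketch of the photometric portion of such an offline pipeline, written with torchvision transforms, is shown below; the parameter values are assumptions, and geometric augmentations (cropping, flipping, rotation) additionally require the bounding boxes to be transformed consistently.

```python
import torch
from torchvision import transforms

# Photometric augmentations only (illustrative values); geometric augmentations such as
# flips, rotation, and cropping must also update the bounding-box annotations.
photometric = transforms.Compose([
    transforms.ColorJitter(brightness=0.3, contrast=0.3, saturation=0.3, hue=0.05),
    transforms.GaussianBlur(kernel_size=5, sigma=(0.1, 2.0)),
    transforms.ToTensor(),
    transforms.Lambda(lambda t: (t + 0.02 * torch.randn_like(t)).clamp(0.0, 1.0)),  # mild noise injection
])

# hypothetical usage on a PIL image loaded from the training set:
# augmented_tensor = photometric(pil_image)
```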
These processing strategies expose the model to a wide range of spatial distributions and visual variations, including orientation changes that can be addressed by leveraging the natural symmetry of many pest species through flipping. By maintaining key morphological information while increasing robustness to different poses and perspectives, these augmentations collectively facilitate the learning of more robust and generalizable feature representations. Following augmentation, the dataset was expanded to 10,998 images. The dataset was then split into training (8788 images), validation (1105 images), and testing (1105 images) sets using an 8:1:1 ratio, ensuring sufficient diversity for reliable training and evaluation. The detailed distribution of pest categories is presented in Table 2.

2.3. YOLO-RP Model

YOLO11 is a recently released object detection framework developed by Ultralytics. It adopts a classical three-stage architecture, comprising a backbone, neck, and detection head. The backbone integrates a Cross-Stage Convolutional module (C3K2), a Channel-Spatial Attention mechanism (C2PSA), and a fast Spatial Pyramid Pooling module (SPPF) to enhance feature representation and enlarge the receptive field. The neck is constructed based on the Path Aggregation Network (PAN) for multi-scale semantic fusion, while the detection head employs a decoupled design with independent classification and regression branches, combined with a dual-positive sample assignment strategy and an improved loss function, thereby improving localization accuracy and classification stability.
The YOLO11 series provides five model variants—YOLO11n (nano), YOLO11s (small), YOLO11m (medium), YOLO11l (large), and YOLO11x (extra-large)—which share a unified architectural design but differ in network depth, width, and parameter scale to meet various computational constraints and application requirements. Among them, YOLO11n exhibits the smallest number of parameters and the lowest computational complexity while maintaining relatively high accuracy, making it the fastest variant in terms of inference speed. Therefore, this study selects YOLO11n as the baseline and proposes a systematically optimized model, YOLO-RP, focusing on enhancing small-object detection accuracy, improving overall detection performance, and further reducing model size for efficient deployment in resource-constrained agricultural environments.
Several structural improvements are introduced in YOLO-RP. First, the original P5 detection head is removed to reduce computational load, and a high-resolution P2 detection head is incorporated to improve the model’s localization precision and its ability to detect small objects. Second, we design a lightweight partial convolutional head (LPCHead), which reduces parameter redundancy and computational overhead by decoupling task branches while sharing the feature extraction module, thereby improving detection performance. In addition, the original C3K2 module in the backbone is replaced with a re-parameterizable multi-branch module, DBELCSP, which enhances feature extraction capabilities and improves robustness and sensitivity to small objects, while further reducing the number of parameters and computational cost. Finally, to better address the loss of spatial detail during downsampling, a structure-aware wavelet pooling module is introduced. By leveraging frequency-domain analysis, it preserves edge and texture information, thereby improving detection accuracy and robustness under complex backgrounds.
The overall architecture of YOLO-RP is illustrated in Figure 4.

2.3.1. Reconstruction of Detection Layer

The original YOLO11n adopts a multi-scale detection framework with detection heads applied to feature maps of 80 × 80 (P3), 40 × 40 (P4), and 20 × 20 (P5) resolutions, as illustrated in Figure 5a. The P5 detection head, operating on the lowest-resolution feature map, captures high-level semantic information through its large receptive field. However, it lacks fine spatial detail, which limits localization precision and hinders accurate detection of small targets. Moreover, progressive downsampling leads to the loss of fine-grained spatial cues, further impairing the detection of small and densely clustered pests.
To enhance small-object detection, we introduce an additional detection head, P2, at a shallower and higher-resolution feature layer (160 × 160), as illustrated in Figure 5b. This shallow feature map inherently preserves richer spatial and textural details, which are essential for the precise detection of small-scale targets. The P2 detection head effectively leverages these high-resolution features to enhance the model’s sensitivity to small objects and improve its ability to capture precise boundaries and intricate details, particularly under complex background conditions.
Although introducing P2 improves small-object detection, the increased spatial resolution also incurs additional computational cost. As the P5 head primarily processes low-resolution features with large receptive fields, it contributes minimally to the localization of small objects while introducing redundant computational overhead. To address this, we remove the original P5 detection head and retain only P2, P3, and P4, as shown in Figure 5c. This modification reduces both parameter and computational complexity, allowing the model to focus on feature maps that preserve richer spatial information.
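The resolutions quoted above follow directly from the head strides at the default 640 × 640 input; the short sketch below makes this relationship explicit (head names and strides follow the standard YOLO convention).

```python
# Feature-map resolution per detection head for a 640x640 input (resolution = input // stride).
input_size = 640
strides = {"P2": 4, "P3": 8, "P4": 16, "P5": 32}
for head, stride in strides.items():
    print(f"{head}: {input_size // stride} x {input_size // stride}")
# P2: 160 x 160, P3: 80 x 80, P4: 40 x 40, P5: 20 x 20 (the P5 head is removed in YOLO-RP)
```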

2.3.2. LPCHead

In YOLO11n, the decoupled detection head (as shown in Figure 6) separates the classification and bounding box regression tasks into two parallel branches, offering task-specific optimization benefits. However, both branches redundantly utilize 3 × 3 and 1 × 1 convolutional layers, resulting in unnecessary computational overhead, increased parameters, and diminished detection efficiency. Although depthwise separable convolution (DWConv) can effectively reduce computational complexity, its limited feature representation capability leads to suboptimal performance in complex agricultural pest detection scenarios.
To address these limitations, we propose a Lightweight Partial Convolution Head (LPCHead). As shown in Figure 7, this module retains the structural advantages of task decoupling while adopting a more streamlined design. By introducing a shared feature extraction module between the two task-specific branches and eliminating redundant convolutional layers, the design effectively reduces computational complexity and parameters, thereby improving inference speed and overall detection performance without compromising task separation efficiency.
Specifically, we incorporate Partial Convolution (PConv) [24] into the shared feature extraction stem. In PConv, the input feature map $X \in \mathbb{R}^{C \times H \times W}$ is divided along the channel dimension into two subsets:
$$X = [X_1, X_2]$$
where $X_1 \in \mathbb{R}^{rC \times H \times W}$ comprises the proportion of channels subjected to a standard $3 \times 3$ convolution, and $X_2 \in \mathbb{R}^{(1-r)C \times H \times W}$ retains the remaining channels unchanged. Here, $r$ denotes the proportion of channels subjected to convolution, typically set within the range (0, 1), and serves to control the sparsity level within the feature extraction process. Partial convolution is applied only to $X_1$:
$$Y_1 = \mathrm{Conv}_{3 \times 3}(X_1)$$
The outputs $Y_1$ and $X_2$ are then concatenated:
$$Y = [Y_1, X_2]$$
The transformed output $Y$ is then processed by a $1 \times 1$ convolution to integrate channel-wise information and normalize the feature dimension. Subsequently, the extracted features are propagated to two parallel branches, each consisting of a $1 \times 1$ convolution, which separately handle the classification and regression tasks.
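The following PyTorch sketch mirrors this split, convolve, concatenate, and mix sequence together with the shared-stem layout of LPCHead; the split ratio, channel widths, and branch output dimensions are illustrative assumptions rather than the exact configuration used in YOLO-RP.

```python
import torch
import torch.nn as nn

class PConvStem(nn.Module):
    """Shared stem with Partial Convolution: a 3x3 conv on a fraction r of the channels,
    identity on the rest, followed by a 1x1 conv to mix channel information (a sketch)."""
    def __init__(self, channels, r=0.25):
        super().__init__()
        self.c1 = int(channels * r)                       # channels passed through the 3x3 conv
        self.conv3 = nn.Conv2d(self.c1, self.c1, 3, padding=1, bias=False)
        self.mix = nn.Conv2d(channels, channels, 1, bias=False)

    def forward(self, x):
        x1, x2 = torch.split(x, [self.c1, x.shape[1] - self.c1], dim=1)
        y = torch.cat([self.conv3(x1), x2], dim=1)        # concat transformed and untouched channels
        return self.mix(y)

class LPCHeadSketch(nn.Module):
    """Hypothetical LPCHead layout: one shared PConv stem feeding two 1x1 task branches."""
    def __init__(self, channels, num_classes, reg_ch=64):
        super().__init__()
        self.stem = PConvStem(channels)
        self.cls = nn.Conv2d(channels, num_classes, 1)    # classification branch
        self.reg = nn.Conv2d(channels, reg_ch, 1)         # box-regression branch

    def forward(self, x):
        f = self.stem(x)
        return self.cls(f), self.reg(f)

head = LPCHeadSketch(64, num_classes=12)
cls_out, reg_out = head(torch.randn(1, 64, 80, 80))
print(cls_out.shape, reg_out.shape)
```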

2.3.3. DBELCSP

In complex agricultural environments such as rice fields, pest targets are typically characterized by small sizes, dense distributions, and strong background interference. These factors impose high demands on the feature extraction and representation capabilities of detection networks. However, the C3K2 module in YOLO11n presents several limitations: (1) its shallow stacked structure restricts the receptive field, making it difficult to capture long-range contextual information; (2) it has limited feature representation capacity, resulting in poor detection performance for small pests with blurred edges or heavily blended backgrounds; (3) inefficient gradient propagation and redundant computations hinder training efficiency. These limitations lead to missed detection, reduced stability, and weak adaptability in cluttered field scenarios.
To address these challenges, we design DBELCSP, a novel backbone module illustrated in Figure 8, which integrates the cross-stage partial connections of CSPNet [25], the multi-level aggregation structure of ELAN [26], and a structurally re-parameterizable Diversified Branch Block (DBB).
Specifically, DBELCSP first adopts the split-transform-merge strategy from CSPNet, where the input feature map is partitioned into two parts: one retains original information, while the other passes through multiple transformation layers to extract deep semantic features. Merging these branches reduces redundant computations and parameters while ensuring efficient gradient flow. Next, the module incorporates ELAN’s stacked connection structure, aggregating features across multiple depths to enrich multi-scale semantic representation and improve sensitivity to densely distributed small objects.
Subsequently, a DBB module is introduced (Figure 9), consisting of multiple convolutional paths with diverse receptive fields (e.g., 1 × 1 and 3 × 3 convolutions) and non-convolutional operations such as average pooling. This heterogeneous design enables the network to capture both fine-grained local details and broader contextual information, enhancing robustness in complex backgrounds and improving perceptual accuracy for subtle pest features. A key feature of DBB is its structural re-parameterization mechanism, which transforms the heterogeneous multi-branch structure used during training into a single 3 × 3 convolution at inference (Figure 10), enabling enriched feature learning during training while maintaining fast and efficient inference.
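As a simplified illustration of the re-parameterization idea, the sketch below folds a parallel 1 × 1 branch into an equivalent single 3 × 3 convolution by placing the 1 × 1 weights at the center of a zero-padded 3 × 3 kernel; the full DBB additionally merges batch-normalization and average-pooling branches, which are omitted here for brevity.

```python
import torch
import torch.nn as nn

def merge_1x1_into_3x3(w3, w1):
    """Fold a parallel 1x1 kernel into a 3x3 kernel (both bias-free, same padding behavior)."""
    merged = w3.clone()
    merged[:, :, 1, 1] += w1[:, :, 0, 0]   # 1x1 weight sits at the 3x3 kernel center
    return merged

cin, cout = 8, 16
x = torch.randn(1, cin, 32, 32)
conv3 = nn.Conv2d(cin, cout, 3, padding=1, bias=False)   # 3x3 training-time branch
conv1 = nn.Conv2d(cin, cout, 1, padding=0, bias=False)   # 1x1 training-time branch
y_multi = conv3(x) + conv1(x)                            # multi-branch output (training form)

fused = nn.Conv2d(cin, cout, 3, padding=1, bias=False)   # single conv used at inference
fused.weight.data = merge_1x1_into_3x3(conv3.weight.data, conv1.weight.data)
y_fused = fused(x)

print(torch.allclose(y_multi, y_fused, atol=1e-5))       # True: outputs are equivalent
```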
In summary, DBELCSP effectively addresses the limitations of C3K2 in receptive field, feature representation, and gradient propagation, achieving enhanced detection accuracy and robustness for small pests in complex field environments while reducing parameter and computational complexity.

2.3.4. WaveletPool

Downsampling is a critical operation in deep neural networks, serving not only to reduce spatial resolution to expand the receptive field, but also to facilitate the extraction of higher-level semantic features. However, conventional pooling methods—such as max pooling and average pooling—often discard fine-grained texture details and edge information, weakening the representation of small-object features and reducing detection accuracy.
To address this limitation, we replace conventional pooling with WaveletPool [27] in the YOLO11n network. As illustrated in Figure 11, WaveletPool first applies a two-level Discrete Wavelet Transform (DWT) to the input feature map. In the first level, the input is decomposed into one approximation sub-band (LL1) and three detail sub-bands (LH1, HL1, HH1), capturing horizontal, vertical, and diagonal high-frequency components. Then, LL1 is further decomposed via a second-level DWT, producing four new sub-bands: LL2, LH2, HL2, and HH2. All second-level sub-bands are preserved and combined using an Inverse Discrete Wavelet Transform (IDWT) to reconstruct a feature map downsampled by 2×.
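A minimal per-channel sketch of this two-level decomposition and reconstruction, implemented with the PyWavelets library and a Haar basis (the choice of basis is an assumption; the original WaveletPool formulation may differ in detail), is given below.

```python
import numpy as np
import pywt

def wavelet_pool_2x(feat, wavelet="haar"):
    """Structure-aware 2x downsampling of a (C, H, W) feature map (illustrative sketch)."""
    pooled = []
    for ch in feat:                                         # process each channel independently
        LL1, (LH1, HL1, HH1) = pywt.dwt2(ch, wavelet)       # first-level DWT
        LL2, (LH2, HL2, HH2) = pywt.dwt2(LL1, wavelet)      # second-level DWT on LL1
        # reconstruct from all second-level sub-bands -> half the original resolution
        pooled.append(pywt.idwt2((LL2, (LH2, HL2, HH2)), wavelet))
    return np.stack(pooled)

x = np.random.rand(16, 64, 64).astype(np.float32)
print(wavelet_pool_2x(x).shape)   # (16, 32, 32)
```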
By preserving multi-frequency and multi-directional information, WaveletPool enhances the network’s structural perception, maintains edge and texture details, and reduces information loss in cluttered backgrounds. This strengthens the model’s ability to detect small objects, weak edges, and localized patterns under complex agricultural conditions, thereby improving overall detection robustness and accuracy.

2.4. Experimental Setup

All experiments were conducted on a deep learning server equipped with an Intel® Xeon® Gold 5418Y processor (2.00 GHz), 32 GB of RAM, and an NVIDIA GeForce RTX 4090 GPU with 24 GB of VRAM. The software environment included Ubuntu 22.04 as the operating system, with GPU acceleration enabled by NVIDIA CUDA 12.1 and cuDNN 8.9.2. The model was implemented in Python 3.12 using the PyTorch 2.3.1 deep learning framework, which was used for network construction, training management, and performance evaluation. Through reference to previous YOLO-based pest detection studies [20,21,22] and preliminary trials, the final training hyperparameters were determined. A summary of the key hyperparameters is provided in Table 3. Figure 12 shows the training and validation loss curves along with detection performance metrics over 200 epochs. The performance metrics include mAP50, mAP50–95, precision, and recall. The loss curves indicate stable convergence without significant overfitting, while the performance metrics demonstrate steady improvement and stable detection performance throughout training, confirming the effectiveness of the chosen hyperparameters.
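For context, a minimal training invocation consistent with this setup might look as follows; it assumes the Ultralytics Python API, a hypothetical YOLO-RP architecture file, and a hypothetical dataset description file, with the remaining hyperparameters taken from Table 3.

```python
from ultralytics import YOLO

# Hypothetical file names: "yolo_rp.yaml" would describe the modified architecture and
# "rice_pests.yaml" the train/val/test splits of the curated IP102 subset.
model = YOLO("yolo_rp.yaml")
results = model.train(
    data="rice_pests.yaml",
    epochs=200,     # matches the 200-epoch schedule shown in Figure 12
    imgsz=640,      # default input resolution used throughout the experiments
    device=0,       # single RTX 4090 GPU
)
metrics = model.val()   # reports precision, recall, mAP@0.5, and mAP@0.5:0.95
```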

2.5. Metrics of Evaluation

This study adopts standard object detection metrics to evaluate model performance, including Precision (P), Recall (R), mean Average Precision (mAP), parameters, GFLOPs, and model size. These indicators jointly assess the model’s classification accuracy, detection capability, computational efficiency, and overall detection performance.
Precision (P) measures the proportion of true positive detections among all predicted positive instances and is defined as:
$$P = \frac{TP}{TP + FP}$$
where $TP$ denotes true positives and $FP$ denotes false positives. A higher Precision indicates a lower false positive rate.
Recall (R) quantifies the model’s ability to identify all actual positive instances and is defined as:
$$R = \frac{TP}{TP + FN}$$
where $FN$ denotes false negatives. A higher Recall indicates a lower miss rate.
Mean Average Precision (mAP) serves as a key evaluation metric that combines Precision and Recall to reflect detection accuracy across multiple object categories. It is defined as the mean area under the Precision–Recall curves for all classes:
$$mAP = \frac{1}{N} \sum_{i=1}^{N} \int_{0}^{1} P_i(R)\, dR$$
where $N$ is the number of object categories and $P_i(R)$ denotes the Precision of class $i$ at Recall level $R$. This integral-based formulation ensures a threshold-independent and comprehensive evaluation of detection performance. mAP effectively characterizes the model’s generalization capability in multi-class detection tasks, particularly for the diverse rice pest categories involved in this study.
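To make the metric concrete, the following sketch computes the area under a single-class Precision–Recall curve using all-point interpolation (the interpolation scheme is an assumption, as evaluation toolkits differ in this detail); mAP is then the mean of such per-class values.

```python
import numpy as np

def average_precision(precisions, recalls):
    """Area under one class's Precision-Recall curve (all-point interpolation sketch)."""
    order = np.argsort(recalls)
    r = np.concatenate(([0.0], np.asarray(recalls)[order], [1.0]))
    p = np.concatenate(([1.0], np.asarray(precisions)[order], [0.0]))
    for i in range(len(p) - 2, -1, -1):          # monotone non-increasing precision envelope
        p[i] = max(p[i], p[i + 1])
    return float(np.sum(np.diff(r) * p[1:]))

# toy single-class example at three confidence thresholds (hypothetical values)
prec = [1.00, 0.80, 0.60]
rec  = [0.40, 0.60, 0.90]
print(round(average_precision(prec, rec), 3))
# mAP@0.5 would be the mean of such AP values over the 12 pest categories
```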

3. Results

3.1. Comparison Experiment of Different Detection Layers

To investigate the effects of different detection head configurations on detection performance and model complexity, we conducted ablation experiments using various combinations of detection layers. The results are summarized in Table 4.
The baseline model using the conventional P3, P4, and P5 detection heads achieved a mAP@0.5 of 90.95%, with 2.58 million parameters and a model size of 5.5 MB. Adding a high-resolution P2 detection head on top of the baseline configuration (i.e., P2 + P3 + P4 + P5) led to improved detection performance. The mAP@0.5 increased to 91.48%, and mAP@0.5:0.95 rose to 64.73%. Precision and Recall also increased slightly, reaching 90.19% and 88.34%, respectively. However, this performance gain came with higher complexity: the parameter count increased to 2.66 million, GFLOPs rose to 10.2, and the model size expanded to 5.8 MB, indicating a clear trade-off between accuracy and efficiency. In contrast, removing the P5 head while retaining only P2, P3, and P4 resulted in the best balance. This configuration achieved the highest mAP@0.5 of 91.51%, with Precision and Recall reaching 90.49% and 88.39%, respectively. Notably, the parameter count decreased by 25.2% to 1.93 million, the model size was reduced to 4.3 MB, and GFLOPs were slightly reduced to 9.6 compared with the P2 + P3 + P4 + P5 configuration.
These results demonstrate that the P2 + P3 + P4 configuration offers the most effective trade-off between detection accuracy and computational complexity. The inclusion of the high-resolution P2 head enhances fine-grained spatial sensitivity, which is critical for small-object localization. Meanwhile, removing the low-resolution P5 head reduces redundancy and resource consumption without sacrificing detection performance. This configuration aligns well with the characteristics of rice pest datasets, which predominantly feature small, densely distributed targets in complex backgrounds.

3.2. Ablation Experiments

To evaluate the individual and combined contributions of the proposed modules to detection performance and model efficiency, systematic ablation experiments were conducted based on the previously validated P2, P3, and P4 detection head combination (referred to as Net). On this Net, the DBELCSP, LPCHead, and WaveletPool modules were progressively integrated to analyze their impact on detection accuracy and lightweight performance. The detailed results are presented in Table 5, where “√” indicates the inclusion of each module.
As shown in Table 5, the introduction of LPCHead slightly decreases mAP@0.5 by 0.33% but significantly reduces parameters to 1.72 M and GFLOPs to 5.5, with a model size of 3.7 MB. Precision increases marginally by +0.37% while recall decreases slightly by −0.06%, indicating that LPCHead effectively decouples classification and regression tasks, reduces redundant computation, and lowers model complexity while maintaining effective detection performance. Normalized to computational cost, this corresponds to approximately 0.384% mAP per million parameters reduced and 0.412% mAP per GFLOP saved, highlighting the efficiency-oriented design of LPCHead.
Subsequently, the incorporation of the DBELCSP module results in a 0.56% increase in precision, a 0.2% gain in mAP@0.5, and a 0.25% improvement in mAP@0.5:0.95. Meanwhile, the number of parameters decreases to 1.38 M, GFLOPs are further reduced to 3.9, and the model size shrinks to 3.5 MB. Recall slightly decreases by −0.04%. These improvements stem from the multi-branch structure and structural re-parameterization design of DBELCSP, which enhances feature extraction, improves the model’s robustness and sensitivity to small objects, and reduces computational cost and parameter count. Normalized to computation, DBELCSP achieves approximately +0.059% mAP per million parameters reduced and +0.0125% mAP per GFLOP saved, demonstrating the best accuracy–efficiency trade-off.
Finally, replacing the traditional downsampling mechanism with WaveletPool reduces the parameter count to 1.00 M, GFLOPs to 3.1, and model size to 2.8 MB. Despite these substantial reductions, mAP@0.5 only slightly decreases to 90.99%, with precision and recall remaining nearly unchanged. This indicates that the model maintains detection performance while achieving significant lightweight compression. This is due to WaveletPool’s frequency-domain decomposition mechanism, which enables structure-aware downsampling and preserves edge and texture features, thereby improving robustness under complex backgrounds and suppressing irrelevant noise. Normalized to computation, this corresponds to approximately +0.921% mAP per million parameters reduced and +0.438% mAP per GFLOP saved.
Overall, the normalized analysis reveals that DBELCSP provides the most cost-effective improvement, while LPCHead and WaveletPool primarily reduce model complexity with minimal accuracy compromise. Compared with the YOLO11n baseline, the final YOLO-RP model achieves comparable mAP@0.5, with precision improved by 0.83% and recall slightly decreased by 0.59%. Parameters are reduced from 2.58 M to 1.00 M (−61.3%), while both GFLOPs and model size are roughly halved. These results demonstrate that the proposed modules collectively enable YOLO-RP to deliver efficient, accurate, and stable small-object detection performance in complex agricultural scenarios, exhibiting strong robustness to tiny pests under resource-constrained conditions while significantly reducing model complexity and computational cost, thereby achieving a superior accuracy–efficiency trade-off.
As shown in Figure 13, the visualization provides a comprehensive comparison of detection performance and computational efficiency for YOLO11n, Models 2–8, and the proposed YOLO-RP. Seven metrics are considered: Precision (%), Recall (%), mAP@0.5 (%), mAP@0.5:0.95 (%), Parameters, GFLOPs, and Model Size (MB). Higher values for Precision, Recall, mAP@0.5, and mAP@0.5:0.95 indicate better detection performance, whereas lower values for Parameters, GFLOPs, and Model Size indicate reduced computational cost. Each model is represented by a distinct colored polygon: YOLO11n (light blue), Models 2–8 (pink, purple, orange, red, dark gray, green, teal, respectively), and YOLO-RP (yellow).
The visualization demonstrates that YOLO-RP achieves a favorable balance, maintaining high Precision and Recall comparable to baseline models while substantially reducing Parameters, GFLOPs, and Model Size. YOLO11n achieves competitive detection accuracy but with higher parameters and larger model size. The remaining models correspond to successive module-enhanced versions, in which Parameters, GFLOPs, and Model Size progressively decrease while retaining comparable detection performance, forming polygons of varying shapes and sizes.
Overall, YOLO-RP consistently maintains stable detection performance while significantly reducing computational overhead, demonstrating its potential for deployment on resource-constrained agricultural devices.

3.3. Performance Comparison

In this experiment, we evaluated the performance of the proposed YOLO-RP model and compared it with several lightweight networks, including GhostNet V2 [28], ShuffleNet V2 [29], and MobileNet V3 [30], as well as U-Net V2 [31], which is commonly applied in segmentation tasks. All models were evaluated under identical experimental settings on the same dataset, consistent with the settings described in Table 3.
The results are summarized in Table 6. YOLO-RP achieved a precision of 90.62% and an mAP@0.5 of 90.99%, outperforming all baseline models. Although lightweight networks are designed to reduce parameters and computational cost compared with conventional architectures, in this study their parameter counts and GFLOPs were still higher than those of YOLO-RP, while their detection performance was generally inferior. For example, ShuffleNet V2 achieved a comparable accuracy of 90.06% but exhibited a lower recall of 85.87%; MobileNet V3 demonstrated the lowest computational complexity (1.3 GFLOPs) but suffered from poor detection accuracy. U-Net V2, despite its strong feature extraction capability, was less effective than YOLO-RP in detection accuracy and incurred substantially higher computational demands, with 5.05 M parameters, 12.3 GFLOPs, and a model size of 9.9 MB. In contrast, YOLO-RP required only 1.00 M parameters, 3.1 GFLOPs, and 2.8 MB of model size, achieving an effective balance between accuracy and efficiency. Overall, YOLO-RP not only maintained high detection precision but also significantly reduced computational overhead, underscoring its strong potential for practical deployment in complex small-target detection tasks such as rice pest monitoring.

3.4. Comparison with Representative Methods

To comprehensively validate the performance of the proposed YOLO-RP model for rice pest detection, comparative experiments were conducted against several representative object detection frameworks, including the two-stage Faster R-CNN, the classic one-stage SSD, multiple YOLO series variants, and the Transformer-based RT-DETR-r18 model. All models were evaluated under identical experimental settings on the same dataset. Detailed results are summarized in Table 7.
In terms of detection performance, YOLO-RP achieves a precision of 90.62%, the highest among all evaluated models. For mAP@0.5, YOLO-RP attains 90.99%, outperforming all other compared models. Specifically, it improves upon YOLOv5n, YOLOv8n, YOLOv9t, and YOLOv10n by 1.39%, 0.93%, 0.39%, and 0.66%, respectively. Compared with RT-DETR-r18, SSD, and Faster R-CNN, YOLO-RP demonstrates improvements of 5.25%, 9.49%, and 5.09%, respectively. Moreover, YOLO-RP performs nearly identically to YOLO11n in mAP@0.5 (only a 0.04% difference) and achieves 63.84% in mAP@0.5:0.95, which is only 0.31 percentage points below YOLO11n, confirming that the proposed model retains high detection precision.
Regarding model compactness and efficiency, YOLO-RP exhibits superior lightweight characteristics. It reduces the parameter count to 1.00 M, GFLOPs to 3.1, and model size to 2.8 MB. These reductions represent at least 55% fewer parameters, over 50% less computation, and nearly half the model size compared to existing YOLO variants. Relative to RT-DETR-r18, SSD, and Faster R-CNN, YOLO-RP’s parameters amount to only 5.0%, 6.9%, and 2.4% of those models, its GFLOPs to 5.4%, 19.7%, and 2.3%, and its model size to 3.6%, 2.5%, and 0.9%, respectively.
Although certain YOLO variants achieve comparable accuracy, they remain suboptimal in terms of model compactness and deployment feasibility. Heavyweight detectors like RT-DETR-r18, Faster R-CNN, and SSD suffer from excessive computational overhead and limited real-time performance. In contrast, YOLO-RP achieves a more favorable balance between accuracy and efficiency, simultaneously delivering state-of-the-art detection precision and significant reductions in parameters, GFLOPs, and model size. This optimal trade-off highlights the model’s effectiveness in addressing the dual demands of small-object detection and lightweight design.
A three-dimensional visualization compares model parameters, GFLOPs, and model size on a log10 scale across multiple object detection models (Figure 14). The main plot (right) provides an overview of all evaluated models, including Faster-RCNN, SSD, RT-DETR-R18, YOLOv5n, YOLOv8n, YOLOv9t, YOLOv10n, YOLO11n, and the proposed YOLO-RP. Each point represents a model, with its position along the x-, y-, and z-axes corresponding to Parameters, GFLOPs, and Model Size, respectively.
The inset plot (left) offers a zoomed-in view of the YOLO series (YOLOv5n, YOLOv8n, YOLOv9t, YOLOv10n, YOLO11n) and YOLO-RP, highlighting differences in computational overhead. In both plots, point colors encode model size, ranging from yellow (smaller models) to dark purple (larger models), facilitating intuitive assessment of model compactness relative to performance.
The visualization shows that YOLO-RP occupies a region with the lowest Parameters, GFLOPs, and Model Size, while RT-DETR-R18 and other models are located in regions with larger model sizes (darker colors). These results underscore YOLO-RP’s potential for deployment in resource-constrained agricultural scenarios.

3.5. Generalization Experiments

To comprehensively evaluate the generalization capability of YOLO-RP across diverse data distributions and pest detection scenarios, we conducted cross-dataset experiments on three representative public benchmarks: the Common Rice Pests (Philippines) dataset [37], the large-scale IP102 dataset [23], and the Pest24 dataset [38].
  • The Common Rice Pests (Philippines) dataset comprises 5229 images spanning six representative rice pest categories, reflecting typical distribution characteristics observed in Southeast Asian rice-growing regions.
  • The IP102 dataset, previously introduced in Section 2.1, covers a broad spectrum of crop types—including Rice, Corn, Wheat, Beet, Alfalfa, Vitis, Citrus, and Mango—and various pest species, and is characterized by severe class imbalance, high intra-class variability, and complex agricultural environments.
  • The Pest24 dataset contains 25,378 images of 24 major pest species affecting field crops, posing significant challenges for fine-grained detection due to dense target distributions, small object sizes, high inter-class similarity, and varying background conditions.
Collectively, these datasets form a diverse and challenging benchmark for evaluating the cross-domain robustness and adaptability of YOLO-RP in real-world agricultural scenarios. For all three datasets, we trained the models independently from scratch while strictly maintaining the experimental settings specified in Table 3 to ensure fair comparison. The corresponding experimental results and comparative analysis are presented in Table 8.
On the Common Rice Pests (Philippines) dataset, YOLO-RP reduces parameters by 61% and decreases GFLOPs and model size by approximately 50%. Despite these substantial reductions, it still attains a precision of 92.24% and a recall of 90.67%, surpassing the YOLO11n baseline in precision. Moreover, it achieves an mAP@0.5 of 93.11%, closely matching the 93.28% of YOLO11n, indicating only minimal accuracy loss. On the more diverse IP102 and Pest24 pest detection datasets, YOLO-RP consistently maintains its lightweight characteristics, with over 61% fewer parameters and approximately 50% reductions in both computational complexity and model size. While mAP@0.5 exhibits slight declines—less than two percentage points on both datasets—this level of degradation is generally acceptable given the substantial gains in model compactness.
These results confirm that YOLO-RP delivers high detection accuracy for primary rice pest detection tasks while achieving notable reductions in model complexity. Even when applied to broader and more complex pest detection scenarios, it maintains performance comparable to the original YOLO11n, underscoring its robust generalization and cross-domain adaptability.

3.6. Deployment Evaluation

The inference speed of an object detection model is a critical metric for assessing its real-time performance, typically measured in frames per second (FPS). A higher FPS indicates the model’s ability to process more images per unit of time, reflecting stronger real-time responsiveness.
To evaluate the deployment feasibility of the proposed YOLO-RP model under resource-constrained conditions, we conducted inference experiments on the NVIDIA Jetson Nano B01 platform, which provides approximately 0.5 TFLOPs of computational capacity and serves as a representative low-power edge computing environment. TensorRT was utilized to optimize and accelerate the models, enabling efficient deployment on the embedded device. All experiments were performed using the model’s default input size of 640 × 640 pixels, with TensorRT operating in FP16 precision to balance speed and accuracy.
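A minimal export-and-timing sketch consistent with this deployment setup is shown below; it assumes the Ultralytics export API, a hypothetical trained checkpoint, and a hypothetical test image, and the resulting FPS naturally depends on warm-up, clock settings, and I/O.

```python
import time
from ultralytics import YOLO

# Export the trained model (hypothetical checkpoint name) to a TensorRT engine in FP16.
model = YOLO("yolo_rp.pt")
model.export(format="engine", half=True, imgsz=640)      # produces a .engine file

# Rough FPS measurement with the exported engine after a short warm-up.
engine = YOLO("yolo_rp.engine")
for _ in range(20):                                       # warm-up iterations
    engine.predict("sample_field_image.jpg", imgsz=640, verbose=False)

start = time.time()
n = 200
for _ in range(n):
    engine.predict("sample_field_image.jpg", imgsz=640, verbose=False)
print(f"approx. FPS: {n / (time.time() - start):.1f}")
```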
As summarized in Table 9, YOLO-RP achieved an average inference speed of 20.8 FPS on the Jetson Nano, significantly outperforming the baseline YOLO11n, which reached 12.5 FPS—representing an improvement of approximately 66.4%. Runtime measurements further revealed a cold-start latency of 1010.2 ms, while the warm-up single-frame latency exhibited P50, P90, and P95 values of 47.7 ms, approximately 48.1 ms, and 48.1–48.3 ms, respectively. RAM usage remained stable at 900–910 MB out of 3.9 GB, and the GPU temperature was maintained within 38–46.5 °C without thermal throttling. The Jetson Nano was tested with default nvpmodel and jetson_clocks settings, a warm-up period of 15–20 s, and a 6 min inference session. The approximate power consumption during inference was measured using tegrastats with a 1 Hz sampling rate and a 6 min averaging window, showing an average draw of 2.93 W with a variance of ±0.3 W and a peak of approximately 3.61 W, corresponding to an estimated per-frame energy cost of 0.144 J. Taken together, these results demonstrate YOLO-RP’s real-time capability and energy-efficient operation, highlighting its suitability for deployment on resource-constrained edge devices.

3.7. Detection and Experimental Results Analysis

To further evaluate the real-world adaptability of different models, we conducted qualitative comparisons on representative test images to visually analyze the detection performance of the baseline YOLO11n and the proposed YOLO-RP model.
As illustrated in Figure 15, the YOLO11n model frequently produces overlapping or redundant bounding boxes in dense object regions, which adversely affects localization accuracy. Moreover, it is prone to false detections, often mistaking background patterns such as leaf veins, shadows, or speckled textures for pests. Instances of category confusion were also observed—for example, misclassifying brown plant hopper as rice leaf hopper—highlighting its limited robustness and discriminative capacity in cluttered field environments.
In contrast, YOLO-RP consistently generates more compact and precise bounding boxes, with significantly fewer false detections in non-target regions. Even under complex background interference, YOLO-RP effectively distinguishes pests from irrelevant features, maintaining lower false positive rates and higher category consistency. Although a few misclassifications may still occur in extremely challenging scenes, the overall detection results are more stable and accurate, providing a more realistic representation of pest spatial distribution.
In summary, YOLO-RP demonstrates superior detection precision and robustness in complex agricultural environments compared to YOLO11n, validating its effectiveness and adaptability for small-object detection tasks.
To further illustrate the differences in feature perception between models, we employed Kernel Principal Component Analysis–Class Activation Mapping (KPCA-CAM) [39] to visualize the class activation regions on representative test images, as shown in Figure 16. This technique enhances the interpretability and discriminability of class activation maps by extracting nonlinear principal components in the high-dimensional feature space, thereby highlighting the regions most critical to the model’s decision-making process.
The visualization results reveal that YOLO11n exhibits relatively dispersed and weak attention responses within target areas. In particular, when dealing with small objects or complex backgrounds, its attention tends to drift away from the actual pest targets and instead focuses erroneously on irrelevant background structures, increasing the risk of false positives and missed detections. In contrast, YOLO-RP demonstrates more concentrated and discriminative attention maps. The activation regions are tightly aligned with the pest contours and accurately cover the object body. Even in scenarios involving overlapping pests or strong background interference, YOLO-RP maintains consistent attention stability and superior target discrimination capability.
These findings further validate the enhanced feature representation capability of YOLO-RP. By accurately attending to critical regions, the model achieves greater stability and robustness in small-object detection under complex field conditions. This conclusion is consistent with the quantitative evaluation results and underscores the model’s practical potential and wide adaptability for real-world rice pest detection tasks in resource-constrained environments.
To quantitatively evaluate detection performance across different pest categories, we computed the average precision at IoU threshold 0.5 (AP50) for both YOLO11n and YOLO-RP, with the results summarized in Table 10. YOLO-RP achieves higher AP in 7 out of 12 categories, with the most notable improvements observed in small-object pests, such as Asiatic rice borer (+1.70%) and Rice leaf roller (+2.27%). Slight decreases were observed in categories such as Rice stem fly (–0.93%) and Grain spreader thrips (–1.80%), likely reflecting trade-offs introduced by the lightweight architectural modifications.
Overall, YOLO-RP maintains detection performance comparable to YOLO11n while achieving significant model lightweighting. It achieves similar or higher AP across most pest categories, demonstrating that the proposed method effectively balances model efficiency and detection accuracy in complex agricultural environments.
Figure 17 shows the Precision–Recall (PR) curves of YOLO-RP across all rice pest categories. The overall mAP@0.5 is 0.9099, consistent with the AP results in Table 10. The curves indicate that YOLO-RP maintains high precision and recall across most pest categories, with only minor variations between classes. Concentrated near the top-right corner, the PR curves further confirm that YOLO-RP achieves stable and robust detection performance.

3.8. Statistical Tests

To evaluate whether the performance improvements of YOLO-RP over the baseline model YOLO11n are both statistically significant and practically meaningful, we conducted repeated experiments using ten different random seeds and performed independent-samples t-tests.
The analysis shows that the differences in both precision and mAP@0.5 are statistically significant, with p-values of 0.0032 and 0.01, respectively. The computed Cohen’s d values are 1.52 for precision and 1.28 for mAP@0.5, indicating very large and large effect sizes. Furthermore, the 95% confidence intervals (CI) for the mean values confirm the consistency of the improvements across repeated experiments (Precision: 90.76 ± 0.25 vs. 90.19 ± 0.27; mAP@0.5: 91.13 ± 0.08 vs. 90.93 ± 0.13). For mAP@0.5:0.95, the difference between YOLO-RP and YOLO11n is not statistically significant (p = 0.1232, Cohen’s d = −0.72), with mean ± 95% CI values of 63.84 ± 0.21 and 64.07 ± 0.25. These results indicate that YOLO-RP achieves statistically significant improvements in precision and mAP@0.5, while maintaining comparable performance in mAP@0.5:0.95, confirming the reliability and robustness of the proposed method in small rice pest detection tasks.
The detailed experimental results, including per-seed values, p-values, Cohen’s d, and 95% CI, are presented in Table 11.
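The statistical procedure can be reproduced with a short script such as the one below, which applies an independent-samples t-test, a pooled-variance Cohen’s d, and a 95% confidence interval to per-seed metric values; the listed numbers are illustrative placeholders, not the values reported in Table 11.

```python
import numpy as np
from scipy import stats

def compare_runs(scores_a, scores_b):
    """Independent-samples t-test, Cohen's d, and 95% CI for two sets of per-seed metrics."""
    a, b = np.asarray(scores_a, float), np.asarray(scores_b, float)
    t, p = stats.ttest_ind(a, b)
    pooled_sd = np.sqrt(((len(a) - 1) * a.var(ddof=1) + (len(b) - 1) * b.var(ddof=1))
                        / (len(a) + len(b) - 2))
    d = (a.mean() - b.mean()) / pooled_sd                  # pooled-variance Cohen's d
    ci = stats.t.interval(0.95, len(a) - 1, loc=a.mean(), scale=stats.sem(a))
    return t, p, d, ci

# hypothetical per-seed precision values for YOLO-RP vs. YOLO11n (ten seeds each)
rp   = [90.5, 90.9, 90.7, 90.6, 91.0, 90.8, 90.7, 90.9, 90.6, 90.9]
base = [90.1, 90.3, 90.0, 90.2, 90.4, 90.1, 90.3, 90.2, 90.0, 90.3]
print(compare_runs(rp, base))
```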

4. Discussion

In recent years, lightweight object detection models have received increasing attention in agricultural pest recognition, with the core objective of maintaining high detection accuracy while reducing computational complexity to meet practical application requirements. For instance, Cen et al. [40] proposed YOLO-LCE, which incorporates lightweight convolutions and feature enhancement modules into the YOLOv8 architecture, achieving approximately 44% reduction in parameters and 33% reduction in GFLOPs while slightly improving precision, reaching an mAP50 of 63.9%. Additionally, Zhu et al. [41] introduced GDS-YOLO, which employs GsConv, Dysample, and SCAM modules to achieve an mAP50 of 85.3% in rice field disease detection, while reducing parameters by 23% and GFLOPs by 10.1%, demonstrating robust performance in complex environments. In contrast, YOLO-RP achieves greater model compression and inference efficiency while maintaining high detection performance. Experimental results show that YOLO-RP attains a precision of 90.62%, recall of 87.38%, mAP@0.5 of 90.99%, and mAP@0.5:0.95 of 63.84%, with parameters, GFLOPs, and model size reduced by 61.3%, 50.8%, and 49.1%, respectively, down to 1.00 M parameters, 3.1 GFLOPs, and 2.8 MB. On the NVIDIA Jetson Nano, YOLO-RP achieves 20.8 FPS real-time inference, 66.4% faster than the baseline, demonstrating its deployment potential in resource-constrained edge environments. Moreover, cross-dataset experiments on Common Rice Pests (Philippines), IP102, and Pest24 indicate that YOLO-RP maintains robust detection performance across diverse crops and pest scenarios, showing strong cross-domain generalization capability.
Despite these advantages, YOLO-RP still has several limitations. In scenarios with dense pest aggregation or severe occlusion, as well as in complex or noisy backgrounds (e.g., rice weeds, soil textures, or light spots), the discrimination of small targets and detection stability remain insufficient, potentially leading to missed detections and false positives. This is primarily due to the inability of the current feature fusion and receptive field design to fully capture occluded target details and suppress background interference. Future studies could explore multi-scale attention mechanisms to enhance local feature representation, optimize feature fusion strategies, and incorporate adaptive receptive field design, along with background suppression or context-aware modules, to further improve detection accuracy and overall robustness, and systematically evaluate performance in larger-scale field scenarios.
In addition, the cross-species generalization of YOLO-RP remains limited. Cross-dataset experiments demonstrate that the model performs robustly on Common Rice Pests (Philippines), IP102, and Pest24: on the Common Rice Pests dataset, precision and recall reach 92.24% and 90.67%, with mAP@0.5 at 93.11%, nearly matching the YOLO11n baseline; on IP102 and Pest24, mAP@0.5 decreases by only 1–2 percentage points while the model still retains a 61% reduction in parameters and roughly 50% reductions in computational cost and model size. These results suggest strong robustness across different data distributions and partial cross-species scenarios. However, the current experiments are mainly limited to rice and a narrow range of crop pests, leaving adaptability to a broader variety of crops or more complex ecosystems unverified. Future work could combine domain generalization and transfer learning, pretrain or fine-tune on more diverse crop and pest datasets, and employ cross-dataset validation to further enhance model robustness and cross-domain generalization.
The adaptability of YOLO-RP to varying crops and environmental conditions still requires improvement. In real-world agricultural scenarios, variations in illumination, seasonal changes, and climatic factors such as humidity, rainfall, and wind can degrade image quality, thereby affecting detection stability and accuracy. Future work could incorporate illumination-invariant feature representations to mitigate lighting effects, combined with domain adaptation and online environmental adaptation mechanisms to handle dynamic field conditions. Systematic evaluation under extreme lighting, climate, and field scenarios will further enhance YOLO-RP’s robustness and reliability in practical agricultural applications.
Regarding edge deployment, current experiments are limited to the NVIDIA Jetson Nano, which is insufficient to fully evaluate model performance under different computational and energy constraints. Future research should extend deployment tests to multiple edge devices, such as NVIDIA Jetson Xavier, Raspberry Pi, and FPGA, and systematically quantify efficiency, latency, and energy consumption. Additionally, model compression, operator optimization, and inference acceleration strategies tailored to different hardware architectures could be explored to comprehensively enhance YOLO-RP’s adaptability and practicality on diverse edge platforms, providing solid technical support for agricultural IoT and smart farming applications.
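As an illustration of this kind of benchmarking, the sketch below measures average per-frame latency and throughput with an Ultralytics-style predict() call; the weight file yolo_rp.pt and the test-image folder are hypothetical placeholders, and the loop is a simplified example rather than the exact protocol used in the Jetson Nano experiments.

```python
# Minimal latency/FPS measurement sketch for edge benchmarking.
# "yolo_rp.pt" and "test_images/" are hypothetical placeholders.
import time
from pathlib import Path
from ultralytics import YOLO

model = YOLO("yolo_rp.pt")                       # hypothetical trained weights
images = sorted(Path("test_images").glob("*.jpg"))

# Warm-up runs so one-off initialization cost is not counted.
for img in images[:5]:
    model.predict(str(img), imgsz=640, verbose=False)

# Timed runs: average per-frame latency and throughput.
start = time.perf_counter()
for img in images:
    model.predict(str(img), imgsz=640, verbose=False)
elapsed = time.perf_counter() - start

latency_ms = 1000.0 * elapsed / len(images)
print(f"avg latency: {latency_ms:.1f} ms/frame, throughput: {1000.0 / latency_ms:.1f} FPS")
```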

5. Conclusions

To address the challenges of detecting tiny rice pests in complex field environments—such as poor detection performance, high computational redundancy, and low deployment adaptability—this study proposes YOLO-RP, a lightweight and efficient detection model based on the YOLO11n architecture. By integrating a high-resolution P2 detection head, a lightweight partial convolutional head (LPCHead), a re-parameterizable multi-branch module (DBELCSP), and a structure-aware wavelet pooling module (WaveletPool), YOLO-RP achieves structural redesign and module simplification. These improvements jointly enhance small-object detection accuracy while significantly reducing model complexity, resulting in strong generalization capability and suitability for edge deployment.
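To make the role of wavelet-based downsampling concrete, the sketch below implements a single-level Haar wavelet pooling step in PyTorch: spatial resolution is halved while the low-frequency approximation and the detail sub-bands carrying edge and texture information are retained as additional channels. This is only an illustration of the general idea behind wavelet pooling [27]; it is not the exact WaveletPool module used in YOLO-RP.

```python
# Single-level Haar wavelet pooling: an illustrative sketch, not the exact
# WaveletPool implementation in YOLO-RP.
import torch
import torch.nn as nn

class HaarWaveletPool(nn.Module):
    def forward(self, x):
        # Samples at the four positions of each 2x2 block.
        a = x[..., 0::2, 0::2]   # even rows, even columns
        b = x[..., 0::2, 1::2]   # even rows, odd columns
        c = x[..., 1::2, 0::2]   # odd rows, even columns
        d = x[..., 1::2, 1::2]   # odd rows, odd columns
        ll = (a + b + c + d) / 2.0   # low-frequency approximation
        lh = (a + b - c - d) / 2.0   # detail sub-band (vertical variation)
        hl = (a - b + c - d) / 2.0   # detail sub-band (horizontal variation)
        hh = (a - b - c + d) / 2.0   # detail sub-band (diagonal variation)
        # Concatenate sub-bands along the channel axis: (B, 4C, H/2, W/2).
        return torch.cat([ll, lh, hl, hh], dim=1)

pool = HaarWaveletPool()
feat = torch.randn(1, 64, 80, 80)
print(pool(feat).shape)   # torch.Size([1, 256, 40, 40])
```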
Experimental results show that YOLO-RP achieves 90.62% precision, 87.38% recall, 90.99% mAP@0.5, and 63.84% mAP@0.5:0.95 on the rice pest dataset. Meanwhile, the parameters, computational complexity, and model size are reduced by 61.3%, 50.8%, and 49.1%, respectively—compressed to just 1.00 M, 3.1 GFLOPs, and 2.8 MB. These results demonstrate that YOLO-RP achieves an effective trade-off between accuracy and efficiency, outperforming many existing mainstream detection algorithms. Cross-dataset evaluations further verify its generalization ability and cross-scenario adaptability. On the Common Rice Pests (Philippines) dataset, YOLO-RP achieves 92.24% precision and 93.11% mAP@0.5. On the IP102 and Pest24 datasets—both covering diverse crop pest species—YOLO-RP maintains comparable detection performance to the baseline while reducing the number of parameters by over 61% and cutting GFLOPs and model size by approximately 50%. Furthermore, when deployed on the NVIDIA Jetson Nano embedded platform, YOLO-RP achieves a real-time inference speed of 20.8 FPS, representing a 66.4% improvement over YOLO11n. This demonstrates the model’s high inference efficiency and its ability to meet the demands of real-time, lightweight monitoring in practical agricultural environments.
Despite these achievements, several challenges remain. YOLO-RP’s performance can still be affected by dense pest aggregation, occlusion, complex backgrounds, and limited cross-species and cross-crop generalization. Future research could focus on enhancing feature representation and fusion, incorporating adaptive receptive fields and attention mechanisms, improving robustness to environmental variations, and extending deployment across diverse edge devices with optimized efficiency. These improvements are expected to enhance the reliability and accuracy of pest monitoring, providing valuable support for decision-making in precision agriculture.

Author Contributions

Conceptualization, X.Y., X.X. and M.D.; methodology, Q.H.; software, Q.H.; validation, X.Y., Q.H. and X.X.; formal analysis, Q.H.; investigation, Q.H.; resources, X.X. and M.D.; data curation, Q.H.; writing—original draft preparation, Q.H.; writing—review and editing, X.Y. and Q.H.; visualization, Q.H.; supervision, X.Y., X.X. and M.D.; project administration, X.Y., X.X. and M.D.; funding acquisition, M.D. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Guangxi Science and Technology Major Program (Grant No. GuikeAB23026036 and Grant No. GuikeAB23026004), and the National Natural Science Foundation of China (Grant No. 62262011).

Data Availability Statement

The data presented in this study are available from the IP102 dataset at https://github.com/xpwu95/IP102 (accessed on 18 August 2025). In addition, the rice-pest subset used in this study is publicly available and can be accessed at https://github.com/rzorange/Ricepests (accessed on 9 September 2025).

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Ebrahimi, M.; Khoshtaghaza, M.H.; Minaei, S.; Jamshidi, B. Vision-based pest detection based on SVM classification method. Comput. Electron. Agric. 2017, 137, 52–58.
2. Li, X.; Wang, L.; Sung, E. AdaBoost with SVM-Based Component Classifiers. Eng. Appl. Artif. Intell. 2008, 21, 785–795.
3. Sun, Y.; Jiang, Z.; Zhang, L.; Dong, W.; Rao, Y. SLIC_SVM Based Leaf Diseases Saliency Map Extraction of Tea Plant. Comput. Electron. Agric. 2019, 157, 102–109.
4. Sahu, S.K.; Pandey, M. An Optimal Hybrid Multiclass SVM for Plant Leaf Disease Detection Using Spatial Fuzzy C-Means Model. Expert Syst. Appl. 2023, 214, 118989.
5. Qing, Y.A.; Zheng, W.A.; Bao-jun, Y.A.; Jian, T.A. Automated Detection and Identification of White-Backed Planthoppers in Paddy Fields Using Image Processing. J. Integr. Agric. 2017, 16, 1547–1557.
6. Pantazi, X.E.; Moshou, D.; Tamouridou, A.A. Automated Leaf Disease Detection in Different Crop Species through Image Features Analysis and One Class Classifiers. Comput. Electron. Agric. 2019, 156, 96–104.
7. Zhang, Y.; Zhong, L.; Ding, Y.; Yu, H.; Zhai, Z. ResViT-Rice: A Deep Learning Model Combining Residual Module and Transformer Encoder for Accurate Detection of Rice Diseases. Agriculture 2023, 13, 1264.
8. Guan, B.; Wu, Y.; Zhu, J.; Kong, J.; Dong, W. GC-Faster RCNN: The Object Detection Algorithm for Agricultural Pests Based on Improved Hybrid Attention Mechanism. Plants 2025, 14, 1106.
9. Ali, F.; Qayyum, H.; Iqbal, M.J. Faster-PestNet: A Lightweight Deep Learning Framework for Crop Pest Detection and Classification. IEEE Access 2023, 11, 104016–104027.
10. Wang, T.; Zhao, L.; Li, B.; Liu, X.; Xu, W.; Li, J. Recognition and Counting of Typical Apple Pests Based on Deep Learning. Ecol. Inform. 2022, 68, 101556.
11. Haruna, Y.; Qin, S.; Mbyamm Kiki, M.J. An Improved Approach to Detection of Rice Leaf Disease with GAN-Based Data Augmentation Pipeline. Appl. Sci. 2023, 13, 1346.
12. Yin, Z.B.; Liu, F.Y.; Geng, H.; Xi, Y.J.; Zeng, D.B.; Si, C.J.; Shi, M.D. A High-Precision Jujube Disease Spot Detection Based on SSD during the Sorting Process. PLoS ONE 2024, 19, e0296314.
13. Lyu, Z.; Jin, H.; Zhen, T.; Sun, F.; Xu, H. Small Object Recognition Algorithm of Grain Pests Based on SSD Feature Fusion. IEEE Access 2021, 9, 43202–43213.
14. Hu, Y.; Deng, X.; Lan, Y.; Chen, X.; Long, Y.; Liu, C. Detection of Rice Pests Based on Self-Attention Mechanism and Multi-Scale Feature Fusion. Insects 2023, 14, 280.
15. Yang, Z.; Feng, H.; Ruan, Y.; Weng, X. Tea Tree Pest Detection Algorithm Based on Improved Yolov7-Tiny. Agriculture 2023, 13, 1031.
16. Yang, S.; Xing, Z.; Wang, H.; Dong, X.; Gao, X.; Liu, Z.; Zhang, X.; Li, S.; Zhao, Y. Maize-YOLO: A New High-Precision and Real-Time Method for Maize Pest Detection. Insects 2023, 14, 278.
17. Zhang, L.; Cui, H.; Sun, J.; Li, Z.; Wang, H.; Li, D. CLT-YOLOX: Improved YOLOX Based on Cross-Layer Transformer for Object Detection Method Regarding Insect Pest. Agronomy 2023, 13, 2091.
18. Wei, J.; Gong, H.; Li, S.; You, M.; Zhu, H.; Ni, L.; Luo, L.; Chen, M.; Chao, H.; Hu, J.; et al. Improving the Accuracy of Agricultural Pest Identification: Application of AEC-YOLOv8n to Large-Scale Pest Datasets. Agronomy 2024, 14, 1640.
19. Zhu, L.; Li, X.; Sun, H.; Han, Y. Research on CBF-YOLO Detection Model for Common Soybean Pests in Complex Environment. Comput. Electron. Agric. 2024, 216, 108515.
20. Fang, K.; Zhou, R.; Deng, N.; Li, C.; Zhu, X. RLDD-YOLOv11n: Research on Rice Leaf Disease Detection Based on YOLOv11. Agronomy 2025, 15, 1266.
21. Yin, J.; Zhu, J.; Chen, G.; Jiang, L.; Zhan, H.; Deng, H.; Long, Y.; Lan, Y.; Wu, B.; Xu, H. An Intelligent Field Monitoring System Based on Enhanced YOLO-RMD Architecture for Real-Time Rice Pest Detection and Management. Agriculture 2025, 15, 798.
22. Huang, Y.; Liu, Z.; Zhao, H.; Tang, C.; Liu, B.; Li, Z.; Wan, F.; Qian, W.; Qiao, X. YOLO-YSTs: An Improved YOLOv10n-Based Method for Real-Time Field Pest Detection. Agronomy 2025, 15, 575.
23. Wu, X.; Zhan, C.; Lai, Y.K.; Cheng, M.M.; Yang, J. IP102: A Large-Scale Benchmark Dataset for Insect Pest Recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 8787–8796.
24. Liu, G.; Reda, F.A.; Shih, K.J.; Wang, T.C.; Tao, A.; Catanzaro, B. Image Inpainting for Irregular Holes Using Partial Convolutions. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 85–100.
25. Wang, C.Y.; Liao, H.Y.M.; Wu, Y.H.; Chen, P.Y.; Hsieh, J.W.; Yeh, I.H. CSPNet: A New Backbone That Can Enhance Learning Capability of CNN. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Seattle, WA, USA, 14–19 June 2020; pp. 390–391.
26. Wang, C.Y.; Liao, H.Y.M.; Yeh, I.H. Designing Network Design Strategies through Gradient Path Analysis. arXiv 2022, arXiv:2211.04800.
27. Williams, T.; Li, R. Wavelet Pooling for Convolutional Neural Networks. In Proceedings of the International Conference on Learning Representations (ICLR), Vancouver, BC, Canada, 30 April–3 May 2018.
28. Tang, Y.; Han, K.; Guo, J.; Xu, C.; Xu, C.; Wang, Y. GhostNetv2: Enhance cheap operation with long-range attention. Adv. Neural Inf. Process. Syst. 2022, 35, 9969–9982.
29. Ma, N.; Zhang, X.; Zheng, H.T.; Sun, J. Shufflenet v2: Practical guidelines for efficient cnn architecture design. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 116–131.
30. Howard, A.; Sandler, M.; Chu, G.; Chen, L.C.; Chen, B.; Tan, M.; Wang, W.; Zhu, Y.; Pang, R.; Vasudevan, V.; et al. Searching for mobilenetv3. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 1314–1324.
31. Peng, Y.; Chen, D.Z.; Sonka, M. U-net v2: Rethinking the skip connections of u-net for medical image segmentation. In Proceedings of the 2025 IEEE 22nd International Symposium on Biomedical Imaging (ISBI), Houston, TX, USA, 14–17 April 2025; IEEE: Piscataway, NJ, USA, 2025; pp. 1–5.
32. Sun, X.; Wu, P.; Hoi, S.C. Face Detection Using Deep Learning: An Improved Faster RCNN Approach. Neurocomputing 2018, 299, 42–50.
33. Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.Y.; Berg, A.C. SSD: Single Shot Multibox Detector. In Proceedings of the European Conference on Computer Vision (ECCV), Amsterdam, The Netherlands, 11–14 October 2016; Springer International Publishing: Berlin, Germany, 2016; pp. 21–37.
34. Wang, C.Y.; Yeh, I.H.; Mark Liao, H.Y. YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information. In Proceedings of the European Conference on Computer Vision (ECCV), Milan, Italy, 29 September–4 October 2024; Springer Nature Switzerland: Cham, Switzerland, 2024; pp. 1–21.
35. Wang, A.; Chen, H.; Liu, L.; Chen, K.; Lin, Z.; Han, J. YOLOv10: Real-Time End-to-End Object Detection. Adv. Neural Inf. Process. Syst. 2024, 37, 107984–108011.
36. Zhao, Y.; Lv, W.; Xu, S.; Wei, J.; Wang, G.; Dang, Q.; Liu, Y.; Chen, J. DETRs Beat YOLOs on Real-Time Object Detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 16–22 June 2024; pp. 16965–16974.
37. Common Rice Pests (Philippines). Available online: https://universe.roboflow.com/data-science-project/common-rice-pests-philippines (accessed on 1 July 2025).
38. Wang, Q.J.; Zhang, S.Y.; Dong, S.F.; Zhang, G.C.; Yang, J.; Li, R.; Wang, H.Q. Pest24: A Large-Scale Very Small Object Data Set of Agricultural Pests for Multi-Target Detection. Comput. Electron. Agric. 2020, 175, 105585.
39. Karmani, S.; Sivakaran, T.; Prasad, G.; Ali, M.; Yang, W.; Tang, S. KPCA-CAM: Visual Explainability of Deep Computer Vision Models Using Kernel PCA. In Proceedings of the 2024 IEEE 26th International Workshop on Multimedia Signal Processing (MMSP), West Lafayette, IN, USA, 2–4 October 2024; IEEE: Piscataway, NJ, USA, 2024; pp. 1–5.
40. Cen, X.; Lu, S.; Qian, T. YOLO-LCE: A Lightweight YOLOv8 Model for Agricultural Pest Detection. Agronomy 2025, 15, 2022.
41. Huang, Y.; Feng, X.; Han, T.; Song, H.; Liu, Y.; Bao, M. GDS-YOLO: A rice diseases identification model with enhanced feature extraction capability. IET Image Process. 2025, 19, e70034.
Figure 1. Experimental workflow of YOLO-RP for rice pest detection.
Figure 2. Samples of twelve types of rice pests: (a) Asiatic Rice Borer; (b) Brown Plant Hopper; (c) Grain Spreader Thrips; (d) Paddy Stem Maggot; (e) Rice Gall Midge; (f) Rice Leaf Caterpillar; (g) Rice Leaf Hopper; (h) Rice Leaf Roller; (i) Rice Shell Pest; (j) Rice Stem Fly; (k) Rice Water Weevil; (l) Yellow Rice Borer.
Figure 3. Image augmentation method: (a) Cropping; (b) Noise Injection; (c) Blurring; (d) Color Jittering; (e) Horizontal Flipping; (f) Vertical Flipping; (g) Contrast Adjustment; (h) Rotation; (i) Saturation Adjustment; (j) Brightness Variation.
Figure 4. Structure of YOLO-RP.
Figure 5. Comparison of detection head structures: (a) YOLO11n structure; (b) YOLO11n with the newly introduced P2 detection head; (c) proposed structure with detection heads.
Figure 6. YOLO11n detection head structure.
Figure 7. The architecture of LPCHead.
Figure 8. Structural design of the DBELCSP module.
Figure 9. Structural Design of the Diverse Branch Block.
Figure 10. The six typical re-parameterization paths of the Diverse Branch Block.
Figure 11. WaveletPool Forward Propagation Algorithm. The symbol * denotes newly generated images.
Figure 12. Training and validation loss curves and detection performance metrics of YOLO-RP over 200 epochs.
Figure 13. The effect of normalization of overall indicators.
Figure 14. Three-dimensional comparative visualization of Parameters, GFLOPs, and Model Size across different models.
Figure 15. Comparison of Detection Performance Across Different Models.
Figure 16. KPCA-CAM Heatmap Comparison Between YOLO-RP and YOLO11n.
Figure 17. The precision–recall curve for the YOLO-RP model.
Table 1. Summary of representative YOLO-based pest detection models and their key features.

| Model | mAP@0.5 (%) | Parameter (M) | GFLOPs | Datasets |
|---|---|---|---|---|
| YOLO-GBS a | 79.80 | - | - | IP102 (7 rice pest classes) |
| YOLOv7-tiny b | 93.20 | 26.40 | - | Tree Pest |
| Maize-YOLO c | 76.30 | 33.40 | 38.90 | IP102 (13 maize pest classes) |
| CLT-YOLOX d | 57.70 | 10.51 | 35.36 | IP102 |
| AEC-YOLOv8n e | 67.10 | 3.69 | - | IP102 |
| CBF-YOLO f | 86.90 | 70.10 | 188.00 | Soybean Pest |
| RLDD-YOLO11n g | 88.30 | 2.58 | 6.30 | Rice Leaf Disease |
| YOLO-RMD h | 98.2 | 5.87 | 49.40 | IP102 + online (7 rice pest classes) |
| YOLO-YSTs i | 86.8 | 3.02 | 8.80 | Sticky Trap Pest |
| YOLO-RP (Ours) | 90.99 | 1.00 | 3.10 | IP102 (12 rice pest classes) |

a Introduced GC attention, BiFPN, and Swin Transformer for improved feature extraction and multi-scale fusion. b Incorporated deformable convolution, BiFormer dynamic attention, non-maximal suppression, and a decoupled detection head to enhance detection accuracy. c Utilized ELAN, CBS, and CSPResNeXt-50 for efficient feature extraction, combined with SPPCSPC and ELAN for multi-scale feature aggregation. d Introduced a Cross-Layer Transformer (CLT) to enable cross-layer fusion and improve multi-scale object detection. e Implemented EMSFEM, AFEM_SIE, Concat-Weighting, and SPPELAN for enhanced multi-scale feature extraction and fusion in large pest datasets. f Added CSE-ELAN, Bi-PAN, FFE, and SPPCSPC modules to improve detection of small and occluded pests. g Employed SCSA residual attention and CARAFE upsampling to enhance feature representation and multi-scale fusion for rice leaf diseases. h Integrated RFAConv, MLCA, and DyHead to enhance small-object detection and multi-scale feature representation. i Implemented SPD-Conv, iRMB attention, and Inner-SIoU loss to improve dense and small-object detection.
Table 2. Dataset composition across training, validation, and testing subsets.

| Category | Training Instances | Val Instances | Test Instances |
|---|---|---|---|
| Asiatic rice borer | 9322 | 969 | 1052 |
| rice stem fly | 3233 | 530 | 389 |
| grain spreader thrips | 7690 | 1081 | 896 |
| paddy stem maggot | 3551 | 582 | 465 |
| rice gall midge | 9345 | 1040 | 1428 |
| rice leaf caterpillar | 5307 | 623 | 643 |
| rice leaf hopper | 9257 | 1431 | 1406 |
| rice leaf roller | 12,530 | 2319 | 1378 |
| rice shell pest | 4917 | 621 | 712 |
| brown plant hopper | 21,332 | 2765 | 3336 |
| rice water weevil | 15,538 | 1967 | 2049 |
| yellow rice borer | 5603 | 609 | 545 |
| Summary | 107,625 | 14,537 | 14,299 |
Table 3. Experimental hyperparameter settings.

| Parameter | Value |
|---|---|
| Epochs | 200 |
| Batch size | 32 |
| Learning rate | 0.01 |
| Optimizer | SGD |
| Image size | 640 × 640 |
| Weight decay | 0.0005 |
| Momentum | 0.937 |
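As an illustration, the hyperparameters in Table 3 map onto a typical Ultralytics-style training call as sketched below; the dataset configuration file rice_pests.yaml is a hypothetical placeholder, and this is not the exact training script used in the experiments.

```python
# Sketch of a training run with the Table 3 hyperparameters.
# "rice_pests.yaml" is a hypothetical dataset configuration file.
from ultralytics import YOLO

model = YOLO("yolo11n.pt")          # baseline architecture used as the starting point
model.train(
    data="rice_pests.yaml",
    epochs=200,
    batch=32,
    imgsz=640,
    optimizer="SGD",
    lr0=0.01,                       # initial learning rate
    momentum=0.937,
    weight_decay=0.0005,
)
```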
Table 4. Comparison of detection performance and model complexity under different detection head configurations.

| Model | P (%) | R (%) | mAP@0.5 (%) | mAP@0.5:0.95 (%) | Parameter (M) | GFLOPs | Size (MB) |
|---|---|---|---|---|---|---|---|
| P3 + P4 + P5 | 89.79 | 87.97 | 90.95 | 64.15 | 2.58 | 6.3 | 5.5 |
| P2 + P3 + P4 + P5 | 90.19 | 88.34 | 91.48 | 64.73 | 2.66 | 10.2 | 5.8 |
| P2 + P3 + P4 | 90.49 | 88.39 | 91.51 | 64.37 | 1.93 | 9.6 | 4.3 |
Table 5. Results of the ablation experiments.

| Model | Net | LPCHead | DBELCSP | WaveletPool | P (%) | R (%) | mAP@0.5 (%) | mAP@0.5:0.95 (%) | Parameter (M) | GFLOPs | Size (MB) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| YOLO11n | | | | | 89.79 | 87.97 | 90.95 | 64.15 | 2.58 | 6.3 | 5.5 |
| Model 2 | | | | | 90.49 | 88.39 | 91.51 | 64.37 | 1.93 | 9.6 | 4.3 |
| Model 3 | | | | | 90.08 | 88.45 | 91.18 | 64.09 | 1.72 | 5.5 | 3.7 |
| Model 4 | | | | | 91.11 | 88.55 | 91.56 | 64.58 | 1.59 | 8.0 | 4.0 |
| Model 5 | | | | | 90.29 | 88.52 | 91.17 | 64.29 | 1.56 | 8.7 | 3.4 |
| Model 6 | | | | | 90.64 | 88.41 | 91.38 | 64.34 | 1.38 | 3.9 | 3.5 |
| Model 7 | | | | | 90.15 | 88.13 | 91.13 | 63.80 | 1.35 | 4.6 | 2.9 |
| Model 8 | | | | | 90.76 | 87.84 | 91.24 | 64.06 | 1.23 | 7.3 | 3.2 |
| YOLO-RP (Ours) | | | | | 90.62 | 87.38 | 90.99 | 63.84 | 1.00 | 3.1 | 2.8 |
Table 6. Comparative evaluation of YOLO-RP with lightweight and segmentation-oriented architectures.

| Model | P (%) | R (%) | mAP@0.5 (%) | mAP@0.5:0.95 (%) | Parameter (M) | GFLOPs | Size (MB) |
|---|---|---|---|---|---|---|---|
| YOLO11n | 89.79 | 87.97 | 90.95 | 64.15 | 2.58 | 6.3 | 5.5 |
| GhostNet V2 | 88.42 | 83.75 | 89.11 | 61.24 | 2.32 | 4.8 | 4.7 |
| ShuffleNet V2 | 90.06 | 85.87 | 89.74 | 62.33 | 2.07 | 4.7 | 4.2 |
| MobileNet V3 | 79.80 | 67.89 | 76.04 | 47.11 | 1.94 | 1.3 | 4.0 |
| U-Net V2 | 89.37 | 86.83 | 89.77 | 62.55 | 5.05 | 12.3 | 9.9 |
| YOLO-RP (Ours) | 90.62 | 87.38 | 90.99 | 63.84 | 1.00 | 3.1 | 2.8 |
Table 7. Performance comparison of YOLO-RP with other representative object detection models.

| Model | P (%) | R (%) | mAP@0.5 (%) | mAP@0.5:0.95 (%) | Parameter (M) | GFLOPs | Size (MB) |
|---|---|---|---|---|---|---|---|
| Faster-RCNN [32] | 85.40 | 83.20 | 85.9 | 53.8 | 41.18 | 134.55 | 315.43 |
| SSD [33] | 80.90 | 79.10 | 81.5 | 49.50 | 14.43 | 15.76 | 110.69 |
| YOLOv5n | 89.40 | 86.30 | 89.60 | 62.21 | 2.18 | 5.8 | 4.7 |
| YOLOv8n | 90.01 | 86.84 | 90.06 | 63.30 | 2.69 | 6.8 | 5.6 |
| YOLOv9t [34] | 90.50 | 87.10 | 90.60 | 64.00 | 1.73 | 6.5 | 4.2 |
| YOLOv10n [35] | 89.74 | 86.03 | 90.33 | 63.19 | 2.27 | 6.5 | 5.8 |
| YOLO11n | 89.79 | 87.97 | 90.95 | 64.15 | 2.58 | 6.3 | 5.5 |
| RT-DETR-r18 [36] | 90.31 | 86.42 | 85.74 | 58.66 | 19.89 | 57 | 77 |
| YOLO-RP (Ours) | 90.62 | 87.38 | 90.99 | 63.84 | 1.00 | 3.1 | 2.8 |
Table 8. Comparison of detection performance and model complexity on cross-domain pest datasets.

| Datasets | Model | P (%) | R (%) | mAP@0.5 (%) | mAP@0.5:0.95 (%) | Parameter (M) | GFLOPs | Size (MB) |
|---|---|---|---|---|---|---|---|---|
| Common Rice Pests (Philippines) | YOLO11n | 91.84 | 91.46 | 93.28 | 67.17 | 2.58 | 6.3 | 5.5 |
| | YOLO-RP (Ours) | 92.24 | 90.67 | 93.11 | 66.91 | 0.99 | 3.1 | 2.9 |
| IP102 | YOLO11n | 57.87 | 55.58 | 57.15 | 36.59 | 2.65 | 6.6 | 5.6 |
| | YOLO-RP (Ours) | 57.58 | 56.69 | 56.39 | 35.88 | 1.02 | 3.3 | 2.8 |
| Pest24 | YOLO11n | 65.28 | 58.08 | 60.17 | 37.70 | 2.59 | 6.3 | 5.5 |
| | YOLO-RP (Ours) | 63.44 | 57.31 | 58.28 | 35.73 | 1.00 | 3.1 | 2.8 |
Table 9. Inference speed comparison of different models on the Jetson Nano platform.

| Device | Model | FPS (f/s) |
|---|---|---|
| Jetson Nano | YOLO11n | 12.5 |
| | YOLO-RP (Ours) | 20.8 |
Table 10. Per-class AP50 comparison between YOLO-RP and YOLO11n.

| Class Name | YOLO11n AP50 (%) | YOLO-RP AP50 (%) | ΔAP50 (%) |
|---|---|---|---|
| Asiatic rice borer | 88.62 | 90.32 | ↑ 1.70 |
| Rice stem fly | 96.69 | 95.76 | ↓ 0.93 |
| Grain spreader thrips | 90.25 | 88.45 | ↓ 1.80 |
| Paddy stem maggot | 89.91 | 88.81 | ↓ 1.10 |
| Rice gall midge | 94.95 | 95.04 | ↑ 0.09 |
| Rice leaf caterpillar | 88.50 | 87.97 | ↓ 0.53 |
| Rice leaf hopper | 93.14 | 93.68 | ↑ 0.54 |
| Rice leaf roller | 85.86 | 88.13 | ↑ 2.27 |
| Rice shell pest | 97.93 | 97.38 | ↓ 0.55 |
| Brown plant hopper | 87.93 | 88.43 | ↑ 0.50 |
| Rice water weevil | 91.13 | 90.69 | ↓ 0.44 |
| Yellow rice borer | 86.50 | 87.27 | ↑ 0.77 |

ΔAP50 indicates the change in AP50 of YOLO-RP relative to YOLO11n; “↑” denotes an increase, while “↓” denotes a decrease.
Table 11. Statistical tests of YOLO-RP versus YOLO11n across multiple random seeds.

| Seed | Precision (YOLO11n) | Precision (YOLO-RP) | mAP@0.5 (YOLO11n) | mAP@0.5 (YOLO-RP) | mAP@0.5:0.95 (YOLO11n) | mAP@0.5:0.95 (YOLO-RP) |
|---|---|---|---|---|---|---|
| Seed0 | 89.79 | 90.62 | 90.95 | 90.99 | 64.15 | 63.84 |
| Seed1 | 89.76 | 90.34 | 91.01 | 91.12 | 64.16 | 63.63 |
| Seed2 | 90.17 | 90.98 | 91.04 | 91.15 | 64.47 | 63.78 |
| Seed3 | 89.97 | 90.45 | 90.94 | 91.08 | 64.03 | 64.00 |
| Seed4 | 89.94 | 91.32 | 91.06 | 91.32 | 63.93 | 64.54 |
| Seed5 | 89.98 | 90.48 | 90.76 | 91.06 | 64.44 | 63.91 |
| Seed6 | 90.92 | 91.34 | 91.25 | 91.28 | 64.30 | 63.61 |
| Seed7 | 90.29 | 90.54 | 90.59 | 90.96 | 63.28 | 63.87 |
| Seed8 | 90.55 | 90.63 | 90.82 | 91.20 | 63.83 | 63.49 |
| Seed9 | 90.56 | 90.85 | 90.91 | 91.12 | 64.09 | 63.70 |
| Mean ± 95% CI | 90.19 ± 0.27 | 90.76 ± 0.25 | 90.93 ± 0.13 | 91.13 ± 0.08 | 64.07 ± 0.27 | 63.88 ± 0.23 |
| p | - | 0.0032 | - | 0.01 | - | 0.1232 |
| Cohen’s d | - | 1.52 | - | 1.28 | - | −0.72 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

