Article

Lightweight Pepper Disease Detection Based on Improved YOLOv8n

1 College of Engineering, Northeast Agricultural University, Harbin 150030, China
2 College of Food, Northeast Agricultural University, Harbin 150030, China
* Author to whom correspondence should be addressed.
AgriEngineering 2025, 7(5), 153; https://doi.org/10.3390/agriengineering7050153
Submission received: 24 March 2025 / Revised: 25 April 2025 / Accepted: 6 May 2025 / Published: 12 May 2025
(This article belongs to the Topic Digital Agriculture, Smart Farming and Crop Monitoring)

Abstract
China is the world’s largest producer of chili peppers, a crop of major economic and social value in fields such as medicine, food, and industry. During production, however, chili peppers are affected by pests and diseases that, depending on temperature and environment, cause significant yield losses. In this study, a lightweight pepper disease identification method, DD-YOLO, based on the YOLOv8n model, is proposed. First, the deformable convolution module DCNv2 (Deformable ConvNets v2) and the inverted residual mobile block (iRMB) are introduced into the C2F module to improve the accuracy of the sampling range and reduce the computational load. Second, the DySample (Dynamic Sample) sampling operator is integrated into the head network to reduce the data volume and the computational complexity. Finally, we use Large Separable Kernel Attention (LSKA) to improve the SPPF (Spatial Pyramid Pooling Fast) module and enhance multi-scale feature fusion. The experimental results show that the precision, recall, and average precision of the DD-YOLO model are 91.6%, 88.9%, and 94.4%, respectively, improvements of 6.2, 2.3, and 2.8 percentage points over the base network YOLOv8n. The model weight is reduced by 22.6%, and the floating-point operations (GFLOPs) are reduced by 11.1%. This method provides a technical basis for intensive cultivation and management of chili peppers and accomplishes the task of identifying chili pepper pests and diseases efficiently and cost-effectively.

1. Introduction

Chili peppers are among the world’s top three fruit and vegetable crops, and China ranks first in the world in their planting area and production. They offer a unique diversity of values, including edible, medicinal, and ornamental uses. At present, however, the high morbidity rate and the difficulty of monitoring the initial stage of disease during production have become key factors restricting the chili pepper industry. It is therefore crucial to implement effective measures for the prevention, control, and management of chili pepper diseases and pests [1]. In recent years, with the development of computer technology and the wide application of artificial intelligence, traditional agriculture has been developing toward intelligence and modernization [2]. This shift provides a direction for the intensive cultivation of fruits and vegetables. Meanwhile, real-time pest monitoring can provide intelligent and modernized solutions throughout the growth process of chili peppers [3]. Moreover, the rapid detection of chili pepper pests and diseases based on target recognition has important practical significance for stabilizing and increasing yield.
With the rapid development of computer technology, non-contact, computer vision-based target detection with crops as the subject provides the technical basis for intensive crop management. Two-stage improved R-CNN models are widely used for crop target detection. Su, T et al. [4] constructed a wheat leaf pest and disease identification model by replacing softmax with an improved Local Vector Machine (LSVM) as the model’s classifier, thereby improving its classification capability; experiments verified an average identification accuracy of 93.68%. Xi, R et al. [5] used ResNet50 as the backbone network of an R-CNN to construct a potato bud detection model that improves the recognition of multi-scale features; experiments verified an average recognition accuracy of 97.71%, 5.98% higher than the original model. Li, Y et al. [6] introduced the improved shallow networks SCNN-KSVM (Shallow CNN with Kernel SVM) and SCNN-RF (Shallow CNN with Random Forest) to construct a shallow and efficient crop disease model, experimentally verified to reach an average recognition accuracy of 94% with a 1000-fold reduction in the number of parameters compared with other traditional deep convolutional neural networks [6]. By introducing transfer learning into a CNN, Pattnaik, Gayatri, et al. [7] constructed a tomato plant pest classification model that enhances the recognition of multi-scale features; experimental results show an average recognition accuracy of 88.83%, an improvement of 11.63% over the original model.
Although two-stage algorithms can provide higher detection accuracy, they have more parameters and are more computationally expensive. One-stage models such as SSD and YOLO are faster and less computationally expensive. Liu et al. [8] improved the YOLOv3 model by fusing multi-scale features with a pooled pyramid layer and constructed a tomato pest detection model to improve the recognition rate of small feature targets; experiments verified that its average recognition accuracy reached 92.39%. Han et al. [8] constructed a disease identification model using CSPDarknet as the backbone feature extraction network and replacing stride-2 convolution with the SPD-Conv module, which further improved feature extraction speed while the introduced SE mechanism preserved accuracy; the model’s average identification accuracy reached 95.7%, with a great reduction in model weight compared with the original network. Jinming Zheng et al. [9] improved YOLOv8 by introducing the Convolutional Block Attention Module (CBAM) and replacing the Conv in the neck with Ghost convolution, constructing the cabbage size prediction model YOLOv8n-CK, verified to have an average recognition accuracy of 99.2% with floating-point operations reduced by 13.04% compared with the pest detection model [10,11].
In recent years, machine vision has also been widely applied to detecting phenotypic features of chili peppers. Anita S. Kini et al. [12] recognized and classified multiple pests and diseases of chili peppers by combining models; accuracy across the different disease classes reached 60% to 90%, but the model construction process is complicated, it is not an end-to-end detection method, and its detection rate is slower than that of single-stage algorithms. Xuejun Yue et al. [13] improved the YOLOv7 baseline model by enhancing the backbone with an improved GhostNetV2, replacing the feature extraction module with CFNet, and introducing the CBAM attention mechanism, which increased accuracy by 12% over the baseline model; however, due to the complexity of the baseline structure, the model still contains redundancy and is not suitable for deployment on edge devices. Na Ma et al. [14] improved the YOLOv8n baseline model by enhancing the backbone network, improving C2f, and introducing the CARAFE sampling operator, which improved average detection accuracy by 2% over the baseline model, but blurred data division may have introduced data leakage.
To address these problems, this study, based on YOLOv8n, introduces DCNv2 and iRMB into the C2f layer, fuses the DySample upsampling operator, and introduces the LSKA attention mechanism, proposing the pepper disease detection model DD-YOLO. The model detects small disease features well and markedly improves detection speed, giving it good detection capability on marginal embedded devices. This study not only provides technical support for pepper disease monitoring but also promotes the development of intelligent agriculture.

2. Materials and Methods

2.1. Establishment of the Dataset

The images used in the experiment were taken in the smart-agriculture chili pepper experimental park of Northeast Agricultural University, and the acquisition device was a Huawei P50 smartphone with a resolution of 1224 × 2700 pixels. Considering the varying light conditions of the chili peppers’ growing environment and the occlusion among their own features, the images were taken under different weather and environmental conditions, including sunny days, after rain, overlapping fields of view, shading, front lighting, backlighting, and changing angles and distances. The aim was to enrich the diversity of the data, improve the model’s generalization ability as much as possible, and improve recognition in complex environments [15,16]. This study expanded the dataset by sourcing additional images from the internet. A total of 8987 images were collected; examples of diseased pepper leaves are shown in Figure 1.
In this study, the dataset is augmented using a Generative Adversarial Network (GAN), as illustrated in the overall workflow diagram of dataset construction shown in Figure 2. In the GAN framework, G represents the generator and D represents the discriminator. Through the adversarial process between G and D and the forward propagation of the model, a set of highly plausible data is generated. This augmented dataset is further utilized to enhance the generalization performance of the model through data augmentation techniques. Finally, the augmented dataset is divided into training, validation, and test sets in an 8:1:1 ratio, specifically 80% for training, 10% for validation, and 10% for testing.
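For readers implementing the augmentation stage, the adversarial interplay between G and D reduces to alternating gradient steps. The following is a minimal PyTorch sketch of one such step; the generator G, discriminator D, their optimizers, and the latent dimension z_dim are illustrative assumptions, not the exact architecture used in this study.

```python
# One GAN training step (sketch): D learns to separate real from generated
# images, G learns to fool D. Assumes D outputs a (batch, 1) logit.
import torch
import torch.nn as nn

bce = nn.BCEWithLogitsLoss()

def gan_step(G, D, opt_G, opt_D, real, z_dim=100):
    b = real.size(0)
    z = torch.randn(b, z_dim, device=real.device)

    # --- Discriminator: maximize log D(x) + log(1 - D(G(z))) ---
    opt_D.zero_grad()
    fake = G(z).detach()                      # detach: no gradient into G here
    loss_D = bce(D(real), torch.ones(b, 1, device=real.device)) + \
             bce(D(fake), torch.zeros(b, 1, device=real.device))
    loss_D.backward()
    opt_D.step()

    # --- Generator: maximize log D(G(z)) ---
    opt_G.zero_grad()
    fake = G(z)
    loss_G = bce(D(fake), torch.ones(b, 1, device=real.device))
    loss_G.backward()
    opt_G.step()
    return loss_D.item(), loss_G.item()
```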
The experiment used LabelImg 1.8.6 to annotate the features of the above dataset and form a sample set: all diseased leaves in each image were framed one by one with rectangular boxes and saved in the YOLO “txt” file format. The combined pepper dataset was divided into training, validation, and testing sets at an 8:1:1 ratio: the training set includes 7189 images, the validation set 899 images, and the testing set 899 images, totaling 8987 images and 77,076 labels [12]. The division results are shown in Table 1.
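The YOLO “txt” format produced by LabelImg stores one object per line: a class index followed by the normalized box center and size. A minimal reader, for illustration:

```python
# Read one YOLO-format label file. Each line:
#   <class_id> <x_center> <y_center> <width> <height>
# where the four box values are normalized to [0, 1] by image size.
def read_yolo_labels(path):
    boxes = []
    with open(path) as f:
        for line in f:
            cls, xc, yc, w, h = line.split()
            boxes.append((int(cls), float(xc), float(yc), float(w), float(h)))
    return boxes
```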

2.2. Pepper Target Detection Methods

2.2.1. YOLOv8 Convolutional Network Modeling

YOLOv8 is a target detection algorithm released by Ultralytics in January 2023 for tasks such as image classification, object detection, and instance segmentation; it offers higher detection performance, fewer model parameters, and faster inference speed. It discards the previous IoU-based or single-ratio sample allocation methods and adopts the Task-Aligned Assigner positive–negative sample allocation strategy, which significantly improves model performance. The YOLOv8 network structure consists of four main parts: Input, Backbone, Neck, and Head. The backbone network is mainly responsible for extracting image features and can be divided into three parts: the standard convolution (Conv) module, the C2f module, and the SPPF module. Among them, the SPPF module connects the backbone and neck networks, extracts pepper feature information through different convolution kernels, and then fuses feature information at different scales. The standard convolution (Conv) module consists of convolutional layers responsible for feature extraction from the input data; different levels of feature information are extracted through layer-by-layer convolution.
In the YOLOv8 model, the Conv layer can be adapted to different input data and detection tasks by adjusting parameters such as the size, number, and step size of the convolution kernel.
The SPPF module is applied between the feature extraction network (backbone) and the neck network (neck). The feature extraction network is responsible for extracting the feature information of the input image, while the neck network fuses and integrates this feature information, effectively combining features at different scales to improve the model’s ability to recognize the target [4,9,17,18].
The Head network (Head) consists of a series of convolutional and transposed convolutional layers used to generate disease detections. Each detection head consists of a set of convolutional and fully connected layers and is responsible for predicting bounding boxes at one scale.

2.2.2. Improved DD-YOLO Algorithm

Aiming at the many types and small scales characteristic of pepper pests and diseases, this study proposes a lightweight pepper disease recognition method, DD-YOLO, based on the YOLOv8n base model. First, by introducing DCNv2 and iRMB into the C2f module, the adaptive sampling of the model is improved; by integrating the DySample ultra-lightweight dynamic upsampling operator into the head network, the computational volume and model complexity are reduced; and by introducing the LSKA attention mechanism, multi-scale feature fusion is enhanced and false and missed detections are reduced [19]. The DD-YOLO model diagram is shown in Figure 3.
(1) C2f-iRMB module
The Inverted Residual Mobile Block (iRMB) combines a lightweight Convolutional Neural Network (CNN) architecture with an efficient Multi-Head Self-Attention (MHSA) mechanism to build a new network architecture that maintains the lightweight nature of the model while maximizing the utilization of computational resources and accuracy [20]. iRMB is designed to significantly reduce the number of parameters and computational complexity without sacrificing model performance.
The core design concept of iRMB is the combination of these two techniques, which significantly reduces the number of parameters and computational complexity without sacrificing model performance [21]. Under the same training configuration, iRMB demonstrates excellent efficacy, using fewer parameters and less computation to achieve better performance. Meanwhile, in lightweight model architectures, iRMB’s single-residual structure has a clear performance advantage over the traditional two-residual Transformer structure [22]. Building on this, this study uses C2f-iRMB as the key feature extraction branch, applying optimization strategies such as compression and slicing to reduce the model’s data storage requirements during computation and alleviate the demand on hardware resources. C2f-iRMB not only inherits iRMB’s property of keeping the model light while improving its performance but also, through the enhancement characteristics of C2f, further strengthens the model’s ability to capture fine and small targets in pepper disease images [22]. This design enables the network to efficiently and accurately extract key features from images under limited computational power, which effectively improves the performance and efficiency of the model in practical applications. The iRMB structural paradigm is shown in Figure 4.
Compared to standard convolution, the use of depthwise separable convolution (DW-Conv) significantly reduces computational complexity. The reduction in per-position cost is shown in Equation (1):
$O(C_{\mathrm{in}} \cdot C_{\mathrm{out}} \cdot K \cdot K) \rightarrow O(C_{\mathrm{in}} \cdot K \cdot K + C_{\mathrm{in}} \cdot C_{\mathrm{out}})$ (1)
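To make the saving concrete, a depthwise separable convolution factorizes a standard convolution into a per-channel (depthwise) K × K filter followed by a 1 × 1 pointwise mixing layer. A minimal PyTorch sketch (the module and parameter names are ours, for illustration):

```python
import torch.nn as nn

class DWSeparableConv(nn.Module):
    """Depthwise separable convolution: cost drops from
    O(C_in * C_out * K * K) to O(C_in * K * K + C_in * C_out) per position."""
    def __init__(self, c_in, c_out, k=3):
        super().__init__()
        # Depthwise: one K x K filter per input channel (groups=c_in).
        self.dw = nn.Conv2d(c_in, c_in, k, padding=k // 2, groups=c_in, bias=False)
        # Pointwise: 1 x 1 convolution that mixes channels.
        self.pw = nn.Conv2d(c_in, c_out, 1, bias=False)

    def forward(self, x):
        return self.pw(self.dw(x))
```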
(2) C2f-DCNv2 module
DCNv2 enhances the network’s ability to focus on relevant image regions by improving modeling capability and introducing stronger training strategies. C2f-DCNv2 builds on this foundation by incorporating a network-specific architectural design (the CSP Bottleneck structure). At the heart of the C2f-DCNv2 module is the deformable convolutional layer, which allows the convolution kernel to deform freely on the input feature map and thus capture the shape of the target more accurately [24].
To increase the flexibility of deformable convolutional networks in processing spatial support regions, a modulation mechanism is introduced here. This mechanism endows the deformable convolution module with a dual capability: on the one hand, it dynamically adjusts the spatial offsets used to sample the input features in order to capture more precise feature location information; on the other hand, it modulates the magnitude of feature components from different spatial locations. In extreme cases, the module can selectively ignore or exclude signals from specific locations by setting the feature amplitude of those components to zero [24]. This spatial location-selective ignoring mechanism significantly reduces the influence of the corresponding image content on the module’s output, adding a whole new dimension to the network module’s tuning of the spatial support region. The DCNv2 structure is shown in Figure 5.
The convolution operation of DCNv2 can be expressed as
$y(p) = \sum_{k=1}^{K} \omega_k \cdot x(p + p_k + \Delta p_k) \cdot \Delta m_k$
where $y(p)$ denotes the value at position $p$ on the output feature map, $\omega_k$ is the weight of the convolution kernel, $p_k$ is the fixed offset of the $k$th sampling point in the convolution kernel, $\Delta p_k$ is the learned sampling-point offset, and $\Delta m_k$ is the modulation coefficient.
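The behavior of this modulated deformable convolution can be sketched with torchvision’s deform_conv2d, which accepts a modulation mask in recent releases; the offset/mask prediction head below follows a common DCNv2 idiom and is an illustrative assumption, not the authors’ exact module:

```python
import torch
import torch.nn as nn
from torchvision.ops import deform_conv2d  # mask support needs a recent torchvision

class DCNv2(nn.Module):
    def __init__(self, c_in, c_out, k=3):
        super().__init__()
        self.pad = k // 2
        self.weight = nn.Parameter(torch.empty(c_out, c_in, k, k))
        nn.init.kaiming_uniform_(self.weight)
        # A plain conv predicts the offsets (delta p_k) and modulation (delta m_k).
        self.offset_mask = nn.Conv2d(c_in, 3 * k * k, k, padding=self.pad)

    def forward(self, x):
        o1, o2, mask = torch.chunk(self.offset_mask(x), 3, dim=1)
        offset = torch.cat((o1, o2), dim=1)   # 2*k*k offset channels (delta p_k)
        mask = torch.sigmoid(mask)            # modulation delta m_k in [0, 1]
        return deform_conv2d(x, offset, self.weight, padding=self.pad, mask=mask)
```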
(3) Fusion of DySample sampling operators
DySample is a computationally efficient and ultra-lightweight dynamic upsampling operator that adopts a point-based sampling strategy [23]. Unlike kernel-based upsampling methods such as FADE and SAPA, which rely on high-resolution guidance features and require custom CUDA implementations, DySample is hardware-agnostic and greatly reduces the number of parameters, FLOPs, memory usage, and inference latency. Despite its simplicity, DySample exhibits competitive or even superior performance in object detection tasks compared to conventional upsamplers.
In this study, DySample is integrated into the feature extraction stage to enhance the overall computational efficiency of the model. Specifically, it replaces traditional sampling modules in the main data flow branch through class inheritance, resulting in reduced computational cost and memory consumption during feature map scaling. This substitution further contributes to the lightweight design of the model while effectively preserving detection accuracy. The structure of the DySample module is shown in Figure 6.
The offset calculation of the sampling points is formulated as
$O = \mathrm{linear}(X)$
$S = G + O$
where $X$ is the input feature map, $G$ is the original sampling grid, $O$ is the generated offset, and $S$ is the resulting set of sampling points.
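As a rough illustration of this point-based scheme, a 1 × 1 convolution can play the role of the linear layer that predicts O, after which the perturbed grid S = G + O drives F.grid_sample. The module below is a simplified sketch, not the reference DySample implementation; the 0.25 offset scale follows the static scope factor described for DySample:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PointUpsample(nn.Module):
    def __init__(self, channels, scale=2):
        super().__init__()
        self.scale = scale
        # "O = linear(X)": predicts 2 offsets (x, y) per upsampled position.
        self.offset = nn.Conv2d(channels, 2 * scale * scale, 1)

    def forward(self, x):
        n, _, h, w = x.shape
        sh, sw = h * self.scale, w * self.scale
        # Rearrange offsets to one (dx, dy) pair per output pixel, then scale.
        o = F.pixel_shuffle(self.offset(x), self.scale) * 0.25   # (n, 2, sh, sw)
        ys = torch.linspace(-1, 1, sh, device=x.device)
        xs = torch.linspace(-1, 1, sw, device=x.device)
        gy, gx = torch.meshgrid(ys, xs, indexing="ij")
        g = torch.stack((gx, gy), 0).unsqueeze(0).expand(n, -1, -1, -1)  # grid G
        s = (g + o).permute(0, 2, 3, 1)        # "S = G + O", shape (n, sh, sw, 2)
        return F.grid_sample(x, s, align_corners=True)
```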
(4) Adding the attention mechanism LSKA
The Large Separable Kernel Attention (LSKA) module decomposes standard 2D convolutions into cascaded horizontal and vertical 1D depth-wise convolutions, significantly reducing computational and memory costs [25]. Unlike LKA, LSKA incorporates both spatial and channel attention mechanisms while maintaining long-range dependencies. By leveraging kernel factorization, LSKA avoids the quadratic increase in computation associated with large kernels, achieving comparable performance to LKA with greater efficiency. The structure of LSKA is illustrated in Figure 7.
In this study, we propose the integration of the Large Separable Kernel Attention (LSKA) mechanism into the YOLOv8 architecture to enhance both the recognition accuracy of pepper diseases and the model’s computational efficiency. Specifically, we replace the standard convolutional operation in the C2f module of the backbone with the LSKA module, enabling adaptive receptive field selection to enhance the fusion of multi-scale features. Furthermore, we design a novel SPPF-LSKA module, which embeds LSKA into the Spatial Pyramid Pooling-Fast (SPPF) structure to strengthen the model’s capacity for capturing long-range dependencies and salient spatial features. These enhancements allow for more efficient feature representation while reducing floating-point operations (FLOPs), thereby achieving a favorable trade-off between accuracy and computational complexity.
Formally, the LSKA module applies multiple depth-wise dilated convolutions with varying kernel sizes in parallel.
$y = \sum_{i=1}^{n} a_i \cdot \mathrm{DWDConv}_i(x)$
The attention weights are computed based on global feature descriptors through a softmax function, allowing the model to dynamically emphasize informative scales. This design helps to effectively capture semantic variations in agricultural disease features across different spatial resolutions.
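The cascaded 1D depthwise and dilated depthwise convolutions that replace a large 2D kernel can be sketched as follows; the kernel size, dilation, and multiplicative-gate form are typical LSKA choices [25], not the exact configuration of the SPPF-LSKA module:

```python
import torch.nn as nn

class LSKA(nn.Module):
    def __init__(self, dim, k=7, dilation=3):
        super().__init__()
        pad = k // 2
        # Cascaded horizontal / vertical 1-D depthwise convolutions replace a 2-D kernel.
        self.dw_h = nn.Conv2d(dim, dim, (1, k), padding=(0, pad), groups=dim)
        self.dw_v = nn.Conv2d(dim, dim, (k, 1), padding=(pad, 0), groups=dim)
        # Dilated 1-D depthwise pair extends the receptive field cheaply.
        dpad = pad * dilation
        self.dwd_h = nn.Conv2d(dim, dim, (1, k), padding=(0, dpad),
                               dilation=(1, dilation), groups=dim)
        self.dwd_v = nn.Conv2d(dim, dim, (k, 1), padding=(dpad, 0),
                               dilation=(dilation, 1), groups=dim)
        self.pw = nn.Conv2d(dim, dim, 1)  # 1x1 channel mixing

    def forward(self, x):
        attn = self.dw_v(self.dw_h(x))
        attn = self.dwd_v(self.dwd_h(attn))
        attn = self.pw(attn)
        return x * attn  # attention applied as a multiplicative gate
```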

2.3. Experimental Platform and Parameter Settings

In this study, all models are tested under the same conditions to ensure unbiased testing. We also include ablation experiments, where we remove or “ablate” (set the weights to zero) certain layers or parameters in the model to observe the effect on the model performance, in order to better understand the role of each layer and parameter in the network and its impact. In addition, we introduce a comparison test to compare the YOLOv8 base network, the improved network, and other target detection models to further evaluate the performance differences between the different models, thus helping us to select the most appropriate model.
This experiment was conducted under the 64-bit Windows operating system; the detailed configuration of the server is shown in Table 2.
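For reproducibility, training with the Ultralytics YOLOv8 Python API follows the pattern below; the dataset YAML name and hyperparameter values are placeholders, as the paper does not list them here:

```python
# Hedged sketch of the training setup, assuming the Ultralytics YOLOv8 API;
# "pepper.yaml" and the hyperparameters are illustrative, not the paper's values.
from ultralytics import YOLO

model = YOLO("yolov8n.yaml")     # base architecture, to be modified into DD-YOLO
model.train(
    data="pepper.yaml",          # dataset config: train/val/test paths, 6 classes
    epochs=200,
    imgsz=640,
    batch=16,
    device=0,                    # single RTX 2080 Ti, as in Table 2
)
```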

3. Results

3.1. Analysis of the Rationality of the Model Improvement Method

3.1.1. Generalization Validation Experiment of GAN

To further assess the impact of GAN-based data augmentation on model generalization, we conducted a comparative experiment between two models: one trained on the original dataset (without GAN augmentation) and another trained on a dataset augmented with synthetic samples generated by a Generative Adversarial Network (GAN). The experimental results are presented in Table 3.
As shown in the table, the model trained with GAN-enhanced data exhibited an improvement in precision, recall, and mAP@0.5 over the model trained on the original dataset. Specifically, the GAN-augmented model achieved a precision of 92.9%, a recall of 92.3%, and an mAP@0.5 of 96.3%, while the model without GAN augmentation achieved 91.6%, 88.9%, and 94.4%, respectively.

3.1.2. Comparison of Other Lightweight Model Backbone Networks

In this study, different backbone networks are substituted into the base YOLOv8 network and trained on the same training set while keeping all parameter settings unchanged [11,25]. The training results are compared and analyzed, and the specific comparison results are shown in Table 4.
As can be seen from Table 4, the precision, recall, and average precision of the model using the C2f-iRMB improvement are all at a high level; precision is improved by 9.3 percentage points compared with the GhostHGNetV2 backbone network. In terms of computing speed, C2f-iRMB is slightly slower than C2f-AKConv and C2f-MSBlock, but its detection accuracy is remarkable, and the GFLOPs of the C2f-iRMB model are only 0.2 lower than those of C2f-AKConv. Overall, our model improves inference speed while maintaining a high level of accuracy [26,27].

3.1.3. Visualization of C2f-DCNv2 Model Features

To gain a deeper understanding of how the deformable convolution module DCNv2 enhances recognition capability, we visualized its feature maps, a step that shows more clearly how the key image regions relied upon by its internal layers vary.
As can be observed from Figure 8, as the network layers gradually deepen, the features extracted at each layer shift from concrete to abstract. Specifically, the activation output of the initial layers retains more of the original visual content of the image, which is gradually compressed and abstracted in deeper layers to focus more on category-related feature information. This demonstrates how the deformable convolution module DCNv2 enhances target recognition by gradually building up target features [28].

3.1.4. Dysample Heat Map

The effectiveness of the fused DySample operator in enhancing feature target recognition and detection is further demonstrated by drawing heat maps. A heat map visualizes how much attention the model pays to different regions of the image and its activation status in the recognition task; the brightness and depth of the colors reflect how thoroughly the model has learned the target features. By observing the color changes in the heat maps, we can understand in depth how the DySample network accurately captures and reinforces key target information [29]. The heat-map comparison is shown in Figure 9.
From Figure 9, it can be seen that after adding the lightweight DySample sampling operator, the model captures the correct targets better: the correct targets appear brighter than other regions, and some regions are redder, showing that the model attends to the target regions in the image and that the confidence of the prediction results is higher [30].

3.2. Ablation Experiments

From Table 5, it can be seen that the average precision of the model improves after the iRMB module is added in Test 3; the number of parameters and the model size are greatly reduced, the inference speed improves, and the GFLOPs drop significantly. In Test 4, after incorporating the DySample sampling operator, precision, recall, and average precision all increase markedly. In Test 5, after adding the LSKA attention mechanism, the model’s precision, recall, and average precision are all improved.
After incorporating the DCNv2 module into the base network in Test 2, the precision, recall, and average precision of the model all improved, while the weights and number of parameters increased slightly and the GFLOPs decreased owing to the structure of the DCNv2 module, suggesting that DCNv2 improves the fit of the model.
In Test 6, adding the iRMB module on top of the DCNv2 module slightly reduces the average precision, but the number of parameters and the model volume are significantly reduced, indicating that on top of the DCNv2 module, the iRMB module can optimize feature expression by assigning higher weights to important features. Test 7 replaces the original DCNv2 module with the DySample sampling operator; precision drops by 1.6 percentage points while the number of model parameters is further reduced. Tests 8, 9, 10, and 11 compare the four remaining two-module combinations; the average precision and weight of the model fluctuate slightly up or down, indicating that a single pairwise combination does not maximize the optimization of the model and that the model is not stable.
In Tests 12, 13, and 14, on the basis of the DySample sampling operator, the DCNv2 module, the iRMB module, and the LSKA attention mechanism are combined in turn, and the model’s precision and average precision improve significantly. In Test 14, the recall rate increases by 1.7 and 6.2 percentage points compared with Tests 12 and 13, respectively, and the average precision increases by 2.3 and 3.7 percentage points, respectively. Meanwhile, the model volume performs best, the number of parameters is minimized, and the GFLOPs are significantly reduced.
In Test 16, compared with the baseline network, the improved model increases precision by 6.2 percentage points, recall by 2.3 percentage points, and average precision by 2.8 percentage points, while reducing the model weight by 22.6% and the GFLOPs by 11.1%. The deployment volume is greatly reduced, and the model can be easily integrated into modern network architectures. This enables deep neural networks to be deployed on embedded edge devices with limited computational performance and storage space, making the leap from academia to industry.
The detection results for each disease are shown in Table 6. The DD-YOLO model demonstrates a high overall performance in the task of pepper disease detection. With an overall precision of 92.9% and recall of 92.3%, the model shows a good balance in detecting pepper diseases.

3.3. Detection Model Comparison Experiment

In order to further demonstrate the superior performance of our model, we conduct comparison experiments with other mainstream lightweight target detection models, including SSD, Faster-RCNN, YOLOv5n, and YOLOv8n, while keeping the other parameters consistent [31]. The results are shown in Table 7.
As can be seen from Table 7, the improved algorithm of this study holds the biggest advantage in detection accuracy, at 91.6%, compared with SSD, Faster RCNN, YOLOv5n, and YOLOv8n, improving by 28.9 and 6.2 percentage points over the YOLOv5n and YOLOv8n base networks. In terms of mAP@0.5~0.95, the improved model is slightly lower than SSD, but it has a higher recall rate, indicating good coverage and high sensitivity to positive cases, so it can better capture actual positives and reduce under-reporting and missed detections. Compared with YOLOv7-tiny and MobileNet-SSD, the proposed model also shows superior performance, with respective precision gains of 9.2 and 19.8 percentage points. While YOLOv10n performs relatively well, achieving a precision of 88.2%, our model still outperforms it by 3.4 percentage points, indicating a consistent advantage in accurate object localization and classification. Overall, for pepper pests and diseases, the improved model outperforms other mainstream algorithms.

3.4. Detection Comparison

To further verify the detection capability of DD-YOLO in real, complex environments, this study took pictures of the test field containing multiple target sample points for testing; the compared detection results are shown in Figure 10.
As can be seen from Figure 10, the base model YOLOv8 shows omissions when dealing with multiple targets, repeated detections, and a larger number of undetected targets. On multi-target data, DD-YOLO misses fewer targets, performs better than the former, is able to detect small target sample points, and its detection speed is greatly improved. It can be seen that DD-YOLO’s detection ability is better than that of YOLOv8 [32,33].

3.5. Lightweight Deployment

To enable the deployment and application of the modified model, we selected the TensorRT inference deployment framework and deployed the improved model on the Jetson Nano edge computing platform. The process is as follows: first, the weight file best.pt from the improved model training is exported to an intermediate file in ONNX format; then, the v8trans.py script in the infer module is used to add a Transpose node before the output of the original ONNX, converting it to best.transd.onnx; and finally, the trtexec tool is used directly to generate an engine. The resulting model can perform high-performance inference, is not complicated to encapsulate, and achieves complete decoupling. The results of lightweight model deployment are presented in Table 8.
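A sketch of this export chain using the Ultralytics Python API is shown below; v8trans.py is the authors’ own script and is only referenced in comments, and the trtexec flags shown are typical rather than the exact invocation:

```python
# Export the trained weights to ONNX as the intermediate format for TensorRT.
# Assumes the Ultralytics exporter; opset value is illustrative.
from ultralytics import YOLO

model = YOLO("best.pt")                  # improved-model training weights
model.export(format="onnx", opset=12)    # -> best.onnx intermediate file

# Next steps, per the authors' workflow (run outside Python):
#   python v8trans.py best.onnx          # adds a Transpose node -> best.transd.onnx
#   trtexec --onnx=best.transd.onnx --saveEngine=best.engine --fp16
```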

4. Discussion

This study proposes an efficient detection method for identifying pests and diseases on chili pepper leaves, aiming to enhance management efficiency in chili cultivation processes. By implementing three targeted module enhancements based on YOLOv8n, the optimized model demonstrates a significantly improved capability to capture irregular lesion features while reducing both parameter count and computational complexity. Specifically, the proposed architecture achieves an mAP@0.5 of 94.4%, coupled with a 22.6% reduction in parameter count and an 11.1% optimization in GFLOPs. This optimization strategy enables the model to surpass state-of-the-art counterparts such as YOLOv5 and Faster R-CNN in comprehensive performance metrics, particularly in balancing detection accuracy and computational efficiency for agricultural applications.
However, two critical limitations merit further discussion.
(1) The insufficiency of original dataset images, compounded by the high costs associated with artificially inducing disease in chili plants (e.g., pathogen inoculation) and labor-intensive data collection, necessitates future research to prioritize efficient dataset expansion strategies. Specifically, exploring synthetic data generation via Generative Adversarial Networks (GANs) or leveraging unsupervised domain adaptation for cross-species transfer learning could mitigate data scarcity challenges;
(2) While the current model size of 4.8 MB meets the deployment requirements for resource-constrained embedded edge devices, further performance optimization could be achieved through advanced model compression techniques. Such refinements would enhance compatibility with ultra-low-power IoT nodes while maintaining detection robustness in field environments.
Regarding dataset augmentation challenges, the advent of Generative Adversarial Networks (GANs) offers a novel direction for addressing data scarcity [34]. Prior studies have validated the feasibility of GAN-based synthetic data generation from limited original datasets [35,36,37]. In this work, GAN-augmented training data demonstrated exceptional performance in enhancing detection metrics for newly generated samples. However, the synthetic data exhibited an adverse effect on the model’s generalization capabilities, particularly when deployed in real-world field conditions with natural lighting variations. In future work, we aim to enhance generalization by incorporating domain adaptation strategies and exploring cross-regional datasets that reflect broader agro-ecological conditions. This will help mitigate performance degradation under novel lighting, occlusion, or disease conditions not seen during training. Future studies will focus on systematically optimizing GAN architectures (e.g., integrating domain-specific lesion texture constraints) and developing adversarial robustness modules to mitigate these generalization deficits. The loss function of the GAN model and the probability of its generation are shown in Table 9.
To further analyze the impact of GAN-based augmentation on the model’s detection robustness, we compared the false positive rates (FPRs) across different disease classes before and after GAN augmentation. As shown in Table 10, the FPR was significantly reduced for all disease types after introducing GAN-augmented training samples. Notably, the FPR of “pepper yellowish” decreased from 12.93% to 1.26%, and “leaf curl of pepper” dropped from 8.38% to 3.29%, indicating enhanced feature discrimination and reduced overfitting to noisy backgrounds. These results suggest that GAN augmentation not only enriches the data diversity but also improves the model’s resilience to image noise and feature ambiguity, thereby enhancing practical robustness.
Recent advancements in model compression through pruning have demonstrated significant potential for agricultural applications [38,39]. For instance, Lei Shen et al. [40] developed a grape cluster counting model by implementing structured pruning on YOLOv5s, achieving a streamlined architecture with a video processing speed of 50.4 frames per second (FPS), thereby fulfilling real-time field deployment requirements. Similarly, Shuxiang Fan et al. [41] applied channel pruning to YOLOv4 for apple defect detection, reducing the model size by 241.24 megabytes (MB) and inference time by 10.82 milliseconds (ms), while simultaneously improving the mean average precision (mAP) from 91.82% to 93.74%. These cases collectively underscore pruning’s dual capability to enhance computational efficiency without compromising detection accuracy—a critical advantage for edge-based agricultural systems operating under hardware constraints.
These technological advancements provide strategic directions for dataset construction and model optimization, paving the way for more intelligent agricultural research paradigms. Future efforts will integrate emerging techniques, such as federated learning for decentralized data harmonization, while expanding the geographical diversity of training data, particularly by incorporating mountainous chili disease datasets from major cultivation regions like Chongqing and Sichuan, China. Enhancing dataset heterogeneity through multi-environment sampling (e.g., varying altitudes, microclimates) will improve model robustness against agro-ecological variabilities, thereby facilitating efficient deployment in complex in-field detection scenarios. Concurrently, hybrid compression frameworks combining pruning, quantization, and knowledge distillation will be explored to achieve sub-5MB model footprints without sacrificing diagnostic accuracy under resource-constrained edge computing environments.

5. Conclusions

(1) This study establishes an image dataset of chili pepper pests and diseases and enriches it with horizontal and vertical mirroring and flipping as preprocessing, preventing the dataset from being too small and too homogeneous, which could lead to overfitting and poor generalization;
(2) A lightweight DD-YOLO pepper disease detection model is proposed, introducing the iRMB module and DCNv2 module into the backbone network, fusing the DySample up-sampling operator into the model head, and adding the attention mechanism LSKA to improve the SPPF. Verified by ablation experiments, the precision and the average precision are significantly improved compared with the original model, by 6.2 and 2.8 percentage points, respectively. At the same time, the number of model parameters is greatly reduced and the weight shrinks to 4.8 MB, which reflects the lightweight advantage and meets the deployment requirements of edge devices;
(3) By introducing different feature extraction networks, such as HGNetV2 and GhostHGNetV2, and comparing them with C2f-iRMB, and by plotting the transformation of feature information in the visualized feature maps of the deformable convolution module (DCNv2), it is verified that introducing the iRMB and DCNv2 modules improves the comprehensive performance of the model; by comparing the heat maps of recognized features before and after introducing DySample and observing the depth of the features, it is verified that the comprehensive performance of the model is greatly improved.

Author Contributions

Conceptualization, W.L. and J.H.; methodology, Y.W. (Yuzhu Wu) and J.H.; software, Y.W. (Yuzhu Wu); validation, J.H., S.W., Y.B. and Y.W. (Yizhe Wang); formal analysis, Y.W. (Yuzhu Wu); investigation, S.W.; resources, W.L.; data curation, Y.W. (Yuzhu Wu); writing—original draft preparation, S.W.; writing—review and editing, Y.W. (Yuzhu Wu), J.H., Y.B., Y.W. (Yizhe Wang) and J.S.; visualization, Y.W. (Yuzhu Wu); supervision, Y.W. (Yuzhu Wu) and J.H.; project administration, W.L.; funding acquisition, W.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research is supported by the Major Science and Technology Projects of Nanning City (Grant No. 20221242). The APC was funded by Wenwu Liu.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author(s).

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Lin, S.; Xiu, Y.; Kong, J.; Yang, C.; Zhao, C. An effective pyramid neural network based on graph-related attentions structure for fine-grained disease and pest identification in intelligent agriculture. Agriculture 2023, 13, 567. [Google Scholar] [CrossRef]
  2. Franczuk, J.; Tartanus, M.; Rosa, R.; Zaniewicz-Bajkowska, A.; Dębski, H.; Andrejiová, A.; Dydiv, A. The effect of mycorrhiza fungi and various mineral fertilizer levels on the growth, yield, and nutritional 450 value of sweet pepper (Capsicum annuum L.). Agriculture 2023, 13, 857. [Google Scholar] [CrossRef]
  3. Yang, Z.; Feng, H.; Ruan, Y.; Weng, X. Tea tree pest detection algorithm based on improved YOLOv7-Tiny. Agriculture 2023, 13, 1031. [Google Scholar] [CrossRef]
  4. Su, T.; Mu, S.; Shi, A.; Cao, Z.; Dong, M. A CNN-LSVM model for imbalanced images identification of wheat leaf. Neural Netw. World 2019, 29, 345–361. [Google Scholar] [CrossRef]
  5. Xi, R.; Hou, J.; Lou, W. Potato bud detection with improved faster R-CNN. Trans. ASABE 2020, 63, 557–569. [Google Scholar] [CrossRef]
  6. Li, Y.; Nie, J.; Chao, X. Do we really need deep CNN for plant diseases identification? Comput. Electron. Agric. 2020, 178, 105803. [Google Scholar] [CrossRef]
  7. Pattnaik, G.; Shrivastava, V.K.; Parvathi, K. Transfer learning-based framework for classification of pest in tomato plants. Appl. Artif. Intell. 2020, 34, 981–993. [Google Scholar] [CrossRef]
  8. Liu, J.; Wang, X. Tomato diseases and pests detection based on improved YOLO V3 convolutional neural network. Front. Plant Sci. 2020, 11, 898. [Google Scholar] [CrossRef]
  9. Zheng, J.; Wang, X.; Shi, Y.; Zhang, X.; Wu, Y.; Wang, D.; Huang, X.; Wang, Y.; Wang, J.; Zhang, J. Keypoint detection and diameter estimation of cabbage (Brassica oleracea L.) heads under varying occlusion degrees via YOLOv8n-CK network. Comput. Electron. Agric. 2024, 226, 109428. [Google Scholar] [CrossRef]
  10. Gai, R.; Chen, N.; Yuan, H. A detection algorithm for cherry fruits based on the improved YOLO-v4 model. Neural Comput. Appl. 2023, 35, 13895–13906. [Google Scholar] [CrossRef]
  11. Wu, D.; Lv, S.; Jiang, M.; Song, H. Using channel pruning-based YOLO v4 deep learning algorithm for the real-time and accurate detection of apple flowers in natural environments. Comput. Electron. Agric. 2020, 178, 105742. [Google Scholar] [CrossRef]
  12. Kini, A.S.; Prema, K.V.; Pai, S.N. Early stage black pepper leaf disease prediction based on transfer learning using ConvNets. Sci. Rep. 2024, 14, 1404. [Google Scholar]
  13. Yue, X.; Li, H.; Song, Q.; Zeng, F.; Zheng, J.; Ding, Z.; Kang, G.; Cai, Y.; Lin, Y.; Xu, X.; et al. YOLOv7-GCA: A Lightweight and High-Performance Model for Pepper Disease Detection. Agronomy 2024, 14, 618. [Google Scholar] [CrossRef]
  14. Ma, N.; Wu, Y.; Bo, Y.; Yan, H. Chili pepper object detection method based on improved YOLOv8n. Plants 2024, 13, 2402. [Google Scholar] [CrossRef] [PubMed]
  15. Teixeira, A.C.; Ribeiro, J.; Morais, R.; Sousa, J.J.; Cunha, A. A systematic review on automatic insect detection using deep learning. Agriculture 2023, 13, 713. [Google Scholar] [CrossRef]
  16. Liu, Z.; Abeyrathna, R.R.D.; Sampurno, R.M.; Nakaguchi, V.M.; Ahamed, T. Faster-YOLO-AP: A lightweight apple detection algorithm based on improved YOLOv8 with a new efficient PDWConv in orchard. Comput. Electron. Agric. 2024, 223, 109118. [Google Scholar] [CrossRef]
  17. Ma, Z.; Wang, Y.; Zhang, T.; Wang, H.; Jia, Y.; Gao, R.; Su, Z. Maize leaf disease identification using deep transfer convolutional neural networks. Int. J. Agric. Biol. Eng. 2022, 15, 187–195. [Google Scholar] [CrossRef]
  18. Vijayakumar, A.; Vairavasundaram, S. YOLO-based object detection models: A review and its applications. Multimedia. Tools Appl. 2024, 83, 83535–83574. [Google Scholar] [CrossRef]
  19. Wang, Q.; Cheng, M.; Huang, S.; Cai, Z.; Zhang, J.; Yuan, H. A deep learning approach incorporating YOLO v5 and attention mechanisms for field real-time detection of the invasive weed Solanum rostratum Dunal seedlings. Comput. Electron. Agric. 2022, 199, 107194. [Google Scholar] [CrossRef]
  20. Zhang, J.; Li, X.; Li, J.; Liu, L.; Xue, Z.; Zhang, B.; Jiang, Z.; Huang, T.; Wang, Y.; Wang, C. Rethinking Mobile Block for Efficient Attention-based Models. In Proceedings of the 2023 IEEE/CVF International Conference on Computer Vision (ICCV), Paris, France, 1–6 October 2023; pp. 1389–1400. [Google Scholar]
  21. Lu, J.; Tan, L.; Jiang, H. Review on convolutional neural network (CNN) applied to plant leaf disease classification. Agriculture 2021, 11, 707. [Google Scholar] [CrossRef]
  22. Li, B.; Huang, S.; Zhong, G. LTEA-YOLO: An Improved YOLOv5s Model for Small Object Detection. IEEE Access 2024, 12, 99768–99778. [Google Scholar] [CrossRef]
  23. Liu, W.; Lu, H.; Fu, H.; Cao, Z. Learning to upsample by learning to sample. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Paris, France, 2–3 October 2023. [Google Scholar]
  24. Zhu, X.; Hu, H.; Lin, S.; Dai, J. Deformable convnets v2: More deformable, better results. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019. [Google Scholar]
  25. Lau, K.W.; Po, L.-M.; Rehman, Y.A.U. Large separable kernel attention: Rethinking the large kernel attention design in cnn. Expert Syst. Appl. 2024, 236, 121352. [Google Scholar] [CrossRef]
  26. Jin, X.; Sun, Y.; Che, J.; Bagavathiannan, M.; Yu, J.; Chen, Y. A novel deep learning-based method for detection of weeds in vegetables. Pest Manag. Sci. 2022, 78, 1861–1869. [Google Scholar] [CrossRef]
  27. Wang, D.; He, D. Channel pruned YOLO V5s-based deep learning approach for rapid and accurate apple fruitlet detection before fruit thinning. Biosyst. Eng. 2021, 210, 271–281. [Google Scholar] [CrossRef]
  28. Yang, S.; Xing, Z.; Wang, H.; Dong, X.; Gao, X.; Liu, Z.; Zhang, X.; Li, S.; Zhao, Y. Maize-YOLO: A new high-precision and real-time method for maize pest detection. Insects 2023, 14, 278. [Google Scholar] [CrossRef]
  29. Gao, J.; French, A.P.; Pound, M.P.; He, Y.; Pridmore, T.P.; Pieters, J.G. Deep convolutional neural networks for image-based Convolvulus sepium detection in sugar beet fields. Plant Methods 2020, 16, 1–12. [Google Scholar] [CrossRef]
  30. Liu, G.; Nouaze, J.C.; Mbouembe, P.L.T.; Kim, J.H. YOLO-tomato: A robust algorithm for tomato detection based on YOLOv3. Sensors 2020, 20, 2145. [Google Scholar] [CrossRef]
  31. Tian, Y.; Yang, G.; Wang, Z.; Wang, H.; Li, E.; Liang, Z. Apple detection during different growth stages in orchards using the improved YOLO-V3 model. Comput. Electron. Agric. 2019, 157, 417–426. [Google Scholar] [CrossRef]
  32. Zhou, X.; Yi, J.; Xie, G.; Jia, Y.; Xu, G.; Sun, M. Human detection algorithm based on improved YOLO v4. Inf. Technol. Control. 2022, 51, 485–498. [Google Scholar] [CrossRef]
  33. Zhang, W.; Gao, X.-Z.; Yang, C.-F.; Jiang, F.; Chen, Z.-Y. A object detection and tracking method for security in intelligence of unmanned surface vehicles. J. Ambient. Intell. Humaniz. Comput. 2022, 13, 1279–1291. [Google Scholar] [CrossRef]
  34. Creswell, A.; Bharath, A.A. Inverting The Generator Of A Generative Adversarial Network. IEEE Trans. Neural Netw. Learn. Syst. 2018, 30, 1967–1974. [Google Scholar] [CrossRef] [PubMed]
  35. Frid-Adar, M.; Diamant, I.; Klang, E.; Amitai, M.; Goldberger, J.; Greenspan, H. GAN-based synthetic medical image augmentation for increased CNN performance in liver lesion classification. Neurocomputing 2018, 321, 321–331. [Google Scholar] [CrossRef]
  36. Motamed, S.; Rogalla, P.; Khalvati, F. Data augmentation using Generative Adversarial Networks (GANs) for GAN-based detection of Pneumonia and COVID-19 in chest X-ray images. Inform. Med. Unlocked 2021, 27, 100779. [Google Scholar] [CrossRef] [PubMed]
  37. Kaur, P.; Khehra, B.S.; Mavi, E.B.S. Data augmentation for object detection: A review. In Proceedings of the 2021 IEEE International Midwest Symposium on Circuits and Systems (MWSCAS), Lansing, MI, USA, 9–11 August 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 537–543. [Google Scholar]
  38. Li, H.; Kadav, A.; Durdanovic, I.; Samet, H.; Graf, H.P. Pruning Filters for Efficient ConvNets. arXiv 2016, arXiv:1608.08710. [Google Scholar]
  39. Liu, Z.; Li, J.; Shen, Z.; Huang, G.; Yan, S.; Zhang, C. Learning efficient convolutional networks through network slimming. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2736–2744. [Google Scholar]
  40. Shen, L.; Su, J.; He, R.; Song, L.; Huang, R.; Fang, Y.; Song, Y.; Su, B. Real-time tracking and counting of grape clusters in the field based on channel pruning with YOLOv5s. Comput. Electron. Agric. 2023, 206, 107662. [Google Scholar] [CrossRef]
  41. Fan, S.; Liang, X.; Huang, W.; Zhang, V.J.; Pang, Q.; He, X.; Li, L.; Zhang, C. Real-time defects detection for apple sorting using NIR cameras with pruning-based YOLOV4 network. Comput. Electron. Agric. 2022, 193, 106715. [Google Scholar] [CrossRef]
Figure 1. Example of a dataset on pepper pests and diseases.
Figure 2. Graphical representation of data processing methods.
Figure 3. DD-YOLO model diagram. Note: Conv is convolution; Concat is the feature connection module; DySample is the ultra-lightweight dynamic upsampler; SPPF_LSKA is the spatial pyramid pooling module that introduces the LSKA attention mechanism; C2f_iRMB and C2f_DCNv2 are the C2f modules with the iRMB and DCNv2 modules added, respectively; Detect is the detection head; Bbox. Loss and Cls. Loss are the bounding box loss and classification loss functions, respectively; p3, p4, and p5 are the small, medium, and large feature map sizes, respectively.
Figure 4. iRMB structural paradigm. Note: DW-Conv is depthwise separable convolution; Attn Mat is the attention mechanism matrix; Q is Query; K is Key; V is Value.
Figure 5. Schematic diagram of the DCNv2 structure. Note: The term “Feature Crossing” in the figure denotes the feature interaction mechanism within the model, which is used to generate feature cross terms. “Bias” refers to the learnable bias term in the model, which is used to adjust the model output.
Figure 6. DySample structure. Note: H is the height of the feature map; W is the width of the feature map; pixel shuffle is the image super-resolution reconstruction operation; O is the sampling offset.
Figure 7. Schematic diagram of the LSKA structure. Note: Max-Pool2d is maximum pooling; DW-D-Conv and DW-Conv are dilated and standard depthwise separable convolutions, respectively.
Figure 8. Comparison of model feature visualization.
Figure 9. Comparison of model heat map visualization.
Figure 10. Real-time object detection image.
Table 1. Distribution of the chili pepper dataset.

| Class | Train | Val | Test | All | Labels |
|---|---|---|---|---|---|
| leaf curl of pepper | 1185 | 148 | 148 | 1481 | 10,494 |
| pepper whitefly | 1119 | 140 | 141 | 1400 | 11,568 |
| pepper yellowish | 1314 | 164 | 165 | 1643 | 13,608 |
| leaf spot of pepper | 1199 | 151 | 150 | 1500 | 13,860 |
| powdery mildew of pepper | 1156 | 145 | 144 | 1445 | 14,010 |
| pepper scab | 1216 | 151 | 151 | 1518 | 13,536 |
| All | 7189 | 899 | 899 | 8987 | 77,076 |
Table 2. Server configuration.

| Configuration | Specific Model |
|---|---|
| Image (software environment) | PyTorch 1.11.0; Python 3.8 (ubuntu20.04); Cuda 11.3 |
| GPU | RTX 2080 Ti (11 GB) × 1 |
| CPU | 12 vCPU Intel(R) Xeon(R) Platinum 8255C CPU @ 2.50 GHz |
Table 3. Generalization validation experiment of GAN.

| GAN | Precision P/% | Recall R/% | mAP50/% | Weights/MB | Parameters | GFLOPs |
|---|---|---|---|---|---|---|
| × | 91.6 | 88.9 | 94.4 | 4.8 | 2,296,804 | 7.2 |
| √ | 92.9 | 92.3 | 96.3 | 4.8 | 2,296,804 | 7.2 |

Note: √ indicates that the algorithm is used; × indicates that the algorithm is not used.
Table 4. Comparison of different lightweight feature extraction backbone networks.

| Backbone Network | P/% | R/% | mAP/% | GFLOPs |
|---|---|---|---|---|
| C2f-iRMB | 88.7 | 86.3 | 91.7 | 7.0 |
| HGNetV2 | 82.1 | 78.5 | 84.6 | 6.9 |
| GhostHGNetV2 | 80.7 | 77.3 | 83.2 | 6.8 |
| RepHGNetV2 | 81.5 | 77.9 | 84.8 | 6.9 |
| C2f-AKConv | 85.8 | 82.1 | 88.2 | 7.2 |
| C2f-MSBlock | 83.4 | 79.7 | 86.3 | 7.7 |
Table 5. Ablation experiments.

| Test No. | DCNv2 | iRMB | DySample | LSKA | Precision P/% | Recall R/% | mAP50/% | Weights/MB | Parameters | GFLOPs |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | × | × | × | × | 86.1 | 87.9 | 92.9 | 6.2 | 3,006,818 | 8.1 |
| 2 | √ | × | × | × | 91.2 | 90.5 | 95.8 | 6.3 | 3,038,205 | 8.0 |
| 3 | × | √ | × | × | 88.7 | 86.3 | 91.7 | 5.5 | 2,634,194 | 7.0 |
| 4 | × | × | √ | × | 91.8 | 91.2 | 95.6 | 6.3 | 3,019,170 | 8.1 |
| 5 | × | × | × | √ | 91.3 | 90.9 | 95.1 | 6.8 | 3,279,714 | 8.3 |
| 6 | √ | √ | × | × | 89.7 | 88.1 | 93.9 | 6.0 | 2,893,549 | 7.1 |
| 7 | × | √ | √ | × | 89.6 | 86.2 | 92.3 | 5.5 | 2,646,546 | 7.1 |
| 8 | × | × | √ | √ | 91.5 | 90.3 | 95.0 | 6.8 | 3,292,066 | 8.3 |
| 9 | √ | × | √ | × | 91.4 | 90.4 | 95.2 | 6.3 | 3,050,557 | 8.0 |
| 10 | √ | × | × | √ | 92.7 | 90.6 | 95.7 | 6.8 | 3,311,101 | 8.2 |
| 11 | × | √ | × | √ | 88.2 | 85.9 | 91.5 | 6.1 | 2,907,090 | 7.3 |
| 12 | √ | √ | √ | × | 89.3 | 88.5 | 93.1 | 6.0 | 2,905,901 | 7.1 |
| 13 | × | √ | √ | √ | 89.4 | 85.2 | 91.9 | 6.1 | 2,919,442 | 7.3 |
| 14 | √ | × | √ | √ | 91.9 | 90.1 | 95.3 | 6.9 | 3,323,453 | 8.2 |
| 15 | √ | √ | × | √ | 89.1 | 87.1 | 92.7 | 6.6 | 3,166,445 | 7.3 |
| 16 | √ | √ | √ | √ | 92.9 | 92.3 | 96.3 | 4.8 | 2,296,804 | 7.2 |

Note: √ indicates that the module is used; × indicates that the module is not used.
Table 6. Specific detection performance of the model.

| Class | Precision/% | Recall/% | mAP50/% | mAP50-95/% |
|---|---|---|---|---|
| all | 92.9 | 92.3 | 96.3 | 76.5 |
| leaf curl of pepper | 87.2 | 81.5 | 90.8 | 56.1 |
| pepper whitefly | 93.7 | 92.9 | 97.5 | 71.6 |
| pepper yellowish | 86.2 | 82.7 | 91.8 | 55.2 |
| leaf spot of pepper | 93.1 | 97.2 | 99.0 | 88.0 |
| powdery mildew of pepper | 97.9 | 99.6 | 99.4 | 91.0 |
| pepper scab | 99.4 | 100.0 | 99.5 | 97.2 |
Table 7. Comparison of different models of pepper disease detection results.

| Models | Precision/% | Recall/% | mAP@0.5/% | mAP@0.5~0.95/% |
|---|---|---|---|---|
| SSD | 86.0 | 80.6 | 86.2 | 82.8 |
| Faster-RCNN | 62.3 | 93.8 | 88.5 | 73.6 |
| MobileNet-SSD | 71.8 | 66.3 | 77.7 | 50.2 |
| YOLOv5n | 64.9 | 70.2 | 72.1 | 41.8 |
| YOLOv7-tiny | 82.4 | 79.3 | 85.7 | 58.5 |
| YOLOv10n | 88.2 | 85.0 | 91.9 | 71.2 |
| YOLOv8n | 86.1 | 87.9 | 92.9 | 72.7 |
| DD-YOLO | 92.9 | 92.3 | 96.3 | 76.5 |
Table 8. Results of lightweight deployment.

| Models | Weights/MB | Precision/% | Recall/% | mAP50/% | Detection Latency/ms | TensorRT Acceleration |
|---|---|---|---|---|---|---|
| YOLOv8n | 6.2 | 86.1 | 87.9 | 92.9 | 150.7 | × |
| | 27.5 | 85.7 | 87.5 | 92.9 | 77.2 | √ |
| DD-YOLO | 4.8 | 92.9 | 92.3 | 96.3 | 126.4 | × |
| | 18.1 | 92.6 | 92.3 | 96.3 | 67.6 | √ |

Note: √ indicates that TensorRT acceleration is used; × indicates that it is not used.
Table 9. The loss function of the GAN model and the probability of its generation.

| Model | Loss_D | Loss_G | D(x) | D(G(z)) |
|---|---|---|---|---|
| GAN | 0.0004~4.6782 | 2.74~28.27 | 0.78~1.00 | 0.0000~0.0948 |
Table 10. False positive rates before and after GAN augmentation for each disease class.

| Disease Class | FPR (Before GAN)/% | FPR (After GAN)/% |
|---|---|---|
| Leaf curl of pepper | 8.38 | 3.29 |
| Pepper whitefly | 10.37 | 5.39 |
| Pepper yellowish | 12.93 | 1.26 |
| Leaf spot of pepper | 11.25 | 3.11 |
| Powdery mildew of pepper | 11.78 | 3.83 |
| Pepper scab | 0 | 0 |

