1. Introduction
China is a major producer of broccoli, accounting for over 30% of the global cultivation area [1]. However, pests and diseases directly impact its yield and quality [2]. Currently, the most effective method to combat pests and diseases remains chemical pesticide spraying [3]. Traditional uniform application of pesticides [4] leads to pesticide waste: only about 30% of the pesticide sprayed in the field adheres to the broccoli plants, while the remaining 70% is deposited in the soil or dispersed into the atmosphere [5], consequently causing soil and groundwater pollution. Precision-targeted pesticide spraying can reduce pesticide usage, improve utilization efficiency [6], and minimize environmental pollution, making it an increasingly prominent technology.
The prerequisite for achieving precise pesticide application on broccoli is accurate identification of the target, obtaining precise information such as the target’s species and location [7]. However, during actual recognition, complex backgrounds can adversely affect the model’s feature extraction [8]. For instance, some weeds resemble broccoli in shape, color, and texture, potentially causing the model to fail to extract key features and leading to misclassifications [9]. Furthermore, the computational power of edge devices often cannot match that of workstations, and practical field operations are frequently limited by the constrained computing resources of edge devices [10,11], resulting in slow processing speeds and reduced accuracy during real-time recognition.
To date, numerous researchers have conducted studies on lightweight model design. Yuan et al. [12] proposed a lightweight object detection model named CES-YOLO, which incorporates three key components: the C3K2-Ghost module, an EMA attention mechanism, and a SEAM detection head. This architecture significantly reduces computational complexity while maintaining high detection accuracy. On a blueberry ripeness detection task, the model achieved an mAP of 91.22% with only 2.1 million parameters. Furthermore, it was successfully deployed on edge devices, enabling real-time detection. The study provides an efficient and practical solution for automated fruit maturity identification, demonstrating strong potential for smart agricultural applications. Qiu et al. [13] addressed the need for potato seed tuber detection and proposed the DCS-YOLOv5s model. This model replaced the standard CBS convolution with DP_Conv to reduce the number of parameters and introduced Ghost convolution into the C3 module to decrease redundant features, achieving overall model lightweighting. Experiments showed that DCS-YOLOv5s achieved an mAP of 97.1%, a frame rate of 65 FPS, and a computational cost of only 10.7 GFLOPs, reducing computation by 33.1% compared to YOLOv5s. Tang et al. [14], targeting automated tea recognition and grading, introduced MobileNetV3 as the backbone network of YOLOv5n. They achieved an mAP of 89.56% for four tea categories and 93.17% for three tea categories, with a model size of only 4.98 MB, a reduction of 2 MB compared to the original model. Chen et al. [15] proposed a lightweight detection model based on YOLOv5s, constructing an efficient lightweight framework using ODConv (Omni-Dimensional Dynamic Convolution) for feature extraction. Compared to the original model, the proposed model’s mAP increased by 7.21% to 98.58%, while parameters and FLOPs decreased by 70.8% and 28.3%, respectively, providing an effective solution for recognizing rice grains under conditions of high density and tight adhesion.
The aforementioned studies validate the feasibility of lightweight techniques in agricultural object detection. However, existing research often focuses on static optimization for single scenarios or specific backgrounds, leading to poor model robustness in complex environments. Consequently, many researchers have investigated object detection under complex background conditions. Li et al. [16] proposed a crop pest recognition method based on convolutional neural networks (CNNs). Addressing the multi-scene environments encountered in natural settings, they fine-tuned a GoogLeNet model on a manually verified dataset and achieved high-precision classification of 10 common pest types with a recognition accuracy of 98.91%, providing an efficient solution for agricultural monitoring under complex natural conditions. Zhang et al. [17] proposed an enhanced detection method based on the YOLOX model. The approach incorporates an Efficient Channel Attention (ECA) mechanism to improve focus on diseased regions, employs Focal Loss to mitigate sample imbalance, and utilizes the hard-Swish activation function to increase detection speed. Evaluated on a cotton pest and disease image dataset with complex backgrounds, the model achieved an mAP of 94.60% and a real-time detection speed of 74.21 FPS, significantly outperforming mainstream algorithms; it was also successfully deployed on mobile devices, demonstrating practical applicability in real-world scenarios. Ulukaya and Deari [18] proposed a Vision Transformer-based approach for rice disease recognition in complex field environments. By integrating transfer learning, categorical focal loss, and data augmentation, the model effectively addressed class imbalance and background noise, achieving 88.57% accuracy in classifying five disease types and outperforming conventional CNNs, offering a robust solution for automated crop disease monitoring. Peng et al. [19], focusing on rice leaf disease detection, proposed the RiceDRA-Net model, which integrates a Res-Attention module to reduce information loss. Recognition accuracy reached 99.71% for simple backgrounds and 97.86% for complex backgrounds, outperforming other mainstream models and demonstrating stability and accuracy in complex scenarios.
In practical field operations, intertwined complex factors such as dynamic crop changes, soil morphology, and weeds pose greater challenges for lightweight designs that maintain both high-speed inference and high accuracy. Feature interference from complex backgrounds further exacerbates the trade-off between robustness and accuracy. Therefore, in the practical recognition process of field broccoli, achieving the lightweight requirements of high speed and high accuracy, while enhancing model robustness and accuracy in complex backgrounds, remains a significant challenge.
To address the above problems, this research proposes an improved model based on YOLOv8s to achieve more effective broccoli detection. Firstly, the standard Conv module is replaced with the DWConv module [20], and GhostConv [21] is introduced into the C2f module to form the C2f_GhostConv module, which reduces computational cost and the number of parameters while improving detection speed. Secondly, a CDSL module is designed to enhance the model’s ability to extract features from the original images, and the CBAM attention module [22] is introduced to focus feature extraction on the target, suppress feature learning in non-target regions, and improve feature extraction in both the channel and spatial dimensions. Finally, the WIoU loss function [23] is adopted to accelerate convergence and reduce training loss.
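As a point of reference, the kind of depthwise separable convolution meant by DWConv here can be sketched in PyTorch as below. This is a minimal illustration only; the exact kernel sizes, strides, and normalization/activation settings used in DWG-YOLOv8 follow the YOLOv8s configuration rather than this sketch.

```python
import torch.nn as nn

class DWConv(nn.Module):
    """Depthwise separable convolution (sketch): a depthwise conv followed
    by a 1x1 pointwise conv, in place of a standard convolution block."""

    def __init__(self, c_in, c_out, k=3, s=1):
        super().__init__()
        # Depthwise stage: one filter per input channel (groups = c_in)
        self.dw = nn.Conv2d(c_in, c_in, k, s, k // 2, groups=c_in, bias=False)
        # Pointwise stage: 1x1 conv mixes channels and sets the output width
        self.pw = nn.Conv2d(c_in, c_out, 1, 1, 0, bias=False)
        self.bn = nn.BatchNorm2d(c_out)
        self.act = nn.SiLU()  # YOLOv8-style activation

    def forward(self, x):
        return self.act(self.bn(self.pw(self.dw(x))))

# Rough parameter comparison against a standard 3x3 convolution (128 -> 256)
std = nn.Conv2d(128, 256, 3, padding=1, bias=False)   # ~295k weights
dws = DWConv(128, 256)                                 # ~34k weights (+ BN)
print(sum(p.numel() for p in std.parameters()),
      sum(p.numel() for p in dws.parameters()))
```

Splitting the spatial filtering and the channel mixing in this way is what yields the order-of-magnitude reduction in weights and FLOPs relative to a standard convolution of the same width.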
3. Results
3.1. Ablation Study Results
To further evaluate the performance gains contributed by the individual improvements and verify their feasibility, an ablation study was conducted using YOLOv8s as the baseline model. The aforementioned metrics were used as evaluation criteria, and the results are shown in Table 3.
As shown in the table, replacing standard convolution with Depthwise Separable Convolution and modifying the C2f module by incorporating GhostConv reduced the model’s FLOPs and size, making the model more lightweight and improving inference speed. Using the CDSL module for initial feature extraction and adding the CBAM attention mechanism in the neck improved model accuracy but also increased model complexity and inference time. Replacing the CIoU loss function with WIoU improved the model’s recall, mAP, and processing speed, among other metrics.
Analyzing the table data, compared to the original YOLOv8s (Experiment 1), lightweighting the backbone using DWConv and the improved C2f_GhostConv module (Experiment 2) reduced FLOPs and model size by 46.4% and 54.7%, respectively, and reduced image processing time by 1.9 ms; however, precision and recall decreased slightly. The reason is that the improved C2f_GhostConv module reduces the number of convolution kernels and focuses on extracting features from the primary channels, potentially losing some features contained in the omitted channels and leading to a slight overall drop in accuracy. Replacing the initial Conv with the CDSL module for feature extraction and introducing CBAM in the neck (Experiment 3 vs. Experiment 1) increased FLOPs by 7.4% but improved recall and mAP0.5 by 0.9 and 1.0 percentage points, respectively, while precision remained essentially unchanged. Changing the loss function to WIoU (Experiment 4 vs. Experiment 1) increased mAP0.5 by 0.6 percentage points and recall by 0.7 percentage points. The performance of the final improved model is shown in Experiment 8: compared to the baseline network (Experiment 1), precision increased by 1.7 percentage points, recall by 0.9 percentage points, and mAP0.5 by 3.4 percentage points, while FLOPs decreased by 38.8%, model size by 35.6%, and inference time by 2.2 ms. Compared to YOLOv8s, DWG-YOLOv8 thus enhances the recognition capability for field broccoli, reduces computational cost, decreases model size, and improves recognition accuracy and processing speed.
Figure 9 shows the loss comparison between the model from Experiment 1 (CIoU) and Experiment 4 (WIoU) over 100 epochs. The analysis shows that replacing the CIoU loss function with WIoU accelerated the model’s convergence and reduced the loss value. With CIoU, the loss dropped from 2.84 to 1.13, decreased more slowly after the 75th epoch, and settled at a final value of about 1.08 at convergence. In contrast, with the WIoU loss function the loss dropped from 1.32 to 1.03 and approached convergence around the 50th epoch, stabilizing around 0.8 by the end of training.
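For reference, the basic form of the WIoU loss (WIoU v1) described in [23] can be written as below; the dynamic-focusing variants add a further outlier-based weighting on top of this term, and the notation here is ours rather than taken verbatim from the original paper.

$$
\mathcal{L}_{\mathrm{WIoU}} = R_{\mathrm{WIoU}} \, \mathcal{L}_{\mathrm{IoU}}, \qquad
R_{\mathrm{WIoU}} = \exp\!\left( \frac{(x - x_{gt})^{2} + (y - y_{gt})^{2}}{\left( W_{g}^{2} + H_{g}^{2} \right)^{*}} \right), \qquad
\mathcal{L}_{\mathrm{IoU}} = 1 - \mathrm{IoU},
$$

where $(x, y)$ and $(x_{gt}, y_{gt})$ denote the centers of the predicted and ground-truth boxes, $W_{g}$ and $H_{g}$ are the width and height of the smallest box enclosing both, and $^{*}$ indicates that the term is detached from the gradient computation, so it acts as a distance-based weighting on the IoU loss rather than being optimized directly.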
3.2. Effect of Lightweight Methods
This paper employs DWConv and the C2f_GhostConv module for lightweight model design. According to the experimental results in Table 3, comparing Experiment 1 and Experiment 2, the model’s mAP value remained unchanged, while FLOPs decreased from 28.4 G to 15.2 G, model size decreased from 22.5 MB to 10.2 MB, and image processing time decreased from 10.8 ms to 8.9 ms. Comparing Experiment 3 and Experiment 5, the model’s mAP0.5 increased from 87.9% to 88.4%, FLOPs decreased from 30.5 G to 17.3 G, and model size decreased from 23.3 MB to 14.5 MB. Comparing Experiment 7 and Experiment 8, the model’s recall increased from 87.8% to 88.5%, mAP0.5 increased from 89.6% to 90.3%, FLOPs decreased from 30.5 G to 17.4 G, model size decreased from 23.3 MB to 14.5 MB, and single-image inference time improved from 13.1 ms to 8.6 ms. These results indicate that the strategy of using DWConv and C2f_GhostConv for model lightweighting is feasible: it not only maintains or improves the model’s detection accuracy but also reduces computational cost, decreases model size, and accelerates inference.
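To illustrate how Ghost convolution reduces redundant feature computation, a minimal PyTorch sketch of a Ghost convolution block in the spirit of [21] is given below; the specific widths, kernel sizes, and the way such blocks are wired into C2f_GhostConv are assumptions for illustration, not details confirmed by this paper. Half of the output channels come from an ordinary convolution and the other half from a cheap depthwise operation applied to those primary maps.

```python
import torch
import torch.nn as nn

class GhostConv(nn.Module):
    """Ghost convolution (sketch): generate 'primary' feature maps with a
    standard conv, derive the remaining 'ghost' maps with a cheap depthwise
    conv, and concatenate the two halves."""

    def __init__(self, c_in, c_out, k=1, s=1):
        super().__init__()
        c_mid = c_out // 2
        # Primary features: ordinary convolution producing half the output width
        self.primary = nn.Sequential(
            nn.Conv2d(c_in, c_mid, k, s, k // 2, bias=False),
            nn.BatchNorm2d(c_mid), nn.SiLU())
        # Ghost features: cheap 5x5 depthwise conv applied to the primary maps
        self.cheap = nn.Sequential(
            nn.Conv2d(c_mid, c_mid, 5, 1, 2, groups=c_mid, bias=False),
            nn.BatchNorm2d(c_mid), nn.SiLU())

    def forward(self, x):
        y = self.primary(x)
        return torch.cat([y, self.cheap(y)], dim=1)

x = torch.randn(1, 64, 80, 80)
print(GhostConv(64, 128)(x).shape)  # torch.Size([1, 128, 80, 80])
```

Because only half of the output channels are produced by a full convolution, blocks of this kind cut both parameters and FLOPs roughly in half relative to a standard convolution of the same output width, which is consistent with the reductions reported in Table 3.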
3.3. Effect of CDSL Feature Extraction Module
This paper designs the CDSL module to replace the initial CBS module for feature extraction from the original image. To demonstrate this module’s enhancement of feature extraction capability, feature maps from the intermediate layers of the model were visualized [27]. Feature maps are the result of convolutional kernels performing convolution operations on the input image and extracting the corresponding features; they reflect the features learned by a specific layer of the model, such as edges, textures, and colors. Yellow areas in the feature maps indicate high activation values, meaning the model detected significant features relevant to the current layer’s task in that region, while blue areas indicate low activation values, suggesting lower relevance of the features in that area. A comparison of feature maps from the standard convolution and the CDSL module is shown in Figure 10.
Compared to Figure 10a, the bright areas in multiple feature maps in Figure 10b are more pronounced and the details are clearer. For example, in feature maps 1, 4, 5, and 11, the target contours and target regions are more distinct, and the CDSL module shows clearer contours and higher activation intensity in edge extraction. Comparing feature maps 8, 9, and 19, the CDSL feature maps contain more target details, including vein information and leaf texture features. This indicates that, at the same depth, the CDSL module captures more refined and richer feature information, which is beneficial for improving detection accuracy.
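Feature-map figures of this kind are typically produced by registering a forward hook on the layer of interest and plotting its per-channel activations. A minimal sketch is given below; the layer to hook, the number of maps, and the colormap are illustrative assumptions rather than details taken from this paper.

```python
import torch
import matplotlib.pyplot as plt

def visualize_feature_maps(model, image, layer, n_maps=20, cmap="viridis"):
    """Capture the output of `layer` with a forward hook and plot the first
    n_maps channels; with the viridis colormap, yellow = high activation."""
    captured = {}

    def hook(_module, _inp, out):
        captured["fmap"] = out.detach().cpu()

    handle = layer.register_forward_hook(hook)
    with torch.no_grad():
        model(image)                     # image: preprocessed (1, 3, H, W) tensor
    handle.remove()

    fmap = captured["fmap"][0]           # (C, H, W) activations of the first sample
    cols = 5
    rows = (n_maps + cols - 1) // cols
    fig, axes = plt.subplots(rows, cols, figsize=(2 * cols, 2 * rows))
    for i, ax in enumerate(axes.flat):
        ax.axis("off")
        if i < min(n_maps, fmap.shape[0]):
            ax.imshow(fmap[i], cmap=cmap)
    fig.tight_layout()
    return fig
```

Hooking, for instance, the first convolutional block of the backbone for both the standard CBS stem and the CDSL stem and comparing the resulting grids yields visualizations of the kind compared above.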
3.4. Effect of CBAM Attention Mechanism
To more intuitively observe the improvement in recognition brought by the CBAM attention mechanism, heatmaps were plotted. Heatmaps represent the model’s focus on different locations during prediction, visually reflecting the model’s regions of interest in the image [28]. The color in the heatmaps represents the model’s feature attention: red areas are very important for the model’s prediction, while blue indicates less important areas. Heatmaps before and after adding the CBAM attention mechanism are shown in Figure 11.
Compared to Figure 11b, the broccoli target area in Figure 11c is brighter and has higher coverage, while the brightness of incorrectly detected areas is reduced. The CBAM attention mechanism enhances the model’s feature extraction capability and provides a more complete extraction of target feature information. It increases the model’s attention to channel and spatial information, enhances perception of the correct target, and suppresses the impact of non-target regions on overall predictive performance.
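As background, CBAM [22] applies channel attention followed by spatial attention to reweight feature maps. A minimal PyTorch sketch is shown below; the reduction ratio of 16 and the 7×7 spatial kernel are the commonly used defaults and are assumptions here, not values confirmed by this paper.

```python
import torch
import torch.nn as nn

class CBAM(nn.Module):
    """Convolutional Block Attention Module (sketch): channel attention from
    pooled descriptors passed through a shared MLP, then spatial attention
    from a 7x7 conv over channel-wise average and max maps."""

    def __init__(self, c, reduction=16, k=7):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Conv2d(c, c // reduction, 1, bias=False),
            nn.ReLU(),
            nn.Conv2d(c // reduction, c, 1, bias=False))
        self.spatial = nn.Conv2d(2, 1, k, padding=k // 2, bias=False)

    def forward(self, x):
        # Channel attention: shared MLP over global average- and max-pooled maps
        avg = self.mlp(torch.mean(x, dim=(2, 3), keepdim=True))
        mx = self.mlp(torch.amax(x, dim=(2, 3), keepdim=True))
        x = x * torch.sigmoid(avg + mx)
        # Spatial attention: 7x7 conv over channel-wise mean and max maps
        s = torch.cat([x.mean(dim=1, keepdim=True),
                       x.amax(dim=1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.spatial(s))

x = torch.randn(1, 256, 40, 40)
print(CBAM(256)(x).shape)  # torch.Size([1, 256, 40, 40])
```

The channel branch decides which feature channels to emphasize, and the spatial branch decides where to look, which is why the heatmaps after adding CBAM concentrate on the broccoli regions and dim elsewhere.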
3.5. Performance Comparison of DWG-YOLOv8 with Other Network Models
The improved DWG-YOLOv8 model was compared with the current mainstream models SSD [29], Faster R-CNN [30], YOLOv5s [31], YOLOv8n, YOLOv8s, YOLOv8m, and YOLOv11s [32]. The experimental results are shown in Table 4.
As shown in the table, DWG-YOLOv8 demonstrates clear advantages over the other models in precision, recall, and mAP0.5. On the broccoli dataset, its mAP0.5 outperforms SSD, Faster R-CNN, YOLOv5s, YOLOv8n, YOLOv8s, YOLOv8m, and YOLOv11s by 11.1, 2.6, 3.1, 6.2, 3.4, 2.7, and 1.3 percentage points, respectively. DWG-YOLOv8 also outperforms most models in both floating-point operations and model size: compared to SSD, Faster R-CNN, YOLOv5s, YOLOv8s, YOLOv8m, and YOLOv11s, its model size is reduced by 83.9%, 86.6%, 25.3%, 35.6%, 72.1%, and 24.5%, respectively. Its processing speed likewise exceeds that of these models, with single-frame image processing time reduced by 28.0 ms, 104.7 ms, 5.6 ms, 2.2 ms, 8.5 ms, and 1.7 ms compared to SSD, Faster R-CNN, YOLOv5s, YOLOv8s, YOLOv8m, and YOLOv11s, respectively. Although YOLOv8n outperforms DWG-YOLOv8 in model size and processing speed, DWG-YOLOv8 leads by 6.2 percentage points in mAP0.5 and lags by only 1.0 ms in single-image processing time. Overall, DWG-YOLOv8 demonstrates superior broccoli detection performance in multi-scene environments compared to other mainstream models, validating the choice of YOLOv8s as the baseline network.
3.6. Complex Background Detection Effect Comparison
To verify the superiority of the proposed DWG-YOLOv8 in broccoli detection under different complex background conditions, a comparative experiment was conducted using the trained DWG-YOLOv8 and YOLOv8s models on images selected from the test set. The selected images mainly included conditions such as backgrounds with or without mulch, wet or dry soil, the presence of weeds, and different lighting. The number of correctly identified targets, false positives, and missed detections was counted for each model, and the results are shown in Table 5.
Table 5 shows the recognition performance of the models before and after improvement on the complex background image data. On this data, which contained a total of 663 targets, the pre-improvement YOLOv8s produced 21 missed detections and 181 false positives, corresponding to a miss rate of 3.2% and a false positive rate of 21.9% (relative to detected objects). The improved DWG-YOLOv8 produced only 12 missed detections and 36 false positives, with rates of 1.8% and 5.3%, respectively, decreases of 1.4 and 16.6 percentage points compared to the original model. This shows that the improved model outperforms the original YOLOv8s in both miss rate and false positive rate. Under the different background conditions, DWG-YOLOv8 demonstrates better robustness and accuracy, particularly under weedy conditions, where the misidentification of weeds as broccoli targets is significantly reduced.
A selection of recognition results is compared in Figure 12. The figure shows that YOLOv8s is prone to missed detections in complex environments and, when weeds are abundant, often misidentifies weeds as broccoli targets. The primary reasons for false detections of broccoli targets by the YOLOv8s model can be summarized as follows: (1) Certain weeds exhibit high visual similarity to broccoli in morphological structure and texture features, leading to confusion during the feature extraction stage. (2) The YOLOv8s model lacks dedicated channel and spatial attention mechanisms, resulting in insufficient capability to filter discriminative features: in the channel dimension, the model fails to sufficiently amplify the feature channels relevant to broccoli discrimination, and in the spatial dimension, it struggles to focus on key target regions, ultimately leading to inadequate sensitivity in distinguishing subtle differences.
DWG-YOLOv8 extracts finer texture features through the CDSL module and enhances attention to target regions via the CBAM mechanism, demonstrating strong feature extraction capabilities and robust inference performance even in multi-scene environments. Under weed conditions, it significantly reduces the number of weeds misclassified as broccoli. Consequently, the improved DWG-YOLOv8 exhibits superior robustness in recognition performance across diverse background conditions compared to the original YOLOv8s model.
3.7. Edge Deployment
Edge devices often have far less computational power than workstations. To verify the operational performance of the DWG-YOLOv8 model on edge devices and to accelerate its inference on such devices, this study employed the TensorRT library [33] for inference acceleration. TensorRT reduces data transmission and computational overhead, increases model inference speed, and reduces inference latency.
First, the .pt weight file of the trained DWG-YOLOv8 model was converted on the workstation to the universal .onnx format, which TensorRT can recognize and process. The .onnx file was then transferred to the Jetson Orin NX device, where TensorRT optimized the model structure and compiled it to generate the .engine inference engine. Deserializing the .engine file then enables accelerated inference. The frame rates of the models deployed on the device are shown in Table 6.
As shown in the table, before TensorRT acceleration, the processing speed on the edge device, due to limited computational power, could not match that of the workstation, resulting in lower real-time detection frame rates (only 13.7 FPS for DWG-YOLOv8). After acceleration, the frame rate increased to 34.1 FPS, representing a 2.5-fold improvement in detection speed, achieving a single-image processing time of 29.33 ms.
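The .pt → .onnx → .engine pipeline described above is commonly scripted along the following lines. This is a sketch only: the weight file name is a placeholder, the Ultralytics export call is one typical way to produce the ONNX file, and the exact trtexec options depend on the JetPack/TensorRT version installed on the Jetson Orin NX.

```python
from ultralytics import YOLO

# On the workstation: export the trained PyTorch weights to ONNX
model = YOLO("dwg_yolov8_best.pt")                 # hypothetical weight file name
model.export(format="onnx", opset=12, imgsz=640)   # writes dwg_yolov8_best.onnx

# On the Jetson Orin NX: build the TensorRT engine from the ONNX file,
# e.g. with trtexec (FP16 is a common choice for an additional speed-up):
#   /usr/src/tensorrt/bin/trtexec --onnx=dwg_yolov8_best.onnx \
#       --saveEngine=dwg_yolov8_best.engine --fp16
# The resulting .engine file is deserialized at runtime for accelerated inference.
```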
To further test the practical feasibility of DWG-YOLOv8 for broccoli recognition on edge devices, broccoli image data from actual field environments was selected for testing. The recognition results are shown in Table 7, and partial recognition effect comparisons are shown in Figure 13. Under cloudy weather, broccoli and weeds have similar colors but distinct textures. Under sunny weather, high light intensity makes the images bright, and severe glare on some broccoli veins causes a loss of texture features. Under rainy weather, reduced light intensity lowers image contrast, and the texture information of some broccoli becomes blurred.
As shown in the table, under edge device conditions, the performance of DWG-YOLOv8 on broccoli image data from field environments is superior to that of YOLOv8s. For DWG-YOLOv8, out of 951 detected targets, 918 were correctly identified, with 22 missed detections and 33 false positives. For YOLOv8s, out of 983 detected targets, only 884 were correctly identified, with 56 missed detections and 99 false positives. The recognition accuracy of DWG-YOLOv8 was 96.53%, an improvement of 5.6% over YOLOv8s.
Figure 13 shows the recognition effect comparison under different weather conditions. As seen, under rainy conditions, YOLOv8s incorrectly identified 4 targets, while DWG-YOLOv8 had no missed or false detections. Under sunny weather, YOLOv8s had 2 false detections and 1 missed detection, while DWG-YOLOv8 had 1 missed detection; the reason is the high light intensity in sunny weather, which causes small broccoli targets to lose some texture features, reducing the feature information acquired by the model and leading to missed detections. Under cloudy conditions, both YOLOv8s and DWG-YOLOv8 showed no missed or false detections. Overall, the deployment performance of DWG-YOLOv8 on the edge device is superior to that of YOLOv8s.
3.8. Analysis of Failure Cases
Although the DWG-YOLOv8 model demonstrates strong recognition accuracy and robustness in conventional detection scenarios, it still faces detection failures under certain extreme physical conditions. As shown in Figure 14, when test samples exhibit severe motion blur, the model is prone to missed detections. The primary reasons are analyzed as follows:
Under the influence of motion blur, object edges in images become indistinct, and texture details become difficult to discern. This degradation in image quality makes it challenging for the model to accurately distinguish foreground targets from complex backgrounds during feature extraction, ultimately leading to increased classification error rates and reduced localization accuracy.
This phenomenon indicates that existing models still have limitations in feature robustness when confronted with complex physical interference. It also points to a key direction for future research aimed at enhancing the model’s adaptability to extreme environments.
4. Discussion
This research proposes DWG-YOLOv8, a lightweight detection model tailored for broccoli recognition. Based on the YOLOv8s architecture, it incorporates systematic enhancements across three key dimensions:
First, for lightweight design, standard convolutions in the backbone network are replaced with depthwise separable convolution (DWConv), and Ghost convolution is incorporated into the C2f module to form the C2f_GhostConv structure, significantly reducing computational complexity and parameter count. Second, to enhance feature extraction capabilities, the CDSL module was designed for initial feature extraction and the CBAM attention mechanism was embedded in the neck, enabling the model to adaptively focus on key channels and spatial regions and improving its ability to extract discriminative features of the target. Finally, the bounding box regression loss function was replaced with the WIoU loss to improve convergence speed and stability during training.
In the experimental section, ablation studies validated the effectiveness and synergistic effects of each improved module. Model comparison experiments demonstrated that DWG-YOLOv8 outperforms existing mainstream methods in detection accuracy and robustness for broccoli in complex field environments. Edge deployment experiments further confirmed the model’s high efficiency and feasibility in practical applications.
Based on the edge deployment results, DWG-YOLOv8 achieved strong performance across various field scenarios in terms of timeliness, accuracy, and stability. Beyond performance metrics, the practical significance of this study lies in the model’s application value when deployed in precision application systems. The success of precision application depends not only on detection accuracy but also on inference speed, model size, and stability under typical variable field conditions, such as soil moisture fluctuations and weed presence. DWG-YOLOv8 achieves significant model compression and computational cost reduction, and its lightweight nature makes it particularly suitable for deployment on embedded devices commonly used in agricultural robots, such as the Jetson Orin NX. The model’s robustness to diverse field variations supports reliable operation in real-world environments, and integration with TensorRT further enhances its practical value by achieving frame rates that meet the real-time demands of automated spraying platforms. Future work will focus on integrating this detection model with robotic execution systems to conduct closed-loop, end-to-end precision spraying trials. Overall, DWG-YOLOv8 maintains low computational resource consumption while demonstrating superior scene adaptability and detection performance, providing a viable technical approach for automated crop identification in field settings.
Compared with existing agricultural object detection methods, this study achieves significant progress in model lightweight design and adaptability to complex scenes. However, several issues remain to be explored in depth:
1. Mitigation of Motion Blur
Under certain levels of motion blur, the model’s recognition performance degrades, leading to a significant increase in false negative rates. To address this, subsequent research will explore image restoration techniques that apply a degree of deblurring before images are fed to the model, thereby mitigating the impact of motion blur. In addition, modeling and filtering target states based on inter-frame motion continuity will help alleviate the effect of blur caused by rapid relative motion, enhancing detection stability and trajectory continuity in dynamic scenes and enabling the model to reliably and accurately identify targets under motion-blurred conditions.
2. Cross-Crop, Multi-Scenario Generalization Validation
Subsequent research will enhance the model’s generalization capabilities to enable recognition and detection across high-value crops such as tomatoes, cabbages, and cucumbers. The model’s application scenarios will be expanded to achieve robustness in diverse environments, including open fields, greenhouses, and orchards. Its operational conditions will be broadened to evaluate applicability under various light conditions for tasks like pesticide application guidance, crop monitoring, and yield estimation, thereby advancing the development of intelligent agricultural algorithms.
3. Long-Term Deployment Validation for Targeted Application Scenarios
The model will be integrated into an automated application robot platform to conduct targeted application research, and large-scale, long-term field trials will be performed in actual broccoli croplands. These trials will validate the model’s comprehensive performance under complex lighting conditions, variable weather, and different growth stages, and will enable the application system to perform precise spraying in complex field environments, accelerating the translation of research outcomes into practical applications.
Through continuous exploration in the aforementioned directions, we are committed to further enhancing the practicality, adaptability, and scalability of this model in complex agricultural environments. This will enable precise and targeted spraying at crop locations in field settings, providing reliable technical support for high-precision, low-cost plant protection and production management.