Incorporating Crop-Centric Segmentation and Enhanced YOLOv10 for Indirect Weed Detection in Bok Choy Fields

Li, Weili; Zhu, Wenpeng; Wang, Qianyu; Gao, Feng; Han, Kang; Jin, Xiaojun

doi:10.3390/agronomy16090907



by Weili Li ¹,², Wenpeng Zhu ², Qianyu Wang ², Feng Gao ², Kang Han ² and Xiaojun Jin ²,*

¹ Jincheng College, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China

² National Engineering Research Centre of Biomaterials, Nanjing Forestry University, Nanjing 210016, China

* Author to whom correspondence should be addressed.

Agronomy 2026, 16(9), 907; https://doi.org/10.3390/agronomy16090907

Submission received: 6 March 2026 / Revised: 23 April 2026 / Accepted: 28 April 2026 / Published: 30 April 2026

(This article belongs to the Special Issue Next-Generation Crop Management: Bridging AI Vision and Sensor Fusion for Smarter Agronomy)


Abstract

Weed infestation poses a significant threat to bok choy (Brassica rapa subsp. chinensis) cultivation, reducing crop yield and quality through resource competition and pest facilitation. Traditional weed detection methods face two major bottlenecks: one is data annotation, arising from the need for extensive, species-diverse datasets, and the other is visual discrimination, due to the high morphological similarity between crops and weeds at certain growth stages. To address these challenges, this study proposed an indirect weed detection framework that combines an optimized You Only Look Once version 10 (YOLOv10) model for crop detection with Excess Green (ExG)-based segmentation of residual vegetation. The model incorporates RFD and C2f-WDBB modules to improve feature preservation and multi-scale fusion. Compared with baseline YOLOv10, the final proposed RCW-YOLOv10 reduced the number of parameters by 1.04 million and improved detection performance, achieving increases of 3.5, 1.5, and 1.1 percentage points in Precision, Recall, and mAP50, respectively, under field conditions. The system first detected bok choy plants and then localized weeds by masking crop regions and thresholding residual ExG signals in the unmasked areas. The detected weed coordinates were used to construct a distribution map that may support targeted control in precision agriculture. This approach simplifies weed identification under the tested bok choy field conditions and may be adaptable to other crops after further validation.

Keywords:

precision agriculture; crop masking; Excess Green index; indirect weed detection; image segmentation

1. Introduction

Bok choy (Brassica rapa subsp. chinensis) is a nutritionally vital vegetable crop, yet its productivity is severely constrained by weed infestation [1]. Weeds compete with bok choy for critical resources, including light, water, and soil nutrients, while also serving as reservoirs for pests and pathogens [2], collectively diminishing crop yield and quality [3]. Conventional weed control methods face significant challenges: manual weeding is limited by its labor-intensive nature and operational inefficiency [4], whereas chemical herbicides, often applied non-selectively, contribute to environmental contamination, herbicide resistance, and excessive agrochemical waste [5]. In organic production systems, stringent regulations prohibit the use of synthetic herbicides to align with ecological farming principles [6]. The advancement of agricultural mechanization, automation, and intelligent technologies has facilitated the adoption of mechanical weeding in vegetable production systems, leveraging its superior efficiency and precision to mitigate labor and sustainability challenges [7]. Precise weed detection is the prerequisite for mechanical weeding.

Researchers have extensively explored weed detection methods [8]. Previous studies primarily depended on manually engineered features such as color, texture, morphology, and multi-spectral characteristics of weeds and crops [9]. However, due to the high visual similarity between weeds and crops, such approaches often suffer from limited detection accuracy and robustness [10]. With the rapid advancement of artificial intelligence, particularly deep learning (DL), automated extraction of hierarchical features from large datasets has become feasible, leading to widespread applications in image recognition, speech processing, natural language understanding, and autonomous driving [11,12]. Consequently, DL-based weed detection has gained increasing attention [13,14], and a growing number of researchers are employing various DL models for weed detection in crop fields [15,16,17,18].

While deep learning models have demonstrated remarkable success in weed detection for agricultural applications, significant challenges persist in achieving consistently high recognition rates and computational efficiency for specific crop varieties [19,20,21]. Recent advances have focused on architecture-level optimizations to overcome these limitations [8,22]. Chen et al. [23] developed the YOLOv8-EGC-Fusion (YEF) model through integration of Efficient Graph Convolution (EGC) and Group Context Anchor Attention (GCAA) modules, achieving a 3.2 percentage-point improvement in mAP (97.2% vs. 94.0%) over baseline YOLOv8 in vegetable fields. In another study, Kong et al. [24] proposed an attention-enhanced YOLO variant for cornfields, which reduced GPU memory consumption by 35% while maintaining 94.5% Precision through backbone replacement and feature fusion optimization. For rice paddies, Peng et al. [25] reconstructed RetinaNet’s backbone with multi-scale fusion techniques, demonstrating a 15% reduction in inference time with a negligible accuracy drop (<1.2%).

While DL has become a prominent tool for automated weed detection, offering alternatives to labor-intensive manual weeding and non-selective herbicide use, its prevailing direct detection paradigm—which requires models to recognize and classify weed species—encounters fundamental, application-specific bottlenecks in the context of seedling-stage bok choy fields. In this specific agronomic scenario, two intertwined bottlenecks render the direct detection approach particularly challenging. One is the data annotation bottleneck, as the prevailing direct detection paradigm requires training datasets exhaustively annotated with diverse weed species, which is prohibitively costly and often impractical to acquire for real-world fields. The other is the visual discrimination bottleneck. At the seedling stage, bok choy and weeds exhibit high visual similarity in color, texture, and morphology. This intrinsic ambiguity, combined with their random distribution, fundamentally limits the accuracy and robustness of models that rely on directly distinguishing crops from weeds based on visual features.

To circumvent the dependency on exhaustive weed species annotation and to overcome the inherent limitation of visual discrimination in this specific agronomic scenario, this study proposes a novel indirect weed detection strategy. Instead of directly identifying diverse weed species, which is hampered by the aforementioned bottlenecks, the approach re-frames the problem. This study aimed to: (1) achieve indirect weed detection through bok choy identification using the enhanced RCW-YOLOv10 model and ExG-based segmentation of residual vegetation; and (2) automatically generate weed distribution maps to facilitate precision weeding. Based on this, weed localization performance can be achieved that is comparable to direct, annotation-intensive methods, while completely eliminating the need for annotated weed training data.

2. Materials and Methods

2.1. Dataset

This study focused on developing the RCW-YOLOv10 model for detecting bok choy at the seedling stage. Image acquisition was conducted across three bok choy fields in Qixia District, Nanjing (32.120° N, 118.480° E), China, during typical growing seasons (May and October 2023). A Canon EOS 600D digital camera (Canon Inc., Tokyo, Japan) was used to capture 1050 original images at 60 cm above ground level under varying illumination conditions (sunny, cloudy, and overcast) to ensure sample diversity. To ensure sample independence and enhance model generalization, each original image was acquired from a distinct, non-overlapping plot, and the training set was subsequently augmented via translation, rotation, mirroring, scaling, and brightness adjustment to mitigate overfitting. Following augmentation, the final dataset comprised 3000 images, with all samples retaining the original resolution of 1792 × 1344 pixels.

All 3000 images were manually annotated using LabelImg (v1.8.6) [26], generating XML annotation files. During annotation, when the bounding boxes of two distinct bok choy plants overlapped, they were kept as separate annotations if the Intersection over Union (IoU) was less than 20%. Each visually distinguishable bok choy plant, regardless of its proximity to others, was annotated with a single, dedicated bounding box to ensure that the model learns to detect individual plant instances. The dataset was then randomly split into training (60%), validation (20%), and testing (20%) subsets, ensuring balanced class distribution across all partitions.
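The 60/20/20 random split described above can be sketched as follows. This is a minimal illustration rather than the authors' procedure: the function name `split_indices`, the fixed seed, and the use of integer image indices instead of file paths are all assumptions introduced here.

```python
import random

def split_indices(n_images, fractions=(0.6, 0.2, 0.2), seed=0):
    """Shuffle image indices and cut them into train/val/test lists.

    `seed` is a hypothetical choice; the source does not state one.
    """
    indices = list(range(n_images))
    random.Random(seed).shuffle(indices)
    n_train = round(fractions[0] * n_images)
    n_val = round(fractions[1] * n_images)
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]  # remainder goes to the test subset
    return train, val, test

train_idx, val_idx, test_idx = split_indices(3000)
print(len(train_idx), len(val_idx), len(test_idx))
```

With 3000 images this yields 1800, 600, and 600 indices. Splitting by index keeps the three subsets disjoint by construction.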

2.2. The RCW-YOLOv10 Module

The YOLOv10 architecture was selected as the foundation due to its favorable balance between accuracy and inference efficiency [27,28,29,30]. However, for detecting seedling-stage crops in dense, weed-infested fields, a scenario characterized by small objects and cluttered backgrounds, the standard backbone network presents limitations. Conventional down-sampling operations may lose fine-grained details of small crops, and the feature fusion mechanisms may be suboptimal for distinguishing crops from visually similar weeds. To address these specific issues, two targeted optimizations were introduced to the backbone network: (1) replacing the original down-sampling modules with Robust Feature Down-sampling (RFD) blocks to enhance small-object detection capability; and (2) substituting standard Channel-to-Pixel (C2f) modules with Wide Dense Bottleneck Blocks (C2f-WDBBs) to improve feature representation while maintaining computational efficiency. The resulting RCW-YOLOv10 module, illustrated in Figure 1, is specifically designed to overcome the challenges posed by randomly distributed seedling-stage bok choy.

2.3. The RFD Module

The RFD module was designed to replace standard down-sampling layers in the YOLOv10 backbone, specifically to mitigate the loss of fine-grained features in small, sparse targets like seedling-stage bok choy. Its core mechanism employs multi-branch, multi-scale feature preservation at the point of resolution reduction. By capturing and combining complementary contextual information through parallel pathways, the module aims to pass on to deeper network layers the discriminative features crucial for detecting small crop seedlings [31]. The RFD module operates in two specialized variants tailored to different network depths. The first is the Shallow Robust Feature Down-sampling (SRFD) variant, deployed early in the backbone, which employs parallel 1 × 1 convolutions and residual connections to preserve the fine-grained details and edge information of seedling leaves that are susceptible to loss during initial down-sampling. The second is the Deep Robust Feature Down-sampling (DRFD) variant, located in deeper stages, which employs dilated convolutions and channel attention to capture higher-level abstract contextual features and thereby enhances the model’s ability to distinguish crops from complex, weedy backgrounds. Together, the two variants are intended to improve feature extraction for this specific task. The architecture of RFD, including details of the SRFD and DRFD variants, is illustrated in Figure 2.

2.4. C2f-WDBB Module

The C2f module is a fundamental feature-propagation component in YOLO architectures [32]. For the specific task of segmenting seedling-stage bok choy from weeds, the standard C2f structure was hypothesized to be suboptimal. The primary challenges are: (1) the detection targets (bok choy) are small, densely distributed, and exhibit high visual similarity to surrounding weeds; and (2) the system must be computationally efficient for potential field deployment. The fixed branching and single-path aggregation in the standard C2f module may not adequately model the complex, fine-grained feature interactions required to distinguish crops from weeds in such dense, visually ambiguous scenes. This could lead to a loss of discriminative detail that is crucial for high-precision segmentation.

To address these specific challenges (preserving critical detail for small, similar targets while maintaining a low computational footprint), the Wide Dense Bottleneck Block (WDBB) was designed to replace the core operations within C2f, resulting in the C2f-WDBB module. Its multi-branch architecture, combined with structural re-parameterization, targets distinct aspects of the detection challenge: (1) A multi-branch feature extractor in which each branch targets a specific visual characteristic: (a) the standard 3 × 3 convolution serves as the foundational feature learner; (b) the 1 × 1 convolution branch is intended to enhance local discriminative features critical for distinguishing bok choy from visually similar weed textures; (c) the average pooling branch promotes translation invariance, increasing robustness to minor positional shifts in plants in the field; and (d) the WDBB-specific horizontal and vertical convolutional branches are designed to capture the oriented, elongated stem and leaf structures typical of seedling bok choy, which are often lost in isotropic convolution operations. (2) Width expansion increases channel dimensions to boost feature diversity, allowing the network to model the broader set of visual patterns present in complex field scenes. (3) Structural re-parameterization is employed to decouple training-time capacity from inference-time efficiency. During training, the multi-branch structure provides a richer gradient flow and learning capacity, enabling the network to better fit the complex visual patterns of crops and weeds. During inference, these branches are linearly fused into a single, efficient convolutional layer. Because the fused layer is mathematically equivalent to the sum of the branches, the robustness gained from multi-branch training is preserved without sacrificing inference speed or accuracy. The re-parameterized weights are computed by:

W_{merged} = W_{main} + W_{1 \times 1} + \mathrm{ZeroPad}(W_{average}) + W_{horizontal} + W_{vertical}

(1)

while absorbing batch normalization parameters for efficiency. The variables in the formula are defined in Table 1, and the architecture of C2f-WDBB is depicted in Figure 3.
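The linear fusion in Equation (1) can be checked numerically with a small NumPy sketch. It is not the authors' implementation and rests on several assumptions made here for illustration: equal input and output channel counts, a 3 × 3 target kernel, the 1 × 1 kernel embedded at the kernel centre, the horizontal (1 × 3) and vertical (3 × 1) kernels embedded in the middle row and column, average pooling written as a per-channel 3 × 3 kernel of 1/9 with zero padding, and batch normalization omitted. All function names are hypothetical.

```python
import numpy as np

def conv3x3_same(x, w):
    """Naive zero-padded 3x3 convolution. x: (C_in, H, W), w: (C_out, C_in, 3, 3)."""
    _, h, wd = x.shape
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros((w.shape[0], h, wd))
    for i in range(3):
        for j in range(3):
            # Contribution of kernel tap (i, j) to every output pixel.
            out += np.einsum("oc,chw->ohw", w[:, :, i, j], xp[:, i:i + h, j:j + wd])
    return out

def multi_branch(x, w_main, w_1x1, w_h, w_v):
    """Training-time view: evaluate each branch separately and sum the outputs."""
    _, h, wd = x.shape
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = conv3x3_same(x, w_main)                      # main 3x3 branch
    out += np.einsum("oc,chw->ohw", w_1x1, x)          # 1x1 branch
    for i in range(3):                                 # 3x3 average pooling branch
        for j in range(3):
            out += xp[:, i:i + h, j:j + wd] / 9.0
    for k in range(3):
        out += np.einsum("oc,chw->ohw", w_h[:, :, k], xp[:, 1:1 + h, k:k + wd])  # 1x3
        out += np.einsum("oc,chw->ohw", w_v[:, :, k], xp[:, k:k + h, 1:1 + wd])  # 3x1
    return out

def merge_branches(w_main, w_1x1, w_h, w_v):
    """Inference-time view: fold every branch into one 3x3 kernel."""
    merged = w_main.copy()
    merged[:, :, 1, 1] += w_1x1      # 1x1 kernel sits at the centre tap
    merged[:, :, 1, :] += w_h        # horizontal kernel fills the middle row
    merged[:, :, :, 1] += w_v        # vertical kernel fills the middle column
    idx = np.arange(w_main.shape[0])
    merged[idx, idx, :, :] += 1.0 / 9.0  # average pooling acts per channel
    return merged

rng = np.random.default_rng(0)
c = 4
x = rng.normal(size=(c, 8, 8))
w_main = rng.normal(size=(c, c, 3, 3))
w_1x1 = rng.normal(size=(c, c))
w_h = rng.normal(size=(c, c, 3))
w_v = rng.normal(size=(c, c, 3))

w_merged = merge_branches(w_main, w_1x1, w_h, w_v)
max_diff = float(np.abs(multi_branch(x, w_main, w_1x1, w_h, w_v)
                        - conv3x3_same(x, w_merged)).max())
print(max_diff)
```

The difference between the two views is at floating-point round-off level, which is what allows the training-time branches to be discarded at inference. In a real network each branch would also carry its own batch normalization, whose scale and shift would be folded into the kernel and bias before the sum.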

2.5. Weed Detection

Weed species exhibit remarkable morphological diversity owing to different crop types, growth stages, and population densities, posing significant challenges for direct weed detection and compromising system robustness [33]. The proposed framework employs an indirect weed detection strategy, which reframes the problem from direct species recognition to a sequential pipeline of crop localization followed by residual vegetation processing. The workflow starts from the bok choy bounding boxes predicted by the RCW-YOLOv10 model, from which a binary crop mask was generated. To conservatively handle potential localization inaccuracies at crop leaf boundaries and to prioritize the avoidance of misclassifying crop pixels as weeds, this initial mask was morphologically dilated, establishing a buffer zone around each detected plant. Subsequently, vegetation outside this exclusion zone was segmented. An optimized Excess Green (ExG) index was computed from the RGB image, using normalized channel values for robustness, and then thresholded to produce an initial binary vegetation mask. This mask was refined through post-processing: a morphological opening with a 3 × 3 kernel removed small, isolated noise pixels; a morphological closing with the same kernel filled small holes within potential weed regions; and finally, connected components with an area smaller than a defined threshold (150 pixels) were filtered out as agronomically insignificant residues. The centroids of the remaining connected components were calculated, yielding the image-coordinate locations of identified weed patches. To translate this into an actionable format for field operations, the image was partitioned into a uniform 6 × 8 grid (48 cells of 224 × 224 pixels). A grid cell was designated as a candidate treatment zone if it contained one or more weed centroids, thereby generating the final weed distribution map that spatially discretizes weed presence for potential guidance of targeted intervention. The full system workflow is illustrated in Figure 4.
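The masking and component-filtering steps of this pipeline can be sketched in NumPy as below. This is a simplified illustration under stated assumptions, not the authors' code: the buffer width `buffer_px` is a hypothetical value (the source gives no dilation size), boxes are expanded directly, which is equivalent to dilating a rectangular mask with a square structuring element, 4-connectivity is assumed for connected components, and the morphological opening and closing steps are omitted for brevity.

```python
from collections import deque

import numpy as np

def crop_exclusion_mask(shape, boxes, buffer_px):
    """Binary mask covering each crop box (x1, y1, x2, y2) plus a safety buffer."""
    h, w = shape
    mask = np.zeros((h, w), dtype=bool)
    for x1, y1, x2, y2 in boxes:
        mask[max(0, y1 - buffer_px):min(h, y2 + buffer_px),
             max(0, x1 - buffer_px):min(w, x2 + buffer_px)] = True
    return mask

def weed_centroids(vegetation, exclusion, min_area=150):
    """Centroids (x, y) of residual vegetation components of at least min_area pixels."""
    residual = vegetation & ~exclusion
    h, w = residual.shape
    seen = np.zeros_like(residual, dtype=bool)
    centroids = []
    for y0, x0 in zip(*np.nonzero(residual)):
        if seen[y0, x0]:
            continue
        # Breadth-first flood fill over 4-connected neighbours.
        queue, pixels = deque([(y0, x0)]), []
        seen[y0, x0] = True
        while queue:
            y, x = queue.popleft()
            pixels.append((y, x))
            for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w and residual[ny, nx] and not seen[ny, nx]:
                    seen[ny, nx] = True
                    queue.append((ny, nx))
        if len(pixels) >= min_area:  # drop small residues
            ys, xs = zip(*pixels)
            centroids.append((float(sum(xs)) / len(xs), float(sum(ys)) / len(ys)))
    return centroids

# Synthetic 100 x 100 vegetation mask: one crop plant, one weed patch, one noise blob.
veg = np.zeros((100, 100), dtype=bool)
veg[10:30, 10:30] = True   # bok choy plant (covered by the detection box)
veg[60:80, 60:80] = True   # weed patch, 400 pixels
veg[90:95, 5:10] = True    # 25-pixel residue, below the area threshold

exclusion = crop_exclusion_mask(veg.shape, [(8, 8, 32, 32)], buffer_px=3)
centroids = weed_centroids(veg, exclusion)
print(centroids)
```

Only the weed patch survives: the plant is removed by the buffered crop mask and the small blob by the 150-pixel area filter. A production pipeline would more likely use a library routine for connected components; the flood fill is written out here only to keep the sketch dependency-free beyond NumPy.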

2.6. The Experimental Platform

The experiments were run with PyTorch (version 1.13.0 with CUDA 11.6, https://pytorch.org, accessed 10 December 2024) on a workstation with 128 GB RAM, an Intel Core i9-10920X CPU (3.50 GHz), and an NVIDIA RTX 3080 Ti GPU running Ubuntu 20.04.1. The RCW-YOLOv10 model was initialized via transfer learning from weights pre-trained on ImageNet, a dataset of more than 14 million labeled images [34]. The core hyperparameters followed the recommended configurations for YOLOv10: Stochastic Gradient Descent (SGD) with an initial learning rate of 0.01, momentum of 0.937, and weight decay of 0.0005, with a batch size of 16. A cosine annealing scheduler decayed the learning rate from the initial value (0.01) to 1 × 10⁻⁵ over the 100 training epochs.
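For reference, the learning-rate trajectory implied by these settings can be sketched with the standard closed-form cosine annealing rule. The exact scheduler variant used in training is not specified in the text, so the formula below (a half-cosine from the start value to the end value, with no warm-up) is an assumption, and `cosine_lr` is a hypothetical helper name.

```python
import math

def cosine_lr(epoch, total_epochs=100, lr_start=0.01, lr_end=1e-5):
    """Half-cosine decay from lr_start (epoch 0) to lr_end (epoch total_epochs)."""
    cos_term = 0.5 * (1.0 + math.cos(math.pi * epoch / total_epochs))
    return lr_end + (lr_start - lr_end) * cos_term

schedule = [cosine_lr(e) for e in (0, 25, 50, 75, 100)]
print(schedule)
```

Under this rule the rate falls slowly at first, fastest around the midpoint (about 0.005 at epoch 50), and flattens again as it approaches 1 × 10⁻⁵.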

2.7. Evaluation Metrics

The detection performance was comprehensively evaluated across both accuracy and efficiency using six key metrics: Precision, Recall, mAP50, mAP50-95, Parameters, and Giga Floating-point Operations (GFLOPs).

Precision measures the accuracy of positive predictions:

Precision = \frac{TP}{TP + FP}

(2)

Recall measures the model’s ability to identify all positive instances:

Recall = \frac{TP}{TP + FN}

(3)

In these metrics, true positives (TP) are samples correctly identified as positive (actual positives predicted as positive), false positives (FP) are samples incorrectly classified as positive (actual negatives predicted as positive), and false negatives (FN) are samples wrongly rejected as negative (actual positives predicted as negative); all terms are defined by comparing the model’s predictions with the ground truth labels.

Intersection over Union (IoU) quantifies the spatial overlap between a predicted bounding box and its ground truth box, calculated as:

IoU = \frac{\mathrm{Area\ of\ Intersection}}{\mathrm{Area\ of\ Union}}

(4)

For the evaluation, a predicted bounding box is considered a TP if its IoU with a ground truth box is greater than 0.5, following a common threshold in object detection for practical applications.
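Equations (2) to (4) and the 0.5 matching rule can be tied together in a short sketch. The greedy one-to-one matching used here (predictions taken in descending confidence order, each ground truth box matched at most once) is a common convention and an assumption; the text does not describe the matching procedure. The boxes are made-up values for illustration only.

```python
def iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def count_matches(preds, truths, thr=0.5):
    """Return (TP, FP, FN); preds are assumed sorted by descending confidence."""
    matched = set()
    tp = 0
    for p in preds:
        best_j, best_iou = -1, thr
        for j, t in enumerate(truths):
            if j in matched:
                continue
            v = iou(p, t)
            if v > best_iou:  # strictly greater than the threshold
                best_j, best_iou = j, v
        if best_j >= 0:
            matched.add(best_j)
            tp += 1
    return tp, len(preds) - tp, len(truths) - tp

# Hypothetical boxes: one good detection, one spurious detection, one missed plant.
truths = [(0, 0, 10, 10), (20, 20, 30, 30)]
preds = [(1, 1, 11, 11), (50, 50, 60, 60)]

tp, fp, fn = count_matches(preds, truths)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
print(tp, fp, fn, precision, recall)
```

The first prediction overlaps its ground truth box with IoU = 81/119, about 0.68, so it counts as a TP; the second overlaps nothing and is an FP, and the unmatched second plant is an FN, giving Precision and Recall of 0.5 each.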

Average Precision (AP) quantifies the model’s detection quality across all Recall levels by integrating the Precision–Recall curve. For object detection, it is computed by:

AP = \int_{0}^{1} p(r) \, dr

(5)

where p(r) is the Precision at Recall r, with the curve interpolated to ensure monotonicity.
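A discrete version of Equation (5) is sketched below using all-point interpolation, in which the Precision curve is first replaced by its monotone envelope and the area is then summed over Recall steps. The specific interpolation scheme is an assumption, since the text only states that the curve is interpolated to ensure monotonicity, and the Precision–Recall points are invented for illustration.

```python
def average_precision(recalls, precisions):
    """Area under the interpolated Precision-Recall curve.

    `recalls` must be in ascending order, with `precisions` aligned to it.
    """
    r = [0.0] + list(recalls) + [1.0]
    p = [0.0] + list(precisions) + [0.0]
    # Monotone envelope: each precision becomes the max of everything to its right.
    for i in range(len(p) - 2, -1, -1):
        p[i] = max(p[i], p[i + 1])
    # Sum rectangle areas over consecutive recall levels.
    return sum((r[i + 1] - r[i]) * p[i + 1] for i in range(len(r) - 1))

# Hypothetical curve: Precision 1.0 at Recall 0.5, Precision 0.5 at Recall 1.0.
ap = average_precision([0.5, 1.0], [1.0, 0.5])
print(ap)
```

For these two points the area is 0.5 × 1.0 + 0.5 × 0.5 = 0.75. With a single object class such as bok choy, mAP reduces to this AP value.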

Mean Average Precision (mAP) extends AP to multi-class scenarios:

mAP = \frac{\sum_{i = 1}^{N} AP_{i}}{N}

(6)

with N being the number of classes.

In object detection evaluation, mAP50 and mAP50-95 are standardized variants of mAP. mAP50 calculates the mean AP across all categories at a fixed IoU threshold of 0.5, considering predictions with IoU exceeding 0.5 as true positives, reflecting baseline performance under moderate localization requirements, while mAP50-95 rigorously evaluates both classification and localization by averaging mAP over 10 equidistant IoU thresholds from 0.5 to 0.95 in 0.05 increments, formulated as:

mAP_{50\text{-}95} = \frac{1}{10} \sum_{t = 1}^{10} mAP_{IoU = 0.5 + 0.05 (t - 1)}

(7)

The upper end of this range (IoU = 0.95) demands near-exact bounding box alignment for a prediction to count as positive.
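The averaging in Equation (7) amounts to evaluating mAP at ten IoU thresholds and taking the mean, as the sketch below shows. The callable `map_at` stands in for a full evaluation run at a given threshold, and the linear fall-off used in the example is a made-up placeholder, not a measured result.

```python
def map50_95(map_at):
    """Mean of mAP over IoU thresholds 0.50, 0.55, ..., 0.95."""
    thresholds = [round(0.5 + 0.05 * (t - 1), 2) for t in range(1, 11)]
    return sum(map_at(th) for th in thresholds) / len(thresholds)

def toy_map(th):
    """Hypothetical stand-in: mAP falls linearly from 1.0 at IoU 0.5 to 0.55 at IoU 0.95."""
    return 1.0 - (th - 0.5)

score = map50_95(toy_map)
print(score)
```

Because stricter thresholds contribute lower values to the mean, mAP50-95 is always at most mAP50 for the same model.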

For real-time weeding applications, model complexity directly impacts field deployment viability, which can be evaluated through two quantifiable metrics: (1) Parameters, the total number of trainable elements (weights and biases), with higher parameter counts increasing memory footprint, training time, and computational resource demands; and (2) GFLOPs, the number of floating-point operations (in billions) required for a single forward pass, with lower values indicating reduced computational load and the potential for faster inference.
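To make the two complexity metrics concrete, the sketch below counts parameters and floating-point operations for a single convolutional layer. The layer dimensions are arbitrary illustrative values, and counting two operations per multiply-accumulate is one common convention; profiling tools differ on this point, so reported GFLOPs are only comparable when the same tool is used.

```python
def conv_params(c_in, c_out, k, bias=True):
    """Trainable parameters of a k x k convolution: weights plus optional biases."""
    return k * k * c_in * c_out + (c_out if bias else 0)

def conv_flops(c_in, c_out, k, h_out, w_out):
    """Floating-point operations for one forward pass, at 2 FLOPs per multiply-accumulate."""
    macs = k * k * c_in * c_out * h_out * w_out
    return 2 * macs

# Hypothetical layer: 3 x 3 convolution, 16 -> 32 channels, 80 x 80 output map.
params = conv_params(16, 32, 3)
gflops = conv_flops(16, 32, 3, 80, 80) / 1e9
print(params, gflops)
```

The parameter count depends only on kernel size and channel widths, whereas the operation count also scales with the output resolution, which is why a module can change one metric without changing the other.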

2.8. Image Processing

Building upon the detection of bok choy, the crop regions were localized and masked in the original RGB image, and residual green vegetation in the background was classified as potential weeds.

For automated weed segmentation in bok choy fields, the color-based pipeline employed an optimized ExG index. The ExG index is predicated on the distinctive spectral reflectance of photosynthetically active vegetation, which exhibits high reflectance in the green channel. Applying ExG after crop masking reduces the scene primarily to soil and residual plants, which mitigates some misclassification risks compared with applying it to the full image. Leveraging insights from Morid et al.’s foundational work [35] and Jin et al.’s refinements [33], the ExG index was further enhanced by the following formula to improve segmentation accuracy:

ExG = \begin{cases} 0, & \text{if } g < r \text{ or } g < b \\ 2 \times g - r - b, & \text{otherwise} \end{cases}

(8)

To mitigate sensitivity to ambient lighting conditions, the r, g, and b values in the above equation were normalized R, G, and B channel values, and calculated by:

r = \frac{R}{R + G + B}, g = \frac{G}{R + G + B}, b = \frac{B}{R + G + B}

(9)

The ExG index was selected for its computational efficiency and proven effectiveness in segmenting green vegetation from soil backgrounds under normal field lighting. Its application after crop masking provides a cost-effective means to identify residual vegetation as operational weeds. However, the performance of this color index can be compromised under conditions of severe shadow, specular highlights, or when targeting senescent (non-green) vegetation.
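Equations (8) and (9) translate directly into a few lines of NumPy, sketched below. The function name and the handling of pure-black pixels (where R + G + B = 0) are assumptions added for illustration, and the pixel values are made up; the binarization threshold is not stated in the text, so no thresholding step is shown.

```python
import numpy as np

def excess_green(rgb):
    """Modified ExG per pixel for an (H, W, 3) RGB array."""
    rgb = rgb.astype(np.float64)
    total = rgb.sum(axis=2)
    safe_total = np.where(total > 0, total, 1.0)  # avoid division by zero on black pixels
    r = rgb[..., 0] / safe_total
    g = rgb[..., 1] / safe_total
    b = rgb[..., 2] / safe_total
    exg = 2.0 * g - r - b
    exg[(g < r) | (g < b)] = 0.0  # suppress pixels where green is not dominant
    return exg

# Hypothetical 1 x 3 image: a green leaf pixel, a brownish soil pixel, a black pixel.
image = np.array([[[40, 160, 40], [120, 90, 60], [0, 0, 0]]], dtype=np.uint8)
exg = excess_green(image)
print(exg)
```

The leaf pixel has normalized channels (1/6, 2/3, 1/6) and therefore ExG = 1.0, while the soil pixel is zeroed because g < r. Since r + g + b = 1 after normalization, the non-zero branch is equivalent to 3g − 1, which is why scaling the overall brightness of a pixel leaves its ExG value unchanged.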

3. Results

3.1. Ablation Experiment

Ablation experiments were conducted to quantitatively evaluate the contributions of the proposed RFD and C2f-WDBB modules. All variants were trained for 100 epochs under identical hyperparameters. The results are summarized in Table 2 and visualized in Figure 5.

Experimental results confirmed that both the RFD module and the C2f-WDBB module independently improve YOLOv10’s performance. The RFD module improved detection accuracy, increasing Precision, Recall, mAP50, and mAP50-95 by 1.3, 0.9, 0.5, and 2.2 percentage points, respectively, as shown in Table 2, while leaving the computational load unchanged (GFLOPs remained at 8.4 G), in exchange for a marginal 1.3% increase in parameters. In contrast, the C2f-WDBB module delivered the dual advantages of enhanced recognition accuracy and lower computational cost, achieving a 7.2% reduction in parameters and a reduction in GFLOPs (from 8.4 G to 6.5 G). Comparative analysis revealed that the structural complexity of the baseline C2f module, involving multiple convolutional layers and residual connections, resulted in a higher parameter count. In contrast, the proposed C2f-WDBB module employs architectural refinements such as parameter sharing and re-parameterization to achieve greater parameter efficiency. These results show how specialized modifications can simultaneously advance accuracy and efficiency in object detection systems, with the RFD module improving feature extraction quality and the C2f-WDBB module reducing computational cost.

Introducing the RFD module improves detection accuracy while maintaining identical computational complexity (GFLOPs) compared to the baseline, at the cost of a marginal parameter increase. This indicates that the RFD module successfully repurposes the existing computational budget for more effective feature extraction. Its value lies in converting a minimal parameter overhead into a measurable accuracy improvement, a favorable trade-off for applications where accuracy is prioritized and the marginal parameter increase is acceptable. The integration of both modules yielded the best overall balance, increasing mAP50 by 1.1 percentage points, reducing GFLOPs by 1.1 G, and decreasing the total number of parameters by 38.5% compared to the baseline YOLOv10.

Comparative analysis of Table 2 (rows 2–3) indicated distinct optimization characteristics: the RFD module exhibited marginally superior accuracy enhancement, while the C2f-WDBB module demonstrated more substantial gains in model compactness (parameter reduction). Their integration yielded balanced improvements, simultaneously increasing mAP50 by 1.1 percentage points and mAP50-95 by 0.4 percentage points while reducing Parameters and GFLOPs. The result suggests that the RFD and C2f-WDBB modules play complementary roles (enhancing feature representation and improving structural efficiency, respectively) and that their combined use can lead to a model that is both more accurate and more efficient than the baseline.

The achieved reduction in parameters and GFLOPs for the final RCW-YOLOv10 model represents a necessary condition and a strong theoretical indicator for improved operational efficiency, which is critical for real-time applications. However, definitive confirmation of real-time viability on specific hardware requires direct measurement of inference latency, which is beyond the scope of this architectural study but is a logical and necessary focus for subsequent deployment-oriented work.

Figure 5 illustrates the trend in mIoU of the predicted versus ground truth bounding boxes across the ablation variants. While mAP is the primary detection metric, mIoU provides a direct, threshold-agnostic measure of average localization quality, complementing the mAP analysis by showing the pure geometric overlap improvement.

Comparative evaluation in Table 3 delineates a clear efficiency–accuracy profile for RCW-YOLOv10. While its mAP50 of 98.0% is highly competitive, it does not lead all accuracy metrics, particularly the stringent mAP50-95, when compared to more complex detectors (DETR, Faster R-CNN). The primary advantage of RCW-YOLOv10 is its computational and parametric efficiency, requiring up to 84.7% fewer parameters and 76.5% fewer GFLOPs than these benchmarks. The substantial reduction in GFLOPs is a key prerequisite and a strong indicator for the potential of high-throughput, real-time processing, although definitive confirmation requires latency measurement on deployment hardware, a logical focus for future work.

3.2. Detection Results

Figure 6 visually illustrates the system’s performance under two typical field conditions. The left column shows a challenging, dense planting scenario in which bok choy and weeds are clustered together, and the right column shows a case in which bok choy is well separated from weeds. The RCW-YOLOv10 model achieved a consistently high Precision value of 95% across diverse field conditions, including variable lighting and occlusion scenarios. This reliable crop detection enables precise weed segmentation, with Precision improvements over conventional color-thresholding methods.

3.3. Weed Mapping

After bok choy detection, crop regions were masked from the original images to isolate remaining vegetation. These residual green areas were segmented as weeds using an optimized ExG index, followed by morphological filtering to reduce noise and refine the weed mask. Based on the segmentation results, a weed distribution map was generated, as shown in the second row of Figure 6, which can enable accurate localization of weed regions in image coordinates.

For operational planning, each image was divided into a uniform 6 × 8 grid, where the cell dimensions (224 × 224 pixels) were designed to correspond to a physical area of approximately 60 mm × 60 mm on the ground, following the spatial calibration. Grid cells containing weed centroids were designated as candidate treatment zones, as illustrated in the third row of Figure 6.
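The mapping from weed centroids to grid cells, and from grid cells to approximate ground positions, can be sketched as follows. The function names are hypothetical, the centroid values are invented, and the millimetre conversion simply applies the stated correspondence of 224 pixels to roughly 60 mm (about 0.27 mm per pixel); it assumes a flat, nadir view without lens distortion.

```python
def treatment_grid(centroids, image_size=(1792, 1344), cell_px=224):
    """Boolean grid (rows x cols) flagging cells that contain a weed centroid."""
    width, height = image_size
    n_cols, n_rows = width // cell_px, height // cell_px  # 8 columns, 6 rows
    grid = [[False] * n_cols for _ in range(n_rows)]
    for x, y in centroids:
        col = min(int(x // cell_px), n_cols - 1)  # clamp points on the far edge
        row = min(int(y // cell_px), n_rows - 1)
        grid[row][col] = True
    return grid

def cell_centre_mm(row, col, cell_mm=60.0):
    """Approximate ground position (x, y) of a cell centre, in mm from the image origin."""
    return ((col + 0.5) * cell_mm, (row + 0.5) * cell_mm)

# Hypothetical centroids in image coordinates (x, y); the last two share a cell.
centroids_px = [(100.0, 50.0), (1000.0, 700.0), (1010.0, 690.0)]
grid = treatment_grid(centroids_px)
flagged = [(r, c) for r, row in enumerate(grid) for c, hit in enumerate(row) if hit]
print(flagged, [cell_centre_mm(r, c) for r, c in flagged])
```

Three centroids produce two flagged cells, (0, 0) and (3, 4), because nearby weed patches collapse into a single candidate treatment zone. Under the stated scale, the full 8 × 6 grid would correspond to a ground footprint of roughly 480 mm × 360 mm.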

4. Discussion

While deep convolutional neural networks (DCNNs) have achieved reliable accuracy, their conventional implementation requires exhaustively annotated datasets, which constitutes a significant data annotation bottleneck. This study addresses this by proposing an indirect strategy that circumvents the need for weed-species labels entirely. Instead of recognizing weeds, the framework first localizes bok choy and then segments residual green vegetation as operational weeds. The primary contribution is thus this data-efficient paradigm shift. The method’s effectiveness is contingent on two factors: the Recall of the crop detector and the performance of the ExG-based segmentation under field conditions. While the ExG index is efficient for green vegetation, its limitation in detecting senescent (non-green) weeds explicitly defines the operational boundary of the current framework. Therefore, the method’s utility lies in its simplified data requirement and functional output for targeted intervention, as validated in this study, with performance inherently tied to the reliability of its sequential steps.

The primary contribution of this approach is its ability to generate useful weed segmentation maps without weed-labeled data, circumventing the data bottleneck rather than demonstrating invariant performance across all unseen weed species. The quality and robustness of the final weed map are inherently dependent on two factors: the Precision and Recall of the initial bok choy detector and the effectiveness of the ExG-based segmentation and subsequent morphological filtering on the specific field conditions encountered. While the ExG index is generally effective for segmenting green vegetation, its performance can vary with plant pigmentation, lighting, and the presence of senescent (non-green) weeds. Therefore, the method’s practical utility lies in its simplified data requirement and functional output for targeted intervention, with the understanding that its performance is contingent on the reliability of its sequential steps as validated in this study on bok choy fields.

The YOLO series performs object detection through its single-stage regression architecture and multi-scale feature fusion, yet faces inherent deficiencies where grid-based detection restricts small-target sensitivity to below 30% AP for objects smaller than 32 pixels, while computational complexity challenges edge deployment [36]. Although YOLOv10 improves efficiency, its standard down-sampling may still contribute to information loss for small objects. The RCW-YOLOv10 model was designed to address these potential shortcomings through two backbone innovations: (1) RFD modules enhance multi-scale feature extraction by replacing standard down-sampling layers, a design intended to better preserve features of small, sparse targets; and (2) C2f-WDBBs optimize cross-level feature fusion efficiency. Given that the primary detection targets in this study (seedling-stage bok choy) are characteristically small objects, the increase in overall mAP50 provides evidence supporting the effectiveness of these architectural improvements for small-target recognition. The model achieves a favorable efficiency–accuracy trade-off, as evidenced by the higher mAP50 together with a 38.5% reduction in parameters and a 1.1 G reduction in GFLOPs, making it suitable for bok choy detection in agricultural settings.

Most existing weed detection studies have concentrated on evaluating model performance with accuracy metrics such as Precision, Recall, or mAP, while the challenges of real-world field-to-action mapping have often received less emphasis. This study advances beyond pure detection metrics by translating results into an operationally structured weed distribution map. The fixed grid system provides a spatial framework that can serve as the foundational data layer for potential downstream functions, such as guiding targeted intervention or informing path planning. It is crucial to distinguish this computational proof-of-concept from a validated field system. The conversion of grid coordinates to real-world action, real-time latency validation, and agronomic testing for variable-rate application are critical next steps. This work establishes the essential perceptual and mapping foundation upon which such future integrated systems can be built.

The resulting maps demonstrate the potential to support several operational functions in a future integrated system: (1) localization of weed clusters with a grid-based coordinate system, which could, with accurate calibration, support targeted intervention; (2) a modular spatial representation that could inform robotic path-planning algorithms; and (3) spatial zoning of weed presence at the image level, which forms a foundational data layer that could, in principle, support variable-rate decisions based on weed density. Collectively, these components establish a scalable computational foundation for translating perception into potential action within automated weeding systems. By generating spatially explicit weed maps, the framework holds promise for enabling more precise, crop-specific interventions in future precision agriculture applications that integrate reliable actuation and control systems.
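As an illustration of the grid-based localization described above, the following sketch bins weed centroids into a fixed grid and lists the cells that would be flagged as candidate treatment zones. The grid dimensions, the image size, the `min_count` threshold, and the function names are hypothetical choices for the example and are not taken from the study.

```python
from collections import Counter

def centroids_to_grid(centroids, image_w, image_h, n_cols, n_rows):
    """Count weed centroids per grid cell; returns {(row, col): count}."""
    cell_w = image_w / n_cols
    cell_h = image_h / n_rows
    counts = Counter()
    for x, y in centroids:
        # Clamp so that points on the right/bottom border fall in the last cell.
        col = min(int(x // cell_w), n_cols - 1)
        row = min(int(y // cell_h), n_rows - 1)
        counts[(row, col)] += 1
    return dict(counts)

def treatment_zones(counts, min_count=1):
    """Cells whose weed count reaches min_count, in row-major order."""
    return sorted(cell for cell, n in counts.items() if n >= min_count)

# Three weed centroids (pixel coordinates) in a 640 x 480 image, 4 x 3 grid.
weeds = [(12.0, 30.5), (15.0, 28.0), (610.0, 470.0)]
grid = centroids_to_grid(weeds, image_w=640, image_h=480, n_cols=4, n_rows=3)
zones = treatment_zones(grid)
```

The per-cell counts are the kind of quantity a variable-rate decision could use, but mapping a cell index to a physical nozzle or tool position would still require the camera calibration noted above.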

It is important to note that while the reported reduction in parameters and the slightly lower GFLOPs (8.3 vs. 8.4) indicate improved theoretical efficiency, the actual inference latency on specific hardware for real-time field deployment would require further profiling and optimization, which is a key direction for future applied work.
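A minimal sketch of what such profiling could look like is given below. It times an arbitrary inference callable with warm-up runs and reports median and 95th-percentile latency. The helper `profile_latency` and the stand-in `dummy_infer` are hypothetical, and no measurement of RCW-YOLOv10 is implied.

```python
import statistics
import time

def profile_latency(infer, sample, warmup=5, runs=30):
    """Time infer(sample) and return median and 95th-percentile latency in ms."""
    for _ in range(warmup):  # warm-up runs are discarded (caches, lazy init)
        infer(sample)
    timings_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        infer(sample)
        timings_ms.append((time.perf_counter() - start) * 1000.0)
    timings_ms.sort()
    p95_index = min(len(timings_ms) - 1, int(0.95 * len(timings_ms)))
    return {
        "median_ms": statistics.median(timings_ms),
        "p95_ms": timings_ms[p95_index],
    }

def dummy_infer(image):
    """Stand-in for a real model call; replace with the deployed detector."""
    return sum(image)

stats = profile_latency(dummy_infer, list(range(10_000)))
```

On GPU or other asynchronous accelerators the device would also need to be synchronized before each timestamp is read, which this CPU-only sketch does not cover.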

5. Conclusions

This study proposed RCW-YOLOv10, an enhanced YOLOv10 variant optimized for seedling-stage bok choy detection through two architectural innovations: (1) an RFD module designed to improve multi-scale feature extraction, and (2) a C2f-WDBB module to enhance feature fusion efficiency. Compared with the baseline YOLOv10, the improved model achieved a 1.1 percentage point increase in mAP50 and a 38.5% reduction in parameters while maintaining GFLOPs. Furthermore, the indirect detection strategy, which first localizes bok choy and then segments residual green vegetation via an optimized ExG index, eliminated the need for species-specific weed annotations, addressing a key data bottleneck. The framework successfully generated operationally structured weed distribution maps, which provide a spatially discrete representation of weeds. Collectively, this work delivers a practical and extensible computational framework, comprising an efficient detector, an annotation-light strategy, and a structured weed map, that establishes a critical data foundation for integrating visual perception into future intelligent robotic weeding systems.

Author Contributions

Conceptualization, W.L. and X.J.; methodology, W.L. and F.G.; software, W.L., W.Z., Q.W. and K.H.; validation, W.L. and Q.W.; writing—original draft, W.L.; writing—review and editing: W.L. and X.J.; funding acquisition, X.J.; supervision, X.J. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Weifang Science and Technology Development Plan Project (Grant No. 2024ZJ1097) and the Foundation of Jincheng College (Grant No. XJ2024002).

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the authors on request.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Shohag, M.; Tian, S.; Sriti, N.; Liu, G. Enhancing Bok Choy growth through synergistic effects of hydrogel and different nitrogen fertilizer forms. Sci. Hortic. 2024, 336, 113400.
2. Renzi, J.P.; Coyne, C.J.; Berger, J.; von Wettberg, E.; Nelson, M.; Ureta, S.; Hernández, F.; Smýkal, P.; Brus, J. How could the use of crop wild relatives in breeding increase the adaptation of crops to marginal environments? Front. Plant Sci. 2022, 13, 886162.
3. Choudhary, M.; Kumar, S.; Onte, S.; Meena, V.K.; Malakar, D.; Garg, K.; Kumar, S.; Rajawat, M.V.S.; Awasthi, M.K.; Giri, B.S. Optimizing crop quality and yield: Assessing the impact of integrated potassium management on Chinese cabbage (Brassica rapa L. subsp. chinensis). Heliyon 2024, 10, e36208.
4. Vijayakumar, S.; Shanmugapriya, P.; Saravanane, P.; Ramesh, T.; Murugaiyan, V.; Ilakkiya, S. Precision Weed Control Using Unmanned Aerial Vehicles and Robots: Assessing Feasibility, Bottlenecks, and Recommendations for Scaling. NDT 2025, 3, 10.
5. Ofosu, R.; Agyemang, E.D.; Márton, A.; Pásztor, G.; Taller, J.; Kazinczi, G. Herbicide resistance: Managing weeds in a changing world. Agronomy 2023, 13, 1595.
6. Clapp, J. Explaining growing glyphosate use: The political economy of herbicide-dependent agriculture. Glob. Environ. Change 2021, 67, 102239.
7. Bond, W.; Grundy, A.C. Non-chemical weed management in organic farming systems. Weed Res. 2001, 41, 383–405.
8. Murad, N.Y.; Mahmood, T.; Forkan, A.R.M.; Morshed, A.; Jayaraman, P.P.; Siddiqui, M.S. Weed detection using deep learning: A systematic literature review. Sensors 2023, 23, 3670.
9. Qu, H.-R.; Su, W.-H. Deep learning-based weed–crop recognition for smart agricultural equipment: A review. Agronomy 2024, 14, 363.
10. Gerhards, R.; Andújar Sanchez, D.; Hamouz, P.; Peteinatos, G.G.; Christensen, S.; Fernandez-Quintanilla, C. Advances in site-specific weed management in agriculture—A review. Weed Res. 2022, 62, 123–133.
11. Zhang, Z.; Geiger, J.; Pohjalainen, J.; Mousa, A.E.-D.; Jin, W.; Schuller, B. Deep learning for environmentally robust speech recognition: An overview of recent developments. ACM Trans. Intell. Syst. Technol. 2018, 9, 1–28.
12. Otter, D.W.; Medina, J.R.; Kalita, J.K.J. A survey of the usages of deep learning for natural language processing. IEEE Trans. Neural Netw. Learn. Syst. 2020, 32, 604–624.
13. Juwono, F.H.; Wong, W.; Verma, S.; Shekhawat, N.; Lease, B.A.; Apriono, C. Machine learning for weed–plant discrimination in agriculture 5.0: An in-depth review. Artif. Intell. Agric. 2023, 10, 13–25.
14. Wang, G.; Li, Z.; Weng, G.; Chen, Y. An overview of industrial image segmentation using deep learning models. Intell. Robot. 2025, 5, 143–180.
15. Silva, J.; de Siqueira, V.S.; Mesquita, M.; Vale, L.S.R.; Marques, T.D.B.; da Silva, J.L.B.; da Silva, M.V.; Lacerda, L.N.; de Oliveira, J.F., Jr.; de Lima, J.; et al. Deep Learning for Weed Detection and Segmentation in Agricultural Crops Using Images Captured by an Unmanned Aerial Vehicle. Remote Sens. 2024, 16, 4394.
16. Dang, F.; Chen, D.; Lu, Y.; Li, Z.J.C. YOLOWeeds: A novel benchmark of YOLO object detectors for multi-class weed detection in cotton production systems. Comput. Electron. Agric. 2023, 205, 107655.
17. Subeesh, A.; Bhole, S.; Singh, K.; Chandel, N.S.; Rajwade, Y.A.; Rao, K.; Kumar, S.; Jat, D.J. Deep convolutional neural network models for weed detection in polyhouse grown bell peppers. Artif. Intell. Agric. 2022, 6, 47–54.
18. Yu, J.; Sharpe, S.M.; Schumann, A.W.; Boyd, N.S. Detection of broadleaf weeds growing in turfgrass with convolutional neural networks. Pest Manag. Sci. 2019, 75, 2211–2218.
19. Sun, H.; Liu, T.; Wang, J.; Zhai, D.; Yu, J.J. Evaluation of two deep learning-based approaches for detecting weeds growing in cabbage. Pest Manag. Sci. 2024, 80, 2817–2826.
20. Jin, X.; Bagavathiannan, M.; McCullough, P.E.; Chen, Y.; Yu, J.J. A deep learning-based method for classification, detection, and localization of weeds in turfgrass. Pest Manag. Sci. 2022, 78, 4809–4821.
21. Jin, X.; McCullough, P.E.; Liu, T.; Yang, D.; Zhu, W.; Chen, Y.; Yu, J. A smart sprayer for weed control in bermudagrass turf based on the herbicide weed control spectrum. Crop Prot. 2023, 170, 106270.
22. He, C.; Wan, F.; Ma, G.; Mou, X.; Zhang, K.; Wu, X.; Huang, X. Analysis of the impact of different improvement methods based on YOLOV8 for weed detection. Agriculture 2024, 14, 674.
23. Chen, C.W.; Zang, Y.; Jiao, J.K.; Yan, D.Q.; Fan, Z.R.; Cui, Z.J.; Zhang, M.H. An Efficient Group Convolution and Feature Fusion Method for Weed Detection. Agriculture 2025, 15, 37.
24. Kong, X.; Liu, T.; Chen, X.; Jin, X.; Li, A.; Yu, J.J. Efficient crop segmentation net and novel weed detection method. Eur. J. Agron. 2024, 161, 127367.
25. Peng, H.; Li, Z.; Zhou, Z.; Shao, Y.Y. Weed detection in paddy field using an improved RetinaNet network. Comput. Electron. Agric. 2022, 199, 107179.
26. LabelImg, T.J.U. Git Code (2015). Available online: https://github.com/tzutalin/labelImg (accessed on 10 December 2024).
27. Terven, J.; Córdova-Esparza, D.-M.; Romero-González, J.-A. A comprehensive review of yolo architectures in computer vision: From yolov1 to yolov8 and yolo-nas. Mach. Learn. Knowl. Extr. 2023, 5, 1680–1716.
28. Bai, Q.; Gao, R.; Li, Q.; Wang, R.; Zhang, H. Recognition of the behaviors of dairy cows by an improved YOLO. Intell. Robot. 2024, 4, 1–19.
29. Han, H.; Xue, X.; Li, Q.; Gao, H.; Wang, R.; Jiang, R.; Ren, Z.; Meng, R.; Li, M.; Guo, Y. Pig-ear detection from the thermal infrared image based on improved YOLOv8n. Intell. Robot. 2024, 4, 20–38.
30. Hussain, M.; Khanam, R. In-depth review of yolov1 to yolov10 variants for enhanced photovoltaic defect detection. Solar 2024, 4, 351–386.
31. Lu, W.; Chen, S.-B.; Tang, J.; Ding, C.H.; Luo, B. A robust feature downsampling module for remote-sensing visual tasks. IEEE Trans. Geosci. Remote Sens. 2023, 61, 1–12.
32. Feng, J.; Shi, H.; Qiu, J.; Yu, Z.; He, C. EF-Yolo: An Efficient and Lightweight Network for Real-Time Components Detection of Freight Trains. IEEE Sens. J. 2024, 24, 35872–35888.
33. Jin, X.; Che, J.; Chen, Y. Weed identification using deep learning and image processing in vegetable plantation. IEEE Access 2021, 9, 10940–10950.
34. Deng, J.; Dong, W.; Socher, R.; Li, L.J.; Li, K.; Fei-Fei, L. Imagenet: A large-scale hierarchical image database. In Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition; IEEE: Miami, FL, USA; pp. 248–255.
35. Morid, M.A.; Borjali, A.; Del Fiol, G.J. A scoping review of transfer learning research on medical image analysis using ImageNet. Comput. Biol. Med. 2021, 128, 104115.
36. Diwan, T.; Anirudh, G.; Tembhurne, J.V. Object detection using YOLO: Challenges, architectural successors, datasets and applications. Multimed. Tools Appl. 2023, 82, 9243–9275.

Figure 1. The architecture of the RCW-YOLOv10 model. The down-sampling modules in the original network’s backbone were replaced with RFD modules, while the C2f modules in the backbone were replaced with C2f-WDBB modules.

Figure 2. The RFD architecture. Panels (a–c) show the RFD, SRFD, and DRFD modules, respectively.

Figure 3. The architecture of C2f-WDBB. Convolutional sequences, multi-scale convolution, and average pooling were introduced.

Figure 4. The overall flowchart for indirect weed detection. The pipeline begins with input field images. Step 1: The RCW-YOLOv10 model detects bok choy and outputs bounding boxes. Step 2: Image processing is applied to segment residual weeds: (a) a mask of the detected crops is created and morphologically dilated (5-pixel kernel) to define an exclusion zone; (b) the ExG index is calculated on the image area outside the exclusion zone to generate an initial vegetation mask; (c) the mask is refined by morphological opening followed by closing (both with a 3 × 3 kernel) to remove noise; and (d) connected components with an area <150 pixels are filtered out. The final output is a weed distribution map, generated by localizing the centroids of the remaining components on the original image.

Figure 5. The relationship between GFLOPs and mIoU for each module, with GFLOPs on the horizontal axis and mIoU on the vertical axis.

Figure 6. The detection results for dense and sparse bok choy. The first row presents the detection results with bounding boxes and Precision values. Masking, the ExG index, and noise removal were then applied sequentially to segment the weeds, as shown in the middle row. Once segmented, the weeds can be localized in the original grid images, and the zones containing weeds can be classified as candidate treatment zones. The third row shows the candidate treatment zones, which are highlighted in red.

Table 1. Abbreviations and descriptions for the variables in the re-parameterized formula.

Variable	Name	Description
$W_{main}$	Main branch weights	The weight matrix of the standard 3 × 3 convolution, serving as the baseline feature extractor
$W_{1 \times 1}$	1 × 1 branch weights	The weight matrix of the 1 × 1 convolutional branch, enhancing local feature interactions
$W_{avg}$	Average pooling branch weights	The equivalent convolutional weights derived from the average pooling operation, improving translation invariance
$W_{hor}$, $W_{ver}$	Horizontal/vertical branch weights	The weight matrices of the directional convolutional branches, capturing orientation-sensitive features through (1 × k) and (k × 1) kernels, respectively
$W_{merged}$	Merged weights	The re-parameterized single convolution kernel, formed by linearly combining all branch weights during inference
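
The linear merging summarized in Table 1 can be checked numerically on a toy case. The sketch below is an illustration under simplifying assumptions rather than the C2f-WDBB implementation: it uses a single channel, k = 3, zero padding, stride 1, and no batch normalization, and the helper names and kernel values are hypothetical. Each branch kernel is zero-padded to 3 × 3 around its center, and the kernels are summed into one merged kernel.

```python
def conv2d_same(x, k):
    """Single-channel 2D cross-correlation, zero padding, odd kernel sizes."""
    h, w = len(x), len(x[0])
    kh, kw = len(k), len(k[0])
    ph, pw = kh // 2, kw // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for a in range(kh):
                for b in range(kw):
                    ii, jj = i + a - ph, j + b - pw
                    if 0 <= ii < h and 0 <= jj < w:
                        acc += x[ii][jj] * k[a][b]
            out[i][j] = acc
    return out

def pad_to_3x3(k):
    """Embed a 1x1, 1x3, 3x1, or 3x3 kernel in the center of a 3x3 kernel."""
    kh, kw = len(k), len(k[0])
    top, left = (3 - kh) // 2, (3 - kw) // 2
    out = [[0.0] * 3 for _ in range(3)]
    for a in range(kh):
        for b in range(kw):
            out[top + a][left + b] = k[a][b]
    return out

def merge_branches(kernels):
    """W_merged: element-wise sum of all branch kernels after padding."""
    merged = [[0.0] * 3 for _ in range(3)]
    for k in kernels:
        padded = pad_to_3x3(k)
        for a in range(3):
            for b in range(3):
                merged[a][b] += padded[a][b]
    return merged

# Illustrative branch kernels (values are arbitrary).
w_main = [[0.2, -0.1, 0.0], [0.5, 1.0, -0.5], [0.0, 0.3, 0.1]]
w_1x1 = [[0.5]]
w_avg = [[1.0 / 9.0] * 3 for _ in range(3)]  # 3x3 average pooling as a conv
w_hor = [[0.1, 0.2, 0.3]]                    # 1 x k
w_ver = [[0.3], [0.2], [0.1]]                # k x 1
branches = [w_main, w_1x1, w_avg, w_hor, w_ver]

x = [[float((3 * i + 5 * j) % 7) for j in range(5)] for i in range(5)]

# Training-time view: run every branch and add the outputs.
branch_sum = [[0.0] * 5 for _ in range(5)]
for k in branches:
    y = conv2d_same(x, k)
    for i in range(5):
        for j in range(5):
            branch_sum[i][j] += y[i][j]

# Inference-time view: one convolution with the merged kernel.
merged_out = conv2d_same(x, merge_branches(branches))
max_diff = max(abs(branch_sum[i][j] - merged_out[i][j])
               for i in range(5) for j in range(5))
```

Because convolution is linear in its kernel, the two outputs agree up to floating-point rounding, which is why the extra branches add no inference cost once merged.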

Table 2. Ablation study results of YOLOv10 variants: Baseline (original YOLOv10), RFD-enhanced, C2f-WDBB-enhanced, and combined (RFD + C2f-WDBB) modules.

Model	Precision (%)	Recall (%)	mAP50 (%)	mAP50-95 (%)	Parameters	GFLOPs
YOLOv10	91.5	91.8	96.9	73.1	2,707,430	8.4
+RFD	92.8	92.7	97.4	75.3	2,741,875	8.4
+C2f-WDBB	92.3	92.4	97.1	74.4	2,265,363	6.5
+RFD + C2f-WDBB	95.0	93.3	98.0	74.8	1,666,208	8.3
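
The headline figures quoted in the text follow directly from the first and last rows of Table 2. The short calculation below reproduces them; the variable names are illustrative, and the numbers are copied from the table.

```python
# Baseline YOLOv10 and combined (+RFD + C2f-WDBB) rows of Table 2.
baseline = {"precision": 91.5, "recall": 91.8, "map50": 96.9, "params": 2_707_430}
combined = {"precision": 95.0, "recall": 93.3, "map50": 98.0, "params": 1_666_208}

# Accuracy gains are differences of percentages, i.e., percentage points.
gains_pp = {
    key: round(combined[key] - baseline[key], 1)
    for key in ("precision", "recall", "map50")
}

# Parameter reduction in absolute and relative terms.
param_drop = baseline["params"] - combined["params"]
param_drop_pct = round(100 * param_drop / baseline["params"], 1)
```

The gains are 3.5, 1.5, and 1.1 percentage points for Precision, Recall, and mAP50, and the parameter count falls by 1,041,222 (about 1.04 million, or 38.5%).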

Table 3. Comparison of detection performance between the proposed RCW-YOLOv10 and three state-of-the-art models: RetinaNet, DETR, and Faster R-CNN. The best results for each metric are highlighted in bold. Note the distinct efficiency–accuracy profiles: the transformer-based DETR and the two-stage Faster R-CNN achieve higher Precision and mAP50-95, while RCW-YOLOv10 offers a highly efficient single-stage alternative with competitive mAP50 (98.0%).

Model	Precision (%)	Recall (%)	mAP50 (%)	mAP50-95 (%)	Parameters	GFLOPs
RCW-YOLOv10	95.0	93.3	98.0	74.8	1,666,208	8.3
RetinaNet	95.3	94.2	98.5	76.5	8,556,383	18.6
DETR	96.1	94.1	98.7	78.1	10,930,123	20.5
Faster R-CNN	95.4	95.0	98.1	77.2	15,290,465	35.3

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
