YOLO-CornSeg: A Lightweight Segmentation Model for Corn Seedlings with an Indirect Weed Detection Strategy

Lei, Jinglin; Yu, Jialin; Han, Kang; Li, Mian; Jin, Xiaojun; Yin, Honglian

doi:10.3390/agronomy16111091


by Jinglin Lei ¹,², Jialin Yu ³, Kang Han ¹, Mian Li ¹, Xiaojun Jin ¹ and Honglian Yin ⁴,*

¹ National Engineering Research Center of Biomaterials, Nanjing Forestry University, Nanjing 210037, China

² School of Computing and Information Systems, The University of Melbourne, Melbourne, VIC 3010, Australia

³ Shandong Laboratory of Advanced Agricultural Sciences, Peking University Institute of Advanced Agricultural Sciences, Weifang 261000, China

⁴ College of Mechanical and Electronic Engineering, Nanjing Forestry University, Nanjing 210037, China

* Author to whom correspondence should be addressed.

Agronomy 2026, 16(11), 1091; https://doi.org/10.3390/agronomy16111091

Submission received: 14 March 2026 / Revised: 30 April 2026 / Accepted: 27 May 2026 / Published: 31 May 2026

(This article belongs to the Special Issue Integrated Weed Management for Field Crops: Innovations, Integration, and Impact)


Abstract

Weed control is crucial for optimizing corn yield. In recent years, advances in computer vision and deep learning have created new opportunities for precision agriculture. However, annotating weed datasets is typically time-consuming, labor-intensive, and costly. To address this challenge, this study proposes an indirect weed detection strategy that reduces reliance on explicit weed annotations by focusing on accurate crop segmentation. Specifically, we develop YOLO-CornSeg, a lightweight segmentation model based on an improved YOLOv8n architecture, designed for precise corn seedling segmentation. The model incorporates a C2f_DWR module to enhance multi-scale feature extraction and a Segment_Efficient head to improve segmentation performance while maintaining computational efficiency. Based on the resulting segmentation masks, an indirect weed detection strategy is applied, in which non-crop green regions are identified as weeds using HSV-based image processing. Experimental results show that YOLO-CornSeg achieves a mean Intersection over Union (mIoU) of 91.1% with a model size of 8.3 MB, outperforming several state-of-the-art two-stage semantic segmentation models while maintaining low computational complexity and a compact model size. The improved segmentation accuracy further enhances the reliability of downstream weed inference. Overall, this study highlights the potential of combining lightweight crop segmentation with indirect weed detection strategies to support precision herbicide application.

Keywords: computer vision; deep learning; corn seedling segmentation; weed detection; integrated weed control

1. Introduction

Corn (Zea mays L.) is one of the most important crops worldwide. It serves not only as a staple food in many regions but also as a major feed grain for livestock [1]. Globally, corn ranked as the second most produced commodity from 1994 to 2022 based on average production, surpassing rice (Oryza sativa L.) in terms of production volume [2]. Beyond its role in food and feed, corn is increasingly utilized as an industrial feedstock, supporting the production of biofuels, chemical compounds, and various materials, demonstrating its expanding application potential [3].

Weeds are a major constraint on corn yield. By competing for essential resources such as sunlight, water, and nutrients, weeds can cause substantial yield losses [4,5]. Global data indicate that weeds cause an average yield loss of 12.8% in corn even when control measures are implemented [5], and losses may reach 35–40% without proper management [6]. Therefore, effective weed control during the corn seedling stage is critical for maintaining crop productivity [7].

The importance of early weed control for maintaining corn yield has been documented. For instance, Page et al. [8] investigated the effect of weed control timing on corn biomass accumulation, allocation, and yield components. Their findings confirmed that delayed weed control can lead to adverse effects, such as an increase in the area of bare soil and a decrease in corn yield. They concluded that the critical period for weed control in corn typically begins at the third to fifth leaf tip stage. Therefore, effective weed control during the corn seedling stage is essential to minimize yield losses and ensure optimal crop performance.

Weed control can be achieved through the application of herbicides or by employing mechanical, thermal, and electrical methods [9,10,11]. Among these methods, synthetic herbicides are currently the most widely adopted due to their cost-effectiveness and adaptability to diverse terrains [12]. However, conventional herbicide application methods, which typically involve broadcast application across entire fields regardless of actual weed distribution, present several challenges, including increased weed management costs, environmental contamination, and potential risks to public health [13,14]. These issues highlight the need for precision herbicide application technologies that can reduce herbicide use while maintaining effective weed control [15,16].

Advanced computer vision technologies, particularly deep learning methods, are increasingly applied in agricultural fields [17,18,19]. Among these approaches, convolutional neural networks (CNNs) have attracted significant attention due to their ability to extract hierarchical features and adapt to complex agricultural environments [20]. Currently, CNNs have become a fundamental component of state-of-the-art systems in computer vision tasks such as image classification, object detection, semantic segmentation, and visual tracking [21,22], and have shown great potential in enabling precise herbicide application through accurate weed detection [23]. In addition, CNNs can be deployed on real-time sensing platforms such as smart sprayers, unmanned aerial vehicles (UAVs), and ground robots, offering favorable cost-effectiveness and reliability [24,25].

Among these platforms, UAV-based imaging technologies have become increasingly important due to their ability to rapidly acquire large-scale farmland data and provide high-resolution spatial information [26,27]. UAV-based remote sensing has been widely used for monitoring crop growth at early stages. For example, Ahmadi et al. [28] combined UAV imagery with machine learning models to detect early-stage diseases in oil palm plantations, demonstrating its effectiveness for large-scale and cost-efficient monitoring. Kumar et al. [29] proposed a pixel-based segmentation approach for maize tassel detection and growth stage estimation, showing that UAV imagery can effectively support large-scale crop monitoring while reducing the need for extensive labeled data and computational resources. Tirado et al. [30] used UAV-based RGB imagery to estimate plant height at different time points in corn fields, further demonstrating its potential for high-throughput and low-cost monitoring during early crop development.

Existing CNN-based weed detection methods can generally be categorized into direct weed identification and indirect detection based on crop segmentation. Direct methods typically rely on classification or object detection models to identify multiple weed species. For example, Dyrmann et al. [31] developed a CNN-based model for weed species classification, while Yu et al. [32] applied deep learning models such as VGGNet and DetectNet for bermudagrass weed detection. Jin et al. [33] further demonstrated the effectiveness of hybrid deep learning approaches in vegetable fields. However, these methods usually depend on large-scale multi-class annotated datasets. Due to the diversity of weed species, complex morphological variations, and high visual similarity between crops and weeds, models trained on region-specific datasets often struggle to generalize to other agricultural environments [34,35].

To reduce the reliance on weed annotations, indirect weed detection methods have gained increasing attention. Jin et al. [36] used object detection models to locate crop regions in UAV images and classified green areas outside the bounding boxes as weeds. While this approach simplifies the detection problem, it tends to miss weeds located within crop regions due to the limitations of bounding box representations. To further improve detection accuracy, Kong et al. [37] introduced semantic segmentation in subsequent work, enabling more precise extraction of crop boundaries at the pixel level. Compared with object detection methods, semantic segmentation provides finer-grained pixel-level information, allowing more accurate separation of crops and background, and demonstrating stronger robustness in complex scenarios such as occlusion and dense vegetation [38,39,40,41]. Liu et al. [35] adopted a similar approach by generating crop masks through semantic segmentation and identifying non-crop green pixels as weeds. Their evaluation on a corn dataset showed that multiple models, including CCNet, GCNet, ISA-Net, DeepLabV3, and DeepLabV3+, achieved accuracy (aAcc) and mean Intersection over Union (mIoU) values exceeding 99% and 93%, respectively. In addition, Cui et al. [39] proposed an improved U-Net model that achieved excellent performance on a self-constructed seedling-stage dataset, further demonstrating the effectiveness of semantic segmentation methods for this task.

Based on the above studies, indirect weed detection methods based on crop segmentation have shown promising potential for reducing the reliance on large-scale weed annotations. In such approaches, weeds are inferred as non-crop regions, shifting the focus from explicit weed classification to accurate crop segmentation. In the corn seedling stage, the field structure is relatively simple, and corn plants typically exhibit regular spatial distribution. Under these conditions, green vegetation outside crop regions can be reasonably approximated as weeds after appropriate vegetation extraction (e.g., HSV-based processing) [34]. This approximation, while not intended for strict botanical classification, is effective for supporting practical decision-making in precision agriculture, such as reducing unnecessary herbicide application and minimizing environmental impact. However, the performance of this indirect detection strategy is highly dependent on the quality of crop segmentation. Inaccurate segmentation may directly lead to false weed inference, especially under complex conditions such as occlusion, illumination variation, and soil interference [37].

Despite the advantages of semantic segmentation methods in indirect weed detection, a trade-off between segmentation accuracy and computational efficiency still exists, particularly when real-time deployment is required in resource-constrained agricultural environments. For example, DeepLab series models, as representative two-stage semantic segmentation approaches, often require more than 100 GFLOPs in corn segmentation tasks [42,43,44]. In contrast, the YOLO series adopts a single-stage prediction mechanism, significantly improving inference speed while maintaining high accuracy [45,46,47]. For instance, Kong et al. reported that an improved YOLOv5 model required as few as 21.7 GFLOPs in corn segmentation tasks. In recent years, YOLOv8 has achieved notable improvements in detection accuracy, computational efficiency, and model flexibility [48,49], and has been shown to outperform traditional two-stage methods in multiple studies [50]. These characteristics make it particularly suitable for precision agriculture applications requiring real-time response under limited computational resources [51,52,53]. Among the YOLOv8 variants, YOLOv8n has the smallest number of parameters and lowest computational requirements while maintaining competitive accuracy, demonstrating strong potential for agricultural applications [54,55]. Nevertheless, existing segmentation-based methods struggle to achieve high accuracy and low computational cost simultaneously for real-time deployment, particularly within indirect weed detection frameworks that rely heavily on segmentation quality. To address this limitation, this study optimizes the corn segmentation component by proposing an efficient and accurate corn seedling segmentation model based on YOLOv8n.

The main contributions of this study are as follows:

(1) An architectural contribution: we propose YOLO-CornSeg, a lightweight semantic segmentation model based on an improved YOLOv8n architecture, optimized for precise segmentation of corn seedlings while maintaining computational efficiency.

(2) A methodological contribution: based on the crop segmentation results, an indirect weed detection strategy is adopted, in which non-crop green regions are identified as weeds using image processing methods, thereby enabling weed distribution inference without requiring explicit weed annotations.

(3) An experimental validation: through comparisons with several state-of-the-art semantic segmentation models, the proposed model is shown to achieve a favorable balance between segmentation accuracy and computational efficiency.

Therefore, this study further investigates whether improving corn segmentation accuracy through a modified YOLOv8n architecture can enhance the overall reliability of an indirect weed detection pipeline.

2. Materials and Methods

2.1. Dataset

Images for model training, validation, and testing were collected over several days between 8 and 12 June 2020 in three corn fields near Shenyang City, Liaoning Province, China (41.51° N, 123.36° E). A DJI Mavic Air UAV (Shenzhen Dajiang Innovation Technology Co., Ltd., Shenzhen, China) equipped with a 20-megapixel camera was used for image acquisition. The corn seedlings in the images were at the three- to four-leaf stage. The images were captured between 10:00 and 16:00 under varying lighting conditions, including sunny, cloudy, and overcast weather, to ensure diversity and robustness in corn seedling recognition. During acquisition, the UAV maintained a flight height of approximately 3 m above the ground surface. The images were in 4:3 format with a resolution of 8000 × 6000 pixels (Figure 1a). Each original image was systematically cropped into 25 sub-images (a 5 × 5 grid), following a left-to-right and top-to-bottom order. Each sub-image measured 1600 × 1200 pixels, preserving the 4:3 aspect ratio of the original image (Figure 1b). A total of 2000 sub-images were obtained. A custom-developed Python 3.8 (Python Software Foundation, Wilmington, DE, USA) program was used to randomly divide the dataset into training, validation, and testing sets in a 6:2:2 ratio (Table 1). All images were pre-annotated, with each pixel labeled as either foreground (corn seedling) or background, providing the necessary ground truth for supervised learning.
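The cropping and splitting procedure described above can be illustrated with a minimal sketch. It is not the authors' program: the function names `tile_boxes` and `split_dataset`, the fixed random seed, and the use of integer indices in place of image files are all assumptions made for illustration.

```python
import random


def tile_boxes(width, height, cols, rows):
    """Return (left, top, right, bottom) crop boxes in left-to-right, top-to-bottom order."""
    tile_w, tile_h = width // cols, height // rows
    return [
        (c * tile_w, r * tile_h, (c + 1) * tile_w, (r + 1) * tile_h)
        for r in range(rows)      # top-to-bottom
        for c in range(cols)      # left-to-right within each row
    ]


def split_dataset(items, ratios=(6, 2, 2), seed=0):
    """Shuffle items and split them into train/val/test lists by integer ratios."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seed is a hypothetical choice for repeatability
    total = sum(ratios)
    n_train = len(shuffled) * ratios[0] // total
    n_val = len(shuffled) * ratios[1] // total
    return (
        shuffled[:n_train],
        shuffled[n_train:n_train + n_val],
        shuffled[n_train + n_val:],
    )


# An 8000 x 6000 image cut into a 5 x 5 grid gives 25 tiles of 1600 x 1200 pixels.
boxes = tile_boxes(8000, 6000, cols=5, rows=5)
# Integer indices stand in for the 2000 sub-images.
train, val, test = split_dataset(range(2000))
print(len(boxes), boxes[0], boxes[-1])
print(len(train), len(val), len(test))
```

Each box uses the (left, top, right, bottom) convention, so it could be passed to an image library's crop routine; the split sizes follow directly from the 6:2:2 ratio applied to 2000 items.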

2.2. YOLO-CornSeg

In this research, we propose an improved model based on YOLOv8n, namely YOLO-CornSeg (Figure 2). To further optimize the performance of YOLOv8n, we implemented two key modifications:

(1) The last two C2f convolutional modules in the backbone network of YOLOv8n were replaced with C2f_DWR convolutional modules.

(2) Two 3 × 3 convolutions were added to the Segment head to preprocess the input.

2.2.1. C2f_DWR Module

Most traditional extraction methods use multi-rate depth-wise dilated convolutions to directly obtain multi-scale contextual information from a single feature map (Figure 3a). To capture contextual information across different scales more effectively, Wei et al. [54] proposed an efficient multi-scale feature extraction method that decomposes the process into two steps: Region Residualization and Semantic Residualization (Figure 3b). In this method, the first step generates concise region-form feature maps. In the second step, each multi-rate depth-wise dilated convolution acts only as a simple semantic morphological filter with a single required receptive field, which improves efficiency. This approach is embodied in the Dilation-wise Residual (DWR) module (Figure 4a).

Specifically, the first step, Region Residualization, generates meaningful residual features from the input feature map. A series of concise feature maps representing different region structures is generated to serve as input for morphological filtering in the second step. This step is implemented through convolution, batch normalization (BN), and ReLU layers.

In the second step, Semantic Residualization is performed using multi-rate depth-wise dilated convolutions to morphologically filter region features of different sizes. Each channel feature is filtered only by the required receptive field to minimize overlap and redundancy. Because the receptive field of each branch is fixed in advance, the region features learned in the first step can be matched to the receptive field size that filters them in the second step. In this step, the region feature map is first divided into multiple groups, and depth-wise dilated convolutions with different rates are then applied to each group.

After the multi-scale context mapping is complete, the outputs of all branches are aggregated. The feature maps are concatenated and passed through batch normalization (BN), and a point-wise convolution then fuses them to form the final residual. Finally, this residual is added to the initial input feature map to construct a more powerful and comprehensive feature representation.
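To make the role of the dilation rate concrete, the sketch below computes the span covered by a dilated kernel using the standard relation k_eff = d(k − 1) + 1, together with a simple even split of channels across branches. The dilation rates (1, 3, 5), the channel count of 96, and the even split are assumptions chosen for illustration; the text above does not state the values used in the module.

```python
def effective_kernel(kernel, dilation):
    """Span, in input pixels, covered by one dilated convolution kernel."""
    return dilation * (kernel - 1) + 1


def split_channels(channels, branches):
    """Divide channels into near-equal groups, one group per dilation branch."""
    base, extra = divmod(channels, branches)
    return [base + (1 if i < extra else 0) for i in range(branches)]


# Hypothetical settings: these rates and this channel count are illustrative only.
dilations = (1, 3, 5)
groups = split_channels(channels=96, branches=len(dilations))
for rate, n_channels in zip(dilations, groups):
    span = effective_kernel(3, rate)
    print(f"dilation {rate}: {n_channels} channels, {span}x{span} receptive field")
```

Under these assumptions, each group of channels is filtered at exactly one scale, which is the sense in which overlap and redundancy between branches are kept low.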

In this study, in order to improve the multi-scale feature extraction capability of the C2f module, we optimized it using the DWR module. Specifically, the bottleneck structure of the C2f module (Figure 4) was replaced with the DWR module (Figure 5a), resulting in the C2f_DWR module (Figure 5b). This modified C2f_DWR module replaces the last two C2f modules in the backbone network, ensuring that the network captures highly detailed multi-scale contextual features before forwarding them to subsequent layers. Efficient feature extraction is achieved by grouping feature maps and applying multi-rate depth-wise dilated convolutions, thereby improving segmentation performance while maintaining computational efficiency.

2.2.2. Segment_Efficient Head

Convolution operations are effective in capturing complex local spatial patterns, such as edges, corners, and textures, and can progressively extract higher-level abstract features layer by layer [55,56]. To improve the segmentation ability of YOLOv8n, we introduced a stem structure composed of two 3 × 3 convolutions at the beginning of the segmentation head (Figure 6). This design has three advantages: (1) two consecutive 3 × 3 convolutional layers can perform more thorough local information extraction on the input feature map; (2) the SiLU nonlinear activation function at the end of each convolutional layer increases the nonlinearity of the model, enabling it to learn deeper features; and (3) the input features can be smoothed by the weight distribution of the convolutional kernel to eliminate noise and redundant information, thereby improving the efficiency of subsequent processing layers. Each 3 × 3 convolution also expands the receptive field. After two convolutional layers, the network’s receptive field covers a larger area, enabling the model to capture more contextual information, which contributes to more accurate mask prediction in the corn seedling segmentation task.
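The receptive field growth mentioned above follows from a standard recurrence, sketched below. The stride of 1 for both convolutions is an assumption, since the text does not state it, and the function name is hypothetical.

```python
def stacked_receptive_field(layers):
    """Receptive field of a stack of conv layers given (kernel, stride) pairs."""
    field, jump = 1, 1
    for kernel, stride in layers:
        field += (kernel - 1) * jump  # each layer widens the field by (k - 1) input steps
        jump *= stride                # stride enlarges the step seen by later layers
    return field


one_conv = stacked_receptive_field([(3, 1)])
two_convs = stacked_receptive_field([(3, 1), (3, 1)])
print(one_conv, two_convs)
```

With stride 1, two stacked 3 × 3 convolutions see a 5 × 5 input window while using 18 weights per channel pair rather than the 25 of a single 5 × 5 kernel.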

2.3. Weed Detection Strategy

After segmenting the corn seedlings, the remaining weeds in the image need to be further identified. The proposed method involves the following steps:

(1) Use the trained model to segment the corn seedlings and obtain an RGB image containing the precise corn seedling mask. Set the mask region to fully opaque black to ensure it does not interfere with subsequent processing.

(2) Convert both the RGB image and its corresponding corn seedling mask to the HSV color space, which enhances the ability to process the color information and identify green areas.

(3) Green vegetation regions are extracted using a predefined HSV threshold range (H: 30–80, S: 40–255, V: 40–255). To reduce potential interference from vegetation-like background elements (e.g., moss or low-saturation green regions), an additional HSV range (H: 35–50, S: 20–180, V: 20–180) is used to identify such pixels. A secondary mask is generated accordingly and subtracted from the initial green mask to improve robustness.

(4) Morphological dilation is applied to the refined green mask to fill small gaps and ensure spatial continuity of detected vegetation regions.

(5) After generating the green mask, a pixel-wise logical “AND” operation is applied on the mask and the original image containing the mask of the corn seedlings. Pixels are retained only when both the green mask and the original image exhibit green values at the same location; otherwise, they are set to black. This operation effectively removes corn seedlings from the image and isolates the weed regions.
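The five steps above can be sketched in pure Python as follows. This is a simplified illustration rather than the authors' implementation, and it rests on several assumptions: the thresholds are read on an OpenCV-style 8-bit scale (H in 0–179, S and V in 0–255), dilation uses a 3 × 3 square element, the final AND is interpreted as keeping dilated green pixels outside the crop mask, and the pixel colours in the example are invented.

```python
import colorsys

# Thresholds quoted in the text; the 8-bit OpenCV-style scale is an assumption.
GREEN_LO, GREEN_HI = (30, 40, 40), (80, 255, 255)
MOSS_LO, MOSS_HI = (35, 20, 20), (50, 180, 180)


def to_cv_hsv(rgb):
    """Convert an 8-bit (R, G, B) triple to HSV with H in 0-180 and S, V in 0-255."""
    r, g, b = (c / 255.0 for c in rgb)
    h, s, v = colorsys.rgb_to_hsv(r, g, b)
    return h * 180.0, s * 255.0, v * 255.0


def in_range(hsv, lo, hi):
    """True when every HSV component lies inside its closed interval."""
    return all(l <= x <= u for x, l, u in zip(hsv, lo, hi))


def is_vegetation(rgb):
    """Green under the primary range and not matched by the secondary range."""
    hsv = to_cv_hsv(rgb)
    return in_range(hsv, GREEN_LO, GREEN_HI) and not in_range(hsv, MOSS_LO, MOSS_HI)


def dilate(mask):
    """One pass of binary dilation with a 3x3 square structuring element."""
    rows, cols = len(mask), len(mask[0])
    return [
        [
            any(
                mask[ny][nx]
                for ny in range(max(0, y - 1), min(rows, y + 2))
                for nx in range(max(0, x - 1), min(cols, x + 2))
            )
            for x in range(cols)
        ]
        for y in range(rows)
    ]


def weed_mask(image, crop_mask, dilate_passes=1):
    """Infer weed pixels as vegetation lying outside the crop mask."""
    rows, cols = len(image), len(image[0])
    # Steps 1-3: ignore crop pixels, then threshold the remaining pixels in HSV.
    green = [
        [not crop_mask[y][x] and is_vegetation(image[y][x]) for x in range(cols)]
        for y in range(rows)
    ]
    # Step 4: dilation fills small gaps in the green mask.
    for _ in range(dilate_passes):
        green = dilate(green)
    # Step 5: logical AND with the non-crop region.
    return [
        [green[y][x] and not crop_mask[y][x] for x in range(cols)]
        for y in range(rows)
    ]


# Invented example colours: soil, a bright weed, a dull moss-like patch, a corn leaf.
SOIL, WEED, MOSS, CORN = (120, 90, 60), (60, 200, 40), (110, 130, 70), (50, 190, 50)
image = [[CORN, SOIL, WEED, MOSS]]
crop = [[True, False, False, False]]
print(weed_mask(image, crop, dilate_passes=0))
```

Note that the two HSV ranges overlap, so under this reading any pixel inside the secondary range is removed even if it also satisfies the primary one; dilation then extends the retained regions by one pixel in each direction per pass.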

2.4. Experiment Design

A total of 2000 preprocessed images were divided into training, validation, and testing datasets at a ratio of 6:2:2, comprising 1200, 400, and 400 images, respectively, as shown in Table 1. To evaluate the effectiveness of the YOLO-CornSeg model, a series of ablation experiments were conducted to compare the performance of the standard YOLOv8n model with progressively enhanced versions (C2f_DWR module, Segment_Efficient head, C2f_DWR module + Segment_Efficient head). Additionally, the YOLO-CornSeg model was compared against several state-of-the-art segmentation models. The components and hyperparameter configurations of the ablation experiments are shown in Table 2. The segmentation algorithm and its hyperparameter configurations used in the comparison experiments are shown in Table 3.

The performance of the segmentation model was evaluated using a confusion matrix, which includes four outcomes: true positive (TP), false positive (FP), true negative (TN), and false negative (FN). In this study, TP indicates that the model correctly segments the pixels of the corn seedlings, FP refers to background pixels incorrectly classified as corn seedlings, TN refers to correctly identified background pixels, and FN indicates that the model incorrectly classifies the pixels of the corn seedlings as the background.

The evaluation metrics used to assess segmentation performance included mean Intersection over Union (mIoU), precision, recall, and Giga Floating-point Operations (GFLOPs). All metrics except GFLOPs were calculated using the confusion matrix [57].

Intersection over union (IoU) is the ratio of the area of intersection between the predicted region and the actual region to the area of the union between the predicted region and the actual region. It reflects the degree of overlap between the predicted region and the actual region. It quantifies the accuracy of the model in segmentation [58] and is calculated using the following formula:

\mathrm{IoU} = \frac{TP}{TP + FP + FN}

(1)

mIoU refers to the mean Intersection over Union (IoU) across all classes, and is calculated using the following formula [58], where C_{N} is the number of classes and \mathrm{IoU}_{i} is the IoU of class i:

\mathrm{mIoU} = \frac{1}{C_{N}} \sum_{i = 1}^{C_{N}} \mathrm{IoU}_{i}

(2)

Precision refers to the proportion of true positive samples among all samples predicted as positive [57]. It measures a model’s ability to accurately identify the target and was calculated using the following equation:

\mathrm{Precision} = \frac{TP}{TP + FP}

(3)

Recall is the proportion of actual positive samples that are correctly identified as positive by the model [57]. Recall measures the ability of the model to capture positive instances. The formula is as follows:

\mathrm{Recall} = \frac{TP}{TP + FN}

(4)
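As a worked illustration of Equations (1)–(4), the sketch below derives the four confusion-matrix counts from two flat binary masks and computes each metric. The masks are invented, the function names are hypothetical, and treating background as a second class when averaging IoU is an assumption about how the mean is taken.

```python
def confusion_counts(pred, truth):
    """Count TP, FP, TN, FN for flat binary masks (1 = corn seedling, 0 = background)."""
    tp = fp = tn = fn = 0
    for p, t in zip(pred, truth):
        if p and t:
            tp += 1
        elif p and not t:
            fp += 1
        elif not p and t:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def safe_div(numerator, denominator):
    """Return 0.0 instead of raising when a class is absent."""
    return numerator / denominator if denominator else 0.0


def iou(tp, fp, fn):
    return safe_div(tp, tp + fp + fn)          # Equation (1)


def precision(tp, fp):
    return safe_div(tp, tp + fp)               # Equation (3)


def recall(tp, fn):
    return safe_div(tp, tp + fn)               # Equation (4)


def mean_iou(tp, fp, tn, fn):
    """Equation (2) over two classes: corn seedling and background."""
    corn = iou(tp, fp, fn)
    background = iou(tn, fn, fp)  # FP and FN swap roles for the background class
    return (corn + background) / 2


# Invented 8-pixel example.
pred = [1, 1, 1, 0, 0, 0, 0, 0]
truth = [1, 1, 0, 1, 0, 0, 0, 0]
tp, fp, tn, fn = confusion_counts(pred, truth)
print(tp, fp, tn, fn)
print(iou(tp, fp, fn), precision(tp, fp), recall(tp, fn), mean_iou(tp, fp, tn, fn))
```

In this example the corn class has an IoU of 0.5 and the background class an IoU of about 0.667, so the two-class mean sits between them.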

GFLOPs is an indicator used to assess the computational complexity and inference efficiency of a model, representing the number of floating-point operations, in billions, required for a single forward pass [57].
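For intuition about how such operation counts arise, the sketch below estimates the cost of a single convolutional layer. The layer shapes are hypothetical and are not taken from YOLO-CornSeg; counting one multiply-accumulate as two floating-point operations is a common convention, but profiling tools differ, so the figures are indicative only.

```python
def conv_macs(c_in, c_out, kernel, out_h, out_w, groups=1):
    """Multiply-accumulate operations of one 2-D convolution (bias ignored)."""
    return kernel * kernel * (c_in // groups) * c_out * out_h * out_w


def to_gflops(macs, flops_per_mac=2):
    """Convert a MAC count to billions of floating-point operations."""
    return macs * flops_per_mac / 1e9


# Hypothetical layer: 64 -> 64 channels, 3 x 3 kernel, 80 x 80 output map.
standard = conv_macs(64, 64, 3, 80, 80)
# Same layer as a depth-wise convolution (one filter per channel).
depthwise = conv_macs(64, 64, 3, 80, 80, groups=64)
print(standard, to_gflops(standard))
print(depthwise, to_gflops(depthwise))
```

Setting `groups` equal to the channel count models a depth-wise convolution, which in this example needs 64 times fewer operations than the standard layer. This is one reason depth-wise dilated convolutions are attractive for lightweight models.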

All models were implemented using the PyTorch deep learning framework (version 1.13.0) with CUDA 11.6 (NVIDIA Corporation, Santa Clara, CA, USA) support [59]. All training and evaluation processes were completed on a workstation equipped with 128 GB of RAM, an Intel(R) Core(TM) i9-10920X CPU @ 3.50 GHz, and an NVIDIA RTX 3080 Ti GPU, running on Ubuntu 20.04.1.

3. Results

3.1. Ablation Experiment Results

Ablation studies are an important experimental method in machine learning for evaluating the contribution of individual components within a model [60]. To assess the effectiveness of the proposed improvement modules for the YOLOv8n model in segmenting corn seedlings, a series of ablation experiments were conducted. These experiments evaluated both the individual impact of each improvement module and the combined effect when multiple modules were integrated. The results are shown in Table 4.

The baseline YOLOv8n model demonstrates effective segmentation performance for corn seedlings, achieving an mIoU50 of 89.2%, an mIoU50-95 of 60.7%, a precision of 91.0%, and a recall of 81.0%. In variant model 1 (VM1), the C2f_DWR module was used to replace the last two C2f modules in the backbone network. This variant shows improvements over the original model: the mIoU50 increases from 89.2% to 90.0%, mIoU50-95 from 60.7% to 61.5%, and recall from 81.0% to 82.6%, while GFLOPs decreases from 12.0 G to 11.9 G. Notably, the number of parameters is reduced from 3,258,259 to 3,197,203, indicating a reduction in model complexity. These results demonstrate that the C2f_DWR module effectively improves segmentation performance while reducing computational cost and model size.

In variant model 2, the original Segment head was replaced with the proposed Segment_Efficient head. This change improved mIoU50 from 89.2% to 90.0%, mIoU50-95 from 60.7% to 60.8%, and recall from 81.0% to 82.1%, while maintaining comparable precision. GFLOPs decreased from 12.0 G to 11.9 G even though the number of parameters increased compared to the baseline. We attribute this to the 3 × 3 convolutional layers introduced in the Segment_Efficient head, which perform more effective local feature extraction at an early stage, simplifying the subsequent feature processing and reducing redundant computations.

Our proposed YOLO-CornSeg integrates both the C2f_DWR module and the Segment_Efficient head. Specifically, the C2f_DWR module replaces the last two C2f modules in the backbone network, while the Segment_Efficient head substitutes the original segmentation head. Compared to the baseline YOLOv8n model, YOLO-CornSeg improves mIoU50 from 89.2% to 91.1%, mIoU50-95 from 60.7% to 63.1%, and recall from 81.0% to 84.5%, while reducing GFLOPs from 12.0 G to 11.8 G. This is the largest improvement among all tested variants. The increase in mIoU indicates improved overall segmentation accuracy, including better delineation of object boundaries and finer details. The increased recall reflects the model’s enhanced ability to identify positive samples, thereby reducing missed detections. Because GFLOPs reflects the number of floating-point operations required during inference, the lower value indicates a reduced computational load, which suggests faster inference and potential suitability for real-time deployment. Although the total number of model parameters increased slightly, these results suggest that the proposed model achieves a better balance between segmentation accuracy and computational efficiency than the baseline under the evaluated conditions.

3.2. Performance Comparison Between YOLOv8n and YOLO-CornSeg

To evaluate the robustness of our YOLO-CornSeg model compared to the baseline model, we compared their performance in terms of validation segmentation loss and recall rate during the experiment.

Figure 7 compares the validation segmentation loss of YOLOv8n and YOLO-CornSeg. YOLOv8n showed some instability, with its loss surging to 18.552. YOLO-CornSeg showed a smaller surge at the beginning of training, and its loss dropped more quickly with less fluctuation, eventually stabilizing at approximately 0.8 after 80 epochs. This indicates better segmentation performance, training stability, and convergence behavior.

Figure 8 illustrates the mean average precision at an IoU threshold of 0.5 (mAP50) for YOLOv8n and YOLO-CornSeg. The mAP50 of both models increases rapidly at the beginning of training, while the curve of YOLO-CornSeg shows less fluctuation and more stable improvement. In the later stages of training, after 80 epochs, both models converge, with YOLO-CornSeg achieving slightly higher performance than YOLOv8n in most cases. These results indicate that YOLO-CornSeg provides more stable training behavior and improved segmentation performance under the evaluated conditions.

Figure 9 illustrates a comparison of the performance of the YOLOv8n and YOLO-CornSeg models in segmenting corn seedlings. In real-world detection and segmentation scenarios, a key performance difference lies in the accurate delineation of object edges [61]. As shown in the figure, the YOLO-CornSeg model demonstrates superior edge accuracy, particularly in areas where YOLOv8n fails to correctly segment leaf tip edges. This highlights YOLO-CornSeg’s advantage in precisely identifying and segmenting fine details.

Gradient-weighted Class Activation Mapping (Grad-CAM) was employed to visualize the regions attended to by both models [62]. Figure 10 presents the resulting heatmaps for the YOLOv8n and YOLO-CornSeg models in the corn seedling segmentation task. The color gradient reflects the intensity of feature activation: red indicates stronger attention, while blue indicates weaker attention.

As illustrated, both YOLOv8n and YOLO-CornSeg can locate the target accurately. However, YOLOv8n displays lighter colors at the edges of the corn leaves compared to the YOLO-CornSeg model, suggesting reduced focus and limited boundary detection capabilities. In contrast, the YOLO-CornSeg model shows stronger activation values, especially along the leaf margins, demonstrating more focused attention and improved segmentation accuracy. This improved boundary sensitivity can be attributed to the DWR module, which employs multi-rate dilated depth-wise convolutions to capture multi-scale contextual information. Unlike standard convolutions with a fixed receptive field, this design enables the model to simultaneously extract fine-grained local details and broader contextual cues, which are critical for accurate boundary delineation. In addition, the depth-wise convolution and residual structure help preserve edge-related features and reduce feature smoothing, resulting in stronger activation along leaf margins. These results indicate that YOLO-CornSeg more effectively captures fine details in areas where YOLOv8n underperforms, thereby achieving superior segmentation outcomes.

3.3. Comparison of YOLO-CornSeg with Other State-of-the-Art Segmentation Algorithms

To verify the advantages of YOLO-CornSeg in terms of model size and accuracy, we compared it with several state-of-the-art segmentation algorithms. Table 5 summarizes the experimental results. YOLO-CornSeg had the smallest model size at 8.3 MB, a 98.85% reduction relative to Swin Transformer (719.9 MB), the largest model, and a 92.28% reduction relative to BiseNet (R18) (107.5 MB), the second-smallest model in this comparison. The trade-off between mIoU and model size for all compared models is visualized in Figure 11. YOLO-CornSeg, located in the upper left corner, achieved the highest mIoU (91.1%) while maintaining a very small model size (8.3 MB), and thus the best balance between accuracy and efficiency among the compared models. This demonstrates that YOLO-CornSeg greatly reduces model size while maintaining high accuracy, which makes it well suited to low-resource settings such as embedded platforms and a favorable candidate for deployment in real-time, intelligent herbicide spraying systems.
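The size reductions quoted above can be reproduced directly from the model sizes in Table 5. The following minimal sketch does so; the helper name size_reduction is illustrative.

```python
def size_reduction(reference_mb, model_mb):
    """Percentage by which model_mb is smaller than reference_mb."""
    return (reference_mb - model_mb) / reference_mb * 100


yolo_cornseg_mb = 8.3  # Table 5
vs_swin = size_reduction(719.9, yolo_cornseg_mb)     # Swin Transformer
vs_bisenet = size_reduction(107.5, yolo_cornseg_mb)  # BiseNet (R18)
print(f"{vs_swin:.2f}% {vs_bisenet:.2f}%")  # 98.85% 92.28%
```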

3.4. Verification of Weed Detection in Corn Seedlings

To validate the effectiveness of the proposed strategy, we first applied a weed detection method based on bounding box object detection [46], which identifies green pixels outside the bounding boxes as weeds. We then applied the segmentation mask method proposed in this study, which identifies green pixels outside the masks as weeds, and compared the results of the two methods. Figures 12 and 13 show the weed detection results obtained with the bounding box and segmentation mask methods, respectively. The segmentation mask method detects more weeds, whereas the bounding box method misses weeds located within the bounding boxes.
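The difference between the two exclusion rules can be seen on a toy grid. In the sketch below, green marks pixels already classified as vegetation, corn_mask is a hypothetical crop mask, and the bounding box is derived from that mask; all values and function names are illustrative, and the mask is assumed to be non-empty.

```python
def weed_pixels(green, crop_region):
    """Count green pixels that fall outside the crop region."""
    return sum(
        1
        for green_row, crop_row in zip(green, crop_region)
        for is_green, is_crop in zip(green_row, crop_row)
        if is_green and not is_crop
    )


def bounding_box(mask):
    """Fill the tight axis-aligned box around all non-zero pixels of a mask."""
    rows = [i for i, row in enumerate(mask) if any(row)]
    cols = [j for j in range(len(mask[0])) if any(row[j] for row in mask)]
    return [
        [rows[0] <= i <= rows[-1] and cols[0] <= j <= cols[-1]
         for j in range(len(mask[0]))]
        for i in range(len(mask))
    ]


# Toy 4 x 4 scene: 1 = pixel classified as green vegetation.
green = [
    [1, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 1],
    [0, 0, 1, 1],
]
# Hypothetical corn mask: a thin diagonal leaf.
corn_mask = [
    [0, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
]
by_mask = weed_pixels(green, corn_mask)
by_box = weed_pixels(green, bounding_box(corn_mask))
print(by_mask, by_box)  # 5 1
```

The box around the diagonal leaf swallows four weed pixels that the mask leaves visible, which mirrors the missed detections described above.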

Table 6 quantitatively compares the number of weed pixels detected by the two methods. The percentage difference is expressed relative to the pixel count of the segmentation mask method, i.e., it is the share of mask-detected weed pixels that the bounding box method misses. For situations with high overall weed density but few weeds around the corn seedlings (e.g., image 1 and image 2), the difference between the two methods is relatively small: the bounding box method misses 3.76% and 11.57% of the weed pixels detected with the corn segmentation mask, respectively. However, in scenarios with high weed density around the corn (e.g., image 3, image 4, and image 5), it misses 43.93%, 86.75%, and 42.65% of these pixels, respectively; in image 4, the mask method detects 38,068 weed pixels versus 5044 for the bounding box method. These findings highlight the significant advantage of the segmentation mask approach, especially in complex weed detection scenarios. Specifically, the segmentation mask method accurately separates corn seedlings from surrounding weeds, effectively addressing the limitation of the bounding box approach, which ignores weed pixels within the boxes and thus leads to missed detections. This advantage is particularly evident in high-density weed areas, where weeds and crops are closely intertwined. Our proposed YOLO-CornSeg model, integrated with this segmentation-based detection method, achieves robust performance, effectively facilitating precise weed detection in corn seedlings.
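The percentages in Table 6 can be recomputed from the pixel counts. The sketch below divides the difference by the segmentation mask count, which is the convention that matches the reported values; the helper name missed_share is illustrative. The recomputed values agree with the table to within 0.01 percentage points.

```python
# (bounding box count, segmentation mask count) from Table 6
counts = {
    "Image 1": (45_372, 47_148),
    "Image 2": (28_207, 31_897),
    "Image 3": (112_430, 200_503),
    "Image 4": (5_044, 38_068),
    "Image 5": (8_251, 14_388),
}


def missed_share(box_count, mask_count):
    """Extra mask-method pixels as a percentage of the mask-method count."""
    return (mask_count - box_count) / mask_count * 100


shares = {name: round(missed_share(*c), 2) for name, c in counts.items()}
print(shares)
```

Dividing by the bounding box count instead would give much larger figures for the dense scenes (for image 4, more than a sevenfold increase), so the choice of denominator matters when reading the table.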

4. Discussion

4.1. Effectiveness of the Proposed Modules

The ablation experiments demonstrate that both the C2f_DWR module and the Segment_Efficient head contribute to improving the segmentation performance of YOLOv8n. The C2f_DWR module enhances feature extraction while reducing redundant computations, which explains the simultaneous improvement in segmentation accuracy and reduction in model complexity. Similarly, the Segment_Efficient head improves segmentation performance while maintaining computational efficiency.

When the two modules are combined, YOLO-CornSeg achieves the best performance, indicating that the proposed architecture effectively balances segmentation accuracy and computational efficiency.

4.2. Lightweight Advantages of YOLO-CornSeg

Compared with several state-of-the-art segmentation models, which often rely on heavier architectures or large-scale annotated datasets, YOLO-CornSeg achieves competitive performance while maintaining a significantly smaller model size. This lightweight design makes the model particularly suitable for deployment on resource-constrained platforms, such as embedded agricultural devices, drones, and intelligent spraying systems.

The favorable balance between model size and segmentation accuracy highlights the practical applicability of YOLO-CornSeg for real-time precision agriculture applications.

4.3. Advantages of Corn Segmentation-Based Weed Detection Strategy

The experimental results demonstrate that segmentation-based weed detection provides clear advantages over bounding box-based detection. The bounding box approach cannot distinguish weeds located within crop bounding regions, leading to missed detections. In contrast, the segmentation mask approach precisely separates crop and weed regions at the pixel level, enabling more accurate detection of weeds even when crops and weeds are closely intertwined. This advantage becomes particularly significant in high-density weed environments, where accurate discrimination between crops and weeds is essential for precision herbicide application.

Compared with existing segmentation-based indirect weed detection approaches, such as those proposed by Jin et al. [36], Kong et al. [37], and Liu et al. [35], the proposed YOLO-CornSeg model follows a similar strategy of identifying weeds from non-crop regions derived from crop segmentation results. However, while some of these methods report higher segmentation accuracy under controlled experimental settings, they often rely on more complex network architectures or larger annotated datasets. In contrast, YOLO-CornSeg adopts a lightweight design that significantly reduces model size and computational cost, while maintaining competitive performance. This trade-off between accuracy and efficiency makes the proposed method more suitable for real-world agricultural applications, particularly in resource-constrained environments such as UAV platforms and embedded systems.

From a broader perspective, this segmentation-based approach provides a practical alternative to fully supervised weed detection methods. By reducing the dependence on detailed weed annotations, it offers a more scalable solution for agricultural applications where annotation cost and variability of weed species present significant challenges. In addition, the proposed method has the potential to support treatment scheduling. By providing timely and spatially explicit information on weed distribution through repeated monitoring (e.g., UAV-based observations), the model can help identify weed growth patterns during critical developmental stages. This may assist in determining more appropriate timing for herbicide application, particularly under conditions where crop and weed development are not synchronized.

4.4. Limitations and Future Work

Although YOLO-CornSeg achieves promising results in corn seedling segmentation and indirect weed detection, several limitations remain.

First, the proposed method does not include quantitative evaluation of weed detection performance due to the absence of explicit weed annotations in the dataset. Instead, the effectiveness of the indirect weed detection pipeline is inferred from the quantitative performance of corn segmentation. This is because the indirect weed detection results are directly derived from the crop segmentation outputs. While this approach provides a practical and cost-effective solution, it may not fully capture the precise performance of weed detection across diverse field conditions.
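For reference, the kind of crop-side overlap measure on which this inference rests can be sketched as follows. The example computes pixel-level IoU, precision, and recall between a predicted and a reference corn mask on toy data; the metrics reported in this paper are evaluated per instance at specific IoU thresholds, so this is only a simplified view of the underlying overlap. The masks and the function name are illustrative, and both masks are assumed to contain at least one positive pixel.

```python
def mask_metrics(predicted, reference):
    """Pixel-level IoU, precision, and recall for two binary masks."""
    pairs = [
        (bool(p), bool(r))
        for pred_row, ref_row in zip(predicted, reference)
        for p, r in zip(pred_row, ref_row)
    ]
    tp = sum(p and r for p, r in pairs)      # corn predicted, corn in reference
    fp = sum(p and not r for p, r in pairs)  # corn predicted, not in reference
    fn = sum(r and not p for p, r in pairs)  # corn missed by the prediction
    return {
        "iou": tp / (tp + fp + fn),
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
    }


predicted = [[1, 1, 0], [0, 1, 0], [0, 0, 0]]
reference = [[1, 1, 0], [0, 1, 1], [0, 0, 0]]
metrics = mask_metrics(predicted, reference)
print(metrics)  # {'iou': 0.75, 'precision': 1.0, 'recall': 0.75}
```

In the indirect strategy, a missed corn pixel (a false negative) that is green would be counted as weed, so recall on the crop mask bears directly on false weed detections.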

Second, the dataset used in this study mainly includes images collected from a limited number of fields under specific environmental conditions and at a single phenological stage. Future studies should evaluate the model using more diverse datasets, including different lighting conditions, crop growth stages, and weed species. In particular, variations in soil background (e.g., differences between dark loam and red clay soils) may influence the performance of HSV-based vegetation extraction. In addition, as the corn canopy develops in later growth stages, increased occlusion and shadow effects may reduce segmentation accuracy. This is because overlapping leaves and complex illumination conditions can obscure clear boundary information, making it more challenging for the model to distinguish corn from background vegetation.

Furthermore, the indirect weed detection strategy relies on a predefined HSV color range. This may limit the model’s ability to detect non-green weeds (e.g., drought-stressed weeds) or to distinguish vegetation-like background elements such as algae or moss on the soil surface. To mitigate this issue, an additional HSV range is introduced to identify and exclude low-saturation or moss-like green regions, partially reducing such interference. However, the method may still be limited in more complex scenarios, particularly when non-green weeds are present or when background conditions vary significantly.
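A minimal sketch of such a two-rule colour test is given below, using Python's standard colorsys module. All thresholds are hypothetical and expressed on colorsys scales (hue, saturation, and value in [0, 1]); the ranges actually used in this study are not reproduced here, and the exclusion rule is simplified to a saturation and brightness floor rather than a second explicit HSV range.

```python
import colorsys

# Hypothetical thresholds on colorsys scales (h, s, v all in [0, 1]);
# the ranges used in the paper are not reproduced here.
GREEN_HUE = (0.17, 0.45)  # roughly 60 to 160 degrees
MIN_SATURATION = 0.25     # below this, treat as dull or moss-like
MIN_VALUE = 0.20          # below this, treat as shadow


def is_weed_candidate(rgb, in_crop_mask):
    """True for a vivid green pixel that lies outside the crop mask."""
    if in_crop_mask:
        return False
    r, g, b = (channel / 255 for channel in rgb)
    h, s, v = colorsys.rgb_to_hsv(r, g, b)
    is_green = GREEN_HUE[0] <= h <= GREEN_HUE[1]
    # Second rule: drop low-saturation or dark greens (moss, algae, shadow).
    is_vivid = s >= MIN_SATURATION and v >= MIN_VALUE
    return is_green and is_vivid


bright_leaf = (60, 180, 50)  # saturated green
dull_moss = (120, 135, 115)  # greyish green
bare_soil = (120, 90, 60)    # brown

print(is_weed_candidate(bright_leaf, in_crop_mask=False))  # True
print(is_weed_candidate(dull_moss, in_crop_mask=False))    # False
print(is_weed_candidate(bare_soil, in_crop_mask=False))    # False
```

The same test also shows the limitation noted above: a drought-stressed, yellowing weed would fall outside the green hue range and be missed regardless of the saturation rule.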

5. Conclusions

This study introduces YOLO-CornSeg, a lightweight segmentation model for corn seedlings based on an improved YOLOv8n architecture. The model integrates a DWR-based C2f module and a Segment_Efficient head to enhance feature extraction and segmentation performance. Experimental results demonstrate that YOLO-CornSeg achieves an mIoU50 of 91.1%, with a precision of 90.8% and a recall of 84.5%, while maintaining low computational complexity (11.8 GFLOPs) and a compact model size (8.3 MB). These results indicate that the proposed model achieves a favorable balance between segmentation accuracy and computational efficiency. By combining lightweight crop segmentation with an indirect weed detection strategy, the proposed approach provides a practical basis for reducing unnecessary herbicide application in precision agriculture. However, this study does not include field-level deployment or operational validation, and further research is required to assess its performance in real-world agricultural environments. Future work will focus on evaluating the method across more diverse datasets, integrating real-time systems, and further optimizing model performance for practical applications.

Author Contributions

Conceptualization, J.L. and J.Y.; methodology, J.L.; software, J.L.; validation, J.L. and K.H.; formal analysis, J.L.; investigation, J.L. and M.L.; data curation, J.L. and M.L.; writing—original draft preparation, J.L.; writing—review and editing, J.Y., X.J. and H.Y.; visualization, J.L.; supervision, J.Y., X.J. and H.Y.; project administration, J.Y. and H.Y.; resources, J.Y. and H.Y.; funding acquisition, H.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Key R&D Program of Shandong Province, China (Grant No. 202211070163), the National Natural Science Foundation of China (Grant No. 32072498), the Taishan Scholar of Shandong Program, China, and the Weifang Science and Technology Development Plan Project (Grant No. 2024ZJ1097).

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Shiferaw, B.; Prasanna, B.M.; Hellin, J.; Bänziger, M. Crops that feed the world 6: Past successes and future challenges to the role played by maize in global food security. Food Secur. 2011, 3, 307–327.
FAOSTAT. Crops and Livestock Products. Available online: https://www.fao.org/faostat/ (accessed on 30 April 2026).
García-Lara, S.; Serna-Saldivar, S.O. Corn history and culture. In Corn: Chemistry and Technology, 3rd ed.; Serna-Saldivar, S.O., Ed.; Elsevier: Amsterdam, The Netherlands, 2019; pp. 1–18.
Swanton, C.J.; Weise, S.F. Integrated weed management: The rationale and approach. Weed Technol. 1991, 5, 657–663.
Oerke, E.-C.; Dehne, H.-W. Safeguarding production—Losses in major crops and the role of crop protection. Crop Prot. 2004, 23, 275–285.
Doğan, M.N.; Ünay, A.; Boz, Ö.; Albay, F. Determination of optimum weed control timing in maize (Zea mays L.). Turk. J. Agric. For. 2004, 28, 349–354.
Soltani, N.; Dille, J.A.; Burke, I.C.; Everman, W.J.; VanGessel, M.J.; Davis, V.M.; Sikkema, P.H. Potential corn yield losses from weeds in North America. Weed Technol. 2016, 30, 979–984.
Page, E.R.; Cerrudo, D.; Westra, P.; Loux, M.M.; Smith, K.L.; Foresman, C.; Wright, H.A.; Swanton, C.J. Why early season weed control is important in maize. Weed Sci. 2012, 60, 423–430.
Gunsolus, J.L. Mechanical and cultural weed control in corn and soybeans. Am. J. Altern. Agric. 1990, 5, 114–119.
Upadhyaya, M.K.; Blackshaw, R.E. Non-Chemical Weed Management: Principles, Concepts and Technology; CABI: Wallingford, UK, 2007.
Venkataraju, A.; Arumugam, D.; Stepan, C.; Kiran, R.; Peters, T. A review of machine learning techniques for identifying weeds in corn. Smart Agric. Technol. 2023, 3, 100102.
Harker, K.N.; O’Donovan, J.T. Recent weed control, weed management, and integrated weed management. Weed Technol. 2013, 27, 1–11.
Pimentel, D. Environmental and economic costs of the application of pesticides primarily in the United States. Environ. Dev. Sustain. 2005, 7, 229–252.
MacLaren, C.; Storkey, J.; Menegat, A.; Metcalfe, H.; Dehnen-Schmutz, K. An ecological future for weed science to sustain crop production and the environment. Agron. Sustain. Dev. 2020, 40, 31.
Åstrand, B.; Baerveldt, A.-J. An agricultural mobile robot with vision-based perception for mechanical weed control. Auton. Robot. 2002, 13, 21–35.
Monteiro, A.; Santos, S. Sustainable approach to weed management: The role of precision weed management. Agronomy 2022, 12, 118.
LeCun, Y.; Bengio, Y. Convolutional networks for images, speech, and time series. In The Handbook of Brain Theory and Neural Networks; MIT Press: Cambridge, MA, USA, 1995.
LeCun, Y.; Bengio, Y.; Hinton, G. Deep learning. Nature 2015, 521, 436–444.
Kamilaris, A.; Prenafeta-Boldú, F.X. Deep learning in agriculture: A survey. Comput. Electron. Agric. 2018, 147, 70–90.
Szegedy, C.; Vanhoucke, V.; Ioffe, S.; Shlens, J.; Wojna, Z. Rethinking the inception architecture for computer vision. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 26 June–1 July 2016; pp. 2818–2826.
Jin, X.; Bagavathiannan, M.; Maity, A.; Chen, Y.; Yu, J. Deep learning for detecting herbicide weed control spectrum in turfgrass. Plant Methods 2022, 18, 94.
Tian, X.; Huang, L.; Zhai, M.; Zhang, M.; Hu, P.; Li, M.; Ren, L. Non-destructive prediction of apple SSC/TAC and firmness based on multilayer autoencoder and multilayer perceptron. Intell. Robot. 2025, 5, 181–201.
Lin, F.; Zhang, D.; Huang, Y.; Wang, X.; Chen, X. Detection of corn and weed species by the combination of spectral, shape and textural features. Sustainability 2017, 9, 1335.
Chen, C.J.; Huang, Y.Y.; Li, Y.S.; Chang, C.Y.; Huang, Y.M. An AIoT-based smart agricultural system for pests detection. IEEE Access 2020, 8, 180750–180761.
Liakos, K.G.; Busato, P.; Moshou, D.; Pearson, S.; Bochtis, D. Machine learning in agriculture: A review. Sensors 2018, 18, 2674.
Phang, S.K.; Chiang, T.H.A.; Happonen, A.; Chang, M.M.L. From satellite to UAV-based remote sensing: A review on precision agriculture. IEEE Access 2023, 11, 127057–127076.
Olson, D.; Anderson, J. Review on unmanned aerial vehicles, remote sensors, imagery processing, and their applications in agriculture. Agron. J. 2021, 113, 971–992.
Ahmadi, P.; Mansor, S.; Farjad, B.; Ghaderpour, E. Unmanned aerial vehicle (UAV)-based remote sensing for early-stage detection of Ganoderma. Remote Sens. 2022, 14, 1239.
Kumar, A.; Taparia, M.; Rajalakshmi, P.; Guo, W.; Naik, B.; Marathi, B.; Desai, U.B. UAV-based remote sensing for tassel detection and growth stage estimation of maize crop using multispectral images. In Proceedings of the IEEE International Geoscience and Remote Sensing Symposium (IGARSS 2020), Waikoloa, HI, USA, 26 September–2 October 2020; pp. 1588–1591.
Tirado, S.B.; Hirsch, C.N.; Springer, N.M. UAV-based imaging platform for monitoring maize growth throughout development. Plant Direct 2020, 4, e00230.
Dyrmann, M.; Karstoft, H.; Midtiby, H.S. Plant species classification using deep convolutional neural networks. Biosyst. Eng. 2016, 151, 72–80.
Yu, J.; Sharpe, S.M.; Schumann, A.W.; Boyd, N.S. Deep learning for image-based weed detection in turfgrass. Eur. J. Agron. 2019, 104, 78–84.
Jin, X.; Che, J.; Chen, Y. Weed identification using deep learning and image processing in vegetable plantation. IEEE Access 2021, 9, 10940–10950.
Hamuda, E.; McGinley, B.; Glavin, M.; Jones, E. Automatic crop detection under field conditions using the HSV colour space and morphological operations. Comput. Electron. Agric. 2017, 133, 97–107.
Bakhshipour, A.; Jafari, A. Evaluation of support vector machine and artificial neural networks in weed detection using shape features. Comput. Electron. Agric. 2018, 145, 153–160.
Liu, T.; Jin, X.; Han, K.; He, F.; Wang, J.; Chen, X.; Kong, X.; Yu, J. Semantic segmentation for weed detection in corn. Pest Manag. Sci. 2024, 81, 1512–1528.
Jin, X.; Sun, Y.; Che, J.; Bagavathiannan, M.; Yu, J.; Chen, Y. A novel deep learning-based method for detection of weeds in vegetables. Pest Manag. Sci. 2022, 78, 1861–1869.
Kong, X.; Liu, T.; Chen, X.; Jin, X.; Li, A.; Yu, J. Efficient crop segmentation net and novel weed detection method. Eur. J. Agron. 2024, 161, 127367.
Cui, J.; Tan, F.; Bai, N.; Fu, Y. Improving U-net network for semantic segmentation of corns and weeds during corn seedling stage in field. Front. Plant Sci. 2024, 15, 1344958.
Zhai, Y.; Gao, Z.; Li, J.; Zhou, Y.; Xu, Y. An Enhanced SegNeXt with adaptive ROI for a robust navigation line extraction in multi-growth-stage maize fields. Agriculture 2026, 16, 367.
Fu, H.; Li, X.; Zhu, L.; Pan, X.; Wu, T.; Li, W.; Feng, Y. DSC-DeepLabv3+: A lightweight semantic segmentation model for weed identification in maize fields. Front. Plant Sci. 2025, 16, 1647736.
Liu, L.; Li, G.; Du, Y.; Li, X.; Wu, X.; Qiao, Z.; Wang, T. CS-net: Conv-simpleformer network for agricultural image segmentation. Pattern Recognit. 2024, 147, 110140.
Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You only look once: Unified, real-time object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 26 June–1 July 2016; pp. 779–788.
Jin, X.; Bagavathiannan, M.; McCullough, P.E.; Chen, Y.; Yu, J. A deep learning-based method for classification, detection, and localization of weeds in turfgrass. Pest Manag. Sci. 2022, 78, 4809–4821.
Bai, Q.; Gao, R.; Li, Q.; Wang, R.; Zhang, H. Recognition of the behaviors of dairy cows by an improved YOLO. Intell. Robot. 2024, 4, 1–19.
Sohan, M.; Sai Ram, T.; Rami Reddy, C.V. A review on YOLOv8 and its advancements. In Proceedings of the International Conference on Data Intelligence and Cognitive Informatics; Springer: Singapore, 2024; pp. 529–545.
Chitraningrum, N.; Banowati, L.; Herdiana, D.; Mulyati, B.; Sakti, I.; Fudholi, A.; Andria, A. Comparison study of corn leaf disease detection based on deep learning YOLO-v5 and YOLO-v8. J. Eng. Technol. Sci. 2024, 56, 61–70.
Sapkota, R.; Ahmed, D.; Karkee, M. Comparing YOLOv8 and Mask R-CNN for instance segmentation in complex orchard environments. Artif. Intell. Agric. 2024, 13, 84–99.
Qin, Z.; Wang, W.; Dammer, K.H.; Guo, L.; Cao, Z. Ag-YOLO: A real-time low-cost detector for precise spraying with case study of palms. Front. Plant Sci. 2021, 12, 753603.
Badgujar, C.M.; Poulose, A.; Gan, H. Agricultural object detection with You Only Look Once (YOLO) algorithm: A bibliometric and systematic literature review. Comput. Electron. Agric. 2024, 223, 109090.
Han, H.; Xue, X.; Li, Q.; Gao, H.; Wang, R.; Jiang, R.; Ren, Z.; Meng, R.; Li, M.; Guo, Y.; et al. Pig-ear detection from thermal infrared images based on improved YOLOv8n. Intell. Robot. 2024, 4, 20–38.
Sikati, J.; Nouaze, J.C. YOLO-NPK: A lightweight deep network for lettuce nutrient deficiency classification based on improved YOLOv8 nano. Eng. Proc. 2023, 58, 31.
Wu, T.; Miao, Z.; Huang, W.; Han, W.; Guo, Z.; Li, T. SGW-YOLOv8n: An improved YOLOv8n-based model for apple detection and segmentation in complex orchard environments. Agriculture 2024, 14, 1958.
Wei, H.; Liu, X.; Xu, S.; Dai, Z.; Dai, Y.; Xu, X. DWRSeg: Rethinking efficient acquisition of multi-scale contextual information for real-time semantic segmentation. arXiv 2022, arXiv:2212.01173.
Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet classification with deep convolutional neural networks. In Proceedings of the 26th International Conference on Neural Information Processing Systems (NIPS), Lake Tahoe, NV, USA, 3–6 December 2012; pp. 1097–1105.
Zeiler, M.D.; Fergus, R. Visualizing and understanding convolutional networks. In Proceedings of the European Conference on Computer Vision (ECCV); Springer: Cham, Switzerland, 2014.
Sokolova, M.; Lapalme, G. A systematic analysis of performance measures for classification tasks. Inf. Process. Manag. 2009, 45, 427–437.
Baldi, P.; Brunak, S.; Chauvin, Y.; Andersen, C.A.F.; Nielsen, H. Assessing the accuracy of prediction algorithms for classification: An overview. Bioinformatics 2000, 16, 412–424.
PyTorch. PyTorch. Available online: https://pytorch.org (accessed on 30 April 2026).
Meyes, R.; Lu, M.; De Puiseau, C.W.; Meisen, T. Ablation studies in artificial neural networks. arXiv 2019, arXiv:1901.08644.
Xie, S.; Tu, Z. Holistically-nested edge detection. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 7–13 December 2015.
Selvaraju, R.R.; Cogswell, M.; Das, A.; Vedantam, R.; Parikh, D.; Batra, D. Grad-CAM: Visual explanations from deep networks via gradient-based localization. Int. J. Comput. Vis. 2020, 128, 336–359.

Figure 1. Example images showing original UAV images and cropped sub-images. (a) Dense weeds under cloudy light and (b) sparse weeds under sunny light.

Figure 2. Overall architecture of YOLO-CornSeg.

Figure 3. (a) presents a typical structure for drawing multi-scale contextual information from a single feature map, and (b) shows the improved structure in which RR and SR denote Region Residualization and Semantic Residualization, respectively; $d_{x}$, $x \in (1, n)$, represents a dilated depth-wise convolution with a dilation rate $d_{x}$.

Figure 4. (a) presents the structure of the bottleneck in the C2f module and (b) presents the structure of the C2f module.

Figure 5. (a) Structure of the Dilation-wise Residual (DWR) module, where RR and SR denote Region Residualization and Semantic Residualization, respectively; (b) structure of the C2f_DWR module.

Figure 6. Structure of the Segment_Efficient module.

Figure 7. Comparison of validation segmentation loss between YOLOv8n and YOLO-CornSeg.

Figure 8. Comparison of the mean average precision at IoU threshold 0.5 between YOLOv8n and YOLO-CornSeg.

Figure 9. Example of the segmentation results of corn seedlings generated by YOLO-CornSeg and YOLOv8n.

Figure 10. Comparison of the heat maps of YOLOv8n and YOLO-CornSeg using gradient-weighted class activation mapping.

Figure 11. The relationship between model size and mIoU for each algorithm. The horizontal axis represents model size, and the vertical axis represents mIoU. Points on the left side of the chart indicate smaller models, while points at the top of the chart indicate better segmentation performance.

Figure 12. Example of using object detection to generate bounding boxes to detect weeds.

Figure 13. Examples of using segmentation methods to generate masks to detect weeds.

Table 1. Division of training, validation, and testing datasets.

Dataset Type	Image Count	Percentage
Training	1200	60%
Validation	400	20%
Testing	400	20%

Table 2. Hyperparameter configurations for variant models.

Model	C2f_DWR	Segment_Efficient	Epoch	Batch Size	Initial Learning Rate	Weight Decay Coefficient
YOLOv8n			100	16	0.01	0.0005
VM1	√		100	16	0.01	0.0005
VM2		√	100	16	0.01	0.0005
Ours (YOLO-CornSeg)	√	√	100	16	0.01	0.0005

Note: VM, Variant model. √ indicates that the corresponding module was used in the model.

Table 3. Segmentation algorithm configurations for comparison experiments.

Deep Learning Algorithm	Backbone	Epoch	Batch Size	Initial Learning Rate	Weight Decay Coefficient	Model Stage Type
DeepLabv3 (R18)	ResNet18	100	16	0.01	0.0005	Two stages
DeepLabv3 (R50)	ResNet50	100	16	0.01	0.0005	Two stages
BiseNet (R18)	ResNet18	100	16	0.01	0.0005	Two stages
BiseNet (R50)	ResNet50	100	16	0.01	0.0005	Two stages
Swin Transformer	Vision Transformer	100	16	0.01	0.0005	Two stages
FastFCN	ResNet50	100	16	0.01	0.0005	Two stages
Ours (YOLO-CornSeg)	CSPDarknet	100	16	0.01	0.0005	One stage

Table 4. Ablation experiment results for YOLOv8n and variant models.

Model	C2f_DWR	Segment_Efficient	mIoU50 (%)	mIoU50-95 (%)	Precision (%)	Recall (%)	Parameters	GFLOPs (G)
YOLOv8n			89.2	60.7	91.0	81.0	3258259	12.0
VM1	√		90.0	61.5	90.5	82.6	3197203	11.9
VM2		√	90.0	60.8	91.0	82.1	4086035	11.9
Ours (YOLO-CornSeg)	√	√	91.1	63.1	90.8	84.5	4024979	11.8

Note: VM, variant model. √ indicates that the corresponding module was used in the model.

Table 5. Performance comparison of YOLO-CornSeg with other state-of-the-art segmentation algorithms.

Deep Learning Algorithm	mIoU (%)	Precision (%)	Recall (%)	Model Size (MB)
DeepLabv3 (R18)	78.2	87.8	87.3	112
DeepLabv3 (R50)	79.0	88.4	88.2	545.2
BiseNet (R18)	75.6	84.4	84.4	107.5
BiseNet (R50)	76.3	85.4	85.4	474.3
Swin Transformer	84.6	91.7	91.7	719.9
FastFCN	75.4	81.9	81.9	551.0
Ours (YOLO-CornSeg)	91.1	90.8	84.5	8.3

Table 6. Comparison of pixel counts between the bounding box method and the segmentation mask method.

Input Image	Bounding Box Method Pixel Count	Segmentation Mask Method Pixel Count	Difference (Pixel)	Percentage Difference (% of Segmentation Mask Count)
Image 1	45,372	47,148	1776	3.76
Image 2	28,207	31,897	3690	11.57
Image 3	112,430	200,503	88,073	43.93
Image 4	5044	38,068	33,024	86.75
Image 5	8251	14,388	6137	42.65

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
