Article

Weed Detection in Challenging Field Conditions: A Semi-Supervised Framework for Overcoming Shadow Bias and Data Scarcity

by Alzayat Saleh 1,*, Shunsuke Hatano 1 and Mostafa Rahimi Azghadi 1,2,3,*

1 College of Science and Engineering, James Cook University, Townsville, QLD 4811, Australia
2 Agriculture Technology and Adoption Centre, James Cook University, Townsville, QLD 4811, Australia
3 ARC Training Centre in Plant Biosecurity, James Cook University, Townsville, QLD 4811, Australia
* Authors to whom correspondence should be addressed.
Computers 2026, 15(3), 171; https://doi.org/10.3390/computers15030171
Submission received: 22 January 2026 / Revised: 28 February 2026 / Accepted: 4 March 2026 / Published: 6 March 2026
(This article belongs to the Special Issue Machine Learning: Innovation, Implementation, and Impact)

Abstract

The automated management of invasive weeds is critical for sustainable agriculture, yet the performance of deep learning models in real-world fields is often compromised by two factors: challenging environmental conditions and the high cost of data annotation. This study tackles both issues through a diagnostic-driven, semi-supervised framework. Using a unique dataset of approximately 975 labelled and 10,000 unlabelled images of Guinea Grass in sugarcane, we first establish strong supervised baselines for classification (ResNet) and detection (YOLO, RF-DETR), achieving F1 scores up to 0.90 and mAP50 scores exceeding 0.82. Crucially, this foundational analysis, aided by interpretability tools, uncovered a pervasive “shadow bias,” where models learned to misidentify shadows as vegetation. This diagnostic insight motivated our primary contribution: a semi-supervised pipeline that leverages unlabelled data to enhance model robustness. By training models on a more diverse set of visual information through pseudo-labelling, this framework not only helps mitigate the shadow bias but also provides a tangible boost in recall, a critical metric for minimising weed escapes in automated spraying systems. To validate our methodology, we demonstrate its effectiveness in a low-data regime on a public crop–weed benchmark. Our work provides a clear and field-tested framework for developing, diagnosing, and improving robust computer vision systems for the complex realities of precision agriculture.

1. Introduction

Weeds pose a persistent threat to global crop production, requiring extensive resources for manual removal or chemical intervention [1]. In the pursuit of more sustainable and efficient agricultural practices, precision agriculture has emerged as a transformative paradigm. By leveraging advanced sensing and automation, precision agriculture enables targeted interventions that can significantly reduce the economic and environmental costs associated with weed management. Central to this approach is the development of robust, real-time weed detection systems capable of operating effectively in diverse and dynamic field conditions [2,3].
The emergence of deep learning has revolutionised computer vision, and its application to weed detection is no exception. Convolutional Neural Networks (CNNs), particularly single-stage object detectors like You Only Look Once (YOLO) [4], have been widely adopted for their ability to balance high accuracy with real-time inference speeds, making them ideal for deployment on agricultural robots. Concurrently, newer architectures based on the Transformer model, such as the Detection Transformer (DETR) [5] and its real-time variants [6,7], offer a compelling end-to-end alternative that removes the need for many hand-designed components [8].
Despite these advancements, the practical deployment of such models in real-world agricultural environments reveals critical challenges that are often abstracted away in benchmark datasets. First, model performance can degrade substantially under the variable lighting, shadows, and occlusions inherent to field operations [9]. Our preliminary work identified that models can develop a strong “shadow bias,” learning to associate high-contrast shadows with vegetation rather than relying on true morphological features. We define “shadow bias” as a learned spurious correlation in which the model uses shadow edges and high-contrast boundaries as a proxy for vegetation presence, as distinct from general illumination variation. Second, the supervised learning paradigm, upon which these models are built, demands massive, meticulously annotated datasets. The manual labelling process represents both a significant resource bottleneck and a potential source of annotation error, with studies reporting that even trained agronomists can exhibit mislabelling rates as high as 12% [10].
Semi-supervised learning (SSL) has gained traction as a powerful strategy to mitigate this dependency on labelled data by leveraging the vast quantities of unlabelled imagery that are easy to collect [11]. While SSL has shown promise in agricultural segmentation tasks [12,13], and techniques like pseudo-labelling can boost recall in weed detection [14], a systematic comparison of how SSL impacts different object detection architectures (CNN vs. Transformer) in challenging field conditions remains underexplored.
This paper addresses these gaps by presenting a comprehensive framework for weed detection that tackles the dual problems of environmental variability and data scarcity. We report a systematic investigation that begins with supervised learning, uses interpretability to diagnose critical failure modes, and leverages semi-supervised learning as a practical solution.
Our research objectives are threefold: (1) to diagnose shadow-induced bias in weed detection models through interpretability analysis, (2) to develop a semi-supervised learning pipeline that leverages unlabelled data to mitigate this bias and improve recall, and (3) to compare CNN-based and Transformer-based detection architectures under identical conditions on both a proprietary field dataset and a public benchmark.
Our primary contributions are as follows:
  • A systematic comparative framework that benchmarks both quadrant-based classification (ResNet) and object detection (YOLO, RF-DETR), incorporating Grad-CAM interpretability analysis to diagnose “shadow bias” as a critical failure mode in agricultural vision systems.
  • A diagnostic-driven semi-supervised learning pipeline that integrates unlabelled data through single-pass pseudo-labelling, demonstrating measurable improvements in recall and generalisation under challenging field conditions.
  • A rigorous and transparent evaluation methodology, including explicit data leakage prevention, cross-architecture comparison with Optuna hyperparameter optimisation, and external validation on the public CropAndWeed benchmark.

2. Related Work

The automated identification of weeds has been a longstanding goal in precision agriculture, with methodologies evolving in sophistication alongside advances in computer vision and deep learning. This section reviews the key technological paradigms, from classical techniques to modern deep learning architectures, and situates our work within the current research landscape.

2.1. Classical and Early Machine Learning Approaches

Early attempts at automated weed detection relied on classical computer vision techniques, which typically involved a two-step process of feature engineering followed by classification. These methods often utilised hand-crafted features based on colour, shape, and texture. For instance, colour-space transformations and thresholding of vegetation indices were common for segmenting plants from soil backgrounds [15]. Subsequent classification was performed using traditional machine learning algorithms like Support Vector Machines (SVMs) or Random Forests [16]. While these methods were computationally efficient and required relatively small datasets, their performance was fragile. They often failed to generalise across varying field conditions, being highly sensitive to changes in illumination, soil type, and plant growth stages [16,17]. Their reliance on manual feature engineering limited their ability to capture the complex hierarchical patterns needed for robust weed identification.

2.2. Supervised Deep Learning for Weed Detection

The deep learning revolution marked a paradigm shift in weed detection, with Convolutional Neural Networks (CNNs) becoming the dominant approach. CNNs eliminate the need for manual feature engineering by learning discriminative representations directly from pixel data. Initial work focused on image-level classification, where models like Inception and ResNet were trained to distinguish between different weed species and crops [18]. The creation of large-scale public datasets, such as DeepWeeds [19], was instrumental in this progress, enabling models to achieve classification accuracies exceeding 95% on multi-species tasks. Architectures like ResNet [20], in particular, became a staple in agricultural vision due to their ability to train very deep networks effectively.
More recently, the focus has shifted from classification to the more challenging task of object detection, which provides the precise location of each weed instance required for targeted spraying. Two-stage detectors like Faster R-CNN have been successfully applied but often at the expense of real-time performance [21]. Consequently, single-stage detectors like You Only Look Once (YOLO) and its variants have gained prominence for their impressive balance of speed and accuracy [22]. Other single-stage models like RetinaNet have also been explored, particularly for their use of a focal loss to handle the class imbalance between sparse weeds and dense crops [23]. While these CNN-based detectors have set high benchmarks, they can still struggle with detecting very small or heavily occluded weeds.

2.3. Transformer-Based Architectures in Agriculture

Inspired by their success in natural language processing, Transformer-based architectures have recently been adapted for computer vision. The Detection Transformer (DETR) [5] introduced a fully end-to-end object detection pipeline, eliminating the need for hand-tuned components like anchors and non-maximum suppression (NMS). While pioneering, the original DETR suffered from slow convergence and high computational cost. Subsequent research has focused on developing more efficient variants. In agriculture, DETR-like models are beginning to be explored for tasks like weed detection [24,25]. For instance, RF-DETR [26] represents a recent effort to create a real-time, high-performance Transformer-based detector. However, the application and systematic comparison of these models in high-similarity crop–weed environments remains limited, with few studies directly benchmarking their performance against highly optimised CNNs under identical conditions.

2.4. Semi-Supervised Learning to Reduce Annotation Burden

A significant and practical barrier to deploying deep learning models in agriculture is the substantial cost and effort required for data annotation [27]. Semi-supervised learning (SSL) directly addresses this by leveraging large amounts of unlabelled data alongside a smaller labelled set. A common and effective SSL technique is pseudo-labelling, where a “teacher” model trained on labelled data generates annotations for unlabelled images. These pseudo-labelled samples are then used to train a “student” model, enhancing its generalisation. In agriculture, SSL has been successfully used to improve model accuracy while drastically reducing labelling requirements [14,28]. However, much of the existing work has focused on classification or segmentation tasks. The systematic application of SSL to different object detection architectures (i.e., YOLO vs. DETR) for weed detection, and an analysis of its impact on robustness against real-world challenges like “shadow bias,” is an underexplored area.
Our work bridges these research directions. We provide a direct, rigorous comparison of state-of-the-art CNN (YOLO) and Transformer (RF-DETR) architectures on a challenging, field-collected dataset. Crucially, we extend this comparison to the semi-supervised domain, evaluating how pseudo-labelling impacts the performance and robustness of each architectural paradigm, thereby addressing a key gap in the current literature.

3. Experiments

Our experimental methodology was designed to systematically evaluate and compare different approaches for weed detection under realistic and challenging field conditions. We structured our investigation into two primary pipelines: (1) a quadrant-based binary classification approach to detect weed presence within image regions, and (2) a bounding-box-based object detection approach for precise localisation.

3.1. Dataset Curation and Preparation

Our experimental evaluation is built upon a field-collected dataset designed to reflect operational conditions in sugarcane agriculture. All data was collected from sugarcane paddocks heavily infested with Guinea Grass.

3.1.1. Labelled Datasets: A and B

Our initial analysis revealed that model performance was highly dependent on environmental conditions, as quantified by the performance gap between Dataset A and Dataset B reported in Section 4. To study this systematically, we partitioned our labelled data into two distinct sets, as shown in Figure 1:
  • Dataset A: Comprising images from paddocks ‘mw5_1330’ and ‘mw5_1331’, this set features relatively clear, well-lit conditions, serving as our baseline for model performance. Dataset A contains approximately 620 images at 4000 × 3000 pixel resolution, with 4442 sugarcane and 351 Guinea Grass bounding-box annotations, collected under predominantly sunny midday conditions.
  • Dataset B: A more challenging compilation including images from ‘mw5_1327’ and ‘paddock_wt2’. Dataset B contains approximately 355 images with 506 sugarcane and 281 Guinea Grass annotations. This set is characterised by darker images, strong, inconsistent shadows, and higher visual similarity between crop and weed, designed specifically to test model robustness under adverse conditions. Quantitative illumination statistics are not available for these subsets; the qualitative distinction is supported by Figure 1 and the performance differential in our experiments.
Together, these sets contain approximately 975 labelled images. Table 1 presents the detailed distribution of annotations.

3.1.2. Quadrant Splitting and Label Generation for Classification

For the classification task, each high-resolution source image $I$ was divided into four non-overlapping quadrants, $\{\text{quadrant}_i\}_{i=1}^{4}$. A quadrant was assigned a binary label $y_i \in \{0, 1\}$ based on the presence of Guinea Grass (GG). A quadrant was labelled as positive ($y_i = 1$) if any single GG bounding box within it satisfied the following condition:

$$\frac{\text{Area}(\text{box}_{gg} \cap \text{quadrant}_i)}{\text{Area}(\text{box}_{gg})} \geq \tau$$
where the overlap threshold τ was set to 0.33. This strategy, visualised in Figure 2, ensures that only quadrants with significant weed presence are considered positive, minimising noise from marginal overlaps. The threshold τ = 0.33 requires at least one-third of a bounding-box area to overlap with a quadrant, ensuring meaningful spatial association. Preliminary experiments with τ = 0.25 produced excessive false positives, while τ = 0.50 overly reduced the number of positive quadrants; 0.33 provided the best balance. For the detection task, we found that quadrant-based splitting was detrimental, as splitting often truncated objects and degraded performance. Therefore, all detection models were trained on full, unsplit images.
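To make this labelling rule concrete, the following Python sketch implements the quadrant split and the overlap test; the function names and box format (x1, y1, x2, y2 pixel coordinates) are illustrative assumptions rather than an excerpt from our codebase.

```python
def box_area(box):
    """Area of an axis-aligned box given as (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box
    return max(0.0, x2 - x1) * max(0.0, y2 - y1)

def intersection_area(box_a, box_b):
    """Area of the overlap between two (x1, y1, x2, y2) boxes."""
    x1 = max(box_a[0], box_b[0]); y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2]); y2 = min(box_a[3], box_b[3])
    return max(0.0, x2 - x1) * max(0.0, y2 - y1)

def label_quadrants(image_w, image_h, gg_boxes, tau=0.33):
    """Return a binary label per quadrant: 1 if any Guinea Grass box has at
    least `tau` of its own area falling inside that quadrant."""
    half_w, half_h = image_w / 2, image_h / 2
    quadrants = [
        (0, 0, half_w, half_h),              # top-left
        (half_w, 0, image_w, half_h),        # top-right
        (0, half_h, half_w, image_h),        # bottom-left
        (half_w, half_h, image_w, image_h),  # bottom-right
    ]
    labels = []
    for quad in quadrants:
        positive = any(
            intersection_area(box, quad) / max(box_area(box), 1e-9) >= tau
            for box in gg_boxes
        )
        labels.append(int(positive))
    return labels

# Example: a 4000 x 3000 image with one GG box straddling the top two quadrants.
print(label_quadrants(4000, 3000, [(1800, 200, 2600, 900)]))  # -> [0, 1, 0, 0]
```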

3.1.3. Handling Class Imbalance

As shown in Table 1, the raw dataset exhibited a significant class imbalance, with sugarcane instances far outnumbering Guinea Grass instances. To address this, we downsampled the training set by removing images containing only sugarcane. This rebalancing was crucial for improving the recall of the minority class (Guinea Grass), which is the primary target of our detection system.
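For illustration, the sketch below shows one way this rebalancing step can be applied, assuming a simple per-image dictionary of annotation records; the data structure and function name are hypothetical.

```python
def downsample_majority_only_images(annotations, minority_class="guinea_grass"):
    """Keep every image containing at least one minority-class box;
    drop images whose boxes are all from the majority class (sugarcane)."""
    return {
        image_id: boxes
        for image_id, boxes in annotations.items()
        if any(box["class"] == minority_class for box in boxes)
    }

# Example: only the first image contains Guinea Grass and is kept.
train_annotations = {
    "img_001.jpg": [{"class": "sugarcane"}, {"class": "guinea_grass"}],
    "img_002.jpg": [{"class": "sugarcane"}],
}
print(list(downsample_majority_only_images(train_annotations)))  # ['img_001.jpg']
```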

3.1.4. Dataset Integrity and Validation Strategy

All labelled data was partitioned into training (70%), validation (20%), and testing (10%) sets. We adhered to a strict validation protocol across all experiments to ensure the scientific validity of our results. In multi-stage training pipelines, such as those involving pretraining on one dataset and fine-tuning on another, it is critical to prevent any overlap between the final test set and any data used during any training phase. During initial exploratory experiments, we discovered that sequential frame extraction from video footage had introduced near-duplicate images across training and test splits, resulting in anomalously high test performance that did not generalise to new footage. Upon identifying this data leakage, we implemented corrective measures: (a) complete re-splitting of the dataset with temporal separation between splits to eliminate near-duplicates, (b) automated deduplication checks based on perceptual hashing, and (c) retraining all models from scratch on the corrected splits. This rigorous separation ensures that our metrics are a true and unbiased measure of the models’ generalisation capabilities.
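As an illustration of the automated deduplication check, the following sketch uses perceptual hashing (via the third-party Pillow and imagehash packages) to flag test images that are near-duplicates of training frames; the choice of hash function and the Hamming-distance threshold are assumptions, not a description of our exact tooling.

```python
from pathlib import Path
from PIL import Image
import imagehash  # third-party: pip install imagehash

def find_cross_split_duplicates(train_dir, test_dir, max_distance=5):
    """Flag test images whose perceptual hash is within `max_distance` bits
    of any training image (likely near-duplicate video frames)."""
    train_hashes = {
        p.name: imagehash.phash(Image.open(p))
        for p in Path(train_dir).glob("*.jpg")
    }
    duplicates = []
    for p in Path(test_dir).glob("*.jpg"):
        h = imagehash.phash(Image.open(p))
        for train_name, train_hash in train_hashes.items():
            if h - train_hash <= max_distance:  # Hamming distance between hashes
                duplicates.append((p.name, train_name))
                break
    return duplicates
```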

3.1.5. Public Benchmark Dataset for Method Validation

To validate the generalisability of our proposed semi-supervised learning methodology, we also conduct experiments on a publicly available dataset, the CropAndWeed Image Dataset [29]. We explicitly note that the characteristics of this dataset differ from our own; CropAndWeed largely contains images of individual seedlings under more controlled lighting, making it ideal for detection tasks. In contrast, our proprietary dataset is designed to simulate complex detection scenarios with high foliage density and severe lighting variations. Therefore, the purpose of this experiment is not to directly compare raw mAP scores between datasets, but to verify that our semi-supervised pipeline provides a consistent performance improvement over a supervised baseline, even on data with different characteristics.

3.2. Supervised Learning Pipelines

3.2.1. Model Architectures

Our experiments compare leading architectures from both CNN and Transformer families. For classification, we used a ResNet-50 [20] backbone. For detection, we benchmarked YOLOv12-s, a highly optimised and lightweight CNN-based detector, against RF-DETR [26], a state-of-the-art real-time Transformer-based detector. The RF-DETR model used was the ‘RF-DETR-base’, which has a comparable complexity to a medium-sized YOLO model. The overall structure of this supervised pipeline is illustrated in Figure 3.
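For the classification branch, adapting the ResNet-50 backbone to the binary weed/no-weed quadrant task amounts to replacing its final fully connected layer; the torchvision-based sketch below illustrates this setup, with the custom head sizes being illustrative assumptions rather than our exact configuration.

```python
import torch
import torch.nn as nn
from torchvision import models

# ResNet-50 backbone pretrained on ImageNet, with the 1000-way head replaced
# by a small binary classifier for weed-positive vs weed-negative quadrants.
backbone = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
backbone.fc = nn.Sequential(
    nn.Linear(backbone.fc.in_features, 256),  # head sizes are illustrative
    nn.ReLU(inplace=True),
    nn.Dropout(p=0.3),
    nn.Linear(256, 2),
)

# Forward pass on a dummy batch of quadrant crops resized to 224 x 224.
logits = backbone(torch.randn(4, 3, 224, 224))
print(logits.shape)  # torch.Size([4, 2])
```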

3.2.2. Implementation and Hyperparameter Optimisation

Models were implemented in PyTorch 2.3 and trained on a distributed system with NVIDIA RTX 3050 and 4060 Ti GPUs. We conducted an extensive hyperparameter search for all models using the Optuna framework [30]. The search space included learning rate, weight decay, momentum, loss function coefficients, and the intensity of various data augmentations (e.g., flips, rotation degrees, colour jitter). Training protocols were tailored to each architecture. The YOLO model was trained for up to 1000 epochs with an early stopping mechanism to prevent overfitting. In contrast, the RF-DETR model was trained for approximately 150 epochs without early stopping, saving the best-performing weights throughout the training process. Each model was trained to minimise the task-specific loss $\mathcal{L}$ over the training dataset $\mathcal{D}_{train}$:

$$\hat{\theta} = \arg\min_{\theta} \sum_{(x, y) \in \mathcal{D}_{train}} \mathcal{L}(f(x; \theta), y)$$

where $f(x; \theta)$ is the model's prediction for input $x$.
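The Optuna search can be expressed as an objective function over this search space; the sketch below is a simplified illustration in which the parameter ranges and the `train_and_evaluate` stand-in are placeholders rather than our exact configuration.

```python
import optuna

def train_and_evaluate(config):
    """Placeholder for a full training run: trains a detector with `config`
    and returns the validation mAP@50. Replaced by a dummy score here so the
    sketch runs standalone."""
    return 0.8 - abs(config["lr0"] - 1e-3)  # dummy objective for illustration

def objective(trial):
    # Sample a candidate configuration from the search space described above.
    config = {
        "lr0": trial.suggest_float("lr0", 1e-5, 1e-2, log=True),
        "weight_decay": trial.suggest_float("weight_decay", 1e-6, 1e-2, log=True),
        "momentum": trial.suggest_float("momentum", 0.8, 0.99),
        "fliplr": trial.suggest_float("fliplr", 0.0, 0.5),     # horizontal-flip probability
        "degrees": trial.suggest_float("degrees", 0.0, 15.0),  # rotation range
        "hsv_v": trial.suggest_float("hsv_v", 0.0, 0.5),       # colour-jitter intensity
    }
    return train_and_evaluate(config)

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=50)
print(study.best_params)
```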

3.3. Semi-Supervised Learning Pipeline

To address the annotation bottleneck, we implemented a single-pass pseudo-labelling pipeline as illustrated in Figure 4. This approach leverages our ∼10,000 unlabelled images ($\mathcal{D}_u$) alongside our smaller labelled set ($\mathcal{D}_l$).
  • Train Teacher: A teacher model, $f(x; \hat{\theta}_{teacher})$, is first trained exclusively on the labelled set $\mathcal{D}_l$.
  • Generate Pseudo-Labels: The teacher model is used to predict bounding boxes and class labels, $\hat{y}_j$, for each image $x_j^u \in \mathcal{D}_u$. Predictions were filtered using class-specific confidence thresholds: $c = 0.5$ uniformly for YOLOv12-s, and a mixed strategy ($c = 0.8$ for Guinea Grass, $c = 0.5$ for sugarcane) for RF-DETR. Bounding boxes smaller than 32 × 32 pixels were removed. This yielded approximately 8200 and 6500 pseudo-labelled images for YOLO and RF-DETR, respectively, from 10,000 unlabelled images. A sketch of this filtering step is given after this list.
  • Train Student: A final student model, $f(x; \theta_{student})$, is then trained on the combined dataset $\mathcal{D}_l \cup \mathcal{D}_u$. Its training objective is a weighted combination of the supervised loss on labelled data and the pseudo-supervised loss on unlabelled data:

$$\mathcal{L}_{total} = \mathcal{L}_{sup} + \lambda \mathcal{L}_{pseudo} = \frac{1}{N_l} \sum_{i=1}^{N_l} \mathcal{L}_{det}(x_i^l, y_i^l) + \lambda \frac{1}{N_u} \sum_{j=1}^{N_u} \mathbb{I}(\hat{p}_j > c) \cdot \mathcal{L}_{det}(x_j^u, \hat{y}_j)$$

    where $\mathcal{L}_{det}$ is the standard detection loss (e.g., from YOLO or DETR), $\lambda$ is a weighting hyperparameter, and $\mathbb{I}(\cdot)$ is an indicator function that includes an unlabelled sample only if its predicted confidence $\hat{p}_j$ exceeds the threshold $c$. The weighting hyperparameter was set to $\lambda = 0.5$ in all experiments, prioritising the supervised loss. No EMA decay was used, as our single-pass pipeline employs a fixed teacher.
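Below is a minimal, framework-agnostic sketch of the pseudo-label filtering step referenced in the list above (class-specific confidence thresholds plus the 32 × 32 pixel size filter); the prediction dictionary format is an assumption for illustration.

```python
def filter_pseudo_labels(predictions, thresholds, min_size=32):
    """Keep teacher predictions that pass a class-specific confidence threshold
    and a minimum bounding-box size of `min_size` pixels. Each prediction is
    assumed to be {"class", "confidence", "box"} with the box as (x1, y1, x2, y2)."""
    kept = []
    for pred in predictions:
        x1, y1, x2, y2 = pred["box"]
        if (x2 - x1) < min_size or (y2 - y1) < min_size:
            continue  # drop tiny, unreliable boxes
        if pred["confidence"] >= thresholds[pred["class"]]:
            kept.append(pred)
    return kept

# RF-DETR teacher: stricter threshold for the harder Guinea Grass class.
rfdetr_thresholds = {"guinea_grass": 0.8, "sugarcane": 0.5}
# YOLOv12-s teacher: a uniform 0.5 threshold for both classes.
yolo_thresholds = {"guinea_grass": 0.5, "sugarcane": 0.5}

teacher_preds = [
    {"class": "guinea_grass", "confidence": 0.86, "box": (100, 120, 400, 380)},
    {"class": "sugarcane", "confidence": 0.55, "box": (10, 10, 30, 30)},  # too small, dropped
]
print(filter_pseudo_labels(teacher_preds, rfdetr_thresholds))  # keeps only the first prediction
```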

3.4. Evaluation Metrics

For classification, we report accuracy, precision, recall, and the F1 score ($F_1 = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$), with F1 as the primary metric given class imbalance. For object detection, we adopt the COCO evaluation protocol and report mAP@50 and mAP@50-95 (averaged across IoU thresholds 0.5 to 0.95 in steps of 0.05), along with precision and recall. High recall is particularly critical in weed spraying applications to minimise escapes.
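For reference, the classification metrics reduce to simple ratios over confusion-matrix counts, as in the short sketch below (the example counts are hypothetical).

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall, and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Example with hypothetical quadrant-level counts.
print(classification_metrics(tp=90, fp=15, fn=10, tn=385))
```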
Our experimental design follows standard protocols: systematic comparison of YOLOv12-s and RF-DETR, Optuna-based hyperparameter optimisation, fixed 70%/20%/10% splits, and evaluation on a held-out test set never used during training. Reproducibility is ensured through fixed random seeds and consistent data partitions.

4. Results

In this section, we present the empirical outcomes of our experimental pipelines. We begin by detailing the performance of our supervised classification (SC) models, including a critical interpretability analysis that guided our subsequent work. We then present the comprehensive results from our object detection experiments, comparing fully supervised and semi-supervised approaches across both CNN and Transformer architectures.

4.1. Fully Supervised Classification

Our initial investigation focused on a quadrant-based classification pipeline using a ResNet-50 model. We first evaluated the model on the simpler Dataset A alone to establish a performance baseline, after which we trained on the more complex combined A + B dataset to assess robustness. The final results after extensive hyperparameter tuning are summarised in Table 2. The model achieved a strong F1 score of 0.88 on the Dataset A test set. When trained on the combined dataset, the model achieved a comparable test F1 of 0.88 from scratch, which was further improved to 0.89 by using the best-performing weights as a pretrained initialisation for a final tuning run.

4.1.1. Analysis of Model Behaviour

To better understand the model’s decision-making process, we analysed its performance using confusion matrices and t-SNE visualisations. The confusion matrices in Figure 5 show that while both training strategies are effective, the pretrained initialisation slightly reduces misclassifications. The t-SNE projections (Figure 6) corroborate this, showing slightly clearer separation between the sugarcane and Guinea Grass clusters for the model initialised with pretrained weights.

4.1.2. Diagnostic Finding: “Shadow Bias”

Despite the high F1 score, we sought to understand the sources of remaining error. We employed Grad-CAM to visualise the classifier’s attention, leading to a critical insight. As shown in Figure 7, the model frequently attended to high-contrast shadows or irrelevant background textures rather than the actual morphological features of the weeds. This discovery of a “shadow bias” demonstrated that a robust solution required more than simple classification; it necessitated a model with stronger spatial understanding, motivating our subsequent shift to object detection.
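For readers wishing to reproduce this diagnostic step, the sketch below shows a generic Grad-CAM implementation for a ResNet-50 classifier using a forward hook on the final convolutional block; it illustrates the technique in general rather than the exact tooling used in this study.

```python
import torch
import torch.nn.functional as F
from torchvision import models

model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2).eval()
activations, gradients = {}, {}

def save_activation(module, inputs, output):
    # Cache the last convolutional block's feature maps and register a tensor
    # hook to capture their gradients during the backward pass.
    activations["value"] = output
    output.register_hook(lambda grad: gradients.update(value=grad))

model.layer4.register_forward_hook(save_activation)

def grad_cam(image_tensor, class_index=None):
    """Return an H x W heatmap in [0, 1] showing which regions most increase
    the score of `class_index` (defaults to the predicted class)."""
    logits = model(image_tensor)
    if class_index is None:
        class_index = logits.argmax(dim=1).item()
    model.zero_grad()
    logits[0, class_index].backward()

    acts, grads = activations["value"], gradients["value"]
    weights = grads.mean(dim=(2, 3), keepdim=True)            # global-average-pooled gradients
    cam = F.relu((weights * acts).sum(dim=1, keepdim=True))   # weighted activation map
    cam = F.interpolate(cam, size=image_tensor.shape[-2:], mode="bilinear", align_corners=False)
    cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)
    return cam[0, 0].detach()

heatmap = grad_cam(torch.randn(1, 3, 224, 224))  # e.g. a 224 x 224 quadrant crop
print(heatmap.shape)  # torch.Size([224, 224])
```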

4.2. Semi-Supervised Classification Results

To leverage unlabelled data, we extended the best supervised model to incorporate the ∼10,000 unlabelled images via single-pass pseudo-labelling. A teacher model (SC2) was used to generate labels on the unlabelled set, with a high confidence threshold of 0.99 applied to ensure label quality. This process yielded 1390 pseudo-labelled images. As shown in Table 3, this approach resulted in a marginal improvement in the test F1 score from 0.89 to 0.90. While positive, the modest size of this gain further suggested that a classification framework might be reaching its performance ceiling for this complex task.

4.3. Object Detection Performance

Motivated by the limitations of the classification approach, we transitioned to object detection to enable precise localisation. We performed all detection experiments on full images, as our preliminary analysis showed that quadrant splitting degraded performance by truncating objects at boundaries.

4.3.1. Fully Supervised Detection Baselines

We first established supervised baselines for YOLOv12-s and RF-DETR on the combined A + B dataset. Training curves for representative runs are shown in Figure 8, illustrating that YOLOv12-s converges more rapidly than RF-DETR under our experimental conditions. The comprehensive results are presented in Table 4. The highly tuned YOLOv12-s model (SD26) achieved a strong mAP@50 of 0.807 and an mAP@50-95 of 0.543. The RF-DETR model (SD27), with minimal augmentation, achieved a respectable mAP@50 of 0.777, demonstrating the viability of Transformer-based approaches.

4.3.2. Semi-Supervised Detection Enhancements

We then applied our single-pass pseudo-labelling pipeline to the detection models. Based on our analysis of teacher model confidence distributions, we used a uniform 0.5 confidence threshold for YOLO, while for RF-DETR we employed a mixed-threshold strategy (0.8 for the difficult Guinea Grass class, 0.5 for others) to maximise label quality.
As shown in Table 5, the impact of semi-supervised detection (SSD) was substantial. The semi-supervised YOLOv12-s model (SSD8) improved its mAP@50 to 0.828. More critically for practical applications, its recall increased from 0.771 to 0.782. This reduction in false negatives (missed weeds) is a key outcome, as it directly translates to more effective field interventions. The semi-supervised RF-DETR model (SSD10) also showed improvement, reaching an mAP@50 of 0.783.

4.3.3. Qualitative Analysis and Public Benchmark Validation

To provide illustrative examples of typical model behaviour, a qualitative comparison of the detection models is presented in Figure 9. These examples are selected to demonstrate characteristic strengths and failure modes and are not intended as quantitative evidence; the quantitative evaluation is provided in the preceding tables. Finally, to validate the generalisability of our SSD methodology, we conducted an experiment on the public CropAndWeed dataset [29] under a low-data regime (10% labelled data). As shown in Table 6, our SSD pipeline outperformed the supervised baseline, confirming its utility as a general technique for reducing annotation costs in agricultural vision.

5. Discussion

Our results show that a diagnostic-driven approach (progressing from supervised classification through interpretability analysis to semi-supervised detection) yields measurable improvements in accuracy and recall. The semi-supervised YOLOv12-s achieved an mAP@50 of 0.828 (up from 0.807) with recall increasing from 0.771 to 0.782, and the pipeline also improved performance on the CropAndWeed benchmark (mAP@50: 0.90 to 0.91). We discuss these findings below.

5.1. From Classification to Detection: The Necessity of Spatial Awareness

Our initial approach using a quadrant-based ResNet-50 classifier achieved a high F1 score of 0.89. However, a purely metric-based assessment was misleading. Our critical finding, derived from Grad-CAM interpretability analysis (Figure 7), was the model’s development of a “shadow bias.” The model appeared to learn to associate high-contrast shadows with the presence of vegetation, a spurious correlation that limited its ability to generalise. This underscores a fundamental limitation of classification for dense-canopy tasks: it lacks the inherent spatial localisation needed to distinguish target objects from their complex surroundings. This diagnostic step was crucial, as it justified our pivot to object detection, which is generally better suited for precise localisation and mitigating the impact of confounding background features.

5.2. Architectural Comparison: The Value of Specialisation vs. Potential of Transformers

In our supervised detection experiments, the highly tuned YOLOv12-s consistently outperformed RF-DETR. We attribute this performance gap not necessarily to an inherent superiority of CNNs, but likely to the maturity of the YOLO ecosystem. YOLOv12-s benefits from a powerful, built-in augmentation pipeline and has been subject to years of community-driven optimisation for speed and accuracy. Our hyperparameter sweeps (Figure 10) confirm this, showing that YOLO’s performance is highly sensitive to geometric and colour augmentations. In contrast, RF-DETR, a more recent architecture, required more careful tuning of its internal parameters (e.g., learning rates for the encoder) and was tested with minimal augmentation in our baseline. While our RF-DETR models achieved a respectable mAP of ∼0.78, suggesting strong potential, closing the performance gap may require a more extensive, domain-specific augmentation strategy and hyperparameter search. This finding suggests that while Transformers are promising, conclusions about the relative merits of CNN vs. Transformer architectures should be drawn cautiously, as they are specific to the configurations and tuning budgets evaluated in this study.

5.3. The Practical Impact of Semi-Supervised Learning

A notable outcome of our study is the observed benefit of semi-supervised learning. While the final mAP improvement from SSL was modest (from 0.807 to 0.828 for YOLO), the practical implication lies in the accompanying boost in recall. In precision agriculture, particularly for automated spraying, a false negative (a missed weed) is far more costly than a false positive. A missed weed survives to compete with crops and produce seeds, leading to exponential future infestations. The roughly one-percentage-point absolute increase in recall (0.771 to 0.782) for our YOLO model means fewer weeds are missed in the field, a meaningful improvement for any real-world deployment. Furthermore, our validation on a public benchmark provided evidence that our SSL pipeline may serve as a generalisable technique for substantially reducing annotation requirements, a key bottleneck in scaling agricultural AI.
Although the absolute numerical gains are modest, each percentage point of recall improvement corresponds to proportionally fewer weeds escaping treatment across large paddock areas. Given the exponential reproductive potential of escaped weeds, even small recall improvements translate to substantial long-term reductions in weed pressure and herbicide use.

5.4. Limitations

The following methodological constraints should be considered when interpreting our results:
  • Proprietary Dataset: Our field-collected dataset is not publicly available, limiting direct reproducibility. Validation on the public CropAndWeed benchmark partially mitigates this concern.
  • Single-Pass SSL: We employed a single-pass pseudo-labelling strategy for computational efficiency, which may underperform more advanced iterative or consistency-based SSL methods.
  • Unequal Architecture Tuning: Our RF-DETR models were not subjected to the same degree of augmentation and tuning as our YOLO models, meaning performance differences between architectures should be interpreted cautiously.
  • Temporal and Environmental Scope: The dataset was collected at a single geographic location during a specific growth stage. Plant morphology, canopy density, and shadow patterns vary with phenological stage and seasonal conditions.
  • Single Crop–Weed Pair: All primary experiments involve sugarcane and Guinea Grass only.
  • No Statistical Significance Testing: Due to the computational cost of Optuna-based training, we do not report confidence intervals or multi-seed evaluations, though consistent improvements across architectures and benchmarks provide indirect robustness evidence.
  • No Edge-Device Evaluation: Inference speed and power consumption on deployment hardware were not assessed.

5.5. Future Directions

Building on the identified limitations, key avenues for future work include:
  • Creating and releasing a large-scale, high-density public benchmark for crop–weed detection in complex field conditions.
  • Investigating advanced SSL techniques such as iterative teacher–student loops, consistency regularisation, and domain adaptation.
  • Conducting a more exhaustive hyperparameter search for Transformer-based detectors with domain-specific augmentation strategies.
  • Performing cross-season, cross-location, and multi-species validation to establish temporal and geographic robustness.
  • Evaluating model quantisation and pruning for edge deployment on agricultural robots.
  • Reporting multi-seed evaluations with confidence intervals for formal statistical validation.
In summary, our work provides a pathway for developing weed detection systems under challenging field conditions by diagnosing model failures and leveraging unlabelled data. Broader applicability requires validation across diverse environments, species, and seasons.

6. Conclusions

In this work, we presented a comprehensive and systematic investigation into the detection of Guinea Grass in challenging, real-world sugarcane fields. Our study addressed the dual challenges of high crop–weed visual similarity and performance degradation due to environmental variability, as evidenced by the Grad-CAM analysis (Figure 7) revealing shadow-driven attention patterns and the substantial performance gap between the simpler Dataset A and the challenging Dataset B conditions. Our progression from classification to object detection demonstrated that a thorough, diagnostic-driven approach is critical for developing effective agricultural vision systems.
Our key contribution lies not in a single model, but in the methodology itself. We showed that interpretability tools like Grad-CAM are invaluable for uncovering subtle failure modes, such as the “shadow bias,” which can mislead standard accuracy metrics. This insight guided our transition to object detection, where we benchmarked state-of-the-art CNN (YOLOv12-s) and Transformer (RF-DETR) architectures. While a highly tuned YOLO model established the strongest supervised baseline, our most significant finding was the practical utility of semi-supervised learning. By leveraging a large pool of unlabelled data through a simple and efficient pseudo-labelling pipeline, we were able to enhance model performance, most notably achieving a crucial increase in recall. This directly translates to fewer missed weeds, a paramount objective for automated spraying systems.
Quantitatively, our best semi-supervised YOLOv12-s model achieved an mAP@50 of 0.828, an improvement from the supervised baseline of 0.807. Recall increased from 0.771 to 0.782, while the classification pipeline reached an F1 score of 0.90. On the public CropAndWeed benchmark under a low-data regime, our SSL pipeline improved mAP@50 from 0.90 to 0.91, providing evidence of cross-dataset applicability.
These results are subject to important caveats: the primary dataset is from a single geographic location and growing season, involves a single crop–weed pair, and multi-seed statistical testing is not provided. Operational deployment would require additional field trials, edge-device optimisation, and multi-species validation. Nevertheless, our findings demonstrate that combining interpretability analysis with semi-supervised learning offers a practical strategy for developing robust weed detection systems under challenging field conditions.

Author Contributions

A.S.: Conceptualisation, Data Curation, Data Analysis, Supervision, Reviewing/editing the draft; S.H.: Conceptualisation, Data Curation, Data Analysis, Software Development, DL Algorithm Design, Visualization, Writing original draft; M.R.A.: Conceptualisation, Data Curation, Data Analysis, Supervision, Reviewing/editing the draft. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Australian Government’s Reef Trust and the Great Barrier Reef Foundation.

Data Availability Statement

The proprietary Guinea Grass dataset that supports the findings of this study was collected under a research agreement and is not publicly available due to commercial sensitivity and privacy restrictions. The public CropAndWeed dataset used for methodology validation is available at [29].

Acknowledgments

The authors extend their thanks to their industry partner, AutoWeed, and especially to Alex Olsen, for collecting the Guinea Grass data used in this study, and to James Cook University for providing the infrastructure and computational resources needed to conduct this study. During the preparation of this work, the authors used Gemini 2.5 Pro, a large language model from Google, to assist with language editing. After using this tool, the authors reviewed and edited the content as needed and take full responsibility for the content of the publication.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Oerke, E.C. Crop losses to pests. J. Agric. Sci. 2006, 144, 31–43. [Google Scholar] [CrossRef]
  2. Azghadi, M.R.; Olsen, A.; Wood, J.; Saleh, A.; Calvert, B.; Granshaw, T.; Fillols, E.; Philippa, B. Precision robotic spot-spraying: Reducing herbicide use and enhancing environmental outcomes in sugarcane. Comput. Electron. Agric. 2025, 235, 110365. [Google Scholar] [CrossRef]
  3. Lammie, C.; Olsen, A.; Carrick, T.; Rahimi Azghadi, M. Low-power and high-speed deep FPGA inference engines for weed classification at the edge. IEEE Access 2019, 7, 51171–51184. [Google Scholar] [CrossRef]
  4. Redmon, J.; Farhadi, A. YOLOv3: An incremental improvement. arXiv 2018, arXiv:1804.02767. [Google Scholar] [CrossRef]
  5. Carion, N.; Massa, F.; Synnaeve, G.; Usunier, N.; Kirillov, A.; Zagoruyko, S. End-to-End Object Detection with Transformers. In Proceedings of the European Conference on Computer Vision (ECCV); LNCS; Springer: Berlin/Heidelberg, Germany, 2020; Volume 12346, pp. 213–229. [Google Scholar] [CrossRef]
  6. Guo, Z.; Cai, D.; Zhou, Y.; Xu, T.; Yu, F. Identifying rice field weeds from unmanned aerial vehicle remote sensing imagery using deep learning. Plant Methods 2024, 20, 105. [Google Scholar] [CrossRef]
  7. Zhao, Y.; Lv, W.; Xu, S.; Wei, J.; Wang, G.; Dang, Q.; Liu, Y.; Chen, J. DETRs Beat YOLOs on Real-time Object Detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 17–21 June 2024; IEEE: Piscataway, NJ, USA, 2024; pp. 16965–16974. [Google Scholar] [CrossRef]
  8. Saleh, A.; Olsen, A.; Wood, J.; Philippa, B.; Azghadi, M.R. WeedCLR: Weed contrastive learning through visual representations with class-optimized loss in long-tailed datasets. Comput. Electron. Agric. 2024, 227, 109526. [Google Scholar]
  9. Saleh, A.; Olsen, A.; Wood, J.; Philippa, B.; Azghadi, M.R. FieldNet: Efficient real-time shadow removal for enhanced vision in field robotics. Expert Syst. Appl. 2025, 279, 127442. [Google Scholar]
  10. Dyrmann, M.; Karstoft, H.; Midtiby, H.S. Plant species classification using deep convolutional neural network. Biosyst. Eng. 2016, 151, 72–80. [Google Scholar] [CrossRef]
  11. Van Engelen, J.; Hoos, H. A survey on semi-supervised learning. Mach. Learn. 2020, 109, 373–440. [Google Scholar]
  12. Pérez-Ortiz, M.; Peña, J.; Gutiérrez, P.; Torres-Sánchez, J.; Hervás-Martínez, C.; López-Granados, F. A semi-supervised system for weed mapping in sunflower crops using unmanned aerial vehicles and a crop row detection method. Appl. Soft Comput. 2015, 37, 533–544. [Google Scholar] [CrossRef]
  13. Nong, C.; Fan, X.; Wang, J. Semi-supervised learning for weed and crop segmentation using UAV imagery. Front. Plant Sci. 2022, 13, 927368. [Google Scholar] [CrossRef]
  14. Saleh, A.; Olsen, A.; Wood, J.; Philippa, B.; Azghadi, M.R. Semi-supervised weed detection for rapid deployment and enhanced efficiency. Comput. Electron. Agric. 2025, 236, 110410. [Google Scholar] [CrossRef]
  15. Torres-Sánchez, J.; López-Granados, F.; Peña, J.M. An automatic object-based method for optimal thresholding in UAV images: Application for vegetation detection in herbaceous crops. Comput. Electron. Agric. 2015, 114, 43–52. [Google Scholar] [CrossRef]
  16. Wang, A.; Zhang, W.; Wei, X. A review on weed detection using ground-based machine vision and image processing techniques. Comput. Electron. Agric. 2019, 158, 226–240. [Google Scholar] [CrossRef]
  17. Hasan, A.S.M.M.; Sohel, F.; Diepeveen, D.; Laga, H.; Jones, M.G.K. A survey of deep learning techniques for weed detection from images. Comput. Electron. Agric. 2021, 184, 106067. [Google Scholar] [CrossRef]
  18. Ferreira, A.; Freitas, D.; Silva, G.; Pistori, H.; Folhes, M. Weed detection in soybean crops using ConvNets. Comput. Electron. Agric. 2017, 143, 314–324. [Google Scholar] [CrossRef]
  19. Olsen, A.; Konovalov, D.A.; Philippa, B.; Ridd, P.; Wood, J.C.; Johns, J.; Banks, W.; Girgenti, B.; Kenny, O.; Whinney, J.; et al. DeepWeeds: A multiclass weed species image dataset for deep learning. Sci. Rep. 2019, 9, 2058. [Google Scholar] [CrossRef]
  20. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; IEEE: Piscataway, NJ, USA, 2016; pp. 770–778. [Google Scholar] [CrossRef]
  21. Saleem, M.H.; Potgieter, J.; Arif, K.M. Weed detection by faster RCNN model: An enhanced anchor box approach. Agronomy 2022, 12, 1580. [Google Scholar] [CrossRef]
  22. Dang, F.; Chen, D.; Lu, Y.; Li, Z. YOLOWeeds: A novel benchmark of YOLO object detectors for multi-class weed detection in cotton production systems. Comput. Electron. Agric. 2023, 205, 107655. [Google Scholar] [CrossRef]
  23. Peng, H.; Li, Z.; Zhou, Z.; Shao, Y. Weed detection in paddy field using an improved RetinaNet network. Comput. Electron. Agric. 2022, 199, 107179. [Google Scholar] [CrossRef]
  24. Xu, Y.; Ren, S.; Li, L.; Wei, P.; Deng, H.; Wang, A.; Rao, Y. SP-DETR: Superior point weak semi-supervised DETR with teacher–student paradigm for crop and weed detection. Comput. Electron. Agric. 2025, 239, 111130. [Google Scholar] [CrossRef]
  25. Islam, T.; Sarker, T.T.; Ahmed, K.R.; Rankrape, C.B.; Gage, K. WeedVision: Multi-Stage Growth and Classification of Weeds using DETR and RetinaNet for Precision Agriculture. arXiv 2025, arXiv:2502.14890. [Google Scholar]
  26. Robicheaux, P.; Gallagher, J.; Nelson, J.; Robinson, I. RF-DETR: A SOTA Real-Time Object Detection Model. 2025. Available online: https://blog.roboflow.com/rf-detr/ (accessed on 20 March 2025).
  27. Liu, T.; Jin, X.; Zhang, L.; Wang, J.; Chen, Y.; Hu, C.; Yu, J. Semi-supervised learning and attention mechanism for weed detection in wheat. Crop Prot. 2023, 174, 106389. [Google Scholar] [CrossRef]
  28. Shorewala, S.; Ashfaque, A.; Sidharth, R.; Verma, U. Weed Density and Distribution Estimation for Precision Agriculture Using Semi-Supervised Learning. IEEE Access 2021, 9, 27971–27986. [Google Scholar] [CrossRef]
  29. Steininger, D.; Trondl, A.; Croonen, G.; Simon, J.; Widhalm, V. The CropAndWeed Dataset: A Multi-Modal Learning Approach for Efficient Crop and Weed Manipulation. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), Waikoloa, HI, USA, 2–7 January 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 3729–3738. [Google Scholar]
  30. Akiba, T.; Sano, S.; Yanase, T.; Ohta, T.; Koyama, M. Optuna: A next-generation hyperparameter optimization framework. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Anchorage, AK, USA, 4–8 August 2019; Association for Computing Machinery: Anchorage, AK, USA, 2019; pp. 2623–2631. [Google Scholar]
Figure 1. Comparison of Dataset A (top row) vs. Dataset B (bottom row). Shadows and colour overlap make weed detection more difficult in B. Each column shows examples of (1) Guinea Grass, (2) Both/Ambiguous, and (3) Sugar Cane. Colored bounding boxes indicate ground truth annotations: blue for Guinea Grass and red for Sugarcane.
Figure 2. Quadrant splitting and labelling with a 33% overlap threshold. The original image (left) has bounding boxes for Guinea Grass (blue) and Sugarcane (red). Each quadrant (right) is saved as a separate image. A quadrant is labelled ‘1’ (weed-positive) if at least one GG box has ≥33% of its bounding-box area inside the quadrant; otherwise ‘0’ (weed-negative).
Figure 3. Classification pipeline: each labelled image is split into quadrants, then a ResNet-50 model (initialized from ImageNet) classifies each quadrant as weed or not weed. In the images, red and blue bounding boxes indicate Sugarcane and Guinea Grass, respectively. Orange and teal boxes denote the base network and custom classification layers, while grey-blue blocks represent the assembled Teacher (T) and Student (S) models. Dashed arrows indicate the transfer of pseudo-labels and model weights from the Teacher to the Student.
Figure 4. Detection pipeline: YOLOv12-s or RF-DETR is trained on full images with bounding-box annotations. Both pipelines eventually finalize with inference and evaluation. In the images, red and blue bounding boxes indicate Sugarcane and Guinea Grass, respectively. Light blue and purple boxes differentiate the YOLOv12 and RF-DETR architectures. Grey-blue blocks represent the assembled Teacher (T) and Student (S) models. Dashed arrows indicate the transfer of pseudo-labels and model weights from the Teacher to the Student.
Figure 5. Confusion matrices on the Dataset A + B test set, comparing (a) from-scratch training with (b) using the best from-scratch model as a pretrained initialisation.
Figure 6. t-SNE visualisation of final-layer features. (a) From-scratch training features exhibit greater variability across classes. The pretrained model (b) shows slightly clearer cluster separation for sugarcane (green) and Guinea Grass (blue), aligning with its improved F1 score.
Figure 7. Grad-CAM visualisation revealing the “shadow bias.” The heatmaps show the model’s attention (red areas) drifting towards background shadows instead of the target weed leaves, a key failure mode that guided our research towards object detection. Red and yellow regions indicate high model attention, while blue regions indicate low attention.
Figure 8. Validation mAP curves for YOLOv12-s (left) and RF-DETR (right) during supervised training. YOLO converges faster, while RF-DETR demonstrates a slower but steady improvement. Note: different colours correspond to different runs.
Figure 9. Illustrative qualitative comparison on a challenging test image. (a) Ground truth. (b) The semi-supervised YOLO model correctly identifies all weed instances. (c) The supervised RF-DETR performs well but misses a partially occluded weed. Note on colors: In (a,b), pink bounding boxes denote Sugarcane and the blue bounding box denotes Guinea Grass. In (c), purple bounding boxes denote Sugarcane and the red bounding box denotes Guinea Grass.
Figure 10. Optuna hyperparameter importance dashboards illustrating the differing sensitivities of YOLO and RF-DETR. The length of each bar indicates the parameter’s influence on the final mAP score during our extensive hyperparameter search. (a) For YOLOv12-s, the initial learning rate (lr0) is the most critical parameter, followed by the choice of optimizer and several geometric augmentations such as fliplr and shear. This highlights the model’s high sensitivity to both the core training setup and the data augmentation pipeline. (b) In contrast, RF-DETR’s performance is overwhelmingly dictated by the choice of optimizer, followed by architectural settings like the encoder learning rate (lr_encoder) and multi-scale training. These findings confirm that while both models depend on fundamental training parameters, YOLO’s performance is tightly coupled with its augmentation strategy, whereas RF-DETR’s is more sensitive to its core architectural configuration.
Table 1. Bounding-box distribution across labelled paddocks in Datasets A and B.
Paddock ID     Sugarcane   Guinea Grass
paddock_A1     3605        239
paddock_A2     837         112
Dataset A      4442        351
paddock_B1     170         29
paddock_B2     336         252
Dataset B      506         281
Table 2. Fully supervised classification (SC) results on the test set. The model’s performance on Dataset A alone reflects its relative simplicity, while fine-tuning on the combined A + B dataset yields the highest overall F1 score. Note: Bold values indicate the best performance.
ID    Dataset(s)   Training Strategy         Val F1   Test F1
SC1   A Only       From Scratch              0.96     0.88
SC2   A + B        From Scratch              0.86     0.88
SC3   A + B        SC2 as Pretrained Init    0.86     0.89
Table 3. Comparison of fully supervised vs. semi-supervised (SSC) classification performance. Adding pseudo-labelled data provides a marginal increase to the final test F1 score. Note: Bold values indicate the best performance.
ID     Training Strategy            Val F1   Test F1
SC3    Fully Supervised (Best)      0.86     0.89
SSC1   Semi-Supervised (Student)    0.85     0.90
Table 4. Fully supervised detection (SD) results on the combined Dataset A + B test set. Note: Bold values indicate the best performance.
ID     Model       mAP@50   mAP@50-95   Precision   Recall
SD26   YOLOv12-s   0.807    0.543       0.804       0.771
SD27   RF-DETR     0.777    0.513       0.777       0.664
Table 5. Semi-supervised detection (SSD) results. SSD significantly improves performance, most notably boosting the recall of the YOLOv12-s model. Note: Bold values indicate the best performance.
ID      Model       mAP@50   mAP@50-95   Precision   Recall
SSD8    YOLOv12-s   0.828    0.529       0.814       0.782
SSD10   RF-DETR     0.785    0.507       0.785       0.675
Table 6. Validation of the SSD pipeline on the public CropAndWeed dataset under a low-data regime. Our SSD approach significantly outperforms the supervised baseline, demonstrating its effectiveness in reducing labelling effort. Note: Bold values indicate the best performance.
Training Method        Labelled Data Used       mAP@50
Supervised Baseline    10%                      0.90
Our SSD Pipeline       10% (+90% unlabelled)    0.91
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

