Article

YOLOv7 for Weed Detection in Cotton Fields Using UAV Imagery

College of Engineering, West Texas A&M University, Canyon, TX 79016, USA
* Author to whom correspondence should be addressed.
AgriEngineering 2025, 7(10), 313; https://doi.org/10.3390/agriengineering7100313
Submission received: 28 August 2025 / Revised: 19 September 2025 / Accepted: 22 September 2025 / Published: 23 September 2025

Abstract

Weed detection is critical for precision agriculture, enabling targeted herbicide application to reduce costs and enhance crop health. This study utilized UAV-acquired RGB imagery from cotton fields to develop and evaluate deep learning models for weed detection. As sustainable resource management gains importance in rainfed agricultural systems, precise weed identification is essential to optimize yields and minimize herbicide use. However, distinguishing weeds from crops in complex field environments remains challenging due to their visual similarity. This research employed YOLOv7, YOLOv7-w6, and YOLOv7-x models to detect and classify weeds in cotton fields, using a dataset of 9249 images collected under real field conditions. To improve model performance, we enhanced the annotation process using LabelImg and Roboflow, ensuring accurate separation of weeds and cotton plants. Additionally, we fine-tuned key hyperparameters, including batch size, epochs, and input resolution, to optimize detection performance. YOLOv7 achieved the highest weed detection accuracy at 83% and demonstrated superior sensitivity to weeds, particularly in cluttered field conditions, while YOLOv7-x, at 77% accuracy, offered balanced performance across both cotton and weed classes. YOLOv7-w6, at 63% accuracy, had difficulty distinguishing features in shaded or cluttered soil regions. These findings highlight the potential of UAV-based deep learning approaches to support efficient, environmentally friendly, site-specific weed management in cotton fields.

1. Introduction

Weed detection and management are pivotal in modern agriculture, particularly for cotton production, where weeds pose significant threats by competing for essential resources such as water, nutrients, and sunlight, ultimately leading to reduced yields and increased production costs [1]. The reliance on traditional methods, such as manual labor or broad-spectrum herbicide application, has proven inefficient, labor-intensive, and environmentally detrimental, contributing to issues like herbicide resistance and ecological pollution [2,3,4]. To address these challenges, precision agriculture has emerged as a transformative approach, leveraging advanced technologies to optimize resource use and minimize environmental impact [5]. Central to this approach is the accurate detection and identification of weeds, enabling targeted interventions such as site-specific herbicide application, which can significantly reduce chemical usage and enhance sustainability. Sa et al. [6] demonstrated the potential of UAV-acquired multispectral images combined with machine learning for precise weed mapping, supporting the shift toward sustainable farming practices.
The advent of computer vision and machine learning has revolutionized weed detection, offering automated, efficient, and accurate methods to identify and locate weeds in agricultural fields [7]. Among these, deep learning has shown particular promise due to its ability to handle complex patterns and large datasets, learning hierarchical representations from image data [8]. Early applications of deep learning in fields like pharmaceutical research [9] have demonstrated its potential, which extends to agriculture for feature extraction and pattern recognition in image data. In agriculture, deep learning models, such as convolutional neural networks (CNNs), have been employed for classifying weeds and crops, with architectures like AlexNet, GoogLeNet, InceptionV3, and Xception demonstrating high accuracy in distinguishing plant species [8]. However, object detection, which involves both localization and classification, presents a more complex challenge, necessitating models like Faster R-CNN, Mask R-CNN, and the YOLO series [10].
Unmanned Aerial Vehicles (UAVs) equipped with cameras offer a versatile and efficient means to capture high-resolution imagery over large agricultural areas, providing frequent updates and data from various angles and heights [11,12]. The use of UAVs for weed detection is particularly advantageous in cotton fields, where weed distribution can be patchy and requires scalable monitoring. Pei et al. [13] showed that UAV imagery combined with deep learning can reduce herbicide use in maize fields, suggesting similar potential for cotton. The integration of UAV imagery with advanced models like YOLOv7 enhances the ability to perform real-time, accurate, and scalable weed detection, leveraging the high-resolution data UAVs provide to improve detection outcomes across diverse field conditions.
Weed detection in cotton fields presents unique challenges, including the need to differentiate weeds from cotton plants under varying environmental conditions, such as changing light and weather patterns, which can affect image quality and detection accuracy [14]. UAV imagery introduces additional complexities, such as the processing of large datasets and the need for robust models to handle diverse field scenarios. YOLOv7’s advanced architecture, with its capacity to manage multi-scale features and deliver high detection speed, positions it as an effective tool to address these challenges, potentially improving precision agriculture practices by enabling rapid and precise weed identification.
YOLOv7’s application in weed detection, particularly in cotton fields, is promising due to its ability to handle multi-scale features and achieve high detection accuracy in complex agricultural settings [15]. Studies such as Dang et al. [14] have benchmarked various YOLO versions, including YOLOv7, for multi-class weed detection in cotton production systems, using the CottonWeedDet12 dataset, which comprises 5648 images and 9370 bounding box annotations, collected under natural light conditions and at varied weed growth stages in cotton fields in the southern U.S. While specific performance metrics for YOLOv7 were not detailed in the abstract, the study indicates detection accuracy ranging from 88.14% to 95.22% for mAP@0.5 across different YOLO versions, suggesting YOLOv7’s potential for real-time applications, especially when combined with data augmentation techniques.
Another relevant study by Gallo et al. [16] evaluated YOLOv7 on a dataset of UAV-based RGB images for chicory plantations, achieving mAP@0.5 of 56.6%, recall of 62.1%, and precision of 61.3%, outperforming other YOLO variants. Although focused on chicory, this study highlights YOLOv7’s capability in UAV-based weed detection, which can be extrapolated to cotton fields given similar challenges like varying lighting and biological variability.
Peng et al. [17] proposed an enhanced YOLOv7 model for weed detection by adding the Convolutional Block Attention Module (CBAM) to improve the focus on weed features. The model achieved a mAP of 91.15% on a diverse, augmented dataset, outperforming YOLOv5l, Faster R-CNN, and YOLOv4-tiny. It performed well in complex environments with occlusion and dense weed clusters. However, the limitations included a small dataset size and high memory usage, which indicates the need for further optimization for real-world deployment.
Wang et al. [18] presented CSCW-YOLOv7, an advanced YOLOv7-based system for detecting weeds in complicated wheat fields. The model incorporates several enhancements, including the CARAFE up-sampling operator, SE attention modules, a contextual transformer (CoT) module, and the Wise IoU (WIoU) loss function, to improve small-object detection and global feature learning. It achieved high performance, with an average precision of 94.4%, reaching up to 97.7% in specific categories, and handled occlusion and overlapping weeds well. However, challenges remained in reducing false positives caused by the visual similarity between wheat and weeds and in improving detection in densely populated weed areas.
Unlike previous YOLOv7 applications that focus on general-purpose datasets such as COCO or VisDrone, this study evaluates YOLOv7 and its variants in a domain-specific, high-density UAV imagery setting for precision agriculture. Our contribution is to benchmark three YOLOv7 models, YOLOv7, YOLOv7-w6, and YOLOv7-x, on a challenging, real-world cotton field dataset with high object overlap, background clutter, and class imbalance. This targeted evaluation provides new insights into model behavior in complex field conditions, which helps practitioners to select architectures that are best suited for weed detection in resource-constrained agricultural scenarios.

2. Materials and Methods

2.1. Experimental Site

The research data originated from rainfed cotton fields situated in Bushland, Texas, USA (35.169734° N, 102.092734° W, elevation 1170 m), a vital agricultural area with a semi-arid climate. The annual precipitation in this area averages ~470 mm, with 70% falling from May to September, while evaporation rates reach ~2600 mm annually, and dry seasons extend for 180–220 days per year. The Texas Panhandle experiences hot summers (July highs averaging 91 °F/33 °C) and mild winters (December lows ~30.9 °F/−0.6 °C), together with strong winds and unpredictable rainfall patterns. The local soil consists mainly of Pullman series clay loams (fine, mixed, thermic Torrertic Paleustoll), which have moderate to high water-holding capacity due to 27–50% clay content but are slowly permeable and susceptible to surface crusting and erosion because of their high shrink-swell potential and cracking when dry. The fields show limited vegetation growth and dry soil conditions during dry periods, and no irrigation system is present.

2.2. Dataset Preparation

The images were captured using the Autel Robotics EVO II Dual 640T Rugged Bundle V3, equipped with an RGB camera, under clear sky conditions with stable illumination and minimal wind (<8 m/s). Data collection was conducted at an altitude of approximately 15 m above ground level and a flight speed of approximately 3 m per second. UAV flights were performed on 1 May 2024, between 12:30 p.m. and 1:00 p.m., and on 23 July 2024, between 11:30 a.m. and 12:00 p.m., to ensure consistent sun angle and minimize shadow effects. At the time of image acquisition, the cotton crop was in the second to third week of the squaring stage. The initial dataset contained approximately 300 UAV-acquired RGB images of cotton fields, taken at 4096 × 3072-pixel resolution. To increase the dataset size and enhance the visibility of small weed objects, the full-resolution images were tiled into smaller patches of 585 × 438 pixels using a custom-developed Python script. Tiling high-resolution UAV imagery into smaller patches is an established strategy in computer vision and remote sensing, as it preserves the full spatial resolution of the images for annotation and improves the efficiency of training and validation in deep learning models [19]. The tiling process expanded the raw dataset to 9249 images (Figure 1), while retaining the original spatial detail necessary for precise annotation, training, and validation of the YOLOv7 models. The tiled images were subsequently annotated to label weeds and cotton plants. Of these, approximately 1000 images primarily featured weeds, typically captured at field corners or in scattered bare areas, while the remaining images contained growing cotton crops, some with weeds present among them. Each image was carefully annotated to distinguish weeds from cotton plants. In total, the dataset contained 57,534 annotated instances, with a clear imbalance between classes. Table 1 summarizes the distribution of annotated objects.
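The authors' tiling script is not published here; the following is a minimal sketch of the approach, assuming the Pillow library and hypothetical input/output folder names, that cuts each 4096 × 3072 frame into 585 × 438 patches while keeping the original pixel resolution.

```python
# Minimal tiling sketch (not the authors' script): cut each 4096 x 3072 UAV frame
# into 585 x 438 patches so small weeds remain at full spatial resolution.
from pathlib import Path
from PIL import Image

TILE_W, TILE_H = 585, 438            # patch size used in this study
SRC_DIR = Path("raw_uav_images")     # hypothetical input folder
OUT_DIR = Path("tiles")              # hypothetical output folder
OUT_DIR.mkdir(exist_ok=True)

for img_path in sorted(SRC_DIR.glob("*.jpg")):
    img = Image.open(img_path)
    w, h = img.size                  # e.g., 4096 x 3072
    # Walk the image in non-overlapping tile steps; a few edge pixels may be dropped.
    for top in range(0, h - TILE_H + 1, TILE_H):
        for left in range(0, w - TILE_W + 1, TILE_W):
            tile = img.crop((left, top, left + TILE_W, top + TILE_H))
            tile.save(OUT_DIR / f"{img_path.stem}_{top}_{left}.jpg")
```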
The accuracy of image annotation directly influences both the training effectiveness of the model and the object detection precision [20]. The annotation tools LabelImg v1.8.1 [21] and Roboflow [22] were used to annotate weeds and cotton plants in YOLO format. The YOLO format stores annotations as individual text files for each image, containing object class labels and normalized bounding box coordinates, and each annotation file is automatically linked with its corresponding image for training. This compact, efficient structure, which records object categories, bounding box coordinates, and image dimensions, makes the format well suited to real-time object detection models. During the labeling process, we assigned the class label ‘0’ to weed and ‘1’ to cotton to differentiate the target objects; the YOLO model framework requires this labeling convention to be used uniformly across all annotated files. The dataset was split into three parts: training (7399 images), validation (925 images), and testing (925 images). We selected this split to approximate an 80/10/10 ratio, balancing sufficient training data with adequate validation and testing for the limited dataset. The training set was used to fit the models, the validation set to adjust hyperparameters, and the testing set to evaluate performance. The test set is independent and was never used during training or validation, ensuring unbiased evaluation of model performance under consistent cotton field conditions. The bounding box annotations defined the object regions from which the models learned accurate spatial features.
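To illustrate the label convention just described, the short sketch below converts a pixel-space bounding box into a YOLO-format line (class id followed by normalized center coordinates, width, and height); the box coordinates in the example are illustrative only.

```python
# Sketch of the YOLO label convention: one .txt file per image, one line per object as
# "<class> <x_center> <y_center> <width> <height>", all normalized to [0, 1].
# Class ids in this study: 0 = weed, 1 = cotton.

def to_yolo_line(cls_id, xmin, ymin, xmax, ymax, img_w, img_h):
    """Convert a pixel-space bounding box to a normalized YOLO label line."""
    xc = (xmin + xmax) / 2.0 / img_w
    yc = (ymin + ymax) / 2.0 / img_h
    w = (xmax - xmin) / img_w
    h = (ymax - ymin) / img_h
    return f"{cls_id} {xc:.6f} {yc:.6f} {w:.6f} {h:.6f}"

# Example: a weed box on a 585 x 438 tile (coordinates are illustrative).
print(to_yolo_line(0, 120, 80, 190, 150, 585, 438))
# -> "0 0.264957 0.262557 0.119658 0.159817"
```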
The weed detection task in this study was quite challenging because cotton plants and weeds often look very similar. They share a similar green color, especially when the cotton plants are still young (Figure 2b). The shapes and sizes of the leaves are also hard to tell apart, making visual separation difficult (Figure 2b). In many images, weeds and cotton grow close together or even overlap in the same area, which creates confusion during annotation. Additionally, the dry soil background and changing sunlight conditions in the field further complicate accurate detection.

2.3. Model Selection and Training

The YOLO (You Only Look Once) family of object detection models, introduced by Redmon et al. [23], has gained significant attention for its real-time performance and accuracy, making it particularly well-suited for agricultural applications where rapid processing is essential. YOLO models process images in a single forward pass, enabling efficient detection suitable for dynamic field environments. This manuscript focuses on YOLOv7, a state-of-the-art model known for its enhanced speed and accuracy [15], which incorporates innovations like model re-parameterization and a new backbone network, setting new benchmarks for real-time object detectors.
In this study, after annotating and preprocessing our dataset using LabelImg v1.8.1 and Roboflow, the data were split into training, validation, and testing sets in an 8:1:1 ratio. The prepared dataset was then used to train three YOLOv7-based models—YOLOv7, YOLOv7-w6, and YOLOv7-x—on a GPU-enabled system for detecting weeds in cotton fields. Each model was trained using the YOLOv7 PyTorch implementation with pretrained weights from the MS COCO dataset. On average, each training session took approximately 1 h and 40 min to complete. This setup ensured efficient learning from UAV images, enabling the models to effectively detect and distinguish between cotton crops and various weed types in field conditions. The architectural framework of the YOLOv7-based weed detection models is illustrated in Figure 3.
The YOLOv7 model structure includes four main sections: Input, Backbone, Neck, and Head, as shown in Figure 3. These components work together to enable weed detection in drone imagery at high speed and with precise accuracy [18].
  • The Input section consists of CBS modules combining Convolution (Conv), Batch Normalization (BN), and SiLU activation functions (see the sketch after this list). These layers prepare the image by identifying basic patterns, such as edges and shapes, that the model can interpret [15].
  • The Backbone extracts deeper features from the input image. Through E-ELAN (Extended Efficient Layer Aggregation Network) blocks repeated multiple times, the model learns to extract information from various depth levels more effectively [16]. MaxPooling (MP) components reduce the spatial dimensions of the data while preserving crucial information, which helps the model focus on important image regions.
  • The Neck links the Backbone to the prediction layers. SPPCSPC (Spatial Pyramid Pooling—Cross Stage Partial Connections) and ELAN-M blocks enhance the model’s capacity to detect objects of various sizes by gathering information from both small and large regions of the image. The Neck employs Upsample, Concat (Concatenation), and MaxPooling layers to combine information across feature levels, which allows it to detect small and partially hidden weeds effectively in complex field environments [16,18].
  • The Head is the final portion of the model. RepConv modules accelerate and enhance processing speed [24], and the IDetect layers generate the final outputs: detection bounding boxes, object classes, and prediction confidences. Advanced loss functions refine these outputs throughout training, allowing the model to improve its performance over time.
The architecture enables YOLOv7 to operate quickly while remaining lightweight and accurate for drone-based real-time weed detection in agricultural settings.
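As a concrete illustration of the CBS block mentioned in the Input section above, here is a minimal PyTorch sketch (Conv + BatchNorm + SiLU). It is illustrative only and simplified relative to the official YOLOv7 implementation; the channel sizes, kernel sizes, and strides shown are arbitrary examples.

```python
# Minimal PyTorch sketch of a CBS block (Convolution + Batch Normalization + SiLU).
# Illustrative only; the official YOLOv7 code differs in details.
import torch
import torch.nn as nn

class CBS(nn.Module):
    def __init__(self, c_in, c_out, kernel_size=3, stride=1):
        super().__init__()
        self.conv = nn.Conv2d(c_in, c_out, kernel_size, stride,
                              padding=kernel_size // 2, bias=False)
        self.bn = nn.BatchNorm2d(c_out)
        self.act = nn.SiLU()

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))

# Example: an RGB input passes through two stacked CBS blocks (second one downsamples).
stem = nn.Sequential(CBS(3, 32, 3, 1), CBS(32, 64, 3, 2))
out = stem(torch.zeros(1, 3, 640, 640))
print(out.shape)  # torch.Size([1, 64, 320, 320])
```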

2.4. Models and Parameters

This research assessed three YOLOv7-based object detection models, YOLOv7, YOLOv7-w6, and YOLOv7-x, to evaluate their effectiveness for UAV-based weed detection in cotton fields. The three models differ in network depth and width, input size, and parameter count, which affects their detection accuracy, model complexity, and processing speed [15].
Table 2 summarizes architectural and runtime characteristics, including backbone design, model size, frame rate, and use-case suitability. Table 3 lists the training hyperparameters common to all three models, including optimizer settings, batch size, and training epochs. All models were trained using transfer learning with pretrained weights from the MS COCO dataset and implemented using the official PyTorch-based YOLOv7 framework. The performance of each model was evaluated to determine the most suitable architecture for precise weed detection in aerial imagery.
The model training and testing were performed on a workstation with a 13th Gen Intel(R) Core(TM) i9-13900H processor and an NVIDIA Tesla V100S GPU. The system ran Windows 11 and was configured with CUDA 11.7 to support GPU acceleration. The development environment used Python 3.11 and the PyTorch 1.13.1 deep learning framework. This setup allowed efficient training of the YOLOv7-based models with fast processing speeds and reliable performance, which was especially important given the high-resolution drone imagery and the complexity of distinguishing between cotton crops and weeds in the dataset. The operational environment used in this experiment is summarized in Table 4.
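For orientation, the following is a hedged sketch of how training with the settings in Tables 3 and 4 could be launched from Python using the official PyTorch YOLOv7 repository (github.com/WongKinYiu/yolov7). The dataset and hyperparameter file names (data/weeds.yaml, data/hyp.weeds.yaml) and the run name are assumptions, not the exact files used in this study; note also that the P6 variant YOLOv7-w6 is trained with train_aux.py in that repository.

```python
# Hedged sketch of launching YOLOv7 training with the study's hyperparameters
# (300 epochs, batch size 32, 8 workers, MS COCO pretrained weights).
import subprocess

cmd = [
    "python", "train.py",
    "--weights", "yolov7_training.pt",      # MS COCO pretrained checkpoint
    "--cfg", "cfg/training/yolov7.yaml",    # model definition (yolov7x.yaml for YOLOv7-x)
    "--data", "data/weeds.yaml",            # assumed dataset YAML: train/val/test paths, 2 classes
    "--hyp", "data/hyp.weeds.yaml",         # assumed hyp file: SGD lr 1e-3, momentum 0.937, weight decay 5e-4
    "--epochs", "300",
    "--batch-size", "32",
    "--workers", "8",
    "--device", "0",                        # single NVIDIA V100S GPU
    "--name", "yolov7-cotton-weeds",
]
subprocess.run(cmd, check=True)
```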

2.5. Evaluation Indicators

In this study, we evaluate the performance of YOLOv7, YOLOv7-w6, and YOLOv7-x models using Average Precision (AP), mean Average Precision (mAP), confusion matrix, and accuracy as metrics. AP, derived from precision (P) and recall (R), quantifies model performance by calculating the area under the Precision–Recall curve, balancing false positives and false negatives [18,25]. Precision measures the proportion of correctly predicted weed detections, while recall indicates the proportion of actual weeds correctly identified [18]. mAP, a comprehensive metric, averages AP values across all weed categories and various Intersection-over-Union (IoU) thresholds. A confusion matrix, also known as an error matrix, is a valuable tool for evaluating the performance of a classification model. It provides a visual representation of how well the model distinguishes between different classes by displaying the number of correct and incorrect predictions. This grid-based format helps identify specific types of errors and offers insight into the model’s overall accuracy and reliability [26]. Accuracy is another commonly used metric for evaluating classification models, as it measures the overall proportion of correct predictions. The formulas for computing these metrics are as follows:
$$P = \frac{TP}{TP + FP}$$
$$R = \frac{TP}{TP + FN}$$
$$AP = \int_{0}^{1} P(R)\,dR$$
$$mAP = \frac{1}{n}\sum_{i=1}^{n} AP_i$$
$$Accuracy = \frac{TP + TN}{TP + TN + FP + FN} \times 100\%$$
where $TP$, $TN$, $FP$, and $FN$ are true positives, true negatives, false positives, and false negatives, respectively, $AP_i$ is the Average Precision for weed class $i$, and $n$ is the total number of classes [18,25].
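To make these definitions concrete, here is a minimal NumPy sketch of the metrics above. The TP/TN/FP/FN counts, the precision–recall samples, and the per-class AP values in the example are placeholders for illustration, not results from this study.

```python
# Minimal NumPy sketch of the evaluation metrics defined above.
import numpy as np

def precision(tp, fp):
    return tp / (tp + fp)

def recall(tp, fn):
    return tp / (tp + fn)

def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn) * 100.0

def average_precision(recalls, precisions):
    """Area under the precision-recall curve: AP = integral of P(R) dR."""
    order = np.argsort(recalls)
    return np.trapz(np.asarray(precisions)[order], np.asarray(recalls)[order])

# Per-class AP values averaged into mAP (placeholder numbers, not study results).
ap_per_class = {"weed": 0.40, "cotton": 0.60}
map_score = sum(ap_per_class.values()) / len(ap_per_class)
print(f"mAP = {map_score:.2f}")                      # 0.50
print(f"Accuracy = {accuracy(80, 10, 5, 5):.1f}%")   # 90.0%
```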

3. Results

This section presents the evaluation results of the three object detection models—YOLOv7, YOLOv7-w6, and YOLOv7-x—trained to detect weeds using UAV-captured images. Overall, the YOLOv7 models achieved strong training results in terms of conventional metrics such as mAP, F1-score, recall, and precision [16].
Table 5 summarizes the performance of the three YOLOv7 variants on our dataset. YOLOv7 produced the most balanced results, with a precision of 0.87, a recall of 0.78, and the highest F1-score (0.82). It also achieved the highest mAP@0.5 score of 0.88, making it the most suitable option for real-world weed–cotton detection applications. YOLOv7-x achieved competitive results, with the highest recall (0.81) and a strong mAP@[0.5:0.95] of 0.46, indicating an ability to generalize across IoU thresholds, although its mAP@0.5 of 0.83 was lower than that of YOLOv7. YOLOv7-w6 achieved moderate performance, with a precision of 0.77 and a recall of 0.73, yielding an F1-score of 0.75 and mAP@0.5 of 0.79, offering a balance between accuracy and computational cost. In terms of per-class detection, all models struggled to detect weeds consistently, with average precision values ranging only from 0.034 to 0.067. Cotton detection performed better than weed detection, with AP values ranging from 0.033 to 0.091 and YOLOv7-x achieving the highest result. Although these AP values appear low compared to standard object detection benchmarks, this can be explained by dataset imbalance, the high visual similarity between young cotton and weed seedlings in UAV imagery, and the small size of weed objects, which makes AP highly sensitive to minor localization errors. Despite this, the models achieved stable weed recall, showing their practical ability to detect weed presence under real field conditions. YOLOv7 achieved the best combination of precision, recall, and detection accuracy, while YOLOv7-x generalized best across IoU thresholds. Overall, Table 5 highlights how each model has its strengths depending on the application needs.
Figure 4 shows how the three versions of the YOLOv7 model—YOLOv7, YOLOv7-w6, and YOLOv7-x—performed when classifying cotton and weeds. The confusion matrices give a visual summary of what each model got right and where it made mistakes. All three models detected cotton more reliably than weeds. Because the dataset contained mostly cotton images and cotton annotations far outnumbered weed annotations, the models received many more training examples of cotton, which resulted in better detection performance for cotton than for weeds.
Among the models, YOLOv7 achieved the highest weed detection accuracy (0.83), suggesting its architecture is more sensitive to subtle weed patterns in complex scenes. YOLOv7-x, while slightly lower in weed accuracy (0.77), showed the most balanced overall results and achieved the lowest false negative rate for background classification (0.23), indicating better generalization and fewer missed detections.
The error patterns showed that most weed false positives occurred in dry soil areas and shadowed spots, where the background texture resembles weeds. Most false negatives occurred in dense vegetation, where weeds blended with cotton plants of similar appearance, shape, and texture under natural light. The YOLOv7-w6 model faced the most difficulty: it achieved the lowest weed accuracy (0.63), the highest weed false negative rate (0.36), and the highest background false positive rate (0.90), indicating poor discrimination of features in shaded or cluttered soil regions.
Table 6 summarizes typical misclassification types, the associated image conditions, and the models most affected. This targeted error analysis provides a clearer understanding of the model limitations and identifies field-specific challenges.
As shown in Table 7, the results emphasize weed-based accuracy, aligning directly with the research goal of supporting practical decision-making in agricultural applications.
Weed accuracy is reported separately, as it better reflects model effectiveness for precision agriculture, while overall accuracy serves as a general performance measure. As shown in Table 7, YOLOv7 achieved the highest weed detection accuracy of 83%, despite the visual similarity between weeds and cotton in aerial images. YOLOv7-x reached 77%, while YOLOv7-w6 performed the weakest at 63%, particularly in areas with shadows, clutter, or dense weed clusters where weeds and soil textures appeared nearly identical. These results indicate that larger, more complex models do not always outperform smaller ones, especially when trained on limited and imbalanced datasets [27]. The standard YOLOv7 architecture effectively balanced model complexity and generalization, avoiding overfitting and capturing critical features for small-object weed detection, whereas YOLOv7-w6 struggled to distinguish weeds under challenging field conditions. Practically, the higher weed accuracy achieved by YOLOv7 supports more precise pesticide application, reducing chemical waste and improving crop management efficiency.
Figure 5 shows how the three versions of the YOLOv7 model—YOLOv7, YOLOv7-w6, and YOLOv7-x performed when detecting weeds and cotton in the same image. YOLOv7 correctly identified 2 weed patches and 6 cotton plants with confidence scores above 0.70 (Figure 5a). It was able to detect both small and large weed patches while maintaining consistent cotton detection, showing that its architecture can recognize weed features even when they are close to crops. By contrast, YOLOv7-w6 failed to detect the weeds and only recognized cotton plants (Figure 5b), reflecting its difficulty when weeds blend with cotton in crowded or shaded areas. YOLOv7-x detected both weed patches but with lower confidence (0.69–0.79) and slightly weaker cotton predictions (as low as 0.38), producing a more balanced detection overall but at the cost of reduced reliability compared to YOLOv7 (Figure 5c).
The visual assessment confirms that YOLOv7 clearly outperforms the other variants in weed detection, achieving 83% weed accuracy, the highest F1-score (0.82), and the top mAP@0.5 (0.88). Its balanced architecture captures small weed features effectively without overfitting, while YOLOv7-w6 struggles in shaded or dense areas and YOLOv7-x, although larger, shows slightly lower weed accuracy (77%). Overall, these results indicate that YOLOv7 is the most reliable option for practical weed detection in UAV images, which is exactly what precision agriculture requires.

4. Discussion

This study utilized UAV-acquired RGB imagery and YOLOv7-based deep learning models to detect weeds in cotton fields, leveraging a dataset of 9249 high-resolution images. While cotton dominated most images, weeds appeared less frequently, resulting in an imbalanced dataset that challenged the models’ weed detection performance. Despite this, the trained models achieved reliable results without external data augmentation, underscoring the importance of high-quality images and accurate annotations. Although commonly used in object detection tasks, augmentation techniques such as mosaic and color jitter were intentionally excluded in this study because of the already high object density and visual complexity of our UAV images, which we anticipated could introduce excessive visual clutter and hinder detection accuracy. As our primary goal was to achieve high accuracy tailored to this specific dataset, we opted to train without aggressive augmentation strategies. These findings align with prior research emphasizing that well-labeled, representative datasets are critical for effective deep learning applications in agriculture [7]. To maintain fairness across the YOLOv7 variants, we trained with a consistent setup and did not experiment with alternative preprocessing methods or hyperparameters. Future work will explore these strategies to further improve accuracy and robustness.
The main goal of this research is to detect weeds in cotton fields; therefore, model performance was evaluated both for weed detection and overall accuracy. Cotton was labeled during training to help distinguish it from weeds, since both appear visually similar in aerial images, but accurate weed identification remained the primary objective. High weed detection accuracy is especially important because it enables precise pesticide application, reducing chemical waste and improving field management. Among the three YOLOv7 variants tested, the standard YOLOv7 achieved the highest weed detection accuracy at 83%, demonstrating superior performance in complex field environments with dense plant growth. The YOLOv7-x model, with its advanced architecture, reached 77% accuracy for both weeds and cotton but showed lower sensitivity for weeds compared to YOLOv7. The YOLOv7-w6 variant performed the weakest at 63%, struggling to separate weeds from cotton in dense or shaded areas. These findings show that larger models do not always guarantee better results [27], particularly in agricultural imagery where visual complexity and class similarity make detection more difficult. Several factors explain why YOLOv7 outperformed its larger counterparts. First, YOLOv7 handled the limited and imbalanced dataset more effectively, while larger models tend to require more balanced and diverse data to fully utilize their representational capacity, otherwise risking overfitting [28,29]. Second, weed detection in UAV imagery is inherently difficult because weeds are small objects that closely resemble young cotton plants; deeper models often lose fine spatial detail during feature downsampling, which is critical for accurate small-object detection [29,30]. Third, larger models are more sensitive to hyperparameter settings, training schedules, and hardware constraints. Without extensive tuning, their additional complexity may not translate into higher accuracy in practice. Taken together, these factors indicate that YOLOv7 provided the best balance between model capacity and generalization ability, while YOLOv7-x and YOLOv7-w6 underperformed due to overfitting, difficulties with small-object localization, and training inefficiencies.
The results align with previous research showing that advanced YOLO architectures perform well in agricultural environments with complex backgrounds [15]. While YOLOv7-w6 produced acceptable results, its limitations in separating weeds from other vegetation suggest that model selection must be tailored to field-specific conditions. The relatively strong performance of YOLOv7-x can be attributed to its enhanced detection layers and training optimizations, as documented in prior research [31], which may make it suitable for real-time weed detection in fields where crops and weeds are visually similar or spatially adjacent.
Model performance varied significantly with image composition, particularly visual density and object separation. Images featuring isolated weed patches or distinct cotton rows yielded higher detection accuracy, whereas overlapping vegetation or low-contrast scenes increased class confusion. These observations are consistent with agricultural object detection literature, which highlights the critical role of image clarity and class separability in deep learning outcomes [32]. Although this study maintained consistent flight height and lighting during image collection, variations in these factors could further influence model robustness. Future research should investigate the impact of diverse environmental conditions, such as varying altitudes, lighting, or seasonal changes, to enhance model generalizability across heterogeneous cotton fields.
This study goes beyond typical YOLOv7 benchmarks by testing the models on UAV images collected in real agricultural fields where dense crops, overlapping plants, and cluttered backgrounds create a much more challenging environment than standard datasets. Unlike earlier studies that evaluate YOLOv7 using clean, well-balanced datasets, our research looks at how different YOLOv7 model versions perform under the specific conditions found in cotton fields. By doing so, we offer practical guidance for choosing the right model for real-world weed detection tasks, especially in precision agriculture, where understanding the field context is essential for effective deployment.

5. Conclusions

This study explored the application of UAV-based imagery and YOLOv7 deep learning models for weed detection in cotton fields. Using a manually annotated dataset of 9249 high-resolution images, the research evaluated the performance of three YOLOv7 model variants (YOLOv7, YOLOv7-w6, and YOLOv7-x) in distinguishing between cotton and weed classes under real field conditions. According to the accuracy results, YOLOv7-x demonstrated the most balanced performance across all classes, while YOLOv7 achieved the highest accuracy in weed detection, indicating stronger sensitivity toward identifying weeds.
The research shows that YOLO-based object detection models work effectively for field-level weed detection because they provide fast results with precise object localization for multiple targets. The models demonstrated reliable detection accuracy despite the limited dataset size because of proper image annotation and model selection that matched the dataset complexity.
The research provides precision agriculture with a foundation for detecting weeds in cotton fields through UAV image analysis, while establishing a basis for developing portable real-time detection systems. Future research should focus on larger datasets together with multispectral inputs and adaptive learning techniques to improve detection robustness and extend detection to other crops, weed species, and geographic regions.

Author Contributions

Conceptualization, Y.Y. and V.H.S.; methodology, Y.Y. and V.H.S.; software, A.D.; validation, A.D., Y.Y. and V.H.S.; formal analysis, A.D.; investigation, A.D.; resources, Y.Y.; data curation, V.H.S.; writing—original draft preparation, A.D.; writing—review and editing, A.D., Y.Y. and V.H.S.; visualization, A.D.; supervision, Y.Y. and V.H.S.; project administration, Y.Y. and V.H.S.; funding acquisition, Y.Y. and V.H.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research and APC were funded by the United States Department of Agriculture (USDA)—National Institute of Food and Agriculture (NIFA), Grant No. 2023-67014-39637.

Data Availability Statement

The dataset is available in Mendeley Data at DOI: 10.17632/fwg6pt6ckd.1 (Direct URL is https://data.mendeley.com/datasets/fwg6pt6ckd/1) (published on: 21 August 2025). The trained YOLOv7 models’ weights are available in Mendeley data at DOI: 10.17632/rwmk64n99v.1 (Direct URL is https://data.mendeley.com/datasets/rwmk64n99v/1) (published on: 25 August 2025).

Acknowledgments

We gratefully acknowledge the contributions of all the student workers and field and technical staff who assisted with data collection and processing. We also thank Nathan Howell, the PI, for his support and collaboration during this project. The views and conclusions contained in this material are those of the authors and should not be interpreted as representing the official policies or endorsements of the USDA or the U.S. Government.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
UAV    Unmanned Aerial Vehicle
YOLO   You Only Look Once
CNN    Convolutional Neural Network
IoU    Intersection over Union
SGD    Stochastic Gradient Descent
P      Precision
R      Recall
AP     Average Precision
mAP    Mean Average Precision

References

  1. Bonny, S. Genetically Modified Herbicide-Tolerant Crops, Weeds, and Herbicides: Overview and Impact. Environ. Manag. 2016, 57, 31–48. [Google Scholar] [CrossRef]
  2. Rose, M.T.; Cavagnaro, T.R.; Scanlan, C.A.; Rose, T.J.; Vancov, T.; Kimber, S.; Kennedy, I.R.; Kookana, R.S.; Van Zwieten, L. Impact of Herbicides on Soil Biology and Function. Adv. Agron. 2016, 136, 133–220. [Google Scholar]
  3. Kent Shannon, D.; Clay, D.E.; Sudduth, K.A. An Introduction to Precision Agriculture. In ASA, CSSA, and SSSA Books; Kent Shannon, D., Clay, D.E., Kitchen, N.R., Eds.; American Society of Agronomy and Soil Science Society of America: Madison, WI, USA, 2018; pp. 1–12. ISBN 978-0-89118-367-9. [Google Scholar]
  4. Mahlein, A.-K. Plant Disease Detection by Imaging Sensors—Parallels and Specific Demands for Precision Agriculture and Plant Phenotyping. Plant Dis. 2016, 100, 241–251. [Google Scholar] [CrossRef] [PubMed]
  5. Nawar, S.; Corstanje, R.; Halcro, G.; Mulla, D.; Mouazen, A.M. Delineation of Soil Management Zones for Variable-Rate Fertilization. Adv. Agron. 2017, 143, 175–245. [Google Scholar]
  6. Sa, I.; Popović, M.; Khanna, R.; Chen, Z.; Lottes, P.; Liebisch, F.; Nieto, J.; Stachniss, C.; Walter, A.; Siegwart, R. WeedMap: A Large-Scale Semantic Weed Mapping Framework Using Aerial Multispectral Imaging and Deep Neural Network for Precision Farming. Remote Sens. 2018, 10, 1423. [Google Scholar] [CrossRef]
  7. Kamilaris, A.; Prenafeta-Boldú, F.X. Deep Learning in Agriculture: A Survey. Comput. Electron. Agric. 2018, 147, 70–90. [Google Scholar] [CrossRef]
  8. Dyrmann, M.; Karstoft, H.; Midtiby, H.S. Plant Species Classification Using Deep Convolutional Neural Network. Biosyst. Eng. 2016, 151, 72–80. [Google Scholar] [CrossRef]
  9. Gawehn, E.; Hiss, J.A.; Schneider, G. Deep Learning in Drug Discovery. Mol. Inform. 2016, 35, 3–14. [Google Scholar] [CrossRef] [PubMed]
  10. Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.-Y.; Berg, A.C. SSD: Single Shot MultiBox Detector. In Computer Vision—ECCV 2016; Leibe, B., Matas, J., Sebe, N., Welling, M., Eds.; Lecture Notes in Computer Science; Springer International Publishing: Cham, Switzerland, 2016; Volume 9905, pp. 21–37. ISBN 978-3-319-46447-3. [Google Scholar]
  11. Bendig, J.; Bolten, A.; Bennertz, S.; Broscheit, J.; Eichfuss, S.; Bareth, G. Estimating Biomass of Barley Using Crop Surface Models (CSMs) Derived from UAV-Based RGB Imaging. Remote Sens. 2014, 6, 10395–10412. [Google Scholar] [CrossRef]
  12. Primicerio, J.; Di Gennaro, S.F.; Fiorillo, E.; Genesio, L.; Lugato, E.; Matese, A.; Vaccari, F.P. A Flexible Unmanned Aerial Vehicle for Precision Agriculture. Precis. Agric. 2012, 13, 517–523. [Google Scholar] [CrossRef]
  13. Pei, H.; Sun, Y.; Huang, H.; Zhang, W.; Sheng, J.; Zhang, Z. Weed Detection in Maize Fields by UAV Images Based on Crop Row Preprocessing and Improved YOLOv4. Agriculture 2022, 12, 975. [Google Scholar] [CrossRef]
  14. Dang, F.; Chen, D.; Lu, Y.; Li, Z. YOLOWeeds: A Novel Benchmark of YOLO Object Detectors for Multi-Class Weed Detection in Cotton Production Systems. Comput. Electron. Agric. 2023, 205, 107655. [Google Scholar] [CrossRef]
  15. Wang, C.-Y.; Bochkovskiy, A.; Liao, H.-Y.M. YOLOv7: Trainable Bag-of-Freebies Sets New State-of-the-Art for Real-Time Object Detectors. In Proceedings of the 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 17–24 June 2023. [Google Scholar]
  16. Gallo, I.; Rehman, A.U.; Dehkordi, R.H.; Landro, N.; La Grassa, R.; Boschetti, M. Deep Object Detection of Crop Weeds: Performance of YOLOv7 on a Real Case Dataset from UAV Images. Remote Sens. 2023, 15, 539. [Google Scholar] [CrossRef]
  17. Peng, M.; Zhang, W.; Li, F.; Xue, Q.; Yuan, J.; An, P. Weed Detection with Improved Yolov 7. EAI Endorsed Trans. IoT 2023, 9, e1. [Google Scholar] [CrossRef]
  18. Wang, K.; Hu, X.; Zheng, H.; Lan, M.; Liu, C.; Liu, Y.; Zhong, L.; Li, H.; Tan, S. Weed Detection and Recognition in Complex Wheat Fields Based on an Improved YOLOv7. Front. Plant Sci. 2024, 15, 1372237. [Google Scholar] [CrossRef]
  19. Gautam, D.; Mawardi, Z.; Elliott, L.; Loewensteiner, D.; Whiteside, T.; Brooks, S. Detection of Invasive Species (Siam Weed) Using Drone-Based Imaging and YOLO Deep Learning Model. Remote Sens. 2025, 17, 120. [Google Scholar] [CrossRef]
  20. Li, J.; Zhang, W.; Zhou, H.; Yu, C.; Li, Q. Weed Detection in Soybean Fields Using Improved YOLOv7 and Evaluating Herbicide Reduction Efficacy. Front. Plant Sci. 2024, 14, 1284338. [Google Scholar] [CrossRef] [PubMed]
  21. LabelImg. Available online: https://github.com/tzutalin/labelImg (accessed on 29 June 2022).
  22. Roboflow. Available online: https://roboflow.com/ (accessed on 15 July 2024).
  23. Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You Only Look Once: Unified, Real-Time Object Detection. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788. [Google Scholar]
  24. Ding, X.; Zhang, X.; Ma, N.; Han, J.; Ding, G.; Sun, J. RepVGG: Making VGG-Style ConvNets Great Again. In Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 19–25 June 2021; pp. 13728–13737. [Google Scholar]
  25. Deng, L.; Miao, Z.; Zhao, X.; Yang, S.; Gao, Y.; Zhai, C.; Zhao, C. HAD-YOLO: An Accurate and Effective Weed Detection Model Based on Improved YOLOV5 Network. Agronomy 2024, 15, 57. [Google Scholar] [CrossRef]
  26. Lekha, J.; Vijayalakshmi, S. Enhanced Weed Detection in Sustainable Agriculture: A You Only Look Once v7 and Internet of Things Sensor Approach for Maximizing Crop Quality. Eng. Proc. 2024, 82, 100. [Google Scholar]
  27. Chen, J.; Liu, H.; Zhang, Y.; Zhang, D.; Ouyang, H.; Chen, X. A Multiscale Lightweight and Efficient Model Based on YOLOv7: Applied to Citrus Orchard. Plants 2022, 11, 3260. [Google Scholar] [CrossRef] [PubMed]
  28. Di, X.; Zhang, Y.; Li, H.; Wang, Q. Toward efficient UAV-based small object detection. Remote Sens. 2025, 17, 2235. [Google Scholar] [CrossRef]
  29. Nikouei, M.; Chen, H.; Zhang, Y. Small object detection: A comprehensive survey. Pattern Recognit. 2025, 159, 110123. [Google Scholar] [CrossRef]
  30. Shi, Y.; Li, J.; Sun, H. FocusDet: An efficient object detector for small object detection. Sci. Rep. 2024, 14, 61136. [Google Scholar] [CrossRef]
  31. Bochkovskiy, A.; Wang, C.-Y.; Liao, H.-Y.M. YOLOv4: Optimal Speed and Accuracy of Object Detection. arXiv 2020, arXiv:2004.10934. [Google Scholar] [CrossRef]
  32. Ferentinos, K.P. Deep Learning Models for Plant Disease Detection and Diagnosis. Comput. Electron. Agric. 2018, 145, 311–318. [Google Scholar] [CrossRef]
Figure 1. Representative UAV-captured images from the dataset used in this study. (a,b) Images showing areas with only weed presence, typically found at field edges or bare regions; (c,d) Images showing cotton crop rows with varying levels of weed presence.
Figure 2. Example of image annotation using LabelImg for YOLO model training. (a) Image showing annotated weed instances labeled as class 0; (b) Image showing annotated cotton and weed instances labeled as class 1 and class 0, respectively. All bounding boxes were manually drawn and saved in YOLO format as individual .txt files.
Figure 3. YOLOv7 network Architecture [18].
Figure 4. Confusion matrices for three YOLOv7 model variants. (a) YOLOv7; (b) YOLOv7-w6; (c) YOLOv7-x.
Figure 5. Detection results of three models. (a) YOLOv7; (b) YOLOv7-w6; (c) YOLOv7-x.
Table 1. Distribution of annotated weed and cotton instances.
Class      Annotation Boxes
Weed       5713
Cotton     50,821
Total      57,534
Table 2. Comparison between YOLOv7 model variants.
Model       Backbone                   Params (M)   FPS   Use Case Suitability
YOLOv7      E-ELAN                     36.9         161   Real-time detection, resource-limited setup
YOLOv7-w6   Wider E-ELAN               70.4         84    Balanced performance and precision
YOLOv7-x    Deeper and wider E-ELAN    71.3         114   Highest accuracy, high-compute environment
Table 3. Parameters used for training YOLOv7, YOLOv7-w6, and YOLOv7-x weed detection models.
Parameter       Value
Optimizer       SGD
Learning rate   1 × 10⁻³
Momentum        0.937
Weight decay    5 × 10⁻⁴
Pretrained      MS COCO dataset
Epochs          300
Batch size      32
Workers         8
Table 4. Operational Environment.
Hardware and Software      Configuration
CPU                        13th Gen Intel(R) Core(TM) i9-13900H
GPU                        NVIDIA Tesla V100S
Operating system           Windows 11
Computational platform     CUDA 11.7
Programming language       Python 3.11
Deep learning framework    PyTorch 1.13.1
Table 5. Performance Comparison of YOLOv7 Model Variants.
Model       Precision   Recall   F1-Score   mAP@0.5   mAP@[0.5:0.95]   AP (Weed)   AP (Cotton)
YOLOv7      0.87        0.78     0.82       0.88      0.50             0.034       0.075
YOLOv7-w6   0.77        0.73     0.75       0.79      0.43             0.067       0.030
YOLOv7-x    0.81        0.81     0.81       0.83      0.46             0.043       0.091
Table 6. Common Misclassification Scenarios.
Misclassification Type                        Common Conditions Observed                                                 Most Affected Model(s)
False Positive (Weed → Background)            Background clutter resembling weed texture                                 YOLOv7, YOLOv7-w6
False Negative (Missed Weed)                  Overlapping vegetation; weeds occluded by cotton leaves                    All models (most in YOLOv7-w6)
False Positive (Cotton → Background)          Low-contrast cotton in shaded areas                                        YOLOv7, YOLOv7-w6
False Negative (Cotton → Weed)                Cotton rows misaligned with expected patterns; high weed density nearby    YOLOv7-w6
Background False Positive (Dry Soil → Weed)   Dark regions with irregular textures                                       YOLOv7-w6, YOLOv7-x
False Negative (Missed Weed)                  Overlapping vegetation; occlusion by cotton                                All (most in YOLOv7-w6)
Table 7. Accuracy of YOLOv7 Model Variants Based on Weed Detection.
Model       Accuracy (%)
YOLOv7      83
YOLOv7-w6   63
YOLOv7-x    77

