1. Introduction
Common flax (Linum usitatissimum L.) is an important oilseed crop widely cultivated in temperate regions worldwide, with substantial economic and agricultural significance. It is primarily grown in two forms, namely fiber flax for textile production and oilseed flax for edible oil and functional food applications, reflecting its broad utilization potential across multiple industries. Owing to its high content of α-linolenic acid, dietary fiber, lignans, and phenolic compounds, flaxseed is recognized as a valuable source of bioactive substances associated with cardiovascular protection, antioxidant activity, and other health-promoting effects [1,2,3]. However, the yield and quality potential of flax is severely constrained by various insect pests and diseases. These biotic stresses damage multiple plant organs and disrupt normal physiological processes, leading to reduced photosynthetic capacity, growth inhibition, and yield loss [4,5]. Previous studies have reported that severe pest and disease infestations can cause substantial reductions in seed weight and overall productivity, posing a persistent threat to stable flax production [6,7,8,9,10]. Timely and accurate monitoring of flax pest and disease occurrence is therefore of great importance for ensuring yield and quality.
Pest and disease monitoring is a fundamental prerequisite for ensuring the healthy growth and stable yield of flax. In current flax production, conventional monitoring still relies mainly on manual field scouting, biological trapping, and chemical analysis, and each of these methods has evident limitations [11]. Manual visual inspection is highly labor-intensive and time-consuming and is suitable only for small-scale fields, making it difficult to satisfy the requirements of large-area flax cultivation. Biological monitoring methods, such as insect traps and sticky boards, are susceptible to environmental conditions and subjective judgment, which may lead to inaccurate or delayed estimation of pest populations. Chemical analysis, which involves laboratory examination of plant tissues or soil samples, is usually applied only after severe infection or obvious symptoms have appeared. Consequently, these traditional approaches cannot meet the demand for timely, efficient, and large-scale monitoring of flax diseases and insect pests, highlighting the urgent need for automated and intelligent monitoring technologies [11].
Given the limitations of traditional monitoring approaches, machine learning methods have been explored to improve the accuracy and efficiency of crop pest and disease detection. For instance, Bhatia et al. [12] combined support vector machines (SVMs) and logistic regression to predict powdery mildew incidence in tomato plants, demonstrating the potential of hybrid models for disease forecasting. Duarte-Carvajalino et al. [13] applied multiple machine learning algorithms, including multilayer perceptrons, convolutional neural networks, support vector regression, and random forests, to quantitatively assess late blight severity in potato crops using multispectral data captured by unmanned aerial vehicles. Similarly, Skawsang et al. [14] integrated satellite-derived crop phenology, ground meteorological observations, and artificial neural networks (ANNs) to predict rice pest populations, supporting proactive pest management in Thailand. Despite these promising results, conventional machine learning models generally rely on handcrafted features and often fail to capture complex spatial patterns in agricultural fields, limiting their generalization and detection accuracy under diverse conditions. Neural network-based intelligent diagnostic frameworks have also been widely explored in other engineering and fault detection domains, further demonstrating the versatility of deep learning-driven pattern recognition in complex signal and image analysis tasks [15].
To address these limitations, deep learning-based object detection methods have been widely applied in crop pest and disease monitoring, particularly in complex agricultural scenarios, significantly improving detection accuracy and efficiency. Among them, the YOLO series [16,17,18,19] and the R-CNN series [20,21] represent one-stage and two-stage detection frameworks, respectively, and both have achieved favorable performance in agricultural pest and disease detection tasks [22]. Guan et al. [23] proposed a GC-Faster R-CNN model with a hybrid attention mechanism to improve feature representation for multi-scale and highly similar pest targets; experiments on the Insect25 dataset showed substantial gains over the baseline Faster R-CNN in terms of mAP and recall. Santhosh and Thiyagarajan [24] enhanced the YOLOv3-Tiny framework by integrating a convolutional and vision transformer-based module (ConViT-TDD) with channel-spatial and self-attention mechanisms for turmeric disease detection, achieving an overall accuracy of 93.16% and outperforming classical CNN models. Lutfiah and Musdalifah [25] applied YOLOv5 to chili plant pest and disease detection using smartphone-acquired images, achieving a test accuracy of 0.947, with precision, recall, and mAP of 0.946, 0.936, and 0.959, respectively. Wang et al. [26] proposed an improved lightweight YOLOv8-based model (RGC-YOLO) for multi-scale rice pest and disease detection, integrating RepGhost, GhostConv, and a hybrid attention module; the model achieved an mAP@0.5 of 93.2% while reducing parameters by 33.2% and GFLOPs by 29.27%, demonstrating suitability for real-time deployment on embedded devices. Wu et al. [27] developed YOLO-Lychee-advanced, a lightweight YOLOv11-based detector for lychee stem-borer damage, integrating a dual-branch residual C2f module, CBAM attention, and CIoU loss; the model achieved 91.7% mAP@0.5 and 61.6% mAP@0.5–0.95 while maintaining real-time speed (37 FPS), outperforming YOLOv9t and YOLOv10n. Meng et al. [28] proposed an improved YOLOv12 for tomato leaf disease detection by introducing SPDConv, a Parallelized Patch-Aware Attention (PPA) module, and a dedicated small-target head, increasing mAP@0.5 by 3%, mAP@0.5–0.95 by 5.4%, and AP-Small by 4.5%, demonstrating enhanced detection of small and occluded disease spots while maintaining a lightweight structure.
Despite the remarkable performance of existing YOLO-based models in agricultural pest and disease detection, several challenges remain. Under complex field conditions, detection accuracy can be compromised due to target scale variation, occlusion, and heterogeneous backgrounds. Additionally, many models are relatively large and computationally intensive, limiting their applicability on resource-constrained devices. Another critical issue is the lack of standardized and publicly available datasets for flax pests and diseases, which restricts model training and benchmarking. To address these limitations, this study adopts YOLOv11 as the baseline, leveraging its efficient single-stage detection framework, and proposes several targeted architectural integrations and adaptations. Furthermore, we construct a dedicated flax pest and disease dataset, enabling robust model training and systematic evaluation.
To address these challenges, several targeted architectural improvements are introduced. Small pests and disease lesions in flax fields often occupy only a few pixels, and conventional downsampling operations may cause the loss of fine-grained details during feature extraction. Therefore, the Adaptive Downsampling (ADown) module is adopted to preserve informative regions while reducing redundant background features. In addition, pest bodies and disease spots frequently resemble surrounding leaf textures, which makes discriminative feature representation difficult. To enhance the modeling of critical spatial and channel information, the C3K2-STAR module is incorporated to strengthen feature representation. Furthermore, shallow feature layers may receive insufficient supervision during training, which can hinder the optimization of small and weak targets. To alleviate this issue, auxiliary detection heads are introduced to provide additional supervision and improve gradient propagation. Through the integration of these components, the proposed model aims to improve small-target detection, feature representation, and training stability under complex field conditions.
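As a toy illustration of the downsampling concern raised above (this is not the actual ADown module, which in YOLOv9-style designs combines average- and max-pooling branches over split channels), the following sketch shows how uniform stride-2 subsampling can discard a few-pixel target entirely, whereas a pooling-based scheme keeps at least one response per spatial block:

```python
# Toy illustration (not the real ADown): uniform stride-2 subsampling can
# drop a small bright target entirely, while 2x2 max pooling retains the
# strongest response in each block.

def stride2_subsample(img):
    """Keep every other row/column (the top-left pixel of each 2x2 block)."""
    return [row[::2] for row in img[::2]]

def maxpool2(img):
    """2x2 max pooling: the strongest response in each block survives."""
    h, w = len(img), len(img[0])
    return [[max(img[i][j], img[i][j + 1], img[i + 1][j], img[i + 1][j + 1])
             for j in range(0, w, 2)]
            for i in range(0, h, 2)]

# 4x4 "feature map" with a single small target at an odd position (1, 1).
fmap = [
    [0, 0, 0, 0],
    [0, 9, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]

print(stride2_subsample(fmap))  # [[0, 0], [0, 0]] -- target lost
print(maxpool2(fmap))           # [[9, 0], [0, 0]] -- target retained
```

This is exactly the failure mode that motivates preserving informative responses before halving spatial resolution when targets occupy only a few pixels.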
Building upon these efforts, the main contributions of this work are summarized as follows:
- (1) A dedicated flax pest and disease image dataset covering seven common categories is constructed, and instance-level data augmentation is employed to expand data diversity and alleviate class imbalance.
- (2) A lightweight YOLOv11-based detection framework is developed by integrating several effective architectural components, including ADown, C3K2-STAR, and auxiliary detection heads, building upon existing mechanisms to enhance feature representation and small-target detection under complex field conditions.
- (3) Comprehensive comparative experiments and ablation studies are conducted to validate the effectiveness and efficiency of the proposed model.
3. Results
3.1. Effect of Instance-Level Data Augmentation
To evaluate the isolated contribution of the proposed instance-level data augmentation strategy, an additional experiment was conducted using the baseline YOLOv11n model. Two training settings were compared: (1) standard data augmentation only, and (2) standard augmentation combined with the proposed instance-level augmentation. All other training settings were kept identical to ensure a fair comparison.
The quantitative results are presented in Table 5. Incorporating instance-level augmentation leads to consistent improvements across several evaluation metrics: Precision increases from 82.4% to 88.8%, Recall from 68.7% to 69.6%, mAP@50 from 76.2% to 77.4%, and mAP@50:95 from 45.2% to 46.4%. These results suggest that the instance-level augmentation strategy can enhance detection performance to some extent. By recombining pest and disease instances with diverse background images and geometric transformations, the augmentation process increases the diversity of training samples and partially alleviates class imbalance. This allows the model to observe target objects under more varied spatial contexts, which may contribute to improved generalization in complex field environments.
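The gains reported in Table 5 can be tabulated in a few lines (values are percentages transcribed from the text above; the largest improvement is in Precision):

```python
# Table 5 metrics (%): YOLOv11n with standard augmentation vs. standard
# plus the proposed instance-level augmentation.
baseline  = {"P": 82.4, "R": 68.7, "mAP50": 76.2, "mAP50_95": 45.2}
with_inst = {"P": 88.8, "R": 69.6, "mAP50": 77.4, "mAP50_95": 46.4}

# Per-metric gain in percentage points.
gains = {k: round(with_inst[k] - baseline[k], 1) for k in baseline}
print(gains)  # {'P': 6.4, 'R': 0.9, 'mAP50': 1.2, 'mAP50_95': 1.2}
```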
3.2. Ablation Experiments
To comprehensively validate the effectiveness of each proposed component in the improved YOLOv11 model, ablation experiments were conducted on the flax disease and pest dataset. The original YOLOv11n was adopted as the baseline model, and three key modules—namely the auxiliary detection head (AuxDet), the adaptive downsampling module (ADown), and the feature enhancement module (C3K2-STAR)—were progressively integrated into the baseline architecture.
All ablation experiments were performed under identical experimental conditions, including the same training dataset, input resolution, batch size, number of epochs, and optimization strategy, in order to ensure fair and reliable comparisons. The influence of each module on detection accuracy and model complexity was systematically evaluated by analyzing multiple quantitative metrics.
The detailed ablation results are summarized in Table 6.
Introducing each proposed module individually demonstrates their distinct contributions to detection performance and model efficiency. Incorporating AuxDet slightly improves recall (from 69.6% to 70.2%) and increases mAP@50 from 77.4% to 78.1% without introducing additional computational overhead, indicating that the auxiliary detection head enhances feature supervision and target localization. This improvement may be attributed to the additional supervision provided by the auxiliary branch during training, which helps guide intermediate feature learning and improves the model’s ability to capture small and ambiguous targets in complex agricultural scenes. Adding ADown significantly reduces model parameters from 2.58 M to 2.10 M and GFLOPS from 6.3 to 5.3, while maintaining comparable accuracy, demonstrating its effectiveness in lightweight downsampling. This result suggests that the adaptive downsampling strategy preserves essential spatial information while reducing redundant computations, thereby improving computational efficiency without significantly degrading feature quality. Introducing C3K2-STAR improves recall to 72.4% and mAP@50:95 to 47.0%, confirming its ability to strengthen feature representation through enhanced spatial–channel interactions. The improvement may be related to the STAR mechanism, which enhances the interaction between spatial and channel features, enabling the network to better highlight discriminative regions of pests and disease symptoms.
When two modules are combined, complementary effects become evident. The AuxDet + ADown configuration achieves a notable increase in mAP@50:95 to 48.3% with reduced parameters (2.24 M), demonstrating a favorable balance between efficiency and accuracy. This indicates that the improved supervision from AuxDet compensates for potential information loss caused by aggressive downsampling, thereby maintaining detection accuracy while reducing model complexity. The ADown + C3K2-STAR combination further reduces parameters to 2.19 M while maintaining competitive mAP@50 (78.4%) and recall (73.9%), indicating strong synergy between efficient downsampling and enhanced feature extraction. In this configuration, ADown reduces redundant features while C3K2-STAR focuses on enhancing informative representations, resulting in more efficient and discriminative feature learning. The AuxDet + C3K2-STAR model improves mAP@50:95 to 48.1%, highlighting that auxiliary supervision and strengthened backbone representation jointly contribute to better localization and classification. This suggests that deeper feature refinement combined with additional supervision helps the network better distinguish visually similar pest and disease patterns.
Finally, integrating AuxDet, ADown, and C3K2-STAR achieves the best overall performance, with a precision of 89.3%, recall of 72.6%, mAP@50 of 79.6%, and mAP@50:95 of 48.4%, at only 2.19 M parameters and 5.5 GFLOPS. These results demonstrate that the proposed modules are mutually reinforcing and collectively enable accurate and efficient detection of flax pests and diseases: ADown reduces computational redundancy, C3K2-STAR enhances feature representation capability, and AuxDet strengthens training supervision, allowing the network to achieve a lightweight design and improved detection accuracy simultaneously. The full model thus provides a favorable trade-off between detection accuracy and computational complexity, suggesting its suitability for practical agricultural monitoring in resource-constrained environments.
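For reference, the mAP@50:95 figures quoted in this subsection can be collected programmatically; only configurations whose mAP@50:95 is stated in the prose are included (the remaining Table 6 rows are omitted here):

```python
# mAP@50:95 values (%) quoted in the ablation discussion of Table 6.
map50_95 = {
    "baseline (YOLOv11n)":          46.4,
    "C3K2-STAR":                    47.0,
    "AuxDet + ADown":               48.3,
    "AuxDet + C3K2-STAR":           48.1,
    "AuxDet + ADown + C3K2-STAR":   48.4,
}

# The full three-module configuration leads on this metric.
best = max(map50_95, key=map50_95.get)
print(best, map50_95[best])  # AuxDet + ADown + C3K2-STAR 48.4
```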
3.3. Comparative Experiments
3.3.1. Quantitative Comparison with Different Models
To comprehensively evaluate the effectiveness of the proposed Improved YOLOv11 model for flax disease and pest detection, comparative experiments were conducted against a range of representative object detection methods. These methods include the classical two-stage detector Faster R-CNN, the Transformer-based end-to-end detector RT-DETR-ResNet50, as well as several lightweight one-stage YOLO-series models, namely YOLOv3-tiny, YOLOv5n, YOLOv8n, YOLOv11n, and YOLOv12n. All models were trained and tested on the same dataset under identical experimental settings to ensure a fair comparison.
Due to differences in evaluation protocols across detection frameworks, the Precision and Recall metrics for Faster R-CNN are not directly available from the employed implementation. Therefore, mAP@50 and mAP@50:95 are adopted as the primary evaluation metrics for cross-model comparison in this study. In addition, model complexity indicators, including the number of parameters and GFLOPS, are reported to provide a reference for computational efficiency.
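For readers unfamiliar with the primary metric, the sketch below shows how AP at IoU 0.5 is computed for a single class: detections are sorted by confidence, greedily matched to unmatched ground truths at IoU ≥ 0.5, and AP is accumulated as the area under the resulting precision-recall curve. This uninterpolated variant is a simplification; COCO-style evaluators used by common detection frameworks apply 101-point interpolation, so exact numbers can differ slightly.

```python
# Minimal single-class AP@50 sketch (uninterpolated PR-curve area).

def iou(a, b):
    """IoU of two boxes in (x1, y1, x2, y2) format."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter > 0 else 0.0

def ap50(detections, gt_boxes):
    """detections: list of (confidence, box); gt_boxes: list of boxes."""
    dets = sorted(detections, key=lambda d: -d[0])  # confidence-descending
    matched = [False] * len(gt_boxes)
    tps, fps = [], []
    for conf, box in dets:
        # Greedy match against the best still-unmatched ground truth.
        best_iou, best_j = 0.0, -1
        for j, g in enumerate(gt_boxes):
            ov = iou(box, g)
            if ov > best_iou and not matched[j]:
                best_iou, best_j = ov, j
        if best_iou >= 0.5:
            matched[best_j] = True
            tps.append(1); fps.append(0)   # true positive
        else:
            tps.append(0); fps.append(1)   # false positive
    # Accumulate area under the precision-recall curve.
    ap, tp_cum, fp_cum, prev_recall = 0.0, 0, 0, 0.0
    for tp, fp in zip(tps, fps):
        tp_cum += tp; fp_cum += fp
        recall = tp_cum / len(gt_boxes)
        precision = tp_cum / (tp_cum + fp_cum)
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap

gts = [(0, 0, 10, 10), (20, 20, 30, 30)]
dets = [(0.9, (0, 0, 10, 10)),     # true positive
        (0.8, (50, 50, 60, 60)),   # false positive
        (0.7, (20, 20, 30, 30))]   # true positive
print(round(ap50(dets, gts), 3))   # 0.833
```

mAP@50 then averages this per-class AP over all classes; mAP@50:95 additionally averages over IoU thresholds from 0.50 to 0.95 in steps of 0.05, which is why it is the stricter of the two metrics.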
The quantitative comparison results are summarized in Table 7.
As shown in Table 7, on the constructed flax disease and pest detection dataset, the proposed Improved YOLOv11 achieves an mAP@50 of 79.6% and an mAP@50:95 of 48.4%, the highest detection accuracy among all compared methods. Specifically, the mAP@50 of Improved YOLOv11 is 4.4, 7.3, 4.1, 2.4, 2.6, 2.2, and 2.8 percentage points higher than those of the two-stage Faster R-CNN, the transformer-based end-to-end RT-DETR-ResNet50, and the single-stage YOLOv3-tiny, YOLOv5n, YOLOv8n, YOLOv11n, and YOLOv12n models, respectively. For the more stringent mAP@50:95 metric, Improved YOLOv11 outperforms Faster R-CNN, RT-DETR-ResNet50, YOLOv3-tiny, YOLOv5n, YOLOv11n, and YOLOv12n by 2.9, 3.6, 4.3, 2.8, 2.0, and 0.5 percentage points, respectively, while achieving performance comparable to YOLOv8n. In terms of model complexity, Improved YOLOv11 contains only 2.19 M parameters, which is 55.1%, 94.8%, 77.0%, 18.6%, 15.1%, and 14.5% fewer than Faster R-CNN, RT-DETR-ResNet50, YOLOv3-tiny, YOLOv8n, YOLOv11n, and YOLOv12n, respectively. Meanwhile, its computational cost is only 5.5 GFLOPS, representing reductions of 31.5%, 95.6%, 61.5%, 19.1%, 12.7%, and 12.7% relative to the same models. These results indicate that the proposed Improved YOLOv11 achieves a favorable balance between detection accuracy and computational efficiency, showing strong potential for practical flax disease and pest monitoring under complex field conditions.
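The relative reductions against the YOLOv11n baseline can be reproduced from figures already given in the text (2.58 M parameters and 6.3 GFLOPS for YOLOv11n in Section 3.2; 2.19 M and 5.5 GFLOPS for the proposed model):

```python
# Relative reduction of the proposed model vs. the YOLOv11n baseline,
# using the parameter and GFLOPS figures quoted in Sections 3.2 and 3.3.
base_params, base_gflops = 2.58, 6.3   # YOLOv11n
ours_params, ours_gflops = 2.19, 5.5   # Improved YOLOv11

param_cut  = round((1 - ours_params / base_params) * 100, 1)
gflops_cut = round((1 - ours_gflops / base_gflops) * 100, 1)
print(param_cut, gflops_cut)  # 15.1 12.7 -- matching the percentages above
```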
3.3.2. Training Curve Analysis
To further investigate the convergence behavior and training dynamics of different detection models, the mAP@50 and mAP@50:95 curves over training epochs are presented in Figure 6. These curves provide an intuitive comparison of optimization stability, convergence speed, and final detection accuracy among the evaluated methods.
As shown in Figure 6, most models exhibit a rapid increase in mAP during the early training stage, accompanied by noticeable fluctuations, which can be attributed to the initial adaptation of network parameters and feature representations. As training proceeds, the curves gradually become smoother and converge, indicating progressive stabilization of the optimization process. Although several baseline models temporarily reach comparable or slightly higher values at certain epochs, the proposed Improved YOLOv11 (red curve) maintains consistently superior performance during the convergence stage and finally achieves the highest mAP@50 and mAP@50:95 values among all compared methods. This behavior suggests that the introduced architectural improvements contribute to more effective feature learning and more reliable convergence. The two-stage Faster R-CNN exhibits relatively stable curves with smaller oscillations, reflecting robust optimization characteristics; however, its final detection accuracy remains at a medium-to-lower level compared with the one-stage YOLO-based models. Overall, these observations further demonstrate that Improved YOLOv11 balances convergence stability and detection accuracy, making it well suited for practical flax disease and pest detection tasks.
3.4. Visualization of Detection Results
To further illustrate the detection performance of the proposed model across all target categories, representative visualization results for the seven flax pest and disease classes are presented in Figure 7. The compared models are Faster R-CNN, RT-DETR-ResNet50, YOLOv11n, and the proposed Improved YOLOv11, shown alongside the original images. The seven categories are Blister Beetle, Grasshopper, Flax Flea Beetle, Beet Webworm, Sesame Webworm, Flax Yellows, and Fusarium Wilt, each illustrated by one representative image with detection results for all models.
As shown in Figure 7, the Improved YOLOv11 consistently produces bounding boxes with higher confidence scores across all categories than the other models. It demonstrates improved localization accuracy, better discrimination of small or partially occluded pests, and fewer missed detections and false positives in complex field backgrounds. These qualitative observations are consistent with the quantitative performance reported in Section 3.3, further confirming the efficacy of the proposed modifications in enhancing detection robustness and reliability under real-world conditions.
4. Discussion
The experimental results indicate that the proposed Improved YOLOv11 achieves a favorable balance between detection accuracy and computational efficiency for flax (Linum usitatissimum L.) pest and disease detection under complex field conditions. Compared with several representative detection models, the proposed method achieves higher mAP while maintaining a lightweight architecture with a relatively low parameter count and computational cost. These improvements can be attributed to the combined effects of data augmentation and architectural modifications. Although model complexity indicators such as parameters and GFLOPS provide useful measures of computational efficiency, further evaluation on embedded or edge hardware is required to fully assess real-world deployment performance. Inference speed (FPS) can vary significantly with hardware configuration and runtime conditions; it is therefore not reported in this study and will be systematically evaluated in future work under standardized deployment settings.
In addition to network design, the instance-level data augmentation strategy contributes to improved detection performance. By recombining transformed pest and disease instances with different background images, the augmentation process increases the diversity of the training samples and alleviates the class imbalance present in the original dataset. This strategy allows the model to observe target instances under more varied spatial contexts and background conditions, which helps the network learn more robust visual representations and improves its adaptability to complex field environments.
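The recombination step can be sketched as a simple cut-and-paste operation. The paper's exact pipeline is not specified at code level, so the patch size, offset, and transform used below are illustrative: in practice, offsets and flips would be sampled randomly, and the pasted instance's bounding-box label would be shifted by the same offset.

```python
# Schematic instance-level augmentation: paste an (optionally flipped)
# pest/lesion patch onto a new background at a chosen offset. Images are
# modeled as 2D lists for clarity; real pipelines operate on arrays.

def paste_instance(background, patch, top, left, flip=False):
    """Return a copy of `background` with `patch` pasted at (top, left)."""
    if flip:
        patch = [row[::-1] for row in patch]       # horizontal flip
    out = [row[:] for row in background]           # do not mutate input
    for i, row in enumerate(patch):
        for j, v in enumerate(row):
            out[top + i][left + j] = v
    return out
    # The new bounding box for the pasted instance is simply
    # (left, top, left + patch_width, top + patch_height).

background = [[0] * 6 for _ in range(6)]           # clean 6x6 background
patch = [[1, 2],
         [3, 4]]                                   # 2x2 instance crop

aug = paste_instance(background, patch, top=1, left=2, flip=True)
print(aug[1], aug[2])  # rows containing the flipped patch
```

Repeating this with many backgrounds, offsets, and transforms is what multiplies the spatial contexts each instance is seen in, which is the mechanism credited here for the improved robustness.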
From the architectural perspective, the ADown module may improve feature extraction efficiency through adaptive downsampling. Conventional downsampling operations often apply uniform spatial compression, which can cause the loss of fine-grained details, especially for small objects such as insects or early-stage disease symptoms. In contrast, ADown is intended to preserve regions with higher information density while compressing less informative background areas; this selective mechanism may help retain critical spatial details while reducing redundant computation.

The C3K2-STAR module is designed to enhance feature representation by strengthening spatial–channel interactions. Pest bodies and disease lesions often exhibit subtle texture patterns that can be easily obscured by background leaf structures. By selectively emphasizing discriminative responses related to insect contours and lesion textures, the module may improve the network's ability to capture fine-grained characteristics of pest and disease regions.

Moreover, the interaction between ADown and C3K2-STAR may contribute to the observed performance gains: adaptive downsampling reduces background redundancy during early feature extraction, while C3K2-STAR enhances informative spatial and channel responses in subsequent stages. This complementary mechanism could enable the network to maintain discriminative feature representations even with reduced model complexity. In addition, the auxiliary detection head (AuxDet) may provide additional supervision during training, facilitating gradient propagation and improving optimization stability, which is likely beneficial for detecting small or ambiguous targets. These architectural interpretations are based on observed performance trends and intuitive understanding, and they should be considered tentative rather than fully demonstrated mechanisms.
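Assuming the STAR component follows the element-wise product of two linear branches popularized by StarNet (the exact wiring inside C3K2-STAR is not reproduced here and this remains an assumption), the core "star" operation can be sketched as:

```python
# Hedged sketch of the "star" operation: two linear branches fused by
# element-wise multiplication, mixing features multiplicatively rather
# than additively. Vectors stand in for feature maps for clarity.

def linear(x, weights, bias):
    """Plain dense layer: y_i = sum_j w_ij * x_j + b_i."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]

def star(x, w1, b1, w2, b2):
    """Element-wise product of two linear branches of x."""
    a = linear(x, w1, b1)
    b = linear(x, w2, b2)
    return [u * v for u, v in zip(a, b)]

x = [1.0, 2.0]
w1, b1 = [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]   # identity branch
w2, b2 = [[0.5, 0.5], [0.5, 0.5]], [1.0, 1.0]   # mixing branch
print(star(x, w1, b1, w2, b2))  # [2.5, 5.0]
```

Because the product of two linear maps contains pairwise cross-terms of the input features, this fusion implicitly expands the feature space, which is plausibly the property the C3K2-STAR module exploits for richer spatial–channel interaction.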
It should be noted that detailed class-level and error-level analysis, such as per-class AP, confusion patterns, or systematic failure modes, has not been fully conducted. Therefore, while aggregate mAP provides a general performance overview, per-class differences and specific failure modes should be interpreted cautiously.
Overall, the proposed Improved YOLOv11 model provides an effective and practical solution for flax pest and disease detection under the studied field conditions. While the current work demonstrates promising performance, the model generalization capability is still constrained by dataset scale, regional coverage, and environmental variability. Future research will focus on expanding multi-region, multi-season, and multi-sensor flax pest and disease datasets, integrating severity assessment mechanisms, and exploring more robust lightweight detection frameworks to further enhance practical applicability in precision agriculture systems. In addition, the detection outputs could serve as a core component of visual surveillance-based monitoring systems for early warning, severity assessment, and precision management in flax cultivation.