1. Introduction
Common flax (Linum usitatissimum L.) is an important oilseed crop widely cultivated in temperate regions worldwide, with substantial economic and agricultural significance. It is primarily grown in two forms, namely fiber flax for textile production and oilseed flax for edible oil and functional food applications, reflecting its broad utilization potential across multiple industries. Owing to its high content of α-linolenic acid, dietary fiber, lignans, and phenolic compounds, flaxseed is recognized as a valuable source of bioactive substances associated with cardiovascular protection, antioxidant activity, and other health-promoting effects [1,2,3]. However, the yield and quality potential of flax is severely constrained by various insect pests and diseases. These biotic stresses damage multiple plant organs and disrupt normal physiological processes, leading to reduced photosynthetic capacity, growth inhibition, and yield loss [4,5]. Previous studies have reported that severe pest and disease infestations can cause substantial reductions in seed weight and overall productivity, posing a persistent threat to stable flax production [6,7,8,9,10]. Timely and accurate monitoring of flax pest and disease occurrence is therefore of great importance for ensuring yield and quality.
Pest and disease monitoring is a fundamental prerequisite for ensuring the healthy growth and stable yield of flax. In current flax production, conventional monitoring still relies mainly on manual field scouting, biological trapping, and chemical analysis, and each of these methods has evident limitations [11]. Manual visual inspection is highly labor-intensive and time-consuming and is suitable only for small-scale fields, making it difficult to satisfy the requirements of large-area flax cultivation. Biological monitoring methods, such as insect traps and sticky boards, are susceptible to environmental conditions and subjective judgment, which may lead to inaccurate or delayed estimation of pest populations. Chemical analysis, which involves laboratory examination of plant tissues or soil samples, is usually applied only after severe infection or obvious symptoms have appeared. Consequently, these traditional approaches cannot meet the demand for timely, efficient, and large-scale monitoring of flax diseases and insect pests, highlighting the urgent need for automated and intelligent monitoring technologies [11].
Given the limitations of traditional monitoring approaches, machine learning methods have been explored to improve the accuracy and efficiency of crop pest and disease detection. For instance, Bhatia et al. [12] combined support vector machines (SVMs) and logistic regression to predict powdery mildew incidence in tomato plants, demonstrating the potential of hybrid models for disease forecasting. Duarte-Carvajalino et al. [13] applied multiple machine learning algorithms, including multilayer perceptrons, convolutional neural networks, support vector regression, and random forests, to quantitatively assess late blight severity in potato crops using multispectral data captured by unmanned aerial vehicles. Similarly, Skawsang et al. [14] integrated satellite-derived crop phenology, ground meteorological observations, and artificial neural networks (ANNs) to predict rice pest populations, supporting proactive pest management in Thailand. Despite these promising results, conventional machine learning models generally rely on handcrafted features and often fail to capture complex spatial patterns in agricultural fields, limiting their generalization and detection accuracy under diverse conditions. Neural network-based intelligent diagnostic frameworks have also been widely explored in other engineering and fault detection domains, further demonstrating the versatility of deep learning-driven pattern recognition in complex signal and image analysis tasks [15].
To address these limitations, deep learning-based object detection methods have been widely applied in crop pest and disease monitoring, particularly in complex agricultural scenarios, significantly improving detection accuracy and efficiency. Among them, the YOLO series [16,17,18,19] and the R-CNN series [20,21] represent one-stage and two-stage detection frameworks, respectively, and both have achieved favorable performance in agricultural pest and disease detection tasks [22]. Guan et al. [23] proposed a GC-Faster R-CNN model with a hybrid attention mechanism to improve feature representation for multi-scale and highly similar pest targets; experiments on the Insect25 dataset showed substantial gains over the baseline Faster R-CNN in terms of mAP and recall. Santhosh and Thiyagarajan [24] enhanced the YOLOv3-Tiny framework by integrating a convolutional and vision transformer-based module (ConViT-TDD) with channel-spatial and self-attention mechanisms for turmeric disease detection, achieving an overall accuracy of 93.16% and outperforming classical CNN models. Lutfiah and Musdalifah [25] applied YOLOv5 to chili plant pest and disease detection using smartphone-acquired images, achieving a test accuracy of 0.947, with precision, recall, and mAP of 0.946, 0.936, and 0.959, respectively. Wang et al. [26] proposed an improved lightweight YOLOv8-based model (RGC-YOLO) for multi-scale rice pest and disease detection, integrating RepGhost, GhostConv, and a hybrid attention module; the model achieved an mAP@0.5 of 93.2% while reducing parameters by 33.2% and GFLOPs by 29.27%, demonstrating suitability for real-time deployment on embedded devices. Wu et al. [27] developed YOLO-Lychee-advanced, a lightweight YOLOv11-based detector for lychee stem-borer damage, integrating a dual-branch residual C2f module, CBAM attention, and CIoU loss; the model achieved 91.7% mAP@0.5 and 61.6% mAP@0.5–0.95 while maintaining real-time speed (37 FPS), outperforming YOLOv9t and YOLOv10n. Meng et al. [28] proposed an improved YOLOv12 for tomato leaf disease detection by introducing SPDConv, a Parallelized Patch-Aware Attention (PPA) module, and a dedicated small-target head, increasing mAP@0.5 by 3%, mAP@0.5–0.95 by 5.4%, and AP-Small by 4.5%, demonstrating enhanced detection of small and occluded disease spots while maintaining a lightweight structure.
Despite the remarkable performance of existing YOLO-based models in agricultural pest and disease detection, several challenges remain. Under complex field conditions, detection accuracy can be compromised due to target scale variation, occlusion, and heterogeneous backgrounds. Additionally, many models are relatively large and computationally intensive, limiting their applicability on resource-constrained devices. Another critical issue is the lack of standardized and publicly available datasets for flax pests and diseases, which restricts model training and benchmarking. To address these limitations, this study adopts YOLOv11 as the baseline, leveraging its efficient single-stage detection framework, and proposes several targeted architectural integrations and adaptations. Furthermore, we construct a dedicated flax pest and disease dataset, enabling robust model training and systematic evaluation.
To address these challenges, several targeted architectural improvements are introduced. Small pests and disease lesions in flax fields often occupy only a few pixels, and conventional downsampling operations may cause the loss of fine-grained details during feature extraction. Therefore, the Adaptive Downsampling (ADown) module is adopted to preserve informative regions while reducing redundant background features. In addition, pest bodies and disease spots frequently resemble surrounding leaf textures, which makes discriminative feature representation difficult. To enhance the modeling of critical spatial and channel information, the C3K2-STAR module is incorporated to strengthen feature representation. Furthermore, shallow feature layers may receive insufficient supervision during training, which can hinder the optimization of small and weak targets. To alleviate this issue, auxiliary detection heads are introduced to provide additional supervision and improve gradient propagation. Through the integration of these components, the proposed model aims to improve small-target detection, feature representation, and training stability under complex field conditions.
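As a toy illustration of the downsampling concern raised above (this is not the actual ADown module, which in YOLOv9-style designs combines average- and max-pooling branches over split channels), the following sketch shows how uniform stride-2 subsampling can discard a few-pixel target entirely, whereas a pooling-based scheme keeps at least one response per spatial block:

```python
# Toy illustration (not the real ADown): uniform stride-2 subsampling can
# drop a small bright target entirely, while 2x2 max pooling retains the
# strongest response in each block.

def stride2_subsample(img):
    """Keep every other row/column (the top-left pixel of each 2x2 block)."""
    return [row[::2] for row in img[::2]]

def maxpool2(img):
    """2x2 max pooling: the strongest response in each block survives."""
    h, w = len(img), len(img[0])
    return [[max(img[i][j], img[i][j + 1], img[i + 1][j], img[i + 1][j + 1])
             for j in range(0, w, 2)]
            for i in range(0, h, 2)]

# 4x4 "feature map" with a single small target at an odd position (1, 1).
fmap = [
    [0, 0, 0, 0],
    [0, 9, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]

print(stride2_subsample(fmap))  # [[0, 0], [0, 0]] -- target lost
print(maxpool2(fmap))           # [[9, 0], [0, 0]] -- target retained
```

This is exactly the failure mode that motivates preserving informative responses before halving spatial resolution when targets occupy only a few pixels.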
Building upon these efforts, the main contributions of this work are summarized as follows:
- (1) A dedicated flax pest and disease image dataset covering seven common categories is constructed, and instance-level data augmentation is employed to expand data diversity and alleviate class imbalance.
- (2) A lightweight YOLOv11-based detection framework is developed by integrating several effective architectural components, including ADown, C3K2-STAR, and auxiliary detection heads, building upon existing mechanisms to enhance feature representation and small-target detection under complex field conditions.
- (3) Comprehensive comparative experiments and ablation studies are conducted to validate the effectiveness and efficiency of the proposed model.
3. Results
3.1. Effect of Instance-Level Data Augmentation
To evaluate the isolated contribution of the proposed instance-level data augmentation strategy, an additional experiment was conducted using the baseline YOLOv11n model. Two training settings were compared: (1) standard data augmentation only, and (2) standard augmentation combined with the proposed instance-level augmentation. All other training settings were kept identical to ensure a fair comparison.
The quantitative results are presented in Table 5. Incorporating instance-level augmentation leads to consistent improvements across several evaluation metrics: Precision increases from 82.4% to 88.8%, Recall from 68.7% to 69.6%, mAP@50 from 76.2% to 77.4%, and mAP@50:95 from 45.2% to 46.4%. These results suggest that the instance-level augmentation strategy can enhance detection performance to some extent. By recombining pest and disease instances with diverse background images and geometric transformations, the augmentation process increases the diversity of training samples and partially alleviates class imbalance. This allows the model to observe target objects under more varied spatial contexts, which may contribute to improved generalization in complex field environments.
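The gains reported in Table 5 can be tabulated in a few lines (values are percentages transcribed from the text above; the largest improvement is in Precision):

```python
# Table 5 metrics (%): YOLOv11n with standard augmentation vs. standard
# plus the proposed instance-level augmentation.
baseline  = {"P": 82.4, "R": 68.7, "mAP50": 76.2, "mAP50_95": 45.2}
with_inst = {"P": 88.8, "R": 69.6, "mAP50": 77.4, "mAP50_95": 46.4}

# Per-metric gain in percentage points.
gains = {k: round(with_inst[k] - baseline[k], 1) for k in baseline}
print(gains)  # {'P': 6.4, 'R': 0.9, 'mAP50': 1.2, 'mAP50_95': 1.2}
```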
3.2. Ablation Experiments
To comprehensively validate the effectiveness of each proposed component in the improved YOLOv11 model, ablation experiments were conducted on the flax disease and pest dataset. The original YOLOv11n was adopted as the baseline model, and three key modules—namely the auxiliary detection head (AuxDet), the adaptive downsampling module (ADown), and the feature enhancement module (C3K2-STAR)—were progressively integrated into the baseline architecture.
All ablation experiments were performed under identical experimental conditions, including the same training dataset, input resolution, batch size, number of epochs, and optimization strategy, in order to ensure fair and reliable comparisons. The influence of each module on detection accuracy and model complexity was systematically evaluated by analyzing multiple quantitative metrics.
The detailed ablation results are summarized in Table 6.
Introducing each proposed module individually demonstrates their distinct contributions to detection performance and model efficiency. Incorporating AuxDet slightly improves recall (from 69.6% to 70.2%) and increases mAP@50 from 77.4% to 78.1% without introducing additional computational overhead, indicating that the auxiliary detection head enhances feature supervision and target localization. This improvement may be attributed to the additional supervision provided by the auxiliary branch during training, which helps guide intermediate feature learning and improves the model’s ability to capture small and ambiguous targets in complex agricultural scenes. Adding ADown significantly reduces model parameters from 2.58 M to 2.10 M and GFLOPS from 6.3 to 5.3, while maintaining comparable accuracy, demonstrating its effectiveness in lightweight downsampling. This result suggests that the adaptive downsampling strategy preserves essential spatial information while reducing redundant computations, thereby improving computational efficiency without significantly degrading feature quality. Introducing C3K2-STAR improves recall to 72.4% and mAP@50:95 to 47.0%, confirming its ability to strengthen feature representation through enhanced spatial–channel interactions. The improvement may be related to the STAR mechanism, which enhances the interaction between spatial and channel features, enabling the network to better highlight discriminative regions of pests and disease symptoms.
When two modules are combined, complementary effects become evident. The AuxDet + ADown configuration achieves a notable increase in mAP@50:95 to 48.3% with reduced parameters (2.24 M), demonstrating a favorable balance between efficiency and accuracy. This indicates that the improved supervision from AuxDet compensates for potential information loss caused by aggressive downsampling, thereby maintaining detection accuracy while reducing model complexity. The ADown + C3K2-STAR combination further reduces parameters to 2.19 M while maintaining competitive mAP@50 (78.4%) and recall (73.9%), indicating strong synergy between efficient downsampling and enhanced feature extraction. In this configuration, ADown reduces redundant features while C3K2-STAR focuses on enhancing informative representations, resulting in more efficient and discriminative feature learning. The AuxDet + C3K2-STAR model improves mAP@50:95 to 48.1%, highlighting that auxiliary supervision and strengthened backbone representation jointly contribute to better localization and classification. This suggests that deeper feature refinement combined with additional supervision helps the network better distinguish visually similar pest and disease patterns.
Finally, integrating AuxDet, ADown, and C3K2-STAR achieves the best overall performance, with a precision of 89.3%, recall of 72.6%, mAP@50 of 79.6%, and mAP@50:95 of 48.4%, at only 2.19 M parameters and 5.5 GFLOPS. These results demonstrate that the proposed modules are mutually reinforcing and collectively enable accurate and efficient detection of flax pests and diseases: ADown reduces computational redundancy, C3K2-STAR enhances feature representation capability, and AuxDet strengthens training supervision, allowing the network to achieve a lightweight design and improved detection accuracy simultaneously. The full model thus provides a favorable trade-off between detection accuracy and computational complexity, suggesting its suitability for practical agricultural monitoring in resource-constrained environments.
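For reference, the mAP@50:95 figures quoted in this subsection can be collected programmatically; only configurations whose mAP@50:95 is stated in the prose are included (the remaining Table 6 rows are omitted here):

```python
# mAP@50:95 values (%) quoted in the ablation discussion of Table 6.
map50_95 = {
    "baseline (YOLOv11n)":          46.4,
    "C3K2-STAR":                    47.0,
    "AuxDet + ADown":               48.3,
    "AuxDet + C3K2-STAR":           48.1,
    "AuxDet + ADown + C3K2-STAR":   48.4,
}

# The full three-module configuration leads on this metric.
best = max(map50_95, key=map50_95.get)
print(best, map50_95[best])  # AuxDet + ADown + C3K2-STAR 48.4
```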
3.3. Comparative Experiments
3.3.1. Quantitative Comparison with Different Models
To comprehensively evaluate the effectiveness of the proposed Improved YOLOv11 model for flax disease and pest detection, comparative experiments were conducted against a range of representative object detection methods. These methods include the classical two-stage detector Faster R-CNN, the Transformer-based end-to-end detector RT-DETR-ResNet50, as well as several lightweight one-stage YOLO-series models, namely YOLOv3-tiny, YOLOv5n, YOLOv8n, YOLOv11n, and YOLOv12n. All models were trained and tested on the same dataset under identical experimental settings to ensure a fair comparison.
Due to differences in evaluation protocols across detection frameworks, the Precision and Recall metrics for Faster R-CNN are not directly available from the employed implementation. Therefore, mAP@50 and mAP@50:95 are adopted as the primary evaluation metrics for cross-model comparison in this study. In addition, model complexity indicators, including the number of parameters and GFLOPS, are reported to provide a reference for computational efficiency.
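For readers unfamiliar with the primary metric, the sketch below shows how AP at IoU 0.5 is computed for a single class: detections are sorted by confidence, greedily matched to unmatched ground truths at IoU ≥ 0.5, and AP is accumulated as the area under the resulting precision-recall curve. This uninterpolated variant is a simplification; COCO-style evaluators used by common detection frameworks apply 101-point interpolation, so exact numbers can differ slightly.

```python
# Minimal single-class AP@50 sketch (uninterpolated PR-curve area).

def iou(a, b):
    """IoU of two boxes in (x1, y1, x2, y2) format."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter > 0 else 0.0

def ap50(detections, gt_boxes):
    """detections: list of (confidence, box); gt_boxes: list of boxes."""
    dets = sorted(detections, key=lambda d: -d[0])  # confidence-descending
    matched = [False] * len(gt_boxes)
    tps, fps = [], []
    for conf, box in dets:
        # Greedy match against the best still-unmatched ground truth.
        best_iou, best_j = 0.0, -1
        for j, g in enumerate(gt_boxes):
            ov = iou(box, g)
            if ov > best_iou and not matched[j]:
                best_iou, best_j = ov, j
        if best_iou >= 0.5:
            matched[best_j] = True
            tps.append(1); fps.append(0)   # true positive
        else:
            tps.append(0); fps.append(1)   # false positive
    # Accumulate area under the precision-recall curve.
    ap, tp_cum, fp_cum, prev_recall = 0.0, 0, 0, 0.0
    for tp, fp in zip(tps, fps):
        tp_cum += tp; fp_cum += fp
        recall = tp_cum / len(gt_boxes)
        precision = tp_cum / (tp_cum + fp_cum)
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap

gts = [(0, 0, 10, 10), (20, 20, 30, 30)]
dets = [(0.9, (0, 0, 10, 10)),     # true positive
        (0.8, (50, 50, 60, 60)),   # false positive
        (0.7, (20, 20, 30, 30))]   # true positive
print(round(ap50(dets, gts), 3))   # 0.833
```

mAP@50 then averages this per-class AP over all classes; mAP@50:95 additionally averages over IoU thresholds from 0.50 to 0.95 in steps of 0.05, which is why it is the stricter of the two metrics.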
The quantitative comparison results are summarized in Table 7.
As shown in Table 7, on the constructed flax disease and pest detection dataset, the proposed Improved YOLOv11 achieves an mAP@50 of 79.6% and an mAP@50:95 of 48.4%, the highest detection accuracy among all compared methods. Specifically, the mAP@50 of Improved YOLOv11 is 4.4, 7.3, 4.1, 2.4, 2.6, 2.2, and 2.8 percentage points higher than those of the two-stage Faster R-CNN, the transformer-based end-to-end RT-DETR-ResNet50, and the single-stage YOLOv3-tiny, YOLOv5n, YOLOv8n, YOLOv11n, and YOLOv12n models, respectively. For the more stringent mAP@50:95 metric, Improved YOLOv11 outperforms Faster R-CNN, RT-DETR-ResNet50, YOLOv3-tiny, YOLOv5n, YOLOv11n, and YOLOv12n by 2.9, 3.6, 4.3, 2.8, 2.0, and 0.5 percentage points, respectively, while achieving performance comparable to YOLOv8n. In terms of model complexity, Improved YOLOv11 contains only 2.19 M parameters, which is 55.1%, 94.8%, 77.0%, 18.6%, 15.1%, and 14.5% fewer than Faster R-CNN, RT-DETR-ResNet50, YOLOv3-tiny, YOLOv8n, YOLOv11n, and YOLOv12n, respectively. Meanwhile, its computational cost is only 5.5 GFLOPS, representing reductions of 31.5%, 95.6%, 61.5%, 19.1%, 12.7%, and 12.7% relative to the same models. These results indicate that the proposed Improved YOLOv11 achieves a favorable balance between detection accuracy and computational efficiency, showing strong potential for practical flax disease and pest monitoring under complex field conditions.
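The relative reductions against the YOLOv11n baseline can be reproduced from figures already given in the text (2.58 M parameters and 6.3 GFLOPS for YOLOv11n in Section 3.2; 2.19 M and 5.5 GFLOPS for the proposed model):

```python
# Relative reduction of the proposed model vs. the YOLOv11n baseline,
# using the parameter and GFLOPS figures quoted in Sections 3.2 and 3.3.
base_params, base_gflops = 2.58, 6.3   # YOLOv11n
ours_params, ours_gflops = 2.19, 5.5   # Improved YOLOv11

param_cut  = round((1 - ours_params / base_params) * 100, 1)
gflops_cut = round((1 - ours_gflops / base_gflops) * 100, 1)
print(param_cut, gflops_cut)  # 15.1 12.7 -- matching the percentages above
```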
3.3.2. Training Curve Analysis
To further investigate the convergence behavior and training dynamics of different detection models, the mAP@50 and mAP@50:95 curves over training epochs are presented in Figure 6. These curves provide an intuitive comparison of optimization stability, convergence speed, and final detection accuracy among the evaluated methods.
As shown in Figure 6, most models exhibit a rapid increase in mAP during the early training stage, accompanied by noticeable fluctuations, which can be attributed to the initial adaptation of network parameters and feature representations. As training proceeds, the curves gradually become smoother and converge, indicating progressive stabilization of the optimization process. Although several baseline models temporarily reach comparable or slightly higher values at certain epochs, the proposed Improved YOLOv11 (red curve) maintains consistently superior performance during the convergence stage and finally achieves the highest mAP@50 and mAP@50:95 values among all compared methods. This behavior suggests that the introduced architectural improvements contribute to more effective feature learning and more reliable convergence. The two-stage Faster R-CNN exhibits relatively stable curves with smaller oscillations, reflecting robust optimization characteristics; however, its final detection accuracy remains at a medium-to-lower level compared with the one-stage YOLO-based models. Overall, these observations further demonstrate that Improved YOLOv11 balances convergence stability and detection accuracy, making it well suited for practical flax disease and pest detection tasks.
3.4. Visualization of Detection Results
To further illustrate the detection performance of the proposed model across all target categories, representative visualization results for the seven flax pest and disease classes are presented in Figure 7. The compared models are Faster R-CNN, RT-DETR-ResNet50, YOLOv11n, and the proposed Improved YOLOv11, shown alongside the original images. The seven categories are Blister Beetle, Grasshopper, Flax Flea Beetle, Beet Webworm, Sesame Webworm, Flax Yellows, and Fusarium Wilt, each illustrated by one representative image with detection results for all models.
As shown in Figure 7, the Improved YOLOv11 consistently produces bounding boxes with higher confidence scores across all categories than the other models. It demonstrates improved localization accuracy, better discrimination of small or partially occluded pests, and fewer missed detections and false positives in complex field backgrounds. These qualitative observations are consistent with the quantitative performance reported in Section 3.3, further confirming the efficacy of the proposed modifications in enhancing detection robustness and reliability under real-world conditions.
4. Discussion
The experimental results indicate that the proposed Improved YOLOv11 achieves a favorable balance between detection accuracy and computational efficiency for flax (Linum usitatissimum L.) pest and disease detection under complex field conditions. Compared with several representative detection models, the proposed method achieves higher mAP while maintaining a lightweight architecture with a relatively low parameter count and computational cost. These improvements can be attributed to the combined effects of data augmentation and architectural modifications. Although model complexity indicators such as parameters and GFLOPS provide useful measures of computational efficiency, further evaluation on embedded or edge hardware is required to fully assess real-world deployment performance. Inference speed (FPS) can vary significantly with hardware configuration and runtime conditions; it is therefore not reported in this study and will be systematically evaluated in future work under standardized deployment settings.
In addition to network design, the instance-level data augmentation strategy contributes to improved detection performance. By recombining transformed pest and disease instances with different background images, the augmentation process increases the diversity of the training samples and alleviates the class imbalance present in the original dataset. This strategy allows the model to observe target instances under more varied spatial contexts and background conditions, which helps the network learn more robust visual representations and improves its adaptability to complex field environments.
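The recombination step can be sketched as a simple cut-and-paste operation. The paper's exact pipeline is not specified at code level, so the patch size, offset, and transform used below are illustrative: in practice, offsets and flips would be sampled randomly, and the pasted instance's bounding-box label would be shifted by the same offset.

```python
# Schematic instance-level augmentation: paste an (optionally flipped)
# pest/lesion patch onto a new background at a chosen offset. Images are
# modeled as 2D lists for clarity; real pipelines operate on arrays.

def paste_instance(background, patch, top, left, flip=False):
    """Return a copy of `background` with `patch` pasted at (top, left)."""
    if flip:
        patch = [row[::-1] for row in patch]       # horizontal flip
    out = [row[:] for row in background]           # do not mutate input
    for i, row in enumerate(patch):
        for j, v in enumerate(row):
            out[top + i][left + j] = v
    return out
    # The new bounding box for the pasted instance is simply
    # (left, top, left + patch_width, top + patch_height).

background = [[0] * 6 for _ in range(6)]           # clean 6x6 background
patch = [[1, 2],
         [3, 4]]                                   # 2x2 instance crop

aug = paste_instance(background, patch, top=1, left=2, flip=True)
print(aug[1], aug[2])  # rows containing the flipped patch
```

Repeating this with many backgrounds, offsets, and transforms is what multiplies the spatial contexts each instance is seen in, which is the mechanism credited here for the improved robustness.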
From the architectural perspective, the ADown module may improve feature extraction efficiency through adaptive downsampling. Conventional downsampling operations often apply uniform spatial compression, which can cause the loss of fine-grained details, especially for small objects such as insects or early-stage disease symptoms. In contrast, ADown is intended to preserve regions with higher information density while compressing less informative background areas; this selective mechanism may help retain critical spatial details while reducing redundant computation.

The C3K2-STAR module is designed to enhance feature representation by strengthening spatial–channel interactions. Pest bodies and disease lesions often exhibit subtle texture patterns that can be easily obscured by background leaf structures. By selectively emphasizing discriminative responses related to insect contours and lesion textures, the module may improve the network's ability to capture fine-grained characteristics of pest and disease regions.

Moreover, the interaction between ADown and C3K2-STAR may contribute to the observed performance gains: adaptive downsampling reduces background redundancy during early feature extraction, while C3K2-STAR enhances informative spatial and channel responses in subsequent stages. This complementary mechanism could enable the network to maintain discriminative feature representations even with reduced model complexity. In addition, the auxiliary detection head (AuxDet) may provide additional supervision during training, facilitating gradient propagation and improving optimization stability, which is likely beneficial for detecting small or ambiguous targets. These architectural interpretations are based on observed performance trends and intuitive understanding, and they should be considered tentative rather than fully demonstrated mechanisms.
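Assuming the STAR component follows the element-wise product of two linear branches popularized by StarNet (the exact wiring inside C3K2-STAR is not reproduced here and this remains an assumption), the core "star" operation can be sketched as:

```python
# Hedged sketch of the "star" operation: two linear branches fused by
# element-wise multiplication, mixing features multiplicatively rather
# than additively. Vectors stand in for feature maps for clarity.

def linear(x, weights, bias):
    """Plain dense layer: y_i = sum_j w_ij * x_j + b_i."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]

def star(x, w1, b1, w2, b2):
    """Element-wise product of two linear branches of x."""
    a = linear(x, w1, b1)
    b = linear(x, w2, b2)
    return [u * v for u, v in zip(a, b)]

x = [1.0, 2.0]
w1, b1 = [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]   # identity branch
w2, b2 = [[0.5, 0.5], [0.5, 0.5]], [1.0, 1.0]   # mixing branch
print(star(x, w1, b1, w2, b2))  # [2.5, 5.0]
```

Because the product of two linear maps contains pairwise cross-terms of the input features, this fusion implicitly expands the feature space, which is plausibly the property the C3K2-STAR module exploits for richer spatial–channel interaction.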
It should be noted that detailed class-level and error-level analysis, such as per-class AP, confusion patterns, or systematic failure modes, has not been fully conducted. Therefore, while aggregate mAP provides a general performance overview, per-class differences and specific failure modes should be interpreted cautiously.
Overall, the proposed Improved YOLOv11 model provides an effective and practical solution for flax pest and disease detection under the studied field conditions. While the current work demonstrates promising performance, the model generalization capability is still constrained by dataset scale, regional coverage, and environmental variability. Future research will focus on expanding multi-region, multi-season, and multi-sensor flax pest and disease datasets, integrating severity assessment mechanisms, and exploring more robust lightweight detection frameworks to further enhance practical applicability in precision agriculture systems. In addition, the detection outputs could serve as a core component of visual surveillance-based monitoring systems for early warning, severity assessment, and precision management in flax cultivation.