1. Introduction
Cotton is a critically important plant fiber crop both in China and on a global scale. It is cultivated in more than 50 countries, covering an extensive geographical area. Among the world’s top five cotton-producing countries, China holds the leading position in terms of production volume [
1]. According to the data collected from 2017 to 2019, the loss rate of cotton production attributed to pest infestations and adverse environmental conditions was substantial [
2]. In the process of cotton cultivation, approximately 90% of disease symptoms are observed on cotton leaves, encompassing conditions such as cotton boll blight, cotton anthracnose, and cotton leaf mildew. These diseases impair plant physiological functions and disrupt the vascular system, thereby directly leading to reduced cotton yields and degraded fiber quality. In this context, the adoption of automated detection technology for large-scale cotton crop monitoring carries considerable importance in reducing disease-induced losses and facilitating the sustainable growth of the cotton sector [
3].
In traditional cotton leaf disease detection methods, diagnosing cotton diseases mainly involves analyzing the characteristic features of cotton leaves [
4]. However, identifying leaf diseases with the human eye is error-prone because some diseases closely resemble one another. This approach also requires substantial labor input and is highly subjective, rendering it unsuitable for today's large-scale agricultural demands. In contrast, automated disease detection has evolved considerably, progressing from early machine learning techniques to contemporary deep learning approaches. Machine learning achieved notable breakthroughs in crop disease classification during its early stages. Its main strength lies in markedly improved precision and effectiveness in plant pathological classification relative to previous approaches, thereby broadening its range of applicable scenarios [
5]. Traditional machine learning shows limited generalization for diverse disease symptoms [
6]. With the advancement of intelligent sensors in China, the replacement of human labor with machines has emerged as an inevitable trend. The integration of state-of-the-art technologies into sensors, including AI, 5G, human–machine interaction, and big data, promotes the use of computer vision and deep learning in agricultural development [
7].
The integration of deep learning technology into the agricultural sector is progressively advancing, particularly in the areas of crop disease detection and classification. Deep learning models, particularly convolutional neural networks (CNNs), have been widely utilized in sensors for the detection and categorization of pests and diseases that are harmful to specific crops [
8]. These models are capable of processing and analyzing vast amounts of image data, automatically extracting relevant features, and performing precise disease classification, which is critical for the early detection and prevention of crop diseases. Through the analysis of satellite imagery and drone-captured field images, CNNs can autonomously identify crop species, assess their health and growth status, and monitor vegetation indices, such as the Normalized Difference Vegetation Index (NDVI), to evaluate crop conditions [
9]. However, crops may simultaneously be affected by multiple pests and diseases during various growth stages. The identification of individual pests or diseases is no longer sufficient to meet the needs of growers. Consequently, researchers have begun to focus on combinations of pests and diseases that more frequently co-occur within the same growth stage. By employing enhanced deep learning models, they extract features of these pests and diseases and classify them accordingly.
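As an aside on the vegetation index referenced above, NDVI is computed per pixel from red and near-infrared reflectance. A minimal sketch is given below; the band arrays and the small epsilon guard are illustrative assumptions, not part of the cited systems.

```python
import numpy as np

def ndvi(nir: np.ndarray, red: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Normalized Difference Vegetation Index: (NIR - Red) / (NIR + Red).

    `nir` and `red` are reflectance arrays of identical shape; `eps`
    guards against division by zero over masked or non-vegetated pixels.
    """
    nir = nir.astype(np.float32)
    red = red.astype(np.float32)
    return (nir - red) / (nir + red + eps)

# Healthy vegetation typically yields NDVI well above 0.3,
# while bare soil and water sit near or below zero.
```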
These studies have enabled growers to utilize a single model for distinguishing among multiple plant pests and diseases, thereby promoting advancements in agricultural applications. Within the domain of sensors, deep learning technology has exhibited considerable potential, with particular emphasis on the detection and classification of crop diseases [
10]. As crops may encounter the challenge of multiple pests and diseases at various growth stages, researchers are utilizing advanced deep learning models to identify and classify these diseases, thereby addressing growers’ requirements for multi-disease identification. In scenarios with limited datasets, the model’s generalization capability and recognition accuracy are enhanced by leveraging public plant datasets and transfer learning technologies [
11]. The advancement of these technologies not only enhances the precision of crop monitoring but also refines crop management strategies, facilitates the development of precision sensors, and strengthens quality control of agricultural products. As technology continues to evolve and datasets become increasingly diverse, the application of deep learning in sensors is expected to expand further and delve deeper into various domains.
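The transfer-learning strategy referred to above is commonly realized by initializing a backbone with weights pre-trained on a public dataset and fine-tuning only a task-specific head. The sketch below uses a torchvision ResNet-18 purely for illustration; it is a generic recipe under these assumptions, not the configuration used in the works cited here.

```python
import torch.nn as nn
from torchvision import models

def build_transfer_model(num_classes: int) -> nn.Module:
    # Backbone pre-trained on ImageNet (torchvision >= 0.13 weights API).
    model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
    # Freeze the feature extractor so only the new head is updated at first.
    for param in model.parameters():
        param.requires_grad = False
    # Replace the classifier head with one sized for the disease classes.
    model.fc = nn.Linear(model.fc.in_features, num_classes)
    return model

# Example: six cotton leaf disease categories.
disease_model = build_transfer_model(num_classes=6)
```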
Deep learning-based detection models, such as the YOLO series, have demonstrated superior performance in agricultural pest and disease monitoring, particularly in detecting leaf diseases in crops like cotton [
12]. These models are capable not only of identifying the disease type but also of precisely locating the disease on the leaf, which is critical for the targeted application of treatments and the safeguarding of crop health [
13]. The high adaptability of these models in scenarios involving multiple diseases allows them to maintain consistent recognition performance across varying environmental conditions, such as fluctuating illumination and complex backgrounds, thus providing farmers with a dependable monitoring solution.
To further enhance the practicality of these models, researchers are investigating methods to enable their functionality under more complex real-world field conditions. This has been accomplished by integrating leaf images of pests and diseases—captured with various natural backgrounds—into the training dataset, which in turn enhances the model’s resilience in complex environments [
14]. This approach enhances the model’s ability to accurately identify and locate diseases in practical scenarios, even when leaves are partially occluded or environmental conditions are suboptimal.
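The cited approach enriches the training set with real field images captured against varied natural backgrounds; a complementary and widely used step is photometric and geometric augmentation to mimic changing illumination, viewpoint, and partial occlusion. The torchvision pipeline below is a minimal sketch with illustrative transforms and magnitudes, not the pipeline of the cited work.

```python
from torchvision import transforms

# Simulate field variability: partial views, viewpoint changes,
# and illumination shifts the model must tolerate at inference time.
field_augmentation = transforms.Compose([
    transforms.RandomResizedCrop(640, scale=(0.6, 1.0)),   # occlusion-like partial crops
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.ColorJitter(brightness=0.4, contrast=0.4, saturation=0.3),  # lighting variation
    transforms.RandomRotation(degrees=15),
    transforms.ToTensor(),
])
```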
To satisfy the real-time and convenience demands of agricultural field operations, researchers are actively exploring methods to optimize the model, aiming to minimize computational resource consumption while maintaining high accuracy and accelerating detection speed. This involves designing more lightweight model architectures and refining algorithms for compatibility with mobile and embedded devices. Such advancements not only improve the efficiency of pesticide application and mitigate its environmental impact but also enhance the overall yield and quality of crops [
15]. The integration of deep learning technology into agricultural pest detection is advancing steadily, enabling farmers to achieve more precise crop protection and management through rapid and accurate disease identification and localization. As technology continues to evolve, these models are anticipated to play an increasingly significant role in enhancing the efficiency and sustainability of agricultural production [
16].
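One common route to the mobile and embedded deployment described above is exporting a trained PyTorch model to ONNX so it can run under lightweight runtimes. The sketch below is generic; the model, input resolution, and file name are placeholders rather than details from the cited studies.

```python
import torch

def export_to_onnx(model: torch.nn.Module, img_size: int = 640,
                   path: str = "detector.onnx") -> None:
    model.eval()
    dummy = torch.randn(1, 3, img_size, img_size)  # single RGB image at network resolution
    torch.onnx.export(
        model, dummy, path,
        input_names=["images"], output_names=["predictions"],
        opset_version=12,
        dynamic_axes={"images": {0: "batch"}},  # allow variable batch size on device
    )
```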
In the domain of modern agricultural disease management, YOLO series models have emerged as a pivotal technology for outdoor real-time monitoring due to their high accuracy and substantial advantages in single-stage target detection [
These models adapt rapidly to the instabilities inherent in outdoor environments, such as fluctuations in lighting and background disturbances, which helps ensure consistent and dependable detection results under diverse conditions. Ongoing progress in the YOLO series, including components such as PGI and GELAN in YOLOv9 and the C3k2 and C2PSA modules in YOLOv11, has significantly boosted these models' capacity for addressing complex detection tasks [
18]. Notably, these improvements have been particularly effective in detecting small and occluded objects. Furthermore, the incorporation of attention mechanisms, such as the SE module, enables the model to prioritize critical image features, thereby improving the accuracy of disease recognition under challenging field conditions. The integration of these technologies not only elevates the detection performance of the model but also enhances its robustness in dynamic environments [
19]. These developments suggest that YOLO series models will play an increasingly significant role in smart sensors, contributing to the improvement of crop management efficiency and effectiveness while fostering the sustainable development of sensors.
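The squeeze-and-excitation (SE) module mentioned above recalibrates channel responses by squeezing spatial information into a channel descriptor and learning per-channel gating weights. A minimal PyTorch sketch follows; the reduction ratio of 16 is the common default from the original SE formulation and is an assumption here.

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze-and-Excitation: global average pool -> bottleneck MLP -> channel gates."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)            # squeeze: (B, C, H, W) -> (B, C, 1, 1)
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),                              # per-channel weights in (0, 1)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w                                   # excite: reweight the channels
```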
While YOLO series models are widely employed in the object detection domain owing to their fast detection speed and strong real-time capability, some constraints still exist when they are applied in cotton leaf disease detection [
20]. First, the YOLO series models exhibit constrained performance in small target detection. Given that some cotton leaf diseases manifest as small lesions during their initial stages, this often leads to missed detections or false positives. Second, the model’s target detection capability in the presence of complex backgrounds requires enhancement. Cotton plants possess diverse leaf shapes, and the growth environment includes interference factors such as weeds, which complicates the accurate identification of cotton leaf diseases. Additionally, the YOLO series models demonstrate insufficient generalization ability regarding disease features [
21]. Cotton leaf diseases vary significantly across different growth stages and environments, further reducing the model’s detection accuracy in new scenarios or with novel disease types. Consequently, these challenges hinder the models from meeting the stringent requirements for precise cotton leaf disease detection.
To address the abovementioned issues, considering the constraints of YOLO series models in cotton leaf disease detection and the complexity of such detection scenarios, there is an urgent demand for innovative advances in research. The U-Net v2 model, which is well known in the field of medical image segmentation for its unique encoder–decoder structure and strong feature extraction capacities, has offered new perspectives [
Drawing inspiration from these findings, the current study introduces the U-Net v2 module into the domain of cotton leaf disease detection. With the advanced YOLOv11 as the base architecture, the framework was refined in several ways: an attention mechanism was integrated to guide the model toward critical disease features; modules such as SPPF and C3k2 were optimized to enhance the network's capacity for capturing disease-related information across diverse scales; and the activation function was improved to strengthen the model's nonlinear expression capability. Through these innovations and integrations, the ACURS-YOLO network was developed to resolve the challenge of cotton leaf disease detection via novel architectural design and algorithmic optimization. The primary contributions of this research are outlined as follows:
A dataset comprising 3000 images of six typical cotton leaf diseases was built and expanded through data augmentation methods, and the effectiveness of the improved model was verified within a scientifically structured experimental setup.
To tackle the challenges of complicated background disturbances, missed small target lesions, and inadequate adaptability to diverse disease types in cotton leaf disease identification, the U-Net v2 module from the medical image segmentation field was incorporated into the YOLOv11 backbone network. A SimSPPF component was engineered to replace the conventional SPPF, lowering the computational load while maintaining multi-scale feature extraction capacity and boosting inference speed (a structural sketch of this block is given after this list). A C3k2_RCM component incorporating a rectangular self-calibration mechanism was embedded in the neck network to strengthen long-range contextual modeling. Finally, the ARelu activation function was employed to alleviate gradient vanishing problems, achieving simultaneous improvements in detection accuracy and training stability.
The ACURS-YOLO network is benchmarked against the YOLO series to evaluate and validate the disease detection capability and overall performance of the proposed model in cotton leaf disease scenarios.
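To make the SimSPPF replacement referenced above concrete: an SPPF-style block reduces channels with a 1×1 convolution, applies the same small max-pool three times in sequence (equivalent to progressively larger receptive fields), concatenates the four tensors, and projects back with another 1×1 convolution; the SimSPPF variant uses ReLU-activated convolutions for cheaper inference. The sketch below follows this published structure, with channel sizes that are illustrative and may differ from the authors' configuration.

```python
import torch
import torch.nn as nn

class ConvBNReLU(nn.Module):
    def __init__(self, c_in: int, c_out: int, k: int = 1):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(c_in, c_out, k, padding=k // 2, bias=False),
            nn.BatchNorm2d(c_out),
            nn.ReLU(inplace=True),          # ReLU instead of SiLU is the "Sim" simplification
        )

    def forward(self, x):
        return self.block(x)

class SimSPPF(nn.Module):
    """SPPF-style multi-scale pooling with ReLU convolutions."""

    def __init__(self, c_in: int, c_out: int, pool_k: int = 5):
        super().__init__()
        c_hidden = c_in // 2
        self.reduce = ConvBNReLU(c_in, c_hidden, 1)
        self.pool = nn.MaxPool2d(kernel_size=pool_k, stride=1, padding=pool_k // 2)
        self.project = ConvBNReLU(c_hidden * 4, c_out, 1)

    def forward(self, x):
        x = self.reduce(x)
        p1 = self.pool(x)                   # receptive field ~5x5
        p2 = self.pool(p1)                  # ~9x9 equivalent
        p3 = self.pool(p2)                  # ~13x13 equivalent
        return self.project(torch.cat([x, p1, p2, p3], dim=1))
```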
The remainder of this paper is structured as follows:
Section 2 details the materials and methods, including dataset construction, the model architecture, and the experimental setup.
Section 3 presents the experimental results, including performance comparisons with classical models, ablation studies, and model testing in complex scenarios.
Section 4 discusses the implications of the results, limitations of the current study, and future research directions. Finally,
Section 5 summarizes the key findings and contributions of this work.
4. Discussion
In this research, we successfully developed an advanced cotton leaf disease identification model, designated ACURS-YOLO, built upon an improved YOLOv11 architecture. Experimental findings show that the model delivers superior performance. In comparison with classical models, ACURS-YOLO markedly surpasses the original YOLOv11 and other state-of-the-art object detection models with respect to mAP_0.5, mAP_0.5:0.95, precision, recall, and F1 score, indicating that the improvement strategies implemented in the model are highly effective. From the perspective of the improved modules, the U-Net v2 module was integrated into the backbone network of YOLOv11 to enhance the model's multi-scale feature extraction capability. The successful experience gained from medical image segmentation has proven applicable to cotton leaf disease detection, effectively improving the model's ability to capture complex disease features. The integration of the CBAM attention mechanism allows the model to focus more effectively on critical disease features, thereby significantly improving detection accuracy. By utilizing both channel and spatial attention modules, the model achieves a more precise emphasis on disease-related regions. The SimSPPF module optimizes multi-scale feature fusion, enhancing computational efficiency while maintaining accuracy and addressing the high computational latency and complexity of the traditional SPPF module. The C3k2_RCM module enhances neck feature fusion, boosting the model's capacity to detect multi-scale targets in complex environments and offsetting the contextual modeling constraints of the original C3k2 module. The ARelu activation function alleviates the gradient vanishing problem, thereby improving training stability and detection integrity, and effectively reduces false negatives in challenging field environments [
42,
43].
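The CBAM mechanism described above applies channel attention (a shared MLP over average- and max-pooled channel descriptors) followed by spatial attention (a convolution over channel-wise average and max maps). The sketch below follows the published CBAM design; the reduction ratio and the 7×7 kernel are standard defaults rather than the authors' exact settings.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1, bias=False),
        )

    def forward(self, x):
        avg = self.mlp(torch.mean(x, dim=(2, 3), keepdim=True))
        mx = self.mlp(torch.amax(x, dim=(2, 3), keepdim=True))
        return torch.sigmoid(avg + mx)          # (B, C, 1, 1) channel gates

class SpatialAttention(nn.Module):
    def __init__(self, kernel_size: int = 7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2, bias=False)

    def forward(self, x):
        avg = torch.mean(x, dim=1, keepdim=True)
        mx = torch.amax(x, dim=1, keepdim=True)
        return torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))  # (B, 1, H, W) map

class CBAM(nn.Module):
    """Sequential channel-then-spatial attention, as in the original CBAM paper."""

    def __init__(self, channels: int):
        super().__init__()
        self.ca = ChannelAttention(channels)
        self.sa = SpatialAttention()

    def forward(self, x):
        x = x * self.ca(x)
        return x * self.sa(x)
```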
Compared with prior studies, this study incorporated a variety of improvement strategies to tackle the complexity of cotton leaf disease detection, addressing the misclassification caused by the high visual similarity of certain diseases and the large detection errors of traditional methods. Nevertheless, this study has certain limitations. Although the model's recognition capability against complex backgrounds has been improved, a risk of misclassification remains. Furthermore, model performance may degrade in extremely challenging environments, such as severely damaged leaves, significant occlusion, or extremely poor lighting conditions.
Future research may proceed as follows: First, the model architecture can be further optimized by investigating more effective feature extraction and fusion methods. For example, advanced attention mechanisms or neural network structures might be integrated to boost the model’s robustness in complex environments. Second, the dataset’s scale and diversity should be expanded to encompass images of leaf diseases from various growing conditions and cotton varieties, thus enhancing the model’s generalizability. Third, efforts could be made to deploy the model on mobile terminals while integrating additional sensor data, such as spectral and multimodal data, to enrich the information available for disease detection and thus improve the accuracy and reliability of the system.
5. Conclusions
Aiming to address the critical challenges in cotton leaf disease detection, such as complex background interference, missed detection of small target lesions, and insufficient generalization for multi-form diseases, this study proposes an ACURS-YOLO detection network that integrates medical image segmentation concepts with lightweight improvement strategies. This provides an efficient solution for automated disease monitoring in smart sensors. Built upon YOLOv11, the proposed network incorporates multi-scale feature enhancement, an adaptive attention mechanism, and a robust training strategy through cross-domain knowledge transfer and multi-module collaborative optimization. These enhancements significantly improve the model’s detection performance in real-world field environments.
Firstly, we developed a specialized dataset comprising six typical cotton leaf diseases (Cotton Leaf Spot, Cotton Leaf Curl, Cotton Brown Spot, Cotton White Mold, Cotton Verticillium Wilt, and Cotton Fusarium Wilt). Through field collection and data augmentation techniques, the dataset was expanded to include 3000 samples, encompassing various lighting conditions, degrees of leaf occlusion, and stages of disease progression. This dataset provides robust and reliable support for model training and validation. To address the limitations of traditional YOLO series models in small target detection and adaptability to complex backgrounds, this study innovatively incorporates the U-Net v2 module, which is commonly used in medical image segmentation, to reconstruct the backbone network. By utilizing its encoder–decoder framework, the model strengthens its multi-scale feature fusion ability, allowing for the effective capture of fine-grained textures and contextual associations of leaf disease lesions.
Meanwhile, through the integration of the CBAM attention module, the model can dynamically focus on disease regions, reduce interference from leaf texture and environmental noise, and boost the precision of feature selection. In the neck network architecture, the C3k2_RCM module is integrated to reinforce the modeling of long-range contextual dependencies via a rectangular self-calibration mechanism, thereby addressing the issue of the insufficient exploration of multi-scale target semantic correlations in the original model. The enhanced SimSPPF module employs a parallel pooling structure to reduce computational complexity while preserving multi-scale feature extraction capabilities, thus improving inference speed. Additionally, the ARelu activation function alleviates the vanishing gradient problem through adaptive residual learning and improves the training stability of the deep network model.
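For context, one published attention-based rectified linear unit (AReLU) formulation scales the negative and positive parts of the input with two learnable scalars, which keeps gradients flowing through negative activations; whether the ARelu variant used in this work follows this exact form is an assumption on our part, so the sketch below should be read as illustrative only.

```python
import torch
import torch.nn as nn

class AReLU(nn.Module):
    """Attention-based ReLU with learnable scalars: alpha (negative part), beta (positive part).

    This follows one published AReLU formulation and is not necessarily
    identical to the ARelu variant adopted in this paper.
    """

    def __init__(self, alpha: float = 0.9, beta: float = 2.0):
        super().__init__()
        self.alpha = nn.Parameter(torch.tensor(alpha))
        self.beta = nn.Parameter(torch.tensor(beta))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        a = torch.clamp(self.alpha, 0.01, 0.99)        # small but non-zero slope for x < 0
        b = 1.0 + torch.sigmoid(self.beta)             # amplification in (1, 2) for x >= 0
        return a * torch.clamp(x, max=0.0) + b * torch.clamp(x, min=0.0)
```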
The experimental results demonstrate that ACURS-YOLO significantly outperforms the original YOLOv11 and classical object detection models such as SSD in key performance indicators. From a practical application standpoint, ACURS-YOLO exhibits remarkable generalization capability when evaluated in complex scenarios. Notably, the recognition accuracy for easily confused diseases improved by 19.2% over the original model, while ACURS-YOLO maintains 148 FPS, close to YOLOv11's 156 FPS, demonstrating its practical value in real-time agricultural monitoring. This advancement effectively addresses the misclassification caused by high disease similarity in traditional approaches.
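FPS figures such as those quoted above are typically obtained by timing repeated forward passes after a warm-up phase. The routine below is a minimal measurement sketch; the device, input resolution, and iteration counts are illustrative assumptions rather than the benchmarking protocol of this study.

```python
import time
import torch

@torch.no_grad()
def measure_fps(model: torch.nn.Module, img_size: int = 640,
                warmup: int = 20, iters: int = 200,
                device: str = "cuda") -> float:
    model = model.to(device).eval()
    x = torch.randn(1, 3, img_size, img_size, device=device)
    for _ in range(warmup):                 # warm up kernels and caches
        model(x)
    if device == "cuda":
        torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(iters):
        model(x)
    if device == "cuda":
        torch.cuda.synchronize()            # wait for queued GPU work before stopping the clock
    return iters / (time.perf_counter() - start)
```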
However, this study has certain limitations that warrant attention. While the model demonstrates strong performance in conventional field environments, its detection accuracy tends to decrease under extreme conditions, such as severe leaf occlusion, extremely low light levels, or very small early-stage lesions. Furthermore, although the dataset covers six primary diseases, it lacks representation of disease characteristics specific to different cotton varieties (e.g., insect-resistant cotton and long-staple cotton), which could limit the model's generalizability across diverse planting scenarios. Going forward, with further refinement and optimization, the ACURS-YOLO network is poised for wider use in agricultural image detection, promoting a shift toward greater intelligence and precision in agricultural production.