1. Introduction
As a major economic crop, cotton plays a crucial role in supplying raw material for the textile industry [1,2]. Verticillium wilt, caused by Verticillium dahliae Kleb., is a common and highly damaging disease during cotton growth, causing wilting, yellowing, and death of the leaves, which reduces cotton yield and quality [3,4]. Pathogens of the genus Verticillium are soil-borne fungi that cause wilt in temperate and subtropical regions and can infect over 200 plant species [5]. Traditional detection of Verticillium wilt relies mainly on manual visual inspection, which is subjective and inefficient. Establishing an efficient and accurate monitoring system is therefore essential for large-scale detection and for improving the efficiency of disease prevention and control [6,7,8].
In recent years, plant disease recognition based on RGB images and deep learning has become a key direction in agricultural intelligence research. RGB images are easy to obtain and cost-effective, especially when captured rapidly with mobile devices such as smartphones, making this approach highly practical and promising for field applications [9,10,11]. The widespread application of machine learning and deep learning has significantly improved the accuracy and efficiency of recognizing plant physiological status and diseases [12,13,14,15]. Combined with deep learning algorithms, inexpensive and readily available RGB images have achieved high accuracy on various standard datasets [16,17,18,19]. A variety of deep learning paradigms have been explored, including Transformer-based architectures (e.g., DETR and its variants) and segmentation-based methods (e.g., U-Net, DeepLab, Mask R-CNN) [20,21,22,23]. While Transformer models excel at capturing global contextual information and segmentation methods provide pixel-level delineation of lesions, both approaches usually require large training datasets and high computational resources, which constrains their applicability in real-time field scenarios. By contrast, YOLO-based object detection achieves a favorable balance between accuracy, speed, and computational efficiency, making it more suitable for lightweight, real-time applications in agricultural settings.
With the development of deep learning, YOLO-based object detection algorithms have demonstrated high accuracy and robustness in disease recognition, offering fast detection speeds and ease of optimization [24,25,26,27,28]. Sun et al. [29] constructed a pest recognition model, YOLO-PEST, based on YOLOv8n, which outperforms the original YOLOv8n in detection accuracy, with a 3.46% increase in mAP50 and a 7.81% improvement in recall (R). In recent years, researchers have made various improvements to YOLO models to tackle small-scale or overlapping disease spot recognition in complex environments, such as incorporating attention mechanisms and dynamic convolutional structures and optimizing loss functions; these optimizations have significantly enhanced detection accuracy [30,31,32]. Baek et al. [33] introduced an improved AppleStem-YOLO model that incorporates ghost bottlenecks together with global attention for apple stem segmentation, reducing the overall parameter count while improving computational efficiency. Xue et al. [34] proposed the lightweight PEW-YOLO detection model to address the low detection efficiency for citrus pests and diseases. By optimizing the PP-LCNet backbone network, introducing the lightweight PGNet backbone, replacing the original C2f module with an integrated multi-scale attention-enhanced C2f_EMA module, and adopting the Wise-IoU loss function, they improved mAP50 by 1.8% and reduced the model’s parameters by 32.2%, meeting real-time detection requirements. Meng et al. [35] proposed the YOLO-MSM maize leaf disease detection algorithm, which integrates multi-scale variable kernel convolutions, develops the C2F-SK module for optimized feature extraction and representation, and uses MPDIoU to optimize the loss function, achieving gains of 0.66% in precision (P) and 1.61% in R over the baseline algorithm.
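For clarity, the evaluation metrics referenced in these studies and used throughout this paper (P, R, mAP50, mAP50-95) follow the standard object detection formulations; the definitions below are the conventional ones and are assumed to match those used in the cited works:

$$
P = \frac{TP}{TP + FP}, \qquad R = \frac{TP}{TP + FN}, \qquad \mathrm{AP} = \int_{0}^{1} P(R)\,\mathrm{d}R, \qquad \mathrm{mAP} = \frac{1}{N}\sum_{i=1}^{N} \mathrm{AP}_{i},
$$

where $TP$, $FP$, and $FN$ denote true positives, false positives, and false negatives, $\mathrm{AP}_{i}$ is the average precision for class $i$, and $N$ is the number of classes. mAP50 computes AP at an intersection-over-union (IoU) threshold of 0.5, while mAP50-95 averages AP over IoU thresholds from 0.5 to 0.95 in steps of 0.05.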
Cotton Verticillium wilt exhibits diverse visual features, such as leaf curling, yellowing, and spot distribution, at different growth stages, and its lesions are typically small objects. Its recognition is therefore easily affected by factors such as light variation, leaf occlusion, and background interference [6,36]. Although recently emerging YOLO variants such as LCDDN-YOLO [37] and CDDLite-YOLO [38] have demonstrated promising performance in cotton disease detection, the automated detection of cotton Verticillium wilt still faces numerous challenges. Furthermore, most existing studies have not adequately addressed the trade-off between detection accuracy and lightweight deployment, which is crucial for large-scale practical applications.
To address the aforementioned challenges, this study proposes an intelligent recognition method for cotton Verticillium wilt based on RGB images. Using the enhanced YOLO-MSPM model, it achieves precise lesion localization and lightweight deployment under complex background conditions. YOLO-MSPM incorporates the MobileNetV4 architecture into its backbone network, enhancing multi-scale feature extraction while meeting lightweight requirements. To address the diversity of disease features, the single-head self-attention (SHSA) mechanism and pinwheel-shaped convolution (PConv) are introduced, yielding the novel cross stage partial with single-head self-attention (C2PSHSA) and PC3k2 modules, which strengthen the model’s ability to capture disease features and thereby ensure precise lesion localization. Additionally, mobile inverted bottleneck convolution (MBConv) is introduced into the detection head to further improve the accuracy of predicted lesion bounding boxes. To validate its effectiveness, YOLO-MSPM is compared with multiple YOLO-series models as well as RetinaNet and EfficientDet, and the contribution of each module is analyzed through ablation experiments. With these innovative designs, the YOLO-MSPM model provides an effective solution for efficient identification and lightweight deployment in cotton Verticillium wilt detection.
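To make the SHSA idea concrete, the following is a minimal PyTorch sketch of single-head self-attention applied to a partial slice of the channels, in the spirit of the C2PSHSA module described above. The channel split ratio, query/key dimension, and layer layout are illustrative assumptions for exposition, not the exact configuration used in YOLO-MSPM.

```python
import torch
import torch.nn as nn


class SHSA(nn.Module):
    """Sketch of single-head self-attention over a partial channel slice.

    Attention is computed with a single head over a fraction of the
    channels; the remaining channels pass through unchanged, which keeps
    the computational cost low. Hyperparameters here are illustrative.
    """

    def __init__(self, dim: int, partial_ratio: float = 0.25, qk_dim: int = 16):
        super().__init__()
        self.attn_dim = int(dim * partial_ratio)   # channels that attend
        self.pass_dim = dim - self.attn_dim        # channels passed through
        self.qk_dim = qk_dim
        self.scale = qk_dim ** -0.5
        # Project the attended slice to queries, keys, values (1x1 conv).
        self.qkv = nn.Conv2d(self.attn_dim, 2 * qk_dim + self.attn_dim, 1)
        self.proj = nn.Conv2d(dim, dim, 1)         # fuse both slices back

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        x_attn, x_pass = torch.split(x, [self.attn_dim, self.pass_dim], dim=1)
        q, k, v = torch.split(
            self.qkv(x_attn), [self.qk_dim, self.qk_dim, self.attn_dim], dim=1
        )
        q, k, v = q.flatten(2), k.flatten(2), v.flatten(2)  # (b, ch, h*w)
        attn = (q.transpose(1, 2) @ k) * self.scale         # (b, hw, hw)
        attn = attn.softmax(dim=-1)
        out = (v @ attn.transpose(1, 2)).reshape(b, self.attn_dim, h, w)
        return self.proj(torch.cat([out, x_pass], dim=1))


if __name__ == "__main__":
    x = torch.randn(1, 64, 20, 20)
    print(SHSA(dim=64)(x).shape)  # torch.Size([1, 64, 20, 20])
```

Because only a slice of the channels participates in the quadratic attention computation and a single head is used, the cost of this design grows much more slowly with channel width than standard multi-head attention, which is what makes it attractive for lightweight detectors.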
4. Conclusions
This study developed a lightweight cotton Verticillium wilt detection model, YOLO-MSPM, based on YOLOv11n. The model integrates the lightweight MobileNetV4 feature extraction network into the backbone and replaces the original attention mechanism in the C2PSA module with the SHSA mechanism. In addition, PConv was introduced into the C3k2 module, and the MBConv module was added to both the classification and regression heads of the YOLOv11n detection head. These optimizations effectively improved the model’s accuracy in detecting cotton Verticillium wilt while significantly reducing the computational burden. The improved YOLO-MSPM model achieved a P of 0.933, an R of 0.920, an mAP50 of 0.970, and an mAP50-95 of 0.797, improvements over YOLOv11n of 1.413%, 2.222%, 1.891%, and 3.238%, respectively, and it outperformed the YOLOv5n, YOLOv6n, YOLOv8n, and YOLOv10n models as well as RetinaNet and EfficientDet. YOLO-MSPM has only 1.773 M parameters, a 31.332% reduction compared to YOLOv11n, with 5.4 GFLOPs and a model size of 4.0 MB.
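To illustrate the detection-head modification summarized above, the following is a minimal PyTorch sketch of a mobile inverted bottleneck (MBConv) block in the style popularized by MobileNetV2 and EfficientNet: a 1x1 expansion, a depthwise 3x3 convolution, and a 1x1 projection, with a residual connection when shapes match. The expansion ratio, the choice of activation, and the omission of squeeze-and-excitation are simplifying assumptions; the exact block used in YOLO-MSPM’s classification and regression heads may differ.

```python
import torch
import torch.nn as nn


class MBConv(nn.Module):
    """Sketch of a mobile inverted bottleneck (MBConv) block."""

    def __init__(self, in_ch: int, out_ch: int, expand: int = 4, stride: int = 1):
        super().__init__()
        mid = in_ch * expand
        self.use_residual = stride == 1 and in_ch == out_ch
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, mid, 1, bias=False),              # 1x1 expansion
            nn.BatchNorm2d(mid),
            nn.SiLU(),
            nn.Conv2d(mid, mid, 3, stride, 1,                  # depthwise 3x3
                      groups=mid, bias=False),
            nn.BatchNorm2d(mid),
            nn.SiLU(),
            nn.Conv2d(mid, out_ch, 1, bias=False),             # 1x1 projection
            nn.BatchNorm2d(out_ch),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        y = self.block(x)
        return x + y if self.use_residual else y


if __name__ == "__main__":
    x = torch.randn(1, 64, 40, 40)
    print(MBConv(64, 64)(x).shape)  # torch.Size([1, 64, 40, 40])
```

The depthwise separable structure keeps the parameter count and FLOPs of the detection head low, which is consistent with the lightweight figures reported above.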
The results of this study demonstrate that the YOLO-MSPM model performs excellently in identifying cotton Verticillium wilt. Future research will explore the adaptability and performance of YOLO-MSPM on datasets collected from regions with different climate and soil conditions, diverse shooting environments, and multiple types of acquisition devices. The model architecture will also be further optimized to enhance target recognition in complex environments, enabling stable and efficient operation in diverse and dynamic natural settings. Furthermore, the model will be continuously improved with reference to current state-of-the-art plant disease detection methods and deployed in real-world field environments to systematically evaluate its detection performance and applicability under practical conditions.