Article

LCDDN-YOLO: Lightweight Cotton Disease Detection in Natural Environment, Based on Improved YOLOv8

School of Electric & Electronic Engineering, Wuhan Polytechnic University, Wuhan 430023, China
* Author to whom correspondence should be addressed.
Agriculture 2025, 15(4), 421; https://doi.org/10.3390/agriculture15040421
Submission received: 21 January 2025 / Revised: 11 February 2025 / Accepted: 14 February 2025 / Published: 17 February 2025
(This article belongs to the Section Digital Agriculture)

Abstract

To address the challenges of detecting cotton pests and diseases in natural environments, as well as the similarities among the features exhibited by different cotton pests and diseases, a Lightweight Cotton Disease Detection in Natural Environment (LCDDN-YOLO) algorithm is proposed. The LCDDN-YOLO algorithm is based on YOLOv8n and replaces part of the convolutional layers in the backbone network with Distributed Shift Convolution (DSConv). The BiFPN network is incorporated into the original architecture, adding learnable weights to evaluate the significance of various input features, thereby enhancing detection accuracy. Furthermore, Partial Convolution (PConv) and DSConv are integrated into the C2f module, yielding the PDS-C2f module. Additionally, the CBAM attention mechanism is incorporated into the neck network to improve model performance, and a Focal-EIoU loss function is adopted to optimize the model’s training process. Experimental results show that compared to YOLOv8, the LCDDN-YOLO model reduces the number of parameters by 12.9% and the floating-point operations (FLOPs) by 9.9%, while precision, mAP@50, and recall improve by 4.6%, 6.5%, and 7.8%, respectively, reaching 89.5%, 85.4%, and 80.2%. In summary, the LCDDN-YOLO model offers excellent detection accuracy and speed, making it effective for pest and disease control in cotton fields, particularly in lightweight computing scenarios.

1. Introduction

Cotton is an important economic crop and textile material, playing a significant role in driving economic development and improving people’s well-being [1]. However, during the cotton cultivation process, various pests and diseases severely threaten its normal growth, such as leaf spot disease, aphids, bollworm, wilt disease, Fusarium wilt, gray mold, and leaf curl disease. These pests and diseases not only pose a challenge to the healthy growth of cotton, but also significantly reduce its yield and quality [2]. Thus, promptly and accurately detecting cotton pests and diseases in the field, along with correctly identifying their types, is essential for enhancing both cotton yield and quality [3].
Conventional approaches to detecting cotton pests and diseases primarily depend on manual observation [4]. This approach is not only inefficient, but also susceptible to human subjective factors, leading to inaccurate judgments. As cotton cultivation expands and the demand for agricultural intelligence grows, manual identification methods are becoming inadequate to meet the practical needs of large-scale pest and disease management. In this context, the advancement of automated recognition and detection technologies that are both cost-effective and highly efficient has emerged as a focal point of research within the field of agricultural intelligence [5].
Traditional image processing methods have also been applied to this task [6]. They typically rely on manually designed disease and pest features, which are then classified using classifiers [7]. These methods generally include processes like image segmentation, edge detection, feature extraction, and classification. Although they can achieve certain results in specific scenarios, they struggle to cope with complex natural environments, and their limitations in speed and accuracy cannot fulfill the practical demands of modern agricultural production.
In contrast, deep learning techniques offer considerable advantages in detecting plant pests and diseases. These algorithms can autonomously identify and learn intricate features, which allows them to outperform traditional image processing methods in detection accuracy [8]. They can learn from large-scale feature datasets and accurately detect crop pests and diseases with minimal human intervention. For instance, Qi et al. [9] incorporated the squeeze-and-excitation (SE) module, which utilizes an attention mechanism, into the YOLOv5 model. This enhancement enabled the model to focus on and extract features from crucial regions, and the modified model exhibited robust performance during both the training and testing phases on their dataset. Atole et al. [10] used the AlexNet model to classify three types of rice diseases (including healthy leaves). Li et al. [11] proposed a fine-tuned GoogLeNet model for identifying crop disease types, which also integrated several preprocessing techniques to adapt to complex backgrounds. Bao et al. [12] developed a compact version of the CA-DenseNet-BC-40 model, which is based on DenseNet. This model incorporates a coordinated attention mechanism into every submodule of the dense block, thereby improving the extraction of features associated with cotton aphid symptoms.
Object detection is a key area at the convergence of computer vision and deep learning. It involves both classifying objects within images and precisely determining their locations [13]. Among methods for object detection, the YOLO (You Only Look Once) algorithm can directly predict object categories from feature maps, showing significant advantages in both detection speed and accuracy, making it particularly suitable for detection tasks [14]. Researchers in this field have proposed many efficient detection models. For example, Li et al. [15] expanded the receptive field by quadrupling the down-sampling in the feature pyramid to better detect small objects, and incorporated the CBAM attention mechanism into the neural network to mitigate the gradient vanishing problem during training, thereby enhancing detection accuracy and robustness to interference. Chen et al. [16] introduced a novel Involution Bottleneck module, which reduces both the number of parameters and computational demands, while effectively capturing long-range spatial information. Additionally, an SE module was incorporated to increase the model’s sensitivity to channel-specific features, leading to significant improvement in both detection accuracy and processing speed. Zhang et al. [17] introduced an efficient channel attention mechanism, Hard-Swish activation function, and Focal Loss function in YOLOX to improve the model’s ability to extract image features and enhance detection accuracy and speed. Liu et al. [18] proposed the MRF-YOLO algorithm based on deep learning, reconstructing a deep convolutional network suitable for small-to-medium-sized target crop image scenes and complex detection backgrounds. By incorporating multi-scale modules into the backbone, adding multi-receptive field extraction modules, and fusing multi-level features, they improved small-object detection accuracy, achieving a balance between speed and precision. Liu et al. [19] used the DCNv3 structure to replace the ordinary convolution in the Bottleneck of the C2f module in YOLOv8n [20], calling the result C2f-DCNv3, and then added efficient channel attention after the last C2f-DCNv3 module of the head, improving model accuracy while keeping real-time detection possible. Pan et al. [21] replaced the Bottleneck structure in the C2f module of the backbone network with Partial Convolution. In the neck network, a thin neck structure was employed, and the C2f module was substituted with the GSConv and VoVGSCSP modules. This approach effectively achieved a balanced trade-off between detection speed, accuracy, and model size.
Although deep learning-based crop pest and disease recognition shows ideal detection capabilities, applying these methods directly in practical production is challenging [22]. Crop pests and diseases vary in size and shape, and the detection process becomes more complex, especially under conditions with significant interference from complicated backgrounds. At the same time, compared to the powerful GPU devices typically used for training and evaluation, the computational capabilities of practical deployment devices are usually much lower [23]. Moreover, some deep learning models with high detection accuracy tend to have redundant model parameters, which hinder the improvement of detection efficiency in practical applications. Therefore, issues such as complex environmental backgrounds, diverse pest and disease types, and difficulties in deploying large-scale equipment pose significant challenges for crop pest and disease detection [24].
To overcome the aforementioned challenges, this paper introduces the LCDDN-YOLO algorithm, which is built upon the YOLOv8n framework. To adapt to the limited computational resources of devices in practical applications, some convolutional layers in the backbone network are replaced with DSConv, effectively reducing model parameters and computational complexity. The BiFPN network is integrated into the original architecture, incorporating learnable weights to quantify the importance of different input features. Additionally, we innovatively fuse DSConv with the PConv module in the neck network’s C2f module, resulting in an enhanced structure referred to as PDS-C2f. Furthermore, to improve the model’s success rate in detecting cotton pests and diseases, the CBAM attention mechanism is incorporated into the neck network to enhance feature representation across different channels and extract critical spatial information from various locations, thus improving model performance. Together, these optimizations improve detection accuracy, while simultaneously reducing both the number of parameters and the computational burden. This makes the deployment of the model on resource-limited devices feasible, and offers a practical solution for detecting cotton pests and diseases. The key contributions of this paper are summarized as follows:
1. A dataset consisting of 6712 images of cotton pests and diseases was established. We collected images of cotton pests and diseases from natural environments for model training, validation, and testing.
2. This paper introduces the innovative PDS-C2f module to develop a lightweight detection model. The network architecture is restructured by employing the lightweight BiFPN as the neck component of LCDDN-YOLO, and integrating the Convolutional Block Attention Module (CBAM), along with the new PDS-C2f module. By integrating features across various resolutions and scales, this approach enhances the model’s capacity to represent features more effectively. Additionally, it achieves an optimal balance between detection speed and accuracy.
3. The Lightweight Cotton Disease Detection in Natural Environment (LCDDN-YOLO) network is compared with the YOLO series. The comparison validates the model’s performance in detecting pests and diseases, particularly those that are challenging to differentiate.

2. Materials and Methods

2.1. Experimental Design

The implementation process of the cotton disease and pest detection system is shown in Figure 1. Initially, images of cotton diseases and pests from seven varieties, along with healthy leaf images, were captured using cameras. Subsequently, the raw images underwent data processing to eliminate and rectify erroneous, inconsistent, incomplete, and redundant data through dataset cleansing. Next, the images were labeled to generate disease and pest identification tags. The images were subsequently paired with their corresponding labels to form a dataset for cotton disease and pest detection. An LCDDN-YOLO model was then developed, and the dataset was used for both training and validation of the model. After training, the model was employed for detecting cotton diseases and pests. Ultimately, the LCDDN-YOLO model generated prediction results and predicted images with labeled boxes.
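For illustration, the following minimal sketch (not the authors' released code) shows how the final prediction step could be run with the Ultralytics API [20] on a trained detector to produce images with labeled boxes; the weight path, image path, and confidence threshold are assumptions.

```python
# Minimal inference sketch (assumed paths and threshold), using the Ultralytics API [20].
from ultralytics import YOLO
import cv2

model = YOLO("runs/detect/train/weights/best.pt")   # hypothetical trained detector weights
results = model("cotton_field.jpg", conf=0.25)      # hypothetical field image
annotated = results[0].plot()                        # draw predicted boxes, labels, confidences
cv2.imwrite("cotton_field_pred.jpg", annotated)
```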

2.2. Preparation of the Dataset

Due to the limitations of existing public cotton pest and disease datasets, such as small data sizes and simplistic image backgrounds, a more comprehensive dataset was created. The data were primarily sourced from outdoor images of cotton pests and diseases taken by the authors. Because healthy cotton plants are widely distributed in cotton fields, it is easy to capture clear and complete images of them. In contrast, cotton infected with pests and diseases may be more scattered, and the symptoms may be obscured or inconspicuous, increasing the difficulty and cost of collection. For some diseases that are difficult to photograph in large numbers in the real environment, supplementary images were collected from the Internet. The search terms for obtaining supplementary images from Google and Baidu were “leaf curl disease” and “gray mold”. The self-captured image data were obtained in Yichang, Hubei Province, under natural outdoor conditions, using a handheld smartphone. The image collection covered a range of weather conditions, including sunny, cloudy, and overcast days. These images were captured at various growth stages of cotton and from different field locations. The dataset includes seven common pests and diseases in China, along with healthy plants, as experimental subjects. These pests and diseases are leaf spot disease, aphids, bollworm, Fusarium wilt, wilt disease, gray mold, and leaf curl disease, as shown in Figure 2. After data cleaning, errors, inconsistencies, incompleteness, and redundant data were removed or repaired, and invalid images were discarded. The final dataset consisted of 6712 images, with the specific distribution provided in Table 1. After data collection, all images were annotated using the LabelImg image annotation tool, and the corresponding annotation files were generated in the format required by the YOLO object detection algorithm.
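As a sketch of how the annotated data could be organized for training, the snippet below pairs each image with its LabelImg-produced YOLO-format label file and copies both into train/validation folders; the directory names, file extension, and split ratio are illustrative assumptions rather than the authors' actual preprocessing script.

```python
# Illustrative sketch: pair images with YOLO-format labels and split them into
# train/val folders. Paths and the split ratio are assumptions.
import random
import shutil
from pathlib import Path

def split_dataset(image_dir: str, label_dir: str, out_dir: str, train_ratio: float = 0.8) -> None:
    """Copy image/label pairs into out_dir/{train,val}/{images,labels}."""
    images = sorted(Path(image_dir).glob("*.jpg"))
    random.shuffle(images)
    n_train = int(len(images) * train_ratio)
    for i, img in enumerate(images):
        subset = "train" if i < n_train else "val"
        # each label line: "class x_center y_center width height" (normalized coordinates)
        label = Path(label_dir) / (img.stem + ".txt")
        for src, kind in ((img, "images"), (label, "labels")):
            dst = Path(out_dir) / subset / kind
            dst.mkdir(parents=True, exist_ok=True)
            shutil.copy(src, dst / src.name)

if __name__ == "__main__":
    split_dataset("raw/images", "raw/labels", "cotton_dataset")
```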

2.3. YOLOv8 Object Detection Model

YOLOv8 is an advanced detection model that builds upon the principles established by earlier versions of YOLO. It delivers rapid and precise performance, showcasing remarkable proficiency in tasks including image classification, object detection and tracking, instance segmentation, and pose estimation [25]. The structure of the YOLOv8 network [26] is composed of three main components: the backbone, the neck, and the detection head. These components are illustrated in Figure 3. The backbone network is a series of convolutional layers that extract relevant features from the input image. These convolutional layers process the input feature maps by performing convolution operations. This enables the effective extraction of local features from the image, such as edges and textures, which are essential for the network’s interpretation of the image information. The C2f module integrates high-level features with contextual data to enhance detection accuracy. Meanwhile, the SPPF layer and the following convolutional layers handle features at multiple scales. The neck of YOLOv8 includes a PAA module and two PAN modules [27], employing multi-scale feature fusion technology to merge feature maps from different stages of the backbone network, enhancing feature representation capabilities. The detection head generates the final object detection results, including bounding box regression and classification. Optimizations in the detection head improve both detection accuracy and speed. Typically, the detection head includes multiple parallel prediction layers, each responsible for predicting objects at different scales. These prediction layers output the coordinates of the bounding boxes, confidence scores, and class probabilities.

2.4. LCDDN-YOLO Object Detection Model

This study aims to tackle the challenges associated with detecting cotton pests and diseases in natural environments by using the lightweight YOLOv8n model as the baseline. The model is optimized to enhance both detection accuracy and processing speed. To reduce the computational burden, traditional convolution operations in the backbone and neck are substituted with more efficient DSConv operations. Furthermore, the BiFPN architecture is integrated into the neck, which contributes to a reduction in the overall model size. This modification also enhances the fusion of critical features, improving the accuracy of cotton pest and disease identification. Additionally, the DSConv and PConv modules are combined within the C2f module of the neck network, resulting in an enhanced version of the C2f module, termed PDS-C2f. Following this improvement, a CBAM attention mechanism is incorporated to bolster the representation and extraction of crucial information across various channels and spatial locations within the feature space. The structure of the refined lightweight LCDDN-YOLO network model is illustrated in Figure 4.

2.4.1. Convolution Module Improvement

Convolution operations are essential in computer vision and deep learning, as they efficiently extract and process features through local perception and parameter sharing mechanisms [28]. However, when handling high-resolution or high-channel data, traditional convolutions generate a substantial number of parameters, which increases both computational costs and memory requirements. In contrast, Distributed Shift Convolution (DSConv) [29] offers a more efficient solution by reducing memory usage and accelerating computation. DSConv achieves this by decomposing the standard convolution kernel into two components: the Variable Quantized Kernel (VQK) and the distribution shift. By storing only integer values in the VQK, it minimizes memory usage and speeds up computation, while still producing outputs equivalent to the original convolution via kernel- and channel-based distribution shifts. The structure of DSConv is depicted in Figure 5.
The VQK, which serves as the quantization element of DSConv, enables faster and more memory-efficient multiplication operations. This tensor contains variable-length integer values and is sized identically to the original convolution tensor, with dimensions (cho, chi, k, k), where cho represents the number of output channels, chi is the number of input channels, and k denotes the kernel’s width and height. The distribution shift component of DSConv seeks to adjust the VQK distribution in order to mimic the distribution pattern of the original convolution kernel. This is accomplished through two tensors: the Kernel Distribution Shifter (KDS), which modifies the distribution in each (1, BLK, 1, 1) slice of the VQK—where BLK is a hyperparameter defining the block size—and the Channel Distribution Shifter (CDS), which shifts the distribution across each channel, affecting the (1, chi, k, k) slices.
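To make the decomposition concrete, the simplified sketch below factors a float kernel into an integer VQK plus a learnable per-block scale acting as the KDS; it is an illustrative approximation of DSConv [29] (the CDS and other details of the reference implementation are omitted), and all hyperparameter values are assumptions.

```python
# Simplified DSConv-style layer (illustrative, not the reference implementation of [29]):
# the float kernel is factored into an integer Variable Quantized Kernel (VQK) plus
# per-block scales (Kernel Distribution Shifter); the effective kernel is rebuilt on
# the fly for a standard conv2d call. Assumes in_ch is divisible by blk.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleDSConv2d(nn.Module):
    def __init__(self, in_ch, out_ch, k=3, stride=1, padding=1, blk=32, bits=3):
        super().__init__()
        blk = min(blk, in_ch)                            # block size along input channels
        w = torch.empty(out_ch, in_ch, k, k)
        nn.init.kaiming_normal_(w)
        qmax = 2 ** (bits - 1) - 1                       # integer range of the VQK
        w_blocks = w.view(out_ch, in_ch // blk, blk, k, k)
        scale = w_blocks.abs().amax(dim=(2, 3, 4), keepdim=True) / qmax
        self.register_buffer("vqk", torch.round(w_blocks / scale))  # small integers, stored compactly
        self.kds = nn.Parameter(scale)                   # learnable per-block distribution shifter
        self.stride, self.padding = stride, padding
        self.out_ch, self.in_ch, self.k = out_ch, in_ch, k

    def forward(self, x):
        # rebuild an effective float kernel from the integer VQK and the block scales
        w = (self.vqk * self.kds).view(self.out_ch, self.in_ch, self.k, self.k)
        return F.conv2d(x, w, stride=self.stride, padding=self.padding)

# usage: y = SimpleDSConv2d(64, 128)(torch.randn(1, 64, 40, 40))
```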

2.4.2. C2f Module Improvement

In detection models, to improve detection accuracy, it is necessary to extract feature maps that are rich in semantic information from images. The C2f module is a key feature fusion module in YOLOv8, which enhances the gradient flow and information transfer through feature fusion, generating high-resolution and semantically rich feature maps. This significantly improves the accuracy and performance of the model [30]. However, the integration of the C2f module leads to an increase in the number of convolution operations. Consequently, this results in a notable rise in computational load, which poses challenges for deployment in resource-constrained devices. To achieve lightweight cotton disease detection, it is essential to reduce the model’s complexity and computational load, to enable efficient operation on resource-limited devices. Optimizing the convolution operators in the C2f module is an effective way to reduce computational overhead and improve model efficiency. Improving the C2f module allows for reduced computational resource consumption while maintaining high detection accuracy, making the model more suitable for embedded devices and other resource-constrained applications. In the original C2f structure, we improved its Bottleneck part by replacing the conventional convolution operators with DSConv and PConv [31]; we refer to the resulting part as PDSBlock. The PDS-C2f structure is shown in Figure 6. The PDS-C2f module offers several key advantages, which are primarily highlighted in the following aspects:
(1) Improved Feature Extraction and Efficiency: By incorporating DSConv and PConv, the module significantly enhances feature extraction, while drastically reducing both computational complexity and memory access. The use of Partial Convolutions allows the model to focus on important features and minimize unnecessary calculations, resulting in more efficient data processing.
(2) Enhanced Multi-Branch Design: The PDS-C2f module retains the multi-branch architecture of the original C2f module, splitting the input data into two branches. One branch directly passes the input data to the output, while the other processes the data through three PDSBlock modules before merging it with the output of the first branch. This structure reduces redundant computations and introduces diversity in feature representation, strengthening the model’s capacity to capture and express features. Although the PDSBlock modules maintain the same design, the features they process vary according to their position in the network and the inputs they receive. This variability enables each module to extract distinct features, further improving the model’s feature extraction capabilities and allowing it to capture finer details in the input images.
(3) Seamless Integration and Lightweight Implementation: Serving as an optimized replacement for the original C2f module, the PDS-C2f module can be easily integrated into existing model architectures. This allows for a lightweight implementation that does not require major changes to the overall structure. The result is a more efficient model that maintains high flexibility and scalability.
PDSBlock consists of three modules: PConv, CBS, and DSConv. DSConv stands for Distributed Shift Convolution, and PConv represents Partial Convolution. The PConv structure is illustrated in Figure 7. It applies convolution to only one-quarter of the input channels, leaving the remaining three-quarters of the channels unchanged. Afterward, the output from the one-quarter of the channels is merged with the untouched three-quarters of the channels. Although three-quarters of the channels do not participate in this convolution, valuable information is still extracted from them in the subsequent DSConv. This approach improves the efficiency of spatial feature extraction by reducing redundant computation and memory access. A code sketch of these modules is given below.
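The following sketch illustrates these ideas in PyTorch; the exact layer ordering, the residual connection, and the CBS block (implemented here as convolution + batch normalization + SiLU) are assumptions based on the description above rather than the authors' released code, and the DSConv stage reuses the SimpleDSConv2d sketch from Section 2.4.1.

```python
# Illustrative PConv / PDSBlock / PDS-C2f sketch (assumed details, not the authors' code).
import torch
import torch.nn as nn

class PConv(nn.Module):
    """Partial convolution: convolve only the first 1/4 of the channels, pass the rest through."""
    def __init__(self, ch, k=3):
        super().__init__()
        self.ch_conv = ch // 4
        self.conv = nn.Conv2d(self.ch_conv, self.ch_conv, k, padding=k // 2)

    def forward(self, x):
        x1, x2 = torch.split(x, [self.ch_conv, x.shape[1] - self.ch_conv], dim=1)
        return torch.cat((self.conv(x1), x2), dim=1)      # untouched 3/4 rejoined for later layers

class CBS(nn.Module):
    """Assumed CBS block: Conv + BatchNorm + SiLU."""
    def __init__(self, in_ch, out_ch, k=1):
        super().__init__()
        self.seq = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, k, padding=k // 2, bias=False),
            nn.BatchNorm2d(out_ch),
            nn.SiLU())

    def forward(self, x):
        return self.seq(x)

class PDSBlock(nn.Module):
    """PConv -> CBS -> DSConv; the residual connection is an assumption."""
    def __init__(self, ch):
        super().__init__()
        self.pconv = PConv(ch)
        self.cbs = CBS(ch, ch)
        self.dsconv = SimpleDSConv2d(ch, ch)              # sketch from Section 2.4.1

    def forward(self, x):
        return x + self.dsconv(self.cbs(self.pconv(x)))

class PDSC2f(nn.Module):
    """Simplified PDS-C2f: one branch bypasses, the other passes through three PDSBlocks."""
    def __init__(self, in_ch, out_ch, n=3):
        super().__init__()
        self.cv1 = CBS(in_ch, out_ch)
        self.blocks = nn.Sequential(*(PDSBlock(out_ch // 2) for _ in range(n)))
        self.cv2 = CBS(out_ch, out_ch)

    def forward(self, x):
        a, b = torch.chunk(self.cv1(x), 2, dim=1)         # split into two branches
        return self.cv2(torch.cat((a, self.blocks(b)), dim=1))
```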

2.4.3. Neck Network Improvement

The Feature Pyramid Network (FPN) is a crucial element in YOLOv8, primarily designed to combine feature maps across different scales, thereby improving the model’s ability to detect objects at multiple scales [32]. In the FPN, feature fusion is performed using a straightforward weighted approach, where it is assumed that all features contribute equally. In practice, however, input features often vary in resolution, and their relevance to the output features is inconsistent, which can affect the model’s detection accuracy. To address this issue, we integrate the Bidirectional Feature Pyramid Network (BiFPN) [33] into YOLOv8. Unlike the traditional FPN, the BiFPN introduces an additional pathway from high-resolution to low-resolution features, optimizing the feature fusion process. Furthermore, the BiFPN eliminates nodes that only receive input from a single node, making it more efficient and lightweight compared to PANet. The structures of the FPN and BiFPN are illustrated in Figure 8.
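As an illustration of the learnable-weight fusion described above, the sketch below implements a single BiFPN-style fusion node in which each input feature map receives a trainable, non-negative weight that is normalized before summation. This follows the commonly used "fast normalized fusion" formulation and is an assumption about the exact node design rather than the authors' implementation.

```python
# Illustrative BiFPN-style fusion node: learnable, normalized weights over the inputs.
import torch
import torch.nn as nn

class WeightedFusion(nn.Module):
    """Fuse N same-shaped feature maps with learnable, normalized non-negative weights."""
    def __init__(self, n_inputs: int, eps: float = 1e-4):
        super().__init__()
        self.w = nn.Parameter(torch.ones(n_inputs))       # one learnable weight per input
        self.eps = eps

    def forward(self, feats):
        w = torch.relu(self.w)                            # keep weights non-negative
        w = w / (w.sum() + self.eps)                      # normalize ("fast normalized fusion")
        return sum(wi * f for wi, f in zip(w, feats))

# usage: fused = WeightedFusion(2)([p4_top_down, p4_backbone])  # one node of the BiFPN
```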

2.4.4. Attention Module Improvement

In computer vision, the attention mechanism refers to the ability to prioritize relevant regions of an image while ignoring less important areas. It is viewed as a dynamic process that selects crucial information, facilitating rapid and effective analysis of relevant details within complex scenes [34]. The Convolutional Block Attention Module (CBAM) [35] is a lightweight attention mechanism that integrates both channel and spatial attention. An illustration of the CBAM attention mechanism is provided in Figure 9.
In the channel attention module, the channel dimension remains unchanged, while the spatial dimension is reduced. This module prioritizes meaningful information in the input feature map. The feature map is processed through parallel MaxPool and AvgPool operations, compressing it from C × H × W to C × 1 × 1. It then passes through a shared MLP module, where the channel number is first reduced by a factor of 1/r (the reduction ratio) and later restored to its original size. After applying the ReLU activation function, two activated results are generated. These outputs are element-wise added and passed through a sigmoid activation function, producing the final channel attention map. This result is then multiplied by the input feature map, restoring the size to C × H × W.
The channel attention formula is as follows:
$$M_c(F) = \sigma\big(\mathrm{MLP}(\mathrm{AvgPool}(F)) + \mathrm{MLP}(\mathrm{MaxPool}(F))\big) \tag{1}$$
In the formula, $F$ is the input feature map, AvgPool and MaxPool represent the global average pooling and maximum pooling operations, respectively, MLP represents the multilayer perceptron, and $\sigma$ represents the sigmoid activation function.
In the spatial attention module, the spatial dimension remains unchanged, while the channel dimension is compressed. This module focuses on the positional information of the target. The output of the channel attention is passed through MaxPool and AvgPool to obtain two 1 × H × W feature maps, which are then concatenated using a Concat operation. Afterward, a 7 × 7 convolution is applied to convert the result into a single-channel feature map. This is followed by a sigmoid activation to obtain the spatial attention feature map. Finally, the output is multiplied by the input feature map, restoring the size to C × H × W.
The spatial attention formula is as follows:
$$M_s(F) = \sigma\big(f^{7\times 7}([\mathrm{AvgPool}(F); \mathrm{MaxPool}(F)])\big) \tag{2}$$
In the formula, $f^{7\times 7}$ represents a 7 × 7 convolution operation, $[\mathrm{AvgPool}(F); \mathrm{MaxPool}(F)]$ represents the average and maximum pooling results stitched together along the channel axis, and $\sigma$ represents the sigmoid activation function.
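A minimal PyTorch sketch of CBAM following Equations (1) and (2) and [35] is shown below; the reduction ratio r and the 7 × 7 kernel size are conventional defaults, not values reported by the authors.

```python
# Minimal CBAM sketch (channel attention per Eq. (1), spatial attention per Eq. (2));
# the reduction ratio r = 16 and the 7x7 kernel are conventional defaults, assumed here.
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, ch, r=16):
        super().__init__()
        self.mlp = nn.Sequential(                         # shared MLP over pooled descriptors
            nn.Conv2d(ch, ch // r, 1, bias=False), nn.ReLU(),
            nn.Conv2d(ch // r, ch, 1, bias=False))

    def forward(self, x):
        avg = self.mlp(torch.mean(x, dim=(2, 3), keepdim=True))   # C x 1 x 1 descriptor
        mx = self.mlp(torch.amax(x, dim=(2, 3), keepdim=True))
        return torch.sigmoid(avg + mx)                    # M_c(F)

class SpatialAttention(nn.Module):
    def __init__(self, k=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, k, padding=k // 2, bias=False)

    def forward(self, x):
        avg = torch.mean(x, dim=1, keepdim=True)          # 1 x H x W map
        mx, _ = torch.max(x, dim=1, keepdim=True)
        return torch.sigmoid(self.conv(torch.cat((avg, mx), dim=1)))  # M_s(F)

class CBAM(nn.Module):
    def __init__(self, ch, r=16, k=7):
        super().__init__()
        self.ca = ChannelAttention(ch, r)
        self.sa = SpatialAttention(k)

    def forward(self, x):
        x = x * self.ca(x)                                # channel attention first
        return x * self.sa(x)                             # then spatial attention
```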

2.4.5. Loss Function Improvement

The performance of object detection algorithms is easily influenced by the loss function. Intersection over Union (IoU) [36] is a commonly used metric to evaluate the performance of object detection models. The bounding box regression loss function used in YOLOv8 is Complete Intersection over Union (CIoU) [37]. Its calculation formulas are as follows:
$$L_{CIoU} = 1 - IoU + \frac{\rho^2\!\left(b, b^{gt}\right)}{c^2} + \alpha v \tag{3}$$
$$v = \frac{4}{\pi^2}\left(\arctan\frac{w^{gt}}{h^{gt}} - \arctan\frac{w}{h}\right)^2 \tag{4}$$
$$\alpha = \frac{v}{(1 - IoU) + v} \tag{5}$$
In the formulas, $IoU$ represents the intersection over union between the Predicted Box and the True Box; $b$, $w$, and $h$ represent the center coordinates, width, and height of the predicted box, respectively; $c$ is the diagonal length of the smallest enclosing box that covers both the Predicted Box and the True Box; $b^{gt}$, $w^{gt}$, and $h^{gt}$ represent the center coordinates, width, and height of the true box, respectively; and $\alpha$ and $v$ are hyperparameters.
Since $v$ only reflects the aspect ratio difference, the CIoU loss optimizes similarity in an unreasonable manner, which is detrimental to further learning and optimization of the model. To address this issue, the Focal-Enhanced Intersection over Union (Focal-EIoU) [38] loss function is introduced. The Focal-EIoU loss function takes into account three key metrics: the intersection area between bounding boxes, the distance between the centers of the bounding boxes, and the relative differences in their width and height. Additionally, it incorporates a parameter that regulates the suppression of outliers. This design ensures that, even when the two bounding boxes do not overlap, the loss function continues to generate an effective gradient, thereby aiding the model’s training process and promoting convergence. The calculation formulas for $L_{\text{Focal-EIoU}}$ and $L_{EIoU}$ are shown in Formulas (6) and (7).
$$L_{\text{Focal-EIoU}} = IoU^{\gamma}\, L_{EIoU} \tag{6}$$
$$L_{EIoU} = L_{IoU} + L_{dis} + L_{asp} = 1 - IoU + \frac{\rho^2\!\left(b, b^{gt}\right)}{w_c^2 + h_c^2} + \frac{\rho^2\!\left(w, w^{gt}\right)}{w_c^2} + \frac{\rho^2\!\left(h, h^{gt}\right)}{h_c^2} \tag{7}$$
In the formulas, $IoU$ denotes the Intersection over Union between the Predicted Box and the True Box. The parameter $\gamma$ controls the extent of outlier suppression. $L_{IoU}$ represents the overlap loss, $L_{dis}$ refers to the distance loss, and $L_{asp}$ indicates the width–height loss. The variables $b$, $w$, and $h$ correspond to the center coordinates, width, and height of the predicted box, respectively. The value of $c$ is the diagonal length of the smallest enclosing box that covers both the Predicted and True Boxes. The variables $b^{gt}$, $w^{gt}$, and $h^{gt}$ represent the center coordinates, width, and height of the ground truth box. Additionally, $w_c$ and $h_c$ are the width and height of the minimum bounding rectangle that encompasses both the predicted and true boxes, and $\rho$ represents the Euclidean distance between them.
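To make Equations (6) and (7) concrete, the sketch below computes the EIoU and Focal-EIoU losses for batches of (x1, y1, x2, y2) boxes; the value of γ is an assumption, not one reported by the authors.

```python
# Illustrative EIoU / Focal-EIoU computation per Equations (6) and (7);
# boxes are (N, 4) tensors in (x1, y1, x2, y2) format, and gamma is assumed.
import torch

def eiou_loss(pred, target, eps=1e-7):
    # IoU term
    x1 = torch.max(pred[:, 0], target[:, 0]); y1 = torch.max(pred[:, 1], target[:, 1])
    x2 = torch.min(pred[:, 2], target[:, 2]); y2 = torch.min(pred[:, 3], target[:, 3])
    inter = (x2 - x1).clamp(0) * (y2 - y1).clamp(0)
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    iou = inter / (area_p + area_t - inter + eps)

    # distance term: squared center distance over the enclosing box diagonal (w_c^2 + h_c^2)
    cw = torch.max(pred[:, 2], target[:, 2]) - torch.min(pred[:, 0], target[:, 0])
    ch = torch.max(pred[:, 3], target[:, 3]) - torch.min(pred[:, 1], target[:, 1])
    rho2 = ((pred[:, 0] + pred[:, 2] - target[:, 0] - target[:, 2]) ** 2 +
            (pred[:, 1] + pred[:, 3] - target[:, 1] - target[:, 3]) ** 2) / 4
    dist = rho2 / (cw ** 2 + ch ** 2 + eps)

    # aspect term: width and height differences, each normalized by the enclosing box side
    w_p, h_p = pred[:, 2] - pred[:, 0], pred[:, 3] - pred[:, 1]
    w_t, h_t = target[:, 2] - target[:, 0], target[:, 3] - target[:, 1]
    asp = (w_p - w_t) ** 2 / (cw ** 2 + eps) + (h_p - h_t) ** 2 / (ch ** 2 + eps)

    return 1 - iou + dist + asp, iou

def focal_eiou_loss(pred, target, gamma=0.5):             # gamma value is an assumption
    loss, iou = eiou_loss(pred, target)
    return (iou.detach() ** gamma * loss).mean()           # Eq. (6): IoU^gamma re-weighting
```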

2.5. Experimental Environment

The hardware configuration of the training platform is detailed as follows: the central processing unit (CPU) is an 18 vCPU AMD EPYC 9754 128-Core Processor, while the graphics processing unit (GPU) is an RTX 4090D with 24 GB of memory. The system is equipped with 64 GB of operational memory, and the operating system is Windows 10 64-bit; the deep learning framework is PyTorch 1.10.0; and the programming language is Python with the compilation environment Python 3.8 and CUDA 11.3. The main parameter settings are shown in Table 2.
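For reference, the following hedged sketch shows how a training run with the Table 2 settings could be launched through the Ultralytics API [20]; the dataset YAML, image size, and batch size are assumptions, and the actual LCDDN-YOLO modules would require a custom model definition rather than the stock yolov8n weights.

```python
# Hedged training sketch with the Table 2 settings via the Ultralytics API [20].
# The dataset YAML, image size, and batch size are assumptions; LCDDN-YOLO itself
# would be built from a modified model definition rather than stock yolov8n.
from ultralytics import YOLO

model = YOLO("yolov8n.pt")           # baseline starting point
model.train(
    data="cotton_pests.yaml",        # hypothetical dataset config listing the 8 classes
    epochs=200,                      # Table 2
    lr0=0.01,                        # initial learning rate (Table 2)
    weight_decay=0.005,              # Table 2
    imgsz=640,                       # assumed input resolution
    batch=16,                        # assumed batch size
    device=0,                        # single RTX 4090D GPU
)
```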

2.6. Evaluation Indicators

The relevant evaluation metrics selected for this model are as follows: Precision (P), Mean Average Precision (mAP), Recall (R), Model Parameters (Parameters), and computational cost (FLOPs) are used as indicators for assessing the model’s detection accuracy and complexity, with mAP evaluated at an IoU threshold of 50% (mAP@50) [39]. The calculation expressions for each metric are as follows:
$$P = \frac{TP}{TP + FP} \tag{8}$$
$$AP = \int_0^1 p(R)\, dR \tag{9}$$
$$mAP = \frac{\sum_{i=1}^{n} AP_i}{n} \tag{10}$$
$$R = \frac{TP}{TP + FN} \tag{11}$$
$$Parameters = K \times K \times C_{in} \times C_{out} \tag{12}$$
$$FLOPs = K \times K \times C_{in} \times C_{out} \times H \times W \tag{13}$$
In the formulas, $TP$ represents the number of samples correctly labeled and detected as true; $FP$ represents the number of samples incorrectly labeled as true, but actually false; $FN$ represents the number of samples incorrectly labeled as false, but actually true; $AP$ represents average precision; $p(R)$ represents the precision value when the recall rate is $R$; $n$ represents the number of categories; $K$ represents the size of the convolution kernel; $C_{in}$ is the number of input channels; $C_{out}$ is the number of output channels; and $H \times W$ represents the size of the output feature map.
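As a quick worked example of Formulas (8), (11), (12), and (13), the snippet below evaluates the precision and recall ratios and the parameter/FLOP counts for a single convolution layer; the layer sizes are illustrative only.

```python
# Worked example of Formulas (8), (11), (12), and (13); layer sizes are illustrative.
def precision(tp: int, fp: int) -> float:
    return tp / (tp + fp)                       # Formula (8)

def recall(tp: int, fn: int) -> float:
    return tp / (tp + fn)                       # Formula (11)

def conv_params(k: int, c_in: int, c_out: int) -> int:
    return k * k * c_in * c_out                 # Formula (12)

def conv_flops(k: int, c_in: int, c_out: int, h: int, w: int) -> int:
    return k * k * c_in * c_out * h * w         # Formula (13)

# a 3x3 convolution from 64 to 128 channels on an 80x80 output feature map
print(conv_params(3, 64, 128))                  # 73728 parameters
print(conv_flops(3, 64, 128, 80, 80))           # 471859200 operations
```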

3. Results

3.1. LCDDN-YOLO Model Performance

Figure 10 presents a comparison between the baseline model, YOLOv8n, and the enhanced LCDDN-YOLO model. The precision, recall, mAP@50, Params, and FLOPs were compared. The figure shows that the P and R values of both models exhibited significant fluctuations during the first 160 epochs, and after the next 40 epochs, the curves gradually stabilized, with the performance of the LCDDN-YOLO model consistently outperforming the baseline model. The final P and R values of the LCDDN-YOLO model were 89.5% and 80.2%, respectively. For the mAP@50 metric, there was a noticeable change in the curve during the first 130 epochs, and the curve gradually stabilized during the subsequent 70 epochs. The LCDDN-YOLO model’s curve consistently outperformed the baseline model, with the final mAP@50 value reaching 85.4%. Furthermore, a comparison was conducted between the LCDDN-YOLO and YOLOv8n models, focusing on parameters and FLOPs. The optimized LCDDN-YOLO model demonstrated a 12.9% decrease in the number of parameters and a 9.9% reduction in FLOPs. In conclusion, the LCDDN-YOLO model achieved further optimization while preserving the accuracy of the original model, leading to a reduction in the overall model size.
To validate the model’s generalization, a total of 24 images from three randomly selected sample groups in real-world and online scenarios were tested. To demonstrate the improvement of the model, the samples were validated with both the YOLOv8n and LCDDN-YOLO networks. Figure 11 illustrates part of the actual detection results of YOLOv8n and LCDDN-YOLO. The results show that, when detecting diverse cotton diseases and pests under real-world conditions, YOLOv8n exhibited partial omissions (e.g., in the second and last images) and lower recognition accuracy. The LCDDN-YOLO model enhanced detection capabilities in complex scenarios, achieved precise leaf identification with higher confidence, and effectively mitigated missed detections.

3.2. Comparison of C2f Module Performance

To evaluate the effectiveness of the proposed PDS-C2f module, a comparative analysis was performed between this module and the original C2f module incorporated into the model. The results of this comparison are provided in Table 3. The findings indicate that the model employing the PDS-C2f module outperformed the original C2f module across all evaluated metrics. Specifically, precision improved by 1.0%, reaching 85.9%, mAP@50 increased by 0.5% to 79.4%, and recall saw a 1.6% boost to 74.0%. Additionally, the PDS-C2f module reduced the parameter count by 0.3 M, bringing it down to 2.8 M, and lowered FLOPs by 0.7 G, resulting in a total of 7.4 G. These results highlight the significant improvements offered by the PDS-C2f module, particularly in reducing the model’s complexity, confirming its effectiveness.

3.3. Comparison of Loss Function Performance

To enhance both regression speed and accuracy, the Focal-EIoU, SIoU, and WIoU loss functions were chosen and compared with the CIoU loss function used in the original model. The results of this comparison are presented in Table 4. Given that the choice of loss function had a negligible effect on the model parameters, these parameters are not included as evaluation metrics in the table. The comparison of these four loss functions reveals that the model utilizing the Focal-EIoU loss function achieved the best overall performance, with a precision of 84.9%, an mAP@50 of 81.9%, and a recall of 77.2%. Among the tested loss functions, Focal-EIoU, particularly when compared to the original CIoU loss function, emerges as the most effective option for the model.

3.4. Ablation Experiments

To assess the contribution of each module to the model’s performance, ablation experiments were conducted. These experiments involved selectively adding or removing specific modules from the model. The results of these experiments are summarized in Table 5. As shown in the table, substituting the standard convolution with DSConv in the YOLOv8n network reduced the model’s parameter count by 9.6% and its computational load by 4.9%, while improving precision by 2.3%, mAP@50 by 2.6%, and recall by 2.5%. Thus, the model’s complexity was significantly reduced while accuracy was still improved. Incorporating the improved C2f module resulted in a 9.6% reduction in the model’s parameter count and an 8.6% decrease in computational load, along with a 1.0% increase in precision, a 0.5% rise in mAP@50, and a 1.6% gain in recall. These changes contributed to a significant overall performance improvement, underscoring the efficacy of the PDS-C2f module. When comparing models 1, 2, 3, 6, 7, 8, and 10, it is evident that applying both DSConv and PDS-C2f to the original model resulted in a 16.1% reduction in parameters, a 13.5% decrease in computational load, and slight improvements in precision, mAP@50, and recall. This indicates that both modules effectively preserve accuracy, while significantly reducing the model’s complexity and computational demands. The introduction of the Focal-EIoU loss function led to improvements of 3.0% in mAP@50 and 4.8% in recall, while keeping the number of parameters and computational load unchanged. This indicates that the Focal-EIoU loss function effectively enhances the model’s detection capabilities. The addition of the CBAM attention mechanism to the original model led to an increase in both parameter count and computational load, but also improved precision, mAP@50, and recall. Notably, the application of both the BiFPN network structure and the CBAM attention mechanism resulted in significant improvements in precision, mAP@50, and recall, as demonstrated by the comparison between models 11 and 15. These enhancements were achieved without altering the parameter count or computational load. This indicates that the combination of BiFPN and CBAM optimally enhances the model’s performance. Finally, by replacing the original modules with the improved convolution module, the C2f module, and the Focal-EIoU loss function, while also integrating the BiFPN network structure and the CBAM attention mechanism, the model achieved a 4.6% increase in precision, a 6.5% improvement in mAP@50, a 7.8% boost in recall, a 12.9% reduction in parameters, and a 9.9% decrease in FLOPs. These results demonstrate significant overall improvements compared to the original model.

3.5. Comparison with Mainstream Object Detection Models

To evaluate the effectiveness of the improved model, this study compares the proposed model with several current state-of-the-art object detection models, including YOLOv8n, YOLOv3-tiny [40], Faster-RCNN [41], YOLOv5n [42], YOLOv9t [43], YOLOX [44], and SSD [45]. The comparison results are shown in Table 6. In the comparison experiments with mainstream models, the proposed model outperforms the others in precision, mAP@50, recall, FLOPs, and size, achieving 89.5%, 85.4%, 80.2%, 7.2 G, and 7.6 MB, respectively. In terms of parameter count, the improved model is second only to YOLOv9t, with 3.8% more parameters. However, YOLOv9t has lower precision, mAP@50, and recall, by 6.4%, 3.9%, and 5.4%, respectively, compared to the proposed model, and is behind our model in FPS and size. In terms of FPS, the improved model underperformed YOLOv8n, YOLOv5n, and YOLOX, with reductions of 4.7%, 12.0%, and 6.8%, respectively. However, the proposed model outperforms YOLOv8n, YOLOv5n, and YOLOX in precision, mAP@50, recall, and model size. In conclusion, the improved model presented in this paper achieves the best overall performance, balancing detection accuracy and model complexity. It shows strong classification and detection capabilities for cotton pests and diseases, with significant application potential.
Figure 12 presents a comparison line chart displaying the mean average precision (mAP@50) results for each model. In the same experimental environment and with identical parameter settings, the improved model proposed in this paper consistently outperforms the other models. This indicates that the enhanced model demonstrates faster convergence, extracts more complex features, and outperforms other models in cotton pest and disease detection.

4. Discussion

4.1. Main Features of Proposed Method

Deep learning models need a lot of computing power to extract features accurately. This is a big challenge for embedded devices with limited resources. The problem is especially clear when using these models on devices like agricultural inspection robots. These devices often cannot handle such heavy computations in a given time. Some detection networks can provide high-precision results. However, their large size and complex calculations put too much pressure on the devices. On the other hand, lightweight detection models are faster and require less computing power, but they usually have lower accuracy, which affects performance. Therefore, when using cotton disease detection models on agricultural robots or other devices with limited resources, a key issue is determining how to make the model lightweight while keeping high detection accuracy.
The LCDDN-YOLO algorithm proposed in this paper is a lightweight approach for cotton pest and disease detection, built upon the YOLOv8n model. This model incorporates several lightweight modules, including DSConv, PDS-C2f, and BiFPN. The advantages of the LCDDN-YOLO model are as follows:
1. The model’s parameter count and floating-point operations are reduced to 2.7 M and 7.2 G, respectively, making it suitable for deployment in agricultural inspection robots and other resource-constrained agricultural devices.
2. In terms of detection accuracy, the LCDDN-YOLO model achieves precision, mAP@50, and recall of 89.5%, 85.4%, and 80.2%, respectively. The LCDDN-YOLO model strikes a balance between improving accuracy and simplifying model efficiency, providing technical support for precision agriculture management.

4.2. Future Work

The experimental results demonstrate that the proposed model exhibits broad application prospects. In practical implementations, the LCDDN-YOLO model can be seamlessly integrated with diverse data acquisition and processing tools to achieve efficient cotton pest and disease detection. Firstly, data collection can be conducted through agricultural inspection robots or unmanned aerial vehicles equipped with high-resolution cameras. These devices autonomously navigate cotton fields to capture real-time images of cotton plants, transmitting the image data to a central processing system via wireless networks. The central processing system, deployable on cloud-based or local servers, utilizes the LCDDN-YOLO model for real-time image analysis. The model’s high efficiency and lightweight design enable operation on resource-constrained devices, while maintaining superior detection accuracy in complex natural environments. Secondly, the LCDDN-YOLO model can be integrated with existing agricultural management systems. Through API interfaces, detection results are directly transmitted to management platforms. Administrators can visualize these results through graphical interfaces and implement appropriate control measures based on model-generated recommendations. Furthermore, the model can be synchronized with automated pesticide spraying equipment to enable targeted pesticide application, thereby reducing chemical usage and environmental contamination. Finally, to enhance model performance continuously, newly acquired field data can be periodically incorporated into training processes for model optimization and updates. This closed-loop data processing workflow not only improves the model’s generalization capability, but also adapts to pest/disease variations across regions and seasons, ensuring the long-term effectiveness of the detection system.

5. Conclusions

Detecting cotton pests and diseases in natural environments presents significant challenges, particularly due to the similarities between the features of these pests and diseases. To address these issues, this study introduces an enhancement to the YOLOv8 algorithm, resulting in an improved version named LCDDN-YOLO. The LCDDN-YOLO model first replaces a subset of traditional convolutional layers in the backbone network with DSConv layers. This modification leads to a significant reduction in both the model’s parameter count and computational complexity, while maintaining or even enhancing its ability to capture features. Consequently, LCDDN-YOLO achieves an optimal balance between detection speed and accuracy, making it a suitable and efficient solution for deployment on embedded devices. Furthermore, the model incorporates the BiFPN structure into its architecture, which enhances its feature representation capabilities and further improves detection accuracy. This addition allows the model to better identify cotton pests and diseases, even in complex environments with varying scales and backgrounds. Additionally, the LCDDN-YOLO algorithm integrates DSConv and PConv modules into the neck network’s C2f module, creating a new module called PDS-C2f. This modification is designed to improve the network’s feature extraction capacity and strengthen multi-level feature fusion, leading to improved identification of cotton pests and diseases. To further enhance the model’s ability to express features, the LCDDN-YOLO model also incorporates the CBAM attention mechanism into the neck network of the YOLOv8n model. These combined improvements increase the model’s sensitivity to subtle differences, reducing both false positives and false negatives in more complex environments. LCDDN-YOLO also uses the Focal-EIoU loss function. This helps to improve the model’s training process. As a result, the model becomes more accurate and robust in detecting cotton pests and diseases. Experimental results show that LCDDN-YOLO outperforms the traditional YOLOv8 model. Compared to YOLOv8n, the LCDDN-YOLO model reduces the parameter count by 12.9% and FLOPs by 9.9%, while increasing precision, mAP@50, and recall by 4.6%, 6.5%, and 7.8%, respectively, allowing them to reach 89.5%, 85.4%, and 80.2%. These results indicate that the LCDDN-YOLO model, with its excellent detection accuracy and efficient computational speed, is already capable of meeting practical application demands, particularly in lightweight computing scenarios. In the future, with further optimization and adjustments, the LCDDN-YOLO model is expected to be applied in a broader range of agricultural detection fields and provide strong technical support for real-time detection and precise control of cotton field pests and diseases.

Author Contributions

Conceptualization, X.C.; methodology, X.C. and H.F.; software, H.F.; validation, H.F.; formal analysis, H.F.; investigation, X.C. and H.F.; resources, X.C. and H.F.; data curation, X.C. and H.F.; writing—original draft preparation, H.F. and Z.D.; writing—review and editing, X.C.; visualization, H.F. and Z.D.; supervision, X.C.; funding acquisition, X.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Science and Technology Program of the Hubei Provincial Department of Education, grant number B2020067.

Institutional Review Board Statement

Not applicable.

Data Availability Statement

Data are contained within the article.

Acknowledgments

Thanks to all of the authors cited in this article and the referees for their helpful comments and suggestions.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

All abbreviations in this article are explained as follows.
LCDDN: Lightweight Cotton Disease Detection in Natural Environment.
YOLO: You Only Look Once.
DSConv: Distributed Shift Convolution.
BiFPN: Bidirectional Feature Pyramid Network.
PConv: Partial Convolution.
C2f: Faster implementation of CSP Bottleneck with two convolutions.
CBAM: Convolutional Block Attention Module.
Focal-EIoU: Focal-Enhanced Intersection over Union.
FLOPs: Floating point operations per second.
mAP@50: Mean Average Precision (IoU = 50%).
SE: Squeeze-and-excitation networks.
SPPF: Spatial Pyramid Pooling with Feature Concatenation.
PAA: Probabilistic Anchor Assignment.
PAN: Path Aggregation Network.
VQK: Variable Quantized Kernel.
KDS: Kernel Distribution Shifter.
CDS: Channel Distribution Shifter.
CBS: Class-Balanced Sampling.
FPN: Feature Pyramid Network.
IoU: Intersection over Union.

References

  1. Yomgirovna, R.G. Formation of cotton crop elements. Eur. J. Mod. Med. Pract. 2023, 3, 113–115. [Google Scholar]
  2. Huang, G.; Huang, J.-Q.; Chen, X.-Y.; Zhu, Y.-X. Recent advances and future perspectives in cotton research. Annu. Rev. Plant Biol. 2021, 72, 437–462. [Google Scholar] [CrossRef] [PubMed]
  3. Khan, M.A.; Wahid, A.; Ahmad, M.; Tahir, M.T.; Ahmed, M.; Ahmad, S.; Hasanuzzaman, M. World cotton production and consumption: An overview. In Cotton Production and Uses; Springer: Singapore, 2020. [Google Scholar] [CrossRef]
  4. Zekiwos, M.; Bruck, A. Deep Learning-Based Image Processing for Cotton Leaf Disease and Pest Diagnosis. J. Electr. Comput. Eng. 2021, 2021, 9981437. [Google Scholar]
  5. Li, L.; Zhang, S.; Wang, B. Plant disease detection and classification by deep learning—A review. IEEE Access 2021, 9, 56683–56698. [Google Scholar] [CrossRef]
  6. Sujatha, R.; Chatterjee, J.M.; Jhanjhi, N.; Brohi, S.N. Performance of deep learning vs. machine learning in plant leaf disease detection. Microprocess. Microsyst. 2021, 80, 103615. [Google Scholar] [CrossRef]
  7. Choudhary, G.; Sethi, D. From conventional approach to machine learning and deep learning approach: An experimental and comprehensive review of image fusion techniques. Arch. Comput. Methods Eng. 2023, 30, 1267–1304. [Google Scholar] [CrossRef]
  8. Anubha Pearline, S.; Sathiesh Kumar, V.; Harini, S. A study on plant recognition using conventional image processing and deep learning approaches. J. Intell. Fuzzy Syst. 2019, 36, 1997–2004. [Google Scholar] [CrossRef]
  9. Qi, J.; Liu, X.; Liu, K.; Xu, F.; Guo, H.; Tian, X.; Li, M.; Bao, Z.; Li, Y. An improved YOLOv5 model based on visual attention mechanism: Application to recognition of tomato virus disease. Comput. Electron. Agric. 2022, 194, 106780. [Google Scholar] [CrossRef]
  10. Atole, R.R.; Park, D. A multiclass deep convolutional neural network classifier for detection of common rice plant anomalies. Int. J. Adv. Comput. Sci. Appl. 2018, 9, 67–70. [Google Scholar]
  11. Li, Y.; Wang, H.; Dang, L.M.; Sadeghi-Niaraki, A.; Moon, H. Crop pest recognition in natural scenes using convolutional neural networks. Comput. Electron. Agric. 2020, 169, 105174. [Google Scholar] [CrossRef]
  12. Bao, W.; Cheng, T.; Zhou, X.-G.; Guo, W.; Wang, Y.; Zhang, X.; Qiao, H.; Zhang, D. An improved DenseNet model to classify the damage caused by cotton aphid. Comput. Electron. Agric. 2022, 203, 107485. [Google Scholar] [CrossRef]
  13. Zhao, Z.-Q.; Zheng, P.; Xu, S.-T.; Wu, X. Object detection with deep learning: A review. IEEE Trans. Neural Netw. Learn. Syst. 2019, 30, 3212–3232. [Google Scholar] [CrossRef] [PubMed]
  14. Terven, J.; Córdova-Esparza, D.-M.; Romero-González, J.-A. A comprehensive review of yolo architectures in computer vision: From yolov1 to yolov8 and yolo-nas. Mach. Learn. Knowl. Extr. 2023, 5, 1680–1716. [Google Scholar] [CrossRef]
  15. Li, R.; Wu, Y. Improved YOLO v5 wheat ear detection algorithm based on attention mechanism. Electronics 2022, 11, 1673. [Google Scholar] [CrossRef]
  16. Chen, Z.; Wu, R.; Lin, Y.; Li, C.; Chen, S.; Yuan, Z.; Chen, S.; Zou, X. Plant disease recognition model based on improved YOLOv5. Agronomy 2022, 12, 365. [Google Scholar] [CrossRef]
  17. Zhang, Y.; Ma, B.; Hu, Y.; Li, C.; Li, Y. Accurate cotton diseases and pests detection in complex background based on an improved YOLOX model. Comput. Electron. Agric. 2022, 203, 107484. [Google Scholar] [CrossRef]
  18. Liu, Q.; Zhang, Y.; Yang, G. Small unopened cotton boll counting by detection with MRF-YOLO in the wild. Comput. Electron. Agric. 2023, 204, 107576. [Google Scholar] [CrossRef]
  19. Runfei, L. Cotton pest detection algorithm based on improved YOLOv8. Agric. Eng. 2024, 14, 42–47. [Google Scholar]
  20. Ultralytics. Available online: https://docs.ultralytics.com/ (accessed on 10 January 2023).
  21. Pan, P.; Shao, M.; He, P.; Hu, L.; Zhao, S.; Huang, L.; Zhou, G.; Zhang, J. Lightweight cotton diseases real-time detection model for resource-constrained devices in natural environments. Front. Plant Sci. 2024, 15, 1383863. [Google Scholar] [CrossRef] [PubMed]
  22. Shoaib, M.; Shah, B.; El-Sappagh, S.; Ali, A.; Ullah, A.; Alenezi, F.; Gechev, T.; Hussain, T.; Ali, F. An advanced deep learning models-based plant disease detection: A review of recent research. Front. Plant Sci. 2023, 14, 1158933. [Google Scholar]
  23. Zou, Z.; Chen, K.; Shi, Z.; Guo, Y.; Ye, J. Object detection in 20 years: A survey. Proc. IEEE 2023, 111, 257–276. [Google Scholar] [CrossRef]
  24. Sharma, V.; Tripathi, A.K.; Mittal, H. Technological advancements in automated crop pest and disease detection: A review & ongoing research. In Proceedings of the 2022 International Conference on Computing, Communication, Security and Intelligent Systems (IC3SIS), Kochi, India, 23–25 June 2022; pp. 1–6. [Google Scholar]
  25. Sohan, M.; Sai Ram, T.; Reddy, R.; Venkata, C. A review on yolov8 and its advancements. In Proceedings of the International Conference on Data Intelligence and Cognitive Informatics, Tirunelveli, India, 17 May 2024; Springer: Singapore, 2024; pp. 529–545. [Google Scholar]
  26. Yang, R.; Yuan, D.; Zhao, M.; Zhao, Z.; Zhang, L.; Fan, Y.; Liang, G.; Zhou, Y. Camellia oleifera Tree Detection and Counting Based on UAV RGB Image and YOLOv8. Agriculture 2024, 14, 1789. [Google Scholar] [CrossRef]
  27. Gong, Y.; Yu, X.; Ding, Y.; Peng, X.; Zhao, J.; Han, Z. Effective fusion factor in FPN for tiny object detection. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, Montreal, QC, Canada, 10–17 October 2021; pp. 1160–1168. [Google Scholar]
  28. Dong, S.; Wang, P.; Abbas, K. A survey on deep learning and its applications. Comput. Sci. Rev. 2021, 40, 100379. [Google Scholar] [CrossRef]
  29. Nascimento, M.G.D.; Fawcett, R.; Prisacariu, V.A. Dsconv: Efficient convolution operator. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 5148–5157. [Google Scholar]
  30. Hussain, M. Yolov1 to v8: Unveiling each variant—A comprehensive review of yolo. IEEE Access 2024, 12, 42816–42833. [Google Scholar] [CrossRef]
  31. Chen, J.; Kao, S.-H.; He, H.; Zhuo, W.; Wen, S.; Lee, C.-H.; Chan, S.-H.G. Run, don’t walk: Chasing higher FLOPS for faster neural networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 12021–12031. [Google Scholar]
  32. Lin, T.-Y.; Dollár, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature pyramid networks for object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2117–2125. [Google Scholar]
  33. Liu, S.; Qi, L.; Qin, H.; Shi, J.; Jia, J. Path aggregation network for instance segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 8759–8768. [Google Scholar]
  34. Niu, Z.; Zhong, G.; Yu, H. A review on the attention mechanism of deep learning. Neurocomputing 2021, 452, 48–62. [Google Scholar] [CrossRef]
  35. Woo, S.; Park, J.; Lee, J.-Y.; Kweon, I.S. Cbam: Convolutional block attention module. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 3–19. [Google Scholar]
  36. Rezatofighi, H.; Tsoi, N.; Gwak, J.; Sadeghian, A.; Reid, I.; Savarese, S. Generalized intersection over union: A metric and a loss for bounding box regression. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 16–17 June 2019; pp. 658–666. [Google Scholar]
  37. Du, S.; Zhang, B.; Zhang, P.; Xiang, P. An improved bounding box regression loss function based on CIOU loss for multi-scale object detection. In Proceedings of the 2021 IEEE 2nd International Conference on Pattern Recognition and Machine Learning (PRML), Chengdu, China, 16–18 July 2021; pp. 92–98. [Google Scholar]
  38. Zhang, Y.-F.; Ren, W.; Zhang, Z.; Jia, Z.; Wang, L.; Tan, T. Focal and efficient IOU loss for accurate bounding box regression. Neurocomputing 2022, 506, 146–157. [Google Scholar] [CrossRef]
  39. Zhang, Z.; Yang, Y.; Xu, X.; Liu, L.; Yue, J.; Ding, R.; Lu, Y.; Liu, J.; Qiao, H. GVC-YOLO: A Lightweight Real-Time Detection Method for Cotton Aphid-Damaged Leaves Based on Edge Computing. Remote Sens. 2024, 16, 3046. [Google Scholar] [CrossRef]
  40. Gong, H.; Li, H.; Xu, K.; Zhang, Y. Object detection based on improved YOLOv3-tiny. In Proceedings of the 2019 Chinese Automation Congress (CAC), Hangzhou, China, 22–24 November 2019; pp. 3240–3245. [Google Scholar]
  41. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell. 2016, 39, 1137–1149. [Google Scholar] [CrossRef]
  42. Wang, J.; Chen, Y.; Dong, Z.; Gao, M. Improved YOLOv5 network for real-time multi-scale traffic sign detection. Neural Comput. Appl. 2023, 35, 7853–7865. [Google Scholar] [CrossRef]
  43. Wang, C.-Y.; Yeh, I.-H.; Mark Liao, H.-Y. Yolov9: Learning what you want to learn using programmable gradient information. In Proceedings of the European Conference on Computer Vision, Paris, France, 26–27 March 2025; pp. 1–21. [Google Scholar]
  44. Liu, W.; Zhai, Y.; Xia, Y. Tomato leaf disease identification method based on improved YOLOX. Agronomy 2023, 13, 1455. [Google Scholar] [CrossRef]
  45. Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.-Y.; Berg, A.C. Ssd: Single shot multibox detector. In Proceedings of the Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, 11–14 October 2016; Proceedings, Part I 14, 2016. pp. 21–37. [Google Scholar]
Figure 1. The implementation process of the cotton disease and pest detection system.
Figure 2. A selection of images from the dataset.
Figure 3. The structure of the YOLOv8 network.
Figure 4. The structure of the LCDDN-YOLO network.
Figure 5. The structure of DSConv.
Figure 6. Structure of PDS-C2f.
Figure 7. The structure of PConv.
Figure 8. The structures of the FPN and BiFPN.
Figure 9. The structure of the CBAM.
Figure 10. Results comparison of YOLOv8n and LCDDN-YOLO.
Figure 11. Actual detection results of YOLOv8n and LCDDN-YOLO: (a) original images, (b) YOLOv8n detection results, (c) LCDDN-YOLO detection results.
Figure 12. Comparison of LCDDN-YOLO and other excellent models.
Table 1. Dataset of various types of cotton diseases and pests.

| Category | Number of Training Sets | Number of Test Sets | Total |
|---|---|---|---|
| leaf spot disease | 914 | 215 | 1129 |
| aphids | 528 | 146 | 674 |
| bollworm | 550 | 125 | 675 |
| Fusarium wilt | 718 | 176 | 894 |
| wilt disease | 639 | 136 | 775 |
| gray mold | 594 | 98 | 692 |
| leaf curl disease | 508 | 107 | 615 |
| healthy | 1013 | 245 | 1258 |
| total | 5464 | 1248 | 6712 |
Table 2. Main parameter settings.

| Parameter | Value |
|---|---|
| Initial learning rate | 0.01 |
| Weight decay | 0.005 |
| Epoch | 200 |
Table 3. Comparison of performance of different C2f models.

| Module | P (%) | mAP@50 (%) | R (%) | Parameters/M | FLOPs/G |
|---|---|---|---|---|---|
| C2f | 84.9 | 78.9 | 72.4 | 3.1 | 8.1 |
| PDS-C2f | 85.9 | 79.4 | 74.0 | 2.8 | 7.4 |
Table 4. Comparison of performance of different loss functions.

| Loss Function | P (%) | mAP@50 (%) | R (%) |
|---|---|---|---|
| CIoU | 84.9 | 78.9 | 72.4 |
| Focal-EIoU | 84.9 | 81.9 | 77.2 |
| SIoU | 84.5 | 78.4 | 75.9 |
| WIoU | 81.0 | 79.7 | 75.7 |
Table 5. Results of ablation experiments.

| Number | DSConv | PDS-C2f | BiFPN | CBAM | Focal-EIoU | P (%) | mAP@50 (%) | R (%) | Parameters/M | FLOPs/G |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | | | | | | 84.9 | 78.9 | 72.4 | 3.1 | 8.1 |
| 2 | | | | | | 87.2 | 81.5 | 74.9 | 2.8 | 7.7 |
| 3 | | | | | | 85.9 | 79.4 | 74.0 | 2.8 | 7.4 |
| 4 | | | | | | 85.5 | 80.0 | 72.3 | 3.1 | 8.1 |
| 5 | | | | | | 87.9 | 81.2 | 73.3 | 3.2 | 8.2 |
| 6 | | | | | | 84.9 | 81.9 | 77.2 | 3.1 | 8.1 |
| 7 | | | | | | 85.9 | 82.4 | 78.8 | 2.8 | 7.4 |
| 8 | | | | | | 87.2 | 84.5 | 79.7 | 2.8 | 7.7 |
| 9 | | | | | | 87.9 | 84.2 | 78.1 | 3.2 | 8.2 |
| 10 | | | | | | 85.7 | 82.1 | 77.6 | 2.6 | 7.0 |
| 11 | | | | | | 84.1 | 82.5 | 77.1 | 2.9 | 7.5 |
| 12 | | | | | | 86.5 | 83.5 | 78.7 | 2.8 | 7.4 |
| 13 | | | | | | 84.7 | 82.9 | 78.4 | 2.7 | 7.1 |
| 14 | | | | | | 85.7 | 81.6 | 74.6 | 2.8 | 7.5 |
| 15 | | | | | | 87.3 | 84.1 | 79.5 | 2.9 | 7.5 |
| 16 | | | | | | 89.5 | 85.4 | 80.2 | 2.7 | 7.2 |
Table 6. Comparison of performance of different object detection models.

| Method | P (%) | mAP@50 (%) | R (%) | Parameters/M | FLOPs/G | FPS/(frame·s−1) | Size/MB |
|---|---|---|---|---|---|---|---|
| LCDDN-YOLO | 89.5 | 85.4 | 80.2 | 2.7 | 7.2 | 161 | 7.6 |
| YOLOv8n | 84.9 | 78.9 | 72.4 | 3.1 | 8.1 | 169 | 8.7 |
| YOLOv3-tiny | 35.8 | 44.0 | 54.7 | 12.1 | 18.9 | 142 | 17.0 |
| Faster-RCNN | 76.9 | 73.8 | 69.2 | 137.1 | 370.2 | 124 | 108.7 |
| YOLOv5n | 76.4 | 79.0 | 77.9 | 4.1 | 7.3 | 183 | 8.6 |
| YOLOv9t | 83.1 | 81.5 | 74.8 | 2.6 | 10.7 | 69 | 24.0 |
| YOLOX | 78.3 | 76.6 | 71.4 | 5 | 7.6 | 176 | 70.3 |
| SSD | 69.5 | 60.0 | 56.9 | 26.3 | 62.8 | 133 | - |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
