Article

A Lightweight YOLOv8-Based Network for Efficient Corn Disease Detection

1 Faculty of Humanities and Arts, Macau University of Science and Technology, Macau 999078, China
2 Faculty of Innovation Engineering, Macau University of Science and Technology, Macau 999078, China
* Authors to whom correspondence should be addressed.
Mathematics 2025, 13(24), 4002; https://doi.org/10.3390/math13244002
Submission received: 27 November 2025 / Revised: 11 December 2025 / Accepted: 15 December 2025 / Published: 16 December 2025
(This article belongs to the Special Issue Intelligent Mathematics and Applications)

Abstract

To address the pressing need for accurate and efficient detection of corn diseases, we propose a novel, lightweight object detection framework, CBS-YOLOv8 (C2f-BiFPN-SCConv YOLOv8), which builds upon the YOLOv8 architecture to enhance performance for corn disease detection. The model incorporates two key components, the GhostNetV2 block and SCConv (Selective Convolution). The GhostNetV2 block improves feature representation by reducing computational complexity, while SCConv optimizes convolution operations dynamically, adjusting based on the input to ensure minimal computational overhead. Together, these features maintain high detection accuracy while keeping the network lightweight. Additionally, the model integrates the C2f-GhostNetV2 module to eliminate redundancy, and the SimAM attention mechanism improves lesion-background separation, enabling more accurate disease detection. The Bi-directional Feature Pyramid Network (BiFPN) enhances feature representation across multiple scales, strengthening detection across varying object sizes. Evaluated on a custom dataset of over 6000 corn leaf images across six categories, CBS-YOLOv8 achieves improved accuracy and reliability in object detection. With a lightweight architecture of just 8.1M parameters and 21 GFLOPs, it enables real-time deployment on edge devices in agricultural settings. CBS-YOLOv8 offers high detection performance while maintaining computational efficiency, making it ideal for precision agriculture.

1. Introduction

With the continuous growth of the global population, ensuring food security has become a critical global challenge [1]. Corn is a key cereal crop essential to global agriculture and food supply. During growth, diseases like leaf spot and pests such as corn borer often damage plants and reduce yields [2]. If these issues are not accurately identified and effectively controlled at an early stage, they may result in substantial economic losses for farmers and pose serious threats to agricultural productivity [3]. Consequently, developing reliable and efficient methods for identifying diseases at an early stage helps improve corn yield and supports sustainable agriculture.
Traditional crop disease detection methods rely predominantly on handcrafted feature extraction, particularly focusing on identifying color, texture, and morphological characteristics of disease spots [4]. Studies have shown that separating the green channel from the RGB color model can help identify rice blast and wheat leaf rust, and that applying Sobel edge detection further improves identification in controlled scenarios [5]. Such approaches perform well in simple environments, but they often struggle in real-world field conditions, where lighting variations, occlusions, and background clutter introduce significant noise and limit their robustness and generalization ability.
In recent years, deep learning technologies like CNNs, RNNs, and GANs have improved crop disease and pest detection systems. They help extract features better, make models more robust, and increase detection accuracy and efficiency [6]. CNNs automatically extract hierarchical visual features from raw images, capturing complex patterns in disease spots that traditional handcrafted features often miss [7]. RNNs analyze temporal changes in crop growth or disease progression. GANs generate high-quality synthetic images to address limited annotated datasets [8]. Deep learning outperforms traditional methods in feature extraction and adaptability. It shows strong performance in dynamic agricultural settings with varying light, occlusion, and noise. Additionally, attention mechanisms and multi-scale feature aggregation enhance its use in smart agriculture, precision monitoring, and automated plant health management.
Detection models in modern deep neural architectures generally evolve along two major paradigms: the two-stage framework and the single-stage scheme [9]. For example, Chen et al. [10] proposed an improved YOLOv8n method that replaces backbone modules, introduces attention mechanisms, and uses a lightweight detection head with WIoU loss, enhancing detection accuracy while reducing computational cost. Xu et al. [11] enhanced YOLOv5 with attention modules for tea pest and disease detection, achieving an accuracy of 82.6%. A modified YOLOv5 model has been effectively applied to farmland pest detection, achieving an accuracy of 80.91% [12]. Similarly, another enhanced YOLOv5 variant attained a remarkable 98.3% accuracy in wheat scab detection [13]. Although these models demonstrate strong detection performance, their high computational demands often hinder deployment in resource-constrained agricultural environments. Therefore, achieving high accuracy with reduced model complexity and computational cost remains an urgent challenge in practical crop disease detection.
Addressing challenges in accuracy, efficiency, and practical deployment for corn disease detection, the proposed CBS-YOLOv8 introduces the following innovations:
  • C2f-GhostNetv2 hybrid backbone: Combines multi-branch feature extraction of C2f with efficient GhostNetV2, incorporating lesion-aware calibration to preserve fine-grained lesion details while remaining lightweight.
  • SimAM-guided feature enhancement: Embeds SimAM into GhostNetV2 to emphasize lesion-relevant features and suppress background noise, improving detection of subtle and early-stage lesions.
  • Hierarchical feature fusion with BiFPN: Integrates multi-scale features at the neck, enabling robust detection across variable lesion sizes and enhancing overall performance.
  • Optimized detection head with SCConv: Uses lesion-adaptive kernel selection guided by BiFPN features, reducing computational overhead while capturing fine-grained and irregular lesion patterns.

2. Related Work

2.1. Crop Disease Detection

Crop disease detection is essential in precision agriculture because early identification can prevent significant yield loss and maintain food security [14]. Traditional approaches detect disease symptoms using manually designed characteristics like color, texture, and shape. These perform well under controlled conditions, where the environment is uniform, and disease patterns are distinct. However, in real-world agricultural fields, variable lighting, leaf orientation, occlusion, and complex backgrounds often reduce the effectiveness of these methods.
Moreover, traditional techniques generally require careful preprocessing, segmentation, and feature engineering, which are time-consuming and labor-intensive [15]. They are also less adaptable to multiple disease types or new variations that appear over time. As crop production scales and environmental conditions become more unpredictable, these limitations have driven research toward automated, robust, and adaptable detection systems.

2.2. Intelligent Computing for Disease Detection

Deep learning has become a cornerstone in crop and fruit disease detection, providing the ability to automatically extract complex features and learn rich data representations [16]. Convolutional neural networks excel at capturing spatial patterns in images, sequential models track temporal changes in disease progression, and generative models alleviate dataset limitations by producing synthetic samples. These capabilities make deep learning models more robust to lighting variations, occlusion, and background clutter, which are common challenges in real-world agricultural environments.
Beyond traditional crops like corn, recent studies have applied intelligent computing to a variety of plants and fruits. For example, machine learning approaches have been used for sugarcane disease identification [17], enhanced hybrid attention deep learning has been employed for avocado ripeness classification [18], and the TLI-YOLO framework has been developed for rice disease detection [19]. These works show that lightweight, efficient deep learning frameworks can achieve high performance. When combined with attention mechanisms and multi-scale feature fusion, they excel in diverse agricultural tasks. Such advancements support real-time deployment, enabling automated and accurate monitoring of plant health in precision agriculture applications.

2.3. Challenges in Disease Detection

Despite the success of deep learning methods, practical crop disease detection still faces numerous challenges. Real-world agricultural environments are highly complex, with variable lighting, shadows, overlapping leaves, and subtle differences between disease stages that can confuse automated systems [20]. Environmental factors such as soil condition, humidity, and seasonal changes further influence disease appearance, complicating detection. Additionally, the limited availability of annotated datasets is a challenge. Collecting and labeling large numbers of crop images is time-consuming. Imbalanced datasets can also bias model performance toward dominant disease classes. Many high-accuracy models also have high computational demands, restricting their use in resource-limited environments.
Another critical challenge is the handling of multi-view crop images. In practical field monitoring, UAVs or portable devices often capture top-view or wide-angle images, which introduce leaf overlap, geometric distortion, and increased background clutter. Existing lightweight models are primarily optimized for standard front-facing images and lack a targeted design for multi-view feature extraction, leading to performance degradation in real-world applications [21]. Addressing these challenges is essential for developing efficient, accurate, and robust disease detection systems suitable for real-world deployment.

3. Previous Work

3.1. Baseline YOLOv8 Architecture

We present the original YOLOv8 architecture as the baseline, without any of the proposed improvements such as GhostNetV2, SimAM, or BiFPN. YOLOv8 balances speed and accuracy in object detection [22]. Given an input image $I \in \mathbb{R}^{H \times W \times C}$, where H, W, and C denote height, width, and channels, the backbone extracts hierarchical feature maps $F^{(l)} \in \mathbb{R}^{H_l \times W_l \times C_l}$ across L layers using the standard C2f module with residual connections. Multi-scale features are aggregated via the FPN+PANet neck. Attention mechanisms are not part of standard YOLOv8 and refine features only when integrated as custom modules; SimAM is introduced exclusively in our CBS-YOLOv8. The detection head predicts bounding boxes and class probabilities.

$$F^{(l)} = \mathrm{C2fConv}\big(F^{(l-1)}\big) + \mathrm{Residual}\big(F^{(l-1)}\big), \quad l = 1, 2, \ldots, L$$

$$\hat{F}^{(l)} = \sum_{i=1}^{L} w_i \cdot \mathrm{Upsample}\big(F_{\mathrm{top}}^{(i)}\big) + \sum_{j=1}^{L} w_j \cdot \mathrm{Downsample}\big(F_{\mathrm{bottom}}^{(j)}\big), \quad l = 1, 2, \ldots, L$$

$$\tilde{F}^{(l)} = \hat{F}^{(l)} \odot \sigma\!\left(\frac{\big(\hat{F}^{(l)} - \mu_{\mathrm{neuron}}\big)^2}{\sigma_{\mathrm{neuron}}^2 + \epsilon}\right), \quad l = 1, 2, \ldots, L$$

The detection head $\mathcal{H}$ outputs bounding boxes $b_k = (x_k, y_k, w_k, h_k) \in \mathbb{R}^4$ and class probabilities $p_k \in \mathbb{R}^C$ for each anchor k across all feature levels:

$$(b_k, p_k) = \mathcal{H}\big(\tilde{F}^{(1)}, \tilde{F}^{(2)}, \ldots, \tilde{F}^{(L)}\big), \quad k = 1, \ldots, K$$
This baseline design enables YOLOv8 to handle small objects in complex backgrounds with high precision and supports deployment on edge devices. Figure 1 shows the complete YOLOv8 architecture, including the backbone, neck, and detection head.

3.2. Related Lightweight Modules

This subsection introduces external lightweight modules that can be integrated into the YOLOv8 baseline to improve efficiency and feature representation. These modules are not part of the original YOLOv8 architecture but serve as the technical foundation for CBS-YOLOv8.
GhostNet [23] is a lightweight convolutional block designed to generate more feature maps with fewer parameters. Let the input tensor be $X \in \mathbb{R}^{H \times W \times C}$, where H is height, W is width, and C is the number of channels. Intrinsic features are extracted via standard convolution and activation:

$$F_{\mathrm{intrinsic}}^{(l)} = \sigma\!\left(\sum_{i=1}^{C}\sum_{m=-k}^{k}\sum_{n=-k}^{k} W_{i,m,n}^{(l)} \cdot X_{:,:,i}^{(m,n)} + b_i^{(l)}\right), \quad l = 1, 2, \ldots, L$$

where L is the number of layers, $W_{i,m,n}^{(l)}$ and $b_i^{(l)}$ are the convolution weights and bias for channel i, k is the convolution kernel radius, and $\sigma(\cdot)$ is a nonlinear activation function. Ghost features are cheaply generated from intrinsic features using lightweight transformations:

$$F_{\mathrm{ghost}}^{(l,j)} = \phi_j\!\left(\sum_{p=1}^{C_l}\sum_{q=-k}^{k}\sum_{r=-k}^{k} K_{p,q,r}^{(l,j)}\, F_{\mathrm{intrinsic},p}^{(l)} + b_p^{(l,j)}\right), \quad j = 1, 2, \ldots, S_l$$

where $C_l$ is the number of intrinsic feature channels at layer l, $S_l$ is the number of ghost transformations, $K_{p,q,r}^{(l,j)}$ are lightweight kernels (depthwise or linear), $\phi_j$ is a nonlinear activation, and $b_p^{(l,j)}$ is the bias. The output of the GhostNet block concatenates intrinsic and ghost features, optionally followed by channel attention:

$$Y^{(l)} = \mathrm{Concat}\big(F_{\mathrm{intrinsic}}^{(l)}, F_{\mathrm{ghost}}^{(l,1)}, \ldots, F_{\mathrm{ghost}}^{(l,S_l)}\big) + \mathcal{A}\!\left(F_{\mathrm{intrinsic}}^{(l)} + \sum_{j=1}^{S_l} F_{\mathrm{ghost}}^{(l,j)}\right)$$

Here, $\mathrm{Concat}(\cdot)$ denotes channel-wise concatenation, $\mathcal{A}(\cdot)$ is a channel attention mechanism, and $Y^{(l)} \in \mathbb{R}^{H' \times W' \times (C' + C'')}$ is the output feature map, where $H', W'$ are the spatial dimensions after any downsampling, $C'$ is the number of intrinsic channels, and $C''$ is the total number of ghost channels.

SimAM [24] is a parameter-free attention mechanism that enhances feature discrimination by computing neuron importance based on variance:

$$\tilde{F}_{i,j,k} = F_{i,j,k} \cdot \sigma\!\left(\frac{(F_{i,j,k} - \mu_{i,j})^2}{\sigma_{i,j}^2 + \epsilon}\right)$$

where $F_{i,j,k}$ is the feature at spatial location (i, j) and channel k, $\mu_{i,j} = \frac{1}{C}\sum_{c=1}^{C} F_{i,j,c}$ is the mean across channels, $\sigma_{i,j}^2 = \frac{1}{C}\sum_{c=1}^{C}(F_{i,j,c} - \mu_{i,j})^2$ is the variance, $\epsilon = 10^{-6}$ ensures numerical stability, and $\sigma(\cdot)$ is the sigmoid function. This allows effective feature enhancement without extra parameters, making it suitable for resource-constrained agricultural applications.
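As a concrete illustration, the following minimal PyTorch sketch implements the per-location channel statistics of the formula above (the function name and interface are ours, not taken from the original SimAM release):

```python
import torch

def simam(x: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Parameter-free SimAM-style re-weighting, following the equation above.

    x: feature map of shape (B, C, H, W). Each activation is scaled by a sigmoid
    of its squared deviation from the channel-wise mean at that spatial location,
    normalized by the channel-wise variance.
    """
    mu = x.mean(dim=1, keepdim=True)                   # mu_{i,j}: mean over channels
    var = x.var(dim=1, keepdim=True, unbiased=False)   # sigma^2_{i,j}: variance over channels
    energy = (x - mu).pow(2) / (var + eps)             # neuron-importance term
    return x * torch.sigmoid(energy)                   # F~ = F * sigmoid(energy)
```

Because the statistics are computed directly from the feature map, the module adds no learnable parameters, which is exactly what makes it attractive for lightweight backbones.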

4. Research Method

4.1. Improved Structural Framework of YOLOv8

This paper introduces CBS-YOLOv8, a lightweight model built upon the YOLOv8 detection network. The architecture redesigns the Backbone, Neck, and Detection Head to improve feature representation and reduce computational cost while remaining light enough for real-time agricultural deployment. The C2f-GhostNetV2 integration combines C2f’s multi-branch feature extraction with GhostNetV2’s efficient feature generation, balancing lightweight design with enhanced feature representation. This lightweight design also shapes the precision-recall trade-off: by focusing on salient and discriminative features, the network favors precision while slightly limiting recall for small or subtle lesions.
The backbone employs a GhostNetV2 bottleneck in place of the original C2f module. The input is represented by $X \in \mathbb{R}^{H \times W \times C}$. The intrinsic feature maps are first extracted using standard convolution operations:

$$Q_{\mathrm{intrinsic}}^{(l)} = \sigma\!\left(\sum_{i=1}^{C}\sum_{m=-k}^{k}\sum_{n=-k}^{k} W_{i,m,n}^{(l)} \cdot X_{:,:,i}^{(m,n)} + b_i^{(l)}\right), \quad l = 1, 2, \ldots, L$$
To enhance feature diversity while reducing redundancy, GhostNetV2 generates additional ghost features. SimAM attention is integrated to suppress background noise and emphasize disease-specific regions:
$$Q_{\mathrm{ghost}}^{(l,j)} = \tilde{Q}_{i,j,k}^{(l,j)} = Q_{\mathrm{intrinsic},i,j,k}^{(l)} \cdot \sigma\!\left(\frac{\big(Q_{\mathrm{intrinsic},i,j,k}^{(l)} - \mu_{i,j}^{(l)}\big)^2}{\sigma_{i,j}^{2,(l)} + \epsilon}\right), \quad j = 1, 2, \ldots, S_l$$
Combining GhostNetV2 with SimAM attention enables the network to focus on key lesion features, improving the capture of fine-grained corn disease patterns like gray leaf spot and rust pustules, reducing false positives, and enhancing precision in a lightweight framework.
In the Neck, PANet is replaced with BiFPN for multi-scale feature integration. The feature map at level l is computed as follows:
$$\hat{Q}^{(l)} = \sum_{i=1}^{L} w_i \cdot \mathrm{Upsample}\big(\tilde{F}_{\mathrm{top}}^{(i)}\big) + \sum_{j=1}^{L} w_j \cdot \mathrm{Downsample}\big(\tilde{F}_{\mathrm{bottom}}^{(j)}\big), \quad l = 1, \ldots, L$$

where $w_i$ and $w_j$ are learnable fusion weights, and upsampling/downsampling enables bidirectional flow. BiFPN improves multi-scale feature integration, mitigating recall loss from the lightweight backbone and supporting robust detection across lesions of different sizes.
In the Detection Head, a shared-parameter SCConv module is used:
$$Q^{(l)} = \mathrm{SCConv}\big(\hat{F}^{(l)}\big) = \sum_{c=1}^{C_l}\sum_{p=-k_s}^{k_s}\sum_{q=-k_s}^{k_s} K_{p,q,c}^{(l)}\, \hat{F}_{i+p,\,j+q,\,c}^{(l)} + b_c^{(l)}, \quad Y \in \mathbb{R}^{H' \times W' \times C'}$$
SCConv reduces feature and channel redundancy while preserving representational power, keeping computational cost low without significantly compromising recall. In summary, CBS-YOLOv8 integrates GhostNetV2 with SimAM attention in the backbone, BiFPN for multi-scale feature fusion in the neck, and SCConv in the detection head, balancing computational efficiency with fine-grained lesion recognition for accurate real-time detection in agricultural scenarios. Figure 2 presents the refined architectural configuration.
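As a simplified sketch of the shared-parameter idea in the detection head (not the full SCConv design, which additionally performs spatial and channel reconstruction), the same convolution kernel can be reused across all pyramid levels:

```python
import torch
import torch.nn as nn

class SharedConvHead(nn.Module):
    """Minimal sketch of a shared-parameter head convolution, per the equation above.

    One conv (weights K, bias b) is reused for every BiFPN output level, which is
    what keeps the head lightweight; this is only the shared-kernel aspect, not a
    full SCConv implementation.
    """
    def __init__(self, channels: int, kernel_size: int = 3):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, kernel_size,
                              padding=kernel_size // 2)

    def forward(self, feats: list[torch.Tensor]) -> list[torch.Tensor]:
        # the same kernel processes every pyramid level
        return [self.conv(f) for f in feats]
```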

4.2. GhostNetV2 Module

The GhostNetV2 module generates additional feature representations while reducing parameters and computation. Given an input tensor $X \in \mathbb{R}^{H \times W \times C}$, the primary features are extracted via standard convolution:

$$F_{\mathrm{intrinsic}} = \sigma\!\left(\sum_{i=1}^{C}\sum_{m=-k}^{k}\sum_{n=-k}^{k} W_{i,m,n} \cdot X_{:,:,i}^{(m,n)} + b_i\right), \quad F_{\mathrm{intrinsic}} \in \mathbb{R}^{H' \times W' \times C'}$$

Here, $W_{i,m,n}$ and $b_i$ are the convolution weights and bias, $\sigma(\cdot)$ denotes the activation function, and k is the kernel radius. The cheap operation generates additional ghost feature maps from the intrinsic ones:

$$F_{\mathrm{ghost}} = \operatorname*{Concat}_{j=1}^{S}\Big(\phi_j\big(F_{\mathrm{intrinsic}} * K_j + b_j\big)\Big), \quad F_{\mathrm{ghost}} \in \mathbb{R}^{H' \times W' \times C''}$$

where $\phi_j$ denotes nonlinear transformations such as ReLU or SiLU, $K_j$ are lightweight convolution kernels (depthwise or linear operations), $b_j$ are bias terms, and S is the number of ghost transformations per intrinsic map. The model integrates both the GhostNetV2 block and the SCConv module, which work together to reduce computational complexity while enhancing feature extraction. Finally, the output of the GhostNetV2 block is obtained by concatenating the intrinsic and ghost feature maps, optionally followed by a residual connection and a channel attention module:

$$Y = \mathrm{Concat}\big(F_{\mathrm{intrinsic}}, F_{\mathrm{ghost}}\big) + \mathcal{A}\big(F_{\mathrm{intrinsic}} + F_{\mathrm{ghost}}\big), \quad Y \in \mathbb{R}^{H' \times W' \times (C' + C'')}$$

Here, $\mathcal{A}(\cdot)$ represents a channel attention mechanism that enhances feature representation at low computational overhead. By pairing GhostNetV2’s efficient feature generation with SCConv’s optimized convolution operations, the design maintains high representational power with reduced FLOPs, making it well suited to lightweight YOLOv8 backbones and edge devices. The module diagram of GhostNetV2 is shown in Figure 3.
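A minimal PyTorch sketch of this intrinsic-plus-ghost decomposition follows (the class name and channel split are illustrative; the channel attention term $\mathcal{A}(\cdot)$ and GhostNetV2’s additional attention branch are omitted for brevity):

```python
import torch
import torch.nn as nn

class GhostModule(nn.Module):
    """Sketch of a GhostNet-style block per the equations above: a standard conv
    produces intrinsic features, a cheap depthwise conv produces ghost features,
    and the two are concatenated along the channel dimension."""
    def __init__(self, in_ch: int, out_ch: int, ratio: int = 2, dw_kernel: int = 3):
        super().__init__()
        intrinsic_ch = out_ch // ratio        # C': intrinsic channels
        ghost_ch = out_ch - intrinsic_ch      # C'': ghost channels (== C' when ratio=2)
        self.primary = nn.Sequential(         # standard conv -> F_intrinsic
            nn.Conv2d(in_ch, intrinsic_ch, 1, bias=False),
            nn.BatchNorm2d(intrinsic_ch),
            nn.SiLU(),
        )
        self.cheap = nn.Sequential(           # depthwise conv -> F_ghost
            nn.Conv2d(intrinsic_ch, ghost_ch, dw_kernel,
                      padding=dw_kernel // 2, groups=intrinsic_ch, bias=False),
            nn.BatchNorm2d(ghost_ch),
            nn.SiLU(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        intrinsic = self.primary(x)
        ghost = self.cheap(intrinsic)
        return torch.cat([intrinsic, ghost], dim=1)  # Y = Concat(F_intrinsic, F_ghost)
```

The savings come from the second branch: the depthwise convolution touches each channel independently, so roughly half of the output channels are produced at a small fraction of the cost of a full convolution.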

4.3. SimAM Attention Mechanism

SimAM is a compact and effective module that strengthens CNN feature extraction. It directs the model’s emphasis toward important features by evaluating correlations across spatial and channel dimensions, while reducing redundant or irrelevant information. In contrast to conventional attention methods, SimAM produces weights according to feature similarity rather than through complex training procedures.
The output of SimAM can be expressed as a combination of spatial and channel attention:

$$F_{\mathrm{out}} = F \odot A_{\mathrm{space}} \odot A_{\mathrm{channel}}$$

The spatial attention map is computed over a local neighborhood and normalized:

$$A_{\mathrm{space}}(i,j) = \frac{\exp\!\left(\sum_{p=-P}^{P}\sum_{q=-Q}^{Q} F_{i+p,j+q} \cdot F_{i,j} \Big/ \Big(\sum_{r=-P}^{P}\sum_{s=-Q}^{Q} F_{i+r,j+s}^2 + \epsilon\Big)\right)}{\sum_{u,v}\exp\!\left(\sum_{p=-P}^{P}\sum_{q=-Q}^{Q} F_{u+p,v+q} \cdot F_{u,v} \Big/ \Big(\sum_{r=-P}^{P}\sum_{s=-Q}^{Q} F_{u+r,v+s}^2 + \epsilon\Big)\right)}$$

The channel attention weights are computed by comparing feature similarity across channels:

$$A_{\mathrm{channel}}(k) = \frac{\exp\!\left(\sum_{c=1}^{C} F_c \cdot F_k \Big/ \Big(\sum_{c=1}^{C} F_c^2 + \epsilon\Big)\right)}{\sum_{k'=1}^{C}\exp\!\left(\sum_{c=1}^{C} F_c \cdot F_{k'} \Big/ \Big(\sum_{c=1}^{C} F_c^2 + \epsilon\Big)\right)}$$

Here, $F \in \mathbb{R}^{H \times W \times C}$ represents the input tensor, $i, j$ denote spatial positions, k indicates the channel index, and P, Q define the local neighborhood, with a small constant $\epsilon$ included to ensure numerical stability. These three equations concisely describe the SimAM mechanism, showing how it combines spatial and channel attention to improve feature extraction while remaining lightweight. Figure 4 illustrates the SimAM attention module.

4.4. BiFPN Structure

To enhance the detection performance of YOLOv8 across objects of different sizes, we replace the baseline FPN (Figure 5A) with a BiFPN structure, as shown in Figure 5B. Unlike the standard FPN, BiFPN employs bidirectional information flow, enabling more effective multi-scale feature fusion.
In the top-down pathway, higher-level features are upsampled and fused with lower-level features:

$$\tilde{F}_l^{\mathrm{td}} = w_l^{\mathrm{td}} \cdot F_l + w_{l+1}^{\mathrm{td}} \cdot \mathrm{Upsample}\big(\tilde{F}_{l+1}^{\mathrm{td}}\big)$$

Here, $F_l$ is the feature map at level l, $\tilde{F}_l^{\mathrm{td}}$ is the fused top-down feature, and $w_l^{\mathrm{td}}, w_{l+1}^{\mathrm{td}}$ are learnable weights.

During the bottom-up pathway, the fused features are further propagated upwards:

$$\tilde{F}_l^{\mathrm{bu}} = w_l^{\mathrm{bu}} \cdot \tilde{F}_l^{\mathrm{td}} + w_{l-1}^{\mathrm{bu}} \cdot \mathrm{Downsample}\big(\tilde{F}_{l-1}^{\mathrm{bu}}\big)$$

This bidirectional fusion allows BiFPN to combine information across scales more effectively than the baseline YOLOv8 FPN, improving the representation of small and large objects simultaneously. Finally, the output feature at each level is computed by aggregating neighboring fused features:

$$F_l^{\mathrm{out}} = \sigma\!\left(\sum_{i \in \mathcal{N}(l)} w_i \cdot F_i\right), \quad \text{where} \;\; \sum_i w_i = 1$$
By integrating BiFPN into YOLOv8, as illustrated in Figure 5, the network achieves enhanced multi-scale feature representation and improved detection accuracy compared to the baseline architecture in Figure 1.
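The normalized weighted fusion at the heart of these equations can be sketched in PyTorch as follows (a minimal version of BiFPN’s “fast normalized fusion”; the class name and interface are ours, and inputs are assumed to be pre-resized to a common resolution):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class WeightedFusion(nn.Module):
    """Learnable weighted fusion per the output equation above: non-negative
    learnable weights, normalized to sum to one, combine same-size feature maps."""
    def __init__(self, num_inputs: int, eps: float = 1e-4):
        super().__init__()
        self.w = nn.Parameter(torch.ones(num_inputs))
        self.eps = eps

    def forward(self, feats: list[torch.Tensor]) -> torch.Tensor:
        w = F.relu(self.w)              # keep fusion weights non-negative
        w = w / (w.sum() + self.eps)    # normalize so that sum_i w_i = 1
        return torch.stack([wi * f for wi, f in zip(w, feats)]).sum(dim=0)

# Example top-down step (cf. F~_l^td), assuming (B, C, H, W) maps where the
# higher level is half the resolution of the lower one:
# fused = WeightedFusion(2)([f_l, F.interpolate(f_higher, scale_factor=2)])
```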

4.5. Synergistic Mechanism Analysis

The CBS-YOLOv8 framework employs a task-driven, synergistic integration of modules tailored for corn disease detection. Rather than simply stacking existing components, the modules interact to enhance feature representation, efficiency, and robustness in complex field scenarios. The C2f-GhostNetV2 hybrid backbone combines C2f’s multi-branch extraction with GhostNetV2’s efficient feature generation, preserving multi-scale lesion details while reducing redundant computations. A lesion-aware calibration mechanism dynamically adjusts each feature map’s contribution, focusing on disease-relevant regions and suppressing background noise.
$$F_{\mathrm{calibrated}} = \alpha \cdot F_{\mathrm{intrinsic}} + (1 - \alpha) \cdot G\big(F_{\mathrm{intrinsic}}\big),$$

where $G(\cdot)$ denotes ghost feature generation, and $\alpha$ is a learned weight balancing intrinsic and ghost features.

Embedding SimAM into GhostNetV2 enables mutual modulation between attention and ghost features, ensuring that ghost features emphasize lesion-relevant regions while suppressing background noise:

$$F'_{\mathrm{ghost}} = F_{\mathrm{ghost}} \odot \sigma\!\left(\frac{(F_{\mathrm{intrinsic}} - \mu)^2}{\sigma^2 + \epsilon}\right),$$

where $\sigma(\cdot)$ is the sigmoid function, and $\mu, \sigma^2$ denote the mean and variance of the intrinsic features.

The BiFPN neck and SCConv detection head interact through cross-module calibration. BiFPN fusion weights are dynamically informed by SCConv channel statistics:

$$\hat{H}^{(l)} = \sum_i w_i^{(l)} F_i^{\mathrm{up}} + \sum_j w_j^{(l)} F_j^{\mathrm{down}}, \qquad w_i^{(l)} \propto \frac{1}{1 + \gamma \cdot \mathrm{Var}\big(F_i^{\mathrm{SCConv}}\big)},$$

where the variance of the SCConv channels $\mathrm{Var}(F_i^{\mathrm{SCConv}})$ guides BiFPN toward lesion-rich scales, and $\gamma$ is a scaling factor.
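The two calibration rules above can be sketched in PyTorch as follows (function names are ours, and the sigmoid re-parameterization keeping $\alpha$ in (0, 1) is an assumption, not stated in the text):

```python
import torch

def calibrate(intrinsic: torch.Tensor, ghost: torch.Tensor,
              alpha: torch.Tensor) -> torch.Tensor:
    """Lesion-aware calibration (cf. F_calibrated): blend intrinsic and ghost
    features with a learned scalar weight constrained to (0, 1)."""
    a = torch.sigmoid(alpha)
    return a * intrinsic + (1.0 - a) * ghost

def variance_guided_weights(scconv_feats: list[torch.Tensor],
                            gamma: float = 1.0) -> torch.Tensor:
    """Cross-module weighting sketch: per-level fusion weights follow the stated
    relation w_i ∝ 1 / (1 + gamma * Var(F_i)), normalized across levels."""
    raw = torch.stack([1.0 / (1.0 + gamma * f.var()) for f in scconv_feats])
    return raw / raw.sum()
```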
The coordinated synergy of C2f-GhostNetV2, SimAM, BiFPN, and SCConv addresses corn disease challenges—fine-grained lesions, variable sizes, complex backgrounds, and edge constraints. Their interaction produces a lightweight, high-performance network, demonstrating that CBS-YOLOv8’s strength lies in the joint optimization of feature discrimination, multi-scale representation, and computational efficiency for practical deployment.

5. Experimental Setup

5.1. Dataset

This dataset was self-constructed, with images captured on-site in the northwest region of China. Four agricultural experts collaborated on data annotation and quality verification. The LabelImg v1.8.1 tool was used for annotation, with precise bounding boxes drawn around the diseased areas. Each annotation includes the disease category and corresponding bounding box coordinates, following COCO standards. When multiple diseases appeared on a single leaf, each lesion was labeled separately to ensure accuracy and precision. In addition to our self-collected dataset, we also used the publicly available corn_maize_leaf_disease dataset https://project-agml.github.io/AgML/datasets/corn_maize_leaf_disease.html (accessed on 16 August 2025) to supplement training and evaluation, ensuring reproducibility for other researchers. Figure 6 provides a sample from the dataset.
A thorough review process was conducted to ensure annotation quality and consistency. Twenty percent of the samples were randomly selected for re-examination by an independent expert group; annotation consistency, measured by Intersection over Union (IoU), reached 96.7% at IoU > 0.8. Forty-three blurry or ambiguous images were excluded. The final dataset contains 6095 images: 4267 for training, 1219 for validation, and 609 for testing. It comprises 1223 images of healthy leaves, 822 of Spodoptera frugiperda eggs, 1331 of Spodoptera frugiperda damage aftermath, 1074 of blight, 996 of rust disease, and 648 of gray leaf spot disease. The dataset is diverse, with a reasonably balanced category distribution; a detailed breakdown is provided in Table 1 below.
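For illustration, a COCO-style record for a single lesion bounding box might look like the following Python structure (all ids, file names, and category names here are hypothetical, shown only to indicate the field layout; they are not taken from the released dataset):

```python
# Hypothetical COCO-style annotation record: one image, one lesion box, and the
# category table. bbox follows the COCO convention [x_min, y_min, width, height].
annotation = {
    "image": {"id": 1024, "file_name": "corn_leaf_1024.jpg",
              "width": 640, "height": 640},
    "annotation": {
        "id": 5311,
        "image_id": 1024,
        "category_id": 4,                    # e.g., rust disease
        "bbox": [212.0, 148.0, 96.0, 64.0],
        "iscrowd": 0,
    },
    "categories": [
        {"id": 0, "name": "healthy_leaf"},
        {"id": 1, "name": "armyworm_eggs"},
        # ... one entry per category in Table 1
    ],
}
```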

5.2. Experimental Configuration

The experiments were conducted on a machine running Windows 11, equipped with an AMD Ryzen 7 7735H processor, an NVIDIA RTX 4060 GPU with 16GB memory, and 16GB of RAM. This configuration provides ample computational resources for training deep learning models on extensive image datasets. Model implementation was carried out using Python 3.10 and PyTorch 2.1 with CUDA 12.1, enabling GPU acceleration and optimized performance. Additional libraries, including OpenCV 4.11.0 and NumPy 2.3.2, were employed for data preprocessing, image augmentation, and feature extraction, providing a robust environment for model development and evaluation.
A portion of the dataset was reserved for evaluation to ensure unbiased testing. All models, including CBS-YOLOv8 and the baselines, were trained for 120 epochs using the Adam optimizer. The same hyperparameters and learning rate schedule were applied to all models. The learning rate for the Adam optimizer was set to 0.001 and adjusted using cosine decay scheduling to ensure stable convergence. Batch size was 16, and parameters such as weight decay (0.0005) and momentum (0.9) were used to maintain training stability and prevent overfitting.
To ensure full reproducibility of our experiments, including the ablation studies, we provide a comprehensive listing of all hyperparameters and augmentation settings in Table 2. We also report evaluation using mAP@0.5 for completeness.
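Assuming the standard Ultralytics YOLOv8 training API, the Table 2 settings translate roughly into the following call (a sketch only: the dataset YAML name is a placeholder, and the custom CBS modules themselves are not shown):

```python
from ultralytics import YOLO

# Baseline weights; CBS-YOLOv8 additionally swaps in the custom backbone/neck/head.
model = YOLO("yolov8n.pt")

model.train(
    data="corn_disease.yaml",   # hypothetical dataset config
    epochs=120,
    batch=16,
    imgsz=640,
    optimizer="Adam",
    lr0=0.001,                  # initial learning rate
    cos_lr=True,                # cosine decay schedule
    weight_decay=0.0005,
    momentum=0.9,
)
```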

5.3. Evaluation Metrics

Model performance evaluation is essential for validating detection accuracy. This study evaluates performance using precision (P), recall (R), and mean average precision (mAP), defined as follows:

$$P = \frac{\sum_{i=1}^{N} \mathbb{1}\{y_i = 1 \wedge \hat{y}_i = 1\}}{\sum_{i=1}^{N} \mathbb{1}\{\hat{y}_i = 1\}}$$

$$R = \frac{\sum_{i=1}^{N} \mathbb{1}\{y_i = 1 \wedge \hat{y}_i = 1\}}{\sum_{i=1}^{N} \mathbb{1}\{y_i = 1\}}$$

$$\mathrm{AP} = \int_0^1 P(R)\, \mathrm{d}R \approx \sum_{k=1}^{M} P(R_k)\, \Delta R_k$$

$$\mathrm{mAP} = \frac{1}{C} \sum_{c=1}^{C} \mathrm{AP}_c$$

Here, $y_i$ and $\hat{y}_i$ denote the ground-truth and predicted labels for sample i, respectively, and $\mathbb{1}\{\cdot\}$ is the indicator function, yielding 1 if the condition is satisfied and 0 otherwise. M represents the number of thresholds applied to compute the area under the precision–recall curve, C denotes the number of classes, and $\mathrm{AP}_c$ the average precision for class c. Precision is the proportion of predicted positives that are true positives, while recall is the proportion of actual positives that are detected. mAP averages the per-class AP values to provide an overall assessment of model performance.
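The discrete AP approximation above can be computed as in the following NumPy sketch (the interface is ours; it assumes detections have already been matched to ground-truth boxes at the chosen IoU threshold):

```python
import numpy as np

def average_precision(scores: np.ndarray, matched: np.ndarray,
                      num_gt: int) -> float:
    """AP per the equations above: sort detections by confidence, accumulate
    precision/recall, and integrate P(R) as a discrete sum of P(R_k) * dR_k."""
    order = np.argsort(-scores)                # highest confidence first
    tp = matched[order].astype(float)          # 1 if detection matched a GT box
    fp = 1.0 - tp
    cum_tp, cum_fp = np.cumsum(tp), np.cumsum(fp)
    recall = cum_tp / max(num_gt, 1)                          # R = TP / (TP + FN)
    precision = cum_tp / np.maximum(cum_tp + cum_fp, 1e-12)   # P = TP / (TP + FP)
    # discrete integration: sum of P(R_k) * (R_k - R_{k-1})
    return float(np.sum(precision * np.diff(np.concatenate(([0.0], recall)))))

# mAP is then the mean of per-class APs, as in the last equation above.
```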

5.4. Comparative Analysis of Algorithms

To assess the performance of the proposed CBS-YOLOv8 model, its results were evaluated against several mainstream and recent object detection algorithms, including two-stage approaches such as Faster R-CNN [25], single-stage methods like SSD [26], YOLOv5 [27], and YOLOv7 [28], as well as lightweight models including YOLOv8 [29], PP-YOLOE [30], and NanoDet [31]. All experiments for the models were conducted using the same corn disease dataset under identical conditions.
Metrics considered include precision, recall, mAP, model size, and inference speed. The results show that Faster R-CNN achieves high accuracy but low speed, while SSD is fast but less precise on small or occluded lesions. YOLOv5, YOLOv7, and YOLOv8 perform well overall. CBS-YOLOv8 reduces parameters and FLOPs through its backbone and attention improvements, achieving comparable or higher mAP with real-time inference. PP-YOLOE and NanoDet are fast but trade away some accuracy; CBS-YOLOv8 achieves a better balance of precision and speed, making it well suited to crop disease detection in the field.

6. Results

6.1. Data-Enhanced Evaluation

To evaluate the impact of data augmentation on model performance, four strategies were applied to YOLOv8: no augmentation, mirroring only, scaling only, and a combination of both. The baseline model was analyzed using the control variable method to isolate the effects of each augmentation. As shown in Table 3, both mirroring and scaling individually improved precision, recall, and mAP@0.5, indicating that even single augmentations can enhance feature learning and robustness.
The combination of mirroring and scaling produced the most substantial improvements, achieving an accuracy of 92.3% and an mAP@0.5 of 88.6%, outperforming individual augmentations. Moreover, the augmented model demonstrates increased tolerance to moderate image blurriness and variations in leaf orientation or size, which are common in field-acquired crop images. These results suggest that employing multiple complementary augmentations not only optimizes model performance but also enhances its generalization and robustness for real-world disease detection under diverse operational conditions.
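As an illustration, the mirror-and-scale pair examined in Table 3 could be expressed with torchvision as below (the ranges are illustrative, since the exact augmentation parameters are not published; note also that in detection training the bounding boxes must be transformed consistently with the image):

```python
import torchvision.transforms as T

# Sketch of the two augmentations studied above, applied to input images.
augment = T.Compose([
    T.RandomHorizontalFlip(p=0.5),                 # mirroring
    T.RandomAffine(degrees=0, scale=(0.8, 1.2)),   # random scaling (range assumed)
])
```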

6.2. Ablation Experiment

The CBS-YOLOv8 corn disease detection network was evaluated on the corn disease dataset, with YOLOv8 serving as the baseline model to examine the impact of the GhostNetV2, SimAM, and BiFPN components. Pre-trained weights were loaded to accelerate convergence. The input resolution was set to 640 × 640 with 120 training epochs, using the Adam optimizer, and identical experimental settings and hyperparameters were maintained across all model variants. Table 4 provides a summary of the results.
In Table 4, integrating individual modules into the baseline model improves performance to varying degrees. The addition of GhostNetV2 slightly increases precision but slightly reduces recall, indicating a more compact yet efficient feature extraction. SimAM boosts precision while keeping recall comparable to the baseline. BiFPN contributes to a notable increase in both recall and mAP@0.5, demonstrating the advantage of multi-scale feature fusion. The complete CBS-YOLOv8 model, incorporating GhostNetV2, SimAM, and BiFPN, achieves the best overall performance, confirming that each module enhances the network’s ability to detect fine-grained corn disease lesions.

6.3. Comparison and Analysis

Figure 7 illustrates the relationship between the confidence level and the accuracy rate for various categories, including fall armyworm eggs, healthy leaves, post-fall armyworm damage, blight, common rust, gray spot disease, and the overall combined categories. Each curve shows an upward trend as the confidence level increases, indicating a positive correlation between the model’s confidence in its predictions and the accuracy. Notably, the lightweight CBS-YOLOv8 model achieves high accuracy even at moderate confidence levels, reflecting its ability to focus on salient and discriminative features while maintaining efficiency. Spodoptera frugiperda eggs (light blue) exhibit high accuracy at lower confidence levels, suggesting that early predictions for this category are generally correct. In contrast, the curves for healthy leaves (orange) and Spodoptera frugiperda damage aftermath (green) show significant improvements in accuracy within the medium confidence range, indicating a rapid enhancement in model performance as confidence increases.
The curves for epidemic disease (red), common rust (purple), and gray spot disease (brown) vary in shape, reflecting differences in predictive performance across categories. The overall performance curve (dark blue) demonstrates that CBS-YOLOv8 maintains high precision across categories, reaching near-perfect accuracy (1.0) at a confidence level of 0.993, highlighting the effectiveness of its lightweight modules in prioritizing reliable detections. Comparing these curves highlights the model’s strengths in high-precision categories and its limitations in categories requiring further optimization. This analysis aids in selecting confidence thresholds, tuning model performance, and balancing precision and recall for improved prediction effectiveness.
In Figure 8, the horizontal axis represents confidence levels (0.0–1.0), and the vertical axis represents recall rates (0.0–1.0). Curves correspond to fall armyworm eggs (light blue), healthy leaves (orange), post-fall armyworm damage (green), blight (red), common rust (purple), gray spot disease (brown), and all classes combined (dark blue). At a confidence level of 0.0, the overall recall rate is 0.91. As confidence increases, recall generally decreases, demonstrating the trade-off between recall and confidence across categories and how performance adjusts with different thresholds. This illustrates the typical precision–recall trade-off in lightweight models, where CBS-YOLOv8 favors precision due to its emphasis on salient features, while still achieving competitive recall across categories.
After applying the improved algorithm to model training, the best-performing model was obtained and used to predict images from our corn disease dataset. Figure 9 shows the prediction results following the model enhancements. As seen in the figure, the improved CBS-YOLOv8 algorithm maintains excellent detection performance while remaining lightweight. The model accurately locates and labels various disease targets, including dense spot-like infections and large leaf lesions, demonstrating superior precision compared to the pre-improvement model. These results, obtained from real-world testing on our dataset, highlight the robustness, high accuracy, and computational efficiency of the CBS-YOLOv8 algorithm in practical crop disease detection tasks.

6.4. Comparative Experiment

The CBS-YOLOv8 model was systematically evaluated against several representative object detection frameworks, including two-stage detectors (Faster R-CNN), single-stage detectors (SSD, YOLOv5, YOLOv7), and lightweight networks (YOLOv8, PP-YOLOE, NanoDet). All models were trained and tested on the same corn disease dataset under identical experimental conditions to ensure fair and reproducible comparison.
Key evaluation metrics include precision, recall, mAP@0.5, average error (%), and inference time per image (ms). Table 5 summarizes the quantitative results. Faster R-CNN achieves high precision (92.1%) but lower recall (81.5%) and the largest error (11.7%) with longer inference time (215 ms), limiting its practical use in real-time scenarios. SSD is faster (30.4 ms) but shows reduced precision (85.7%) and recall (78.4%) with higher error (18.1%). YOLOv5, YOLOv7, and YOLOv8 provide a balanced performance, with YOLOv8 reaching 91.5% precision, 83.0% recall, and 88.5% mAP, while maintaining 32.5 ms inference time and 11.5% error. PP-YOLOE and NanoDet focus on efficiency, trading off some accuracy.
Notably, CBS-YOLOv8 achieves the best balance of accuracy and efficiency, with the lowest error among high-precision models and competitive inference speed. These improvements stem from the C2f-GhostNetv2 module, which suppresses redundant parameters while enhancing fine-grained lesion detection. This enables accurate identification of subtle lesions, such as gray leaf spot and rust, even under complex field conditions.
Figure 10 illustrates CBS-YOLOv8’s superior trade-off between precision and inference time compared to other models, highlighting its suitability for efficient, real-time deployment in corn disease detection applications.

6.5. Inference Speed and Real-Time Performance

Real-time performance is a crucial consideration for practical crop disease detection, especially in field environments where timely decisions are essential. We evaluated inference speed (FPS), model size (parameters), and computational complexity (GFLOPs) of eight representative object detection models, including two-stage detectors, single-stage detectors, and lightweight networks. The results are summarized in Table 6.
As shown in Table 6, Faster R-CNN achieves high precision but suffers from a low inference speed of only 4 FPS, due to its heavy backbone and two-stage design. SSD and NanoDet provide faster inference, ranging from 35 to 42 FPS, at the cost of slightly lower precision and recall, reflecting the common trade-off between speed and accuracy. The YOLO family models (YOLOv5, YOLOv7, and YOLOv8) achieve moderate inference speeds of 28–32 FPS while maintaining balanced accuracy, representing a compromise between performance and efficiency.
The proposed CBS-YOLOv8 achieves 36 FPS, higher than YOLOv5, YOLOv7, and YOLOv8 (28–32 FPS), while maintaining competitive precision and recall. These results suggest that CBS-YOLOv8 provides a practical balance between real-time performance and predictive capability, though further optimization may improve both speed and accuracy in future work.
To better visualize the trade-off between inference speed and model performance, Figure 11 shows a comparison of inference speeds across different object detection models. The bar color gradient emphasizes higher FPS values, and numerical labels above each bar indicate the exact FPS. This visualization highlights CBS-YOLOv8’s competitive speed relative to other lightweight models and clearly shows the performance gap with slower models.
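For reference, FPS figures of this kind are typically obtained with a simple timing harness such as the sketch below (our own illustration, not the paper’s benchmarking code):

```python
import time
import torch

@torch.no_grad()
def measure_fps(model: torch.nn.Module, iters: int = 200,
                size: int = 640, device: str = "cuda") -> float:
    """Measure forward-pass throughput on a fixed-size dummy input: warm up,
    synchronize the GPU so timings are real, then time repeated passes."""
    model = model.eval().to(device)
    x = torch.randn(1, 3, size, size, device=device)
    for _ in range(10):                 # warm-up iterations
        model(x)
    if device == "cuda":
        torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(iters):
        model(x)
    if device == "cuda":
        torch.cuda.synchronize()
    return iters / (time.perf_counter() - start)
```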

6.6. Class-Wise Detection Performance

To provide a detailed evaluation of the proposed CBS-YOLOv8 model, we analyzed its performance across different disease categories in the corn disease dataset. Table 7 presents the class-wise evaluation metrics, illustrating how the model manages both large lesions and smaller or more occluded targets.
From Table 7, blight shows the highest precision (93.0%) and F1-score (88.6%), suggesting the model effectively detects well-defined and larger lesions. In contrast, Spodoptera frugiperda damage aftermath shows a slightly lower recall (82.0%), likely due to the smaller size and higher density of the lesions, which pose more challenges for detection. Overall, the model achieves balanced performance across all classes, with precision consistently above 89% and recall above 82%, indicating reliable and robust detection capabilities across diverse disease types.
To further illustrate these results, Figure 12 shows class-wise performance metrics. The bar chart clearly shows that while all categories maintain high precision, recall varies slightly among smaller or overlapping lesions, reflecting the inherent difficulty in detecting these targets. The F1-score, combining precision and recall, demonstrates the model’s robustness across disease types.
In summary, this analysis illustrates that CBS-YOLOv8 can reliably detect both prominent and subtle disease symptoms. The integrated evaluation metrics offer a thorough assessment of the model, confirming its suitability for real-world corn disease detection scenarios. The figure and table together indicate that the model is both accurate and robust, with only minor limitations in more challenging lesion categories.

6.7. Summary of Results

Overall, the experimental results demonstrate that CBS-YOLOv8 effectively balances detection accuracy, computational efficiency, and real-time applicability. Across all evaluations—including data augmentation, ablation studies, category-wise performance, and comparative experiments—the proposed model consistently outperforms baseline YOLOv8 and other mainstream detection networks in terms of precision, recall, and mAP@0.5, while maintaining a lightweight structure with reduced parameters and GFLOPs.
The model’s integration of GhostNetV2, SimAM, BiFPN, and SCConv contributes to enhanced feature representation, precise localization of lesions, and robustness to moderate image blurriness, leaf occlusions, and variable lighting conditions commonly encountered in field-acquired corn disease images. The class-wise evaluation shows high precision and balanced recall across diverse disease types, highlighting the model’s capability to detect both large and subtle lesions.
However, limitations remain in scenarios involving multi-view images, highly overlapping leaves, or extreme environmental variations. These factors can reduce detection accuracy, suggesting that future work should focus on adaptive feature extraction, multi-scale illumination normalization, and further optimization for edge-device deployment. Overall, the results in this section confirm that CBS-YOLOv8 is a practical and efficient solution for real-world corn disease detection, with clear advantages in both accuracy and computational efficiency.

7. Conclusions

CBS-YOLOv8 integrates GhostNetV2, SimAM, BiFPN, and SCConv to enhance feature representation, optimize convolution operations, and enable multi-scale feature fusion, achieving a strong balance between accuracy and efficiency suitable for real-time agricultural deployment. Experimental results demonstrate that CBS-YOLOv8 surpasses YOLOv8 and other mainstream detection models on key metrics, while its lightweight design reduces model complexity and parameter count, improving computational efficiency and practical applicability for field-based corn disease detection.
However, certain limitations remain. Detection accuracy may be affected by multi-view or oblique images, occluded lesions, and complex lighting or background conditions in real agricultural scenarios. The model is also most effective on clear or moderately blurred images; severe blurriness can reduce the visibility of fine-grained lesion features, affecting both precision and recall. Future work will focus on improving robustness to image-quality variations, enhancing generalization across diverse crop diseases, and exploring strategies for deploying CBS-YOLOv8 on edge devices for real-time, resource-efficient monitoring. These improvements aim to support intelligent agricultural solutions and sustainable crop disease management.

Author Contributions

Conceptualization, D.S. and Y.P.; methodology, D.S.; software, X.G.; validation, D.S., Y.P. and X.G.; formal analysis, D.S.; investigation, D.S.; resources, K.U.; data curation, X.G.; writing—original draft preparation, D.S.; writing—review and editing, Y.P. and K.U.; visualization, X.G.; supervision, K.U.; project administration, K.U.; funding acquisition, Y.P. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The data presented in this study are available upon request from the corresponding author due to project-specific confidentiality agreements.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Shehu, H.A.; Ackley, A.; Mark, M.; Eteng, O.E.; Sharif, M.H.; Kusetogullari, H. YOLO for Early Detection and Management of Tuta absoluta-Induced Tomato Leaf Diseases. Front. Plant Sci. 2025, 16, 1524630. [Google Scholar] [CrossRef] [PubMed]
  2. Guo, F.; Yao, C.; Yang, R.; Ma, M.; Wu, X.; Xu, Z.; Lu, M.; Zhang, J.; Gong, G. A high-availability segmentation algorithm for corn leaves and leaf spot disease based on feature fusion. Crop Prot. 2024, 187, 106957. [Google Scholar] [CrossRef]
  3. Zhang, K.; Luo, W.; Zhong, Y.; Ma, L.; Liu, W.; Li, H. Adversarial Spatio-Temporal Learning for Video Deblurring. IEEE Trans. Image Process. 2018, 28, 291–301. [Google Scholar] [CrossRef]
  4. Chen, N.; Li, B.; Wang, Y.; Ying, X.; Wang, L.; Zhang, C.; Guo, Y.; Li, M.; An, W. Motion and Appearance Decoupling Representation for Event Cameras. IEEE Trans. Image Process. 2025, 34, 5964–5977. [Google Scholar] [CrossRef] [PubMed]
  5. Cao, J.; Peng, B.; Gao, M.; Hao, H.; Guo, J.; Liu, X.; Liu, W. Detecting Small Damage on Wind Turbine Surfaces Using an Improved YOLO in Drone-Captured Scenes. J. Fail. Anal. Prev. 2025, 25, 725–740. [Google Scholar] [CrossRef]
  6. Pan, W.; Yang, Z. A lightweight enhanced YOLOv8 algorithm for detecting small objects in UAV aerial photography. Vis. Comput. 2025, 2, 269. [Google Scholar] [CrossRef]
  7. Wang, X.; Wu, Z.; Xiao, G.; Han, C.; Fang, C. YOLOv7-DWS: Tea Bud Recognition and Detection Network in Multi-Density Environment via Improved YOLOv7. Front. Plant Sci. 2025, 15, 1503033. [Google Scholar] [CrossRef]
  8. Diao, Z.; Ma, S.; Li, J.; Zhang, J.; Li, X.; Zhao, S.; He, Y.; Zhang, B.; Jiang, L. Navigation Line Detection Algorithm for Corn Spraying Robot Based on Improved LT-YOLOv10s. Precis. Agric. 2025, 26, 46. [Google Scholar] [CrossRef]
  9. Feng, Z.; Shi, R.; Jiang, Y.; Han, Y.; Ma, Z.; Ren, Y. SPD-YOLO: A Method for Detecting Maize Disease Pests Using Improved YOLOv7. Comput. Mater. Contin. 2025, 84, 3559–3575. [Google Scholar] [CrossRef]
  10. Chen, X.; Jiao, Z.; Liu, Y. Improved YOLOv8n based helmet wearing inspection method. Sci. Rep. 2025, 15, 1945. [Google Scholar] [CrossRef]
  11. Xu, X.; Zhou, B.; Li, W.; Wang, F. A Method for Detecting Persimmon Leaf Diseases Using the Lightweight YOLOv5 Model. Expert Syst. Appl. 2025, 284, 127567. [Google Scholar] [CrossRef]
  12. Zhang, H.; Liang, M.; Wang, Y. YOLO-BS: A traffic sign detection algorithm based on YOLOv8. Sci. Rep. 2025, 15, 7558. [Google Scholar] [CrossRef] [PubMed]
  13. Jiang, J.; Xie, G.; Cui, J.; Guo, M. Surface Mine Personnel Object Video Tracking Method Based on YOLOv5-Deepsort Algorithm. Sci. Rep. 2025, 15, 17123. [Google Scholar] [CrossRef] [PubMed]
  14. Singh, A.; Kaur, J.; Singh, K.; Singh, M.L. Deep transfer learning-based automated detection of blast disease in paddy crop. Signal Image Video Process. 2024, 18, 569–577. [Google Scholar] [CrossRef]
  15. Feng, Z.R.; Li, Y.H.; Chen, W.Z.; Su, X.P.; Chen, J.N.; Li, J.P.; Liu, H.; Li, S.B. Infrared and Visible Image Fusion Based on Improved Latent Low-Rank and Unsharp Masks. Spectrosc. Spectr. Anal. 2025, 45, 2034–2044. [Google Scholar]
  16. Radočaj, D.; Radočaj, P.; Plaščak, I.; Jurišić, M. Evolution of Deep Learning Approaches in UAV-Based Crop Leaf Disease Detection: A Web of Science Review. Appl. Sci. 2025, 15, 10778. [Google Scholar] [CrossRef]
  17. Daphal, D.; Koli, S.M. Enhanced deep learning technique for sugarcane leaf disease classification and mobile application integration. Heliyon 2024, 10, e29438. [Google Scholar] [CrossRef]
  18. Nuanmeesri, S. Enhanced hybrid attention deep learning for avocado ripeness classification on resource-constrained devices. Sci. Rep. 2025, 15, 3719. [Google Scholar] [CrossRef]
  19. Li, Z.; Wu, W.; Wei, B.; Li, H.; Zhan, J.; Deng, S.; Wang, J. Rice disease detection: TLI-YOLO innovative approach for enhanced detection and mobile compatibility. Sensors 2025, 25, 2494. [Google Scholar] [CrossRef]
  20. Shafay, M.; Hassan, T.; Owais, M.; Hussain, I.; Khawaja, S.G.; Seneviratne, L.; Werghi, N. Recent advances in plant disease detection: Challenges and opportunities. Plant Methods 2025, 21, 140. [Google Scholar] [CrossRef]
  21. Tan, T.; Cao, X.; Liu, H.; Chen, L.; Wang, J.; Chen, X.; Wang, G. Characteristic analysis and model predictive-improved active disturbance rejection control of direct-drive electro-hydrostatic actuators. Expert Syst. Appl. 2026, 301, 130565. [Google Scholar] [CrossRef]
  22. Wang, N.; Liu, H.; Li, Y.; Zhou, W.; Ding, M. Segmentation and Phenotype Calculation of Rapeseed Pods Based on YOLO v8 and Mask R-Convolution Neural Networks. Plants 2023, 12, 3328. [Google Scholar] [CrossRef]
  23. Mu, C.; Zhang, F.; Feng, J.; Haidarh, M.; Liu, Y. GhostNet and pair-wise similarity module for cross-domain few-shot classification of hyperspectral images. Appl. Soft Comput. 2025, 183, 113717. [Google Scholar] [CrossRef]
  24. DeRose, J.F.; Wang, J.; Berger, M. Attention Flows: Analyzing and Comparing Attention Mechanisms in Language Models. IEEE Trans. Vis. Comput. Graph. 2021, 27, 1160–1170. [Google Scholar] [CrossRef]
  25. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1137–1149. [Google Scholar] [CrossRef]
  26. Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.-Y.; Berg, A.C. SSD: Single Shot MultiBox Detector. In Computer Vision—ECCV 2016; Lecture Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 2016; Volume 9905, pp. 21–37. [Google Scholar]
  27. Yaseen, M. YOLOv5, YOLOv8 and YOLOv10: The Go-To Detectors for Real-Time Vision. arXiv 2024, arXiv:2407.02988. [Google Scholar]
  28. Wang, C.-Y.; Bochkovskiy, A.; Liao, H.-Y.M. YOLOv7: Trainable Bag-of-Freebies Sets New State-of-the-Art for Real-Time Object Detectors. arXiv 2023, arXiv:2207.02696. [Google Scholar]
  29. Yaseen, M. What is YOLOv8: An In-Depth Exploration of the Internal Architecture, Training, and Performance. arXiv 2024, arXiv:2408.15857. [Google Scholar]
  30. Xu, S.; Wang, X.; Lv, W.; Chang, Q.; Cui, C.; Deng, K.; Wang, G.; Dang, Q.; Wei, S.; Du, Y.; et al. PP-YOLOE: An Evolved Version of YOLO. arXiv 2022, arXiv:2203.16250. [Google Scholar] [CrossRef]
  31. Miao, D.; Wang, Y.; Yang, L.; Wei, S. Foreign Object Detection Method of Conveyor Belt Based on Improved NanoDet. IEEE Access 2023, 11, 23046–23052. [Google Scholar] [CrossRef]
Figure 1. Overview of the baseline YOLOv8 architecture.
Figure 2. The improved structural framework of CBS-YOLOv8.
Figure 3. The module diagram of GhostNetV2.
Figure 4. SimAM attention mechanism.
Figure 5. The BiFPN structure integrated into YOLOv8. The left subfigure (A) shows the FPN structure, and the right subfigure (B) shows the BiFPN structure.
Figure 6. Dataset sample.
Figure 7. Category-wise Precision Trends Across Confidence Levels.
Figure 8. Category-wise Recall Trends Across Confidence Levels.
Figure 9. Prediction results of CBS-YOLOv8 on the corn disease dataset.
Figure 10. Visualization of the comparative experiment ((a) Faster R-CNN, (b) SSD, (c) YOLOv5, (d) YOLOv7, (e) YOLOv8, (f) PP-YOLOE, (g) NanoDet, (h) CBS-YOLOv8; (A) and (B) represent two different sets of examples).
Figure 11. Inference speed comparison among different object detection models.
Figure 12. Class-wise Detection Metrics on the Corn Disease Dataset.
Table 1. Distribution of Samples Across Disease Categories and Dataset Splits.

| Disease Category | Training | Validation | Test | Total |
|---|---|---|---|---|
| Healthy Leaves | 856 | 245 | 122 | 1223 |
| Spodoptera frugiperda eggs | 576 | 164 | 82 | 822 |
| Spodoptera frugiperda damage aftermath | 932 | 266 | 133 | 1331 |
| Blight | 752 | 215 | 107 | 1074 |
| Rust Disease | 698 | 199 | 99 | 996 |
| Gray Leaf Spot Disease | 453 | 130 | 65 | 648 |
Table 2. Core Experimental Environment and Hyperparameters.

| Name | Parameter |
|---|---|
| Operating System | Windows 11 |
| CPU | AMD Ryzen 7 7735H |
| GPU | NVIDIA RTX 4060 |
| GPU Memory | 16 GB |
| Training Epochs | 120 |
| Training/Validation Split | 8:2 |
| Batch Size | 16 |
| Optimizer | Adam |
| Initial Learning Rate | 0.001 |
| Learning Rate Schedule | Cosine decay |
| Weight Decay | 0.0005 |
| Momentum | 0.9 |
| IoU Threshold (mAP) | 0.5 |
| Image Size | 640 × 640 |
| Data Augmentation | Random flip, scale, color jitter, Mosaic, MixUp |

Note: All models used the same hyperparameters, ensuring fair comparison and reproducibility.
Table 3. Effect of Data Augmentation on Model Performance.

| Mirror | Scale | Precision (%) | Recall (%) | mAP@0.5 (%) |
|---|---|---|---|---|
| × | × | 90.3 | 81.8 | 86.1 |
| ✓ | × | 91.5 | 82.1 | 87.2 |
| × | ✓ | 91.8 | 82.9 | 87.9 |
| ✓ | ✓ | 92.3 | 82.9 | 88.6 |
Table 4. Ablation Study of CBS-YOLOv8 Modules.

| No. | Model | Precision | Recall | mAP@0.5 |
|---|---|---|---|---|
| 1 | YOLOv8 | 0.896 | 0.875 | 0.865 |
| 2 | YOLOv8+GhostNetV2 | 0.899 | 0.861 | 0.873 |
| 3 | YOLOv8+SimAM | 0.908 | 0.876 | 0.869 |
| 4 | YOLOv8+BiFPN | 0.910 | 0.885 | 0.875 |
| 5 | CBS-YOLOv8 | 0.913 | 0.889 | 0.882 |
Table 5. Comparison of CBS-YOLOv8 with Other Models on Corn Disease.

| Model | Precision (%) | Recall (%) | mAP@0.5 (%) | Error (%) | Time (ms) |
|---|---|---|---|---|---|
| Faster R-CNN [25] | 92.1 | 81.5 | 88.3 | 11.7 | 215 |
| SSD [26] | 85.7 | 78.4 | 81.9 | 18.1 | 30.4 |
| YOLOv5 [27] | 90.2 | 82.7 | 87.5 | 12.5 | 33.1 |
| YOLOv7 [28] | 91.1 | 83.5 | 88.2 | 11.8 | 34.2 |
| YOLOv8 [29] | 91.5 | 83.0 | 88.5 | 11.5 | 32.5 |
| PP-YOLOE [30] | 90.8 | 82.9 | 87.8 | 12.2 | 29.8 |
| NanoDet [31] | 89.5 | 81.7 | 86.8 | 13.2 | 28.5 |
| CBS-YOLOv8 | 92.3 | 84.2 | 88.9 | 11.1 | 31.0 |
Table 6. Inference Speed and Complexity of Object Detection Models.

| Model | FPS | Params (M) | GFLOPs |
|---|---|---|---|
| Faster R-CNN | 4 | 42.3 | 215 |
| SSD | 35 | 26.5 | 30.4 |
| YOLOv5 | 28 | 9.4 | 26.4 |
| YOLOv7 | 30 | 10.9 | 27.5 |
| YOLOv8 | 32 | 11.2 | 28.6 |
| PP-YOLOE | 33 | 8.8 | 22.1 |
| NanoDet | 42 | 4.1 | 12.3 |
| CBS-YOLOv8 | 36 | 8.1 | 21.0 |
Table 7. Class-wise Detection Performance of CBS-YOLOv8 on Corn Disease.

| Category | Precision (%) | Recall (%) | F1-Score (%) |
|---|---|---|---|
| Spodoptera frugiperda eggs | 90.5 | 85.7 | 88.0 |
| Healthy leaves | 92.0 | 83.5 | 87.5 |
| Spodoptera frugiperda damage aftermath | 89.8 | 82.0 | 85.7 |
| Blight | 93.0 | 84.5 | 88.6 |
| Common rust | 91.5 | 82.8 | 87.0 |
| Gray spot disease | 90.8 | 83.0 | 86.8 |
| Overall | 91.3 | 83.6 | 87.6 |
