Article

YOLOv10n-Based Peanut Leaf Spot Detection Model via Multi-Dimensional Feature Enhancement and Geometry-Aware Loss

1
College of Science and Information, Qingdao Agricultural University, Qingdao 266109, China
2
Haidu College, Qingdao Agricultural University, Laiyang 265200, China
*
Author to whom correspondence should be addressed.
Appl. Sci. 2026, 16(3), 1162; https://doi.org/10.3390/app16031162
Submission received: 5 December 2025 / Revised: 26 December 2025 / Accepted: 7 January 2026 / Published: 23 January 2026
(This article belongs to the Section Optics and Lasers)

Abstract

Precise identification of early peanut leaf spot is strategically significant for safeguarding oilseed supplies and reducing pesticide reliance. However, general-purpose detectors face severe domain adaptation bottlenecks in unstructured field environments due to small feature dissipation, physical occlusion, and class imbalance. To address this, this study constructs a dataset spanning two phenological cycles and proposes POD-YOLO, a physics-aware and dynamics-optimized lightweight framework. Anchored on the YOLOv10n architecture and adhering to a “data-centric” philosophy, the framework optimizes the parameter convergence path via a synergistic “Augmentation-Loss-Optimization” mechanism: (1) Input Stage: A Physical Domain Reconstruction (PDR) module is introduced to simulate physical occlusion, blocking shortcut learning and constructing a robust feature space; (2) Loss Stage: A Loss Manifold Reshaping (LMR) mechanism is established utilizing dual-branch constraints to suppress background gradients and enhance small target localization; and (3) Optimization Stage: A Decoupled Dynamic Scheduling (DDS) strategy is implemented, integrating AdamW with cosine annealing to ensure smooth convergence on small-sample data. Experimental results demonstrate that POD-YOLO achieves a 9.7% precision gain over the baseline and 83.08% recall, all while maintaining a low computational cost of 8.4 GFLOPs. This study validates the feasibility of exploiting the potential of lightweight architectures through optimization dynamics, offering an efficient paradigm for edge-based intelligent plant protection.

1. Introduction

As a globally pivotal oilseed crop, the peanut ranks among the top oil-bearing plants in terms of both cultivation area and total yield. As the world’s largest peanut producer, China views the security of peanut production as strategically significant to its national grain and oil supply system. However, the leaf spot complex (including brown spot and black spot) represents the most widely distributed and damaging disease category, frequently causing yield losses of 10% to 40% [1,2], thereby becoming a core bottleneck constraining the high-quality development of the peanut industry. Against the backdrop of intensifying climate change and the increasing complexity of agricultural ecological environments, diseases often exhibit frequent mixed infections. Traditional manual identification relies heavily on expert knowledge and suffers from subjectivity, low efficiency, and the inability to detect early-stage symptoms [3,4,5]. Consequently, in the context of global smart agriculture, the development of efficient, automated intelligent disease detection technology has become a research hotspot [6]. This trend aligns with broader advances in intelligent perception and system reliability in the fields of heavy machinery and automation. In these domains, neural network–enhanced analytical methods are increasingly employed to achieve efficient fault detection and to ensure operational safety [7].
In recent years, the rise of Convolutional Neural Network (CNN) and Transformer paradigms has provided powerful tools for visual perception in unstructured agricultural environments [8]. Early agricultural disease identification was primarily treated as an image classification problem, utilizing backbone networks such as ResNet and DenseNet for feature extraction [9]. However, these approaches fail to provide specific location and density information regarding diseases, making them difficult to use for guiding precision variable-rate spraying. With the development of object detection technology, the research focus has gradually shifted from coarse-grained classification to fine-grained localization [10]. While two-stage detectors based on region proposals (e.g., Faster R-CNN, Cascade R-CNN) demonstrate excellent detection accuracy [11], their complex Region Proposal Networks (RPN) result in a high computational load. In contrast, one-stage detectors represented by the YOLO (You Only Look Once) series have achieved a superior balance between speed and accuracy due to their end-to-end single-pass inference advantage, gradually becoming the mainstream choice in agricultural scenarios such as animal behavior analysis [12] and fruit detection [13,14]. To adapt to complex agricultural environments, researchers have proposed various improvement strategies:
Liu et al. [15] and Qi et al. [16] introduced attention mechanisms into YOLOv8 and YOLOv5, respectively, enhancing the models’ ability to capture illumination changes and long-range dependencies. Wang et al. [17] developed the SMC-YOLO method based on YOLOv8, employing SPCPM to utilize rich gradient paths to guide feature extraction in convolutional modules, combined with MDFEM to enhance feature information from multiple dimensions. Notably, the incorporation of the CSFLNLM attention mechanism at the front of the detection head effectively improved the model’s ability to distinguish objects from the background, achieving an mAP@0.5 of 86.7%, which demonstrates the potential of such improvement strategies in fine-grained agricultural detection tasks.
To address the challenge of minute and variable lesion sizes, Feature Pyramids and their variants (PANet, BiFPN) have been widely applied [18,19]. Sun et al. [20] explored multi-scale feature fusion strategies, while GAF-Net attempted to overcome the problem of feature dissipation in deep networks through GSConv and adaptive weighted fusion technologies. Some studies have attempted to introduce Vision Transformers (ViT) into agricultural detection [21], leveraging their global self-attention mechanisms to resolve long-range dependency issues. However, the quadratic computational complexity of Transformers makes them difficult to deploy on severely compute-constrained agricultural IoT devices. Therefore, lightweight CNN architectures (e.g., MobileNet, GhostNet) and model pruning techniques remain the current mainstream direction. Considering the requirements of mobile applications, Islam et al. [22] validated the efficiency of YOLOv10 in tea disease detection; similarly, the MEAN-SSD model developed by Sun et al. [23] achieved real-time monitoring of apple diseases on mobile devices through a lightweight architecture. While the aforementioned studies have achieved success in their respective fields, most focus on single-dimensional structural modifications, lacking systemic solutions for optimization dynamics under small-sample data and complex physical occlusion in the field.
In the latest evolution of one-stage detectors, YOLOv10n constitutes the ideal architectural base for this study due to its breakthrough design [24]. Its introduction of a consistent dual assignment strategy eliminates dependency on Non-Maximum Suppression (NMS), significantly reducing inference latency; meanwhile, the SCDown and PSA modules optimize minute feature retention and global receptive field construction at the physical level [25]. However, despite the exceptional performance of YOLOv10n on general datasets like COCO, transferring it to the specific domain of peanut disease monitoring presents severe challenges:
First, the dataset used in this study contains only 1443 images. On such a limited data manifold, the original Stochastic Gradient Descent (SGD) and its weight decay strategy struggle to guide the model to converge to flat minima. The model is prone to falling into sharp local optima, leading to poor generalization and severe oscillation in the later stages of training. Second, early-stage lesions are minute in diameter, and the background accounts for over 95% of the image; the accumulated gradients of simple negative samples often dominate the optimization direction, making it difficult to focus on weak features [26]. Furthermore, severe physical occlusion (overlapping leaves) and photometric drift exist in actual field environments [27]. General model augmentations like Mosaic fail to explicitly model these physical occlusion laws, easily leading to “shortcut learning” that relies on local textures [28].
To address the above challenges, this study proposes POD-YOLO, a domain-adaptive synergistic optimization framework tailored to the unstructured field conditions of peanut leaf spot detection, which are characterized by severe physical occlusion, variable illumination, and tiny symptom features. We aim to optimize the accuracy-efficiency trade-off for agricultural edge devices. Rather than relying on deep architectures, we inject domain-specific priors to address feature erosion, occlusion, and class imbalance. These targeted training-time optimizations enable the lightweight POD-YOLO to achieve elite performance with zero inference overhead. Unlike traditional approaches that merely stack modules, this study adopts a systemic “Augmentation-Loss-Optimization” synergistic strategy, aiming to build a detection model that possesses both high robustness and adaptability to edge devices. The main contributions of this paper are as follows:
  • Proposal of a Physical Domain Reconstruction (PDR) Module. To address feature dissipation caused by field occlusion, this module employs Cutout-based occlusion simulation. Ablation studies demonstrate that this module independently increased detection Precision by 3.51% compared to the baseline by effectively blocking shortcut learning pathways and enforcing robust feature extraction.
  • Design of a Decoupled Dynamic Scheduling (DDS) Strategy. Targeting the instability of small-sample optimization, we implemented a decoupled weight decay strategy using AdamW and cosine annealing. Experimental results show that this strategy successfully optimized the convergence trajectory, improving the Recall rate by 3.42% and ensuring the model escaped sharp local minima.
  • Establishment of a Loss Manifold Reshaping (LMR) Mechanism. We introduced a dual-branch constraint mechanism using Focal Loss and CIoU to address class imbalance. This mechanism achieved the most significant individual gain, boosting Precision by 7.62% by suppressing background gradients and reshaping the classification manifold.

2. Materials and Methods

2.1. Dataset Construction and Optimization

To construct a high-fidelity peanut disease dataset endowed with spatiotemporal diversity, this study conducted sample collection across two phenological cycles at the Crop Information Collection Base in Longzhuang Town, Tengzhou City, Shandong Province, China (35° N, 117° E). This region is a typical peanut-corn rotation area. The collection period covered July 2023 and July 2024, yielding a total of 480 raw images. The acquisition device was an iPhone 13 Pro equipped with a 12-megapixel sensor, with a raw resolution of 3024 × 4032 pixels. Photography was conducted during a sunny 07:00–09:00 time window to avoid high-exposure artifacts. To eliminate geometric distortion and ensure consistency in feature scale, the optical axis of the lens was kept perpendicular (90°) to the leaf plane while maintaining a fixed object distance of 15 cm ± 1 cm.
Addressing the inevitable issues of focus drift and invalid backgrounds during field photography, a visual saliency-guided quality assessment mechanism was introduced. First, the AttentionInsight algorithm was utilized to analyze the raw images (as shown in Figure 1a), generating attention heatmaps simulating human visual attention mechanisms and focus maps for region filtering. The high-response regions (red areas) in the heatmaps validated the saliency of disease features. By setting saliency thresholds, low-quality samples with blurred salient regions or excessive background noise were automatically discarded. Subsequently, LabelImg-1.86 software was used to perform fine-grained manual annotation following the Pascal VOC standard. Ultimately, 456 valid images were screened and retained, covering early-stage punctiform lesions, middle-stage diffused lesions, and late-stage confluent necrosis.
Given that the volume of raw samples is insufficient to support the optimization of deep convolutional neural networks within a vast parameter space, which can easily lead to overfitting, this study designed a hybrid data augmentation strategy to expand the sample size and simulate complex imaging environments. First, random rotations of 10°, 30°, and 45° were applied to the raw images to simulate variable shooting angles in the field while maintaining the invariant topological features of the disease. Second, based on the HSV color space, a random perturbation of ±10% was applied to the brightness channel and contrast was enhanced to simulate photometric drift during morning and evening hours. Furthermore, to simulate different ISO sensitivities and the thermal noise of camera sensors in high-temperature environments, Gaussian noise with a normal distribution was superimposed in the image pixel space. Assuming the raw image is $I_{original}(x, y)$, the augmented image $I_{noisy}(x, y)$ is defined as:
$$I_{noisy}(x, y) = I_{original}(x, y) + n(0, \sigma^2)$$
where $n(0, \sigma^2)$ denotes zero-mean Gaussian noise and $\sigma^2$ is the variance, drawn randomly from the range [0.01, 0.03]. Through the aforementioned augmentation, the dataset was expanded to 1443 images (an increase of approximately 2.16×), effectively mitigating the overfitting risk of small-sample training and enhancing the model’s robustness against image quality degradation.
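The noise-injection step above can be sketched as follows (a minimal NumPy illustration; the function name and the [0, 1] pixel scaling are assumptions, while the variance range follows the text):

```python
import numpy as np

def add_gaussian_noise(image, var_range=(0.01, 0.03), rng=None):
    """Add zero-mean Gaussian noise with a randomly drawn variance.

    `image` is assumed to be a float array scaled to [0, 1]; the
    variance interval follows the paper's stated [0.01, 0.03] range.
    """
    rng = rng or np.random.default_rng()
    sigma2 = rng.uniform(*var_range)                 # variance sigma^2
    noise = rng.normal(0.0, np.sqrt(sigma2), size=image.shape)
    return np.clip(image + noise, 0.0, 1.0)          # keep a valid pixel range

img = np.full((4, 4, 3), 0.5)                        # dummy grey image
noisy = add_gaussian_noise(img, rng=np.random.default_rng(0))
```

Clipping after the addition keeps augmented pixels in the displayable range, which the simple additive formula alone does not guarantee.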
Initially, the raw collected images were partitioned into training, validation, and test sets following an 8:1:1 ratio via stratified sampling; subsequently, the stratified sampling strategy was rigorously implemented during the dataset augmentation phase. This strategy ensures a balanced distribution of samples from both 2023 and 2024 within each subset, thereby eliminating temporal domain bias resulting from inter-annual climatic variations.
In particular, to probe the model’s performance boundaries under extreme environments, the test set was specifically curated to encompass a collection of “challenging samples.” This subset includes 32 high-occlusion samples (occlusion rate > 50 % ), 28 low-illumination samples, and 25 ultra-early minute lesion samples (area ratio < 1 % ). This targeted partitioning enables the test set to authentically reflect the model’s generalization capability and robustness against interference in complex, unstructured agricultural scenarios. The processing of selected peanut leaf disease samples is illustrated in Figure 1.
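The year-stratified 8:1:1 partitioning described above can be sketched with a hypothetical helper (the field names, in-group rounding, and seed are illustrative, not the authors' implementation):

```python
import random
from collections import defaultdict

def stratified_split(samples, ratios=(0.8, 0.1, 0.1), seed=0):
    """Split samples 8:1:1 into train/val/test, stratifying by year so
    that both 2023 and 2024 appear in every subset (a sketch)."""
    rng = random.Random(seed)
    by_year = defaultdict(list)
    for s in samples:
        by_year[s["year"]].append(s)          # group by acquisition year
    train, val, test = [], [], []
    for group in by_year.values():
        rng.shuffle(group)
        n_tr = int(len(group) * ratios[0])
        n_va = int(len(group) * ratios[1])
        train += group[:n_tr]
        val += group[n_tr:n_tr + n_va]
        test += group[n_tr + n_va:]           # remainder goes to test
    return train, val, test

# Illustrative toy corpus: 50 images per year
samples = [{"id": i, "year": 2023 if i % 2 == 0 else 2024} for i in range(100)]
train, val, test = stratified_split(samples)
```

Splitting inside each year group is what eliminates the inter-annual domain bias the text describes: no subset can end up dominated by a single season's imaging conditions.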

2.2. The Improved Model: POD-YOLO

Addressing the three core challenges in peanut disease monitoring—dissipation of minute lesion features, physical occlusion resulting from high-density planting, and extreme class imbalance—this study proposes POD-YOLO, a physics-aware and dynamics-optimized detection model anchored on the lightweight YOLOv10n architecture. Adhering to the design philosophies of “data-centricity” and “training manifold reshaping,” POD-YOLO retains the advanced feature extraction components of YOLOv10n (such as SCDown downsampling and the PSA partial self-attention module). Instead of merely stacking additional computational modules, it constructs a systemic synergistic framework encompassing “Augmentation-Loss-Optimization” [29]. This framework aims to reconstruct the model’s learning trajectory across three dimensions: the data input flow, the error propagation flow, and the parameter update flow. The overall architecture is illustrated in Figure 2.
Specifically, this synergistic framework incorporates improvement strategies across three dimensions: First, in the input space, the Physical Domain Reconstruction (PDR) module is introduced. To address severe field occlusion, this module employs explicit physical occlusion simulation and environmental invariance regularization, compelling the model to learn the contextual topological structure of lesions rather than local textures. This effectively blocks “shortcut learning” pathways and constructs a robust feature input space. Second, in the gradient space, a Loss Manifold Reshaping (LMR) mechanism is established. By leveraging dual-branch constraints of Focal Loss and CIoU Loss to reshape the classification gradient manifold, this mechanism suppresses the dominant effect of simple backgrounds while simultaneously enhancing the geometric localization precision of minute lesions. Finally, in the parameter space, a Decoupled Dynamic Scheduling (DDS) strategy is implemented. To mitigate the risk of small-sample training falling into local optima, this strategy integrates the AdamW optimizer with a cosine annealing algorithm to reshape optimization dynamics. By decoupling weight decay from gradient updates, it ensures the model can escape sharp saddle points and converge towards flat minima regions characterized by superior generalization capabilities.

2.2.1. PDR Module

The generalization capability of deep convolutional neural networks is contingent upon the distribution density and diversity of training data. However, in real-world field peanut disease monitoring scenarios, unstructured environmental backgrounds and complex canopy architectures inevitably induce severe long-tailed data distribution issues. While the standard Mosaic and Mixup strategies inherent in the original YOLOv10 architecture mitigate scale variation challenges to a certain degree, they struggle to effectively cope with the physical occlusion resulting from high-density planting and the photometric drift induced by dynamic natural lighting. Consequently, we introduce the Physical Domain Reconstruction (PDR) module (illustrated in Figure 3). It is a composite pipeline that integrates established techniques to reconstruct physical field characteristics. We employ Cutout regularization to simulate leaf overlapping, forcing the model to learn topological context. Simultaneously, we utilize RandomAffine and ColorJitter to model geometric viewpoint shifts and photometric drift, respectively. By synergizing these standard methods, the PDR module systematically constructs a robust feature space resilient to field noise.
Overlapping of leaves during peanut growth is a major physical factor leading to missed detection of tiny lesions. To align this physical phenomenon within the feature space, we selected the Cutout regularization technique [30]. From an information-theoretic perspective, lesion detection is often an “ill-posed problem.” Convolutional networks tend to engage in “shortcut learning,” relying solely on regions with the highest local contrast (such as the necrotic black spots at the center of brown spot lesions) for classification, while ignoring more essential but subtle semantic features. Cutout simulates the loss of local information caused by leaf occlusion by randomly erasing spatially continuous rectangular regions within the input tensor. This operation introduces a perceptual adversarial effect during training: when the most discriminative “black spot” features are occluded, the model is forced to redistribute its attention to secondary but generalizable feature domains, such as the chlorotic halos at lesion edges, the texture distortion of infected veins, and the spatial topological distribution of diseases on the leaves. This mandatory feature decoupling significantly enhances the model’s inference robustness under conditions of local feature absence, enabling it to maintain semantic consistency when processing highly occluded field samples.
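The Cutout operation described above amounts to erasing one random, spatially contiguous patch from the input; a minimal NumPy re-implementation (the patch size and zero fill value are illustrative, not the paper's exact settings) might look like:

```python
import numpy as np

def cutout(image, size=32, rng=None):
    """Cutout-style occlusion: zero out one random square patch.

    A minimal sketch of the occlusion simulation described above;
    `size` is an assumed hyperparameter.
    """
    rng = rng or np.random.default_rng()
    h, w = image.shape[:2]
    cy, cx = int(rng.integers(h)), int(rng.integers(w))   # patch centre
    y0, y1 = max(cy - size // 2, 0), min(cy + size // 2, h)
    x0, x1 = max(cx - size // 2, 0), min(cx + size // 2, w)
    out = image.copy()
    out[y0:y1, x0:x1] = 0.0                               # erase the patch
    return out

img = np.ones((128, 128, 3))       # dummy fully-lit leaf image
occluded = cutout(img)
```

Because the erased region is contiguous (unlike pixel-wise dropout), the network cannot recover the masked evidence from neighboring pixels and must rely on the surrounding lesion context, which is exactly the anti-shortcut effect the text describes.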
Environmental Invariance Modeling via Geometric and Photometric Adaptation. To cope with the geometric viewpoint randomness and dynamic lighting conditions prevalent in field imaging environments, this study integrated RandomAffine and ColorJitter operators into the optimization model, aiming to effectively bridge the distribution gap between the training set (source domain) and actual field scenarios (target domain). In terms of geometric manifold expansion, considering the complex spatial morphology of peanut leaves under wind disturbance or gravitational influence, affine transformations are employed to simulate the inevitable non-vertical viewing angles, rotational deviations, and scale variations associated with handheld photography. This compels the model to learn disease geometric features possessing rotation and translation invariance. Simultaneously, regarding the correction of photometric drift, given that the waxy cuticle on the peanut leaf surface is prone to specular reflection under strong light or deep shadows due to occlusion, color jitter is utilized to apply random nonlinear perturbations to hue, saturation, and brightness within the HSV space. This strategy successfully simulates various lighting modalities ranging from direct intense light to diffuse scattered light, enabling the model to effectively decouple environmental lighting interference and focus on extracting the essential biochemical color features of lesions (such as the typical yellow-brown appearance of brown spots), thereby achieving robust detection across different time periods and weather conditions.

2.2.2. DDS Strategy

In the training of deep learning models, the selection of the optimizer governs both the convergence rate and the generalization boundary. While Stochastic Gradient Descent (SGD) typically yields superior generalization performance, it suffers from slow convergence and high sensitivity to hyperparameters. Conversely, adaptive methods (such as Adam), despite their rapid convergence, frequently become trapped in sharp local minima, thereby compromising generalization capability. Given the distinct characteristics of the peanut disease dataset used in this study—specifically its extreme sample scarcity (merely 1443 images) and background complexity—the direct application of standard adaptive optimizers is highly prone to inducing the “memorization” of high-frequency background noise, leading to overfitting [31]. To address this, this strategy synergistically integrates the decoupled weight decay mechanism of AdamW with cosine annealing scheduling, aiming to suppress overfitting from the perspective of optimization dynamics.
The core of the DDS strategy lies in the introduction of AdamW to resolve the “coupled decay dilemma” inherent in the standard Adam algorithm. In standard Adam, L2 regularization is typically implemented by adding a penalty term to the objective loss function f ( θ ) . Let θ t denote the parameter vector at time t, g t be the gradient, and λ be the weight decay coefficient; the gradient update rule is expressed in a coupled form:
$$g_t = \nabla f(\theta_{t-1}) + \lambda \theta_{t-1}$$
Parameters are subsequently updated via adaptive moment estimation:
$$\theta_t = \theta_{t-1} - \eta_t \frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon}$$
By substituting $g_t$ into the momentum calculation, the actual update step size for the regularization term is implicitly scaled by the adaptive factor $1/(\sqrt{\hat{v}_t} + \epsilon)$. This leads to the coupled decay dilemma: the regularization strength is subjected to scaling interference from the second moment of the gradient. In the later stages of training, when the gradients of certain parameters diminish ($\hat{v}_t$ approaches 0), the regularization term is unreasonably amplified; conversely, it is diminished when gradients are large. This unstable constraint force renders the model unable to effectively constrain the parameter space, making it extremely susceptible to memorizing background noise during small-sample training.
In standard Adam, L2 regularization exists as a component of the loss function ($\mathrm{Loss} + \lambda \|w\|^2$). During gradient computation, the regularization term is scaled by Adam’s adaptive learning rate ($1/(\sqrt{\hat{v}_t} + \epsilon)$). This implies that for parameters with large gradient variations and short update steps, the regularization strength is inadvertently attenuated; conversely, it is amplified for others. This coupling induces a bias in the optimization direction; particularly in noisy agricultural datasets, it makes the model prone to overfitting background textures.
To resolve this, DDS employs the AdamW mechanism to decouple weight decay from gradient calculation. As illustrated in Figure 4, unlike Standard Adam which treats the regularization term as part of the loss function, AdamW applies it as an independent additive term to the parameter update step:
$$\theta_t = \theta_{t-1} - \underbrace{\eta_t \frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon}}_{\text{Adaptive Gradient Step}} - \underbrace{\eta_t \lambda \theta_{t-1}}_{\text{Explicit Decoupled Decay}}$$
As shown in the formula, the regularization term $\eta_t \lambda \theta_{t-1}$ is independent of the adaptive second moment $\hat{v}_t$.
This explicit decoupling ensures that regardless of gradient fluctuations, all parameters are subjected to a constant complexity penalty. The constant regularization constraint provided by DDS forces model parameters to remain within a smaller norm range, effectively suppressing the model’s memorization of non-robust features (such as background textures), thereby significantly reducing the risk of overfitting.
To further enhance the model’s optimization capability on non-convex loss surfaces, DDS replaces the linear decay strategy with cosine annealing. Combined with a Warmup strategy, this mechanism rapidly increases the learning rate during the initial training phase to traverse flat saddle point regions, and subsequently decreases it smoothly using the periodicity of the cosine function. From the perspective of the optimization landscape, this non-linear scheduling aids the model in escaping sharp local minima and ultimately converging to flat minima regions. Flat minima imply that minute perturbations in parameters do not result in drastic changes in Loss, thereby guaranteeing the model’s robustness on unseen test data.
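Under the stated hyperparameters (initial learning rate $1\times10^{-3}$, weight decay $5\times10^{-2}$, minimum learning rate at 1% of the initial value; see Section 2.4), the AdamW-plus-cosine-annealing schedule can be sketched in PyTorch as follows. The 100-epoch horizon and the stand-in model are assumptions, and the warmup phase is omitted for brevity:

```python
import torch

model = torch.nn.Linear(10, 2)  # stand-in for the detector (assumption)

# Decoupled weight decay (AdamW) + cosine annealing, mirroring the DDS strategy
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=5e-2)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
    optimizer, T_max=100, eta_min=1e-3 * 0.01)  # anneal to 1% of initial lr

for epoch in range(100):
    # ... forward pass and loss.backward() would go here ...
    optimizer.step()   # weight decay applied additively, not through the gradient
    scheduler.step()   # smooth cosine decay of the learning rate

final_lr = optimizer.param_groups[0]["lr"]  # ends at eta_min = 1e-5
```

The key contrast with plain Adam is that `weight_decay` here never passes through the $1/(\sqrt{\hat{v}_t} + \epsilon)$ scaling: AdamW subtracts `lr * weight_decay * param` directly in the update step.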
Experimental logs corroborate the efficacy of the DDS strategy, evidenced by the absence of drastic gradient surges during the initial model initialization phase. The metrics/mAP50(B) trajectory ascended steadily from a value of 0.00258 at the first epoch, maintained exceptional stability throughout the mid-to-late training stages, and ultimately attained a peak of 0.8503. These results demonstrate that the proposed synergistic optimization strategy successfully achieved a dual enhancement in both convergence speed and generalization accuracy under small-sample conditions.

2.2.3. LMR Mechanism

In the optimization process of deep learning, the loss function defines the error surface topology within the high-dimensional parameter space, directly dictating the model’s convergence direction and ultimate performance. Although the original YOLOv10 configuration excels on general-purpose datasets, it encounters severe positive-negative sample imbalance when applied to the specific task of peanut field disease monitoring, where background pixels (such as soil, plastic mulch, and abundant healthy leaves) maintain absolute dominance. Under the default Binary Cross Entropy loss, the accumulated gradients from massive “easy negative” samples (e.g., healthy leaves, soil) frequently overwhelm the optimization direction, causing the “gradient drowning” of minute lesion features. To address this, this study constructed the Loss Manifold Reshaping (LMR) mechanism. Far from a simplistic substitution of loss functions, LMR is a collaborative optimization framework customized for imbalanced small-sample detection. For the classification branch, to break the gradient effect dominated by the background, we reconstructed the standard Cross-Entropy Loss into Focal Loss.
Reshaping Classification Space via Focal Loss. By incorporating the dynamic modulating factor $(1 - p_t)^{\gamma}$, Focal Loss reshapes the loss curve within the classification probability space:
$$FL(p_t) = -\alpha_t (1 - p_t)^{\gamma} \log(p_t)$$
Here, the focusing parameter $\gamma$ (set to 1.5 in this study) plays a central regulatory role. As the prediction probability $p_t$ of a sample increases, this factor approaches zero, thereby exponentially down-weighting the contribution of simple backgrounds to the total gradient. The pivotal value of Focal Loss lies in embedding hard example mining as a continuous, differentiable term within the loss function. This effectively “silences” the dominant background noise that standard YOLO mechanisms fail to suppress, forcing the optimization focus to shift from feature-salient large lesions to “edge cases” characterized by insufficient lighting, motion blur, or those generated by the PDR augmentation module. Analysis of training logs indicates that the rapid decline in classification loss (cls_loss) within the first 10 epochs is a direct result of the LMR mechanism effectively suppressing background gradient noise, enabling the model to swiftly traverse flat regions and enter a phase of fine-grained learning of lesion textures.
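A minimal binary focal-loss sketch matching the formula above ($\gamma = 1.5$ follows the text; $\alpha = 0.25$, the helper's name, and mean reduction are assumptions):

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, alpha=0.25, gamma=1.5):
    """Binary focal loss: FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t).

    gamma=1.5 follows the paper; alpha=0.25 is an assumed default.
    """
    p = torch.sigmoid(logits)
    # -log(p_t), computed stably from logits
    ce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p_t = p * targets + (1 - p) * (1 - targets)          # prob of the true class
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
    return (alpha_t * (1 - p_t) ** gamma * ce).mean()

logits = torch.tensor([3.0, -3.0, 0.0])   # easy positive, easy negative, hard sample
targets = torch.tensor([1.0, 0.0, 1.0])
loss = focal_loss(logits, targets)
```

With these inputs the two confident (easy) samples contribute almost nothing, while the uncertain sample at logit 0 dominates the loss, which is exactly the gradient reweighting described in the text.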
Geometric Constraints via CIoU in the Regression Branch. Although the default architecture of YOLOv10 integrates CIoU Loss [32], this study reinterprets and validates its specific applicability to peanut brown spot detection from a biological morphological perspective. Traditional IoU loss focuses solely on the overlap area. In complex backgrounds, this often leads to “bounding box drift,” where the model predicts a box that covers the lesion but exhibits a severely distorted aspect ratio (e.g., a vertically elongated box covering a circular spot). Unlike generic objects with highly variable shapes, peanut leaf spots exhibit regular geometric morphologies (typically circular or elliptical). As illustrated in Figure 5, building upon the fundamental metric of overlap area, CIoU explicitly introduces two critical geometric constraints: the normalized centre-point distance and an aspect-ratio consistency penalty. These constraints force the predicted box to align not only with the location but also with the inherent geometric shape of the ground truth, effectively correcting distortion errors.
Its mathematical expression introduces an aspect ratio consistency penalty term $\alpha v$:
$$L_{CIoU} = 1 - IoU + \frac{\rho^2(b, b^{gt})}{c^2} + \alpha v$$
$$v = \frac{4}{\pi^2} \left( \arctan\frac{w^{gt}}{h^{gt}} - \arctan\frac{w}{h} \right)^2$$
where $b$ and $b^{gt}$ denote the centre points of the predicted and ground-truth boxes, $\rho(\cdot)$ is the Euclidean distance between them, $c$ is the diagonal length of the smallest box enclosing both, $w$, $h$, $w^{gt}$, and $h^{gt}$ are the corresponding box widths and heights, and $\alpha$ is a positive trade-off coefficient.
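The full CIoU computation can be sketched for a single box pair in (x1, y1, x2, y2) format (a minimal illustration, not the YOLOv10 implementation; the epsilon guard is an assumption):

```python
import math
import torch

def ciou_loss(box1, box2):
    """CIoU loss for two boxes in (x1, y1, x2, y2) format (a sketch):
    L_CIoU = 1 - IoU + rho^2(b, b_gt)/c^2 + alpha * v."""
    # Intersection and union
    ix1, iy1 = torch.max(box1[0], box2[0]), torch.max(box1[1], box2[1])
    ix2, iy2 = torch.min(box1[2], box2[2]), torch.min(box1[3], box2[3])
    inter = (ix2 - ix1).clamp(0) * (iy2 - iy1).clamp(0)
    w1, h1 = box1[2] - box1[0], box1[3] - box1[1]
    w2, h2 = box2[2] - box2[0], box2[3] - box2[1]
    iou = inter / (w1 * h1 + w2 * h2 - inter)
    # Squared centre distance over the enclosing-box diagonal
    rho2 = ((box1[0] + box1[2]) / 2 - (box2[0] + box2[2]) / 2) ** 2 + \
           ((box1[1] + box1[3]) / 2 - (box2[1] + box2[3]) / 2) ** 2
    cw = torch.max(box1[2], box2[2]) - torch.min(box1[0], box2[0])
    ch = torch.max(box1[3], box2[3]) - torch.min(box1[1], box2[1])
    c2 = cw ** 2 + ch ** 2
    # Aspect-ratio consistency penalty v and its trade-off weight alpha
    v = (4 / math.pi ** 2) * (torch.atan(w2 / h2) - torch.atan(w1 / h1)) ** 2
    alpha = v / (1 - iou + v + 1e-7)
    return 1 - iou + rho2 / c2 + alpha * v

pred = torch.tensor([0.0, 0.0, 2.0, 2.0])
gt = torch.tensor([0.0, 0.0, 2.0, 2.0])
loss = ciou_loss(pred, gt)   # identical boxes -> loss 0
```

For identical boxes all three terms vanish; shifting or reshaping the predicted box raises the loss even when the IoU is unchanged, which is the anti-drift behavior described above.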

2.3. Evaluation Metrics

To comprehensively quantify the performance of the POD-YOLO model in peanut disease detection tasks, this study established a multidimensional evaluation framework encompassing detection accuracy, model efficiency, and comprehensive robustness. In response to the specific morphological characteristics of peanut leaf spots, namely their multi-class nature, small scale, and blurred boundaries, this study selected the following core metrics:
Precision measures the proportion of true disease samples among the samples predicted as disease by the model. In the context of complex field backgrounds, high precision signifies that the model can effectively discriminate disease lesions from soil, plastic mulch, or healthy leaves. The calculation formula is:
$$Precision = \frac{TP}{TP + FP}$$
where $TP$ denotes True Positives, and $FP$ denotes False Positives.
Recall measures the model’s capacity to excavate hard-to-classify samples. Given that this study introduced Focal Loss to address the positive-negative sample imbalance, variations in the Recall metric are particularly pivotal: improvement in Recall directly validates the effectiveness of the optimization strategies in alleviating the “missed detection” problem. The calculation formula is as follows:
$$Recall = \frac{TP}{TP + FN}$$
where $FN$ denotes the number of true disease samples that went undetected.
The F1-score evaluates the comprehensive performance of the model as the harmonic mean of Precision and Recall:

$$F1 = \frac{2 \times Precision \times Recall}{Precision + Recall}$$

When both precision and recall are favorable, the F1-score approaches 1; it thus reflects the equilibrium between the model’s precision and recall.
To comprehensively reflect detection performance across classes under varying confidence thresholds, Average Precision (AP) and mean Average Precision (mAP) were used. These are obtained by calculating the area under the Precision-Recall (P-R) curve:

$$AP = \int_{0}^{1} P(R) \, dR$$

mAP is obtained by averaging the AP values across all $N_{class}$ target classes:

$$mAP = \frac{1}{N_{class}} \sum_{i=1}^{N_{class}} AP_{i}$$

In addition, we utilized the Focal Loss and CIoU formulations as defined above.
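The metrics above reduce to a few lines of code. The sketch below is an illustrative implementation operating on aggregate TP/FP/FN counts and sampled P-R points; it is not the study's actual evaluation pipeline, which first matches detections to ground truth at an IoU threshold.

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, Recall, and their harmonic mean (F1) from detection counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return precision, recall, f1

def average_precision(recalls, precisions):
    """AP as the area under a sampled P-R curve (trapezoidal rule).

    `recalls` must be sorted in increasing order; each entry pairs with
    the precision observed at that recall level.
    """
    ap = 0.0
    for i in range(1, len(recalls)):
        ap += (recalls[i] - recalls[i - 1]) * (precisions[i] + precisions[i - 1]) / 2
    return ap

def mean_average_precision(ap_per_class):
    """mAP: the mean of per-class AP values."""
    return sum(ap_per_class) / len(ap_per_class)
```

For example, 80 true positives with 20 false positives and 20 false negatives yield precision, recall, and F1 of 0.8 each.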

2.4. Experimental Setup

To ensure the fairness and reproducibility of the experimental results, all experiments were conducted on a unified hardware platform and software framework. The experimental workstation was equipped with an NVIDIA GeForce RTX 3090 GPU for high-performance parallel computing, an Intel Core i9-10900K CPU, and 64 GB of RAM. The operating environment was based on Linux, using the PyTorch 1.12.0+cu113 deep learning framework with CUDA 11.3. Regarding the training strategy, the AdamW optimizer was adopted with an initial learning rate of $1 \times 10^{-3}$ and a weight decay coefficient of $5 \times 10^{-2}$. Furthermore, a cosine annealing strategy was introduced to dynamically schedule the learning rate, with the minimum learning rate set to 1% of the initial value.
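Under these settings, the learning-rate schedule can be sketched as follows. This is a minimal re-implementation of the cosine annealing formula for illustration only; in an actual PyTorch pipeline one would pair torch.optim.AdamW (lr=1e-3, weight_decay=5e-2) with torch.optim.lr_scheduler.CosineAnnealingLR, setting eta_min to 1% of the initial rate.

```python
import math

def cosine_annealed_lr(epoch, total_epochs, lr_init=1e-3, min_frac=0.01):
    """Cosine-annealed learning rate decaying from lr_init to 1% of lr_init.

    lr(t) = lr_min + 0.5 * (lr_init - lr_min) * (1 + cos(pi * t / T)),
    so the rate starts at lr_init (t = 0) and reaches lr_min at t = T.
    """
    lr_min = lr_init * min_frac
    cos_term = 0.5 * (1 + math.cos(math.pi * epoch / total_epochs))
    return lr_min + (lr_init - lr_min) * cos_term
```

The smooth, monotone decay (no step discontinuities) is what underlies the oscillation-free loss convergence reported later in Section 3.1.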

3. Results and Analysis

3.1. Detection Performance Analysis

To comprehensively verify the superiority of POD-YOLO, we compared it against a wide range of state-of-the-art (SOTA) object detection models. These include mainstream lightweight models (YOLOv5n, YOLOv8n, and YOLOv10n), the Transformer-based RT-DETR, and the latest open-vocabulary foundation model YOLO-World. The comparative results on the test set are summarized in Table 1.
As shown in Table 1, POD-YOLO achieved the best overall performance among all compared methods, recording the highest mAP@50 (85.22%) and F1-score (83.68%). POD-YOLO outperforms the baseline YOLOv10n (82.76%) by 2.46 percentage points and surpasses the strong competitor YOLOv8n (84.44%) by 0.78 percentage points. Notably, our model achieves a much higher Recall (83.08%) than YOLOv5n, YOLOv8n, and YOLOv10n, demonstrating that the proposed LMR mechanism effectively reduces missed detections in complex agricultural environments. Even against computationally intensive models, POD-YOLO maintains its lead: it outperforms RT-DETR (mAP@50 82.94%) by 2.28 percentage points. Furthermore, while the foundation model YOLO-World exhibits a slightly higher Precision (84.71%), POD-YOLO surpasses it in both Recall (83.08% vs. 81.58%) and mAP@50 (85.22% vs. 84.60%). This indicates that while foundation models are powerful, our specialized design captures subtle disease features more effectively than general-purpose pre-training. The 83.08% Recall ensures the detection of subtle, early-stage lesions that weaker models often overlook; in disease management, minimizing false negatives is crucial to preventing rapid pathogen spread across the field. Concurrently, the Precision of 84.29% signifies a reduced false-alarm rate. Specifically, within the context of variable-rate spraying, the 9.7% precision gain over the baseline translates into nearly 10% less wasteful spraying by avoiding non-target areas such as healthy foliage and backgrounds. Such efficiency is vital for lowering input costs and mitigating environmental contamination.
From the perspective of deployment efficiency, POD-YOLO demonstrates a decisive advantage. With a computational cost of only 8.4 GFLOPs, our model is on par with the lightweight YOLOv8n and YOLOv10n. In stark contrast, POD-YOLO requires 92% less computation than RT-DETR (100.6 GFLOPs) and 74% less than YOLO-World (31.9 GFLOPs). This validates that POD-YOLO does not trade efficiency for accuracy; instead, it constructs a high-performance feature space with minimal computational resources, making it a strong choice for real-time peanut disease detection on resource-constrained edge devices.
Mechanism analysis reveals that the synergy between PDR and DDS significantly suppressed background false positives by compelling the model to learn contextual features rather than isolated textures. Meanwhile, AdamW’s decoupled weight decay effectively mitigated overfitting, clarifying decision boundaries and leading to the marked improvement in Precision. Among the comparative models, the superior Recall of POD-YOLO (83.08%) indicates the successful capture of hard examples neglected by others. This is attributed to Focal Loss, which dynamically increased the gradient weights of hard-to-classify samples, ensuring the model could not “ignore” these edge cases during training.
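The hard-example re-weighting attributed to Focal Loss above follows its standard binary form. The sketch below uses the common defaults alpha = 0.25 and gamma = 2 as assumptions; the study's exact hyperparameters are not stated here.

```python
import math

def focal_loss(p, y, alpha=0.25, gamma=2.0):
    """Binary focal loss on a single predicted probability p for label y.

    The modulating factor (1 - p_t)^gamma down-weights easy examples, so
    abundant, well-classified background contributes little gradient while
    hard, misclassified lesions dominate the update.
    """
    p_t = p if y == 1 else 1 - p            # probability of the true class
    alpha_t = alpha if y == 1 else 1 - alpha
    return -alpha_t * (1 - p_t) ** gamma * math.log(max(p_t, 1e-9))
```

An easy negative (background scored at 0.01) incurs a loss several orders of magnitude smaller than a hard positive (a lesion scored at only 0.1), which is exactly the "cannot ignore edge cases" behavior described above.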
Figure 6 illustrates the Precision-Recall (P-R) curves for all models evaluated on the test subset. POD-YOLO (represented by the red line) forms the optimal performance envelope, maintaining a significant precision advantage, particularly in the high-recall region (Recall > 0.8). This substantiates the model’s efficacy in overcoming the tendency of traditional detectors toward false positives during hard mining tasks. In contrast, in the same high-recall zone (Recall > 0.8), both the original YOLOv10n and YOLOv8n exhibit a steep decline, indicating that these models introduce a substantial volume of false detections when attempting to exhaustively retrieve all lesion instances.
The trajectories of evaluation metrics throughout the training cycle reveal that the model’s loss function exhibits smooth, oscillation-free convergence characteristics. Furthermore, the rapid ascent of mAP and Recall in the early training stages serves as robust validation of the superior convergence stability and generalization robustness conferred by the DDS and PDR strategies on small-sample datasets, as illustrated in Figure 7.
To provide a more intuitive visualization of the detection performance enhancement of POD-YOLO relative to the baseline (YOLOv10n), a comparative analysis of visual results on the test subset was conducted for both models, as illustrated in Figure 8.
Furthermore, to strictly control experimental variables and isolate the contributions of the proposed architectural modules (PDR, DDS, and LMR), this study adopted a unified hyperparameter protocol for all comparative models. While this standardized setting might slightly limit the potential of certain baselines, it ensures that the observed performance gains are attributed to the structural improvements of the POD-YOLO framework rather than inconsistencies in hyperparameter engineering. Moreover, the significant performance advantage achieved by POD-YOLO (e.g., 9.7% higher precision over the baseline) indicates that these improvements stem from the effective handling of occlusion and class imbalance, which are difficult to compensate for via minor hyperparameter adjustments. The experimental results presented above demonstrate that through the “Augmentation-Loss-Optimization” synergistic improvements, the POD-YOLO model achieves dual optimality in both accuracy and speed, without compromising computational efficiency.

3.2. Ablation Study

To validate the individual contributions and synergistic effects of each component within the POD-YOLO framework, we conducted a systematic ablation study under unified experimental conditions. Anchored on the original YOLOv10n as the Baseline, we progressively integrated the PDR module, the DDS strategy, and the LMR mechanism. The quantitative results are presented in Table 2. While the Baseline achieved an mAP@50 of 82.99%, its Precision was limited to 74.59%, indicating a significant propensity for false positives in complex backgrounds. Addressing this deficiency, the introduction of individual modules demonstrated distinct corrective capabilities. The PDR module, through Cutout-based physical occlusion simulation, effectively disrupted shortcut features in the images, elevating Precision to 78.10% and confirming its capability to suppress background false alarms. The LMR mechanism delivered the most substantial performance surge, with Precision increasing to 82.21% and mAP@50 reaching 85.06%. This validates the pivotal role of Focal Loss in suppressing gradients from the mass of easy negative samples and reshaping the optimization direction. The DDS strategy prioritized Recall optimization: by utilizing the decoupled weight decay mechanism of AdamW, it assisted the model in escaping local minima, lifting Recall to 82.44%.
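The Cutout-based occlusion simulation credited to the PDR module can be sketched as a single random patch erasure. This is a minimal illustration on a nested-list grayscale image, assuming one square patch per call; the actual module may combine multiple patches with further perturbations.

```python
import random

def cutout(image, patch_size, fill=0, rng=None):
    """Cutout-style occlusion: overwrite one random square patch with `fill`.

    `image` is a list of rows (H x W pixel values). The patch centre is
    sampled uniformly and the patch is clamped to the image bounds, so
    patches near an edge are partially clipped, as in the original Cutout.
    """
    rng = rng or random.Random()
    h, w = len(image), len(image[0])
    cy, cx = rng.randrange(h), rng.randrange(w)
    half = patch_size // 2
    for y in range(max(0, cy - half), min(h, cy + half + 1)):
        for x in range(max(0, cx - half), min(w, cx + half + 1)):
            image[y][x] = fill
    return image
```

By deleting a salient region at random, the network is forced to rely on surrounding context rather than a single shortcut texture, which is the mechanism the ablation attributes the Precision gain to.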
Further incremental experiments revealed significant non-linear synergistic effects among the modules. For instance, while the combination of Baseline + DDS + LMR achieved exceptional Recall (82.67%) and F1-score (81.57%), effectively compensating for the recall deficiency of the standalone LMR mechanism, its Precision (80.49%) still exhibited room for improvement. This implies that in the absence of Physical Domain Reconstruction (PDR), the model’s robustness against challenging occluded samples remains limited.
A close inspection of the ablation results reveals a notable non-linear interaction among the components. Specifically, the combination of “Baseline + PDR + LMR” (mAP@50: 83.27%) exhibited a performance dip compared to the “Baseline + LMR” configuration (mAP@50: 85.06%). This suggests a potential antagonistic effect between physical augmentation and loss reshaping when applied without dynamic optimization constraints. We attribute this phenomenon to “Gradient Noise Amplification”: the PDR module introduces synthetic occlusions (Cutout), creating artificially “hard” samples, while the LMR mechanism (Focal Loss) aggressively up-weights the gradients of these hard samples. Without advanced regularization, the standard optimizer struggles to distinguish between “useful hard features” (minute lesions) and “synthetic noise” (occlusion blocks), leading to optimization instability. However, the integration of the DDS strategy in the full POD-YOLO model resolves this conflict, boosting the mAP@50 to 85.22%. The decoupled weight decay mechanism in DDS acts as a regularization barrier, preventing the model from overfitting to the high-frequency noise generated by PDR. This confirms that DDS is not merely an optimizer but a critical synergistic bridge that enables the PDR and LMR modules to function cooperatively rather than antagonistically.
Ultimately, the proposed POD-YOLO (full module fusion) achieved an optimal balance across all evaluation metrics. It not only realized a dual breakthrough with 84.29% Precision and 83.08% Recall, but also attained a peak F1-score of 83.68%. Compared to the Baseline, its mAP@50 increased to a peak of 85.22%. As illustrated in Figure 9, the red curve representing POD-YOLO constitutes the upper envelope of all comparative models, notably maintaining only a gradual decline even in the high-recall region.

3.3. Generalization Ability Evaluation

To objectively evaluate the robustness and adaptability of the proposed model when facing unknown data distributions, we compared POD-YOLO with the baseline model on a second public dataset, PlantDoc. Unlike the primary peanut dataset, PlantDoc features highly unstructured field environments and diverse plant species, which poses a significant challenge to the model’s feature extraction capabilities. This evaluation is crucial for verifying the model’s generalization potential for real-world deployment in diverse agricultural scenarios. The corresponding comparative results are presented in Table 3.
The inherent non-stationarity of field environments represents a critical bottleneck constraining the long-term deployment of models. Climatic fluctuations between different years (e.g., rainfall, light intensity) often induce significant distribution shifts in plant growth status, leaf coloration, and lesion morphology. To validate the robustness of POD-YOLO against this “temporal domain shift,” this study adopted a cross-year data fusion strategy covering two complete disease occurrence cycles in 2023 and 2024. This design not only substantially enriched sample diversity but also constructed a rigorous test benchmark for evaluating the model’s generalization boundaries. Models trained on datasets collected during a single time window are highly prone to “temporal overfitting,” wherein they memorize background features specific to a particular year (such as soil color deviations caused by drought). By training on mixed data from both years, POD-YOLO is compelled to seek optimal decision boundaries within a data manifold characterized by significant spatiotemporal heterogeneity. Experimental results indicate that the model successfully decoupled irrelevant environmental noise caused by inter-annual variations (e.g., light color temperature, changes in background weed density) and learned intrinsic disease features independent of the temporal dimension—such as the necrotic core structure and the topology of chlorotic halos in brown spots—thereby maintaining excellent detection stability on the mixed test set.
Furthermore, addressing the inconsistency in peanut canopy density and leaf occlusion patterns caused by inter-annual climatic differences, the PDR module integrated into POD-YOLO played a crucial regulatory role. By introducing random occlusion perturbations during training that are more complex than the actual field environment, this strategy endowed the model with strong adaptability to differentiated occlusion scenarios. Even in complex settings specific to a certain year, characterized by high-density overlapping or vigorous vegetative growth, the model remained capable of precisely localizing minute lesions by leveraging powerful contextual reasoning capabilities. The results demonstrate that POD-YOLO’s superior performance on the cross-year mixed test set confirms that it did not merely memorize the data distribution of a specific timeframe; rather, it possesses the temporal generalization capability to operate continuously within dynamic agricultural ecosystems. This provides a reliable theoretical basis for the model’s actual field deployment in future years.

4. Discussion

4.1. Feasibility Analysis for Edge Deployment

Crucially, the core improvements (PDR, DDS, and LMR) are designed as training-time strategies that introduce zero additional parameters or layers during inference, and therefore incur no inference cost. Consequently, POD-YOLO retains the baseline’s 8.4 GFLOPs, ensuring that its inference speed on resource-constrained platforms (e.g., Jetson Nano, Raspberry Pi) matches the highly efficient YOLOv10n and allowing seamless migration to agricultural IoT devices.

4.2. Environmental Robustness and Cross-Domain Generalization

Generalization Boundaries and Environmental Adaptability: While the dataset is regionally specific, the PDR module provides a theoretical foundation for environmental generalization. By integrating HSV-space perturbations and Gaussian noise during training, the model is explicitly regularized against severe lighting drifts and sensor interference common in extreme field conditions. Regarding the extension to different crop varieties, we posit that POD-YOLO learns intrinsic disease topologies (e.g., necrotic cores) that possess biological universality. Future deployments can leverage the model as a pre-trained backbone for Transfer Learning, efficiently extending its applicability to new cultivars without training from scratch.
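The HSV-space perturbation and Gaussian noise regularization described above can be sketched as a simple photometric jitter. The parameter ranges below (hue shift, gain factors, noise sigma) are illustrative assumptions, not the study's actual settings.

```python
import random

def photometric_jitter(pixels, hue_shift=0.02, sat_gain=0.1, val_gain=0.1,
                       noise_sigma=2.0, rng=None):
    """HSV perturbation plus Gaussian sensor noise on a list of pixels.

    Pixels are (h, s, v) triples with h in [0, 1) and s, v in [0, 255].
    One global hue/saturation/value shift is sampled per image (mimicking a
    lighting drift), then per-pixel Gaussian noise is added to the value
    channel (mimicking sensor interference).
    """
    rng = rng or random.Random()
    dh = rng.uniform(-hue_shift, hue_shift)
    ds = 1 + rng.uniform(-sat_gain, sat_gain)
    dv = 1 + rng.uniform(-val_gain, val_gain)
    out = []
    for h, s, v in pixels:
        h = (h + dh) % 1.0
        s = min(255.0, max(0.0, s * ds))
        v = min(255.0, max(0.0, v * dv + rng.gauss(0.0, noise_sigma)))
        out.append((h, s, v))
    return out
```

Exposing the model to such drifts during training regularizes it against the lighting and sensor variation expected in extreme field conditions.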
The PDR module addresses the overfitting deadlock of limited data through a “perceptual adversarial” mechanism. This strategy is particularly critical for the small dataset (1443 images) used in this study: by physically occluding salient features, PDR prevents the model from overfitting to the limited training data, effectively acting as a strong regularizer that supports the reliability of the performance gains. Further analysis of the synergistic effects between the LMR mechanism and the DDS strategy reveals their unique advantages in addressing the long-tail distribution problem in agricultural datasets. Unlike previous studies that often attempted to mitigate class imbalance via re-sampling, this study establishes a complementary optimization loop. Focal Loss effectively suppresses the easy negatives that dominate the image by dynamically reshaping the gradient manifold, driving the model to focus on minute and blurry hard examples, while the AdamW mechanism in DDS provides a decoupled weight decay barrier. This ensures that the significant 9.7% precision gain derives from learning robust decision boundaries rather than memorizing the sparse data distribution. The improvement in mAP@50-95 in the ablation study, along with the convergence of the optimization trajectory towards flat minima, provides empirical evidence that the combination of geometric constraints and dynamic stability enhances localization robustness under stringent evaluation criteria. This finding holds broad implications for precision agriculture, as precise bounding box regression is a prerequisite for reducing pesticide waste in variable-rate spraying.
In the broader context of smart agriculture, POD-YOLO demonstrates Pareto optimality regarding computational efficiency and detection performance. Although existing SOTA models demonstrate superior theoretical accuracy through complex gradient flow designs, their increased parameter counts often hinder edge deployment. POD-YOLO inherits and solidifies the efficient architectural advantages of YOLOv10n, strictly controlling the computational load at 8.4 GFLOPs. YOLOv10n was selected as the foundation because its 8.4 GFLOPs footprint sits within the computational budget typical of edge devices; our contribution demonstrates that, by optimizing the training strategy alone, high usability can be achieved at this low computational cost without resorting to a heavier model. Although the current evaluation relies on static images, the PDR module inherently enhances robustness against dynamic artifacts. Specifically, the RandomAffine transformations simulate geometric distortions induced by wind-driven leaf movement, while Gaussian noise injection improves tolerance to motion blur and sensor instability. Furthermore, the core contributions of this study, DDS and LMR, are designed as training-time optimization strategies independent of the spatial architecture. Consequently, these strategies can be seamlessly extended to video-based detection frameworks (e.g., integrating temporal attention blocks or Kalman filtering) to further stabilize detection consistency in continuous agricultural monitoring streams.
POD-YOLO’s “zero-inference-cost” architecture facilitates seamless integration into agricultural automation. In practical deployment, the model can be embedded on edge units within autonomous robots, where its high efficiency (>30 FPS) serves as a real-time trigger for Variable Rate Application systems. This “Spot-Spraying” mechanism activates nozzles only upon disease detection, significantly reducing chemical usage. Future research will focus on three key directions: Extreme Compression, exploring INT8 quantization for ultra-low-power microcontrollers; Multi-modal Fusion, integrating RGB-D or multispectral data to capture pre-symptomatic stress signals; and Active Learning, implementing a Human-in-the-Loop system to ensure continuous adaptation to new cultivars and pathogen mutations.

5. Conclusions

Addressing the challenges of minute feature dissipation, physical occlusion, and class imbalance in peanut disease monitoring, this study proposes POD-YOLO, a physics-aware and dynamics-optimized lightweight detection model. Instead of significantly altering the model depth, we utilize YOLOv10n as a foundation and implement a synergistic “Augmentation-Loss-Optimization” strategy to optimize the baseline.
The primary contribution of this study lies in validating the effectiveness of explicit physical prior injection and decoupled optimization dynamics for enhancing the generalization capability of lightweight models. Through the introduction of the PDR module, we successfully disrupted the “shortcut learning” propensity of CNNs, constructing a robust feature space. Simultaneously, the complementary loop formed by the DDS strategy and the LMR mechanism successfully resolved overfitting and gradient imbalance on small-sample datasets, enabling the model to achieve SOTA performance on the stringent mAP@50-95 metric while maintaining high recall. Crucially, POD-YOLO achieves these performance breakthroughs while strictly maintaining a low computational load of 8.4 GFLOPs. This “zero-inference-cost” characteristic demonstrates the vast potential of exploiting lightweight architectures solely through optimizing inductive biases during training, providing an efficient technical paradigm for developing low-power, real-time portable smart plant protection equipment. Future work will focus on integrating temporal feature fusion to address dynamic motion blur in fields and exploring the generalization capabilities of this synergistic optimization framework on other economic crops, thereby promoting the ubiquitous application of deep learning in smart agriculture.

Author Contributions

Conceptualization, Y.L. and W.Z.; methodology, Y.L. and W.Z.; software, Y.L.; validation, Y.L., W.Z., L.Z. and S.X.; formal analysis, S.X. and Z.W.; investigation, W.Z.; resources, L.Z.; data curation, W.Z. and Z.W.; writing—original draft preparation, H.Z.; writing—review and editing, L.Z. and S.X.; visualization, L.Z.; supervision, L.Z.; project administration, L.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This paper is supported by the second batch of the “Tianchi Talents” Introduction Program of Xinjiang Uygur Autonomous Region in 2023.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The datasets generated for this study are part of an ongoing research project and are therefore not publicly available at this time to protect the integrity of future findings. Researchers interested in collaboration or data access may contact the corresponding author with a detailed proposal.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Larsen, J.; Dunne, J.; Austin, R.; Newman, C.; Kudenov, M. Automated Pipeline for Leaf Spot Severity Scoring in Peanuts Using Segmentation Neural Networks. Plant Methods 2025, 21, 22. [Google Scholar] [CrossRef]
  2. Guo, Z.; Chen, X.; Li, M.; Chi, Y.; Shi, D. Construction and Validation of Peanut Leaf Spot Disease Prediction Model Based on Long Time Series Data and Deep Learning. Agronomy 2024, 14, 294. [Google Scholar] [CrossRef]
  3. Ma, X.; Zhang, X.; Guan, H.; Wang, L. Recognition Method of Crop Disease Based on Image Fusion and Deep Learning Model. Agronomy 2024, 14, 1518. [Google Scholar] [CrossRef]
  4. Guan, Q.; Zhao, D.; Feng, S.; Xu, T.; Wang, H.; Song, K. Hyperspectral Technique for Detection of Peanut Leaf Spot Disease Based on Improved PCA Loading. Agronomy 2023, 13, 1153. [Google Scholar] [CrossRef]
  5. Feng, Q.; Xu, P.; Ma, D.; Lan, G.; Wang, F.; Wang, D.; Yun, Y. Online Recognition of Peanut Leaf Diseases Based on the Data Balance Algorithm and Deep Transfer Learning. Precis. Agric. 2023, 24, 560–586. [Google Scholar] [CrossRef]
  6. Chen, Z.; Wu, R.; Lin, Y.; Li, C.; Chen, S.; Yuan, Z.; Chen, S.; Zou, X. Plant Disease Recognition Model Based on Improved YOLOv5. Agronomy 2022, 12, 365. [Google Scholar] [CrossRef]
  7. Wrat, G.; Ranjan, P.; Mishra, S.K.; Jose, J.T.; Das, J. Neural Network-Enhanced Internal Leakage Analysis for Efficient Fault Detection in Heavy Machinery Hydraulic Actuator Cylinders. Proc. Inst. Mech. Eng. Part C J. Mech. Eng. Sci. 2025, 239, 1021–1031. [Google Scholar] [CrossRef]
  8. Meng, Y.; Zhan, J.; Li, K.; Yan, F.; Zhang, L. A Rapid and Precise Algorithm for Maize Leaf Disease Detection Based on YOLO MSM. Sci. Rep. 2025, 15, 6016. [Google Scholar] [CrossRef]
  9. Tugrul, B.; Elfatimi, E.; Eryigit, R. Convolutional Neural Networks in Detection of Plant Leaf Diseases: A Review. Agriculture 2022, 12, 1192. [Google Scholar] [CrossRef]
  10. Luo, J.; Wu, Q.; Wang, Y.; Zhou, Z.; Zhuo, Z.; Guo, H. MSHF-YOLO: Cotton Growth Detection Algorithm Integrated Multi-Semantic and High-Frequency Features. Digit. Signal Process. 2025, 167, 105423. [Google Scholar] [CrossRef]
  11. Guan, Q.; Song, K.; Feng, S.; Yu, F.; Xu, T. Detection of Peanut Leaf Spot Disease Based on Leaf-, Plant-, and Field-Scale Hyperspectral Reflectance. Remote Sens. 2022, 14, 4988. [Google Scholar] [CrossRef]
  12. Huang, L.; Xu, L.; Wang, Y.; Peng, Y.; Zou, Z.; Huang, P. Efficient Detection Method of Pig-Posture Behavior Based on Multiple Attention Mechanism. Comput. Intell. Neurosci. 2022, 2022, 1759542. [Google Scholar] [CrossRef] [PubMed]
  13. Lawal, M.O. Tomato Detection Based on Modified YOLOv3 Framework. Sci. Rep. 2021, 11, 1447. [Google Scholar] [CrossRef] [PubMed]
  14. Liu, Z.; Wang, L.; Liu, Z.; Wang, X.; Hu, C.; Xing, J. Detection of Cotton Seed Damage Based on Improved YOLOv5. Processes 2023, 11, 2682. [Google Scholar] [CrossRef]
  15. Liu, J.; Feng, X.; Sun, Y.; Ni, R.; Chen, X. Research on Plant Disease Detection Based on Improved YOLOv8. In Proceedings of the 2025 37th Chinese Control and Decision Conference (CCDC), Xiamen, China, 16–19 May 2025; IEEE: Piscataway, NJ, USA, 2025; pp. 5343–5347. [Google Scholar]
  16. Qi, J.; Liu, X.; Liu, K.; Xu, F.; Guo, H.; Tian, X.; Li, M.; Bao, Z.; Li, Y. An Improved YOLOv5 Model Based on Visual Attention Mechanism: Application to Recognition of Tomato Virus Disease. Comput. Electron. Agric. 2022, 194, 106780. [Google Scholar] [CrossRef]
  17. Wang, Q.; Liu, Y.; Zheng, Q.; Tao, R.; Liu, Y. SMC-YOLO: A High-Precision Maize Insect Pest-Detection Method. Agronomy 2025, 15, 195. [Google Scholar] [CrossRef]
  18. Wang, Z.; Han, D.; Han, B.; Wu, Z.; Huang, X. SVN-YOLO: A High-Precision Ship Detection Algorithm Based on Improved YOLOv10n. J. Supercomput. 2025, 81, 1294. [Google Scholar] [CrossRef]
  19. Tang, Z.; Chen, C.; Chen, Y.; Zhong, J. Research on Target Detection of Sewage Outfall Based on Improved YoloV10n. J. Phys. Conf. Ser. 2025, 3106, 012012. [Google Scholar] [CrossRef]
  20. Sun, H.; Yao, G.; Zhu, S.; Zhang, L.; Xu, H.; Kong, J. SOD-YOLOv10: Small Object Detection in Remote Sensing Images Based on YOLOv10. IEEE Geosci. Remote Sens. Lett. 2025, 22, 8000705. [Google Scholar] [CrossRef]
  21. Guo, Y.; Lan, Y.; Chen, X. CST: Convolutional Swin Transformer for Detecting the Degree and Types of Plant Diseases. Comput. Electron. Agric. 2022, 202, 107407. [Google Scholar] [CrossRef]
  22. Islam, M.S.; Abid, M.A.S.; Rahman, M.; Barua, P.; Islam, K.; Zilolakhon, R.; Salayeva, L. YOLOv10-Powered Detection of Tea Leaf Diseases: Enhancing Crop Quality through AI. In Proceedings of the 2025 4th International Conference on Advances in Computing, Communication, Embedded and Secure Systems (ACCESS), Ernakulam, India, 11–13 June 2025; IEEE: Piscataway, NJ, USA, 2025; pp. 406–411. [Google Scholar]
  23. Sun, H.; Xu, H.; Liu, B.; He, D.; He, J.; Zhang, H.; Geng, N. MEAN-SSD: A Novel Real-Time Detector for Apple Leaf Diseases Using Improved Light-Weight Convolutional Neural Networks. Comput. Electron. Agric. 2021, 189, 106379. [Google Scholar] [CrossRef]
  24. Sun, S.; Zheng, S.; Xu, X.; He, Z. GD-YOLO: A Lightweight Model for Household Waste Image Detection. Expert Syst. Appl. 2025, 279, 127525. [Google Scholar] [CrossRef]
  25. Wang, A.; Chen, H.; Liu, L.; Chen, K.; Lin, Z.; Han, J.; Ding, G. YOLOv10: Real-Time End-to-End Object Detection. arXiv 2024, arXiv:2405.14458. [Google Scholar]
  26. Singh, J.; Beeche, C.; Shi, Z.; Beale, O.; Rosin, B.; Leader, J.; Pu, J. Batch-Balanced Focal Loss: A Hybrid Solution to Class Imbalance in Deep Learning. J. Med. Imaging 2023, 10, 051809. [Google Scholar] [CrossRef] [PubMed]
  27. Huang, Y.; Zhao, H.; Wang, J. YOLOv8-E: An Improved YOLOv8 Algorithm for Eggplant Disease Detection. Appl. Sci. 2024, 14, 8403. [Google Scholar] [CrossRef]
  28. P, Y.S.G.; Seemakurthy, K.; Opoku, A.A.; Bharatula, S.D. BBoxCut: A Targeted Data Augmentation Technique for Enhancing Wheat Head Detection Under Occlusions. arXiv 2025, arXiv:2503.24032. [Google Scholar] [CrossRef]
  29. Lu, Y.; Yu, J.; Zhu, X.; Zhang, B.; Sun, Z. YOLOv8-Rice: A Rice Leaf Disease Detection Model Based on YOLOv8. Paddy Water Environ. 2024, 22, 695–710. [Google Scholar] [CrossRef]
  30. DeVries, T.; Taylor, G.W. Improved Regularization of Convolutional Neural Networks with Cutout. arXiv 2017, arXiv:1708.04552. [Google Scholar] [CrossRef]
  31. Loshchilov, I.; Hutter, F. Decoupled Weight Decay Regularization. arXiv 2017, arXiv:1711.05101. [Google Scholar]
  32. Liu, Y.; Li, X.; Fan, Y.; Liu, L.; Shao, L.; Yan, G.; Geng, Y.; Zhang, Y. Classification of Peanut Pod Rot Based on Improved YOLOv5s. Front. Plant Sci. 2024, 15, 1364185. [Google Scholar] [CrossRef]
Figure 1. Visualization of the dataset preprocessing and augmentation pipeline. (a) Original peanut leaf spot image; (b) Visual attention heatmap; (c) Focus map; (d) 45° rotated augmented sample; (e) 10° rotated augmented sample; (f) Augmented sample with superimposed Gaussian noise.
Figure 2. POD-YOLO Model Structure Diagram.
Figure 3. PDR Module Flowchart.
Figure 4. Schematic diagram comparing the working principles of Standard Adam and AdamW mechanisms (The symbol ‘*’ in the formula denotes the multiplication operator).
Figure 5. Schematic illustration of the geometric principles of the CIoU loss mechanism within the LMR framework.
Figure 6. Comparison of Precision-Recall (P-R) curves between the POD-YOLO model and mainstream object detection models for leaf disease detection.
Figure 7. Training performance of the POD-YOLO model: Loss curves, Precision curves, and mAP curves.
Figure 8. Comparison of detection results between the POD-YOLO model and the baseline model.
Figure 9. Comparison of Precision-Recall curves obtained from the ablation study.
Table 1. Comparison of detection performance between the POD-YOLO model and mainstream object detection models (Best values are shown in bold).
| Model | Precision (%) | Recall (%) | F1-Score (%) | GFLOPs | mAP@50 (%) |
|---|---|---|---|---|---|
| YOLOv5n | 79.69 | 75.59 | 77.59 | **7.2** | 81.78 |
| YOLOv8n | 83.27 | 79.66 | 81.43 | 8.2 | 84.44 |
| YOLOv10n | 74.59 | 79.02 | 76.74 | 8.4 | 82.76 |
| RT-DETR | 81.92 | 82.65 | 82.28 | 100.6 | 82.94 |
| YOLO-World | **84.71** | 81.58 | 83.12 | 31.9 | 84.60 |
| POD-YOLO (Ours) | 84.29 | **83.08** | **83.68** | 8.4 | **85.22** |
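The F1-Score column in Table 1 is the harmonic mean of the reported precision and recall, which can be checked directly; this one-liner is a verification sketch, not code from the paper.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (both in percent)."""
    return 2 * precision * recall / (precision + recall)
```

For example, the POD-YOLO row gives `f1_score(84.29, 83.08)` of approximately 83.68, and the YOLOv10n baseline row gives approximately 76.74, both matching the table to two decimal places.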
Table 2. Ablation study results (Best values are shown in bold).
| Model | Precision (%) | Recall (%) | mAP@50 (%) | F1-Score (%) |
|---|---|---|---|---|
| Baseline | 74.59 | 79.02 | 82.99 | 76.74 |
| Baseline + PDR Module | 78.10 | 82.22 | 84.55 | 80.11 |
| Baseline + DDS Strategy | 78.68 | 82.44 | 84.21 | 80.52 |
| Baseline + LMR Mechanism | 82.21 | 81.58 | 85.06 | 81.89 |
| Baseline + PDR Module + DDS Strategy | 80.95 | 80.38 | 83.39 | 80.66 |
| Baseline + PDR Module + LMR Mechanism | 78.77 | 81.15 | 83.27 | 79.94 |
| Baseline + DDS Strategy + LMR Mechanism | 80.49 | 82.67 | 84.24 | 81.57 |
| POD-YOLO (Ours) | **84.29** | **83.08** | **85.22** | **83.68** |
Table 3. Performance Comparison on the Public PlantDoc Dataset (Best values are shown in bold).
| Model | Precision (%) | Recall (%) | F1-Score (%) | mAP@50 (%) |
|---|---|---|---|---|
| YOLOv10n | 71.52 | 62.35 | 66.62 | 65.32 |
| POD-YOLO (Ours) | **75.39** | **69.38** | **72.26** | **70.86** |
Liang, Y.; Zhao, L.; Zhao, W.; Xu, S.; Zheng, H.; Wang, Z. YOLOv10n-Based Peanut Leaf Spot Detection Model via Multi-Dimensional Feature Enhancement and Geometry-Aware Loss. Appl. Sci. 2026, 16, 1162. https://doi.org/10.3390/app16031162
