3.2. Ablation Experiments
To validate the effectiveness of the improvements proposed in this paper for soybean pod detection, ablation experiments were conducted. The objective of these experiments was to investigate the impact of replacing, relocating, or adding specific modules on the performance of the detection algorithm. The following tables detail the results of the ablation experiments, assessing the efficacy of these modules.
- (1) Impact of Different Backbone Networks
After establishing the YOLOv8 base model, we sought to enhance its performance by integrating mainstream backbone networks. These networks provide richer and more discernible feature representations, which are crucial for subsequent object detection tasks. Each of the incorporated networks brings unique innovations and advantages to the field of object detection.
YOLOv8n-LAWDS: Incorporates the LAWDS module, achieving local attention and dense shortcut connections, thereby improving accuracy and robustness.
YOLOv8n-CSP-EDLAN: Utilizes the CSPDarknet53 and EDLAN modules for effective feature aggregation, enhancing performance and efficiency.
YOLOv8n-MLCA-ATTENTION: Combines the MLCA and attention mechanisms, enriching feature information and boosting precision and robustness.
YOLOv8n-RevCol: Optimizes the use of low-level information through reverse connections, enhancing overall performance.
YOLOv8n-EfficientViT: Adopts the lightweight EfficientViT design, merging transformer components with efficient convolutional structures to improve accuracy and efficiency.
YOLOv8n-LSKNet: Utilizes the lightweight Large Selective Kernel Network (LSKNet) for selective spatial aggregation, increasing performance and efficiency.
YOLOv8n-VanillaNet: Employs the lightweight VanillaNet structure, simplifying the network and increasing speed and efficiency.
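All of these backbone swaps follow the same replacement pattern: the feature extractor is exchanged while the neck and detection head remain fixed. The sketch below illustrates this pattern in plain PyTorch; `TinyBackbone` and the identity neck/head placeholders are hypothetical stand-ins for illustration only, not the modules evaluated in Table 3.

```python
import torch
import torch.nn as nn

class TinyBackbone(nn.Module):
    """Hypothetical stand-in for LAWDS, RevCol, EfficientViT, etc."""
    def __init__(self, channels=(64, 128, 256)):
        super().__init__()
        self.stages = nn.ModuleList()
        c_in = 3
        for c_out in channels:
            self.stages.append(nn.Sequential(
                nn.Conv2d(c_in, c_out, 3, stride=2, padding=1, bias=False),
                nn.BatchNorm2d(c_out),
                nn.SiLU(),
            ))
            c_in = c_out

    def forward(self, x):
        feats = []                      # multi-scale features for the neck
        for stage in self.stages:
            x = stage(x)
            feats.append(x)
        return feats

class Detector(nn.Module):
    """Neck and head stay fixed; only the backbone module is exchanged."""
    def __init__(self, backbone, neck, head):
        super().__init__()
        self.backbone, self.neck, self.head = backbone, neck, head

    def forward(self, x):
        return self.head(self.neck(self.backbone(x)))

# Identity neck/head keep the sketch self-contained and runnable.
model = Detector(TinyBackbone(), nn.Identity(), nn.Identity())
print([f.shape for f in model(torch.randn(1, 3, 640, 640))])
```

Because the neck and head are held constant, any performance difference in Table 3 can be attributed to the backbone alone.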
As shown in Table 3, although these backbone networks play significant roles in different scenarios and applications, the actual improvements in pod detection performance were not pronounced in our specific experimental setup. This suggests that the effectiveness of these networks may vary depending on the specific requirements and characteristics of the detection task.
- (2) Modifications to the C2f Module
Subsequent modifications were made to the convolutions within the backbone network. The following variations were introduced:
YOLOv8n-C2f-GOLDYOLO: Incorporates the GOLDYOLO algorithm, enhancing the network structure and loss function to improve accuracy and robustness in object detection tasks.
YOLOv8n-C2f-EMSC: Integrates the Enhanced Multi-Scale Context (EMSC) module, enhancing object detection by fusing multi-scale features and incorporating contextual information.
YOLOv8n-C2f-SCcConv: Introduces the Spatial Channel-wise Convolution (SCcConv) module, which improves model perception and accuracy by enhancing spatial relationships among channels.
YOLOv8n-C2f-EMBC: Utilizes the Enhanced Multi-Branch Context (EMBC) module, improving object detection accuracy and robustness by refining multi-branch feature extraction and context utilization.
YOLOv8n-C2f-DCNV3: Incorporates the Deformable Convolutional Network V3 (DCNv3) module, expanding the receptive field and enhancing detection accuracy through deformable sampling and multi-scale feature fusion.
YOLOv8n-C2f-DAttention: Introduces the Dual Attention (DAttention) module, enhancing object detection perception and accuracy through channel and spatial attention mechanisms.
YOLOv8n-C2f-DBB: Utilizes the Dilated Bi-directional Block (DBB) module, expanding the field of view and improving the efficiency of context information utilization through dilated convolution and bi-directional connections.
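Mechanically, each of these variants makes the same change: the standard Bottleneck inside the C2f block is swapped for the named module (EMSC, DBB, DCNv3, and so on). A simplified C2f with a pluggable inner block, written from the publicly documented YOLOv8 structure, is sketched below; any of the variant blocks would be passed via the `block` argument.

```python
import torch
import torch.nn as nn

class ConvBNAct(nn.Module):
    """Conv + BatchNorm + SiLU, the basic YOLOv8 convolution unit."""
    def __init__(self, c1, c2, k=1):
        super().__init__()
        self.conv = nn.Conv2d(c1, c2, k, 1, k // 2, bias=False)
        self.bn = nn.BatchNorm2d(c2)
        self.act = nn.SiLU()

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))

class Bottleneck(nn.Module):
    """Default inner block; a DBB/DCNv3/EMSC variant would replace this."""
    def __init__(self, c):
        super().__init__()
        self.cv1 = ConvBNAct(c, c, 3)
        self.cv2 = ConvBNAct(c, c, 3)

    def forward(self, x):
        return x + self.cv2(self.cv1(x))

class C2f(nn.Module):
    """Simplified C2f: split, run n inner blocks, concatenate all branches."""
    def __init__(self, c1, c2, n=1, block=Bottleneck):
        super().__init__()
        self.c = c2 // 2
        self.cv1 = ConvBNAct(c1, 2 * self.c, 1)
        self.cv2 = ConvBNAct((2 + n) * self.c, c2, 1)
        self.m = nn.ModuleList(block(self.c) for _ in range(n))

    def forward(self, x):
        y = list(self.cv1(x).chunk(2, dim=1))
        for m in self.m:
            y.append(m(y[-1]))        # each block feeds the next branch
        return self.cv2(torch.cat(y, dim=1))

# Smoke test: swapping `block` is how each C2f variant is realized.
print(C2f(64, 128, n=2)(torch.randn(1, 64, 80, 80)).shape)  # [1, 128, 80, 80]
```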
The experimental results presented in Table 4 demonstrate that the inclusion of all the aforementioned modules can improve the detection accuracy of YOLOv8 to a certain extent. Relatively speaking, the DBB module exhibits greater potential in enhancing the network's feature extraction capabilities.
- (3) Impact of Different Detector Head Modules
Following the investigation of the effects of various backbone networks on model performance, common detector head modules were selected for integration to assess their impact on target detection performance. Specifically, the DyHead, P2, and Efficient Head detector head modules were incorporated into the model. As evident from Table 5, the DyHead detection head contributes most significantly to the performance enhancement of the model. Accordingly, in the subsequent experiments, the fused improvement was obtained by replacing the detection head with Detect_DyHead while also incorporating the C2f-DBB module.
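DyHead stacks scale-aware, spatial-aware, and task-aware attention over the pyramid features entering the head, which is plausibly why it helps with pods of widely varying sizes. As a rough illustration, the sketch below implements only a simplified scale-aware component (per-level re-weighting), loosely following the DyHead formulation rather than reproducing the full module.

```python
import torch
import torch.nn as nn

class ScaleAwareAttention(nn.Module):
    """Simplified scale-aware re-weighting, loosely following DyHead.
    Each pyramid level is scaled by a weight learned from its own
    globally pooled statistics, letting the head emphasize the levels
    most informative for the current object sizes."""
    def __init__(self, channels):
        super().__init__()
        self.fc = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),        # global context per level
            nn.Conv2d(channels, 1, kernel_size=1),
            nn.Hardsigmoid(),               # weight in [0, 1]
        )

    def forward(self, feats):
        # feats: list of tensors [B, C, Hi, Wi], one per pyramid level
        return [f * self.fc(f) for f in feats]

attn = ScaleAwareAttention(256)
levels = [torch.randn(1, 256, s, s) for s in (80, 40, 20)]
print([o.shape for o in attn(levels)])
```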
- (4) Effects of Loss Functions and Attention Mechanisms
Loss functions and attention mechanisms play pivotal roles in optimizing the model learning process and enhancing its adaptability to specific agricultural scenarios. As shown in Table 6, attempts to modify the loss function did not yield ideal results and did not significantly enhance model performance. Consequently, the focus shifted toward refining the model through the integration of attention mechanisms.
Various attention mechanisms were introduced to bolster target detection performance through diverse approaches. These included Multi-Scale Positional Context Attention (MPCA), Channel Dimension Positional Context Attention (CPCA), BiLevel Routing Attention (BiLevelRoutingAttention_nchw), Squeeze-and-Excitation Attention (SEAttention), Bottleneck Attention Module (BAMBlock), and the Separated and Enhancement Attention Module (SEAM).
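Most of these mechanisms re-weight features along the channel and/or spatial dimensions. As a representative example, a standard Squeeze-and-Excitation block (the SEAttention entry above) is sketched below; the reduction ratio of 16 is the conventional default, not a value tuned in our experiments.

```python
import torch
import torch.nn as nn

class SEAttention(nn.Module):
    """Standard Squeeze-and-Excitation channel attention."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)          # squeeze: global context
        self.fc = nn.Sequential(                     # excitation: channel gates
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w                                 # re-weight each channel

x = torch.randn(2, 64, 40, 40)
print(SEAttention(64)(x).shape)   # torch.Size([2, 64, 40, 40])
```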
After incorporating these different attention mechanisms, it was observed that the SEAM attention mechanism was particularly advantageous for detecting small objects. By specifically enhancing the features of small targets, SEAM makes them more prominent within the model, thereby improving the visibility and recognition rates of small targets. By effectively addressing the challenges associated with small targets, SEAM enhances the model's detection capabilities across various scales, improving its generalization and reliability in practical agricultural applications. Ultimately, the integration of SEAM led to the development of the YOLOv8n-POD model.
3.4. Results of Pod Counting
To gain a deeper understanding of the differences in counting results among the various models, we selected representative images of soybean plants, including those with sparse pods (A and B) and dense pods (C and D), as illustrated in Figure 12. The red numbers in the top left corner of each image indicate the counting result for that image. In pod detection, predicted counts are often underestimated because the detector's non-maximum suppression filters out overlapping bounding boxes. The visual results further confirm that the YOLOv8n-POD model adapts robustly to variations in soybean pod size and density; by capturing additional global information, it achieves outstanding performance in comparison with the other models.
In this study, we introduced the YOLOv8n-POD model, specifically optimized for the detection of soybean pods, and conducted a comprehensive evaluation of its detection performance. The results were compared with those of traditional object detection models, including SSD, YOLOv3, YOLOv4, YOLOv5, YOLOv7-tiny, YOLOv8, and YOLOX, demonstrating the superior efficacy of our model. Specifically, manual counting of soybean pods in the four test images yielded 11, 24, 55, and 50 pods, respectively. In comparison, SSD detected 4, 14, 30, and 23 pods; YOLOv3 identified 9, 17, 28, and 24 pods; YOLOv4 counted 10, 20, 38, and 29 pods; YOLOv5 detected 11, 24, 46, and 40 pods; YOLOv7-tiny counted 11, 21, 46, and 39 pods; YOLOv8 identified 11, 22, 57, and 42 pods; YOLOX detected 5, 9, 13, and 21 pods; and the YOLOv8n-POD model detected 11, 25, 53, and 45 pods. The minimal errors between the YOLOv8n-POD results and the manual counts further underscore the high accuracy and practicability of the model.
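In practice, each per-image count is simply the number of bounding boxes that survive confidence thresholding and non-maximum suppression. A minimal counting script using the Ultralytics API is sketched below; the weights filename, image names, and threshold values are illustrative placeholders rather than the exact settings used in this study.

```python
from ultralytics import YOLO

# Illustrative path; substitute the trained YOLOv8n-POD weights.
model = YOLO("yolov8n-pod.pt")

results = model.predict(
    source=["plant_A.jpg", "plant_B.jpg", "plant_C.jpg", "plant_D.jpg"],
    conf=0.25,   # confidence threshold (placeholder value)
    iou=0.7,     # NMS IoU threshold (placeholder value)
)

for r in results:
    # The pod count is the number of boxes kept after NMS.
    print(f"{r.path}: {len(r.boxes)} pods")
```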
The YOLOv8n-POD model incorporates the Dilated Bi-directional Block (DBB), the Separated and Enhancement Attention Module (SEAM), and the dynamically adjustable DyHead detection head, significantly enhancing its ability to recognize soybean pods of various sizes, shapes, and growth stages. The application of these technologies not only optimizes feature extraction and preserves spatial information but also delivers exceptional precision and robustness in detecting small targets in complex agricultural settings. Compared with other models, YOLOv8n-POD maintains high accuracy while adapting more effectively to the diversity and variability of soybean pods.
3.5. Evaluation of Generalization Ability and Practicality
To further validate the effectiveness of our approach and to assess the generalization capability and broader applicability of our model, we conducted experiments using the YOLOv8n-POD model and the YOLOv8 model on three public datasets: tomatoes, chili peppers, and wheat spikes. The comparative detection results are depicted in Figure 13.
The Tomato Detection dataset, publicly accessible on the Kaggle platform at https://www.kaggle.com/datasets/andrewmvd/tomato-detection (accessed on 28 July 2024), comprises 895 images annotated with bounding boxes. This dataset showcases tomatoes of diverse sizes, shapes, and colors, capturing their growth progression from immaturity to maturity. It encompasses images of both mature and immature tomatoes, along with leaves and stems, thereby enriching the dataset's randomness and diversity.
The Chili-data dataset, accessible on the Kaggle platform at https://www.kaggle.com/datasets/jingxiche/chili-data (accessed on 28 July 2024), comprises 976 images annotated with bounding boxes. This dataset includes chili peppers of varying sizes, shapes, and colors, capturing their growth progression from small to large sizes. The images within this dataset depict both small and large chili peppers, along with leaves and stems, reflecting the different developmental stages of the chili peppers.
The GWHD_2021 dataset, available online at https://www.global-wheat.com/#about (accessed on 28 July 2024), comprises over 6000 images with a resolution of 1024 × 1024 pixels, annotated with more than 275,000 unique wheat spike instances. This dataset represents a significant advancement over its predecessor, GWHD_2020, incorporating several enhancements. It adds 1722 images from five countries, contributing 81,553 new wheat head annotations and resulting in a comprehensive dataset containing 275,187 wheat heads. To further improve quality, GWHD_2021 involved the meticulous removal of low-quality images and the reannotation of certain images to enhance accuracy. Additionally, the dataset was expanded to include more growth stages and images of wheat under various environmental conditions, providing a richer and more diverse representation of wheat phenotypes.
To guarantee the precision of the experimental findings, two rounds of trials were conducted on the three publicly accessible datasets, with comparisons made against the YOLOv8 model. In the first trial, a network Batch Size parameter of 48 was employed, whereas the second trial utilized a Batch Size of 8. The outcomes of these trials are comprehensively detailed in Table 8.
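Both trials share one training recipe and differ only in batch size. A sketch of how the two rounds could be driven with the Ultralytics training API is shown below; the dataset YAML names, model configuration file, and epoch count are placeholders, not the exact experimental configuration.

```python
from ultralytics import YOLO

# Placeholder dataset configs for the three public datasets.
datasets = ["tomato.yaml", "chili.yaml", "gwhd2021.yaml"]

for data in datasets:
    for batch in (48, 8):                  # trial 1: Batch Size 48; trial 2: Batch Size 8
        model = YOLO("yolov8n-pod.yaml")   # hypothetical model config file
        model.train(data=data, epochs=100, batch=batch, imgsz=640)
        metrics = model.val()              # reports mAP@0.5 and mAP@0.5:0.95
        print(data, batch, metrics.box.map50, metrics.box.map)
```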
As detailed in Table 8, the chili pepper dataset exhibited the most pronounced improvement when evaluated with the larger Batch Size (48), achieving a 4.60% increase in the mAP@0.5 and a substantial 12.48% rise in the mAP@0.5:0.95. These gains underscore the enhanced robustness and accuracy of the YOLOv8n-POD model in complex scenarios. In contrast, the tomato dataset demonstrated minimal gains at this Batch Size, with only a 0.06% increase in the mAP@0.5 and a 0.27% improvement in the mAP@0.5:0.95. Notably, reducing the Batch Size to 8 led to more significant performance improvements across all the datasets. For the tomato dataset, the YOLOv8n-POD model achieved a 2.14% increase in the mAP@0.5 and a 2.18% improvement in the mAP@0.5:0.95, highlighting the model's enhanced precision and recognition capabilities with smaller batch sizes. The chili pepper dataset also showed remarkable enhancements, with a 4.50% increase in the mAP@0.5 and an 8.27% rise in the mAP@0.5:0.95, indicating a substantial improvement in overall performance. Additionally, while the wheat dataset exhibited inconsistent performance with the larger Batch Size, it demonstrated positive improvements with a Batch Size of 8, achieving a 1.28% increase in the mAP@0.5 and a 1.01% improvement in the mAP@0.5:0.95.
It is evident from these results that although the YOLOv8n-POD model was specifically developed for pod detection in complex scenarios, it has demonstrated improved performance on the three public datasets compared to the original YOLOv8, showcasing enhanced detection capabilities and generalization ability. This indicates that the YOLOv8n-POD model is not only suitable for pod detection but also applicable to the detection and counting of other crops in complex scenarios.