Article

Research on Recognition of Green Sichuan Pepper Clusters and Cutting-Point Localization in Complex Environments

by Qi Niu 1,2, Wenjun Ma 1, Rongxiang Diao 1, Wei Yu 1, Chunlei Wang 1, Hui Li 1,2, Lihong Wang 1,2, Chengsong Li 1,2 and Pei Wang 1,2,*

1 College of Engineering and Technology, Southwest University, Chongqing 400715, China
2 Key Laboratory of Agricultural Equipment in Hilly and Mountainous Areas, Southwest University, Chongqing 400715, China
* Author to whom correspondence should be addressed.
Agriculture 2025, 15(10), 1079; https://doi.org/10.3390/agriculture15101079
Submission received: 17 March 2025 / Revised: 20 April 2025 / Accepted: 14 May 2025 / Published: 16 May 2025

Abstract

The harvesting of green Sichuan pepper remains heavily reliant on manual field operations, although automation could enhance the efficiency, quality, and sustainability of the process. Challenges such as intertwined branches, dense foliage, and overlapping pepper clusters hinder intelligent harvesting by causing inaccuracies in target recognition and localization. This study compared the performance of multiple You Only Look Once (YOLO) algorithms for recognition and proposed a cluster segmentation method based on K-means++ together with a cutting-point localization strategy using geometry-based iterative optimization. A dataset containing 14,504 training images under diverse lighting and occlusion scenarios was constructed. Comparative experiments on the YOLOv5s, YOLOv8s, and YOLOv11s models revealed that YOLOv11s achieved a recall of 0.91 in leaf-occluded environments, a 21.3% improvement over YOLOv5s, with a detection speed of 28 frames per second (FPS). A K-means++-based cluster separation algorithm (K = 1–10, optimized via the elbow method) was developed and combined with OpenCV to iteratively solve for the vertices of the minimum circumscribed triangle. The cutting point was then dynamically located on the extension of the triangle's longest median. The experimental results demonstrated an average cutting-point deviation of 20 mm and a valid cutting-point ratio of 69.23%. This research provides a robust visual solution for intelligent green Sichuan pepper harvesting equipment, offering both theoretical and engineering significance for advancing the automated harvesting of Sichuan pepper (Zanthoxylum schinifolium) as a specialty economic crop.

1. Introduction

The green Sichuan pepper (Zanthoxylum schinifolium), a globally significant spice crop, is predominantly cultivated in Southwest China, which accounts for over 90% of the world's total production. In 2022, the global cultivation area reached approximately 390,000 hectares, with an annual yield exceeding 650,000 metric tons, of which China contributed over 95% [1]. The industrial chain of green Sichuan pepper demonstrates substantial economic benefits, generating a per-mu (1 mu ≈ 667 m²) output value of 10,000–20,000 Chinese Yuan (CNY) during peak seasons (fresh pepper purchase price: 8–15 CNY/kg), with products exported to markets in Japan, South Korea, Europe, North America, and Southeast Asia. Harvesting remains predominantly manual, with approximately 90% of operations still reliant on human labor; this is primarily attributable to the plant's unique botanical characteristics, including its dense thorns, luxuriant foliage, and intricate clustered fruit formation. These biological constraints have severely limited technological adoption, resulting in an exceptionally low mechanization rate of below 5% across cultivation systems. The most widely used harvesting method for green Sichuan pepper is manual pruning-based harvesting, which involves cutting off fruit-bearing branches ("branch pruning"), typically relatively straight ones, together with the fresh peppercorn clusters. The harvested branches are then processed, manually or mechanically, to collect the fresh fruits. This method not only improves harvesting efficiency but also promotes the plant's growth in the following year. However, the traditional manual process, which requires one hand to stabilize the branch while the other carefully plucks the clusters to preserve their integrity, suffers from low efficiency, high labor costs, and frequent thorn-induced injuries. These limitations significantly hinder the scalability of industrial production [2,3]. Consequently, accurate recognition of green Sichuan pepper clusters and precise cutting-point localization in complex environments represent critical challenges for the development of intelligent harvesting equipment.
In the field of economic crop recognition, deep learning-based multimodal sensing technologies continue to advance detection accuracy. Zhang et al. proposed a geographic traceability model for Sichuan pepper by integrating differential pulse voltammetry with artificial neural networks, achieving 100% accuracy in identifying 14 production regions through electrochemical fingerprint features, although the fine-grained segmentation of clustered fruit spikes remains unaddressed [4]. Jin et al. developed an improved YOLOv8n dilation-wise residual–dilated re-parameterization block (DWRDRB) model incorporating dilated convolution and a generalized feature pyramid network, raising apple detection mean Average Precision (mAP) to 81.68% in complex orchard environments, yet its adaptability to the spiny pericarp texture of green Sichuan pepper remains limited [5]. Wang et al. established a Backpropagation (BP) neural network-based chromatic grading system for green Sichuan pepper drying, achieving 98.04% classification accuracy via Red-Green-Blue to Hue-Saturation-Intensity (RGB-HSI) color space conversion, but omitted field-based in situ recognition scenarios [6]. Current research predominantly focuses on single-form fruits or static scenario detection, while dense small-target recognition for clustered green Sichuan pepper spikes still faces feature confusion issues, with existing algorithms exhibiting missed detection rates exceeding 15% under branch and leaf occlusion.
In cutting-point localization technology, multi-sensor fusion and dynamic modeling methods have emerged as research hotspots. Liang et al. enhanced the YOLOv5s-seg network architecture using deformable convolution and Re-parameterized Generalized Feature Pyramid Network (RepGFPN) feature fusion, achieving 91.2% localization accuracy for tomato pruning points, but did not validate its applicability to needle-like cluster connective structures [7]. Bhattarai et al. proposed the AgRegNet density regression network, integrating spatial-channel attention mechanisms to achieve a 3.7 mm localization error for apple flower and fruit targets, though the system relies on high-precision RGB-D cameras [8]. Wang et al. developed a color evolution model for green Sichuan pepper drying by modifying greenhouse drying equipment and image-processing techniques, offering novel insights into feature extraction at cluster bases [6]. Current localization algorithms predominantly rely on fixed geometric constraints or single-depth sensors and struggle to adapt to the morphological diversity of green Sichuan pepper cluster bases, while the robustness of real-time operational systems remains inadequate.
This study addresses the intelligent harvesting requirements of green Sichuan pepper. We investigate the recognition of green Sichuan pepper clusters and cutting-point localization in complex environments, first constructing a dataset encompassing multi-illumination and multi-occlusion scenarios to train and compare the YOLOv5s, YOLOv8s, and YOLOv11s models and identify the optimal recognition framework. Additionally, a K-means++-based cluster separation algorithm is implemented and combined with OpenCV to iteratively optimize the vertices of circumscribed triangles, solving for optimal cutting points under the constraint of minimizing the triangle area.

2. Materials and Methods

2.1. Green Sichuan Pepper Trees and Harvesting Environment

Green Sichuan pepper (Zanthoxylum schinifolium) presents as a deciduous shrub or small tree. The samples for this study were collected from a green Sichuan pepper plantation in Wanbao Village, Zhisheng Town, Rongchang District, Chongqing Municipality, China. The spacing of green Sichuan pepper trees typically ranges from 1 to 2 m, with optimal harvesting tree heights being between 0.5 and 2 m. The fruits grow in clusters primarily concentrated at the nodes of mature branches (new shoots) or leaf axils [9,10]. Since residual fruit stems from incomplete harvesting can hinder subsequent growth, priority is given to selecting the base of the fruit stem during harvesting to ensure the entire cluster is removed in one motion, minimizing adverse impacts on future yields.
The clusters exhibit a pagoda-shaped profile, with a central axis (stem) surrounded by fan-shaped fruit distributions. Compared to leaves and main branches, the fruits display a distinct light green coloration, providing critical visual cues for image recognition. Based on these characteristics, the geometric model of the cluster can be approximated as a conical surface, with its two-dimensional projection resembling a triangle [11]. This geometric framework underpins key research areas such as image recognition, harvesting path planning, and yield estimation, offering essential theoretical and modeling support for achieving precision and intelligence in green Sichuan pepper harvesting.

2.2. Green Sichuan Pepper Fruit Dataset

Given the complex visual characteristics of green Sichuan pepper in real-growth environments, such as the low chromatic contrast between fruits and leaves and the overlapping nature of clusters [12], this study focuses on individual green Sichuan pepper fruits as the core detection targets. A technical framework for cluster recognition was established using the widely adopted YOLO series of pre-trained models in computer vision.
The data collection covers complex lighting conditions (e.g., morning, noon, and afternoon conditions are shown in Figure 1), fruits of varying maturity levels, and artificially simulated foliage occlusion scenarios. We divided the original dataset into training, validation, and test sets, with data augmentation applied. During annotation via LabelImg, the strict guidelines below were followed: (1) only fully visible fruits (occluded area < 50%) were labeled; (2) background interference (e.g., soil and non-target plants) was excluded; (3) inter-cluster overlaps in natural growth states were preserved. The workflow and annotation examples are shown in Figure 2. The final annotated dataset contained 5886 labeled green Sichuan pepper targets.
The expanded scale and enhanced diversity of the dataset significantly improved the neural network's feature discrimination capability, thereby strengthening the model's generalization performance in complex scenarios. Diverse augmented samples were generated using techniques such as rotation, brightness adjustment, and noise injection to simulate real-world environmental variations, with a specific focus on occlusion robustness [13]. For instance, rotation was applied randomly within an angular range of −π to +π to mimic varying perspectives and partial occlusion patterns. Color adjustments (applied with a probability of 0.5) included randomized modifications to brightness, contrast, and saturation to account for lighting variations [14,15] while maintaining recognition under illumination-induced partial occlusions. Dynamic noise injection combining Gaussian noise (σ = 0.01–0.05) and impulse noise (5–15% pixel corruption) enhanced robustness to sensor-induced partial occlusions; example transformations are illustrated in Figure 3.
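The following is a minimal sketch of one such augmentation pass, assuming OpenCV and NumPy; the contrast and brightness jitter ranges are illustrative assumptions, while the rotation range, color-jitter probability, Gaussian σ, and impulse-corruption fraction follow the values stated above.

```python
import cv2
import numpy as np

def augment(image: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """One random augmentation pass mirroring the strategies described above.

    Assumes a 3-channel BGR uint8 image. Parameter ranges marked
    'illustrative' are assumptions, not values from the paper.
    """
    h, w = image.shape[:2]

    # Random rotation within (-pi, +pi), converted to degrees for OpenCV.
    angle = np.degrees(rng.uniform(-np.pi, np.pi))
    M = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
    image = cv2.warpAffine(image, M, (w, h), borderMode=cv2.BORDER_REFLECT)

    # Brightness/contrast jitter applied with probability 0.5.
    if rng.random() < 0.5:
        alpha = rng.uniform(0.7, 1.3)  # contrast factor (illustrative range)
        beta = rng.uniform(-30, 30)    # brightness offset (illustrative range)
        image = cv2.convertScaleAbs(image, alpha=alpha, beta=beta)

    # Gaussian noise, sigma in [0.01, 0.05] relative to full intensity scale.
    sigma = rng.uniform(0.01, 0.05) * 255.0
    noisy = image.astype(np.float32) + rng.normal(0.0, sigma, image.shape)
    image = np.clip(noisy, 0, 255).astype(np.uint8)

    # Impulse (salt-and-pepper) noise corrupting 5-15% of pixels.
    frac = rng.uniform(0.05, 0.15)
    mask = rng.random((h, w)) < frac
    image[mask] = rng.choice([0, 255], size=int(mask.sum()))[:, None]
    return image
```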
After augmentation, the final training set comprised 14,504 images, 32.7% of which contained severe occlusion. The validation set incorporated typical field scenarios. The data were thus divided into a training set (14,504 images), a validation set (2072 images), and a test set (2072 images). This partitioning comprehensively captured the morphological variations of green Sichuan pepper under diverse environmental conditions, supporting the model's generalization capability.

2.3. Comparative Analysis of Green Sichuan Pepper Recognition Algorithms Based on YOLO Architectures

This study systematically compared the architectural differences and optimization strategies of YOLOv5s, YOLOv8s, and YOLOv11s for green Sichuan pepper cluster recognition, with performance metrics summarized in Figure 4. YOLOv5s employs a Focus slicing module and C3 cross-stage residual connections, achieving an inference speed of 120 FPS, but exhibits a high missed detection rate (>18%) for dense clusters due to the limited fusion efficiency of its Feature Pyramid Network + Path Aggregation Network (FPN + PAN) structure for overlapping fruit spikes [8,16,17]. YOLOv8s enhances robustness in complex backgrounds through a Stem multi-branch convolution that preserves local details and a dynamically weighted BiFPN feature pyramid, though its reliance on Graphics Processing Unit (GPU) acceleration restricts edge deployment feasibility [7,18]. YOLOv11s introduces a Re-parameterized VGG (RepVGG) dynamic inference architecture and DySample dynamic upsampling, reducing parameters by 30% and enabling compatibility with edge devices such as the Jetson Nano, albeit with susceptibility to glare interference under high-light conditions. The experimental results indicated that YOLOv8s achieves the best overall performance in complex environments (F1 score = 89.7%), while YOLOv11s improves recall for dense targets by 15% [8,19,20]. However, the existing models still lack precision in localizing the blurred connective bases of green Sichuan pepper clusters, necessitating integration with geometric optimization algorithms to address this technical bottleneck [21].

2.4. Algorithm Design for Green Sichuan Pepper Cutting-Point Localization

The study adopted the YOLO object detection model to identify green Sichuan pepper clusters, with the specific workflow illustrated in Figure 5. By setting the parameter save_txt to True, text files containing bounding box information were generated in real time. Each output text file shared the name of its original image with a .txt suffix, and each line contained the target label type and the normalized coordinates of the top-left vertex (x_1, y_1) and bottom-right vertex (x_2, y_2) of a detected bounding box. After structured parsing of the text data using the Pandas read_csv function, a geometric center calculation model for bounding boxes was established based on Equation (1):
$$x_c = \frac{x_1 + x_2}{2}, \qquad y_c = \frac{y_1 + y_2}{2} \tag{1}$$
where (x_c, y_c) are the center coordinates of a single green Sichuan pepper fruit in the image coordinate system. By iterating through all bounding box data and performing the center calculation, a comprehensive spatial coordinate dataset of green Sichuan pepper fruits was constructed.
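A minimal parsing sketch, assuming the corner-coordinate row layout described above; the function name and column labels are illustrative:

```python
import pandas as pd

def load_centers(label_path: str) -> pd.DataFrame:
    """Parse one YOLO save_txt file and compute bounding-box centers (Eq. 1).

    Assumes each row holds the class label followed by the normalized
    top-left (x1, y1) and bottom-right (x2, y2) corners, as stated above.
    """
    cols = ["label", "x1", "y1", "x2", "y2"]
    df = pd.read_csv(label_path, sep=r"\s+", header=None, names=cols)

    # Geometric center of each box, Equation (1).
    df["xc"] = (df["x1"] + df["x2"]) / 2
    df["yc"] = (df["y1"] + df["y2"]) / 2
    return df
```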
Given the coexistence of multiple clusters in green Sichuan pepper images, this study employed the K-means++ algorithm from the Scikit-Learn 1.3.0 library for cluster segmentation [22]. The objective function is defined as follows (Equation (2)):
$$J = \sum_{n=1}^{N} \sum_{k=1}^{K} r_{nk} \left\lVert x_n - \mu_k \right\rVert_2^2 \tag{2}$$
where N is the total number of samples; K is the preset number of clusters; r_{nk} is the binary membership indicator of sample n in cluster k; x_n is the feature vector of the n-th sample; and μ_k is the centroid vector of the k-th cluster.
A larger preset K value implies more initial centroids and a correspondingly higher computational cost, but it also amplifies inter-cluster distinctiveness while reducing intra-cluster variation. Statistical analysis of the collected green Sichuan pepper images revealed that the number of cluster centers, though variable, rarely exceeds 10. Therefore, by iterating over K ∈ [1, 10] and calculating the objective function values, the optimal cluster count was determined via the elbow method, which identifies the inflection point of the J(K) curve. This strategy effectively balances computational efficiency with clustering performance.
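The following sketch illustrates one way to implement this selection with Scikit-Learn; reading the elbow numerically as the point of maximum curvature of the J(K) curve is one common heuristic, and the function name and that heuristic are assumptions rather than the paper's exact procedure.

```python
import numpy as np
from sklearn.cluster import KMeans

def select_k_elbow(points: np.ndarray, k_max: int = 10) -> int:
    """Pick the cluster count K in [1, k_max] at the elbow of the J(K) curve.

    `points` is the (N, 2) array of fruit center coordinates.
    """
    ks = range(1, min(k_max, len(points)) + 1)
    inertias = []
    for k in ks:
        km = KMeans(n_clusters=k, init="k-means++", n_init=10, random_state=0)
        km.fit(points)
        inertias.append(km.inertia_)  # objective J of Equation (2)

    if len(inertias) < 3:
        return len(inertias)
    # Second-order differences approximate the curvature of the J(K) curve;
    # the elbow is taken where the curve bends most sharply.
    curvature = np.diff(inertias, 2)
    return int(np.argmax(curvature)) + 2  # +2 maps the index back to K
```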
Based on the coordinate dataset after cluster segmentation, the convex hull algorithm was employed to preselect contour feature points, followed by the application of OpenCV's minEnclosingTriangle() function to solve for the minimum enclosing triangle. This algorithm optimizes the inclusion validation of three-point combinations using the gradient descent method, significantly reducing time complexity. After determining the optimal triangle, the distance from each vertex to its opposite edge is calculated as follows:
$$d_i = \frac{2S}{\lVert e_i \rVert}$$
where S is the area of the triangle and e_i denotes the vector of the edge opposite the i-th vertex. The median corresponding to the maximum d_i is selected, and it is extended beyond the midpoint of its edge at a specific ratio to determine the optimal cutting point.
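The sketch below strings these steps together with OpenCV's convexHull() and minEnclosingTriangle(); the function name and the extension ratio are illustrative assumptions, while the median-selection rule follows the geometry described above.

```python
import cv2
import numpy as np

def cutting_point(centers: np.ndarray, extend_ratio: float = 0.3) -> np.ndarray:
    """Locate a cutting point for one cluster via the minimum enclosing triangle.

    `centers` is the (N, 2) array of fruit centers in one K-means++ cluster.
    `extend_ratio` (illustrative value) controls how far the chosen median is
    extended beyond the midpoint of its edge.
    """
    hull = cv2.convexHull(centers.astype(np.float32))
    area, tri = cv2.minEnclosingTriangle(hull)  # returns (area S, 3x1x2 vertices)
    tri = tri.reshape(3, 2)

    # d_i = 2S / ||e_i|| for the edge opposite each vertex.
    dists = []
    for i in range(3):
        edge = tri[(i + 2) % 3] - tri[(i + 1) % 3]  # edge opposite vertex i
        dists.append(2.0 * area / np.linalg.norm(edge))
    i = int(np.argmax(dists))

    # Median from vertex i to the midpoint of its opposite edge, then extended
    # past the midpoint along the median direction.
    vertex = tri[i]
    midpoint = (tri[(i + 1) % 3] + tri[(i + 2) % 3]) / 2.0
    return midpoint + extend_ratio * (midpoint - vertex)
```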

3. Results and Analysis

3.1. Comparative Training of Green Pepper Recognition Algorithm Models Based on YOLO Architecture

3.1.1. Model Training Configuration and Evaluation Metrics

To improve model generalization, specific parameter configurations were implemented in default.yaml, with a focus on the image augmentation parameters: degrees was set to 0.1, scale to 0.5, mosaic to 1.0, and fliplr to 0.5, while the default YOLO framework settings were retained for the other augmentation parameters to ensure system integrity and stability. The hardware and software environment for model training was configured as follows: the CPU was an Intel Core i5-10200H (Intel Corporation, Santa Clara, CA, USA) and the GPU a GeForce GTX 1650 (Laptop) (NVIDIA Corporation, Santa Clara, CA, USA). The training parameters were set to 300 epochs with a batch size of 8, using the stochastic gradient descent (SGD) optimizer, an input image size fixed at 640 × 640 pixels, and an initial learning rate of 0.01. The deep learning framework used Python 3.8 (in the PyCharm 2023.2.5 IDE) and PyTorch 2.1.0. The loss function was based on the Intersection over Union (IoU) [23], whose calculation formula is defined as follows:
$$\mathrm{IoU} = \frac{\lvert A \cap B \rvert}{\lvert A \cup B \rvert}$$
where A is the manually annotated region area (cm²) and B is the machine-identified region area (cm²).
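As a worked illustration of this formula, the following sketch computes the IoU of two axis-aligned boxes in corner format; it is a generic implementation, not the framework's internal loss code.

```python
import numpy as np

def iou(box_a: np.ndarray, box_b: np.ndarray) -> float:
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    # Intersection rectangle (empty if the boxes do not overlap).
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```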
To evaluate the comparative performance of YOLOv5s, YOLOv8s, and YOLOv11s, this study employed the following evaluation metrics: Precision, Recall, mAP@0.5 (mean Average Precision at IoU 0.5), correlation coefficient, and two-sample t-test statistic.
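For reference, a minimal training launch for one of the compared models might look like the following sketch, assuming the Ultralytics Python API; 'data.yaml' is a hypothetical dataset descriptor pointing at the train/validation/test splits, and the hyperparameters mirror the values listed above.

```python
from ultralytics import YOLO

# Sketch only: 'data.yaml' is a hypothetical dataset descriptor.
model = YOLO("yolo11s.pt")
model.train(
    data="data.yaml",
    epochs=300, batch=8, imgsz=640,        # training schedule from the text
    optimizer="SGD", lr0=0.01,             # SGD with initial learning rate 0.01
    degrees=0.1, scale=0.5, mosaic=1.0, fliplr=0.5,  # augmentation overrides
)
```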

3.1.2. Comparative Training Results and Analysis of Models

The final training results of the YOLOv5s, YOLOv8s, and YOLOv11s models are presented in Figure 6 and Table 1 below.
In the pepper target detection task, YOLOv11s exhibited a notable performance paradox: its mAP@0.5 decreased by 24.4% relative to YOLOv8s (from 0.753 to 0.567), while its recall increased by 20.6% (from 0.754 to 0.910). Through systematic error analysis, this study attributed this phenomenon to the interaction between the data labeling strategy and the model's detection capability. Specifically, the conservative labeling strategy applied to non-subject areas (such as fruits occluded by foliage and background interference) during annotation, combined with YOLOv11s's enhanced feature extraction network that improves its recognition of partially occluded targets, resulted in numerous unannotated regions being identified as valid targets (as shown in Figure 7).
Although this systemic misidentification reduced the mAP@0.5 metric, it paradoxically better aligned with the actual field conditions in agricultural settings for green pepper detection.
In terms of real-time performance, frame-by-frame detection experiments on a 480p video stream containing 20 pepper targets demonstrated that YOLOv8s achieved the most outstanding inference efficiency (12.55 ms per frame, ~79 FPS), a 5.3-fold improvement over YOLOv5s. Notably, YOLOv11s maintained a real-time detection capability of 28 FPS while compressing its model size to 5.19 MB.
An analysis of real-time recognition results across YOLO versions for green pepper berries revealed that, for unoccluded targets, the detection performance gradually improved with successive YOLO iterations. As shown in Figure 8, YOLOv11s demonstrated significant enhancements over YOLOv5s and YOLOv8s, with particularly evident improvements in recognizing berries under low-light conditions. For occluded berries, however, as illustrated in Figure 9, the detection capabilities of YOLOv5s and YOLOv8s were notably inferior to those of YOLOv11s.

3.2. Green Pepper Cutting Point Experiment

3.2.1. Cutting Point Evaluation Parameters

To achieve a precise and comprehensive evaluation of cutting points, an evaluation method based on the absolute and relative errors of points in the image was adopted. The corresponding formulas are as follows:
$$\sigma_i = \sqrt{(\bar{x}_i - x_i^0)^2 + (\bar{y}_i - y_i^0)^2}, \qquad \varepsilon_i = \frac{\sigma_i}{D_i}$$

where σ_i is the absolute positioning error of the i-th point, (x̄_i, ȳ_i) and (x_i^0, y_i^0) are its predicted and manually annotated reference coordinates, ε_i is the relative error, and D_i is the corresponding reference length used for normalization.
For harvesting points, the analysis focused on individual point errors. In-depth analysis of individual harvesting-point errors enables a more granular understanding of precision performance during green pepper cluster harvesting, facilitating targeted optimization and adjustment of picking operations. For branch prediction, the method evenly divided 20 points along each branch and calculated their overall error to comprehensively evaluate the accuracy and reliability of the branch prediction. This multidimensional, fine-grained error analysis helps to thoroughly assess the recognition precision and stability of feature points related to green pepper clusters across different dimensions.
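A minimal sketch of these error computations; treating D_i as a per-point normalizing reference length is an assumption of this sketch.

```python
import numpy as np

def point_errors(pred: np.ndarray, truth: np.ndarray, ref_len: float):
    """Absolute and relative errors per the formulas above.

    `pred` and `truth` are (N, 2) pixel coordinates of predicted and manually
    annotated points; `ref_len` stands in for D_i (assumed here to be a
    characteristic image or cluster dimension).
    """
    sigma = np.linalg.norm(pred - truth, axis=1)  # absolute deviation sigma_i
    epsilon = sigma / ref_len                     # relative deviation epsilon_i
    return sigma, epsilon
```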
Based on the topological structure and spatial error characteristics of green pepper clusters, this study established a three-tiered cutting-point classification system (as illustrated in Figure 10):
Node cutting points: located at main stem junctions, their precise identification ensures structural integrity of the plant and guarantees the effectiveness of subsequent processing workflows.
Partial cutting points: situated in secondary branch bifurcation zones, these points enable localized precision operations.
Invalid cutting points: these are deemed operationally irrelevant due to spatial deviations or placement in non-structural regions.

3.2.2. Experimental Results and Error Analysis of Green Pepper Cutting-Point Program

In this study, 40 validation set images from a single trial were processed using the K-means++ clustering and minimum bounding triangle algorithms, identifying a total of 104 cutting points, of which 72 were valid (validity rate: 69.23%). The recognition workflow involves multi-algorithm collaboration, including object detection, cluster analysis, and geometric modeling. Although errors accumulated across the algorithm pipeline, the system achieved an average spatial deviation below 20 mm after measurement and calculation. The experimental results demonstrated that the overall recognition accuracy remained sufficient to meet practical operational requirements despite this error superposition, with partial outcomes visualized in Figure 11.
A comparative analysis between this study and Yang et al.'s research reveals distinct differences and connections. Yang et al. utilized the K-means clustering algorithm to identify pepper fruits, employing a planar mass-point system model to calculate centroids from the discrete distribution of pepper clusters [24]. They combined the Otsu algorithm with maximum entropy thresholding to extract pepper fruit branches, acquiring depth data using monocular cameras. By constraining the shortest distance from Hough lines to centroids, they determined picking point coordinates in images and subsequently located three-dimensional world coordinates through coordinate system conversion. Their traditional machine vision solution achieved a 90% detection rate but with an average two-dimensional (2D) directional deviation of 43.8 mm. In contrast, this study exhibits significant precision advantages over such conventional methods. However, it should be noted that discrepancies in harvesting-point definitions across studies, combined with measurement interference when targeting small objects, inevitably introduce certain error margins. Taking these complex factors into account, it can reasonably be concluded that, when the 2D directional deviations of harvesting points across different studies remain within comparable ranges, the performance level of existing algorithms can essentially be considered to have been matched. This comparative framework provides critical insights for objectively evaluating technical solutions under heterogeneous experimental conditions.
A thorough investigation attributes invalid cutting points to three critical factors. First, when pepper clusters in images are excessively dispersed or incomplete, they cannot be effectively clustered into a unified target object in 2D space; this lack of cohesive clustering undermines the subsequent cluster-based cutting-point determination and ultimately produces invalid cutting points. Second, the elbow method imposes a specific prerequisite: images must contain at least two to three pepper clusters, because the algorithm relies on multi-cluster distribution patterns to determine the optimal cluster count. In single-cluster scenarios, the elbow method fails to function properly, resulting in inaccurate cutting-point identification. Third, the absence of robust fuzzy-processing mechanisms and outlier-exclusion algorithms in the current methodology introduces vulnerabilities: significantly deviating individual samples can cause severe centroid displacement, which propagates into the stem-direction prediction and can generate predictions entirely misaligned with the actual growth orientation of the pepper cluster. This cascading effect not only invalidates cutting points but also degrades the accuracy and reliability of the entire recognition and processing pipeline. These failure modes underscore the need for enhanced clustering adaptability, algorithm precondition validation, and outlier mitigation strategies to improve cutting-point validity in practical applications.
In practical applications, the detection results can be integrated with an AR4 robotic arm through the Robot Operating System (ROS) Melodic platform, where the cutting points in the image coordinate system are transformed into the robotic arm's Cartesian space. This enables dynamic harvesting operations with a path-planning response time of 0.24 s, as validated through Gazebo simulations. Under the collaborative control of OpenCV and MoveIt, the robotic arm's end-effector can precisely move along the extension line of the triangle's median toward the predetermined cutting point.

4. Conclusions

This study focuses on the critical domain of image recognition and localization algorithms for green Sichuan pepper. Through rigorous and systematic research processes, a high-efficiency visual recognition model specifically tailored for green Sichuan pepper was developed and the precise localization of cutting points was achieved, leading to the following key conclusions:
(1) The multi-scale dataset (14,504 images) constructed based on the pepper’s complex morphological features, combined with data augmentation strategies, significantly enhanced the model’s adaptability to dense small targets and occluded scenarios. In a rigorous evaluation of object detection architectures for agricultural harvesting scenarios, YOLOv11s exhibited superior occlusion robustness with a 21.3% higher recall metric (0.91; YOLOv5s: 0.75) and 9.3% improved F1-score (0.754; YOLOv5s: 0.69). The synergistic combination of reparametrized convolution blocks and dynamic sampling mechanisms (RepVGG + DySample) enhanced discriminative feature learning for heavily occluded targets. While demonstrating a 24.4% lower mAP@0.5 (0.567; YOLOv8s: 0.753) due to conservative annotation protocols, its detection completeness advantage translated to a 63.4% reduction in field-level miss rate (9.0% versus YOLOv8s’s 24.6% baseline).
Comparative analysis revealed that YOLOv8s achieved marginal gains in conventional benchmarks (+0.4% mAP@0.5 and +5.5% precision over YOLOv5s), yet displayed 17.2% lower recall (0.754) under occlusion compared to YOLOv11s. Architecturally, YOLOv11s demonstrated 12.6% higher memory efficiency (5.19 MB versus 5.94 MB) and better computational trade-offs for edge deployment (28 FPS versus 79 FPS), establishing itself as a latency-tolerant solution for occlusion-dense agricultural environments. These findings highlight YOLOv11s’s optimal balance between detection reliability (recall-driven optimization), model compactness (12.6% size reduction), and operational robustness, making it particularly suitable for high-stakes agricultural applications where missed detections incur significant economic losses.
(2) The proposed K-means++ cluster segmentation and minimum enclosing triangle iterative optimization algorithm dynamically determined the optimal cluster number (K = 1–10) through the elbow method. By solving the triangle vertex parameters via gradient descent, it achieved a marked improvement in cutting-point positioning accuracy. The experimental results show an average cutting-point displacement of 20 mm with 69.23% valid cutting points, representing a 54.3% improvement in 2D positioning precision compared to traditional machine vision solutions such as that of Yang et al. [24]. This validates the algorithm's robustness in morphologically diverse scenarios.

Author Contributions

Conceptualization, Q.N. and W.M.; methodology, Q.N.; software, Q.N., R.D., and W.M.; validation, Q.N., L.W., and W.Y.; formal analysis, Q.N. and C.L.; investigation, Q.N., W.M., and R.D.; resources, C.W., H.L., and P.W.; data curation, H.L., W.Y., and L.W.; writing—original draft preparation, Q.N.; writing—review and editing, Q.N., C.L., R.D., and P.W. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Undergraduate Innovation Training Program Supported Project (202210635001), the Technology Innovation and Application Development Special Project of Chongqing (cstc2021jscx-gksbX0007), and Fundamental Research Funds for the Central Universities (SWU-KQ24001).

Institutional Review Board Statement

Not applicable.

Data Availability Statement

Data are contained within the article.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Lin, C. Current Status and Countermeasures of Standardization in China’s Zanthoxylum Bungeanum Industry. China Trop. Agric. 2024, 15–24. Available online: http://qqkk.kmwh.top/index.php?c=content&a=show&id=7657 (accessed on 19 April 2025).
  2. Chen, Z. Current Development Status and Future Prospects of China Sichuan Pepper Industry. Contemp. Hortic. 2022, 45, 16–18. [Google Scholar] [CrossRef]
  3. Xia, G.; Wang, M.; Li, H.; Ren, M.; Wahia, H.; Zhou, C.; Yang, H. Synergistic principle of catalytic infrared and intense pulsed light for bacteriostasis on green sichuan pepper (Zanthoxylum schinifolium). Food Biosci. 2023, 56, 103177. [Google Scholar] [CrossRef]
  4. Zhang, D.; Lin, Z.; Xuan, L.; Lu, M.; Shi, B.; Shi, J.; He, F.; Battino, M.; Zhao, L.; Zou, X. Rapid determination of geographical authenticity and pungency intensity of the red Sichuan pepper (Zanthoxylum bungeanum) using differential pulse voltammetry and machine learning algorithms. Food Chem. 2024, 439, 137978. [Google Scholar] [CrossRef] [PubMed]
  5. Jin, T.; Han, X.; Wang, P.; Zhang, Z.; Guo, J.; Ding, F. Enhanced deep learning model for apple detection, localization, and counting in complex orchards for robotic arm-based harvesting. Smart Agric. Technol. 2025, 10, 100784. [Google Scholar] [CrossRef]
  6. Wang, J.; Xia, D.; Wan, J.; Hou, X.; Shen, G.; Li, S.; Chen, H.; Cui, Q.; Zhou, M.; Wang, J.; et al. Color grading of green Sichuan pepper (Zanthoxylum armatum DC.) dried fruit based on image processing and BP neural network algorithm. Sci. Hortic. 2024, 331, 113171. [Google Scholar] [CrossRef]
  7. Liang, X.; Wei, Z.; Chen, K. A method for segmentation and localization of tomato lateral pruning points in complex environments based on improved YOLOV5. Comput. Electron. Agric. 2025, 229, 109731. [Google Scholar] [CrossRef]
  8. Bhattarai, U.; Bhusal, S.; Zhang, Q.; Karkee, M. AgRegNet: A deep regression network for flower and fruit density estimation, localization, and counting in orchards. Comput. Electron. Agric. 2024, 227, 109534. [Google Scholar] [CrossRef]
  9. Li, Y.; Zhuo, P.; Jiao, H.; Wang, P.; Wang, L.; Li, C.; Niu, Q. Construction of constitutive model and parameter determination of green Sichuan pepper (Zanthoxylum armatum) branches. Biosyst. Eng. 2023, 227, 147–160. [Google Scholar] [CrossRef]
  10. Li, Y.; Li, B.; Jiang, Y.; Xu, C.; Zhou, B.; Niu, Q.; Li, C. Study on the Dynamic Cutting Mechanism of Green Pepper (Zanthoxylum armatum) Branches under Optimal Tool Parameters. Agriculture 2022, 12, 1165. [Google Scholar] [CrossRef]
  11. Yang, L.; Zhang, Y.; He, Z.; Li, S.; Pu, Y.; Chen, W.; Yang, S.; Yang, M. Design and experiment of the rotating shear picking device for green Sichuan peppers. Trans. Chin. Soc. Agric. Eng. 2024, 40, 72–83. [Google Scholar] [CrossRef]
  12. Zhang, Z.; Wang, S.; Wang, C.; Wang, L.; Zhang, Y.; Song, H. Segmentation Method of Zanthoxylum bungeanum Cluster Based on Improved Mask R-CNN. Agriculture 2024, 14, 1585. [Google Scholar] [CrossRef]
  13. Zeng, W. Image data augmentation techniques based on deep learning: A survey. Math. Biosci. Eng. 2024, 21, 6190–6224. [Google Scholar] [CrossRef]
  14. Jiang, L.; Jin, P.; Geng, J. Research on Image Data Augmentation Technology. Ship Electron. Eng. 2024, 44, 19–21+93. [Google Scholar]
  15. Azizi, A.; Zhang, Z.; Hua, W.; Li, M.; Igathinathane, C.; Yang, L.; Ampatzidis, Y.; Ghasemi-Varnamkhasti, M.; Radi; Zhang, M.; et al. Image processing and artificial intelligence for apple detection and localization: A comprehensive review. Comput. Sci. Rev. 2024, 54, 100690. [Google Scholar] [CrossRef]
  16. Xu, Y.; Xiong, J.; Li, L.; Peng, Y.; He, J. Detecting pepper cluster using improved YOLOv5s. Trans. Chin. Soc. Agric. Eng. 2023, 39, 283–290. [Google Scholar] [CrossRef]
  17. Li, G.; Gong, H.; Yuan, K. Research on lightweight pepper cluster detection based on YOLOv5s. J. Chin. Agric. Mech. 2023, 44, 153–158. [Google Scholar] [CrossRef]
  18. Ji, W.; He, L. Study on the Application of YOLOv8 in Sichuan Pepper Recognition in Natural Scenes. China-Arab States Sci. Technol. Forum 2024, 45–49. Available online: http://www.zakjlt.cn/ (accessed on 19 April 2025).
  19. Lee, Y.-S.; Patil, M.P.; Kim, J.G.; Seo, Y.B.; Ahn, D.-H.; Kim, G.-D. Hyperparameter Optimization for Tomato Leaf Disease Recognition Based on YOLOv11m. Plants 2025, 14, 653. [Google Scholar] [CrossRef]
  20. Khanam, R.; Asghar, T.; Hussain, M. Comparative Performance Evaluation of YOLOv5, YOLOv8, and YOLOv11 for Solar Panel Defect Detection. Solar 2025, 5, 6. [Google Scholar] [CrossRef]
  21. Huang, H.; Zhang, H.; Hu, X.; Nie, X. Recognition and Localization Method for Pepper Clusters in Complex Environments Based on Improved YOLO v5. Trans. Chin. Soc. Agric. Mach. 2024, 55, 243–251. [Google Scholar] [CrossRef]
  22. Cui, J.; Xu, J. Research and Implementation of the K-means++ Clustering Algorithm. Comput. Knowl. Technol. 2024, 20, 78–81. [Google Scholar] [CrossRef]
  23. Yu, Q.; Han, Y.; Han, Y.; Gao, X.; Zheng, L. Enhancing YOLOv5 Performance for Small-Scale Corrosion Detection in Coastal Environments Using IoU-Based Loss Functions. J. Mar. Sci. Eng. 2024, 12, 2295. [Google Scholar] [CrossRef]
  24. Yang, P.; Guo, Z. Vision recognition and location solution of Zanthoxylum bungeanum picking robot. J. Hebei Agric. Univ. 2020, 43, 121–129. [Google Scholar] [CrossRef]
Figure 1. Different lighting conditions. (a) Morning illumination conditions. (b) Noon illumination conditions. (c) Afternoon illumination conditions.
Figure 2. Manual annotation framework for single green Sichuan pepper with morphological verification.
Figure 3. Illustration of image data augmentation techniques in the dataset. (a) Original image. (b) Original image, rotated 180 degrees clockwise. (c) Original image with luminance enhancement. (d) Original image with reduced brightness. (e) Original image with salt-and-pepper noise added. (f) Original image with Gaussian noise added.
Figure 4. Comparative analysis of YOLO-based architectures for green Sichuan pepper recognition.
Figure 5. Cutting-point localization in green Sichuan pepper (the upper-right panel displays a red rectangular box representing the YOLO detection frame; the lower-left panel illustrates manually drawn schematic lines indicating the triangular distribution of Zanthoxylum schinifolium; the lower-right panel presents computational results processed through OpenCV).
Figure 6. Precision–Recall (PR) curve of the training process on the Zanthoxylum dataset.
Figure 7. Comparison between manually annotated images and YOLOv11s’s detection results (The left panel illustrates manual annotation results, whereas the right panel displays detection outcomes generated by the YOLO algorithm).
Figure 8. Real-time recognition performance comparison of YOLO variants on unoccluded green Sichuan pepper fruits.
Figure 9. Real-time detection performance comparison of YOLO variants on occluded green Sichuan pepper.
Figure 10. Schematic diagram of three cutting-point categories (The blue circular markers in the figure denote the cutting points, sequentially arranged from top to bottom as follows: partially effective cuts, non-effective cuts, node-based optimal cuts).
Figure 11. Experimental validation of automated cutting-point localization. (a) Original images. (b) Annotated cutting point image with morphological validation.
Table 1. Comparative training performance of YOLOv5s, YOLOv8s, and YOLOv11s on the Zanthoxylum dataset.
Model    | mAP@0.5 | Precision | Recall | F1    | File Size
YOLOv5s  | 0.750   | 0.707     | 0.750  | 0.69  | 13.7 MB
YOLOv8s  | 0.753   | 0.746     | 0.754  | 0.719 | 5.94 MB
YOLOv11s | 0.567   | 0.730     | 0.910  | 0.754 | 5.19 MB

