Review

Current Research Status and Development Trends of Key Technologies for Pear Harvesting Robots

1 College of Engineering, China Agricultural University, Beijing 100083, China
2 College of Agricultural Unmanned System, China Agricultural University, Beijing 100193, China
3 College of Science, China Agricultural University, Beijing 100193, China
* Author to whom correspondence should be addressed.
Agronomy 2025, 15(9), 2163; https://doi.org/10.3390/agronomy15092163
Submission received: 4 August 2025 / Revised: 28 August 2025 / Accepted: 8 September 2025 / Published: 10 September 2025
(This article belongs to the Section Precision and Digital Agriculture)

Abstract

In response to the global labor shortage in the pear industry, robotic harvesting has become an inevitable trend, and developing pear harvesting robots for orchard operations is of significant practical importance. This paper systematically reviews progress in three key technologies for pear harvesting robots. First, in recognition technology, traditional image-processing methods are limited by their sensitivity to lighting conditions and by occlusion errors, whereas deep learning models, such as optimized YOLO variants and two-stage architectures, significantly enhance robustness in complex scenes and improve the handling of overlapping fruit. Second, positioning technology has advanced from 2D pixel-coordinate acquisition to 3D spatial reconstruction, with integrated posture estimation (binocular vision + IMU) addressing occlusion issues. Finally, end effectors are categorized by harvesting mechanism into gripping–twisting, shearing, and suction (vacuum negative pressure) types, although fruit-skin damage and positioning bottlenecks remain. Current technologies still face three major challenges: low harvesting efficiency, high fruit damage rates, and high equipment costs. Future breakthroughs are expected through the integration of agricultural machinery and agronomy (standardized planting), multi-arm collaborative operation, lightweight algorithms, and 5G-enabled cloud computing.

1. Introduction

As one of the most important fruit resources worldwide, pears are widely valued for their unique flavor and nutritional benefits. According to the latest data [1], China’s annual pear production has reached 19.85 million tons, accounting for 75% of the global total, as shown in Figure 1. This remarkable achievement is not only reflected in production scale but also demonstrates the country’s comprehensive strength in industrial development. China has consistently ranked first in both pear cultivation area and total yield for consecutive years, establishing itself as a driving force in the global pear industry. As the world’s largest producer and consumer of pears, China continues to offer a model for high-quality development through its innovative practices in variety breeding, cultivation techniques, and industrial chain construction.
However, pear harvesting remains a typical labor-intensive stage within the production process, with a relatively low level of mechanization and a heavy reliance on manual labor. Statistics show [2] that labor input during the harvesting stage accounts for approximately 50% of the total labor required throughout the production management cycle, and harvesting costs can constitute as much as 50% to 70% of the total production expenditure. Therefore, achieving automated and intelligent fruit harvesting in orchards is both necessary and urgent. The development of agricultural harvesting robots holds significant practical importance in this context.
Beyond pear-specific advancements, several cross-cutting technologies are accelerating harvesting robotics across agriculture. Soft robotic grippers, inspired by biomimicry, enable damage-free handling of delicate fruits like tomatoes and strawberries through adaptive material compliance [3]. Swarm robotics systems, leveraging decentralized multi-agent coordination, show promise for large-scale orchard operations by enabling collaborative harvesting [4]. Unmanned aerial vehicles (UAVs) equipped with visual servoing now facilitate aerial harvesting in trellised crops, while edge AI processors deployed on mobile platforms allow real-time decision-making under computational constraints [5,6]. Sensor fusion (e.g., LiDAR + hyperspectral imaging) further enhances crop phenotyping and maturity assessment for diverse produce [7]. These innovations collectively address universal challenges—efficiency, damage reduction, and unstructured environment adaptation—laying groundwork for our pear-specific technical analysis.
In 1968, American scholars Schertz and Brown were the first to propose the concept of applying robotic technology to fruit and vegetable harvesting, which is widely regarded by the academic community as the starting point for research on agricultural harvesting robots. However, early harvesting machines primarily employed mechanical and pneumatic shaking methods and offered little automation or intelligence. Even so, this phase laid a theoretical foundation for subsequent research on pear harvesting robots.
In 1984, the team led by Kawamura from Kyoto University in Japan achieved a substantial breakthrough. They developed the world’s first 5-degree-of-freedom articulated robot with autonomous positioning and operational capabilities, specifically designed for tomato harvesting. This robot utilized a vision system to identify fruit positions and a robotic arm to perform grasping and cutting actions, marking the official birth of modern harvesting robots and providing both conceptual research and technological reserves for the development of pear harvesting robots.
From the 1990s to the early 21st century, several research institutions began focusing on pear harvesting robots. During this period, progress was made in visual recognition and robotic arm control, but many challenges remained, including low recognition accuracy, slow harvesting speed, and poor adaptability to complex environments [8]. The complexity of pear growth environments, the irregular distribution of fruits, and the variability in shapes and colors of different pear varieties posed significant challenges for recognition and harvesting by robots.
From the early 21st century to 2020, with continuous advancements in computer technology, sensor technology, and artificial intelligence algorithms, pear harvesting robots made significant progress. The visual systems of robots became more advanced, enabling more accurate recognition of the position and maturity of pears. Some studies employed deep learning algorithms to improve the recognition rate of pears, allowing robots to better adapt to various lighting conditions and fruit occlusion scenarios. For instance, Wang [9] used MobileNetV3 as the backbone network, incorporated a hybrid attention mechanism, and optimized anchor box sizes to construct a lightweight YOLOv5s pear recognition model, achieving a recognition rate of 92.9% and reducing floating-point operations (FLOPs) by 32%.
At the same time, some robots began to feature autonomous navigation and obstacle avoidance capabilities, allowing them to move and operate independently in orchards. Meng et al. [10] introduced image edge detection technology to extract fruit edges for localization, with experiments showing that the visual system had high positioning accuracy, small errors, and strong reliability, providing reliable data support for autonomous navigation and harvesting tasks. The motion accuracy and flexibility of the robotic arm were also improved, enabling more precise harvesting actions. The team of Teng Juyuan from Chongqing University [11], based on the structure of the harvesting robotic arm, established a forward kinematic model using the D-H method, solved the inverse kinematics using matrix inverse multiplication, and optimized it based on the minimum energy criterion. They also conducted trajectory planning and simulation in joint space. The results indicated that the inverse kinematics solution was reliable, the trajectory was smooth and impact-free, and it met the practical requirements for harvesting.
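To make the D-H modelling step concrete, the sketch below chains per-link homogeneous transforms into a forward-kinematics pose. The link table and joint values are purely illustrative assumptions and are not the parameters of the manipulator in [11].

```python
# Minimal sketch of forward kinematics with standard D-H parameters.
import numpy as np

def dh_transform(theta, d, a, alpha):
    """Homogeneous transform between consecutive links for one D-H row."""
    ct, st = np.cos(theta), np.sin(theta)
    ca, sa = np.cos(alpha), np.sin(alpha)
    return np.array([
        [ct, -st * ca,  st * sa, a * ct],
        [st,  ct * ca, -ct * sa, a * st],
        [0.0,      sa,       ca,      d],
        [0.0,     0.0,      0.0,    1.0],
    ])

def forward_kinematics(joint_angles, dh_table):
    """Chain the per-link transforms to obtain the end-effector pose in the base frame."""
    T = np.eye(4)
    for theta, (d, a, alpha) in zip(joint_angles, dh_table):
        T = T @ dh_transform(theta, d, a, alpha)
    return T  # 4x4 pose of the end effector

# Hypothetical 4-DOF link table (d, a, alpha), for illustration only.
dh_table = [(0.10, 0.0, np.pi / 2), (0.0, 0.35, 0.0), (0.0, 0.30, 0.0), (0.05, 0.0, 0.0)]
pose = forward_kinematics([0.1, -0.4, 0.6, 0.2], dh_table)
print(pose[:3, 3])  # Cartesian position of the picking end effector
```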
During this period, numerous related research papers were published, including studies on improvements to robot visual recognition algorithms and optimization of robotic arm motion control strategies, all of which contributed to the continuous advancement of pear harvesting robot technology, as shown in Figure 2.
The commercialization of harvesting robots is accelerating globally, and pear harvesting robots have also made significant progress. Some companies have begun to launch commercial pear harvesting robot products or product-level prototypes, as shown in Figure 3. An Israeli robotics company [5] has developed a flying harvesting robot that can harvest various fruits, including pears, and has already been deployed in orchards in the United States, Italy, Israel, Chile, and other countries. Lv et al. [6] developed an intelligent autonomous fruit picking drone based on the YOLOv5 algorithm. This drone can effectively fly within orchards, accurately recognize fruits, and autonomously complete the harvesting task. At the same time, with the continuous maturation of technology and the gradual reduction in costs, pear harvesting robots are expected to see broader application and promotion in the coming years.
Pear harvesting robots utilize visual systems to identify and locate fruits, and they employ end effectors for harvesting. The core technology of their development lies in accurately recognizing and positioning the pears, while minimizing damage to both the fruits and the trees during the separation process. Therefore, pear harvesting robots generally consist of three key technologies: pear recognition technology, pear localization technology, and end effector technology.
Although existing research has extensively explored agricultural harvesting robot technologies, systematic technical reviews focused specifically on pear harvesting robots remain relatively scarce, particularly regarding the synergistic evolution and bottleneck analysis of the three core technologies: recognition, localization, and end effectors. This paper aims to systematically review the current state of key technologies for pear harvesting robots, comparing and analyzing the performance advantages, disadvantages, and applicable scenarios of different technical pathways, and providing a quantitative assessment of their feasibility in real orchard environments. Furthermore, this study identifies critical challenges within the current technological framework (such as efficiency, damage rate, and cost) and proposes future development directions, including interdisciplinary integration, multi-arm collaboration, lightweight algorithms, and adaptation to agricultural practice. The goal is to provide a theoretical reference and practical guidance for technological breakthroughs and the industrial application of pear harvesting robots.

2. Recognition Technology for Pear Harvesting Robots

The primary task of pear harvesting robots is to accurately recognize pear targets. Researchers have proposed a number of methods for pear target recognition, which can be broadly divided into two categories [17], as shown in Figure 4.
The first category includes traditional image processing-based methods. These methods mainly rely on manually designed features such as color, shape, texture, or combinations of these and employ algorithms such as image morphological processing, edge detection, threshold segmentation, color difference methods, K-means clustering, and support vector machines (SVMs) for target segmentation and recognition [18,19,20,21].
The second category includes deep learning-based methods, which have rapidly developed in recent years. These methods leverage the powerful self-learning ability of deep neural networks to automatically learn and extract complex and robust feature representations directly from raw pear images, without relying on the cumbersome and subjective design of manual features. More importantly, deep learning enables end-to-end automatic recognition of target objects, greatly simplifying the recognition process [17].

2.1. Traditional Image Recognition Methods

In pear recognition research, the core process of traditional methods typically includes: image preprocessing (shown in Table 1), feature extraction, and segmentation and recognition based on specific algorithms [22].
Image preprocessing aims to enhance image quality, as shown in Figure 5, and suppress noise, as shown in Figure 6, providing a more reliable foundation for subsequent target recognition and segmentation.
For a color pear image, target recognition primarily focuses on the foreground pears [23], eliminating the influence of the background to better extract the pears from the image and improve the recognition success rate during the harvesting process.
Zhao [24] employed edge contour extraction to achieve initial foreground–background separation. They used binarization and morphological opening operations to eliminate the interference from the fruit stems and generated a mask, which was then applied to the original image for fruit segmentation and recognition. Yuan Zhao from Guizhou University [25] applied Gaussian filtering to prickly pear images, extracted RGB components, and performed histogram equalization on each component. They then used a specific color difference formula (0.57R-0.18G-0.2B) to determine the threshold for target–background separation. Niu [26] used the saturation (S) component of the HSV color space, performed median filtering for noise reduction, and determined a global segmentation threshold based on the valley of the S component’s histogram, achieving pear–background separation. However, recognition methods that rely mainly on a single feature or specific manually designed combinations are sensitive to environmental changes (e.g., lighting, occlusion), have poor robustness, and often require manual threshold adjustments.
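As a concrete illustration of these two classical routes, the following OpenCV sketch applies the colour-difference map from [25] and an S-channel threshold in the spirit of [26]. The file name, the global threshold value, and the use of Otsu thresholding as a stand-in for valley-based thresholding are assumptions made for illustration only.

```python
import cv2
import numpy as np

img = cv2.imread("pear.jpg")                       # placeholder orchard image (BGR)
b, g, r = cv2.split(img.astype(np.float32))

# Route 1: colour-difference map 0.57R - 0.18G - 0.2B, then a global threshold.
diff = 0.57 * r - 0.18 * g - 0.2 * b
diff = np.clip(diff, 0, 255).astype(np.uint8)
_, mask_diff = cv2.threshold(diff, 40, 255, cv2.THRESH_BINARY)   # threshold is illustrative

# Route 2: S channel of HSV, median filtering, then a global threshold
# (Otsu used here as a simple stand-in for valley-based selection).
s = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)[:, :, 1]
s = cv2.medianBlur(s, 5)
_, mask_s = cv2.threshold(s, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

# Morphological opening removes stem fragments before the mask is applied.
kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (7, 7))
mask = cv2.morphologyEx(mask_diff, cv2.MORPH_OPEN, kernel)
fruit_only = cv2.bitwise_and(img, img, mask=mask)
```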
To improve robustness and reduce dependence on manually set thresholds, researchers have introduced machine learning methods.
Support vector machine (SVM) [27,28], as a supervised learning algorithm, constructs an optimal classification hyperplane (and can handle non-linear problems using kernel tricks). SVM effectively utilizes a combination of features such as color, texture, and shape to train models that distinguish pears from the background. Zhang et al. [29] combined background removal, multi-feature (color, texture, shape) extraction, and PCA dimensionality reduction to construct and compare several multi-class kernel SVM models. Among them, the Gaussian kernel max-wins voting SVM achieved an 88.2% recognition accuracy for pears. Lei et al. [30] fused fruit color (color histograms) and epidermal texture features and constructed a recognition model using multi-feature fusion and an SVM. This model achieved an average recognition accuracy of 94% within 2 ms. Supervised learning iterates over labeled data to learn the mapping between features and labels, building classification or regression models; as a result, such methods depend on high-quality annotated data.
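A minimal sketch of this supervised route is shown below, using placeholder feature vectors in place of the real colour/texture/shape descriptors used in [29]; the pipeline (standardization, PCA compression, Gaussian-kernel SVM) mirrors the general approach rather than any specific published configuration.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.svm import SVC

# Placeholder data: one 64-dimensional hand-crafted feature vector per candidate region,
# with label 1 = pear and 0 = background. Real systems would extract colour histograms,
# texture statistics, and shape descriptors here.
X = np.random.rand(200, 64)
y = np.random.randint(0, 2, 200)

clf = make_pipeline(StandardScaler(), PCA(n_components=20), SVC(kernel="rbf", C=10.0))
clf.fit(X, y)
print(clf.predict(X[:5]))   # predicted classes for the first five regions
```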
In contrast, unsupervised learning generally does not require data to be specially labeled, thus saving time on data annotation. Among unsupervised learning algorithms, the K-means (K-means clustering) algorithm is the most representative. Jiao et al. [31] applied K-means to the Lab color space for preliminary fruit–background segmentation. They then combined morphological erosion and dilation operations to refine the contours, successfully achieving rapid recognition of overlapping fruits.
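The sketch below illustrates this unsupervised route in the spirit of [31]: K-means clustering on the a*b* channels of Lab space followed by morphological erosion and dilation to refine the mask. The cluster count, the fruit-cluster heuristic, and the file name are illustrative assumptions.

```python
import cv2
import numpy as np

img = cv2.imread("pear.jpg")                              # placeholder orchard image
lab = cv2.cvtColor(img, cv2.COLOR_BGR2LAB)
ab = lab[:, :, 1:3].reshape(-1, 2).astype(np.float32)     # cluster on a*, b* only

criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 20, 1.0)
_, labels, centers = cv2.kmeans(ab, 3, None, criteria, 5, cv2.KMEANS_PP_CENTERS)

# Illustrative heuristic: take the cluster with the largest b* centre (most yellow) as fruit.
fruit_cluster = int(np.argmax(centers[:, 1]))
mask = (labels.reshape(lab.shape[:2]) == fruit_cluster).astype(np.uint8) * 255

# Erosion followed by dilation removes speckle and refines contours of overlapping fruit.
kernel = np.ones((5, 5), np.uint8)
mask = cv2.dilate(cv2.erode(mask, kernel), kernel)
```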
The advantages of traditional pear recognition methods lie in their relatively low computational complexity and minimal data labeling requirements, making them suitable for deployment on resource-constrained embedded platforms. However, in complex orchard harvesting scenarios, they still face significant challenges (Figure 7):
(a) Interference from Complex Backgrounds: Background objects such as tree branches have high similarity in color and texture with pears. The feature differentiation capability of manually designed features, which traditional methods rely on, is limited, resulting in difficulties in segmentation and recognition.
(b) Impact of Lighting Variations: Differences in lighting intensity and angle across different parts of the fruit trees lead to significant changes in pear appearance features (such as color and shadows), making it difficult to accurately assess pear maturity. This severely affects the stability and accuracy of recognition algorithms.
(c) Fruit Density and Occlusion: Pears that grow densely or are stacked on top of each other can cause recognition algorithms to miss fruits that are occluded (false negatives) or mistakenly classify densely packed fruits as a single entity or background (false positives).

2.2. Deep Learning Recognition Methods

Compared to traditional fruit recognition methods that rely on manually designed features (such as color, texture, and shape) in digital image processing and machine learning, deep learning has achieved significant success in pear recognition tasks due to the powerful self-learning capability of deep neural networks. By automatically learning discriminative features from large volumes of labeled images, deep learning models can maintain high recognition rates and robustness in complex backgrounds, enabling precise identification and classification of different pear varieties.
Currently, deep learning-based object detection methods have become mainstream for pear recognition, primarily divided into single-stage and two-stage object detection algorithms. The differences between them are shown in Table 2 [17,33,34,35].
Single-Stage Object Detection Algorithms:
To tackle the challenges of pear fruit detection in natural environments (such as complex lighting, occlusion, background interference, and small targets), researchers have continuously improved the performance of models based on the YOLO series, focusing on three aspects: accuracy optimization, model lightweighting, and feature enhancement, as shown in Table 3.
Accuracy Optimization and Robustness Improvement: Wang et al. [36] were the first to improve YOLOv3 by reducing the depth of the backbone network and introducing a dense convolution module (based on DenseNet) to enhance feature reuse. They also optimized the bounding box regression using the GIoU loss function, significantly improving the model’s robustness under complex lighting and occlusion, with mAP increasing to 89.54%.
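For reference, the GIoU loss mentioned above penalizes predictions by the empty area of the smallest enclosing box in addition to the usual IoU term. A minimal single-pair implementation is sketched below; boxes are assumed to be axis-aligned and given as (x1, y1, x2, y2).

```python
def giou_loss(box_a, box_b):
    """GIoU loss for one predicted box and one target box, both (x1, y1, x2, y2)."""
    # Intersection area
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    iou = inter / union
    # Smallest enclosing box C and its "wasted" area relative to the union
    cx1, cy1 = min(box_a[0], box_b[0]), min(box_a[1], box_b[1])
    cx2, cy2 = max(box_a[2], box_b[2]), max(box_a[3], box_b[3])
    c_area = (cx2 - cx1) * (cy2 - cy1)
    giou = iou - (c_area - union) / c_area
    return 1.0 - giou   # minimised when prediction and target coincide

print(giou_loss((10, 10, 60, 60), (20, 15, 70, 65)))
```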
To further address difficulties caused by similar fruit and background colors, severe occlusion, and overlapping, Ma et al. [37] innovatively replaced SPP max pooling with average pooling in YOLOv4 (CSPDarknet53 backbone) to retain low-contrast target information. They also widely used depthwise separable convolutions, reducing the model size by 44%, while recall and mAP increased to 85.56% (+1.29%) and 90.18% (+0.1%), respectively.
Li et al. [38] focused on high-density occlusion and small target detection, proposing the YOLOv5s-FP multi-scale collaborative perception network. By integrating feature fusion and cross-scale perception mechanisms, this model achieved a 96.12% mAP (currently the highest reported) with relatively low computational cost, significantly improving performance in densely overlapping and sudden lighting change scenarios.
Model Lightweighting and Real-Time Performance Enhancement: Parico and Ahamed [39] conducted a systematic comparison within the YOLOv4 series and selected YOLOv4-tiny (AP 94.09%, FPS ≥ 24) as the optimal lightweight–accuracy balance solution. They successfully integrated it with Deep SORT to achieve real-time pear fruit counting (F1 87.85%), demonstrating the feasibility of using lightweight models for real-time applications in orchards.
Continuing along the lightweight path, Zheng et al. [40] replaced the YOLOv7 backbone network with the lighter MobileNetv3 and introduced the SIoU loss function, which incorporates an angle cost. This significantly reduced the model’s parameter count and computational complexity, achieving an effective balance between speed and accuracy (94.36% accuracy, 89.28% recall).
Tan et al. [41] made extreme optimizations to YOLOv8n. The improved model achieved significant speedup in detection on both GPU and CPU, increasing by 34.0% and 24.4%, respectively. At the same time, the F0.5 score and mAP reached 94.7% and 88.3%, greatly enhancing the model’s deployment potential on resource-constrained edge devices.
Complex Background Feature Enhancement and Small Target Optimization: To address the challenges of distant small targets and complex background interference, Zhao et al. [42] designed a lightweight model based on YOLOv8-s. By innovatively employing multi-scale convolution and feature coupling mechanisms, the model enhances the feature extraction capability in complex backgrounds. Additionally, the DP-FPN structure (which integrates HSM/HOM/SWS modules) was used to deeply optimize multi-scale feature fusion, significantly improving the model’s ability to perceive small pears and complex scenes. This work complements Li’s [38] multi-scale perception approach, further strengthening the model’s adaptability to scale variations and background interference.
In addition to the YOLO series, researchers have employed novel architectures to improve pear fruit detection. Zhao et al. [32] introduced a lightweight visual architecture, Vmamba, and innovatively incorporated 3D selective scanning (SS3D) to enhance feature extraction in complex backgrounds. They also combined a reward–punishment mechanism (RPM) to suppress redundant information interference and constructed a stacked feature pyramid network (SFPN) to optimize the detection of densely packed small targets, achieving an mAP50 of 94.8%, with a 7.6% improvement in detection accuracy in dense scenes compared to the baseline.
Jia et al. [43] improved the SSD model by pruning, achieving precise multi-target recognition in complex environments. Their algorithm increased detection accuracy by 23.55% to 98.01% and the recall rate by 10.27 percentage points to 85.03%, and the accuracy continued to improve as sample size increased.
Yan et al. [44] improved Faster R-CNN by replacing ROI Pooling with ROI Align to enhance target localization accuracy, and combined it with a VGG16 backbone network, achieving a recognition accuracy of 95.16% and a detection efficiency of 0.2 s per sample.
Meng et al. [45] built on the Mask R-CNN framework and added mask and classification branches to the ResNet pre-trained model, achieving an average segmentation accuracy of 98.02% for mature pears (95.28% in occluded scenes). Zhu et al. [46] further optimized Mask R-CNN by replacing standard convolutions in ResNet50 with deformable convolutions to enhance feature adaptability. They also added a bottom-up path in the FPN to reinforce small target feature retention, improving mAP by 6% to 91.3%.
Faster R-CNN lays the foundation for the two-stage detection paradigm through end-to-end region proposal generation, offering significant accuracy advantages. Mask R-CNN extends this by adding a segmentation branch, realizing “detection–segmentation” integration, which is especially suitable for overlapping pear fruit scenes that require precise contours. The key differences between Faster R-CNN and Mask R-CNN are shown in Table 4:
The experimental research of Siyu Pan and Tofael Ahamed [47] systematically revealed the core differences between Faster R-CNN and Mask R-CNN: Mask R-CNN simultaneously generates target masks and bounding boxes, while Faster R-CNN only outputs bounding boxes. This functional difference directly leads to a performance disparity—in independent pear detection scenarios, Mask R-CNN significantly outperforms Faster R-CNN in terms of detection accuracy, but with relatively slower recognition speed. In pear aggregation scenarios, Mask R-CNN achieves high correct recognition rates due to its instance segmentation capability, while Faster R-CNN fails entirely due to its inability to distinguish overlapping individuals.
A rotational robustness test further validated this conclusion: when pears are dispersed, both models show only slight differences in bounding box size. However, when pears are densely distributed, Mask R-CNN is still able to effectively separate targets in rotated images, while Faster R-CNN’s recognition failure becomes more pronounced.
The comparative experimental results of pear fruit recognition algorithms are shown in Table 5.
As can be seen from Table 5, single-stage algorithms narrow the accuracy gap with two-stage algorithms by optimizing network structures (such as introducing Transformer and multi-scale feature fusion) and loss functions (such as focal loss and CIoU loss), while maintaining speed advantages. Two-stage algorithms, on the other hand, improve inference speed through lightweight designs (such as MobileNet backbone) and efficient proposal mechanisms, making them more adaptable to a wider range of practical scenarios.
The emergence of new paradigms, such as anchor-free and end-to-end models (e.g., DETR), is gradually blurring the boundaries between single-stage and two-stage approaches, driving the development of object detection towards more universal and efficient directions.

3. Localization of Pear Harvesting Robots

After the pear fruit is recognized in the image, it is necessary to determine the spatial coordinates of the picking point relative to the robot in order to guide the robot to perform the harvesting action using its motion execution system and end effector [48]. The specific process of 3D localization is shown in Figure 8.
The 3D localization of the picking point consists of two stages: first, based on the recognition and segmentation of the fruit, the 2D pixel coordinates of the pear are obtained; then, after obtaining the 2D information, algorithms such as stereo matching and 3D reconstruction are used to obtain the target’s 3D coordinates, 3D posture, and other information [49].
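As a minimal illustration of the second stage, the sketch below lifts a detected picking point (u, v) with a measured depth into camera-frame 3D coordinates using the pinhole model. The intrinsic parameters are placeholder values; in practice they come from camera calibration.

```python
def pixel_to_camera(u, v, depth_m, fx=610.0, fy=610.0, cx=320.0, cy=240.0):
    """Back-project an image point with known depth into the camera optical frame (metres)."""
    x = (u - cx) * depth_m / fx
    y = (v - cy) * depth_m / fy
    return (x, y, depth_m)

# Example: a pear centre detected at pixel (352, 198) with 0.95 m measured depth.
print(pixel_to_camera(352, 198, 0.95))
```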

3.1. Two-Dimensional Information Acquisition

Currently, harvesting robots acquire the 2D pixel coordinates of the picking point mainly through traditional methods, the most common being centroid localization and minimum-bounding-circle localization. In addition, deep learning-based bounding-box localization and instance segmentation methods are also commonly used for coordinate acquisition (Table 6).
Cheng et al. [50] used HSV analysis and Canny edge segmentation to distinguish mature citrus, background, and fruit stems, suppressing branch and leaf noise. After extracting the fruit contour points, they employed least squares ellipse fitting to calculate the centroid for accurate localization. Feng et al. [51] extracted the valid region of apples and performed contour chain code search to match concave points for separating overlapping fruits, using centroid coordinates for precise localization. However, when some pears have irregular shapes (such as concavities or partial occlusion), the centroid may deviate from the actual center, leading to localization errors. To address this issue, Niu et al. [52] combined convex hull theory and shape context algorithms to extract the fruit’s symmetry axis. They used the minimum bounding circle algorithm to fit the apple’s contour and locate the fruit’s center. For pear varieties with nearly spherical shapes, the minimum bounding circle method can be used for localization, which is robust in occlusion scenarios and has low localization errors.
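The following OpenCV sketch shows both traditional 2D localization routes on a binary fruit mask: the image-moment centroid and the minimum enclosing circle. The mask file name is a placeholder; the mask would normally come from the segmentation step described in Section 2.1.

```python
import cv2

mask = cv2.imread("pear_mask.png", cv2.IMREAD_GRAYSCALE)     # placeholder segmentation output
contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cnt = max(contours, key=cv2.contourArea)                      # largest blob assumed to be the fruit

# Centroid from image moments; can drift if the fruit is partly occluded or concave.
m = cv2.moments(cnt)
centroid = (m["m10"] / m["m00"], m["m01"] / m["m00"])

# Minimum enclosing circle; more tolerant of occlusion for near-spherical varieties.
(cx, cy), radius = cv2.minEnclosingCircle(cnt)
print(centroid, (cx, cy), radius)
```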
For issues like occlusion of fruits in complex backgrounds, Ren et al. [53] improved the loss function (WIoUv3) and introduced an attention mechanism (EMA), significantly enhancing localization accuracy, providing high-reliability 2D localization for “Yuluxiang” pears in unstructured environments. Zhao et al. [42] used the YOLOv8 model combined with HSA attention (to suppress noise), DP-FPN (multi-scale localization), and MO-NMS (overlap separation) to achieve high-precision real-time 2D localization in complex orchards, providing core visual support for automated pear picking.
The core of determining the 2D image pixel coordinates lies in the “image segmentation → object localization → coordinate calculation” technical chain, which separates the fruit from the background and obtains its location. Traditional methods are suitable for simple scenarios, while deep learning performs better in complex environments. In practical applications, strategies like camera calibration and multi-frame optimization are needed to improve coordinate accuracy, providing reliable input for subsequent 3D localization and robotic arm control.

3.2. Depth Information Acquisition

The acquisition of depth information for the picking point is a key process in the conversion from 2D to 3D [54]. Common methods for obtaining depth information, which are shown in Figure 9, include monocular vision, multi-view vision, depth cameras, and laser ranging [55,56,57,58,59].
Liu et al. [59] proposed a monocular ranging method based on pixel changes, which utilizes the pinhole imaging principle and the mapping relationship between pixel count changes and imaging distance to obtain the fruit’s distance. This method is effective for short distances but lacks robustness, and its accuracy is significantly affected by the initial distance. Chen et al. [60], combining instance segmentation algorithms, proposed a fruit localization method based on a monocular RGB camera, achieving an average position error of 0.18. However, the localization accuracy is easily affected by fruit occlusion and overlap, with poor real-time performance.
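Under the pinhole model, such monocular ranging amounts to relating the apparent pixel size of the fruit to its approximately known real diameter. The sketch below shows this relation with illustrative numbers; it is a simplification of, not a reproduction of, the pixel-change mapping in [59].

```python
def monocular_range(pixel_width, fruit_diameter_m=0.07, focal_px=610.0):
    """Approximate distance to a fruit of known diameter from its apparent pixel width."""
    return focal_px * fruit_diameter_m / pixel_width

print(monocular_range(48))   # ~0.89 m for a 70 mm pear spanning 48 pixels
```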
To address the issue of monocular vision being significantly impacted by leaf occlusion and having low robustness, Xiong et al. [61] developed a binocular vision system by combining two CCD cameras, calculating the pedicel picking point through stereo matching. The maximum error was 5.08 cm. Ling et al. [62] developed a dual-arm collaborative robot with a binocular stereo camera mounted on the robot’s top to gain a wide field of view. Si et al. [63] used a binocular camera and implemented apple localization through fruit area features and epipolar geometry matching algorithms, achieving a positioning error of less than 20 mm in the measurement range of 400–1500 mm. However, binocular cameras are limited by the field of view, making it difficult to obtain 3D information beyond 180°.
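For rectified binocular setups such as those above, depth follows directly from disparity via Z = f·B/d, where B is the baseline and d the disparity of matched pixels. The one-line sketch below uses an assumed baseline and focal length purely for illustration.

```python
def stereo_depth(disparity_px, baseline_m=0.12, focal_px=610.0):
    """Depth of a matched point in a rectified stereo pair: Z = f * B / d."""
    return focal_px * baseline_m / disparity_px

print(stereo_depth(61.0))   # ~1.2 m for a 12 cm baseline at 610 px focal length
```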
Chen et al. [64] combined two binocular systems into a four-eye vision system, adopting an adaptive stereo matching strategy, and proposed a high-precision point cloud stitching algorithm to provide the robot with 3D spatial coordinate information. The multi-view (including binocular) vision method has low hardware cost and high depth map resolution and was widely used in early depth information acquisition devices.
Currently, depth cameras are more commonly used in picking robots. Pan et al. [47] used a 3D stereo camera to capture 9054 RGB-D images and extracted pear fruit surface information through depth images, achieving good positioning accuracy. Neupane et al. [65] compared eight different cameras and demonstrated that the TOF (time-of-flight) depth camera performs better under sunlight interference, making it suitable for pear fruit localization.
The methods mentioned above are all visual positioning methods, which are easily affected by changes in lighting and complex backgrounds in natural environments. The laser ranging method can effectively solve these issues in fruit localization. Kang et al. [66] proposed a localization method that integrates LiDAR and cameras. Experimental results showed that the positioning error of LiDAR in the 0.5–1.8 m range was only 0.245–0.275 cm.
Based on the research findings of the aforementioned scholars, the common technical types for obtaining depth information and their distinguishing methods are shown in Table 7.
Compared with the other three localization methods, monocular vision, while having a simple structure and low cost, requires algorithms to indirectly infer depth information, resulting in larger localization errors. Multi-view vision is a more commonly used method for fruit localization due to its low cost and high image resolution. However, its anti-interference ability is weak, and its positioning accuracy is easily affected by changes in lighting. Depth cameras have fast response times and can be used in low-light environments such as nighttime, greatly improving operational efficiency. Laser localization can achieve long-distance positioning of pears with high accuracy and good robustness, but its high cost limits its widespread application.

3.3. Pose Acquisition

When the pear’s growth posture is not obstructed, the picking robot can directly acquire the picking point. However, considering the complex environment of the orchard, most pears are likely to be obstructed. To address this issue, it is necessary to acquire the posture of the fruit. The team led by Liu from Shandong Agricultural University [70] combined binocular vision and IMU data to build a “visual-force-posture” closed-loop control system that dynamically adjusts the grasping posture to accommodate different pear shapes.
To solve the problem of pear obstruction during the picking process, Shi et al. [71] combined binocular vision (global positioning) and monocular vision (local fine-tuning) to construct a layered visual servo system that enables rapid localization and grasping of the pear’s spatial posture in complex backgrounds.
With the rise of RGB-D and other depth cameras, more research is focusing on using three-dimensional data to determine the appropriate grasping posture for each fruit. Lehnert et al. [12] used an RGB-D camera to scan sweet peppers and generate detailed point clouds, determining the best working posture for the robotic arm by analyzing the gradient of the normal vectors on the fruit’s surface. However, the scanning time is too long, which affects the picking efficiency.
To address this issue, Gao et al. [72] proposed an optimized picking direction method based on the local point cloud of the target apple: starting from the fruit’s center, they generate a direction vector perpendicular to the nearest branch to guide the robotic arm in avoiding obstacles. This method achieves an average direction error of 11.81°, with the time for calculating the direction of a single fruit being 0.543 s.
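The geometric idea in [72] can be sketched as follows: the approach direction points from the nearest branch toward the fruit centre, with its along-branch component removed so that it is perpendicular to the branch axis. The numerical inputs below are illustrative; the branch point and axis are assumed to be known from the segmented local point cloud.

```python
import numpy as np

def picking_direction(fruit_center, branch_point, branch_axis):
    """Unit approach direction away from the nearest branch, perpendicular to its axis."""
    branch_axis = branch_axis / np.linalg.norm(branch_axis)
    v = fruit_center - branch_point                       # vector from branch to fruit
    v_perp = v - np.dot(v, branch_axis) * branch_axis     # remove the along-branch component
    return v_perp / np.linalg.norm(v_perp)

d = picking_direction(np.array([0.02, 0.40, 0.95]),
                      np.array([0.00, 0.46, 0.93]),
                      np.array([1.00, 0.05, 0.00]))
print(d)
```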

3.4. Pear Fruit Vibration Problem

As established in the preceding sections, the fruit’s position and posture information can be obtained. However, during the actual picking process, collisions between the mechanical arm and branches can cause the pear to oscillate, which lowers the picking success rate. To solve this problem, Li et al. [73] proposed a preprocessing algorithm based on morphological principles, which removes fruitless branches and then clusters and fits the remaining branches to obtain the three-dimensional position data of the fruit-bearing branches. However, this method increases the computational load, and its real-time performance is poor.
To address this issue, Yang et al. [74] proposed a tracking and recognition method combined with affine transformation, using the correlation between previous and subsequent images to predict the position of the apple. This method achieves rapid tracking of oscillating apples, with a running time of 25 ms and a tracking error of less than 4%, significantly improving the speed and efficiency of the algorithm.
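A minimal sketch of this frame-to-frame prediction is given below: assuming a few matched feature points between consecutive images, a partial affine motion model is estimated and applied to the previously known fruit centre. The point coordinates are placeholders, and feature matching itself is omitted for brevity.

```python
import cv2
import numpy as np

# Placeholder matched keypoints between the previous and current frames.
prev_pts = np.float32([[120, 200], [300, 210], [220, 340], [180, 260]])
curr_pts = np.float32([[123, 205], [304, 214], [223, 346], [183, 265]])

M, _ = cv2.estimateAffinePartial2D(prev_pts, curr_pts)   # 2x3 affine model of scene motion
prev_center = np.array([210.0, 255.0, 1.0])              # fruit centre in the previous frame
pred_center = M @ prev_center                             # predicted centre in the current frame
print(pred_center)
```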

4. End Effector of the Pear Harvesting Robot

The end effector is a key component of the pear harvesting robot. Once the robotic arm reaches the harvesting position, the end effector completes the fruit picking process (Figure 10). It significantly influences the harvesting rate and damage rate of the robot. This chapter summarizes the end effector types based on different harvesting methods and drive mechanisms.
This figure outlines the closed-loop control workflow for adaptive harvesting. The process initiates when the visual system successfully identifies and locates a target pear. Subsequently, the robotic arm maneuvers the end effector into the harvesting position. A critical adaptive decision point follows: the system continuously evaluates whether the end effector’s motion trajectory is optimally aligned with the fruit. If misalignment is detected (“No”), the system feedbacks to recalibrate the robotic arm’s movement. Only upon successful alignment (“Yes”) does the system trigger the final actuator to execute the harvesting action, thereby completing the cycle. This iterative adjustment ensures robust performance in dynamic and unstructured orchard environments.
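A highly simplified sketch of this closed-loop workflow is given below. Every perception and actuation function is a trivial stub standing in for the robot’s real interfaces, and the alignment tolerance and retry limit are assumed values for illustration only.

```python
import random

def detect_pear():            return (0.02, 0.40, 0.95)           # stub: 3D target (m)
def move_arm_towards(t):      pass                                 # stub: coarse approach motion
def alignment_error(t):       return random.uniform(0.0, 0.02)     # stub: residual error (m)
def trigger_harvest():        print("harvest actuated")            # stub: gripper/cutter/suction

ALIGN_TOL_M, MAX_ATTEMPTS = 0.005, 10       # assumed 5 mm tolerance, bounded retries

def harvest_one_pear():
    target = detect_pear()                  # recognition + 3D localisation
    for _ in range(MAX_ATTEMPTS):
        move_arm_towards(target)            # move the end effector towards the fruit
        if alignment_error(target) < ALIGN_TOL_M:   # "Yes" branch of the decision point
            trigger_harvest()
            return True
        target = detect_pear()              # "No" branch: re-observe and recalibrate
    return False

harvest_one_pear()
```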

4.1. Harvesting Methods

The main gripping methods of the end effector are clamping and suction. After completing the gripping task, the pear must be separated from the pear tree, with common separation methods including twisting, pulling, and cutting [75,76,77]. Some common types of end effectors are shown in Figure 11.
The clamping–twisting end effector first secures the pear during picking, and then its twisting mechanism rotates the clamped fruit so that the stem breaks and the pear separates from the tree. This method places relatively low demands on positioning accuracy and offers high robustness.
Bulanon et al. [81] used a two-finger twisting end effector, which is driven by a DC motor to grip the fruit; it then separates the stem and fruit through the rotation of a stepper motor. However, the two-finger grip lacks stability, which can only be improved by increasing the grip strength between the fingers, but excessive grip force may damage the surface of the pear.
To address this issue, Zhang et al. [82] proposed a human-like three-finger end effector, which demonstrated good stability in pear fruit gripping operations. However, as the number of mechanical fingers increases, the overall weight of the end effector significantly rises, leading to higher actuator energy consumption and increasing the complexity of the control system.
To mitigate this, Silwal et al. [83] proposed an underactuated, human-like three-finger twisting end effector, reducing the number of actuators and simplifying control. The clamping–twisting method requires sufficient gripping force to overcome the twisting reaction force. The pear’s waxy surface layer can cause unstable friction, and localized pressure can result in invisible bruising [84]. Liu et al. [85] conducted force analysis tests on the clamping and twisting of fragrant pears, selecting the appropriate twisting angle and incorporating flexible silicone material to solve these issues.
The clamping–cutting end effector first grips the fruit and, unlike the twisting method that relies on a twisting mechanism to separate the pear, uses a cutter to shear and quickly separate the pear from the tree.
Gao et al. [86] designed a back-mounted claw-type pear fruit picker, which first clamps the pear and then uses a blade to quickly cut the fruit stem, achieving a simple and non-damaging picking process. Xu et al. [87] used a cylinder to drive the blade for rotating and cutting the fruit stem, achieving separation between the fruit and the tree. This end effector has high picking efficiency but performs poorly with pears that have short stems.
Because the blade acts on the fruit stem rather than on the fruit itself, the cutting method reduces surface damage compared to the twisting method. However, in complex environments, precise identification and localization of the fruit stem are required. To address this, Liu et al. [80] proposed using lasers to cut the fruit stem, a method that can achieve precise cutting, although its cutting speed makes it difficult to meet practical requirements.
Unlike the gripping and cutting end effectors, the suction-type end effector does not require precise positioning of the fruit’s pedicel. By utilizing negative pressure, it draws the fruit close to the air suction end effector, ultimately achieving the separation of the fruit from the branch for harvesting.
Baeten et al. [88] designed a suction-type harvesting end effector; when the end effector approaches the apple, the apple is sucked towards it, and rotation helps separate the fruit from the stem. Although the suction-type end effector does not require precise localization of the pear, gaps between the vacuum suction cup and the pear may lead to insufficient suction.
To address this, Wang et al. [89] developed a tomato picking robot. When the end effector moves to the target position, negative pressure generated by the vacuum pump is used to suction the fruit, which is then separated from the stem by gripping and rotating mechanisms.
Based on the research findings of the aforementioned scholars, the common clamping methods of end effectors, along with their respective advantages, disadvantages, and applicable scenarios, are shown in Table 8.

4.2. End Effector Drive Methods

The drive method of the end effector in a fruit picking robot directly determines its grasping force, speed, accuracy, flexibility, and the degree of damage to the crop. Therefore, selecting the appropriate drive method is a key step in designing an efficient, reliable, and damage-free fruit picking robot. The most commonly used drive methods for the end effector in pear fruit picking are motor-driven and pneumatic-driven systems.
Motor-driven systems can achieve quick response for the end effector, reducing crop damage with their high control precision. They also offer good compatibility and are easy to integrate. For instance, Guo et al. [90] used a motor-driven multi-joint gripper to achieve adaptive grasping of pear fruits, achieving high control precision in multi-fruit cluster scenarios, with a picking success rate of 96%. Li et al. [91] designed a barrel-shaped end effector, where the motor controls a blade and fruit gathering mechanism to enclose the fruit. Once the target fruit is fully inside the end effector, the blade works with the gathering mechanism to cut the fruit’s stem.
Pneumatic end effectors are driven by cylinders mounted externally. Wang et al. [92] developed a pneumatic robotic arm where the end effector is driven by vacuum air pressure for its gripping device. During the picking process, the fruit is first adsorbed and then the fruit stem is cut. Yang et al. [93] proposed a pneumatic pressure-driven spherical fruit picking robot end effector. Under a 2.0 MPa air pressure condition, this system achieved the best picking results, with a high success rate of 100% and an average picking time of 2.4 s per fruit, without causing any damage to the fruit. This system’s overall performance meets the requirements of most spherical fruit picking tasks and demonstrates a certain level of versatility.
The technological evolution of end effectors for pear fruit picking essentially represents progress in damage control and environmental adaptability. From early rigid gripping to bionic multi-finger collaboration, and from mechanical cutting to non-contact suction, its development has consistently focused on two core goals: reducing local stress and minimizing reliance on precise positioning. Although the current electropneumatic hybrid drive has significantly improved picking success rates and efficiency, the slipping damage caused by the waxy layer and the precise positioning of the fruit remain key bottlenecks for the industrialization of the technology.

5. Discussion and Future Perspectives

Building upon the technical foundations laid in Section 2, Section 3 and Section 4, this chapter aims to conduct an in-depth discussion and prospective analysis of the inherent connections, core contradictions, and transformative pathways for future breakthroughs in pear harvesting robotics research. This critical synthesis of the aforementioned literature, examined from the perspective of system integration and technological convergence, is first grounded in a quantitative comparison of the core technical parameters discussed throughout this article, as summarized in Table 9. This empirical framework will then serve to structure our analysis of the interoperability challenges and synergies between perception, localization, and manipulation modules, ultimately framing the promising avenues for future research.

5.1. Discussion

5.1.1. The Gap Between High-Precision Perception and Low-Success-Rate Execution

Current research exhibits a significant disconnect between perception and execution. While deep learning models achieve high-precision recognition in complex environments, computational latency, such as Mask R-CNN processing speed coupled with motion planning delays for robotic arms, results in prolonged single-fruit harvesting cycles, failing to meet efficiency requirements [38,47]. Critically, transient vision system failures under extreme occlusion or sudden illumination changes along with localization errors are amplified by end effectors, directly causing grasping failures and fruit damage. For example, suction-based end effectors require instantaneous negative pressure; millimeter-level localization deviations may trigger adsorption failure [81]. Gripping end effectors risk bruising if force control models lack tight integration with visually identified fruit posture [85]. This necessitates shifting from discrete technology research toward designing integrated perception–execution adaptive control systems.

5.1.2. Intrinsic Trade-Offs in Technical Pathways and Compatibility Conflicts with Agricultural Scenarios

Comparative analysis reveals inherent trade-offs that are inadequately addressed in current research, manifested in three dimensions:
(1) Precision–Speed Trade-off: Two-stage detection excels in accuracy and occlusion handling but suffers high computational complexity, hindering real-time control [37,44]. Single-stage lightweight models prioritize speed but exhibit severe precision degradation in dense small-target scenarios [39]. No universal optimal solution exists; selection must align with specific contexts, such as orchard planting systems and primary harvest targets.
(2) Performance–Cost Trade-off: LiDAR delivers ultra-high precision and illumination robustness; yet prohibitive costs impede large-scale agricultural adoption [61,66]. Stereo vision offers cost efficiency but critically depends on ambient light stability [57,63]. This dichotomy drives the demand for multi-sensor fusion, though achieving high performance at low cost remains unresolved.
(3) Generality–Specialization Trade-off: High-DOF universal manipulators provide flexibility but incur complex control, high costs, and collision risks in dense orchards [3]. Dedicated harvesters optimize cost-efficiency yet lack versatility [86]. Future designs require breakthroughs in modularity and reconfigurability.

5.1.3. Core Conflict: Environmental Unstructuredness vs. Algorithmic Adaptability

The highly unstructured orchard environment constitutes the fundamental challenge. Despite deep learning’s potential, its data-driven nature cannot guarantee 100% robustness against infinite variability in illumination, occlusion, and fruit growth postures. For example, visual recognition algorithms may miss detections when encountering unseen occlusion patterns absent from training data [30]; localization models may fail during fruit oscillations such as wind-induced or contact-induced sway due to unmodeled dynamics [73]. Thus, beyond algorithmic advancements, deliberate environmental structuring, such as standardized tree training systems, must simplify robotic tasks—compensating for current limitations in machine intelligence.

5.2. Future Perspectives

Based on the preceding analysis, this study contends that the future development of pear harvesting robotics will rely on multi-layered synergistic innovation rather than breakthroughs in singular technologies.
(1) Deep Integration of Agricultural Machinery and Agronomy
In the domain of mechanized pear harvesting, the unstructured and non-standardized operational environment constitutes the core bottleneck restricting robotic efficiency. Current pear harvesting robots face constraints including foliage occlusion, branch interference, and illumination variations during fruit recognition. Furthermore, irregular planting patterns with intertwined branches and uneven plant density increase path-planning complexity for end effectors and reduce harvesting efficiency. The fundamental solution lies in simplifying the problem at its source. Promoting standardized planting systems such as V-trellis or T-trellis structures can significantly reduce fruit occlusion and enhance visual detectability. Rational planning of row and plant spacing improves field ventilation and light penetration, reducing illumination-related interference with detection systems. Regular pruning minimizes ineffective occlusion and makes fruit distribution more recognizable for deep learning networks. Such agricultural–mechanical synergy will not only reduce fruit detection complexity and end effector operational difficulty but also improve harvesting efficiency through environmental optimization, thereby enabling scalable deployment of mechanized pear harvesting technologies.
(2) Intelligent Upgrades in Perception, Decision-Making, and Execution Systems
At the perception level, exploring lightweight Transformer and Mamba architectures [32,42] may achieve better accuracy–speed trade-offs on embedded platforms. Multimodal sensor fusion technologies should be developed to enhance robustness under adverse lighting through complementary information integration. For decision-making, reinforcement learning (RL) and imitation learning (IL) methods should be introduced to enable robots to learn optimal harvesting strategies and anti-interference capabilities via simulation and field training. Multi-arm cooperative decision algorithms ought to be developed for task allocation and obstacle avoidance, thereby improving harvesting efficiency. In execution, force and tactile feedback-based compliant control algorithms must be researched to allow end effectors to adaptively adjust grasping force and posture, reducing damage rates below manual harvesting levels [70,85].
(3) Engineering Innovations in System Architecture
Modular design should be adopted by developing universal mobile platforms compatible with quickly interchangeable end effectors and algorithm modules, enabling single robots to adapt to multiple crops such as pears and apples, thus improving device utilization and cost-efficiency. Lightweight robotic arm design with new materials like carbon fiber composites should replace traditional steel structures, reducing weight while maintaining stiffness and strength. This will enhance motion agility, dynamic response, and energy efficiency while minimizing inertial impact, ultimately improving operational efficiency and adaptability in dense orchards. Cloud–edge collaboration can leverage 5G connectivity and cloud computing to significantly reduce computation time and improve harvesting efficiency. Since robots outperform humans in continuous, repetitive tasks, achieving precise fruit localization for nighttime operation will substantially enhance the utilization efficiency and application prospects of harvesting robots.

6. Conclusions

This study systematically reviews global research advances in three core technologies for pear harvesting robots—recognition, localization, and end effectors. Through comparative analysis of performance boundaries and applicable scenarios across technical approaches, the following conclusions are drawn:
(1) Recognition Technology: Deep learning-based object detection algorithms have superseded traditional methods as the mainstream. Single-stage algorithms like YOLO achieve an optimal balance between speed and accuracy, while two-stage algorithms such as Mask R-CNN excel in complex occlusion and overlapping scenarios. Future trends lie in lightweight network design, Transformer architecture adoption, and continuous optimization for small-target detection.
(2) Localization Technology: Multi-sensor fusion is imperative. Vision-based solutions (stereo, RGB-D) offer lower costs but suffer environmental sensitivity; LiDAR delivers superior accuracy yet faces prohibitive costs for large-scale deployment. The critical bottleneck continues to be the enhancement of real-time performance and stability in dynamic unstructured environments.
(3) End effectors: Their design is intrinsically coupled with perception capabilities. Existing solutions—gripping–twisting, cutting, and suction-based—all exhibit inherent flaws: either demanding excessive perception precision or risking fruit damage. Bio-inspired compliant grasping and force-controlled manipulation are pivotal for mitigating damage.
(4) Core Finding: The current developmental bottleneck has shifted from singular technological breakthroughs to addressing system-level integration challenges. These include bridging the gap between high-precision perception and low-success-rate execution, resolving intrinsic trade-offs among technical pathways, and mitigating uncertainties from unstructured environments.
(5) Future Outlook: Commercialization of pear harvesting robots must rely on three pillars: deep integration of agricultural machinery and agronomy, intelligent synergy across the perception–decision–execution–technical chain, and cost-effective, high-reliability system-level engineering innovations. Only through interdisciplinary collaboration can this technology transition from laboratory prototypes to widespread orchard deployment.

Author Contributions

Conceptualization, X.H. and H.Z.; methodology, H.Z. and Z.Y.; analysis, H.Z.; investigation, B.W. and H.Z.; resources, X.H., L.S., H.Z., K.Z. and X.M.; data curation, H.Z.; writing—original draft preparation, H.Z. and X.L.; writing—review and editing, H.Z., X.H., L.S. and X.L.; visualization, H.Z.; supervision, X.H. and B.W.; project administration, X.H. and B.W.; funding acquisition, X.H. All authors have read and agreed to the published version of the manuscript.

Funding

Supported by the 2115 talent development program of China Agricultural University and the earmarked fund for the China Agriculture Research System (CARS-28).

Data Availability Statement

Not applicable.

Acknowledgments

The authors would like to thank all staff of CCAT and CAUS, China Agricultural University, for their great contributions to this work.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Food and Agriculture Organization of the United Nations. “Production (t)” [Dataset]; Original Data, 2025. Available online: https://ourworldindata.org/explorers/global-food?tab=chart&pickerSort=desc&pickerMetric=production__tonnes&Food=Pears&Metric=Production&Percapita=false&country=CHN~OWID_WRL (accessed on 27 March 2025).
  2. Zhao, C.; Fan, B.; Li, J.; Feng, Q. Agricultural robots: Technology progress, challenges and trends. Smart Agric. 2023, 5, 1–15. [Google Scholar] [CrossRef]
  3. Xiong, Y.; Ge, Y.; From, P.J. An improved obstacle separation method using deep learning for object detection and tracking in a hybrid visual control loop for fruit picking in clusters. Comput. Electron. Agric. 2021, 191, 106508. [Google Scholar] [CrossRef]
  4. Cui, Y.; Ma, L.; He, Z.; Zhu, Y.; Wang, Y.; Li, K. Design and Experiment of Dual Manipulators Parallel Harvesting Platform for Kiwifruit Based on Optimal Space. Nongye Jixie Xuebao/Trans. Chin. Soc. Agric. Mach. 2022, 53, 132–143. [Google Scholar] [CrossRef]
  5. An Israeli company has developed drones for fruit harvesting. Sens. World 2022, 28, 1.
  6. Lv, J.; Zhong, X.; Peng, Y. Design of a Fruit-Picking Drone Based on YOLOv5. China South. Agric. Mach. 2025, 56, 4. [Google Scholar] [CrossRef]
  7. Goswami, P.; Vaishnav, R.; Anand, T.; Dayal, P. A Comprehensive Review on LiDAR Based 3D Deep Learning Object Detection Algorithms. In Proceedings of the 2025 International Conference on Computer, Electrical and Communication Engineering, ICCECE 2025, Kolkata, India, 7–8 February 2025. [Google Scholar]
  8. Song, J.; Zhang, T.; Xu, L.; Tang, X. Research actuality and prospect of picking robot for fruits and vegetables. Nongye Jixie Xuebao/Trans. Chin. Soc. Agric. Mach. 2006, 37, 158–162. [Google Scholar]
  9. Wang, B. Research on Key Technologies for Pear Harvesting Robots Based on ROS and YOLOv5. Master’s Thesis, Hebei University, Baoding, China, 2024. [Google Scholar] [CrossRef]
  10. Meng, Y.; Wang, J.; Tian, E.; Yu, Y. Vision System Design of Picking Robot Based on Image Edge Detection Technology. J. Agric. Mech. Res. 2020, 42, 245–248. [Google Scholar] [CrossRef]
  11. Teng, J.; Xu, H.; Wang, Y.; Zhang, Z. Design and Simulation of Motion Trajectory Planning for Manipulator of Picking Robot. Comput. Simul. 2017, 34, 362–367. [Google Scholar]
  12. Lehnert, C.; English, A.; McCool, C.; Tow, A.W.; Perez, T. Autonomous Sweet Pepper Harvesting for Protected Cropping Systems. IEEE Robot. Autom. Lett. 2017, 2, 872–879. [Google Scholar] [CrossRef]
  13. Hohimer, C.J.; Wang, H.; Bhusal, S.; Miller, J.; Mo, C.; Karkee, M. Design and field evaluation of a robotic apple harvesting system with a 3d-printed soft-robotic end-effector. Trans. ASABE 2019, 62, 405–414. [Google Scholar] [CrossRef]
  14. Zhang, Q.; Liu, F.; Jiang, X.; Xiong, Z.; Xu, C. Motion planning method and experiments of tomato bunch harvesting manipulator. Nongye Gongcheng Xuebao/Trans. Chin. Soc. Agric. Eng. 2021, 37, 149–156. [Google Scholar] [CrossRef]
  15. Han, W.K.; Luo, J.L.; Wang, J.T.; Gu, Q.H.; Lin, L.J.; Gao, Y.; Chen, H.R.; Luo, K.Y.; Zeng, Z.X.; He, J. Design of a Chili Pepper Harvesting Device for Hilly Chili Fields. Agronomy 2025, 15, 1118. [Google Scholar] [CrossRef]
  16. Jinyang, H.; Chenghai, Y.; Kairan, L. Design and experiment of mango picking robot. J. Shihezi Univ. (Nat. Sci.) 2025, 43, 152–159. [Google Scholar] [CrossRef]
  17. Jin, S.; Zhou, H.; Jiang, H.; Sun, M. Research progress on visual system of picking robot. Jiangsu J. Agric. Sci. 2023, 39, 582–595. [Google Scholar] [CrossRef]
  18. Qin, Z. A Discussion on the Application of Computer Technology in Image Morphological Processing. J. Pu’er Univ. 2019, 35, 24–26. [Google Scholar]
  19. Wang, J.; Wang, H. An edge detection algorithm for noisy images based on OTSU adaptive threshold segmentation. Pract. Electron. 2025, 33, 42–45. [Google Scholar] [CrossRef]
  20. Zhang, H.; Zhang, Y. K-means clustering method for color image segmentation based on Lab space. J. Gannan Norm. Univ. 2019, 40, 44–48. [Google Scholar] [CrossRef]
  21. Chen, J. Large Sample Iterative Training Algorithm for Support Vector Machine. Mod. Inf. Technol. 2025, 9, 85–91. [Google Scholar] [CrossRef]
  22. Patel, K.K.; Kar, A.; Jha, S.N.; Khan, M.A. Machine vision system: A tool for quality inspection of food and agricultural products. J. Food Sci. Technol. 2012, 49, 123–141. [Google Scholar] [CrossRef] [PubMed]
  23. He, Z.; Sun, L.; Rui, Y. Detection of Small Surface Defects Based on Machine Vision. J. Appl. Sci. 2012, 30, 531–537. [Google Scholar] [CrossRef]
  24. Zhao, D. Research on Rapid Nondestructive Testing Method and Grading Equipment of Pear External Quality Based on Machine Vision. Master’s Thesis, Hebei Agricultural University, Baoding, China, 2023. [Google Scholar] [CrossRef]
  25. Zhao, Y. Research on the Detection and Harvesting Location of Prickly Pear Based on Machine Vision. Master’s Thesis, Guizhou University, Guiyang, China, 2020. [Google Scholar] [CrossRef]
  26. Niu, B.; Zhang, X. Research on Digital Image Processing Methods Based on Computer Vision: A Case Study of Pear Fruit Detection and Classification. Inf. Rec. Mater. 2021, 22, 195–197. [Google Scholar] [CrossRef]
  27. Hosseini, S.; Zade, B.M.H. New hybrid method for attack detection using combination of evolutionary algorithms, SVM, and ANN. Comput. Netw. 2020, 173, 107168. [Google Scholar] [CrossRef]
  28. Li, Z. Intelligent Classification Methods and Applications of Weili Pear Quality. Master’s Thesis, Hebei University of Technology, Tianjin, China, 2023. [Google Scholar] [CrossRef]
  29. Zhang, Y.; Wu, L. Classification of Fruits Using Computer Vision and a Multiclass Support Vector Machine. Sensors 2012, 12, 12489–12505. [Google Scholar] [CrossRef]
  30. Lei, H.; Jiao, Z.; Ma, J.; Wu, L.; Zhong, Z. Fast Recognition Algorithm of Apple Varieties Based on Multi Feature Fusion and SVM. Autom. Inf. Eng. 2020, 41, 13–17. [Google Scholar] [CrossRef]
  31. Jiao, Y.; Luo, R.; Li, Q.; Deng, X.; Yin, X.; Ruan, C.; Jia, W. Detection and Localization of Overlapped Fruits Application in an Apple Harvesting Robot. Electronics 2020, 9, 1023. [Google Scholar] [CrossRef]
  32. Zhao, P.; Cai, W.; Zhou, W.; Li, N. Revolutionizing automated pear picking using Mamba architecture. Plant Methods 2024, 20, 167. [Google Scholar] [CrossRef]
  33. Zhang, R. Research and Implementation of Key Technology for Real-Time Video Streaming Object Detection. Master’s Thesis, University of Electronic Science and Technology of China, Chengdu, China, 2021. [Google Scholar] [CrossRef]
  34. Wang, N.; Zhi, M. Review of One-Stage Universal Object Detection Algorithms in Deep Learning. J. Front. Comput. Sci. Technol. 2025, 19, 1115–1140. [Google Scholar]
  35. Chen, Y.; Li, W.; Weng, H.; Zheng, J.; Lun, J. Overview of Two-Stage Object Detection Algorithms Based on Deep Learning. Inf. Comput. 2023, 14, 3. [Google Scholar]
  36. Wang, H.; Mou, Q.; Yue, Y.; Zhao, H. Research on universal detection model of fruit picking based on YOLOv3. China Sci. 2021, 16, 336–342. [Google Scholar]
  37. Ma, S. Research on Pear Fruit Recognition Based on Improved YOLOv4 and Yield Prediction Model. Master’s Thesis, Hebei Agricultural University, Baoding, China, 2022. [Google Scholar] [CrossRef]
  38. Li, Y.; Rao, Y.; Jin, X.; Jiang, Z.; Wang, Y.; Wang, T.; Wang, F.; Luo, Q.; Liu, L. YOLOv5s-FP: A Novel Method for In-Field Pear Detection Using a Transformer Encoder and Multi-Scale Collaboration Perception. Sensors 2023, 23, 30. [Google Scholar] [CrossRef] [PubMed]
  39. Parico, A.I.; Ahamed, T. Real Time Pear Fruit Detection and Counting Using YOLOv4 Models and Deep SORT. Sensors 2021, 21, 4803. [Google Scholar] [CrossRef]
  40. Zheng, W.; Yang, Y. Target detection method for fragrant pears at mature stage based on improved lightweight YOLO v7. Jiangsu Agric. Sci. 2024, 52, 121–128. [Google Scholar] [CrossRef]
  41. Tan, H.; Ma, W.; Tian, Y.; Zhang, Q.; Li, M.; Li, M.; Yang, X. Improved YOLOv8n object detection of fragrant pears. Nongye Gongcheng Xuebao/Trans. Chin. Soc. Agric. Eng. 2024, 40, 178–185. [Google Scholar] [CrossRef]
  42. Zhao, P.; Zhou, W.; Na, L. High-precision object detection network for automate pear picking. Sci. Rep. 2024, 14, 14965. [Google Scholar] [CrossRef]
  43. Jia, J.; Cui, J. Application Research of Intelligent Robot in the Field of Agricultural Automation. J. Kaifeng Univ. 2023, 4, 6. [Google Scholar]
  44. Yan, J.; Zhao, Y.; Zhang, L.; Su, X.; Liu, H.; Zhang, F.; Fan, W.; He, L. Recognition of Rosa roxbunghii in natural environment based on improved Faster RCNN. Nongye Gongcheng Xuebao/Trans. Chin. Soc. Agric. Eng. 2019, 35, 143–150. [Google Scholar] [CrossRef]
  45. Meng, X.; Alifu, K.; Lv, Q.; Zhou, L. Research on Fragrant Pear Target Recognition in Natural Environment Based on Transfer Learning. J. Xinjiang Univ. (Nat. Sci. Ed. Chin. Engl.) 2019, 4, 7. [Google Scholar] [CrossRef]
  46. Zhu, Y. Balsam Pear Segmentation and Picking Point Study Based on Improved Mask R-CNN. Master’s Thesis, Tarim University, Aral, China, 2024. [Google Scholar] [CrossRef]
  47. Pan, S.; Ahamed, T. Pear Recognition in an Orchard from 3D Stereo Camera Datasets to Develop a Fruit Picking Mechanism Using Mask R-CNN. Sensors 2022, 22, 4187. [Google Scholar] [CrossRef] [PubMed]
  48. Montoya-Cavero, L.-E.; Díaz de León Torres, R.; Gómez-Espinosa, A.; Cabello, J.A.E. Vision systems for harvesting robots: Produce detection and localization. Comput. Electron. Agric. 2022, 192, 106562. [Google Scholar] [CrossRef]
  49. Shi, G.; Zhang, F.; Gou, Y.; Zheng, L.; Cai, J.; Feng, C. Research progress on target recognition and picking point localization of fruit picking robots. J. Chin. Agric. Mech. 2025, 46, 115–124. [Google Scholar] [CrossRef]
  50. Cheng, F.; Wu, W.; He, H.; Huang, Y.; Fu, J. Research on Target Recognition and Localization Methods for Citrus Harvesting Robots. Sci. Technol. Inf. 2019, 17, 30–31. [Google Scholar] [CrossRef]
  51. Feng, J.; Wang, S.; Liu, G.; Zeng, L. A Separating Method of Adjacent Apples Based on Machine Vision and Chain Code Information. In Proceedings of the Computer and Computing Technologies in Agriculture V, Zhangjiajie, China, 19–21 October 2012; pp. 258–267. [Google Scholar]
  52. Niu, L.; Zhou, W.; Wang, D.; He, D.; Zhang, H.; Song, H. Extracting the symmetry axes of partially occluded single apples in natural scene using convex hull theory and shape context algorithm. Multimed. Tools Appl. 2017, 76, 14075–14089. [Google Scholar] [CrossRef]
  53. Ren, R.; Sun, H.; Zhang, S.; Wang, N.; Lu, X.; Jing, J.; Xin, M.; Cui, T. Intelligent Detection of Lightweight “Yuluxiang” Pear in Non-Structural Environment Based on YOLO-GEW. Agronomy 2023, 13, 2418. [Google Scholar] [CrossRef]
  54. Li, Y.; He, L.; Jia, J.; Lv, J.; Chen, J.; Qiao, X.; Wu, C. In-field tea shoot detection and 3D localization using an RGB-D camera. Comput. Electron. Agric. 2021, 185, 106149. [Google Scholar] [CrossRef]
  55. Gou, Y.; Yan, J.; Zhang, F.; Sun, C.; Xu, Y. Research Progress on Vision System and Manipulator of Fruit Picking Robot. Comput. Eng. Appl. 2023, 59, 13–26. [Google Scholar] [CrossRef]
  56. Liu, Y.; Liu, S.; Yang, C.; Wang, K.; Xie, N. Three-dimensional Spatial Localization of Overlapping Citrus Based on Binocular Stereo Vision. J. Agric. Sci. Technol. 2020, 22, 104–112. [Google Scholar] [CrossRef]
  57. Li, Y.; Feng, Q.; Li, T.; Xie, F.; Liu, C.; Xiong, Z. Advance of Target Visual Information Acquisition Technology for Fresh Fruit Robotic Harvesting: A Review. Agronomy 2022, 12, 1336. [Google Scholar] [CrossRef]
  58. Zhou, G.; Zhu, Y.; Zhang, P. Autonomous Obstacle Avoidance Vehicle Based on the Combination of LiDAR and Depth Camera. Sci. Technol. Innov. 2025, 5, 1–5. [Google Scholar] [CrossRef]
  59. Liu, J.; Zhou, D.; Li, Y.; Li, D.; Li, Y.; Rana, R. Monocular distance measurement algorithm for pomelo fruit based on target pixels change. Nongye Gongcheng Xuebao/Trans. Chin. Soc. Agric. Eng. 2021, 37, 183–191. [Google Scholar] [CrossRef]
  60. Chen, C.; Li, B.; Liu, J.; Bao, T.; Ren, N. Monocular positioning of sweet peppers: An instance segmentation approach for harvest robots. Biosyst. Eng. 2020, 196, 15–28. [Google Scholar] [CrossRef]
  61. Xiong, J.; Lin, R.; Liu, Z.; He, Z.; Tang, L.; Yang, Z.; Zou, X. The recognition of litchi clusters and the calculation of picking point in a nocturnal natural environment. Biosyst. Eng. 2018, 166, 44–57. [Google Scholar] [CrossRef]
  62. Ling, X.; Zhao, Y.; Gong, L.; Liu, C.; Wang, T. Dual-arm cooperation and implementing for robotic harvesting tomato using binocular vision. Robot. Auton. Syst. 2019, 114, 134–143. [Google Scholar] [CrossRef]
  63. Si, Y.; Liu, G.; Feng, J. Location of apples in trees using stereoscopic vision. Comput. Electron. Agric. 2015, 112, 68–74. [Google Scholar] [CrossRef]
  64. Chen, M.; Tang, Y.; Zou, X.; Huang, K.; Huang, Z.; Zhou, H.; Lian, G. Three-dimensional perception of orchard banana central stock enhanced by adaptive multi-vision technology. Comput. Electron. Agric. 2020, 174, 105508. [Google Scholar] [CrossRef]
  65. Neupane, C.; Koirala, A.; Wang, Z.; Walsh, K.B. Evaluation of Depth Cameras for Use in Fruit Localization and Sizing: Finding a Successor to Kinect v2. Agronomy 2021, 11, 1780. [Google Scholar] [CrossRef]
  66. Kang, H.; Wang, X.; Chen, C. Accurate fruit localisation using high resolution LiDAR-camera fusion and instance segmentation. Comput. Electron. Agric. 2022, 203, 107450. [Google Scholar] [CrossRef]
  67. Furukawa, Y.; Hernandez, C. Multi-view stereo: A tutorial. Found. Trends Comput. Graph. Vision 2015, 9, 1–148. [Google Scholar] [CrossRef]
  68. Giancola, S.; Valenti, M.; Sala, R. A survey on 3D cameras: Metrological comparison of time-of-flight, structured-light and active stereoscopy technologies. In SpringerBriefs in Computer Science; Springer: Berlin/Heidelberg, Germany, 2018; pp. 89–90. [Google Scholar]
  69. Zanuttigh, P.; Marin, G.; Dal Mutto, C.; Dominio, F.; Minto, L.; Cortelazzo, G.M. Time-of-Flight and Structured Light Depth Cameras: Technology and Applications; Springer International Publishing: Cham, Switzerland, 2016; pp. 1–355. [Google Scholar]
  70. Li, M.; Liu, P. A bionic adaptive end-effector with rope-driven fingers for pear fruit harvesting. Comput. Electron. Agric. 2023, 211, 107952. [Google Scholar] [CrossRef]
  71. Shi, Y.; Zhang, W.; Li, Z.; Wang, Y.; Liu, L.; Cui, Y. A “Global–Local” Visual Servo System for Picking Manipulators. Sensors 2020, 20, 3366. [Google Scholar] [CrossRef] [PubMed]
  72. Gao, R.; Zhou, Q.; Cao, S.; Jiang, Q. An Algorithm for Calculating Apple Picking Direction Based on 3D Vision. Agriculture 2022, 12, 1170. [Google Scholar] [CrossRef]
  73. Li, J.; Tang, Y.; Zou, X.; Lin, G.; Wang, H. Detection of Fruit-Bearing Branches and Localization of Litchi Clusters for Vision-Based Harvesting Robots. IEEE Access 2020, 8, 117746–117758. [Google Scholar] [CrossRef]
  74. Yang, Q.; Chen, C.; Dai, J.; Xun, Y.; Bao, G. Tracking and recognition algorithm for a robot harvesting oscillating apples. Int. J. Agric. Biol. Eng. 2020, 13, 163–170. [Google Scholar] [CrossRef]
  75. Guo, Z.; Yin, C.; Wu, X.; Chen, Q.; Wang, J.; Zhou, H. Research status and prospect of key technologies of fruit picking manipulator. Jiangsu J. Agric. Sci. 2024, 6, 1142–1152. [Google Scholar] [CrossRef]
  76. Li, G.; Ji, C.; Zhai, L. Research progress and analysis of end-effector for fruits and vegetables picking robot. J. Chin. Agric. Mech. 2014, 5, 7. [Google Scholar] [CrossRef]
  77. Peng, Y.; Liu, Y.; Yang, Y.; Yang, Y.; Liu, N.; Sun, Y. Research progress on application of soft robotic gripper in fruit and vegetable picking. Nongye Gongcheng Xuebao/Trans. Chin. Soc. Agric. Eng. 2018, 34, 11–20. [Google Scholar] [CrossRef]
  78. Yang, X.; Yang, Q.; Liu, L. Design and Experiment of End Effector for Facility Tomato Harvesting Robot. J. Agric. Mech. Res. 2025, 47, 126–134. [Google Scholar] [CrossRef]
  79. Chen, T.; Zhang, S.; Fu, G.; Chen, J.; Zhu, L. Review of Research Progress on Tropical Fruit Harvesting Robots. Nongye Jixie Xuebao/Trans. Chin. Soc. Agric. Mach. 2025, 56, 184–201. [Google Scholar] [CrossRef]
  80. Liu, J.; Xu, X.; Li, P. Analysis and experiment on laser cutting of fruit peduncles. Nongye Jixie Xuebao/Trans. Chin. Soc. Agric. Mach. 2014, 45, 59–64. [Google Scholar] [CrossRef]
  81. Bulanon, D.M.; Kataoka, T. Fruit detection system and an end effector for robotic harvesting of Fuji apples. Agric. Eng. Int. CIGR J. 2010, 12, 203–210. [Google Scholar]
  82. Zhang, H.; Li, X.; Wang, L.; Liu, D.; Wang, S. Construction and Optimization of a Collaborative Harvesting System for Multiple Robotic Arms and an End-Picker in a Trellised Pear Orchard Environment. Agronomy 2024, 14, 80. [Google Scholar] [CrossRef]
  83. Silwal, A.; Davidson, J.R.; Karkee, M.; Mo, C.; Lewis, K. Design, integration, and field evaluation of a robotic apple harvester. J. Field Robot. 2017, 34, 1140–1159. [Google Scholar] [CrossRef]
  84. Wang, S. Development and Test of Mechanical Picking Device for Pear Fruit. Master’s Thesis, Shandong Agricultural University, Tai’an, China, 2023. [Google Scholar] [CrossRef]
  85. Liu, Y.; Dong, J.; Peng, Y.; Lan, H.; Li, P. Development and Experiment of Korla Fragrant Pear Picking End-effector with Controlled Gripping Pressure. J. Agric. Mech. Res. 2020, 42, 33–39. [Google Scholar] [CrossRef]
  86. Gao, Z.; Pang, G.; Li, L.; Zhao, K.; Wang, X.; Ji, C. Design of hand-operated piggyback jaw gripper type simplified picker for pear. Nongye Gongcheng Xuebao/Trans. Chin. Soc. Agric. Eng. 2019, 35, 39–45. [Google Scholar] [CrossRef]
  87. Xu, L.; Liu, X.; Zhang, K.; Xing, J.; Yuan, Q.; Chen, J.; Duan, Z.; Ma, S.; Yu, C. Design and test of end-effector for navel orange picking robot. Nongye Gongcheng Xuebao/Trans. Chin. Soc. Agric. Eng. 2018, 34, 53–61. [Google Scholar] [CrossRef]
  88. Baeten, J.; Donné, K.; Boedrij, S.; Beckers, W.; Claesen, E. Autonomous Fruit Picking Machine: A Robotic Apple Harvester. In Field and Service Robotics: Results of the 6th International Conference; Laugier, C., Siegwart, R., Eds.; Springer: Berlin/Heidelberg, Germany, 2008; pp. 531–539. [Google Scholar]
  89. Wang, X.; Wu, P.; Feng, Q.; Wang, G. Design and Test of Tomatoes Harvesting Robot. J. Agric. Mech. Res. 2016, 38, 94–98. [Google Scholar] [CrossRef]
  90. Guo, H.; Ma, R.; Zhang, Y.; Li, Z. Design and Simulation Analysis of Ya-Shaped Underactuated Korla Fragrant Pear Picking Manipulator. J. Agric. Mech. Res. 2023, 45, 110–117. [Google Scholar] [CrossRef]
  91. Li, G.; Ji, C.; Gu, B.; Xu, W.; Dong, M. Kinematics analysis and experiment of apple harvesting robot manipulator with multiple end-effectors. Nongye Jixie Xuebao/Trans. Chin. Soc. Agric. Mach. 2016, 47, 14–21 and 29. [Google Scholar] [CrossRef]
  92. Wang, Z. Review of smart robots for fruit and vegetable picking in agriculture. Int. J. Agric. Biol. Eng. 2021, 14, 33–54. [Google Scholar] [CrossRef]
  93. Yang, W.; Feng, H.; Han, Y.; Xu, Y. Development of Spherical Fruit Picking Robot End Effector Based on Pneumatic Actuation. J. Agric. Mech. Res. 2019, 41, 149–154. [Google Scholar] [CrossRef]
Figure 1. Development of pear fruit production in China [1].
Figure 2. Research on harvesting robots in China and abroad: (a) Norwegian University of Life Sciences’ strawberry harvesting robot [3]; (b) autonomous sweet pepper harvesting robot [12]; (c) Washington State University’s apple harvesting robot [13]; (d) South China University of Technology’s tomato harvesting robot [14]; (e) Northwest A&F University’s kiwifruit harvesting robot [4]; (f) chili pepper harvesting device for hilly fields [15].
Figure 3. (a) Ground-based harvesting robot system [16]; (b) tethered harvesting UAV system.
Figure 4. (a) Process differences between machine learning and deep learning; (b) the complete technical pipeline of deep learning-based pear detection.
Figure 5. Preprocessing of pear fruit images: (a) original RGB image; (b) original RGB image; (c) conversion of the RGB image to HSV; (d) pear fruit image enhancement.
Figure 6. Denoising of pear fruit images: (a) original noiseless image; (b) salt-and-pepper noise image (density 10%); (c) Gaussian noise image (δ = 0.05); (d) median filter denoising (PSNR: 33.42 dB); (e) Gaussian filter denoising (PSNR: 23.83 dB); (f) bilateral filter denoising (PSNR: 17.71 dB).
Figure 7. Pear fruit images with complex backgrounds [32]: (a) color and texture similar to the background; (b) uneven illumination; (c) dense and occluded fruits.
Figure 8. Three-dimensional positioning process for pears.
Figure 9. Established approaches for depth measurement [17,55].
Figure 10. End effector’s adaptive harvesting operation sequence.
Figure 11. Prevalent end effector typologies [78,79,80]: (a) suction cup twisting; (b) two-finger twist; (c) multi-finger twist; (d) suction cup shear type; (e) clamping and shearing; (f) laser cutting.
Table 1. Methods for image preprocessing.

Preprocessing Method | Objective | Method | Application Scenario
Color Space Transformation | Reduce light sensitivity, enhance color robustness | RGB to HSV, RGB to grayscale | Images under different lighting; requires stable pear fruit color features
Image Denoising | Remove image noise, ensure image quality | Median filtering, high-pass filtering, wavelet and bilateral filtering | Pear fruit detection; suitable for background removal and image clarity
Image Enhancement | Enhance fruit-background contrast, highlight key features | Contrast enhancement, adjustment of brightness and contrast | Distinguishing pear fruits from backgrounds; improves visibility and recognition
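
As a concrete illustration of the preprocessing steps summarized in Table 1, the following minimal sketch applies a color space transformation, median-filter denoising, and contrast enhancement with OpenCV; the file name and parameter values are illustrative assumptions rather than settings taken from the reviewed studies.

```python
import cv2

# Hypothetical orchard image (BGR); cv2.imread returns None if the file is missing.
img = cv2.imread("pear.jpg")

# 1. Color space transformation: BGR -> HSV decouples color from brightness.
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)

# 2. Denoising: a 5x5 median filter suppresses salt-and-pepper noise while keeping edges.
denoised = cv2.medianBlur(img, 5)

# 3. Enhancement: CLAHE on the luminance channel raises fruit-background contrast.
lab = cv2.cvtColor(denoised, cv2.COLOR_BGR2LAB)
l, a, b = cv2.split(lab)
clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
enhanced = cv2.cvtColor(cv2.merge((clahe.apply(l), a, b)), cv2.COLOR_LAB2BGR)
```
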
Table 2. The differences between single-stage and two-stage algorithms.

Dimension | Single-Stage Methods (e.g., YOLO, SSD) | Two-Stage Methods (e.g., Faster R-CNN, Mask R-CNN)
Detection Flow | Direct feature extraction → bounding box and classification (single stage) | Region proposal → object detection (two stages)
Speed | High frame rate | Low frame rate
Accuracy | Medium accuracy, weak on small-object detection | High accuracy, better performance on small-object detection
Hardware Dependency | Low; can run on CPU or lightweight hardware | High; requires GPU acceleration
Application Scenario | Real-time or resource-limited tasks | High-precision, non-time-critical tasks
Table 3. Core improvements and recognition advantages of certain YOLO algorithms [36,37,38,39,40,41,42].

Version | Core Improvements | Pome Fruit Recognition Characteristics (Advantages)
YOLOv3 | Multi-scale prediction, Darknet-53 | Balances speed and accuracy; suitable for medium-scale deployment
YOLOv4 | CSPDarknet, SPP, Mish | Strong robustness against occlusion and challenging illumination
YOLOv5 | Lightweight design, adaptive anchor boxes | Optimal real-time performance on edge devices
YOLOv7 | ELAN architecture + compound scaling | Dual optimization of accuracy and speed in complex scenes
YOLOv8 | End-to-end framework, Transformer fusion | Multi-task support; strong generalization in complex scenarios
Table 4. The key differences between Faster R-CNN and Mask R-CNN [47].

Dimension | Faster R-CNN | Mask R-CNN
Core Task | Object detection (bounding box + category classification) | Object detection + instance segmentation (pixel-level mask output)
Network Structure | Classification branch, regression branch | Additional mask branch (FCN-based pixel-level segmentation)
ROI Processing | ROI Pooling (quantization error present) | ROI Align (error eliminated, enhanced segmentation precision)
Training Objective | Classification loss + regression loss | Multi-task learning: classification + regression + segmentation losses
Performance Characteristic | Faster inference speed | Slower inference speed, but supports pixel-level localization
Table 5. Comparison of experimental results for pear fruit recognition algorithms.

Algorithm | Key Performance Metrics | Application Scenarios
Modified YOLOv3 [36] | mAP 89.54% | Complex illumination and occlusion
Modified YOLOv4 [37] | Recall 85.56%, mAP 90.18%, model size ↓44% | Similarly colored backgrounds, heavy occlusion and overlap
YOLOv5s-FP [38] | mAP 96.12% | High-density occlusion, small targets, dense overlap, illumination variations
YOLOv4-tiny + Deep SORT [39] | AP 94.09%, FPS ≥ 24, F1-score 87.85% | Orchard real-time counting
MobileNetv3-YOLOv7 [40] | Precision 94.36%, Recall 89.28% | Lightweight deployment
Optimized YOLOv8n [41] | GPU speed ↑34.0%, CPU speed ↑24.4%, F0.5 94.7%, mAP 88.3% | Resource-constrained edge devices
Lightweight YOLOv8-s [42] | Small-target perception accuracy ↑ significantly | Long-distance small targets, cluttered backgrounds
Vmamba-SS3D-RPM-SFPN [32] | mAP@50 94.8%, dense-scene precision ↑7.6% | Cluttered backgrounds, dense small-target detection
Pruned SSD [43] | Precision 98.01%, Recall 85.03% | Multi-object recognition in complex environments
ROI Align–Faster R-CNN [44] | Recognition precision 95.16%, detection efficiency 0.2 s/item | Generic object detection
Mask R-CNN–ResNet [45] | Average segmentation precision 98.02% (95.28% under occlusion) | Mature pear segmentation (including occlusion)
Deformable Conv–Mask R-CNN [46] | mAP 91.3% | Small-target feature preservation
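
For readers who wish to reproduce a baseline comparable to the single-stage detectors in Table 5, the sketch below shows single-image inference with the ultralytics YOLO API; the weight file pear_yolov8n.pt, the test image name, and the confidence threshold are hypothetical placeholders, not artifacts released by the cited works.

```python
from ultralytics import YOLO

# Hypothetical pear-trained weights; any YOLOv8 checkpoint fine-tuned on pear data would do.
model = YOLO("pear_yolov8n.pt")

# Single-image inference on an illustrative orchard photo.
results = model.predict("orchard.jpg", conf=0.5)

for box in results[0].boxes:
    x1, y1, x2, y2 = box.xyxy[0].tolist()   # bounding box corners in pixels
    print(f"pear @ ({x1:.0f},{y1:.0f})-({x2:.0f},{y2:.0f}), score={float(box.conf):.2f}")
```
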
Table 6. Methods for obtaining 2D coordinates and their differences.

Method Category | Typical Algorithm | Localization Accuracy | Speed | Applicable Scenarios
Region-based features | Centroid method | Low | Fast | Simple scenes; circular pear fruits with uniform pixel distribution
Contour-based features | Minimum enclosing circle | Medium | Medium | Simple scenes with regular pear fruits
Deep learning | YOLOv8 | High | Slow | Complex shapes, multi-target scenes, occlusion scenarios
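
The two classical routes in Table 6 can be illustrated with a few lines of OpenCV contour analysis. This is a minimal sketch that assumes a binary fruit mask has already been produced by an upstream segmentation step; the mask file name is illustrative.

```python
import cv2

# Hypothetical binary mask: 255 = pear pixels, 0 = background.
mask = cv2.imread("pear_mask.png", cv2.IMREAD_GRAYSCALE)
contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
largest = max(contours, key=cv2.contourArea)   # assume the largest blob is the fruit

# Region-based route: centroid from image moments.
m = cv2.moments(largest)
cx, cy = m["m10"] / m["m00"], m["m01"] / m["m00"]

# Contour-based route: minimum enclosing circle.
(ecx, ecy), radius = cv2.minEnclosingCircle(largest)

print(f"centroid=({cx:.1f},{cy:.1f}); circle center=({ecx:.1f},{ecy:.1f}), r={radius:.1f}px")
```
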
Table 7. Different types of technologies for obtaining depth information and their distinctions [7,67,68,69].

Technology Type | Operation Mode | Principle | Core Advantages | Main Limitations | Representative Equipment
Binocular camera | Passive imaging | Stereo vision (disparity calculation) | Low cost; operates under ambient illumination; moderate accuracy | Fails in low/high light; accuracy drops at long range; texture-dependent matching | (image)
Multi-camera array | Passive imaging | Multi-view stereo matching | Wide FOV (reduced occlusion); enhanced depth robustness (multi-view redundancy) | Complex calibration (multi-camera synchronization); high hardware cost | (image)
ToF depth camera | Active imaging | Time of flight (light pulses) | Real-time (60–90 FPS); robust to ambient light (modulated source); optimal mid-range (0.5–5 m) | Accuracy degrades with distance; sunlight interference; multi-device crosstalk | (image)
Structured-light camera | Active imaging | Optical encoding (speckle/stripes) | Ultra-high near-range precision; textureless surface capture; lower power vs. ToF | Limited range (<3 m); outdoor sunlight failure; specular/transparent surface artifacts | (image)
LiDAR | Active imaging | Laser scanning (pulse/phase modulation) | Millimeter-level ranging accuracy; ultra-long range (>100 m); robust interference resistance | Mechanical scanning latency (motion artifacts); low point cloud density (single-line scanning) | (image)
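
Whichever sensor in Table 7 supplies the depth value, the final step of 3D localization is the same pinhole back-projection from a pixel coordinate and its depth to camera coordinates. The sketch below assumes hypothetical intrinsics for a 1280 × 720 RGB-D camera; the numbers are illustrative only.

```python
def pixel_to_camera(u, v, depth_m, fx, fy, cx, cy):
    """Back-project a pixel (u, v) with known depth (meters) to 3D camera coordinates."""
    x = (u - cx) * depth_m / fx
    y = (v - cy) * depth_m / fy
    return x, y, depth_m

# Example: pear center detected at pixel (640, 360) with a measured depth of 1.20 m,
# using hypothetical intrinsics (fx, fy, cx, cy) of a 1280x720 depth camera.
print(pixel_to_camera(640, 360, 1.20, fx=910.0, fy=910.0, cx=640.0, cy=360.0))
```
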
Table 8. Different clamping methods, their advantages, disadvantages, and application scenarios.

Feature | Grasping and Twisting Type | Grasping and Shearing Type | Vacuum Adsorption Type
Operating principle | Grasps the pear, then twists/pulls the fruit stem for separation | Grasps the pear, then rotates blades to sever the fruit stem | Employs vacuum suction to grasp the pear, then separates the stem via shearing or twisting
Primary advantages | Accommodates varying fruit sizes; high positional tolerance | Minimizes flesh damage; preserves an intact stem | High harvesting speed; adaptable to arbitrary fruit orientations
Primary defects | Wax cuticle causes gripping slippage; twisting tends to tear adjacent stem tissue, damaging flesh | High risk of damage to pears with short stems; requires high-precision stem recognition | Impact during placement causes damage; small fruit prone to separation failure (insufficient pressure due to gaps); leaf/debris suction risk causes blockages
Fruit damage risk | High (stem tearing + surface compression damage) | Low | Medium (impact injury, post-separation deterioration of torn stems)
Environmental adaptation | Suitable in sparse foliage areas; fails in dense clusters | Stable performance under low light; reduced accuracy in rain/fog | High tolerance to fruit orientation; performance degradation in dusty/humid conditions (vacuum system affected)
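
To make the grasping–twisting column of Table 8 concrete, the following sketch outlines one possible grasp–twist cycle with a grip-force check. Every function name is a hypothetical placeholder rather than an actual robot API, and the 15 N threshold is only an illustrative damage limit echoing the grip-force figure in Table 9.

```python
MAX_GRIP_FORCE_N = 15.0  # illustrative flesh-damage threshold, not a measured specification

def harvest_by_twisting(gripper, fruit_pose):
    """Hypothetical grasp-and-twist cycle; 'gripper' is a placeholder robot interface."""
    gripper.move_to(fruit_pose)            # approach the localized pear
    force = gripper.close_until_contact()  # compliant grasp, returns measured grip force
    if force > MAX_GRIP_FORCE_N:
        gripper.release()                  # abort rather than bruise the fruit
        return False
    gripper.twist(degrees=90)              # rotate to separate the stem from the spur
    gripper.retract_and_place()            # move the pear to the collection bin
    return True
```
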
Table 9. Performance benchmarking of core technical modules.

Technology Module | Solution Type | Performance Parameters | Advantages | Limitations
Recognition | YOLOv5s-FP [38] | mAP 96.12%; high-density occlusion scenes | Multi-scale perception; robust to lighting | Computational latency (>100 ms)
Recognition | Mask R-CNN–ResNet [45] | Segmentation accuracy 95.28% (occluded scenes) | Instance segmentation; handles overlap well | Low frame rate (~5 FPS)
Recognition | Optimized YOLOv8n [41] | GPU speed ↑34.0%; CPU speed ↑24.4%; mAP 88.3% | Edge device compatibility | Degraded small-target detection
Localization | Binocular vision [63] | Error < 20 mm (400–1500 mm range) | Low cost; high resolution | Lighting-sensitive; limited FOV (<180°)
Localization | LiDAR–camera fusion [66] | Error 0.245–0.275 cm (0.5–1.8 m range) | Millimeter accuracy; robust to lighting | High cost (>USD 2000)
Localization | ToF depth camera [65] | Stable under sunlight interference | Suitable for dynamic environments | Short effective range (<3 m)
End effector | Pneumatic shearing [93] | Cycle time 2.4 s/fruit; damage rate 0% | Adapts to irregular stems | High risk for short-stem varieties
End effector | Vacuum adsorption [81] | Harvesting speed 5 fruits/min | Omnidirectional adaptability | Adsorption failure due to waxy layer (gap > 2 mm)
End effector | Grasping–twisting [90] | Success rate 96% (multi-fruit clusters) | High dimensional tolerance | Flesh damage risk (>15 N grip force)
