The YOLO-OBB-Based Approach for Citrus Fruit Stem Pose Estimation and Robot Picking

Ye, Lei; Ma, Junjun; Lv, Yuanhua; Guo, Zhipeng; Lai, Zhihao; Ou, Chuhong; Li, Jin; Wu, Fengyun

doi:10.3390/agriculture15222330

Open AccessArticle

The YOLO-OBB-Based Approach for Citrus Fruit Stem Pose Estimation and Robot Picking

by

Lei Ye

¹

,

Junjun Ma

¹,

Yuanhua Lv

²,

Zhipeng Guo

¹,

Zhihao Lai

²,

Chuhong Ou

¹,

Jin Li

^1,* and

Fengyun Wu

^3,*

¹

School of Intelligent Engineering, Shaoguan University, Shaoguan 512000, China

²

College of Engineering, South China Agricultural University, Guangzhou 510642, China

³

School of Modern Information Industry, Guangzhou College of Commerce, Guangzhou 511363, China

^*

Authors to whom correspondence should be addressed.

Agriculture 2025, 15(22), 2330; https://doi.org/10.3390/agriculture15222330

Submission received: 23 September 2025 / Revised: 31 October 2025 / Accepted: 6 November 2025 / Published: 9 November 2025

(This article belongs to the Topic Intelligent Agriculture: Perception Technologies and Agricultural Equipment for Crop Production Processes)

Download

Browse Figures

Versions Notes

Abstract

Precise localization of the fruit stem picking point is crucial for robots to achieve efficient harvesting operations. However, in unstructured orchard environments, citrus fruit stems are easily obscured by branches and leaves and affected by factors such as overlapping fruits. This leads to poor picking localization accuracy for robots, impacting their autonomous picking efficiency. Therefore, this paper proposes a method for estimating the posture of citrus fruit stems and performing picking operations under environmental occlusion, based on the YOLO-OBB algorithm. First, the YOLOv5s algorithm detects the ROI of citrus, combined with depth information to obtain their 3D point clouds. Second, the OBB algorithm constructs oriented point cloud bounding boxes to determine stem orientation and picking point locations. Finally, through hand–eye pose transformation of the robotic arm, the end-effector is controlled to achieve precise picking operations. Experimental results indicate that the average picking success rate of the YOLO-OBB algorithm reaches 82%, representing a 50% improvement over approaches without fruit stem estimation. This conclusively shows that the proposed algorithm provides precise fruit stem pose estimation, effectively enhancing robotic picking success rates under constrained fruit stem detection conditions. It offers crucial technical support for autonomous robotic harvesting operations.

Keywords:

picking robot; fruit stem pose estimation; YOLOv5s; OBB algorithm

1. Introduction

China is the world’s largest producer of citrus fruits. As the citrus industry continues to develop, the demand for harvesting has increased significantly. Currently, harvesting citrus is primarily done by hand, which requires a substantial workforce. However, an aging population has led to labor shortages in orchards, making it increasingly difficult for manual labor to meet the demands of large-scale production [1,2,3,4]. Citrus picking robots have become popular for harvesting, and knowing the exact location of the citrus picking points is essential for efficient picking [5,6,7,8]. However, the current method of picking citrus fruits is easily affected by factors such as branches and leaves occluding the fruit and overlapping fruits. This makes it difficult for robots to accurately detect the location of the picking point of the citrus fruit stems under complex occlusion conditions. This results in poor positioning accuracy of the robots, affecting picking efficiency. Therefore, the study of efficient and stable citrus picking localization algorithms is of great significance to improve the efficiency of autonomous robot picking in complex environments.

Deep learning, as the current mainstream method for researching intelligent agricultural equipment, is widely used in fruit detection and localization [9,10]. Currently, research has improved the network model architecture to improve the detection accuracy in various agricultural application scenarios and achieved better results. Sun et al. proposed an improved YOLO-P method based on the YOLOv5 network architecture, which optimizes the detection ability of small targets, so that the model can maintain excellent detection performance under different complex background and lighting conditions, with an average accuracy of 97.6% [11]. Pan et al. developed a fruit recognition method for robotic systems, which was able to effectively detect individual pears by combining a 3D stereo camera with a Mask R-CNN, and the algorithm achieved an mAP of 95.22% [12]. Fruit yield estimation is an important part of fruit harvesting; Zhang et al. proposed two algorithms, OrangeYolo and OrangeSort, which are used for citrus detection and tracking, respectively, and the AP of citrus detection reaches 93.8%, while the average absolute error of motion displacement is 0.081 [13,14]. Yan et al. proposed a lightweight apple detection method based on the improved YOLOv5s detection method, by introducing the BottleneckCSP-2 module, combined with the SE attention mechanism to extract the apple feature information, which effectively improves the robot’s recognition ability in occluded situations, and the recall and mAP of the algorithm reach 91.48% and 86.75%, respectively [15]. In addition, the researchers used a fusion method based on deep learning and stereo vision to obtain the 3D position of the fruit. Shu et al. proposed a visual system based on stereoscopic vision for lychee-picking robots. They used background separation technology to obtain external feature information of lychees by combining automatic exposure and white balance to process left-eye images in real time, and by combining right-eye images for stereoscopic matching to extract three-dimensional spatial coordinate information. This method achieved a picking success rate of 91.1%, effectively addressing the issue of lychees being obstructed [16]. Gené-Mola et al. propose an apple detection method combining Mask R-CNN and SFM. The fruit is detected and segmented in 2D space by Mask R-CNN, and the 3D point cloud reconstruction of the apples is realized by combining SFM, and the F1-score of the algorithm is improved by 0.065 [17]. Lin et al. proposed a citrus detection and localization method based on RGB-D image analysis, which employs a depth filter and Bayesian classifiers to eliminate the non-correlated points and introduces a density clustering method for the citrus point cloud clustering, combined with an SVM classifier to further estimate the position of citrus. The F1 value of the algorithm reaches 0.9197, which meets the robustness requirements of citrus picking robots [18]. Zhang et al. proposed a real-time apple recognition and localization method based on structured light and deep learning. The YOLOv5 model was employed to detect apples in real time and obtain two-dimensional pixel coordinates. These were then combined with an active laser vision system to obtain the three-dimensional world coordinates of the apples. This method effectively improved the quality of three-dimensional reconstruction of apples, with an average recognition accuracy of 92.86% and a spatial localization accuracy of 4 mm [19].

The above research method fully demonstrates that improved deep learning network models can yield relatively precise fruit information. Simultaneously, a fusion approach combining deep learning and stereo vision enables accurate acquisition of the fruit’s three-dimensional position, providing the picking robot with the fruit’s 3D coordinates. However, achieving precise fruit picking still requires further estimation of the fruit’s picking point [20]. Addressing this challenge, numerous scholars have initiated related research aimed at extracting fruit picking point information based on the three-dimensional coordinates. Xiao et al. proposed a monocular pose estimation method based on semantic segmentation and rotating object detection. They used R-CNN to identify candidate boxes for the fruit target, reconstructed a 3D point cloud for the ROI with the highest confidence, and constructed a 3D bounding box to estimate the pose of the citrus fruits. The algorithm’s identification and localization success rate reached 93.6%, and the fruit picking success rate reached 85.1% [21]. Researchers estimate the location of fruit picking points by obtaining information about the depth of fruit. Hou et al. proposed a method for detecting and locating citrus picking points based on stereo vision. They introduced the CBAM attention mechanism and used the Soft-NMS strategy in the region proposal network, as well as a matching method based on normalized cross-correlation, to obtain picking points. Experimental results showed that the average absolute error for citrus picking point localization was 8.63 mm, and the average relative error was 2.76% [22]. However, the accuracy of these localization methods fusing depth information still suffers when dealing with dense occlusion or dynamic interference. Tang et al. proposed a detection and localization algorithm based on the YOLOv4-tiny model and stereo vision. By optimizing the YOLOv4-tiny model, relevant feature information of oil tea fruits was obtained. This feature information was combined with the bounding boxes generated by the YOLO-Oleifera model to extract ROI information. Triangulation principles were then employed to determine the picking points. The algorithm achieved an AP of 92.07% and an average detection time of 31 ms [23].

The aforementioned research report indicates that under conditions of clear background and well-defined fruit observation positions, the fruit picking point can be obtained with relative precision. However, in actual citrus orchards, phenomena such as foliage obscuring fruits and overlapping fruits occur. Simultaneously, the small size of the fruit pedicels makes it difficult to determine the position and orientation of the pedicel where the fruit picking point is located, thereby hindering the picking robot’s ability to accurately identify the fruit’s picking point. To address this challenge, researchers have initiated relevant studies. Li et al. proposed the improved YOLO11-GS model based on YOLO11n-OBB. For grape stem recognition, oriented bounding boxed and geometric methods are combined to enable rapid and precise localization of picking point and postures. Meanwhile ByteTrack and BoTSORT tracking algorithms are used for tracking grape bunches and stems [24]. Song et al. proposed a novel method for estimating corn tassel posture based on computer vision and oriented object detection. This method first locates the horizontal plane through the tassel and leaf veins, then matches the tassel orientation with directional bounding boxes. Directional bounding boxes are generated by integrating the detection results from the Directional R-CNN model. Subsequently, pixels from these directional bounding boxes are input into a secondary observation module to extract corn tassel information, ultimately enabling precise determination of its orientation [25]. Zhou et al. proposed an integrated framework based on multi-objective oriented detection specifically designed for the detection and analysis of stalk crops. The proposed framework utilizes the YOLO-OBB model, which is built upon the YOLOv8 architecture, to extract stalk-shaped objects. Simultaneously, it utilizes the multi-label detection model YOLO-MLD to perform quality grading and pose estimation for individual objects, achieving an average accuracy of 93.4% [26]. Gao et al. proposed an improved detector-YOLOv11OC, which integrates an angle-aware attention module, cross-layer fusion network, and GSConv Inception network. This approach achieves precise pose-oriented detection while reducing model complexity. When combined with depth maps, the system achieves a pose estimation accuracy of 92.5% [27]. Gao et al. proposed a Stereo Corn Pose Detection algorithm that uses 3D object detection technology to obtain information about the pose and size of corn through stereo images. This algorithm addresses the shortcoming of traditional 3D object detection, which lacks pitch angle detection. It achieves an accuracy rate of 91% in corn size and pose detection [28].

By analyzing the above research methods, the Orient Bounding Box (OBB) algorithm show significant advantages in specific detection scenarios: when facing targets tilted at any angle, it can flexibly adjust the angle of the bounding box with the rotation of the target to achieve a close fit to the target shape; for a large number of targets that are densely distributed and are prone to mutual obstruction or overlapping of the bounding box, the algorithm can differentiate between different targets based on the different angle of the target to reduce the obstruction interference; at the same time, the OBB algorithm can directly output the position information of the detected target to support the subsequent accurate operation data. At the same time, the OBB algorithm can directly output the positional information of the detected targets, providing key data support for subsequent precision operations.

The aforementioned research methods demonstrate that deep learning and stereo vision technologies can effectively detect fruit and provide precise positioning information. However, in the actual orchard, the fruit stems are slender, similar in color to the branches and leaves, and easily obscured by the leaves and fruits, which makes it difficult to accurately extract the fruit picking point by the traditional target fruit localization method, ultimately resulting in low picking accuracy of the picking robot. For this reason, this paper therefore proposes a citrus fruit stems pose estimation method based on the YOLO-OBB algorithm to address these challenges. By precisely estimating the pose of the fruit stems and the picking point, this method provides a crucial technical solution for the autonomous picking of citrus fruits in complex environments. The specific details are as follows:

(1) The YOLOv5s algorithm is used to detect citrus fruits in real time and extract their precise Region of Interest (ROI) information. The ROI information is then mapped to the depth image space to construct a three-dimensional point cloud of citrus fruits, providing a data foundation for subsequent estimation of the posture of citrus fruit stems.

(2) Combining the camera imaging and citrus point cloud features, the OBB algorithm is used to construct the oriented point cloud bounding box, which solves the attitude estimation limitation problem of the traditional Axis-Aligned Bounding Box (AABB) algorithm for the elongated and inclined fruit stems and then realizes the estimation of the attitude of the fruit stems; the precise location of the picking point of the fruit stems is then extracted from the citrus point cloud features by PCA analysis.

(3) Based on the coordinate relationship between the camera and the robotic arm, the spatial coordinate offsets between the camera, the end-effector, and the fruit stem pose are calculated. Compensating for the angular offsets enables precise real-time picking operations to be achieved with the end-effector.

2. Materials

2.1. Data Collection

The image data in this paper were collected from citrus orchards in Yiliu Town, Liyuan Yao Autonomous County, Shaoguan, Guangdong, as shown in Figure 1. The shooting equipment used was an Intel RealSense D435i depth camera (Intel Corporation, Santa Clara, CA, USA) with 640 × 640 image pixels, a 30 FPS image acquisition frame rate, and a shooting distance of 0.5~1.5 m. In order to improve the diversity of the image data, this paper not only takes into account the working range of the robotic arm and acquires the data from different perspectives such as flat view, top view, and elevation view, etc., but also takes into full consideration the fruit and background factors, such as fruit posture, color, branch and leaf shading, etc. (as shown in Figure 1). A total of 715 images were acquired, and the images were saved in JPG format. In addition, in order to avoid the overfitting problem of the network model, this paper improves the diversity of the image data through data enhancement techniques, selecting brightness enhancement, contrast enhancement, and adding pretzel noise to improve the generalization performance of the model, with a total of 1270 images after enhancement.

2.2. Data Annotation

After collecting the image data, it is necessary to annotate the data to meet the requirements of model training. This paper uses LabelImg software (Version 1.8.1) to annotate citrus images to obtain citrus label information and save the dataset in txt format. Additionally, the image dataset is randomly divided into training, validation, and test sets in a 7:2:1 ratio [29,30], containing 889, 254, and 127 images, respectively. The training set is used for network model training and learning, the validation set is used to adjust the network model’s training parameters and evaluate model performance, and the test set is used to verify the network model’s generalization performance.

2.3. Training Decision-Making

Training and testing of the model were conducted on a computer platform built independently. The specific hardware and software configurations are shown in Table 1. To improve the quality of training the network model, this paper set the resolution of the input images uniformly to 640 × 640 pixels, the batch size to 8, and the training cycle to 200.

3. Methods

3.1. Algorithm Framework

In the unstructured orchard operating environment, citrus fruit stems are susceptible to branch and leaf shading, fruit overlap, and other factors, resulting in poor positioning accuracy of the robot’s picking and affecting the efficiency of automated citrus picking. To address the aforementioned challenges, this paper proposes a citrus fruit stem pose estimation method based on the YOLO-OBB fusion algorithm. The method consists of three parts: fruit detection, fruit localization, and fruit picking, in order to realize the three major processes of detection–localization–picking in the citrus picking robot operation shown in Figure 2. In the fruit detection part, the YOLOv5s detection algorithm is combined with a depth camera to extract the depth information of the target citrus; in the fruit localization part, the 3D point cloud re-construction of the target citrus is realized, and the OBB algorithm is combined with the PCA algorithm to estimate the attitude of the stems of the citrus point cloud, so as to obtain the precise position information of the target citrus picking point; Finally, in the fruit picking part, based on the hand–eye coordination relationship and the position information of the picking point, the end-effector is further controlled to complete the picking operation. The approach comprises three components: fruit detection, fruit positioning, and fruit picking, as illustrated in Figure 2.

Initially, the RealSense D435i depth camera aligns RGB and depth video streams. The YOLOv5s algorithm is employed for real-time citrus detection, acquiring ROI information of the fruits. Subsequently, the ROI data are mapped onto the depth map to extract the depth value at the center point of the bounding box. Combining this depth information, a 3D point cloud of the citrus fruit is reconstructed. Ultimately, the OBB algorithm is used to construct the oriented minimum enclosing box for the citrus point cloud, analyze and extract the features of the citrus point cloud by the PCA method, estimate the attitude of the fruit stem, obtain the accurate position information of the picking point, calculate the attitude of the citrus fruit stem and the attitude of the end-effector based on the hand–eye coordination relationship transformation, and further control the end-effector to realize the accurate picking operation of the citrus.

By estimating the pose of the citrus fruit stem and controlling the picking direction of the end-effector through the aforementioned method, the citrus picking robot is provided with precise information about the location of the picking point, thereby enabling efficient picking operations. The method proposed in this paper realizes the innovation of the application by integrating the existing YOLO detection algorithm, OBB algorithm, and PCA method applied to the citrus picking robot. Due to the insufficient accuracy of the traditional citrus picking algorithm in locating the citrus picking points under occlusion conditions, this paper estimates the position of citrus fruit stems under the occlusion conditions, based on the traditional citrus picking algorithm, by adding the OBB algorithm and the PCA method to the traditional citrus picking algorithm. The accuracy of citrus stem localization under occlusion conditions is improved, which in turn improves the success rate of citrus picking. Furthermore, the algorithm presented in this paper is not only applicable to citrus picking scenarios but can also be further developed and applied to similar fruit stems affected by branch and leaf obstruction, fruit overlap, and other factors, thus demonstrating good generality and application value.

3.2. Fruit Detection

YOLO series network models are widely recognized for their high detection rate and high-precision target recognition capability. Currently, YOLO series network models have been updated and iterated, and the detection capability and deployment aspects of different versions vary. Among them, YOLOv5 series network models are widely used for crop detection due to the balanced performance of detection accuracy and efficiency and easy deployment characteristics. In terms of large target object detection, compared with the YOLOv5 series network models, the higher versions such as YOLOv8 and YOLOv11 have improved the training efficiency, but the degree of improvement in detection accuracy is smaller [31]. In addition, according to the official document of YOLOv5, YOLOv5s is more effective than YOLOv5m and YOLOv51 in detecting large targets in complex scenes. At the same time, based on the deployment convenience and device applicability considerations, and the subsequent application of the method to embedded devices such as Jeston AGX(NVIDIA Corporation, Santa Clara, CA, USA), the YOLOV5s algorithm is more suitable for lightweight framework deployment [32]. Therefore, the YOLOv5s network is selected as the algorithmic framework of this paper, and the target citrus ROI is extracted through the YOLOv5s network [33]. Its detailed network architecture is shown in Figure 3.

The YOLOv5s network model consists of four parts: Input, Backbone, Neck, and Prediction. Firstly, the Input module uniformly scales the size of the input image to 608 × 608 pixels; secondly, in the Backbone module, the Focus structure utilizes a convolution kernel to perform feature extraction on the input image and completes the initial feature encoding; then, the Neck module utilizes the feature pyramid and path aggregation network to fuse feature information at different levels of the image and enhances the model’s multi-scale target detection performance by constructing a multi-scale feature pyramid; finally, in the Prediction module, three different sizes of feature maps are used to perform multi-scale target prediction, generating the target’s category and bounding box location information, which makes the network model able to adapt to and accurately handle citrus targets of different sizes.

Upon activating the RealSense camera, RGB and depth video streams are spatially aligned. The YOLOv5s network model then performs real-time detection on the RGB stream to extract citrus ROIs, thereby establishing the image data foundation for subsequent stem pose estimation.

3.3. Fruit Localization

The YOLOv5s network model can provide accurate location information of citrus targets, but it cannot be directly applied to the picking robot, and it is necessary to further extract the 3D spatial location information of citrus targets based on the ROI information of citrus to provide accurate localization for the precise operation of the robot. Fruit depth information matching is crucial for obtaining the 3D spatial location of citrus fruits. As a common visual localization method, RGB-D matching employs image detection algorithms to extract ROI information from RGB images. It then matches the internal parameters of the citrus ROI with the corresponding depth information in the depth image, ultimately mapping the output citrus depth information into 3D space. This paper employs the RGB-D target matching method to process citrus ROI information, thereby acquiring 3D point cloud data of citrus fruits. Firstly, combining the citrus ROI information extracted from the YOLOv5s network model, its center coordinates are further extracted; secondly, through the coordinate transformation relationship between pixel space and 3D space, the ROI information is mapped to the depth map for the masking process, and the center depth information of the corresponding detection frames is obtained; lastly, combining with the depth information of the citrus fruits, the real-time generation of a citrus 3D point cloud is carried out, which provides the robot with accurate, three-dimensional spatial position information for the robot. This method can effectively avoid matching the redundant features in the background area, which greatly improves the efficiency, and the specific process is shown in Figure 4.

3.4. Fruit Picking

The citrus depth information is obtained by the fruit localization algorithm, so as to calculate the point cloud information of citrus, but it cannot be directly used as a picking point to act on citrus picking, and it needs to be fitted and merged based on the point cloud attitude of citrus to obtain the positioning information of the fruit stems. In this paper, the OBB algorithm is used to calculate the minimum directed box of the citrus point cloud, combined with the PCA algorithm to estimate the optimal direction of the box, to estimate the position information of the fruit stems, and finally, the position information of the fruit stems is transformed and outputted into the attitude parameters of the robotic arm for the picking of citrus fruits.

3.4.1. Fruit Attitude Estimation

In the unstructured orchard environment, due to the interference of factors such as changes in light intensity, resulting in citrus fruits often exhibiting texture patches, discontinuities, and occlusion, which affects the picking accuracy of the robot; on the other hand, because the RealSense camera uses structured light imaging to obtain depth information and the citrus fruit stems are fine, when the picking robot needs to obtain the depth information of the fine target, it is easy to generate errors or return wrong depth values, affecting the spatial localization accuracy of citrus fruit stems.

In order to obtain more accurate information on the posture of citrus fruit stems, this paper combines the imaging characteristics of the camera with the results of approximating the ellipsoidal shape of citrus point clouds. The OBB algorithm constructs a three-dimensional minimum bounding box for the citrus fruit, and PCA analyzes and extracts features from the point cloud. Further calculations of the posture information of citrus fruit stems are performed to obtain precise fruit stem picking point location information. The specific process is shown in Figure 5.

OBB is a minimum directed bounding box estimation algorithm with linear complexity, which can dynamically determine the size and direction of the bounding box according to the geometry of the target, and the bounding box does not need to be parallel to the coordinate axes. When the target is displaced or rotated, the OBB algorithm can recalculate the bounding box by transforming the coordinates of the base axis, which has better dynamic transformation characteristics. And PCA, as a classical statistical method, analyzes the distribution characteristics of the point cloud data in each direction by calculating the covariance matrix of the point cloud data to further determine the optimal direction of the bracketing box. Among them, the covariance is used to measure the linear correlation between two variables, and the covariance matrix integrally describes the correlation characteristics of the whole point set in different axes [34], and the covariance calculation formula is shown in Equation (1).

c o v (X_{i}, X_{j}) = E [(X_{i} - μ_{i}) (X_{j} - μ_{j})]

(1)

Here, μ_i and μ_j are the expected values of X_i and X_j, respectively.

Once the covariance between variables has been ascertained, the next step involves calculating the covariance matrix of the point cloud data. The eigenvalues and eigenvectors of the point cloud data are obtained through the eigen decomposition of the covariance matrix, as demonstrated in Equation (2). In this equation, the eigenvalues represent the distribution strength of the covariance matrix in the corresponding directions, while the eigenvectors define the main directions of the point cloud data, and the feature vectors are orthogonal to each other.

B = [\begin{matrix} c o v (x, x) & c o v (x, y) & c o v (x, z) \\ c o v (x, y) & c o v (y, y) & c o v (y, z) \\ c o v (x, z) & c o v (y, z) & c o v (z, z) \end{matrix}]

(2)

In this paper, the PCA algorithm is employed to eigen-decompose the matrix of covariance. The eigenvectors and the corresponding eigenvalues of the main axis direction of the point cloud are then extracted. Ultimately, the eigenvector corresponding to the largest eigenvalue is selected as the optimal direction of the enclosing box to construct the real pose of the citrus fruit stems in three-dimensional space. This provides a solid data basis for the establishment of accurate location information of the fruit stem picking points. The effect of this is shown in Figure 6. As shown in the figure, the green feature vectors point in the Y-axis direction, the blue vectors point in the Z-axis direction, and the red vectors point in the X-axis direction. Among them, the green feature vector is the direction of fruit stem pose estimation, and the green point in this direction is defined as the picking point.

3.4.2. Estimation of Robot’s Picking Posture

Accurate coordination between the camera and the robot arm is a key prerequisite for picking robot operation. Before the citrus picking operation, it is necessary to calibrate the camera and robotic arm to establish the spatial mapping relationship between the citrus fruit stem pose and the end of the robotic arm. In this paper, we take the real citrus fruit stem coordinate system {S} as an example and establish its transformation relationship with the camera coordinate system {C} and the end-effector coordinate system {G}, as shown in Figure 7.

The relationship of each coordinate system for hand–eye calibration is shown in Figure 8. In order to ensure the mapping accuracy between the camera pixel coordinates and the real 3D position, this paper use Zhang’s calibration method to calibrate the camera [35], as well as the hand–eye calibration of the camera and the robotic arm based on the method of Chen et al. [36] to establish the coordinate transformation relationship between the end-effector and the depth camera. The robot arm coordinate system {R}, the robot arm flange coordinate system {F}, the end coordinate system {G}, the camera coordinate system {C}, and the checkerboard coordinate system {B} are indicated.

Firstly, Zhang’s calibration method is used to extract corner point coordinates and perform optimization, completing the camera’s intrinsic and extrinsic calibration. Secondly, the checkerboard is placed in front of the robotic arm, and images are acquired from different angles by continuously adjusting the robotic arm’s posture. Finally, based on the hand–eye calibration equation, the least squares method is used to solve the transformation relationship between the camera coordinate system and the robotic arm coordinate system. According to the chain transfer principle of coordinate transformation, the spatial coordinate transformation relationship can be expressed as shown in Equation (3).

{}_{B}^{C}T {}_{R}^{B}T {}_{F}^{R}T {}_{G}^{F}T {}_{C}^{G}T = I

(3)

where I is the identity matrix. To better represent the coordinate transformation relationship between the tool (hand) and the camera (eye), it is further transformed into

{}_{C}^{G}T = {({}_{B}^{C}T {}_{R}^{B}T {}_{F}^{R}T)}^{- 1}

(4)

After establishing the coordinate transformation relationship between the camera and the robotic arm, the rotation matrix for the fruit stem coordinate system {S} needs to be further constructed:

{}_{S}^{C}R = [v_{1}, v_{2}, v_{3}]

(5)

where v₁ is the growth direction of the fruit stem (green vector), v₂ is the transverse growth direction of the fruit (red vector), and v₃ is the normal vector of the fruit stem direction (blue vector).

To better establish the mapping relationship between {C} and {S}, the fruit stem coordinate system is further transformed. In the fruit stem pose coordinate system, using v₃ as the rotation axis, a rotation of π radians is performed around this axis to ensure its feature vector direction aligns with that of the camera coordinate system. The rotated coordinate system rotation matrix (

{}_{S}^{C}R^{'}

) is

{}_{S}^{C}R^{'} = {}_{S}^{C}R [\begin{matrix} - 1 & 0 & 0 \\ 0 & - 1 & 0 \\ 0 & 0 & 1 \end{matrix}]

(6)

Combining the rotation matrix of the fruit stem coordinate system, a homogeneous transformation matrix is constructed to achieve the mapping from {C} to {S}:

{}_{S}^{C}T = [\begin{matrix} {}_{S}^{C}R & t \\ 0 & 1 \end{matrix}]

(7)

where the translation vector t is the coordinate of the point cloud centroid in {C}.

Based on the hand–eye matrix (

{}_{C}^{G}T

) obtained from calibration, the transformation from {G} to {S} is acquired:

{}_{S}^{G}T = {}_{C}^{G}T {}_{S}^{C}T

(8)

The deflection angle of the end-effector is obtained by parsing the rotation matrix (

{}_{S}^{G}T

) component, which is then combined with the robot arm to drive the end-effector and ultimately enable precise picking.

4. Experiments

4.1. Experimental Platform and Equipment

In this paper, an experimental platform of a picking robot equipped with a hand–eye vision system was built independently. It mainly consists of a depth camera, robotic arm, end-effector, control box, and computer, as shown in Figure 9. Among these components, the robotic arm was selected from the LM3 series from Shanghai Lebai Robotics Co., Ltd(Shanghai, China)., and the end-effector was designed independently by combining the characteristics of citrus picking. The specific hardware and software equipment parameters are configured as shown in Table 2.

To verify the robustness of the algorithms in this paper for citrus fruit stem pose estimation, the experimental scenario is an unstructured citrus orchard, as shown in Figure 10. Two experimental scenarios are set up in this paper: a fruit stem pose estimation comparison experiment and a citrus picking performance experiment. The algorithm design for these scenarios is based on an ROS system.

4.2. Fruit Stem Pose Error Estimation Experiment

In an unstructured orchard environment, it is difficult for the picking robot to obtain complete citrus point cloud data due to branch and leaf occlusion, fruit overlap, etc., which leads to missing point clouds on the citrus surface and thus affects the accuracy of the fruit stem pose estimation. In order to verify the detection robustness of the algorithm under non-complete point cloud conditions, this experiment estimates the fruit stem pose vector by using the OBB enveloping box algorithm and combines with the ellipsoidal arc characteristics of the citrus point cloud to construct a quantitative relationship between the arc coverage of ZX plane and the fruit stem pose estimation error; in addition, it further evaluates the correlation between the quality of the citrus point cloud and the fruit stem pose estimation and verifies the accuracy of the fruit stem pose estimation under the blocking conditions. In this paper, 50 fruit stem estimation experiments are randomly conducted in citrus orchard scenarios, which include two types of fruit stems without occlusion as well as citrus growing at a small angle of inclination with occluded fruit stems. Among them, the type of fruit stem without occlusion is divided into two scenarios: fruit without occlusion and fruit occlusion of 10~50%, as shown in Figure 11.

Firstly, CloudCompare software (Version 2.13.2) is used to define the real pose of citrus fruit stems, and the real fruit stem vectors of citrus are constructed by observing the growth pose of citrus and the direction of fruit stems and selecting the start and end points of the fruit stems. Secondly, the ROI point cloud is projected onto the ZX plane, and the reference circle is constructed by using the three-point sampling method as shown in Equation (9). Finally, the arc angle α corresponding to the actual point cloud distribution is measured to obtain the arc coverage (P), as shown in Equation (10). This is specifically shown in Figure 12.

\max (| | P_{1} - P_{2} | |, | | P_{2} - P_{3} | |, | | P_{3} - P_{1} | |) \leq 0.2 D_{\max}

(9)

P = (\frac{α}{2 π}) \times 100 % (0 \leq α \leq 2 π)

(10)

where ||P_i − P_j|| is the Euclidean distance between any two sampling points, and D_max is the maximum projection diameter of the point cloud on the ZX projection plane (i.e., the maximum Euclidean distance of the point cloud data convex hull).

To further obtain the citrus fruit stem pose estimation error (θ), the error range between the true fruit vector (A) and the fruit stem pose estimation vector (B) is calculated by the cosine similarity method, as shown in Equation (11).

θ = \arccos (\frac{A \cdot B}{|A| |B|})

(11)

As can be seen from Figure 13 and Table 3, in the unobstructed fruit scenario, the percentage of circular arc coverage of citrus reaches 38.24%, and the estimated angle θ of the fruit stem pose of the OBB algorithm is distributed in the range of 9.73°~23.32°, with an average θ of 16.81°, which exhibits a higher precision of the pose estimation. In the scenarios with shaded fruits and small-angle-tilted-growing fruits, the percentage of circular coverage decreases to 28.75% and 34.58%, respectively, which is 3.66%~9.49% lower compared to the unshaded scenario. In terms of fruit stem pose estimation error, the θ range of the OBB algorithm in the occluded fruit scenario is distributed from 9.20° to 27.88°, and the average θ rises to 20.92°, which is an increase of 4.11° compared to the unobscured fruit scenario. As the citrus is affected by the interference of the occluded objects and its own tilted growth, the effective contour of the citrus surface is missing, and the percentage of arc coverage is relatively low, resulting in an increase in the estimation error of the fruit stem pose; nevertheless, the OBB algorithm still maintains a relatively accurate estimation of the fruit stem pose in complex occlusion scenarios, and the average estimation error range can reach 20.92°~21.58°, which can fully satisfy the real-time picking needs of picking robots and reflects good robustness. Additionally, in the scenario of small-angle-tilted-growing fruit, the θ range of the OBB algorithm is 11.79° to 29.02°, with an average of 21.58°. This is only 0.66° higher than the shaded fruit scenario, which further verifies the algorithm’s superior generalization performance and ability to effectively estimate the stems’ pose in shaded, tilted citrus.

The experimental results fully demonstrate that arc coverage has less influence on the algorithm of this paper, and the algorithm proposed in this paper is able to meet the practical operational requirements of picking robots in citrus picking scenarios of different complexity in citrus occlusion complexity and can provide accurate picking point location information for the robot end-effector.

4.3. Citrus Picking Performance Test

To comprehensively evaluate the picking performance of the citrus picking robot, this experiment selects the picking success rate as the evaluation index. It compares the non-estimated fruit stem scheme with the algorithm in this paper for fruit stem pose estimation to verify the picking success rate of fruit stem pose estimation. This experiment is randomly conducted 50 times in a real citrus orchard scene, and the reset after the robot completes the picking task is recorded as a picking experiment.

Firstly, initialize the state of the robotic arm and use the YOLOv5s network to detect the target citrus fruit that is reachable by the robotic arm in real time. Then, drive the end-effector to reach the front of the citrus fruit. Secondly, reconstruct the dense point cloud of the citrus and estimate the pose of the fruit stems using the OBB algorithm on the ellipsoidal point cloud to determine the three-dimensional spatial picking point of the citrus. Lastly, using the hand–eye coordination relationship, drive the end-effector to adjust the picking pose and complete the picking operation as shown in Figure 14. For the picking scheme without fruit stem estimation, the fruit stem pose estimation operation is not performed, and the picking is performed directly forward by driving the end-effector to reach the front of the citrus. As shown in Figure 15, the thin green line in the figure shows the estimated fruit stem results, and the green dots are the picking points. If the fruit stem of the target citrus is in the occlusion-free condition and grows in the positive upward direction of the fruit, the picking success rate of the method is high; conversely, if the target citrus is in the occlusion condition and its fruit stem exhibits a small-angle-tilted growth, the picking success rate of the method is low. The stem attitude estimation method makes it possible to adjust the end-effector pose to coincide with the stem pose and perform citrus fruit picking. As shown in continuous frames of citrus picking in Figure 16, as depicted in (a), the picking robot successfully picked the citrus fruit by integrating the fruit stem posture information output by the posture estimation algorithm. As shown in (b) and (c), due to the insufficient width of the cutting end of end-effector, it easily collides with branches and leaves near the fruit stem. This prevents the cutting end from performing the cutting operation on the stem, ultimately leading to failure in picking the citrus fruit.

As shown in Table 4, the computation time for estimating the fruit stem pose using the OBB algorithm is 0.29 s in scenarios where fruits are not occluded and grow at small angles. In complex occlusion scenarios, the computation time for the OBB algorithm is 0.30 s, which is only 0.01 s longer than in the previous two scenarios. Considering the picking efficiency across all picking scenarios, the average pose estimation time of the OBB algorithm remains stable at 0.29 s.

The experimental results demonstrate that the proposed algorithm exhibits good computational efficiency, fully meeting the real-time requirements of a picking robot system. Additionally, the algorithm significantly improves the success rate of picking operations without compromising the robot’s overall performance, effectively ensuring the continuity and efficiency of robotic picking tasks.

Figure 17 shows that in the scenes without occluded fruit, the fruit-picking success rate of the proposed algorithm is 87.5%. This represents a 23.22% improvement over the method that does not estimate fruit stems. In scenarios involving obstructed fruits and fruits growing at small angles, the fruit picking success rate of the proposed algorithm remains at 78.94% and 80%, respectively. Although the complex environment leads to a decrease in success rate compared to the ideal scenario, the overall picking performance was still significantly better than the scheme without fruit stem estimation. The picking success rates of the scheme without fruit stem estimation in the above scenarios are 26.31% and 11.76%, respectively, with poor picking efficiency. According to the comparison of the experimental data, the average picking success rate of the proposed algorithm is 82%, representing a 50% improvement over the method without fruit stem estimation. Additionally, the algorithm demonstrates good fruit stem pose estimation accuracy and can overcome environmental occlusion and fruit pose interference factors, significantly enhancing the success rate of robotic picking operations and exhibiting good robustness. The specific experimental data are shown in Table 5.

5. Conclusions and Future Work

5.1. Conclusions

To address the problems of poor picking localization accuracy of citrus picking robots under complex occlusion conditions, which affects the efficiency of autonomous picking, in this paper, we propose an attitude estimation method of citrus stem based on the YOLO-OBB algorithm, which is based on the YOLOv5s network model and binocular vision method for preliminary localization of citrus, to improve the feature extraction ability of the target in the field of view of the camera, and combine the OBB algorithm and the PCA method to accurately estimate the attitude of the fruit stem, to solve the problem of difficulty in localization of the fruit stem picking point, and to improve the localization accuracy of the picking robot and the operation efficiency. In scenes with unobstructed fruits, obstructed fruits, and fruits growing at a small angle, the arc coverage ratio of citrus fruits was 38.24%, 28.75%, and 34.58%, respectively, with an average ratio of 33.86%. In terms of fruit stem pose estimation, the average θ of the OBB algorithm reached 16.81° in the unoccluded fruit scene, while in the occluded fruit scene, the average θ rose to 20.92°, an increase of 4.11° compared to the unoccluded fruit scene. Although the fruit stem pose estimation error increased due to the loss of the effective contour of the surface caused by the interference of occluded objects and the tilted growth of the citrus fruit, the OBB algorithm was still able to maintain a relatively accurate fruit stem pose estimation ability, meeting the picking requirements of the picking robot. In addition, in the scenario of fruit growing at a small angle of inclination, the average θ of the OBB algorithm was 21.58°, which was only 0.66° higher than that in the scenario with occluded fruit, effectively addressing the challenge of estimating the pose of fruit stems that are occluded and tilted, and the average estimation time of fruit stem pose reaches 0.29 s, which has a good estimation effect. In the above three picking scenarios, the picking success rate of the algorithm in this paper reached 87.5%, 78.94%, and 80%, respectively, with an average picking success rate of 82%, which is 50% higher than the fruit stem estimation scheme without fruit stems, greatly improving the picking success rate of the robot. The experimental results show that the algorithm in this paper can accurately estimate the pose of citrus fruit stems, provide accurate picking point location information for picking robots, efficiently complete picking tasks, meet the fruit picking requirements of robots, and provide an effective solution for fruit detection, positioning, and picking by orchard picking robots.

5.2. Discussions of Limitations

Traditional deep learning and three-dimensional vision technologies can effectively detect fruits and provide precise positioning information. However, in unstructured orchard environments where fruits are obscured by branches and leaves or overlap, relying solely on fruit location data results in low picking accuracy for picking robots. This paper addresses citrus picking under occlusion conditions. It employs visual algorithms to detect and perform 3D reconstruction of obscured citrus fruits, mapping partial point cloud information in 3D space. Directional bounding boxes are then constructed for these partial point clouds. Using PCA to estimate the location of citrus stems, finally, citrus fruit picking under occlusion conditions is performed.

The citrus picking method in this paper focuses mainly on the implementation of the overall picking method; our current focus on the YOLOv5s detection method is on using the YOLO framework to obtain ROI information for specific fruits, and we have not yet attempted a higher version of YOLO or a method to improve the network model. This may cause some limitations in the algorithm, which is something we did not do well, so our future work will further attempt to optimize the recognition algorithm.

Experimental data demonstrate that the proposed algorithm achieves excellent results when picking citrus fruits under three conditions: no occlusion, minor occlusion, and small-angle-tilted growth. However, under conditions of extensive occlusion and steeply inclined growth, the picking rate decreases. This is because while the robot precisely locates the picking point during the process, it fails to plan the robotic arm’s picking path. Consequently, the cutting end is prone to collisions with nearby branches and leaves during execution, causing the citrus stem to shift out of the targeted picking position and preventing successful fruit removal.

5.3. Future Work

The algorithm in this paper can provide accurate picking position information for the picking robot to realize efficient citrus picking in complex orchard environments with good robustness. However, in the face of the rapid development of intelligent agricultural machinery and the complex changes in unstructured orchards, there is still room for improvement in the future: the applicability of the end-effector affects the success rate of picking in actual picking operations. In the future, the structural design of the end-effector can be further optimized to improve the success rate of citrus picking. Improvement of the network model will also be considered in the future to enhance the recognition performance of fruit targets under occlusion; in addition, the picking method proposed in this paper can be further improved by further improving the network model and the end-effector mechanism, so that it can be extended and applied to the picking scenarios of other fruits such as lychees, apples, pears, and dragon fruits, etc., which demonstrates good generalization performance and application value.

Author Contributions

L.Y.: conceptualization, methodology, supervision, writing—review and editing, and funding acquisition; J.M.: methodology, data curation, formal analysis, and writing—original draft; Y.L.: writing—original draft, formal analysis, and data curation; Z.G.: conceptualization, methodology, and data curation; Z.L.: formal analysis and data curation; C.O.: formal analysis and data curation; J.L.: methodology, supervision, funding acquisition, and writing—review and editing; F.W.: methodology, supervision, funding acquisition, and writing—review and editing. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by grants from the Basic and Applied Basic Research Foundation of Guangdong Province (Grant No. 2023A1515110586), Guangdong Provincial Education Department Characteristic Innovation Project (Grant No. 2024KTSCX132 and Grant No. 2024KTSCX063), University-Industry Collaborative Education Program of Ministry of Education (Grant No. 2024XTYR08), and Key research project of Shaoguan University (grant number SZ2023KJ14).

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed toward the corresponding author(s).

Conflicts of Interest

This study was conducted without any commercial or financial relationships that could be construed as potential conflicts of interest.

References

Yin, H.; Sun, Q.; Ren, X.; Guo, J.; Yang, Y.; Wei, Y.; Huang, B.; Chai, X.; Zhong, M. Development, integration, and field evaluation of an autonomous citrus-harvesting robot. J. Field Robot. 2023, 40, 1363–1387. [Google Scholar] [CrossRef]
Ye, L.; Wu, F.; Zou, X.; Li, J. Path planning for mobile robots in unstructured orchard environments: An improved kinematically constrained bi-directional RRT approach. Comput. Electron. Agric. 2023, 215, 108453. [Google Scholar] [CrossRef]
Wang, H.; Dong, L.; Zhou, H.; Luo, L.; Lin, G.; Wu, J.; Tang, Y. YOLOv3-Litchi Detection Method of Densely Distributed Litchi in Large Vision Scenes. Math. Probl. Eng. 2021, 2021, 8883015. [Google Scholar] [CrossRef]
Tang, Y.; Chen, M.; Wang, C.; Luo, L.; Li, J.; Lian, G.; Zou, X. Recognition and localization methods for vision-based fruit picking robots: A review. Front. Plant Sci. 2020, 11, 510. [Google Scholar] [CrossRef]
Ye, L.; Duan, J.; Yang, Z.; Zou, X.; Chen, M.; Zhang, S. Collision-free motion planning for the litchi-picking robot. Comput. Electron. Agric. 2021, 185, 106151. [Google Scholar] [CrossRef]
Wang, H.; Lin, Y.; Xu, X.; Chen, Z.; Wu, Z.; Tang, Y. A study on long-close distance coordination control strategy for litchi picking. Agronomy 2022, 12, 1520. [Google Scholar] [CrossRef]
Chen, J.; Qiang, H.; Wu, J.; Xu, G.; Liu, X. Extracting the navigation path of a tomato-cucumber greenhouse robot based on a median point Hough transform. Comput. Electron. Agric. 2020, 174, 105472. [Google Scholar] [CrossRef]
Chen, J.; Huang, L.; Huang, Y. An accurate recognition and localization algorithm for strawberry picking robots. In Proceedings of the International Conference on Optical and Photonic Engineering (icOPEN 2024), Foshan, China, 15–18 November 2024; Volume 4. [Google Scholar]
Koirala, A.; Walsh, K.B.; Wang, Z.; McCarthy, C. Deep learning–Method overview and review of use for fruit detection and yield estimation. Comput. Electron. Agric. 2019, 162, 219–234. [Google Scholar] [CrossRef]
Zhang, Y.; Li, L.; Chun, C.; Wen, Y.; Xu, G. Multi-scale feature adaptive fusion model for real-time detection in complex citrus orchard environments. Comput. Electron. Agric. 2024, 219, 108836. [Google Scholar] [CrossRef]
Sun, H.; Wang, B.; Xue, J. YOLO-P: An efficient method for pear fast detection in complex orchard picking environment. Front. Plant Sci. 2023, 13, 1089454. [Google Scholar] [CrossRef]
Pan, S.; Ahamed, T. Pear recognition in an orchard from 3D stereo camera datasets to develop a fruit picking mechanism using mask R-CNN. Sensors 2022, 22, 4187. [Google Scholar] [CrossRef]
Zhang, W.; Wang, J.; Liu, Y.; Chen, K.; Li, H.; Duan, Y.; Wu, W.; Shi, Y.; Guo, W. Deep-learning-based in-field citrus fruit detection and tracking. Hortic. Res. 2022, 9, uhac003. [Google Scholar] [CrossRef]
Häni, N.; Roy, P.; Isler, V. A comparative study of fruit detection and counting methods for yield mapping in apple orchards. J. Field Robot. 2020, 37, 263–282. [Google Scholar] [CrossRef]
Yan, B.; Fan, P.; Lei, X.; Liu, Z.; Yang, F. A real-time apple targets detection method for picking robot based on improved YOLOv5. Remote Sens. 2021, 13, 1619. [Google Scholar] [CrossRef]
Shu, Y.; Zheng, W.; Xiong, C.; Xie, Z. Research on the vision system of lychee picking robot based on stereo vision. J. Radiat. Res. Appl. Sci. 2024, 17, 100777. [Google Scholar] [CrossRef]
Gené-Mola, J.; Sanz-Cortiella, R.; Rosell-Polo, J.R.; Morros, J.-R.; Ruiz-Hidalgo, J.; Vilaplana, V.; Gregorio, E. Fruit detection and 3D location using instance segmentation neural networks and structure-from-motion photogrammetry. Comput. Electron. Agric. 2020, 169, 105165. [Google Scholar] [CrossRef]
Lin, G.; Tang, Y.; Zou, X.; Li, J.; Xiong, J. In-field citrus detection and localisation based on RGB-D image analysis. Biosyst. Eng. 2019, 186, 34–44. [Google Scholar] [CrossRef]
Zhang, Q.; Su, W.-H. Real-time recognition and localization of apples for robotic picking based on structural light and deep learning. Smart Cities 2023, 6, 3393–3410. [Google Scholar] [CrossRef]
Luo, L.; Liu, B.; Chen, M.; Wang, J.; Wei, H.; Lu, Q.; Luo, S. DRL-enhanced 3D detection of occluded stems for robotic grape harvesting. Comput. Electron. Agric. 2025, 229, 109736. [Google Scholar] [CrossRef]
Xiao, X.; Wang, Y.; Jiang, Y.; Wu, H.; Zhou, B. Monocular Pose Estimation Method for Automatic Citrus Harvesting Using Semantic Segmentation and Rotating Target Detection. Foods 2024, 13, 2208. [Google Scholar] [CrossRef]
Hou, C.; Xu, J.; Tang, Y.; Zhuang, J.; Tan, Z.; Chen, W.; Wei, S.; Huang, H.; Fang, M. Detection and localization of citrus picking points based on binocular vision. Precis. Agric. 2024, 25, 2321–2355. [Google Scholar] [CrossRef]
Tang, Y.; Zhou, H.; Wang, H.; Zhang, Y. Fruit detection and positioning technology for a Camellia oleifera C. Abel orchard based on improved YOLOv4-tiny model and binocular stereo vision. Expert Syst. Appl. 2023, 211, 118573. [Google Scholar] [CrossRef]
Li, P.; Chen, J.; Chen, Q.; Huang, L.; Jiang, Z.; Hua, W.; Li, Y. Detection and picking point localization of grape bunches and stems based on oriented bounding box. Comput. Electron. Agric. 2025, 233, 110168. [Google Scholar] [CrossRef]
Song, C.; Zhang, F.; Li, J.; Zhang, J. Precise maize detasseling base on oriented object detection for tassels. Comput. Electron. Agric. 2022, 202, 107382. [Google Scholar] [CrossRef]
Zhou, S.; Zhong, M.; Chai, X.; Zhang, N.; Zhang, Y.; Sun, Q.; Sun, T. Framework of rod-like crops sorting based on multi-object oriented detection and analysis. Comput. Electron. Agric. 2024, 216, 108516. [Google Scholar] [CrossRef]
Gao, Y.; Tang, H.; Wang, Y.; Liu, T.; Li, Z.; Li, B.; Zhang, L. Oriented Object Detection with RGB-D Data for Corn Pose Estimation. Appl. Sci. 2025, 15, 10496. [Google Scholar] [CrossRef]
Gao, Y.; Li, Z.; Hong, Q.; Li, B.; Zhang, L. Corn pose estimation using 3D object detection and stereo images. Comput. Electron. Agric. 2025, 231, 110016. [Google Scholar] [CrossRef]
Lin, Y.; Huang, Z.; Liang, Y.; Liu, Y.; Jiang, W. Ag-yolo: A rapid citrus fruit detection algorithm with global context fusion. Agriculture 2024, 14, 114. [Google Scholar] [CrossRef]
Cai, Z.; Zhang, Y.; Li, J.; Zhang, J.; Li, X. Synchronous detection of internal and external defects of citrus by structured-illumination reflectance imaging coupling with improved YOLO v7. Postharvest Biol. Technol. 2025, 227, 113576. [Google Scholar] [CrossRef]
Jiang, T.; Zhong, Y. ODverse33: Is the New YOLO Version Always Better? A Multi Domain benchmark from YOLO v5 to v11. arXiv 2025, arXiv:2502.14314. [Google Scholar] [CrossRef]
Yadav, P.K.; Thomasson, J.A.; Searcy, S.W.; Hardin, R.G.; Braga-Neto, U.; Popescu, S.C.; Martin, D.E.; Rodriguez, R.; Meza, K.; Enciso, J. Assessing the performance of YOLOv5 algorithm for detecting volunteer cotton plants in corn fields at three different growth stages. Artif. Intell. Agric. 2022, 6, 292–303. [Google Scholar] [CrossRef]
Huang, Z.; Ou, C.; Guo, Z.; Ye, L.; Li, J. Human-Following Strategy for Orchard Mobile Robot Based on the KCF-YOLO Algorithm. Horticulturae 2024, 10, 348. [Google Scholar] [CrossRef]
Zhang, W.; Zhang, L.; Lei, P.; Jin, X. Automated solving algorithm and application for oriented bounding box based on principal axis of inertia. In Proceedings of the 2021 The 2nd International Conference on Mechanical Engineering and Materials (ICMEM 2021), Beijing, China, 19–20 November 2021; p. 012007. [Google Scholar]
Zhang, Z. A flexible new technique for camera calibration. IEEE Trans. Pattern Anal. Mach. Intell. 2000, 22, 1330–1334. [Google Scholar] [CrossRef]
Chen, M.; Tang, Y.; Zou, X.; Huang, Z.; Zhou, H.; Chen, S. 3D global mapping of large-scale unstructured orchard integrating eye-in-hand stereo vision and SLAM. Comput. Electron. Agric. 2021, 187, 106237. [Google Scholar] [CrossRef]

Figure 1. (a) Citrus orchard scene; (b) Fruit lightly occluded by branches and leaves, (c) Fruit moderately occluded by branches and leaves, (d) Fruit severely occluded by branches and leaves, (e) Fruit occlude by overlapping.

Figure 2. YOLO-OBB algorithm flowchart (The red, green, and blue arrows indicate the x-axis, y-axis, and z-axis of the coordinate system, respectively and the green point indicate the picking point.).

Figure 3. YOLOv5s network framework (The three images in the input section are each represented as a RGB stream).

Figure 4. Citrus deep information matching flowchart.

Figure 5. Fruit stem pose estimation flowchart. (The green point indicates the fruit stem picking point ,red arrow indicates sequence of steps in the implementation of the algorithm).

Figure 6. Example diagram for fruit stem pose estimation.

Figure 7. Spatial relationship coordinates of the hand–eye system.

Figure 8. Hand–eye calibration schematic diagram.

Figure 9. Experimental platform.

Figure 10. Experimental scene.

Figure 11. Example diagram of citrus classification; (a) no obstruction, (b) 10% to 50% occlusion rate, and (c) citrus growing at a slight angle with the fruit stem occluded.

Figure 12. Citrus point cloud estimation (The red arrow indicates the angle of the fan-shaped outline of the citrus that can be seen).

Figure 13. Fruit stem pose estimation data.

Figure 14. Citrus fruit stem picking point localization schematic diagram (The red point indicate the points of estimation for citrus fruit stems).

Figure 15. Comparison of the picking effect with and without estimation of the fruiting stems (The coordinate system in the figure represents the End-effector coordinate system, and the X,Y,Z represents the x-axis, y-axis, and axis of the end-effector coordinate system, respectively).

Figure 16. Example of citrus picking process (The number indicate the sequence of steps in operation process of the citrus picking robot).

Figure 17. Citrus picking scheme comparison.

Table 1. Environment configurations.

Environment	Component	Parameter
Hardware	CPU	AMD Ryzen 7 1700X Eight-Core
Hardware	GPU	NVIDIA GeForce GTX 1080
Software	CUDA	11.4
	CUDNN	8.4.1
	PyTorch	1.12.1

Table 2. Experimental platform parameter configuration.

Category	Device	Parameter	Value
Depth camera	RealSense D435i (Intel Corp., Santa Clara, CA, USA)	Resolution	1280 × 800
		Focal length	1.93 mm
		Baseline distance	50 mm
Laptop computer	Lenovo XiaoXinPro 16ACH 2021 (Lenovo Corp, Beijing, China)	CPU	AMD Ryzen 5 5600H with Radeon Graphics
Laptop computer	Lenovo XiaoXinPro 16ACH 2021 (Lenovo Corp, Beijing, China)	GPU	NVIDIA GeForce GTX 1650 4G

Table 3. Fruit stem pose error estimation analysis.

Name	Uncovered Fruit	Occluded Fruit	Small-Tilt-Angle-Growth Fruit	Average
P (%)	38.24	28.75	34.58	33.86
OBB-θ (°)	16.81	20.92	21.58	19.77

Table 4. Citrus fruit stem pose estimation computation time comparison.

Name	Uncovered Fruit	Occluded Fruit	Small-Tilt-Angle-Growth Fruit	Average
OBB (s)	0.29	0.30	0.29	0.29

Table 5. Average citrus picking success rate comparison.

Method	Average Picking Success Rate
YOLO-OBB	82%
Not estimating the fruit stem	32%

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Ye, L.; Ma, J.; Lv, Y.; Guo, Z.; Lai, Z.; Ou, C.; Li, J.; Wu, F. The YOLO-OBB-Based Approach for Citrus Fruit Stem Pose Estimation and Robot Picking. Agriculture 2025, 15, 2330. https://doi.org/10.3390/agriculture15222330

AMA Style

Ye L, Ma J, Lv Y, Guo Z, Lai Z, Ou C, Li J, Wu F. The YOLO-OBB-Based Approach for Citrus Fruit Stem Pose Estimation and Robot Picking. Agriculture. 2025; 15(22):2330. https://doi.org/10.3390/agriculture15222330

Chicago/Turabian Style

Ye, Lei, Junjun Ma, Yuanhua Lv, Zhipeng Guo, Zhihao Lai, Chuhong Ou, Jin Li, and Fengyun Wu. 2025. "The YOLO-OBB-Based Approach for Citrus Fruit Stem Pose Estimation and Robot Picking" Agriculture 15, no. 22: 2330. https://doi.org/10.3390/agriculture15222330

APA Style

Ye, L., Ma, J., Lv, Y., Guo, Z., Lai, Z., Ou, C., Li, J., & Wu, F. (2025). The YOLO-OBB-Based Approach for Citrus Fruit Stem Pose Estimation and Robot Picking. Agriculture, 15(22), 2330. https://doi.org/10.3390/agriculture15222330

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

The YOLO-OBB-Based Approach for Citrus Fruit Stem Pose Estimation and Robot Picking

Abstract

1. Introduction

2. Materials

2.1. Data Collection

2.2. Data Annotation

2.3. Training Decision-Making

3. Methods

3.1. Algorithm Framework

3.2. Fruit Detection

3.3. Fruit Localization

3.4. Fruit Picking

3.4.1. Fruit Attitude Estimation

3.4.2. Estimation of Robot’s Picking Posture

4. Experiments

4.1. Experimental Platform and Equipment

4.2. Fruit Stem Pose Error Estimation Experiment

4.3. Citrus Picking Performance Test

5. Conclusions and Future Work

5.1. Conclusions

5.2. Discussions of Limitations

5.3. Future Work

Author Contributions

Funding

Data Availability Statement

Conflicts of Interest

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI