Article

A Progressive Hybrid Automatic Switching Visual Servoing Method for Apple-Picking Robots

1 School of Technology, Beijing Forestry University, Beijing 100083, China
2 Key Laboratory of State Forestry Administration on Forestry Equipment and Automation, Beijing 100083, China
3 Key Laboratory of Space Utilization, Technology and Engineering Center for Space Utilization, Chinese Academy of Sciences, Beijing 100094, China
4 School of Space Exploration, University of Chinese Academy of Sciences, Beijing 100049, China
* Authors to whom correspondence should be addressed.
Agriculture 2026, 16(5), 620; https://doi.org/10.3390/agriculture16050620
Submission received: 10 January 2026 / Revised: 8 February 2026 / Accepted: 7 March 2026 / Published: 8 March 2026
(This article belongs to the Special Issue Perception, Decision-Making, and Control of Agricultural Robots)

Abstract

Position-Based Visual Servoing (PBVS) and Image-Based Visual Servoing (IBVS) struggle to balance end-effector pose accuracy and robustness in apple picking, and they are prone to target loss and control singularities. A progressive Hybrid Automatic Switching Visual Servoing (HAVS) method is proposed and applied to an apple-picking robotic system. HAVS integrates PBVS and IBVS to coordinate control of the manipulator end-effector pose. A depth-based switching function is designed: when the target depth falls below an optimal threshold, the controller switches to PBVS for precise final positioning, which reduces target loss and control singularities. An adaptive proportional-derivative (PD) controller with fuzzy gain scheduling updates the control gains online to enhance responsiveness and stability. The hardware consists of a six-axis manipulator, a depth camera, and a mobile base. You Only Look Once version 5 (YOLOv5) performs apple detection and provides the measurements that drive the control commands. In indoor tests, the success rate was 96%, which was 4 and 10 percentage points higher than the PBVS-only and IBVS-only baselines, respectively; the average picking time was 12.5 s, which was 0.3 s and 1.1 s shorter than the two baselines, respectively. In outdoor trials, the success rate was 87.5%, the average picking time was 13.2 s, and the damage rate was 4.2%. This method provides a reference implementation for visual servo control in agricultural picking robots.

1. Introduction

Apples are among the most widely consumed fruits worldwide and are eaten fresh or used in processed products such as juices and sauces [1]. Apple consumption has been associated with health benefits, including lower risks of cancer and cardiovascular disease [2]. As orchard cultivation expands globally, harvesting under adverse weather conditions and rising labor costs have become major constraints [3,4]. Advances in artificial intelligence and automated control have accelerated the adoption of robotic apple harvesters as alternatives to manual picking [5,6].
Recent years have seen substantial progress in mechanized apple harvesting [7,8,9,10,11]. Vision systems provide key information for environmental perception and target localization in harvesting robots. Vision-based servo control remains an active research topic in robotics [12,13,14]. In complex operating environments, vision sensors capture rich external information and can improve robotic stability and precision [15]. Orchard environments are unstructured, and tree growth patterns and fruit locations are highly uncertain. During harvesting, the manipulator must grasp target fruits accurately and efficiently [16]. A key challenge is to locate fruits rapidly and precisely in these conditions. Visual servo control must then guide the manipulator to the target for reliable harvesting [17].
The concept of visual servoing was first proposed by Hill et al. [18]. Unlike conventional robot control, visual servoing forms a closed-loop system that uses visual features as feedback for real-time motion control. Hutchinson et al. [19] further formalized visual servoing and categorized it into Position-Based Visual Servoing (PBVS) and Image-Based Visual Servoing (IBVS).
PBVS converts image measurements into a three-dimensional target pose for control [20]. The controller extracts image features. Using camera-to-hand calibration parameters, it estimates the target-to-manipulator relative pose in real time. Control laws are then formulated in Cartesian space to drive the manipulator and reduce pose errors [21]. Mehta et al. [22] developed a vision-based estimation and control system for robotic fruit harvesting. The system employed rotational and hybrid translation controllers for manipulation. A stability analysis was conducted to assess closed-loop performance, indicating suitability for medium-to-large citrus harvesting. IBVS defines and regulates the control error directly in the two-dimensional image plane [23]. The controller selects image features, such as points or lines, and specifies their desired locations. It computes the pixel error between current and desired feature positions in real time. Based on this error, it generates camera velocity commands to minimize the image error [24]. Li et al. [25] proposed a novel uncalibrated visual servoing method that integrates a hybrid visual configuration with an adaptive tracking controller. Asymptotic convergence was established using Lyapunov stability analysis, and experiments validated the method.
These studies indicate that PBVS emphasizes accurate positioning and orientation control. It is therefore suitable for tasks that require high end-effector accuracy. However, PBVS is sensitive to camera calibration, hand-eye calibration, and target model errors [26]. During motion, the target can move out of the field of view, which can cause control failure. IBVS is generally more robust to calibration and model errors, and it tends to keep the target within the field of view [27]. However, IBVS can induce complex and hard-to-predict three-dimensional trajectories. Image singularities may also occur.
To address the limitations of IBVS and PBVS, Hafez et al. [28] proposed a hybrid visual servoing method that uses a switching function to transition between the two schemes. When joint variables approach their limits, the controller switches to PBVS to avoid excessive distortion in the commanded camera motion. When feature points move close to the image boundary, the controller switches back to IBVS to maintain target tracking and stable control. Lei et al. [29] proposed a hybrid visual servoing method for a coal-mine tunnel bolting and drilling robot. The method achieved high-precision alignment and improved efficiency, although the stability and reliability of the vision system still require further validation. Li et al. [30] proposed a hybrid visual servo control (HVSC) method for cherry-tomato harvesting robots. The method switches between PBVS for coarse positioning and IBVS for fine alignment, enabling smooth and accurate localization. However, the system relies on prior assumptions about color and shape, which may reduce recognition robustness, and it has not been validated in real harvesting environments.
To address these limitations, this study proposes a progressive Hybrid Automatic Switching Visual Servoing (HAVS) method tailored for unstructured agricultural environments. The main contributions are:
(1) Development of a HAVS method that employs hybrid IBVS–PBVS control during the coarse alignment phase for rapid target approach and guaranteed target retention within the Field of View (FOV). In addition, when the depth of the target falls below the optimal threshold, PBVS is employed for fine alignment.
(2) Proposal of an adaptive PD controller with fuzzy gain scheduling that updates control gains online to improve response speed and dynamic stability.
(3) Construction of an apple-picking robot system with its overall performance verified through indoor simulated and field picking experiments.

2. Design of the Apple-Picking Robot System

2.1. Hardware and Software Design of the Apple-Picking Robot

Figure 1 shows the apple-picking robot system. It comprises a uFactory (Shenzhen, China) xArm6 six-degree-of-freedom (6-DoF) collaborative manipulator, an Intel (Santa Clara, CA, USA) RealSense D435i depth camera, an electrically actuated three-finger flexible gripper, a control terminal, and a mobile platform. The xArm6 is a serial manipulator with six independently actuated revolute joints, providing 6-DoF motion; this configuration enables the end-effector to reach the desired position and orientation within its reachable workspace [31]. The three-finger flexible gripper supports compliant grasping and stall protection, which helps reduce apple damage during picking, and its design accommodates apples with diameters of 50–150 mm.
In robotic vision systems, depth-camera mounting is typically categorized as eye-in-hand or eye-to-hand. In an eye-to-hand setup, the camera is fixed on the mobile base and does not move with the manipulator. This setup provides a wider field of view and stronger global perception. However, target localization accuracy is lower [32]. In an eye-in-hand setup, the camera is mounted on the end-effector and moves with the manipulator. The camera can approach the target, which reduces localization error. Apple picking requires high target localization accuracy. Therefore, the eye-in-hand configuration was adopted.
The software stack is built on Ubuntu 22.04 and the Robot Operating System (ROS). C++ programs are developed in Microsoft (Redmond, WA, USA) Visual Studio 2022. LibTorch 1.13.0, the PyTorch C++ application programming interface (API), is used for graphics processing unit (GPU)-accelerated inference for apple detection and localization. Image processing uses the depth-camera software development kit (SDK) and the Open-Source Computer Vision Library (OpenCV) 4.8. Motion planning and visual servo control are implemented with the xArm SDK. ROS supports mobile platform navigation and locomotion control. It also enables inter-module communication.

2.2. Operational Workflow of the Apple-Picking Robot

Figure 2 illustrates the overall workflow of the apple-picking robot. The workflow includes three phases: target detection, vision-based servo control, and the picking action. First, the red–green–blue and depth (RGB-D) camera captures synchronized color and depth images of the target. After preprocessing, a target-detection network identifies the target fruit. The target center is then computed in both the base frame (3D) and the image plane (2D). An adaptive PD controller with fuzzy gain scheduling is used. PBVS drives the manipulator towards the target apple. IBVS regulates the end-effector orientation to keep the target on the camera optical axis. When the target depth falls below a preset threshold, the controller switches to PBVS for final fine positioning. After the desired pose is reached, the gripper closes. The manipulator then retracts and releases the fruit to complete the picking operation.

3. HAVS Control Method

The proposed progressive Hybrid Automatic Switching Visual Servoing (HAVS) method adopts an eye-in-hand vision configuration. It enables real-time detection and three-dimensional localization of target apples and supports visual servoing of the manipulator. The software uses a multi-threaded architecture in which the vision, control, and display threads run concurrently: the vision thread performs apple detection and localization, the control thread runs the hybrid visual servoing algorithm and issues motion commands, and the display thread supports visualization and human–machine interaction. This design improves system stability and meets real-time constraints.
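The paper reports the thread split but no implementation; as a rough illustration only, a minimal Python sketch of the vision-to-control hand-off might look like the following (the actual system is C++/ROS, the detection values are dummies, and the display thread is omitted):

```python
import queue
import threading
import time

# Minimal sketch of the vision/control thread split; placeholder values only.
measurements = queue.Queue(maxsize=1)  # hold only the latest detection

def vision_thread():
    """Stands in for detection and localization (Section 3.1)."""
    while True:
        target = {"u": 320.0, "v": 240.0, "depth_m": 0.8}  # dummy detection
        if measurements.full():
            try:
                measurements.get_nowait()  # drop the stale measurement
            except queue.Empty:
                pass
        measurements.put(target)
        time.sleep(1 / 30)                 # roughly the camera frame rate

def control_thread():
    """Stands in for the hybrid visual servoing loop (Section 3.3)."""
    for _ in range(3):                     # a few cycles for the demo
        target = measurements.get()        # newest target estimate
        print("servo step towards", target)

threading.Thread(target=vision_thread, daemon=True).start()
consumer = threading.Thread(target=control_thread)
consumer.start()
consumer.join()
```

Keeping the queue at depth one means the controller always acts on the freshest detection rather than a backlog, which matches the real-time intent described above.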

3.1. Object Detection

Object detection networks are commonly divided into one-stage methods, such as the You Only Look Once (YOLO) family, and two-stage methods, such as Faster Region-based Convolutional Neural Networks (Faster R-CNN) [33,34]. Apple orchard environments are highly unstructured, characterized by dense fruit and severe occlusion. Though anchor-free detectors (e.g., YOLOv8) perform well on general datasets, YOLOv5’s anchor-based approach offers a more robust geometric prior. It generates detection anchors via K-means clustering, enabling it to more effectively separate adjacent fruits in dense clusters. In addition, YOLOv5 offers superior deployment suitability and well-established algorithmic maturity. Therefore, this study uses YOLOv5 for target detection.
YOLOv5 has fewer parameters and lower computational cost than Faster R-CNN, Focal Loss for Dense Object Detection (RetinaNet), and Single Shot MultiBox Detector (SSD). It therefore achieves faster inference. This supports real-time detection on resource-constrained platforms while maintaining high accuracy [35,36,37]. Therefore, the network provides timely localization information for the picking task. YOLOv5 provides four model scales: s, m, l, and x. They share the same architecture but differ mainly in depth and feature-map width [38]. YOLOv5s is the smallest variant and uses a shallower network with narrower feature maps. Larger variants increase depth and width to improve detection performance. YOLOv5s is usually less accurate than larger variants, but it provides faster inference. Given the real-time requirement, YOLOv5s is selected for target detection.
The MinneApple dataset [39,40,41] was used for training and evaluation. It is designed for apple detection and segmentation in orchard scenes and is generally more challenging than the indoor and field picking setups considered in this study. It contains 1000 high-resolution orchard images covering diverse lighting, occlusion, and background conditions, with polygon mask annotations for more than 41,000 apple instances, supporting both detection and segmentation tasks. The dataset was randomly split into training (80%), validation (10%), and test (10%) sets for model training, hyperparameter tuning, and generalization evaluation. Performance was evaluated using mean average precision at an intersection-over-union (IoU) threshold of 0.5 (mAP@0.5).
Transfer learning was employed by initializing the detection network with pre-trained weights from the COCO dataset. The image size was set to 640 × 640 pixels, with 300 training epochs, a learning rate of 0.01, momentum of 0.937, weight decay of 0.0005, and the Stochastic Gradient Descent (SGD) optimizer. Standard YOLOv5 data augmentation was adopted, including Mosaic augmentation and random HSV color jittering. Model training was conducted on a laptop with Ubuntu 22.04, a 13th Gen Intel Core i7-13620H (2.40 GHz) CPU, and an NVIDIA GeForce RTX 4060 Laptop GPU. Typical detection results are shown in Figure 3.
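For illustration, the detection step can be reproduced with YOLOv5's public PyTorch Hub interface; this is a sketch only, since the paper's deployment uses the LibTorch C++ API, and the image path is a placeholder. The bounding-box center (u, v) and equivalent radius r computed here feed the servo modules of Section 3.2.

```python
import torch

# Illustrative Python equivalent of the C++/LibTorch detection step.
model = torch.hub.load("ultralytics/yolov5", "yolov5s", pretrained=True)

results = model("orchard_scene.jpg")           # placeholder image path
for *xyxy, conf, cls in results.xyxy[0]:       # rows: x1, y1, x2, y2, conf, cls
    u = float(xyxy[0] + xyxy[2]) / 2           # bounding-box center u (px)
    v = float(xyxy[1] + xyxy[3]) / 2           # bounding-box center v (px)
    r = float(xyxy[2] - xyxy[0]) / 2           # equivalent image radius (px)
    print(f"center=({u:.0f}, {v:.0f}), r={r:.0f}, confidence={float(conf):.2f}")
```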

3.2. Basic Modules and Control-Law Design

3.2.1. IBVS Control Module

IBVS defines the control error on a two-dimensional image plane and forms a closed-loop system. IBVS has low computational cost and is relatively insensitive to camera calibration errors. This study uses the detected apple center as the image feature. The image center is set as the desired feature location. IBVS aims to align the apple center with the image center.
With an image resolution of 640 × 480, the image center is located at pixel (320, 240). The IBVS error is defined as the pixel offset between the apple center and the image center.
e_{\mathrm{IBVS}} = \left[\, u - u_c,\; v - v_c \,\right]^{T}
A desired depth d0 of 150 mm is specified to indicate that the apple is close to the gripper. The depth error is defined as the difference between the measured and desired depths. This constraint limits distance variation along the line-of-sight direction.
e_d = d - d_0
Proportional control maps the pixel and depth errors to correction commands for the end-effector orientation. After empirical tuning, the proportional gains Kpα and Kpβ are set to −0.1 and 0.1, respectively.
\Delta\alpha = K_{p\alpha}\,(u - u_c), \qquad \Delta\beta = K_{p\beta}\,(v - v_c)
Here, Δα and Δβ denote the control increments for the end-effector roll and pitch angles. For the IBVS baseline, control commands computed from depth error were used to drive the manipulator toward the target.
\Delta z = K_{p2}\, e_d
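A minimal sketch of this IBVS update follows, using the gains quoted above; Kp2 is not reported in the paper, so its value here is a placeholder:

```python
# IBVS increments per control cycle, following the equations above.
U_C, V_C = 320.0, 240.0          # image center for 640 x 480 (px)
D0 = 0.150                       # desired stand-off depth d0 (m)
K_PA, K_PB = -0.1, 0.1           # gains quoted in the text
K_P2 = 0.1                       # placeholder depth gain (not reported)

def ibvs_step(u: float, v: float, d: float):
    """Return (d_alpha, d_beta, d_z) from the pixel and depth errors."""
    d_alpha = K_PA * (u - U_C)   # roll increment from the u error
    d_beta = K_PB * (v - V_C)    # pitch increment from the v error
    d_z = K_P2 * (d - D0)        # advance along the line of sight
    return d_alpha, d_beta, d_z

# Example: target right of and above the image center, 0.6 m away.
print(ibvs_step(400.0, 200.0, 0.60))
```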

3.2.2. PBVS Control Module

PBVS maps image observations and depth measurements to a target position in three-dimensional space. Control is performed in this spatial domain. The objective is to drive end-effector motion and minimize the position error between the gripper working point and the target apple center. The error is reduced to near zero. This requires coordinating transformations among the camera, the manipulator, and the target. The coordinate frames are defined in Figure 4.
The Intel RealSense D435i captures RGB images and depth maps. YOLOv5s estimates the target apple center pixel coordinates (u, v) from the RGB image. The detection results are aligned with the depth map. The depth value at (u, v) is then read. A moving-average filter smooths the depth value to suppress fluctuations from illumination changes and sensor noise. The depth map measures the distance from the apple surface to the camera. It does not directly provide the depth of the apple’s geometric center. The apple is approximated as a sphere. Its equivalent image radius r is estimated from the YOLOv5s bounding box. The center depth is corrected by adding r to the measured depth d.
The depth-camera intrinsics include the focal lengths (fx, fy) and the principal point coordinates (cx, cy). Using the pinhole camera model, (u, v) and the corrected depth are back-projected to obtain the three-dimensional target center in the camera frame.
X_c = \frac{(u - c_x)\, d}{f_x}, \qquad Y_c = \frac{(v - c_y)\, d}{f_y}, \qquad Z_c = d + r
The target-center position in the camera frame is denoted by pcam, written in homogeneous form as [Xc, Yc, Zc, 1]T. A hand-eye calibration transform maps this point from the camera frame to the end-effector frame.
T_{\mathrm{cam}\to\mathrm{end}} =
\begin{bmatrix}
0.0080 & 0.9990 & 0.0428 & 85.0887 \\
0.9998 & 0.0074 & 0.0149 & 30.0342 \\
0.0146 & 0.0430 & 0.9989 & 5.0004 \\
0 & 0 & 0 & 1
\end{bmatrix}
p_{\mathrm{end}} = T_{\mathrm{cam}\to\mathrm{end}}\; p_{\mathrm{cam}}
This transform is denoted as Tcam→end, which is a homogeneous transformation matrix obtained from hand-eye calibration. For control, the point is further expressed in the base frame. The xArm SDK provides forward kinematics for the xArm6. It outputs the end-effector position xend and orientation (α, β, γ) in the base frame. These kinematics define the base-to-end-effector transformation. The target point is then mapped into the base frame.
R_{\mathrm{end}\to\mathrm{base}} =
\begin{bmatrix}
\cos\gamma\cos\beta & \cos\gamma\sin\beta\sin\alpha - \sin\gamma\cos\alpha & \cos\gamma\sin\beta\cos\alpha + \sin\gamma\sin\alpha \\
\sin\gamma\cos\beta & \sin\gamma\sin\beta\sin\alpha + \cos\gamma\cos\alpha & \sin\gamma\sin\beta\cos\alpha - \cos\gamma\sin\alpha \\
-\sin\beta & \cos\beta\sin\alpha & \cos\beta\cos\alpha
\end{bmatrix}
T_{\mathrm{end}\to\mathrm{base}} =
\begin{bmatrix}
R_{\mathrm{end}\to\mathrm{base}} & x_{\mathrm{end}} \\
0 & 1
\end{bmatrix}
p_{\mathrm{base}} = T_{\mathrm{end}\to\mathrm{base}}\; p_{\mathrm{end}}
The mapped point in the base frame is denoted as pbase. Its first three components give the target position ptarget. A fixed offset loffset of 185 mm is measured between the gripper working point and the end-effector frame along the Z-axis. This offset is used to compute the gripper working-point position in the base frame.
x_{\mathrm{gripper}} = x_{\mathrm{end}} + R_{\mathrm{end}\to\mathrm{base}}\, \left[\, 0,\ 0,\ l_{\mathrm{offset}} \,\right]^{T}
The PBVS position error is the difference between the target position and the gripper working-point position. It is used as the input to end-effector position control.
e_{\mathrm{PBVS}} = x_{\mathrm{target}} - x_{\mathrm{gripper}}
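The full PBVS pipeline of this subsection (back-projection, hand-eye transform, base-frame mapping, gripper offset, error) condenses to a short routine. The sketch below uses placeholder intrinsics and an identity hand-eye transform; in practice, the RealSense intrinsics and the calibrated Tcam→end above would be substituted.

```python
import numpy as np

# PBVS target-position pipeline (Section 3.2.2), with placeholder calibration.
FX, FY, CX, CY = 615.0, 615.0, 320.0, 240.0  # placeholder intrinsics (px)
T_CAM_END = np.eye(4)                        # placeholder hand-eye transform
L_OFFSET = 185.0                             # gripper offset along Z (mm)

def euler_zyx_to_R(alpha: float, beta: float, gamma: float) -> np.ndarray:
    """Rotation R_end->base from the (alpha, beta, gamma) angles above."""
    ca, sa = np.cos(alpha), np.sin(alpha)
    cb, sb = np.cos(beta), np.sin(beta)
    cg, sg = np.cos(gamma), np.sin(gamma)
    return np.array([
        [cg * cb, cg * sb * sa - sg * ca, cg * sb * ca + sg * sa],
        [sg * cb, sg * sb * sa + cg * ca, sg * sb * ca - cg * sa],
        [-sb,     cb * sa,                cb * ca],
    ])

def pbvs_error(u, v, d, r, x_end, rpy):
    """Position error between apple center and gripper working point (mm)."""
    # Back-project the pixel, with the +r center-depth correction.
    p_cam = np.array([(u - CX) * d / FX, (v - CY) * d / FY, d + r, 1.0])
    p_end = T_CAM_END @ p_cam                 # camera -> end-effector frame
    R = euler_zyx_to_R(*rpy)
    T_end_base = np.eye(4)
    T_end_base[:3, :3], T_end_base[:3, 3] = R, x_end
    p_target = (T_end_base @ p_end)[:3]       # target in the base frame
    x_gripper = x_end + R @ np.array([0.0, 0.0, L_OFFSET])
    return p_target - x_gripper               # e_PBVS

# Example call with the paper's starting end-effector position (mm).
e = pbvs_error(350, 260, 600.0, 40.0,
               np.array([249.5, 0.0, 633.7]), (0.0, 0.0, 0.0))
print(e)
```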

3.2.3. Adaptive PD Control Module with Fuzzy Gain Scheduling

A PD controller is used to regulate the manipulator’s end-effector position and balance response speed and stability. The error derivative is approximated using a discrete-time difference. The control period T is about 50 ms.
\dot{e}_{\mathrm{PBVS}}(t) = \frac{e_{\mathrm{PBVS}}(t) - e_{\mathrm{PBVS}}(t - T)}{T}
The PD update computes the end-effector position increment at each control cycle.
\Delta x(t) = K_{p1}\, e_{\mathrm{PBVS}}(t) + K_{d}\, \dot{e}_{\mathrm{PBVS}}(t)
In unstructured orchard environments, fixed-gain PD control may not achieve both fast convergence and stable behavior over the full operating range. To address these trade-offs, a Mamdani fuzzy inference-based gain scheduler is designed [42,43,44]. The scheduler updates Kp1 and Kd online based on the current error and its rate of change. Compared with advanced control strategies such as model reference adaptive control (MRAC) and active disturbance rejection control (ADRC), the fuzzy PD controller exhibits unique advantages in unstructured orchard environments. It relies solely on error feedback and does not require an accurate system model, thereby mitigating complications caused by model mismatch. Furthermore, its low computational complexity facilitates future deployment of the algorithm on embedded systems.
The scheduler uses a dual-input dual-output structure. It aims to transition smoothly from proportional (P) control to a higher-damping PD regime. The control flow is shown in Figure 5. Two inputs are used. The position error (PE) is the Euclidean distance between the end-effector and the target point. The error change rate (EC) describes the rate of change of this distance. For PE, the fuzzy subsets are {ZO, PS, PM, PB}. For EC, the fuzzy subsets are {NB, NM, NS, ZO, PS, PM, PB}. In these labels, P and N denote positive and negative, ZO denotes zero, and S, M, and B denote small, medium, and big.
The design of fuzzy rules strictly adheres to the physical principles underlying error dynamics. For large PE, the proportional gain Kp1 is increased to accelerate convergence; for small PE, Kp1 is reduced to prevent overshoot. Similarly, the derivative gain Kd is dynamically adjusted according to the rate of EC to provide effective damping. Based on the fundamental principles, the membership functions are further optimized via experimental tuning, as shown in Figure 6. The fuzzy rules are listed in Table 1.
Fuzzy inference produces a fuzzy output set. Defuzzification is required to obtain crisp control gains [45]. Weighted averaging is used for defuzzification to compute Kp1 and Kd.
K_{\mathrm{out}} = \frac{\sum_{i=1}^{n} \mu_i\, y_i}{\sum_{i=1}^{n} \mu_i}
Here, μi denotes the firing strength of rule i, and yi denotes the corresponding singleton output; the quantization parameters are determined experimentally, as shown in Table 2. Finally, the online-updated gains are applied in the PD update to compute the end-effector position increment Δx(t) at each control cycle. This method improves adaptability and operational stability in complex orchard environments while retaining the simplicity of the PD structure.
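A compact sketch of the scheduler follows, using the rule base of Table 1 and the singleton outputs of Table 2; the triangular membership shapes and the input scaling are illustrative placeholders, since the tuned shapes of Figure 6 are not reproduced here.

```python
import numpy as np

# Fuzzy gain scheduler sketch: rules from Table 1, singletons from Table 2.
KP_OUT = {"B": 0.65, "M": 0.35, "S": 0.12, "ZO": 0.00}   # Kp1 singletons
KD_OUT = {"B": 0.08, "M": 0.04, "S": 0.01, "ZO": 0.00}   # Kd singletons

EC_LABELS = ["NB", "NM", "NS", "ZO", "PS", "PM", "PB"]
RULES = {  # PE label -> "Kp1,Kd" per EC label, from Table 1
    "PB": ["B,S", "B,ZO", "B,ZO", "B,ZO", "B,ZO", "B,ZO", "B,ZO"],
    "PM": ["M,M", "M,S", "B,ZO", "B,ZO", "B,ZO", "B,S", "B,M"],
    "PS": ["S,B", "S,M", "M,S", "M,S", "B,S", "B,S", "B,M"],
    "ZO": ["ZO,B", "ZO,B", "S,M", "S,M", "M,M", "B,M", "B,M"],
}

def tri(x, left, peak, right):
    """Triangular membership degree of x."""
    if x <= left or x >= right:
        return 0.0
    return (x - left) / (peak - left) if x < peak else (right - x) / (right - peak)

def memberships(x, centers):
    """Triangular partition with peaks at `centers` (placeholder shapes)."""
    degrees = {}
    for i, (label, c) in enumerate(centers):
        lo = centers[i - 1][1] if i > 0 else c - 1.0
        hi = centers[i + 1][1] if i + 1 < len(centers) else c + 1.0
        degrees[label] = tri(x, lo, c, hi)
    return degrees

PE_CENTERS = [("ZO", 0.0), ("PS", 0.1), ("PM", 0.25), ("PB", 0.5)]  # m
EC_CENTERS = list(zip(EC_LABELS, np.linspace(-0.5, 0.5, 7)))         # m/s

def schedule_gains(pe, ec):
    """Mamdani inference + weighted-average defuzzification of Kp1, Kd."""
    mu_pe, mu_ec = memberships(pe, PE_CENTERS), memberships(ec, EC_CENTERS)
    num_p = num_d = den = 0.0
    for pe_lab, row in RULES.items():
        for ec_lab, cell in zip(EC_LABELS, row):
            mu = min(mu_pe[pe_lab], mu_ec[ec_lab])   # rule firing strength
            kp_lab, kd_lab = cell.split(",")
            num_p += mu * KP_OUT[kp_lab]
            num_d += mu * KD_OUT[kd_lab]
            den += mu
    return (num_p / den, num_d / den) if den > 0 else (KP_OUT["M"], KD_OUT["M"])

print(schedule_gains(pe=0.30, ec=-0.20))  # large error, closing in: high Kp1
```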

3.3. HAVS Switching Method and Coordinated Control

PBVS defines the error in three-dimensional space and drives the end-effector towards the target through coordinate transformations. PBVS often provides good global convergence and avoids image-plane singularities. It also controls the end-effector position directly in three-dimensional space, which supports high positioning accuracy. However, PBVS requires higher computation and is sensitive to calibration and depth errors. When the camera is far from the target, large motion can move the target out of the field of view. This can degrade subsequent control. In contrast, IBVS defines the error directly in the image plane and adjusts the manipulator to align target features with their desired locations. IBVS typically has lower computational cost and faster response. It is also more robust to calibration and depth errors. At close range, IBVS can suffer from image singularities and oscillations. It also provides limited direct constraints on global motion in three-dimensional space. Therefore, a progressive HAVS fusion method is proposed. It combines PBVS and IBVS in a complementary manner to balance field-of-view maintenance and end-effector positioning accuracy.
During long-range motion, HAVS runs PBVS and IBVS in parallel to control the manipulator. Target detection and coordinate transformation provide a three-dimensional position error and a two-dimensional image-plane error. PBVS uses the three-dimensional position error and an adaptive PD controller with fuzzy gain scheduling to drive the end-effector towards the target. In parallel, IBVS uses the image-plane error to regulate end-effector orientation. This keeps the target near the image center and reduces the risk of leaving the field of view. A geometric offset exists between the gripper working point and the camera optical axis. IBVS also provides limited direct constraints on three-dimensional positions. Therefore, when the target depth falls below a preset threshold, the controller switches to PBVS for final fine positioning. When the target position error drops below 2.5 mm, the gripper closes to complete the picking action. This threshold corresponds to about 5% of a typical apple dimension. It is consistent with the required picking accuracy. Figure 7 illustrates the laboratory simulated picking process. Under indoor conditions, the picking success rate reached 96%. Video S1 in the Supplementary Materials shows the indoor picking process of the robot.
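For concreteness, one control cycle of the switching logic can be sketched as below, with the thresholds quoted in this section. The three helper functions stand in for the ibvs_step, pbvs_error, and schedule_gains sketches given earlier; they are stubbed here so the example runs standalone.

```python
import numpy as np

# One HAVS control cycle (Section 3.3); thresholds from the text, helpers stubbed.
SWITCH_DEPTH = 0.40   # m: above this depth, IBVS orientation control runs
GRASP_ERROR = 2.5     # mm: position error below which the gripper closes

def pbvs_error_stub(u, v, depth):
    return np.array([depth * 500.0, 0.0, 0.0])       # dummy 3D error (mm)

def schedule_gains_stub(pe, ec):
    return 0.35, 0.04                                # fixed mid-range gains

def ibvs_step_stub(u, v, depth):
    return -0.1 * (u - 320.0), 0.1 * (v - 240.0), 0.0

def havs_cycle(u, v, depth, e_prev, dt=0.05):
    """Return (position step, orientation step, grasp flag) for one cycle."""
    e = pbvs_error_stub(u, v, depth)                 # PBVS position error
    pe = np.linalg.norm(e)
    ec = (pe - np.linalg.norm(e_prev)) / dt
    kp1, kd = schedule_gains_stub(pe / 1000.0, ec / 1000.0)
    dx = kp1 * e + kd * (e - e_prev) / dt            # adaptive PD step

    if depth > SWITCH_DEPTH:                         # coarse phase:
        da, db, _ = ibvs_step_stub(u, v, depth)      # keep apple centered
    else:                                            # fine phase:
        da = db = 0.0                                # PBVS only

    return dx, (da, db), bool(pe < GRASP_ERROR)

print(havs_cycle(u=350.0, v=250.0, depth=0.80, e_prev=np.zeros(3)))
```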

4. Results and Discussion

After training, the YOLOv5s model achieved 78.1% mAP@0.5 on the MinneApple test set. Despite the dataset's challenging lighting and occlusion conditions, deployment on the prototype achieved 89% detection accuracy in 100 indoor tests with randomly selected target positions, conducted under a simpler and more controlled setting. Unlike static dataset evaluation, both detection confidence and success rate improve as the manipulator approaches the target. Regarding latency, the object detection module is implemented in LibTorch (C++) with FP32 precision on an NVIDIA RTX 4060 Laptop GPU and achieves an inference speed exceeding 60 FPS (10–15 ms per frame), ensuring real-time, stable visual detection.
To evaluate the performance of the HAVS-based apple-picking system, a series of comparative experiments were conducted in an indoor simulated picking environment. Field picking trials were also conducted at X Farm in Beijing, China, to assess performance in a real orchard.

4.1. Indoor Simulated Picking Experiments

4.1.1. Determination of Switching Thresholds

Seven candidate depth thresholds for switching to PBVS were set from 0.20 m to 0.50 m in 0.05 m increments. In the indoor simulated picking environment, 50 different target positions were randomly selected, with varied lighting and occlusion conditions across these positions. Seven depth thresholds were used for picking experiments at each target position, and the system performance was evaluated by average picking time and picking success rate. The picking time is defined as the duration from the initialization of the robotic system to the moment the manipulator releases the fruit into the collection basket, and the average value is calculated across 50 independent experiments. In each trial, a picking attempt is deemed successful only if the target is successfully detected, clamped and released into the collection basket.
The results are shown in Figure 8. Error bars indicate the standard deviation across 50 trials. Switching to PBVS when the target depth fell below 0.40 m yielded the best overall performance. If the threshold is too high, the larger long-range displacement increases the chance that the target leaves the camera field of view. This can interrupt control continuity. If the threshold is too low, the switch occurs at close range, and the system becomes more prone to oscillations. This increases the trajectory length and reduces picking efficiency.

4.1.2. Comparative Experiments

After introducing fuzzy gain scheduling, 50 randomized HAVS picking trials were conducted in the indoor environment. Compared with fixed-gain PD control, the average picking time decreased by 1.3 s, and the success rate increased by 4 percentage points.
HAVS was further compared with PBVS-only and IBVS-only baselines to evaluate performance. A total of 100 target positions with varied lighting and occlusion conditions were randomly selected in the indoor environment. At each position, picking tests were conducted separately with the three control methods under identical experimental conditions. The fuzzy rules for HAVS and the fixed-gain settings for the baselines followed the specifications described above. For a representative target position, time-varying position-error curves were recorded for each method, as shown in Figure 9. From target detection to reaching the desired pose, HAVS converged in about 5.8 s, which was 0.7 s and 1.7 s faster than PBVS and IBVS, respectively.
Figure 10 plots the end-effector trajectories of the three methods when picking the same target position. The target apple position in the base frame is (488.0, 170.4, 656.0). The end-effector starts at (249.5, 0.0, 633.7). In three-dimensional space, the HAVS trajectory is smoother, with no noticeable jitter or redundant backtracking. The IBVS trajectory shows local oscillations. The PBVS trajectory contains abrupt direction changes. Trajectory-length analysis shows that HAVS produces the shortest path among the three methods. The HAVS path length is 298.9 mm. This is 10.0 mm shorter than PBVS and 12.2 mm shorter than IBVS. Overall, HAVS combines PBVS and IBVS with automatic switching. This yields smoother end-effector trajectories and shorter paths, improving motion efficiency.
Results from the 100 trials are summarized in Figure 11. The lower success rate of IBVS is primarily attributed to insufficient 3D positional constraints during the final alignment phase. Furthermore, IBVS frequently failed due to Jacobian singularities at close range, which resulted in unavoidable trajectory jitter. PBVS primarily failed due to FOV loss. As the controller imposes no constraints in the image plane, extensive manipulator movements can cause the target to exit the FOV, which ultimately leads to control instability and failure.
In the indoor simulated picking task, HAVS achieved a 96% success rate. This is 4 and 10 percentage points higher than PBVS and IBVS, respectively. The average picking time of HAVS was 12.5 s. It was 0.3 s and 1.1 s shorter than PBVS and IBVS, respectively. The picking-time distribution shows that HAVS has a standard deviation of 1.1 s, which is lower than 1.4 s for IBVS and 1.2 s for PBVS. This suggests that HAVS suppresses visual-deviation-induced corrections more quickly across scenarios, which reduces ineffective manipulator motion time. As a result, picking-time fluctuations decrease and system robustness improves.

4.2. Field Picking Experiments

The field experimental setup is shown in Figure 12. Apple trees were planted in rows; the access path was about 2.5 m wide, and the trees were about 3 m tall. Most apples were within the manipulator workspace and were treated as picking targets. Four groups of experimental apple trees were randomly selected, and four experimental groups of 30 picking attempts each (120 trials in total) were conducted on the same day at different times. The first group was tested from 9:00 to 10:00 under intense light; the second from 10:00 to 11:00 under weak light with partial cloud cover; the third from 12:00 to 13:00 under direct sunlight with the strongest light intensity; and the fourth from 15:00 to 16:00 under the weakest light intensity, with thick clouds completely blocking sunlight. Strong sunlight slightly reduced detection accuracy and the picking success rate, but both remained within an acceptable range. The field picking results are summarized in Table 3. In the orchard field trials, 105 of 120 attempts were successful, corresponding to a success rate of 87.5%. The average picking time was 13.2 s. Five apples were damaged, corresponding to a damage rate of 4.2%. Video S2 in the Supplementary Materials shows the outdoor orchard picking process of the robot.

4.3. Discussion of Experimental Results

Among the 120 field picking trials, 15 failures were observed. As shown in Figure 13, the failures fall into two categories. In the first failure mode, the gripper inadvertently grasped nearby branches while closing. This prevented a secure grasp, and the fruit slipped from the gripper during manipulator retraction. Branch detection could be added to the vision pipeline to guide the approach and avoid branches. In the second failure mode, foliage occluded the camera field of view during motion, causing target detection to stop. Active obstacle avoidance could mitigate this issue. The manipulator trajectory can be adjusted using obstacle point clouds to avoid dense foliage.
Table 4 compares the proposed system with representative picking robots reported in the literature; unreported metrics are denoted by "/". Direct quantitative comparison is limited by differing experimental conditions, including hardware setups, fruit categories, and test environments. Nevertheless, the results show that the proposed method achieves comparable performance on standard computing hardware, which supports the feasibility of deployment on mobile robotic platforms.
Existing fruit harvesting robots demonstrate substantial differences in system architecture and visual sensor selection. In terms of overall design, some studies prioritize end-effector development: for example, Wang et al. [46] proposed a soft gripper and Shi et al. [47] developed a composite pneumatic manipulator. In contrast, Chen et al. [48], Park et al. [49], and Choi et al. [50] implemented more sophisticated visual schemes. Specifically, Chen et al. [48] employed binocular stereo vision cameras supporting both global and local localization, rendering them suitable for continuous operation in large-scale orchards.
For apple-picking tasks, the average picking time was 13.20 s. This was shorter than the 25.00 s reported by Xu et al. [11] and the 14.69 s reported by Wang et al. [46]. The picking success rate was 87.50%. It was higher than the 70.77% reported by Wang et al. [46] and the 84.70% reported by Shi et al. [47] but lower than the 95.00% reported by Xu et al. [11]. Xu et al. [11] evaluated their system on indoor potted apple trees under relatively controlled conditions, so their results are not directly comparable with orchard field trials. The fruit damage rate was 4.2%. This was similar to the 4.55% reported by Wang et al. [46] but higher than the 0.88% reported by Shi et al. [47]. Shi et al. [47] used a flexible end-effector that integrates suction adhesion with finger gripping, which can reduce fruit damage.
Table 4 also includes results from Chen et al. [48] for dragon fruit, Park et al. [49] for cucumber, and Choi et al. [50] for citrus. Because the target fruits and experimental conditions differ, cross-fruit comparisons are provided for reference only. For reference, the average picking time in this study was 13.20 s. This was shorter than 15.19 s (Chen et al. [48]) and 21.33 s (Choi et al. [50]), and much shorter than 56.00 s (Park et al. [49]). The picking success rate was 87.50%. This was higher than 76.90% (Chen et al. [48]), 83.33% (Choi et al. [50]), and 56.60% (Park et al. [49]).
Considering average picking time, success rate, and damage rate together, the proposed system shows competitive efficiency for apple-picking tasks and suggests potential for further engineering deployment.
Although field trials have validated the system for picking, further optimization is possible. First, outdoor robustness can be improved by refining the YOLOv5 detection method. Multimodal fusion can also be explored to improve accuracy under strong illumination and occlusion. In addition, branch detection and active obstacle-avoidance planning will be introduced to improve the robustness of visual servo control in complex orchard environments. Second, the current hardware limits the workspace and chassis maneuverability. Potential upgrades include improving the traversability of the mobile base and adding an elevatable manipulator base to expand the workspace. In addition, adjustable compliant end-effectors for different fruit sizes and shapes could improve adaptability and reduce damage risk.

5. Conclusions

This study proposes a progressive Hybrid Automatic Switching Visual Servoing (HAVS) method and develops an apple-picking robotic system. Through the integration of PBVS and IBVS in the coarse alignment phase, together with a switch to PBVS-only control for fine alignment, the critical problems of target loss in PBVS and control singularities in IBVS are effectively overcome. Additionally, the adoption of a PD controller with fuzzy gain scheduling improves the system’s control efficiency and overall operational stability. Compared with fixed-gain PD control, the average picking time decreased by 1.3 s, and the success rate increased by 4 percentage points.
In indoor simulations, HAVS achieved a 96% success rate. This was 4 and 10 percentage points higher than PBVS and IBVS, respectively. The average picking time was 12.5 s, which was 0.3 s and 1.1 s shorter than PBVS and IBVS, respectively. In orchard field trials, four sessions were conducted with 120 picking attempts. The overall success rate was 87.5%, the average picking time was 13.2 s, and the apple damage rate was 4.2%. These results provide a reference implementation for visual servo control in agricultural picking robots.
Future work will focus on enhancing apple recognition performance under strong light and occlusion conditions through multimodal fusion as well as integrating active obstacle avoidance planning to further improve the robustness of the robotic system in complex unstructured orchard environments.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/agriculture16050620/s1, Video S1: indoor picking process of the robot using the HAVS method and Video S2: outdoor orchard picking process of the robot using the HAVS method.

Author Contributions

Conceptualization, J.K. and Y.W.; methodology, Y.W. and S.Y.; software, R.D. and S.Y.; validation, Y.W. and X.Z.; formal analysis, X.Z. and T.Z.; investigation, R.D.; resources, J.K. and T.Z.; data curation, Y.W. and J.L.; writing—original draft preparation, J.K. and Y.W.; writing—review and editing, B.K. and J.L.; visualization, Y.W.; supervision, B.K. and J.L.; project administration, J.K.; funding acquisition, T.Z. and B.K. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Data Availability Statement

The original data presented in the study are openly available in the MinneApple dataset at https://rsn.umn.edu/projects/orchard-monitoring/minneapple (accessed on 24 March 2025). Additionally, other original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding authors.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Wang, X.; Hou, M.; Liu, T.; Ren, J.; Li, H.; Yang, H.; Hu, Z.; Gao, Z. Continuous cold plasma reactor for the processing of NFC apple juice: Effect on quality control and preservation stability. Innov. Food Sci. Emerg. Technol. 2025, 100, 103905.
2. Boyer, J.; Liu, R.H. Apple phytochemicals and their health benefits. Nutr. J. 2004, 3, 5.
3. Ji, W.; He, G.; Xu, B.; Zhang, H.; Yu, X. A New Picking Pattern of a Flexible Three-Fingered End-Effector for Apple Harvesting Robot. Agriculture 2024, 14, 102.
4. Zou, B.; Liu, F.; Zhang, Z.; Hong, T.; Wu, W.; Lai, S. Mechanization of mountain orchards: Development bottleneck and foreign experiences. J. Agric. Mech. Res. 2019, 41, 254–260.
5. Zhang, Z.; Igathinathane, C.; Li, J.; Cen, H.; Lu, Y.; Flores, P. Technology progress in mechanical harvest of fresh market apples. Comput. Electron. Agric. 2020, 175, 105606.
6. Bac, C.W.; Van Henten, E.J.; Hemming, J.; Edan, Y. Harvesting robots for high-value crops: State-of-the-art review and challenges ahead. J. Field Robot. 2014, 31, 888–911.
7. Zhang, Q.; Karkee, M. Fully automated tree fruit harvesting. Resour. Mag. 2016, 23, 16–17.
8. He, L.; Fu, H.; Karkee, M.; Zhang, Q. Effect of fruit location on apple detachment with mechanical shaking. Biosyst. Eng. 2017, 157, 63–71.
9. Zhu, F.; Zhang, W.; Wang, S.; Jiang, B.; Feng, X.; Zhao, Q. Apple-harvesting robot based on the YOLOv5-RACF model. Biomimetics 2024, 9, 495.
10. Zhang, Z.; Zhang, Z.; Wang, X.; Liu, H.; Wang, Y.; Wang, W. Multi-purpose apple harvest platform economic evaluation modeling and software development. Int. J. Agric. Biol. Eng. 2019, 12, 74–83.
11. Xu, Y.; Qiao, X.; Ding, L.; Li, X.; Chen, Z.; Yue, X. Enhanced YOLOv5 with ECA Module for Vision-Based Apple Harvesting Using a 6-DOF Robotic Arm in Occluded Environments. Agriculture 2025, 15, 1850.
12. Guo, Z.; Fu, H.; Wu, J.; Han, W.; Huang, W.; Zheng, W.; Li, T. Dynamic Task Planning for Multi-Arm Apple-Harvesting Robots Using LSTM-PPO Reinforcement Learning Algorithm. Agriculture 2025, 15, 588.
13. Liu, X.; Song, Z.; Tan, Y.; Yang, S.; Ma, Y. Conflict avoidance task planning strategy for dual-arm cooperative strawberry harvesting robots. Comput. Electron. Agric. 2026, 241, 111067.
14. Xing, Z.; Zhang, Z.; Wang, Y.; Xu, P.; Guo, Q.; Zeng, C.; Shi, R. SDC-DeepLabv3+: Lightweight and precise localization algorithm for safflower-harvesting robots. Plant Phenomics 2024, 6, 0194.
15. Yang, Y.; Zhang, M.; Ma, W.; Hu, Y. Intelligent Batch Harvesting of Trellis-Grown Fruits with Application to Kiwifruit Picking Robots. Agronomy 2025, 15, 2499.
16. Wang, W.; Li, C.; Xi, Y.; Gu, J.; Zhang, X.; Zhou, M.; Peng, Y. Research Progress and Development Trend of Visual Detection Methods for Selective Fruit Harvesting Robots. Agronomy 2025, 15, 1926.
17. Shi, X.; Wang, S.; Zhang, B.; Ding, X.; Qi, P.; Qu, H.; Li, N.; Wu, J.; Yang, H. Advances in Object Detection and Localization Techniques for Fruit Harvesting Robots. Agronomy 2025, 15, 145.
18. Cong, V.D.; Hanh, L.D. A review and performance comparison of visual servoing controls. Int. J. Intell. Robot. Appl. 2023, 7, 65–90.
19. Hutchinson, S.; Hager, G.D.; Corke, P.I. A tutorial on visual servo control. IEEE Trans. Robot. Autom. 1996, 12, 651–670.
20. Salvato, E.; Blanchini, F.; Fenu, G.; Giordano, G.; Pellegrino, F.A. Position-based visual servo control without hand-eye calibration. Robot. Auton. Syst. 2025, 193, 105045.
21. Yu, P.; Tan, N.; Mao, M. Position-based visual servo control of dual robotic arms with unknown kinematic models: A cerebellum-inspired approach. IEEE/ASME Trans. Mechatron. 2023, 28, 2328–2339.
22. Mehta, S.; Burks, T. Vision-based control of robotic manipulator for citrus harvesting. Comput. Electron. Agric. 2014, 102, 146–158.
23. Chen, Y.; Cai, H.; Zeng, G.; Dai, Z.; Lin, Y.; Lu, H.; Wang, Y. Robust Image-Based Visual Servo Control of Unmanned Aerial Manipulator for Supply Delivery. IEEE Trans. Ind. Electron. 2025, 73, 2668–2678.
24. Rotithor, G.; Salehi, I.; Tunstel, E.; Dani, A.P. Stitching dynamic movement primitives and image-based visual servo control. IEEE Trans. Syst. Man Cybern. Syst. 2022, 53, 2583–2593.
25. Li, T.; Yu, J.; Qiu, Q.; Zhao, C. Hybrid uncalibrated visual servoing control of harvesting robots with RGB-D cameras. IEEE Trans. Ind. Electron. 2022, 70, 2729–2738.
26. Wang, Z.; Xiang, X.; Xiong, X.; Yang, S. Position-based acoustic visual servo control for docking of autonomous underwater vehicle using deep reinforcement learning. Robot. Auton. Syst. 2025, 186, 104914.
27. Zhong, X.; Zhou, Q.; Sun, Y.; Kang, S.; Hu, H. Deep reinforcement learning-based uncalibrated visual servoing control of manipulators with FOV constraints. Appl. Sci. 2025, 15, 4447.
28. Hafez, A.A.; Cervera, E.; Jawahar, C.V. Hybrid visual servoing by boosting IBVS and PBVS. In Proceedings of the 2008 3rd International Conference on Information and Communication Technologies: From Theory to Applications (ICTTA 2008), Damascus, Syria, 7–11 April 2008; pp. 1–6.
29. Lei, M.; Zhang, X.; Yang, W.; Wan, J.; Dong, Z.; Zhang, C.; Zhang, G. High-Precision Drilling by Anchor-Drilling Robot Based on Hybrid Visual Servo Control in Coal Mine. Mathematics 2024, 12, 2059.
30. Li, Y.; Lien, W.; Huang, Z.; Chen, C. Hybrid visual servo control of a robotic manipulator for cherry tomato harvesting. Actuators 2023, 12, 253.
31. Soman, D.; George, V.; Victor, A.; Nathu, A.; Habib, M.A.; Parashar, D. Modelling and simulation of a robotic manipulator with six degrees of freedom attitude and orbit control for space applications. Syst. Sci. Control Eng. 2025, 13, 2562839.
32. Hua, W.; Zhang, Z.; Zhang, W.; Liu, X.; Hu, C.; He, Y.; Mhamed, M.; Li, X.; Dong, H.; Saha, C.K. Key technologies in apple harvesting robot for standardized orchards: A comprehensive review of innovations, challenges, and future directions. Comput. Electron. Agric. 2025, 235, 110343.
33. Pagire, V.; Chavali, M.; Kale, A. A comprehensive review of object detection with traditional and deep learning methods. Signal Process. 2025, 237, 110075.
34. Terven, J.; Córdova-Esparza, D.; Romero-González, J. A comprehensive review of YOLO architectures in computer vision: From YOLOv1 to YOLOv8 and YOLO-NAS. Mach. Learn. Knowl. Extr. 2023, 5, 1680–1716.
35. Bai, Y.; Yu, J.; Yang, S.; Ning, J. An improved YOLO algorithm for detecting flowers and fruits on strawberry seedlings. Biosyst. Eng. 2024, 237, 1–12.
36. Magdy, A.; Moustafa, M.S.; Ebied, H.M.; Tolba, M.F. Lightweight faster R-CNN for object detection in optical remote sensing images. Sci. Rep. 2025, 15, 16163.
37. Zhang, X.; Zhang, Y.; Gao, T.; Fang, Y.; Chen, T. A novel SSD-based detection algorithm suitable for small object. IEICE Trans. Inf. Syst. 2023, 106, 625–634.
38. Li, H.; Gu, Z.; He, D.; Wang, X.; Huang, J.; Mo, Y.; Li, P.; Huang, Z.; Wu, F. A lightweight improved YOLOv5s model and its deployment for detecting pitaya fruits in daytime and nighttime light-supplement environments. Comput. Electron. Agric. 2024, 220, 108914.
39. Häni, N.; Roy, P.; Isler, V. Apple counting using convolutional neural networks. In Proceedings of the 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Madrid, Spain, 1–5 October 2018; pp. 2559–2565.
40. Häni, N.; Roy, P.; Isler, V. MinneApple: A benchmark dataset for apple detection and segmentation. IEEE Robot. Autom. Lett. 2020, 5, 852–858.
41. Häni, N.; Roy, P.; Isler, V. A comparative study of fruit detection and counting methods for yield mapping in apple orchards. J. Field Robot. 2020, 37, 263–282.
42. Nguyen, A.; Taniguchi, T.; Eciolaza, L.; Campos, V.; Palhares, R.; Sugeno, M. Fuzzy control systems: Past, present and future. IEEE Comput. Intell. Mag. 2019, 14, 56–68.
43. Saatchi, R. Fuzzy logic concepts, developments and implementation. Information 2024, 15, 656.
44. Qian, R.; Wu, R.; Wu, H. Research on adaptive control of six-degree-of-freedom manipulator based on fuzzy PID. Syst. Sci. Control Eng. 2025, 13, 2498912.
45. Sun, J.; Zhang, Y.; Chang, Y.; Shen, T.; Ding, S. Fixed-time composite learning fuzzy control with disturbance rejection for uncertain engineering systems toward Industry 5.0. IEEE Trans. Syst. Man Cybern. Syst. 2024, 54, 4077–4088.
46. Wang, X.; Kang, H.; Zhou, H.; Au, W.; Wang, M.Y.; Chen, C. Development and evaluation of a robust soft robotic gripper for apple harvesting. Comput. Electron. Agric. 2023, 204, 107552.
47. Shi, S.; Yang, F.; Liu, Z.; Xu, X.; Zhang, F.; Wang, Z. Design and experiment of composite pneumatic apple picking manipulator. Trans. Chin. Soc. Agric. Mach. 2024, 55, 93–105.
48. Chen, M.; Chen, Z.; Luo, L.; Tang, Y.; Cheng, J.; Wei, H.; Wang, J. Dynamic visual servo control methods for continuous operation of a fruit harvesting robot working throughout an orchard. Comput. Electron. Agric. 2024, 219, 108774.
49. Park, Y.; Seol, J.; Pak, J.; Jo, Y.; Kim, C.; Son, H.I. Human-centered approach for an efficient cucumber harvesting robot system: Harvest ordering, visual servoing, and end-effector. Comput. Electron. Agric. 2023, 212, 108116.
50. Choi, D.W.; Park, J.H.; Yoo, J.H.; Ko, K. AI-driven adaptive grasping and precise detaching robot for efficient citrus harvesting. Comput. Electron. Agric. 2025, 232, 110131.
Figure 1. Hardware configuration of the apple-picking robotic system.
Figure 2. Operational workflow of the apple-picking robot system.
Figure 3. YOLOv5 object-detection results in orchard scenes: (a) original image; (b) detection output.
Figure 4. Coordinate relationships among the camera, manipulator, and target.
Figure 5. Control flow of the PD controller with fuzzy gain scheduling.
Figure 6. Membership functions for the input variables: (a) position error PE; (b) error change rate EC.
Figure 7. Indoor simulated picking process of the manipulator using the HAVS method.
Figure 8. Average picking time and success rate under different switching thresholds.
Figure 9. Representative time-varying position-error curves for different visual servoing methods: (a) HAVS; (b) PBVS; (c) IBVS; (d) comparison of all three methods.
Figure 10. Representative trajectories of PBVS, IBVS and HAVS selected from 100 trials.
Figure 11. Average picking time and success rate of the three strategies in the indoor task.
Figure 12. Orchard field environment: (a) orchard access road; (b) robot picking operation.
Figure 13. Examples of failed picks: (a) branches grasped near the apple; (b) camera field of view occluded by foliage.
Table 1. Fuzzy control rules for Kp1 and Kd (each cell lists the Kp1 label, then the Kd label).

PE \ EC | NB    | NM    | NS    | ZO    | PS    | PM    | PB
PB      | B, S  | B, ZO | B, ZO | B, ZO | B, ZO | B, ZO | B, ZO
PM      | M, M  | M, S  | B, ZO | B, ZO | B, ZO | B, S  | B, M
PS      | S, B  | S, M  | M, S  | M, S  | B, S  | B, S  | B, M
ZO      | ZO, B | ZO, B | S, M  | S, M  | M, M  | B, M  | B, M
Table 2. Quantization values of the output variables.

Fuzzy Linguistic Level | Quantized Value of Kp1 | Quantized Value of Kd | Physical Interpretation
B                      | 0.65                   | 0.08                  | Aggressive correction with strong damping
M                      | 0.35                   | 0.04                  | Standard correction with medium damping
S                      | 0.12                   | 0.01                  | Fine adjustment with weak damping
ZO                     | 0.00                   | 0.00                  | Stop command with no damping
Table 3. Field results of the apple-picking robotic system.

Experiment Group | Picking Attempts | Average Picking Time (s) | Successful Picks | Success Rate (%) | Damaged Apples | Damage Rate (%)
1                | 30               | 12.5                     | 26               | 86.7             | 1              | 3.3
2                | 30               | 13.2                     | 27               | 90.0             | 1              | 3.3
3                | 30               | 14.1                     | 24               | 80.0             | 2              | 6.7
4                | 30               | 13.0                     | 28               | 93.3             | 1              | 3.3
Total            | 120              | 13.2                     | 105              | 87.5             | 5              | 4.2
Table 4. Comparison of picking performance with representative picking robots.

Picking Robots   | Fruit Type   | Average Picking Time (s) | Success Rate (%) | Damage Rate (%)
This work        | Apple        | 13.20                    | 87.50            | 4.2
Xu et al. [11]   | Apple        | 25.00                    | 95.00            | /
Wang et al. [46] | Apple        | 14.69                    | 70.77            | 4.55
Shi et al. [47]  | Apple        | /                        | 84.70            | 0.88
Chen et al. [48] | Dragon fruit | 15.19                    | 76.90            | /
Park et al. [49] | Cucumber     | 56.00                    | 56.60            | 4.7
Choi et al. [50] | Citrus       | 21.33                    | 83.33            | /
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
