Active Laser-Camera Scanning for High-Precision Fruit Localization in Robotic Harvesting: System Design and Calibration

Robust and effective fruit detection and localization is essential for robotic harvesting systems. While extensive research efforts have been devoted to improving fruit detection, less emphasis has been placed on the fruit localization aspect, which is a crucial yet challenging task due to limited depth accuracy from existing sensor measurements in the natural orchard environment with variable lighting conditions and foliage/branch occlusions. In this paper, we present the system design and calibration of an Active LAser-Camera Scanner (ALACS), a novel perception module for robust and high-precision fruit localization. The hardware of ALACS mainly consists of a red line laser, an RGB camera, and a linear motion slide, which are seamlessly integrated into an active scanning scheme where a dynamic-targeting laser-triangulation principle is employed. A high-fidelity extrinsic model is developed to pair the laser illumination and the RGB camera, enabling precise depth computation when the target is captured by both sensors. A random sample consensus-based robust calibration scheme is then designed to calibrate the model parameters based on collected data. Comprehensive evaluations are conducted to validate the system model and calibration scheme. The results show that the proposed calibration method can detect and remove data outliers to achieve robust parameter computation, and the calibrated ALACS system is able to achieve high-precision localization with millimeter-level accuracy.


Introduction
With the growing global population, the agriculture industry has been pushing to adopt mechanization and automation for increasing, sustainable food production at lower economic and environmental costs. While such technologies have been deployed for field crops such as corn and wheat, the fruit sector (e.g., apple, citrus, and pear) still heavily relies on seasonal, manual labor. In many advanced economies, the availability of labor for farming has been on a steady decline, while the cost of labor has increased significantly. Moreover, tasks like manual harvesting involve extensive body motion repetitions and awkward postures (especially when picking fruits at high places or deep in the canopy, and repeatedly ascending and descending ladders with heavy loads), which put workers at risk of ergonomic injuries and musculoskeletal pain (Fathallah, 2010). Considering the aforementioned issues, robotic harvesting is thus considered a promising solution for sustainable fruit production and has received increasing attention in recent years.
Research on robotic harvesting technology has been ongoing for several decades, and different robotic systems have been attempted for semi-automated or fully automated fruit harvesting (Zhao et al., 2011; Mehta and Burks, 2014; De Kleine and Karkee, 2015; Silwal et al., 2017; Xiong et al., 2018; Williams et al., 2019; Hohimer et al., 2019; Zhang et al., 2020, 2021; Bu et al., 2022; Zhang et al., 2022). A typical robotic harvesting system consists of a perception module, a manipulator, and an end-effector. Specifically, the perception module exploits onboard sensors (e.g., cameras and LiDARs) to detect and localize the fruit. Once the fruit position is determined by the perception system, the manipulator is controlled to reach the target fruit, and then a specialized end-effector (e.g., gripper or vacuum tube) is actuated to detach the fruit. Therefore, the development of a robotic harvesting system requires multi-disciplinary advancements to enable a variety of synergistic functionalities. Among the various tasks, fruit detection and localization is the first and foremost one to support robotic manipulation and fruit detachment. Specifically, the fruit detection function aims at segmenting fruits from the complex background, while localization is to calculate the spatial positions of the detected fruits. Due to variable lighting conditions, color variations of fruits with different degrees of ripeness and varietal differences, and fruit occlusions by foliage and branches, developing sensing modules and perception algorithms capable of robust and effective fruit detection and localization in the real orchard environment poses significant technical challenges.
To date, extensive studies have been devoted to efficient and robust fruit detection, which is most commonly accomplished using color images captured by RGB cameras. In general, these approaches can be classified into two categories: feature-based and deep learning-based. The feature-based methods (Bulanon et al., 2002; Zhao et al., 2005; Wachs et al., 2010; Zhou et al., 2012; Nguyen et al., 2016; Lin et al., 2020) use differences among predefined features (e.g., color, texture, and geometric shape) to identify the fruit, and various conventional computer vision techniques (e.g., Hough transform-based circle detection, optical flow, and Otsu adaptive threshold segmentation) are used for feature extraction. Such methods perform well in certain simple harvesting scenarios but are susceptible to varying lighting conditions and heavy occlusions. This is because the extracted features are defined artificially; they are not universally adaptable and may lack the generalization capability to distinguish target fruits when the harvesting scene changes (Li et al., 2022). Different from feature-based methods, deep learning-based methods exploit convolutional neural networks to extract abstract features from color images, making them suitable for complex recognition problems. Deep learning-based object recognition algorithms have seen tremendous success in recent years, and a variety of network structures, e.g., the region-based convolutional neural network (RCNN) (Girshick et al., 2014), Faster RCNN (Ren et al., 2017), Mask RCNN (He et al., 2020; Chu et al., 2021), You Only Look Once (YOLO) (Redmon and Farhadi, 2018; Tian et al., 2019b; Kang and Chen, 2020), and Single Shot Detection (SSD) (Liu et al., 2016), have been studied and extended for fruit detection. Specifically, RCNN-based approaches employ a two-stage network architecture, in which a region proposal network (RPN) is used to search the region of interest and a classification network is used to conduct bounding box regression. As opposed to two-stage networks, YOLO- and SSD-based one-stage networks merge the RPN and the classification branch into a single convolutional network architecture, which enjoys improved computation efficiency.
Once the fruits are recognized and a picking sequence is determined (see, e.g., Zhang et al. (2022)), 3-dimensional (3D) localization needs to be conducted to compute the spatial coordinates of a target fruit. Accurate fruit localization is crucial since erroneous localization will cause the manipulator to miss the target and subsequently degrade the harvesting performance of the robotic system. Various sensor configurations and techniques have been used for fruit localization (Gongal et al., 2015; Gené-Mola et al., 2019; Fu et al., 2020; Neupane et al., 2021; Kang et al., 2022). One example is (passive) stereo vision systems, which exploit a two-camera layout and the triangulation optical measurement principle to obtain depth information. For such systems, the relative geometric pose of the two cameras needs to be carefully designed and calibrated, and sophisticated algorithms are required to search for common features in two dense RGB images for stereo matching. Therefore, the main disadvantages of stereo vision systems are that the generation of depth information is computationally expensive and that the performance of stereo matching is inevitably affected by occluded pixels and the varying lighting conditions that are common in the natural orchard environment.
Consumer RGB-D cameras are another type of depth measurement sensor that has recently been employed to localize fruits (Xiong et al., 2019; Tian et al., 2019a; Arad et al., 2020; Kang et al., 2020). Different from passive stereo vision systems that rely purely on natural light, RGB-D sensors include a separate artificial illumination source to aid the depth computation. Depending on how the depth measurements are computed, RGB-D cameras can be divided into three categories: structured light (SL), time of flight (ToF), and active infrared stereo (AIRS) (Fu et al., 2020). An SL-based RGB-D sensor usually consists of a light source and a camera system. The light source projects a series of light patterns onto the workspace, and the depth information can then be extracted from the images based on the deformation of the light pattern. The first-generation Kinect (Microsoft Corp., Redmond, WA, USA) and the RealSense F200 and SR300 (Intel Corp., Santa Clara, CA, USA) are representative consumer sensors that operate with SL, and they have been utilized in different agricultural applications (Lehnert et al., 2017; Liu et al., 2018; Milella et al., 2019). The ToF-based RGB-D sensors use an infrared light emitter to emit light pulses onto the scene. The distance between the sensor and the object is calculated based on the known speed of light and the round-trip time of the light signal. One important feature of ToF systems is that their depth measurement precision does not deteriorate with distance, which makes them suitable for harvesting applications requiring a long perception range. Moreover, the AIRS-based RGB-D sensors are an extension of the conventional passive stereo vision system. They combine an infrared stereo camera pair with an active infrared light source to improve the depth measurement in low-texture environments. One of the most widely used AIRS sensors in fruit localization is the RealSense D400 family (Intel Corp., Santa Clara, CA, USA).

Despite some successes, the sensors mentioned above may have limited and unstable performance in the natural orchard environment. For example, the SL-based sensors are sensitive to the natural light condition and to the interference of multiple patterned light sources. The ToF systems are vulnerable to scattered light and multi-path interference, and usually provide lower-resolution depth images compared to other RGB-D cameras. Similar to passive stereo vision systems, the AIRS-based sensors encounter stereo matching issues, which can lead to flying pixels or oversmoothing around contour edges (Fu et al., 2020). In addition, the performance of these sensors could deteriorate significantly when target fruits are occluded by leaves and branches, due to the low or limited density of the illuminating light patterns or point cloud.
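The ToF principle described above reduces to a one-line computation: depth is the speed of light multiplied by half the measured round-trip time. A minimal illustration (the timing value is illustrative, not from any particular sensor):

```python
# Time-of-flight depth: the pulse travels to the object and back, so the
# sensor-to-object distance is c * t / 2.

C = 299_792_458.0  # speed of light (m/s)

def tof_depth(round_trip_time_s: float) -> float:
    """Distance corresponding to a measured round-trip time."""
    return C * round_trip_time_s / 2.0

# A round trip of roughly 6.67 ns corresponds to about 1 m of depth.
print(tof_depth(6.67e-9))  # ~1.0 m
```

The nanosecond scale of the round trip at orchard working distances is why ToF sensors need specialized timing hardware, and why their precision is largely independent of range.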
It is thus clear that both stereo vision systems and RGB-D sensors have inherent depth measurement limitations in providing the precise fruit localization information that is necessary for effective robotic harvesting systems. Towards this end, we devise a novel perception module, called the Active LAser-Camera Scanner (ALACS), to improve fruit localization accuracy and robustness for ready deployment in apple harvesting robots. In this paper, we present the system design and calibration scheme of ALACS. The main contributions of this paper are highlighted as follows.
1. A hardware system consisting of a red line laser, an RGB camera, and a linear motion slide, coupled with an active scanning scheme, is developed for fruit localization based on the laser-triangulation principle.
2. A high-fidelity extrinsic model is developed to capture 3D measurements by matching the laser illumination source with the RGB pixels. A robust calibration scheme is then developed to calibrate the model parameters by leveraging random sample consensus (RANSAC) techniques to detect and remove data outliers.
3. The effectiveness of the developed model and calibration scheme is evaluated through comprehensive experiments.
To the best of our knowledge, this is the first effort that combines a line laser with a camera to accomplish millimeter-level localization performance. While ALACS is primarily developed and tested for the apple harvesting application, it can be readily extended and adopted for other tree fruits.
The rest of the paper is organized as follows. Section 2 provides an overview of our newly developed robotic apple harvesting system. Section 3 presents the system design of the ALACS. The extrinsic model for 3D measurement characterization and the corresponding robust calibration scheme are introduced in Section 4. Simulation and experimental results are presented in Section 5. Finally, conclusions are drawn in Section 6.

Overview of the Robotic Apple Harvesting System
In this section, we briefly introduce our robotic apple harvesting platform, into which the ALACS is integrated. As shown in Figure 1, the robotic platform consists of four main components: a perception module, a 4 degree-of-freedom manipulator, a soft vacuum-based end-effector, and a dropping module. The robotic system is mounted on a trailer base to facilitate movement in the orchard environment. An industrial computer is utilized to coordinate the perception module, the manipulator, and all communication devices. The entire software stack is integrated using the robot operating system (ROS), where different software components communicate primarily via custom messages.
The following introduces the steps that our system takes to harvest an apple. At the beginning of each harvesting cycle, the perception module is activated to detect and localize the fruits within the manipulator's workspace. Given the 3D apple location, the planning algorithm generates a reference trajectory, and the control module then actuates the manipulator to follow this reference trajectory to approach the fruit. After successfully attaching the fruit to the end-effector, a rotation mechanism is triggered to rotate the end-effector by a certain angle, and the manipulator is then driven to pull and detach the apple. Finally, the manipulator retracts to a dropping spot and releases the fruit. From the aforementioned picking procedure, it can be seen that fruit detection and localization is a key task in automated apple harvesting. Our previous system prototypes (Zhang et al., 2021, 2022) utilized RGB-D cameras to facilitate fruit detection and localization. However, laboratory and field tests found that the commercial RGB-D cameras could not provide accurate depth information of the target fruits under leaf/branch occlusions and/or challenging lighting conditions. Inaccurate apple localization has been identified as one of the primary causes of harvesting failure. To enhance apple localization accuracy and robustness, we designed a new perception unit (called ALACS), which seamlessly integrates a line laser with RGB images for active sensing.

Design of the Active Laser-Camera Scanner
As shown in Figure 2, the perception module of the robotic apple harvesting system includes an Intel RealSense D435i RGB-D camera (Intel Corp., Santa Clara, CA, USA) and a custom ALACS unit. The RGB-D camera is mounted on a horizontal frame above the manipulator to provide a global view of the scene. The ALACS unit is comprised of a red line laser (Laserglow Technologies, North York, ON, Canada), a FLIR RGB camera (Teledyne FLIR, Wilsonville, OR, USA), and a linear motion slide. The line laser is mounted on top of the linear motion slide, which enables the laser to move left and right horizontally with a full stroke of 20 cm. Meanwhile, the FLIR RGB camera is installed at the rear end of the linear motion slide with a relative angle to the laser. The hardware configuration of ALACS is designed to facilitate depth measurements using the principle of laser triangulation. The laser triangulation technique captures depth measurements by pairing a laser illumination source with a camera, and has been widely used in industrial applications for precision 3D object profiling. It should be noted that the ALACS unit is different from conventional laser triangulation sensors. For conventional laser triangulation sensors, the relative position between the laser and the camera is fixed (i.e., both of them are either stationary or moving simultaneously).
For ALACS, the camera is fixed while the laser position can be adjusted with the linear motion slide. The RGB-D camera and the ALACS unit are fused synergistically to achieve apple detection and localization. Specifically, the fusion scheme includes two steps. In the first step, the images captured by the RGB-D camera are fed into a deep learning approach for fruit detection (see Chu et al. (2023)), and the target apple location is then roughly calculated with the depth measurements provided by the RGB-D camera. In the second step, using the rough apple location, the ALACS unit is triggered to actively scan the target apple, and a refined apple position is obtained. As shown in Figure 3, the basic working principle of ALACS is to project the laser line onto the target fruit and then use the image information and the triangulation technique to localize the fruit. The perception strategy of the ALACS unit begins with an initialization step, in which the linear motion slide is actuated to regulate the laser towards an initial position, ensuring that the red laser line is projected on the left side of the target apple; the laser is then moved across the fruit while position candidates are computed from the captured images.

To accomplish the aforementioned fruit localization scheme, laser line extraction and position candidate computation are two key tasks. The laser line extraction is achieved by leveraging computer vision techniques, and a detailed description of the extraction algorithm can be found in our recent work (Zhang et al., 2023, in press). To facilitate the computation of fruit 3D positions, a high-fidelity model is derived based on the principle of laser triangulation, and a robust calibration scheme is designed. The following details the development of the high-fidelity model and the calibration scheme.

Modeling of the ALACS Unit
The basic idea of the laser triangulation technique is to capture depth measurements by pairing a laser illumination source with a camera. Both the laser beam and the camera are aimed at the target object, and based on the extrinsic parameters between the laser source and the camera sensor, the depth information can be computed with trigonometry. As shown in Figure 4, let F_l and F_c denote the laser frame and the camera frame, respectively. α ∈ R is the rotation angle about the y_l-axis between F_l and F_c. L ∈ R is the horizontal distance (i.e., the translation along the x_l-axis) between F_l and F_c. β ∈ R is the angle between the laser plane and the (y_l, z_l) plane of F_l. α, L, and β are considered the extrinsic parameters between the laser illumination source and the camera, and they are essential for deriving the high-fidelity model of the ALACS unit. In the following, we first introduce the pin-hole model of the camera and then present the model of ALACS.
Let p_i be a point located at the intersection of the laser line and the object. The 3D position of p_i under the camera frame F_c is denoted by p_{c,i} = [x_{c,i}, y_{c,i}, z_{c,i}]^⊤ ∈ R^3, and m_{c,i} = [u_{c,i}, v_{c,i}]^⊤ ∈ R^2 is denoted as the pixel coordinate of p_i on the image plane. The normalized coordinate of p_i is defined as

p̄_{c,i} = [ū_{c,i}, v̄_{c,i}, 1]^⊤ = p_{c,i} / z_{c,i}. (1)

Then, the following pin-hole camera model can be used to describe the projection from p̄_{c,i} to m_{c,i}:

m_{c,i} = ϖ(K p̄_{c,i}), (2)

where ϖ(·) is the camera distortion model and K ∈ R^{3×3} is the camera intrinsic matrix. Both ϖ(·) and K can be obtained via standard calibration approaches, and thus once m_{c,i} is detected from the image, the normalized coordinate p̄_{c,i} can be calculated by

p̄_{c,i} = K^{-1} ϖ^{-1}(m_{c,i}). (3)

We now derive the high-fidelity model for the ALACS unit. Denote p_{l,i} = [x_{l,i}, y_{l,i}, z_{l,i}]^⊤ ∈ R^3 as the 3D position of p_i under the laser frame F_l. According to the relative pose between F_l and F_c (see Figure 4), it can be concluded that

x_{l,i} = x_{c,i} cos α + z_{c,i} sin α − L, y_{l,i} = y_{c,i}, z_{l,i} = −x_{c,i} sin α + z_{c,i} cos α. (4)

In addition, as there is an angle, i.e., β, between the laser plane and the (y_l, z_l) plane of F_l, we have

x_{l,i} = −y_{l,i} tan β. (5)

Based on (4) and (5), the following expression can be derived:

x_{c,i} cos α + y_{c,i} tan β + z_{c,i} sin α = L. (6)

It can be concluded from (1) that x_{c,i} = z_{c,i} ū_{c,i} and y_{c,i} = z_{c,i} v̄_{c,i}. After substituting these two relations into (6), we can derive that

z_{c,i} = L / (ū_{c,i} cos α + v̄_{c,i} tan β + sin α). (7)

Using (7) and the facts that x_{c,i} = z_{c,i} ū_{c,i} and y_{c,i} = z_{c,i} v̄_{c,i}, we have

x_{c,i} = L ū_{c,i} / (ū_{c,i} cos α + v̄_{c,i} tan β + sin α), y_{c,i} = L v̄_{c,i} / (ū_{c,i} cos α + v̄_{c,i} tan β + sin α). (8)

Equations (7) and (8) constitute the high-fidelity model that reveals the 3D measurement mechanism of the ALACS unit. Specifically, given the pixel coordinate m_{c,i}, the normalized coordinates ū_{c,i} and v̄_{c,i} can be computed via (3). Then, the model (7) and (8) can be exploited to calculate the 3D position p_{c,i} = [x_{c,i}, y_{c,i}, z_{c,i}]^⊤, provided that the extrinsic parameters α, L, and β are well calibrated.
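As a concrete illustration of the model (7)-(8), the sketch below recovers a 3D point from a normalized laser-pixel coordinate and verifies that the recovered point lies on the laser plane (6). The extrinsic values are illustrative assumptions, not calibrated values from the paper:

```python
import numpy as np

def alacs_point(u_bar, v_bar, alpha, L, beta):
    """Recover p_c = (x_c, y_c, z_c) in the camera frame from a normalized
    laser-pixel coordinate, using the triangulation model:
    z = L / (u_bar*cos(alpha) + v_bar*tan(beta) + sin(alpha))."""
    z = L / (u_bar * np.cos(alpha) + v_bar * np.tan(beta) + np.sin(alpha))
    return np.array([z * u_bar, z * v_bar, z])

# Illustrative extrinsics: 25 deg laser-camera angle, 0.30 m baseline,
# 2 deg laser-plane tilt (assumed values for demonstration only).
alpha, L, beta = np.deg2rad(25.0), 0.30, np.deg2rad(2.0)
x, y, z = alacs_point(0.1, -0.05, alpha, L, beta)

# Sanity check against the laser-plane constraint (equation (6)):
# x*cos(alpha) + y*tan(beta) + z*sin(alpha) = L.
residual = x * np.cos(alpha) + y * np.tan(beta) + z * np.sin(alpha) - L
print(abs(residual) < 1e-9)  # True: the recovered point lies on the laser plane
```

The check makes the geometry explicit: (7) is exactly the depth at which the camera ray through (ū, v̄) pierces the laser plane, so the residual of (6) vanishes by construction.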

Robust Calibration Scheme
The extrinsic parameters α, L, and β play a crucial role in facilitating the 3D measurement of the ALACS unit. In this subsection, we focus on how to perform robust calibration of the extrinsic parameters α, L, and β. Note that α and β are constants, while L is variable since the linear motion slide can move to different positions. During the calibration procedure, the linear motion slide is fixed at an initial position, and the corresponding horizontal distance between the laser and the camera is denoted by L_0 ∈ R. α, β, and L_0 (i.e., the initial value of L) are obtained via offline calibration. Then, when the linear motion slide is moving, L can be updated online based on its initial value L_0 and the movement distance of the linear motion slide.
The calibration procedure includes two steps. In the first step, multiple sets of data samples are collected from recorded images. The second step then formulates an optimization problem using the collected data and the model (7) to compute the extrinsic parameters. The following details these two steps in sequence.
The hardware setup for image and data collection is shown in Figure 5, where a planar checkerboard is placed in front of the ALACS unit so that the laser line is projected on it. We use the planar checkerboard as the calibration pattern to facilitate the data collection. Specifically, given an image that covers the whole checkerboard, the pixel coordinates of the laser points projected on the checkerboard are extracted based on the color feature. Once the pixel coordinate m_{c,i} is obtained, the corresponding normalized coordinate p̄_{c,i}, i.e., ū_{c,i} and v̄_{c,i}, is calculated with (3). Furthermore, we leverage the following scheme to calculate z_{c,i} (see Figure 6):
1. Corner detection. The checkerboard corners are detected from the image by using the algorithm developed in Geiger et al. (2012).
2. Pose reconstruction. Based on the detected checkerboard corners and the prior knowledge of the checkerboard square size, the relative pose between the planar checkerboard and the camera is reconstructed (Hartley and Zisserman, 2003). The pose information is described by the rotation matrix R_b ∈ SO(3) and the translation vector t_b ∈ R^3.
3. Computation of z_{c,i}. Based on the relative pose information R_b, t_b and the normalized coordinate p̄_{c,i}, z_{c,i} is calculated with projective geometry (Hartley and Zisserman, 2003).
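The final step of this scheme amounts to a ray-plane intersection: the checkerboard plane reconstructed from (R_b, t_b) is intersected with the camera ray through the normalized laser-pixel coordinate. A minimal sketch under an assumed board pose (values are illustrative, not from the experiments):

```python
import numpy as np

def depth_from_board(u_bar, v_bar, R_b, t_b):
    """Depth z of a laser pixel that lies on the checkerboard plane.
    In the camera frame the board plane is n^T (p - t_b) = 0 with normal
    n = R_b[:, 2]; intersecting it with the ray p = z * [u_bar, v_bar, 1]
    gives z = (n . t_b) / (n . [u_bar, v_bar, 1])."""
    n = R_b[:, 2]
    ray = np.array([u_bar, v_bar, 1.0])
    return float(n @ t_b) / float(n @ ray)

# Illustrative pose: board 0.8 m in front of the camera, tilted 10 deg about x.
th = np.deg2rad(10.0)
R_b = np.array([[1.0, 0.0, 0.0],
                [0.0, np.cos(th), -np.sin(th)],
                [0.0, np.sin(th),  np.cos(th)]])
t_b = np.array([0.0, 0.0, 0.8])

# The optical axis (u_bar = v_bar = 0) hits the board exactly at t_b.
print(depth_from_board(0.0, 0.0, R_b, t_b))  # 0.8
```

Because the laser points selected for calibration all fall on the board, this intersection supplies the ground-truth z_{c,i} paired with each (ū_{c,i}, v̄_{c,i}).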
To obtain multiple data samples s_i = [ū_{c,i}, v̄_{c,i}, z_{c,i}]^⊤, the planar checkerboard is moved to different positions, and an image is recorded at each position. For each image, several laser points are selected and the corresponding data samples are computed using the aforementioned strategy. A total of n data samples are collected and then used for the calibration of the extrinsic parameters.
In the second step, the extrinsic parameters are identified based on the model (7) and the collected data samples S = {s_1, s_2, ..., s_n}. In the ideal case, each data sample s_i should satisfy the relation (7). According to this observation, the extrinsic parameters can be estimated by solving the following minimization problem:

min over α̂, L̂_0, β̂ of Σ_{i=1}^{n} [ z_{c,i} − L̂_0 / (ū_{c,i} cos α̂ + v̄_{c,i} tan β̂ + sin α̂) ]^2, (9)

where α̂, L̂_0, and β̂ ∈ R are estimated values of α, L_0, and β, respectively. Note that the minimization problem (9) directly applies all data samples to compute the extrinsic parameters, which is not robust in the presence of data outliers. In general, the data samples are corrupted with noise and may contain outliers that do not satisfy the relation (7). These outliers can severely influence the calibration accuracy and thus need to be removed. Towards that end, we adopt the random sample consensus (RANSAC) methodology (Fischler and Bolles, 1981; Raguram et al., 2013) to extract credible data from S. The RANSAC-based robust calibration scheme is detailed in Algorithm 1; its core operations are hypothesis generation, in which 4 data samples are randomly selected from S to construct a subset S_k and the parameters α̂_k, L̂_{0,k}, β̂_k are estimated based on S_k and (9), and verification, in which the inlier set of each hypothesis is initialized and populated by checking the data samples in S against the hypothesis. Specifically, the calibration scheme is divided into three steps. First, subsets of S are randomly selected to calculate different possible solutions to problem (9). Each of these possible solutions is called a hypothesis in the RANSAC algorithm. Second, the hypotheses are scored using the data points in S, and the hypothesis that obtains the best score is returned as the solution.
Finally, the data points that voted for the solution are categorized as a set of inliers and will be used to calculate the final solution.
The developed calibration scheme leverages RANSAC techniques to iteratively estimate the model parameters and select the solution with the largest number of inliers.Therefore, it is able to robustly identify the model parameters when some data samples are corrupted or noisy.
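A minimal sketch of such a RANSAC loop on synthetic data is given below. It is not the authors' implementation: the per-hypothesis fit uses a linear reparameterization of the constraint z(ū cos α + v̄ tan β + sin α) = L_0 as a stand-in for solving problem (9), and all numbers (ground-truth extrinsics, outlier magnitudes, thresholds) are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def fit(samples):
    """Estimate (alpha, L0, beta) from rows (u_bar, v_bar, z).
    Rewrites z*(u*c1 + v*c2 + c3) = 1 with c1 = cos(a)/L0, c2 = tan(b)/L0,
    c3 = sin(a)/L0, a linear least-squares stand-in for problem (9)."""
    u, v, z = samples.T
    A = np.column_stack([z * u, z * v, z])
    c1, c2, c3 = np.linalg.lstsq(A, np.ones(len(z)), rcond=None)[0]
    L0 = 1.0 / np.hypot(c1, c3)        # since cos^2(a) + sin^2(a) = 1
    return np.arctan2(c3, c1), L0, np.arctan(c2 * L0)

def residuals(params, samples):
    """Depth residual |z - L0 / (u*cos(a) + v*tan(b) + sin(a))| per sample."""
    a, L0, b = params
    u, v, z = samples.T
    return np.abs(z - L0 / (u * np.cos(a) + v * np.tan(b) + np.sin(a)))

def ransac_calibrate(samples, iters=200, tol=2e-3):
    best = None
    for _ in range(iters):
        # Hypothesis generation: fit a random 4-sample subset.
        subset = samples[rng.choice(len(samples), 4, replace=False)]
        # Verification: collect the samples consistent with the hypothesis.
        inliers = samples[residuals(fit(subset), samples) < tol]
        if best is None or len(inliers) > len(best):
            best = inliers
    return fit(best)                   # refit on the largest inlier set

# Synthetic data from assumed ground truth, with 5 corrupted samples.
a_t, L_t, b_t = np.deg2rad(25), 0.30, np.deg2rad(2)
u = rng.uniform(-0.2, 0.2, 30)
v = rng.uniform(-0.2, 0.2, 30)
z = L_t / (u * np.cos(a_t) + v * np.tan(b_t) + np.sin(a_t))
z[:5] += rng.uniform(0.05, 0.1, 5)     # outliers that violate relation (7)

a_e, L_e, b_e = ransac_calibrate(np.column_stack([u, v, z]))
print(np.allclose([a_e, L_e, b_e], [a_t, L_t, b_t], atol=1e-6))  # True
```

Even with one sixth of the samples corrupted, the largest consensus set contains only the clean samples, so the final refit recovers the assumed extrinsics; a plain least-squares fit over all 30 samples would be biased by the outliers.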

Calibration Methods and Results
As shown in Figure 5, the experimental setup mainly consists of the specially designed ALACS unit and a planar checkerboard. To collect data samples for calibration, the planar checkerboard is placed in sequence at 10 different positions between 0.6 and 1.2 m from the ALACS unit, and at each position the FLIR camera is triggered to capture an image. For each image, 3 laser points are selected and the corresponding data samples s_i = [ū_{c,i}, v̄_{c,i}, z_{c,i}]^⊤ are computed using the strategy introduced in Section 4.2. A total of n = 30 data samples are collected and then used for the calibration of the extrinsic parameters.
To better evaluate the effectiveness of the developed high-fidelity model and robust calibration scheme, four different methods are implemented and tested on the same data samples. These four methods are introduced as follows:
• Method 1: This method utilizes a low-fidelity model to conduct the calibration. Specifically, the low-fidelity model only considers the two extrinsic parameters α and L and assumes that β = 0. In this case, the depth measurement mechanism of the ALACS unit degenerates into

z_{c,i} = L / (ū_{c,i} cos α + sin α). (10)

The model (10) and all collected data samples are used to estimate the extrinsic parameters α and L.
• Method 2: Both the low-fidelity model (10) and RANSAC techniques are used for calibration. Compared with Method 1, this method leverages RANSAC to remove outlier data.
• Method 3: This method computes the extrinsic parameters α, L_0, and β by solving the optimization problem (9), which is designed based on the high-fidelity model (7), using all data samples.
• Method 4: This is our developed method, which combines the high-fidelity model with RANSAC techniques for calibration. The method is detailed in Algorithm 1.
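To see why neglecting β is costly, the sketch below compares the high-fidelity depth (7) with the degenerate model (10) at a single laser pixel. The extrinsic values and pixel coordinate are illustrative assumptions, not calibrated values from the experiments:

```python
import numpy as np

# Assumed extrinsics: 25 deg angle, 0.30 m baseline, 2 deg laser-plane tilt.
alpha, L, beta = np.deg2rad(25.0), 0.30, np.deg2rad(2.0)

# One normalized laser-pixel coordinate (illustrative).
u_bar, v_bar = 0.1, -0.15

# High-fidelity model (7) vs. low-fidelity model (10) with beta ignored.
z_high = L / (u_bar * np.cos(alpha) + v_bar * np.tan(beta) + np.sin(alpha))
z_low = L / (u_bar * np.cos(alpha) + np.sin(alpha))

diff_mm = abs(z_high - z_low) * 1000.0
print(round(diff_mm, 1), "mm")  # ~6 mm depth bias for these assumed values
```

Even a 2-degree laser-plane tilt produces a millimeter-scale depth bias that grows with the vertical offset v̄ of the laser pixel, which is consistent with the low-fidelity Methods 1 and 2 underperforming Methods 3 and 4 in Table 1.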
The mean error of z_{c,i} − ẑ_{c,i} is computed to evaluate the performance of these four methods. The calibration results are summarized in Table 1. Methods 1 and 2 use model (10) for calibration, while Methods 3 and 4 rely on model (7). From Table 1, it can be seen that Methods 3 and 4 achieve better calibration performance than Methods 1 and 2, indicating that the high-fidelity model (7) can well pair the laser with the RGB camera for depth measurements. Moreover, by comparing Method 3 with Method 4, it can be concluded that the RANSAC technique is robust for the removal of outlier data and that the developed calibration method is effective in determining the extrinsic parameters of the ALACS unit.

Localization Accuracy
As mentioned in Section 4.2, the parameters α and β are constants, while L is variable since the laser position can be adjusted via the linear motion slide. The linear motion slide is fixed at an initial position (i.e., L is fixed to L_0) during the calibration procedure. To fully evaluate the localization accuracy of the ALACS unit, we change the value of L by moving the laser to different positions and collect data samples at each. More precisely, the laser is moved from its initial position towards the camera side by d cm, where d is selected as the following values in turn: d = 0, 5, 10, 15, 20.
Given L_0 and d, L can be computed by L = L_0 − d. For each laser position (i.e., for each L value), 10 images are collected with the planar checkerboard placed at different positions between 0.6 and 1.2 m away from the ALACS unit. Three laser points are randomly chosen from each image, and thus at each laser position a total of 30 data samples are utilized to evaluate the localization accuracy of the ALACS unit. The 3D measurements of the collected data, i.e., p_{c,j} = [x_{c,j}, y_{c,j}, z_{c,j}]^⊤ (j = 1, 2, ..., 30), are obtained with the aid of the checkerboard setup. Meanwhile, the extrinsic parameters calculated with the developed robust calibration scheme (see Table 1) are used to determine the estimated 3D measurements p̂_{c,j} = [x̂_{c,j}, ŷ_{c,j}, ẑ_{c,j}]^⊤. The localization results are shown in Figure 7. Specifically, Figure 7a shows the localization error distribution of ALACS with the laser placed at 5 different positions, and Figure 7b depicts the corresponding statistical metrics. It can be seen from the results that the ALACS unit achieves precise localization in the x (horizontal), y (vertical), and z (depth) directions. In most instances, the localization errors along the x, y, and z directions are within 0.4 mm, 0.8 mm, and 3 mm, respectively. Even under the worst-case scenarios, the largest localization errors along these three directions are less than 0.6 mm, 1.2 mm, and 4 mm, respectively, when the distance between the planar checkerboard and ALACS is within 0.6∼1.2 m. Note that our robotic harvesting system uses a vacuum-based end-effector to grasp and detach fruits, and the end-effector is able to attract fruits within a distance of about 1.5 cm. Therefore, according to the evaluation results, it can be concluded that the ALACS unit can meet the requirements for fruit localization and can be integrated with other hardware modules for automated apple harvesting.
The RealSense D435i RGB-D camera was used in our previous robotic apple harvesting prototypes to localize the fruit (Zhang et al., 2021, 2022). According to the manufacturer's datasheet (Intel Corp., 2023), this camera offers a measurement accuracy of less than 2% of the depth range. This suggests that the maximum localization error along the depth direction is estimated to be less than 24 mm within the distance range of 0.6 to 1.2 m between the target and the camera. In contrast, the ALACS unit demonstrates a maximum depth measurement error of 4 mm at distances ranging from 0.6 to 1.2 m. These results indicate that the ALACS unit has promising potential for achieving precise and reliable fruit localization.

Conclusion
This paper has reported the system design and calibration scheme of a new perception module, called the Active LAser-Camera Scanner (ALACS), for fruit localization. A red line laser, an RGB camera, and a linear motion slide were fully integrated as the main components of the ALACS unit. A high-fidelity model was established to reveal the localization mechanism of the ALACS unit. Then, a robust scheme was proposed to calibrate the model parameters in the presence of data outliers. Experimental results demonstrated that the proposed calibration scheme can achieve accurate and robust parameter computation, and that the ALACS unit can be exploited for localization with maximum errors of less than 0.6 mm, 1.2 mm, and 4 mm in the horizontal, vertical, and depth directions, respectively, when the distance between the target and ALACS is within 0.6∼1.2 m. Future work will include further improvements to the efficiency of the scanner so that it can provide faster measurements to support the multiple arms planned for the next version of our harvesting robot. In addition, we will design comprehensive experiments to compare the measurement accuracy of the ALACS unit against consumer depth cameras.

Figure 1: The developed robotic apple harvesting system. (a) Image of the whole system operating in the orchard environment. (b) Main components of the robotic system.

Figure 2: CAD model of the perception module.

Figure 3: Fundamental working principle of the ALACS unit.

Figure 4: Coordinate frames and extrinsic parameters of the ALACS unit.


Figure 6: Scheme to compute z_{c,i}. (a) Corner detection. (b) Pose reconstruction. (c) Computation of z_{c,i}.

Figure 7: Localization accuracy of ALACS when the laser is adjusted to different positions (i.e., d = 0, 5, 10, 15, 20 cm). (a) Localization error distribution at the 5 laser positions. (b) Statistical summary of the localization error distribution. On each box, the central red mark is the median, the edges of the box are the 25th and 75th percentiles, and the whiskers extend to the most extreme data points.

Table 1: Calibration results by using four methods.