Real-Time Motion Tracking for Indoor Moving Sphere Objects with a LiDAR Sensor

Object tracking is a crucial research subfield of computer vision with wide applications in navigation, robotics, military systems and other areas. In this paper, real-time visualization of 3D point-cloud data from a VLP-16 3D Light Detection and Ranging (LiDAR) sensor is achieved. On the basis of preprocessing, fast ground segmentation, Euclidean clustering segmentation of the remaining points, Viewpoint Feature Histogram (VFH) feature extraction, object-model construction, and search-based matching of a moving spherical target, a Kalman filter and an adaptive particle filter are used to estimate the position of the moving spherical target in real time. Tested and validated on three kinds of scenes under partial occlusion and interference, different moving speeds and different trajectories, the experimental results show that the Kalman filter offers high efficiency while the adaptive particle filter offers high robustness and high precision. The research can be applied to fruit identification and tracking in natural environments, robot navigation and control, and other fields.


Application of LiDAR
Light Detection and Ranging (LiDAR) technology provides realistic 3-dimensional (3D) image information and has been widely utilized in various fields [1]. LiDAR sensors are commonly used in perception for autonomous vehicles because of their high accuracy, speed, and range. These characteristics make the sensors suitable for integration into the perception layer of controllers which have the capacity to avoid collisions with unpredicted obstacles [2]. LiDAR technology is also applied to field Autonomous Land Vehicles (ALVs) to detect potential obstacles. With a novel 3D LiDAR setup, the blind area around the vehicle is greatly reduced and the density of LiDAR data is greatly improved, both of which are critical for ALVs [3]. In addition to autonomous land vehicle applications, LiDAR is also used for navigation of unmanned aircraft systems [4]. The authors used LiDAR data to automatically identify ground objects that pose navigation restrictions, such as airports and high-rises.
Meanwhile, in the field of agriculture and forestry, by combining field and LiDAR data in forests with coexisting evergreen and deciduous species, researchers modelled common forest stand variables (height, diameter, volume and biomass) with high accuracy [5]. LiDAR point-cloud data has also been used for comparative classification analysis of post-harvest growth detection in precision agriculture [6], while other researchers have proposed a new approach for discriminating maize. The Kalman filter (KF) has been used to estimate and track both the laser PN (Phase Noises) and the NLPN (Nonlinear Phase Noises) in 100-Gb/s single-channel coherent optical phase-modulated systems. The KF is one of the main topics in the field of robotic motion planning and control. Gulalkari et al. [24] proposed an object tracking and following six-legged robot (6LR) system that uses a Kinect camera based on the KF and a back-stepping control method. Lim et al. [25] proposed incorporating dead-reckoning using only encoder measurements, and a Kalman filter-based Gaussian process to compensate for the uncertainty. In other respects, Moon [26] developed a human skeleton tracking system using the Kalman filter framework, in which multiple Kinect sensors are used to correct inaccurate tracking data from a single Kinect sensor.
In order to address real-time tracking of a moving sphere in indoor environments, and in the future spherical fruit identification and positioning under varying illumination, this paper first introduces the process of sphere detection with a 3D LiDAR, and then discusses the principles of the KF and PF algorithms. Next, with experimental modeling, data from the two tracking methods are compared, and finally conclusions are drawn. The tracking flowchart is shown in Figure 1.

The Velodyne System
The Velodyne VLP-16 3D LiDAR sensor obtains a 360-degree scene capture through the rotation of its internal motor. It is composed of 16 laser beams, which scan thousands of times per second. Each beam has a fixed pitch angle. The experimental scene and its visualization result are shown in Figure 2.


Outliers and Noise Filtering
To reduce the segmentation workload after the 3D data acquisition, it is necessary to eliminate some of the noise, outliers, holes, etc. by filtering according to some motion cues. Here we first remove the coordinate origin (0, 0, 0) and then apply the outlier filter proposed by Rusu et al. [27], which works well for indoor scenes. Firstly, we compute the average distance of each point to its k nearest neighbors. Next, we compute the mean µ and standard deviation σ of all these distances to determine a distance threshold. The standard deviation coefficient α depends on the size of the analyzed neighborhood k. The distance threshold t is then:

t = µ + α · σ (1)

In Equation (1), α is set to 1 and k to 30, an empirical threshold determined experimentally for the indoor scene. Finally, a point is classified as an inlier or an outlier if its average neighbor distance is below or above this threshold, respectively. The results of filtering are shown in Figure 3, where most of the noise and outliers, marked by red ellipses, are removed.
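The filter can be sketched as follows; the brute-force nearest-neighbor search is for illustration only (library implementations such as PCL's accelerate it with a k-d tree):

```python
import numpy as np

def remove_statistical_outliers(points, k=30, alpha=1.0):
    """Statistical outlier removal in the spirit of Rusu et al.:
    a point is kept if its mean distance to its k nearest neighbors
    is below t = mu + alpha * sigma (Equation (1))."""
    diff = points[:, None, :] - points[None, :, :]   # (n, n, 3) differences
    dists = np.linalg.norm(diff, axis=2)             # pairwise distances
    # mean distance to the k nearest neighbors (index 0 is the point itself)
    knn = np.sort(dists, axis=1)[:, 1:k + 1]
    mean_knn = knn.mean(axis=1)
    mu, sigma = mean_knn.mean(), mean_knn.std()
    t = mu + alpha * sigma                           # Equation (1)
    return points[mean_knn <= t]
```

With the paper's settings this would be called as `remove_statistical_outliers(cloud, k=30, alpha=1.0)`.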

Fast Ground Segmentation
For indoor object motion tracking, ground segmentation is essential for removing the ground background information. The fast ground segmentation algorithm proposed by Himmelsbach et al. [28] is used to remove the ground points and reduce the amount of subsequent calculation; it requires little runtime and obtains good segmentation results.





1. p_i = (x_i, y_i, z_i) denotes a 3D point, given by its Euclidean coordinates in the ego-coordinate system whose origin is at the center of the LiDAR sensor.

2. The x-o-y plane defines a circle with a radius of R, which is cut equally into multiple discrete sectors, as shown in Figure 4. ∆α denotes the angle of each sector plane, so the number of sectors is M = 2π/∆α.
3. S_i represents each sector, where 1 ≤ i ≤ M, so that each point can be classified into a sector plane according to its projection on the x-o-y plane, expressed as:

segment(p_i) = ⌊atan2(y_i, x_i) / ∆α⌋

where atan2(y_i, x_i) represents the angle within [0, 2π) between the positive direction of the x-axis and the projection of p_i on the x-o-y plane, y_i and x_i are the y- and x-values of p_i, and ∆α is the angle of each sector plane.
We denote the set of all points mapped to the same segment s by P_s. Define a mapping of all points P_s of the same segment to one of many bins b_j^s, j = 1 … B, discretizing the range component of the points, where the superscript s denotes the sector that the bin belongs to. The minimum and maximum range that a bin covers are expressed by r_j^min and r_j^max respectively. Obviously, a point p_i ∈ P_s maps to bin b_j^s if

r_j^min ≤ √(x_i² + y_i²) < r_j^max

P_{b_j^s} denotes the set of all points mapping to b_j^s. Given a set P_{b_j^s} of 3D points mapped to the same bin, a new set of 2D points is simply defined by mapping each point p_i = (x_i, y_i, z_i) to the 2D point (√(x_i² + y_i²), z_i). All points have thus been mapped to a segment and a corresponding segment bin. With the above mapping of the 3D point-cloud data set, sorted by distance from small to large, the processed 3D point set is partially ordered. A prototype point is calculated for every non-empty bin as the point of P_{b_j^s} with the lowest z-coordinate, which most likely belongs to the ground plane.
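The segment and bin mapping above can be sketched as follows; equal-width range bins out to the radius R are an assumption here, since the paper does not state the bin spacing:

```python
import math

def segment_index(x, y, delta_alpha):
    """Sector (segment) index of a point from its projection on the
    x-o-y plane; the atan2 result is shifted into [0, 2*pi)."""
    angle = math.atan2(y, x) % (2.0 * math.pi)
    return int(angle // delta_alpha)

def bin_index(x, y, r_max, num_bins):
    """Bin index from the planar range sqrt(x^2 + y^2), with the
    radius R (= r_max) split into num_bins equal-width rings."""
    r = math.hypot(x, y)
    return min(int(r / (r_max / num_bins)), num_bins - 1)
```

A full pipeline would map every point to its (segment, bin) pair and keep, per non-empty bin, the point with the lowest z as the prototype point.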

On the basis of the above data mapping method for the 3D point-cloud data set, the ground model in a sectorial area S can be expressed as a set L_s of line segments, as shown in Figure 5.

Calculate the distance between a point and the line L as shown in Figure 6. Construct the two straight lines L_a and L_b, which pass through the two ends of line segment AB and are perpendicular to L. Then determine whether the point lies between the straight lines L_a and L_b (such as point P_1) or outside the two lines (such as points P_2 and P_3). If the point is at position P_1, the distance between L and P_1 can be calculated directly. If the point is at position P_2, the distance between P_2 and point A is calculated. In the same way, if at point P_3, the distance between P_3 and L is equal to the distance between P_3 and point B. The process of extracting lines for a segment is expressed in Algorithm 1.
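The Figure 6 case analysis can be sketched in the 2D (range, z) plane; the projection parameter t below plays the role of the perpendiculars L_a and L_b:

```python
import math

def point_to_segment_distance(p, a, b):
    """Distance from point p to line segment AB: if the projection of
    p falls between the perpendiculars through A and B (case P1), use
    the point-to-line distance; otherwise use the distance to the
    nearer endpoint (cases P2 and P3)."""
    ax, ay = a; bx, by = b; px, py = p
    abx, aby = bx - ax, by - ay
    # parameter of the projection of p onto the infinite line through AB
    t = ((px - ax) * abx + (py - ay) * aby) / (abx ** 2 + aby ** 2)
    if t < 0.0:                      # outside, on A's side (case P2)
        return math.hypot(px - ax, py - ay)
    if t > 1.0:                      # outside, on B's side (case P3)
        return math.hypot(px - bx, py - by)
    # between the perpendiculars (case P1): perpendicular distance to L
    return math.hypot(px - (ax + t * abx), py - (ay + t * aby))
```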

The thresholds mentioned in the algorithm are tested in the experiment with the settings shown in Table 1, while the threshold T_ground determines whether a point belongs to the ground. It remains to formulate necessary conditions for a line y = mx + b to be considered part of the ground plane:
1. The line's slope m must not exceed a certain threshold T_m.
2. The line's absolute y-intercept b must not exceed a certain threshold T_b.
3. The root mean square error of the fit must not exceed a certain threshold T_RMSE.
4. The distance of the first point of a line to the line previously fitted must not exceed T_dprev, enforcing smooth transitions between pairs of successive lines.
With the above bin and segment calculations and the experimentally selected thresholds, the fast ground segmentation results are shown in Figure 7, where most of the ground information can be effectively cancelled.
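The four conditions can be bundled into one predicate; the threshold values passed in are placeholders, since the actual settings come from Table 1:

```python
def is_ground_line(m, b, rmse, d_prev, thresholds):
    """Necessary conditions for a fitted line y = m*x + b to be part
    of the ground plane: bounded slope, bounded intercept, bounded
    fit RMSE, and a smooth transition from the previous line."""
    t_m, t_b, t_rmse, t_dprev = thresholds  # (T_m, T_b, T_RMSE, T_dprev)
    return (abs(m) <= t_m
            and abs(b) <= t_b
            and rmse <= t_rmse
            and d_prev <= t_dprev)
```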

Algorithm 1. Extract Lines for One Segment. Iterating over the range-sorted prototype points of a segment: if the current line point set contains at least two points, the next point is appended only when the extended line fit still matches the thresholds; if it contains fewer than two points, the point is appended when it is the first point of a new line or its distance to the previously fitted line matches the thresholds; otherwise a new line is started.

Object Segmentation
After fast ground segmentation has cancelled the ground noise, the Euclidean clustering segmentation algorithm is used to segment the sphere target, whose point-cloud data is then used to train recognition of the target.



Euclidean Clustering of Point Clouds
The Euclidean clustering algorithm is used to separate the remaining background point-cloud data into clusters of points with similar proximity within a certain threshold. The basic idea is to divide the n points into m classes randomly at first, making the points within each class comparatively similar while points in different classes are comparatively dissimilar. Then the Euclidean distance between clusters is calculated, and the two clusters with the minimum distance are merged into one. The distance calculation and merging are repeated iteratively until the distance between any two clusters exceeds the given threshold, or the number of clusters falls below the given number; the segmentation is then complete and the target sphere object is obtained. The distance threshold is set to 0.06 according to the number of points in the sphere object's point cloud, the minimum number of points in each class to 100, and the maximum number to 2000. The results of Euclidean clustering are shown in Figure 8, where the surroundings of the target sphere are clearly clustered; Figure 8e shows the basic contour of the target sphere. With the Euclidean clustering, we can extract and match the moving sphere during the indoor motion tracking process.
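A brute-force sketch of the clustering step (library implementations such as PCL's Euclidean cluster extraction do the same with a k-d tree); the default parameter values mirror the ones above:

```python
import math

def euclidean_cluster(points, dist_threshold=0.06, min_pts=100, max_pts=2000):
    """Greedy Euclidean clustering: grow a cluster from a seed point
    by repeatedly absorbing unvisited points within dist_threshold of
    any cluster member; keep clusters whose size falls within
    [min_pts, max_pts]. For illustration only (O(n^2))."""
    unvisited = set(range(len(points)))
    clusters = []
    while unvisited:
        seed = unvisited.pop()
        cluster, frontier = [seed], [seed]
        while frontier:
            i = frontier.pop()
            near = [j for j in unvisited
                    if math.dist(points[i], points[j]) <= dist_threshold]
            for j in near:
                unvisited.discard(j)     # mark as visited immediately
            cluster.extend(near)
            frontier.extend(near)
        if min_pts <= len(cluster) <= max_pts:
            clusters.append(cluster)
    return clusters
```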

VFH Descriptor Extraction
A number of clusters have been obtained in the Euclidean clustering process. Subsequently, a feature descriptor is extracted for each cluster to match the target sphere with the Fast Library for Approximate Nearest Neighbors (FLANN). A feature descriptor here describes the local or global geometry and topology of a point-cloud data set, and can simply be understood as a point-cloud feature. Point-cloud feature descriptions generally consist of local and global descriptors: local features describe the local geometry and shape characteristics of the point-cloud data, while global features describe the global topological structure of the point cloud. Since motion tracking over the whole indoor environment must distinguish different poses, the global Viewpoint Feature Histogram (VFH) descriptor is employed to estimate the feature of each cluster and to extract the target. The visualization of the sphere object and its VFH are shown in Figure 9.


Feature Match of FLANN
The point-cloud feature model library is constructed with FLANN, and the matching steps are as follows: (1) Acquire point-cloud data sets using the LiDAR sensor at different distances between the sphere object and the sensor, and extract VFH features for each point-cloud model. (2) Load the above VFH features into memory and convert the data into matrix format.
(3) Create a k-d (k-dimensional) tree from the converted matrix data, and save the index of the k-d tree for direct search matching. (4) Input the query VFH feature and the k-d tree index, and search along the k-d tree for the nearest neighbor of the input data. (5) Accept the target point cloud if the difference between the search result and the query VFH is less than the given threshold.
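Steps (4) and (5) amount to a thresholded nearest-neighbor query; below is a brute-force numpy stand-in for the FLANN k-d tree search (the `max_distance` threshold is a placeholder, not a value from the paper):

```python
import numpy as np

def match_vfh(model_features, query_feature, max_distance):
    """Nearest-neighbor match of a query VFH histogram against the
    model library. FLANN accelerates this query with a k-d tree; the
    brute-force search here is equivalent for small libraries.
    Returns the index of the best model, or None if the best match
    exceeds the acceptance threshold (step 5)."""
    model_features = np.asarray(model_features, dtype=float)
    dists = np.linalg.norm(model_features - query_feature, axis=1)
    best = int(np.argmin(dists))
    return best if dists[best] <= max_distance else None
```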



Kalman Filtering
For the motion tracking of a moving sphere, the KF [29] provides a computationally efficient recursive way to estimate the state of the process while minimizing the estimated mean square error. State and measurement equations are used to describe the dynamic system. The state vector s_k of the system at time moment k is determined by both the state s_{k-1} at time moment k − 1 and the process noise. The measurement vector m_k is likewise determined by the observation function of the state vector s_k at time moment k and the measurement noise. The target motion tracking process with the KF is shown in Figure 10. The state variables are the positions and velocities of the sphere object in the X, Y and Z coordinates, expressed as the vector [x, y, z, v_x, v_y, v_z]^T, and the observed variable is the object's position, sensed and processed in real time by the LiDAR pipeline, denoted [x, y, z]^T. Here, the state function is expressed as Equation (6):

s_k = A s_{k-1} + w_{k-1} (6)

and the measurement function is given by Equation (7):

m_k = H s_k + v_k (7)

Figure 10. Sphere motion tracking process with KF based on LiDAR.
The random variables w_{k-1} and v_k represent the process and measurement noise respectively, denoting the noises and disturbances of the sensed motion data. They are assumed to be independent of each other and to have normal probability distributions:

p(w) ~ N(0, Q), p(v) ~ N(0, R)

The matrix Q denotes the process noise covariance and R the measurement noise covariance; here Q and R are assumed to be constant. Define the state variable s_k as a six-dimensional vector, as shown in Equation (12):

s_k = [x_k, y_k, z_k, v_{x,k}, v_{y,k}, v_{z,k}]^T (12)

where x_k, y_k, z_k are the coordinates of the center of the sphere object in the x, y, z coordinate system, and v_{x,k}, v_{y,k}, v_{z,k} are the speeds of the center in the x, y, z directions respectively.
The time interval dt between two frames of the LiDAR is only one second, which is relatively short, so the motion between frames can be considered uniform, and the state transition matrix A is expressed as Equation (13):

    [1 0 0 dt 0  0 ]
    [0 1 0 0  dt 0 ]
A = [0 0 1 0  0  dt]   (13)
    [0 0 0 1  0  0 ]
    [0 0 0 0  1  0 ]
    [0 0 0 0  0  1 ]

The measurement vector m_k is used to observe the center position of the moving spherical object, as shown in Equation (14):

m_k = [x_k, y_k, z_k]^T (14)

and the corresponding observation matrix is:

    [1 0 0 0 0 0]
H = [0 1 0 0 0 0]   (15)
    [0 0 1 0 0 0]

The variable ŝ_k^- (the superscript − denotes a priori, and ˆ denotes an estimate) is the k-th a priori state estimate given knowledge of the preceding states, and ŝ_k is the a posteriori state estimate given the measurement m_k at the k-th step.
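As a concrete check of Equations (13) and (15), a minimal numpy sketch of the constant-velocity transition matrix A and the position-only observation matrix H (dt = 1 s as stated above):

```python
import numpy as np

dt = 1.0  # inter-frame interval assumed above

# Equation (13): constant-velocity state transition; each position
# component is advanced by its velocity component times dt.
A = np.eye(6)
A[0, 3] = A[1, 4] = A[2, 5] = dt

# Equation (15): the pipeline observes only the sphere-center position,
# not its velocity.
H = np.hstack([np.eye(3), np.zeros((3, 3))])

s = np.array([1.0, 2.0, 3.0, 0.5, -0.5, 0.0])  # [x, y, z, vx, vy, vz]
s_pred = A @ s       # predicted state one frame ahead
m_pred = H @ s_pred  # predicted measurement (position only)
```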
The following Equation (16) predicts the current state value from the previous best state estimate:

ŝ_k^- = A ŝ_{k-1} (16)

where ŝ_k^- is the a priori state estimate at step k given knowledge of the process prior to step k, A is the state transition matrix of the system shown in Equation (13), and ŝ_{k-1} is the optimal estimate at time moment k − 1; the system noise w follows a zero-mean Gaussian distribution and therefore drops out of the prediction.
The uncertainty of each moment is represented by the covariance matrix P, whose prediction update is expressed in Equation (17):

P_k^- = A P_{k-1} A^T + Q (17)

where P_k^- is the covariance of ŝ_k^-, P_{k-1} is the covariance of ŝ_{k-1}, and Q is the covariance of the process noise w_{k-1}. The prediction Equations (16) and (17) constitute the time update from step k − 1 to step k.
We define the a priori and a posteriori estimate errors as Equations (18) and (19), respectively:

e_k^- = s_k − ŝ_k^- (18)
e_k = s_k − ŝ_k (19)

Then the a priori estimate error covariance is given by Equation (20):

P_k^- = E[e_k^- (e_k^-)^T] (20)

and the a posteriori estimate error covariance is:

P_k = E[e_k (e_k)^T] (21)

The initial value P_0 of the covariance matrix P_k is set to a constant matrix, as in Equation (22). The a posteriori state estimate ŝ_k is a linear combination of the a priori estimate ŝ_k^- and a weighted difference between the actual measurement m_k and the measurement prediction H ŝ_k^-, calculated as shown in Equation (23):

ŝ_k = ŝ_k^- + K_k (m_k − H ŝ_k^-) (23)

where the matrix K_k is the gain that minimizes the a posteriori error covariance. The difference m_k − H ŝ_k^- is the measurement innovation (residual), reflecting the discrepancy between the predicted measurement H ŝ_k^- and the actual measurement m_k. To accomplish the minimization, Equations (18) and (23) are first substituted into Equation (19); with the unit matrix I, we get Equation (24):

e_k = (I − K_k H) e_k^- − K_k v_k (24)

Substituting Equation (24) into Equation (21), we get:

P_k = (I − K_k H) P_k^- (I − K_k H)^T + K_k R K_k^T (25)

Taking the indicated expectations, we differentiate with respect to K_k, set the result to zero, and solve for K_k:

K_k = P_k^- H^T (H P_k^- H^T + R)^{-1} (26)

From the equations above, the smaller the observation noise covariance R, the larger the gain K_k; and the smaller the a priori covariance P_k^-, the smaller the gain K_k. The Kalman gain K_k works in two respects: firstly, it weighs the a priori estimate error covariance P_k^- against the observation noise covariance R to determine which of the prediction and observation models is more convincing; secondly, it transforms the residual from the observation domain to the state domain.
To keep the KF running until the end of the whole tracking process, the covariance of $\hat{s}_k$ needs to be updated as Equation (27):

$$P_k = (I - K_k H)\, P^-_k \quad (27)$$

After the time-update calculations of Equations (16) and (17) and the measurement updates of Equations (26), (23) and (27), the whole Kalman tracking cycle repeats. The a posteriori estimate obtained from Equation (23) in the previous cycle is taken as the a priori input of Equation (16) in the next cycle. The whole process of the KF is shown in Algorithm 2 as follows.

Algorithm 2. Kalman Filter
Input: m_k % object position for time step k from the sensor
Output: ŝ_k % position estimate of the object
1: initialize k, ŝ_{k−1}, A, P, Q, R % k is the prediction time step; ŝ_{k−1} is the known posterior state estimate at time step k − 1; A is the state transition matrix; P is the covariance matrix; Q is the process noise covariance; R is the observation noise covariance matrix
2: while filterStop = false do % the loop ends when the 'Stop' button is clicked
3:   ŝ⁻_k ← A ŝ_{k−1} % calculate the predicted position estimate, according to Equation (16)
4:   P ← A P A^T + Q % calculate the a priori covariance matrix, according to Equation (17)
5:   K ← P H^T (H P H^T + R)^{−1} % calculate the Kalman gain matrix, according to Equation (26)
6:   ŝ_k ← ŝ⁻_k + K(m_k − H ŝ⁻_k) % calculate the optimal position estimate, according to Equation (23)
7:   P ← (I − K H) P % calculate the a posteriori covariance, according to Equation (27); I denotes the identity matrix
8:   k ← k + 1
9: end while
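Algorithm 2 can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's actual 3D tracker: the constant-velocity model, the noise covariances and the simulated one-axis measurements below are all illustrative assumptions.

```python
import numpy as np

def kalman_track(measurements, A, H, Q, R, s0, P0):
    """Minimal Kalman tracker following Algorithm 2 (Equations (16)-(27))."""
    s, P = s0, P0
    estimates = []
    for m in measurements:
        # Time update (prediction), Equations (16) and (17)
        s_prior = A @ s
        P_prior = A @ P @ A.T + Q
        # Measurement update (correction), Equations (26), (23) and (27)
        K = P_prior @ H.T @ np.linalg.inv(H @ P_prior @ H.T + R)
        s = s_prior + K @ (m - H @ s_prior)
        P = (np.eye(len(s)) - K @ H) @ P_prior
        estimates.append(s)
    return np.array(estimates)

# Illustrative constant-velocity model for one axis (state: [position, velocity]).
rng = np.random.default_rng(0)
dt = 0.1
A = np.array([[1.0, dt], [0.0, 1.0]])
H = np.array([[1.0, 0.0]])            # only position is measured
Q = 1e-4 * np.eye(2)                  # process noise covariance
R = np.array([[1e-2]])                # observation noise covariance
# Simulated target at 0.1 m/s with noisy position measurements.
zs = [np.array([0.1 * k * dt + rng.normal(0, 0.01)]) for k in range(50)]
est = kalman_track(zs, A, H, Q, R, s0=np.zeros(2), P0=np.eye(2))
```

The prediction step runs even when a measurement is uninformative, which is what lets the filter coast briefly through short occlusions.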

Particle Filtering
Particle filtering is a non-parametric Monte Carlo method that implements the recursive Bayesian filter. It is applicable to any state-space model, including the non-linear, non-Gaussian case, and its accuracy can approach the optimal estimate. The particles are hypotheses describing possible target states, and the purpose of filtering is to find the most probable state of the target. In Bayesian estimation theory, the current state of the target is estimated from the previous state and the current measurement. An arbitrary probability distribution $p(x_k)$ can be approximated by Monte Carlo sampling with a discrete particle set as follows:

$$p(x_k) \approx \sum_{i=1}^{N} w^{(i)}_k \, \delta\big(x_k - x^{(i)}_k\big)$$

where $x^{(i)}_k$, $w^{(i)}_k$ and $N$ are, respectively, the particle state, the particle weight and the total number of particles at time $k$, and $\delta$ is the Dirac delta function.
The most basic and common PF implementation framework is the Sequential Importance Sampling and Resampling (SISR), or Sampling Importance Resampling (SIR), filter:

Step 1 (initialization, k = 0): for i = 1, 2, . . . , N, draw the particles $\{x^{(i)}_0\}$ from the prior distribution $p(x_0)$ and set each weight to 1/N.

Step 2 (for k = 1, 2, . . . ): execute the following steps in a loop:
Sequential importance sampling: for i = 1, 2, . . . , N, draw the particles $\{x^{(i)}_k\}$ from the importance probability density, then calculate the particle weights and normalize them so that the weights sum to 1.
Resampling: resample the particle set so that the resampled set is $\{x^{(i)}_k, 1/N\}$.
Output: calculate the estimated state value $\hat{x}_k = \sum_{i=1}^{N} w^{(i)}_k x^{(i)}_k$.

Sequential importance sampling is the basis of particle filtering; it applies the sequential analysis method of statistics to the Monte Carlo method so as to realize the recursive estimation of the posterior filtering density. Assume that the importance probability density function $q(x_{0:k}|y_{1:k})$ can be decomposed as:

$$q(x_{0:k}|y_{1:k}) = q(x_{0:k-1}|y_{1:k-1})\, q(x_k|x_{0:k-1}, y_{1:k})$$

Let the system state be a Markov process, with the observations conditionally independent given the states. The recursive form of the posterior probability density function can then be expressed as:

$$p(x_{0:k}|y_{1:k}) \propto p(y_k|x_k)\, p(x_k|x_{k-1})\, p(x_{0:k-1}|y_{1:k-1})$$

In the update phase, the particle weights are recalculated according to the likelihood:

$$w^{(i)}_k \propto w^{(i)}_{k-1}\, \frac{p(y_k|x^{(i)}_k)\, p(x^{(i)}_k|x^{(i)}_{k-1})}{q(x^{(i)}_k|x^{(i)}_{0:k-1}, y_{1:k})}$$

In general, the particle weights must be normalized:

$$\tilde{w}^{(i)}_k = w^{(i)}_k \Big/ \sum_{j=1}^{N} w^{(j)}_k$$

This yields the approximate representation of the posterior probability density function by the weighted particle set. In practical applications, using too many samples sharply increases the computational complexity and degrades the performance of the particle filter.
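The SIR steps above can be sketched as follows. This is a minimal one-dimensional example, not the paper's tracker: it uses the transition prior as the importance density (so the weight update reduces to the likelihood), and the motion and measurement models are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def sir_particle_filter(measurements, n_particles=500, motion_std=0.05, meas_std=0.1):
    """Minimal SIR particle filter sketch for a 1-D state."""
    # Step 1: initialization -- draw particles from the prior p(x_0), weights 1/N.
    particles = rng.normal(0.0, 1.0, n_particles)
    weights = np.full(n_particles, 1.0 / n_particles)
    estimates = []
    for y in measurements:
        # Importance sampling: propagate particles through the motion model.
        particles = particles + rng.normal(0.0, motion_std, n_particles)
        # Weight update with the likelihood p(y_k | x_k), then normalize to sum to 1.
        weights *= np.exp(-0.5 * ((y - particles) / meas_std) ** 2)
        weights /= weights.sum()
        # Output: weighted mean of the particle set as the state estimate.
        estimates.append(np.sum(weights * particles))
        # Resampling: duplicate heavy particles, drop light ones, reset weights to 1/N.
        idx = rng.choice(n_particles, n_particles, p=weights)
        particles = particles[idx]
        weights[:] = 1.0 / n_particles
    return estimates

ys = [0.5 + rng.normal(0, 0.1) for _ in range(30)]  # noisy observations of a fixed position
est = sir_particle_filter(ys)
```

Because the weights come from an arbitrary likelihood rather than a Gaussian assumption, this loop works unchanged for non-linear, non-Gaussian models.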
However, it is very difficult to approximate the posterior probability correctly with a small number of samples, and the resampling process may also lead to particle impoverishment. It is therefore necessary to determine an appropriate number of samples and to improve sampling efficiency according to the state of the system, while preserving the diversity of the particles. In this paper, the adaptive particle filter based on Kullback-Leibler Distance (KLD) sampling proposed by Fox [30] is adopted to resample the particles.
The core idea of KLD sampling is that, in each iteration of the particle filter, the number of samples is chosen so that, with probability $1 - \delta$, the error between the true posterior and the sample-based estimate of the probability density is less than $\varepsilon$; here $1 - \delta$ is set to 0.99 and $\varepsilon$ to 0.2. In this way the number of resampled particles is determined. The error is measured by the KLD, which represents the approximation error between two probability distributions $p$ and $q$:

$$KLD(p \,\|\, q) = \sum_{x} p(x) \log \frac{p(x)}{q(x)}$$

In the resampling step, particles with small weights are discarded and particles with large weights are duplicated. The number of particles used in the resampling, and hence in the next importance-sampling step, is determined on-line by KLD sampling during particle duplication, which adjusts the particle count and reduces the computational complexity. The process of KLD sampling is shown in Algorithm 3.
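Fox's bound in [30] gives the required particle count in closed form from the number of histogram bins k with non-zero support. The sketch below uses that standard formula with $\varepsilon = 0.2$ and $z_{1-\delta} \approx 2.326$ (the standard normal quantile for $1 - \delta = 0.99$), as in the settings above; the bin counts passed in are illustrative.

```python
import math

def kld_sample_count(k_bins, epsilon=0.2, z=2.326):
    """Fox's KLD-sampling bound: the number of particles n such that, with
    probability 1 - delta (z = z_{1-delta}), the KL divergence between the
    sample-based estimate and the true posterior stays below epsilon:
        n = (k-1)/(2*eps) * [1 - 2/(9(k-1)) + sqrt(2/(9(k-1))) * z]^3
    where k is the number of histogram bins with support."""
    if k_bins <= 1:
        return 1
    a = 2.0 / (9.0 * (k_bins - 1))
    return int(math.ceil((k_bins - 1) / (2.0 * epsilon) * (1.0 - a + math.sqrt(a) * z) ** 3))

# A more spread-out posterior (more occupied bins) demands more particles.
print(kld_sample_count(5), kld_sample_count(50))
```

When the posterior is tightly concentrated (few occupied bins), the bound is small and the filter runs with few particles, which is the source of the adaptive PF's efficiency.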

Experimentation and Discussions
In order to validate the robustness and real-time performance of the two tracking algorithms (KF and adaptive PF), experiments were carried out on the moving spherical target with and without occlusion and obstacles, at different speeds and over different trajectories. We tied the ball to a rope and pulled the rope to move the ball.

Target Tracking with Occlusion and Obstacle in the Environment
In target tracking processes, occlusion is a very common phenomenon. When the target is blocked, less valid information is available and tracking becomes more difficult. Partial occlusion of the spherical target by a long piece of wood is considered, as shown in Figure 11a; the experimental results for the two tracking algorithms are shown in Figures 12 and 13, respectively.

In Figure 12, when the target is partially occluded, the KF clusters the wood point cloud together with the target as the next frame's measurement, and some tracking loss occurs. In Figure 13, by contrast, the PF algorithm tracks the object effectively: the occluding wood board is not clustered with the target.

Obstacles in front of the moving target are common in practical applications, and when multiple targets appear in motion they may interfere with each other. The carton box used as an obstacle is shown in Figure 11b, and the experimental results in Figure 14. Both methods track the spherical target continuously in the presence of the obstacle. As Figure 14 shows, the KF is susceptible to nearby point clouds during tracking, while the PF shows strong robustness in motion tracking.

Target Tracking at Different Moving Speeds
Whether the tracker can follow moving targets at different speeds is also an important performance indicator. Due to the limitations of our laboratory environment and the point-cloud density of the LiDAR equipment, the basketball's position swings easily while it is moved manually, especially at low speed, and a uniform speed can hardly be obtained. During the experiments, the observed speed of the basketball was 0.054 m/s at low speed and 0.125 m/s at high speed. In Figure 15, the continuity of the low-speed trajectory, shown by the light blue dots, indicates that the KF is more suitable for tracking at relatively low speeds, while the continuity and density of the high-speed trajectory, shown by the red and brown crosses, indicate that the adaptive PF has better tracking performance. Over repeated trials at different speeds, the adaptive PF tracked the different moving speeds better than the KF.


Target Tracking in Different Motion Trajectories
In an actual situation, the target may move along a variety of trajectories. The spherical target's real-time tracking is tested on three kinds of trajectories: a straight line, a curve and a three-dimensional trajectory. The linear motion trajectory and error analysis are shown in Figure 16; Figure 16a,b show the actual trajectory and the estimated values of the spherical target in rectilinear movement under the Kalman filter and the adaptive particle filter, respectively. The results for the target moving along a curve in the same situation are shown in Figure 17. The trajectory obtained with the Kalman filter is obviously smoother, but the tracking effect of the PF is better. The error comparison is shown in Figures 16c and 17c: the errors and fluctuations of the KF are greater than those of the adaptive PF.
The results of trajectory tracking of the spherical target in three-dimensional space are shown in Figure 18. Both the KF and the PF can basically track the moving target in three-dimensional space, but both fluctuate in the Z-axis direction.

Conclusions
This paper has proposed a systematic real-time detection and tracking method for an indoor moving sphere using the VLP-16 3D LiDAR sensor. After a series of preprocessing steps on the point-cloud data and global feature extraction, the Kalman filter and the adaptive particle filter are used to estimate the real-time motion state of a spherical object. Three different kinds of indoor comparison experiments and their analysis show that the adaptive PF has better tracking performance. The specific work allows us to put forth two conclusions:

Firstly, real-time detection of the spherical target is accomplished by acquiring the real-time point-cloud data of the moving sphere indoors, preprocessing with a fast ground segmentation algorithm, removing outliers and clustering the non-ground points with the Euclidean clustering algorithm, extracting the target feature with VFH to establish the model library, and matching to detect the spherical target.
Secondly, the KF is used to track the object in real time, estimating the object position sequentially through real-time acquisition of the measured value, prediction and correction, while the adaptive PF tracks the target by estimating its state through sampling, weight calculation and resampling. The efficiency of the KF and the adaptive PF in 3D LiDAR tracking is verified by indoor basketball tracking experiments, with the moving spherical object with and without occlusion and obstacles, at different speeds and over different trajectories. The experimental results show that the adaptive PF has a small tracking error and strong robustness.
Motion tracking in a dynamic environment is one of the key components for intelligent agricultural harvesters operating in real-world conditions. We will continue to explore 3D semantic perception with transfer learning and real-time localization methods for natural fruits to achieve better tracking performance.