An Efficient Online Trajectory Generation Method Based on Kinodynamic Path Search and Trajectory Optimization for Human-Robot Interaction Safety

With the rapid development of robot perception and planning technology, robots are gradually getting rid of fixed fences and working closely with humans in shared workspaces. The safety of human-robot coexistence has become critical. Traditional motion planning methods perform poorly in dynamic environments where obstacles motion is highly uncertain. In this paper, we propose an efficient online trajectory generation method to help manipulator autonomous planning in dynamic environments. Our approach starts with an efficient kinodynamic path search algorithm that considers the links constraints and finds a safe and feasible initial trajectory with minimal control effort and time. To increase the clearance between the trajectory and obstacles and improve the smoothness, a trajectory optimization method using the B-spline convex hull property is adopted to minimize the penalty of collision cost, smoothness, and dynamical feasibility. To avoid the collisions between the links and obstacles and the collisions of the links themselves, a constraint-relaxed links collision avoidance method is developed by solving a quadratic programming problem. Compared with the existing state-of-the-art planning method for dynamic environments and advanced trajectory optimization method, our method can generate a smoother, collision-free trajectory in less time with a higher success rate. Detailed simulation comparison experiments, as well as real-world experiments, are reported to verify the effectiveness of our method.


Introduction
Human-robot interactions have been involved in more and more applications, from automobile assembly to electronic product assembly applications [1,2]. In these applications, robots act as human partners, with side-by-side or face-to-face working with humans in shared workspaces to complete specific tasks, and the safety issue of human-robot coexistence becomes crucial [3,4]. For example, in the electronic product production line's common robot pick-place tasks, the arms of human partners may appear on the predefined motion path of the robot. At this point, it may be inefficient and unsafe for the robot to simply back off or stop motion. Instead, it may make more sense to adjust online the performing trajectory of the manipulator to avoid potential collisions, while maintaining the original task undisturbed [5,6]. Therefore, an effective online trajectory generation method is urgently needed to help a manipulator perform autonomous motion in dynamic and uncertain environments. For trajectory generation in a dynamic environment, the trajectory needs to meet safety, smoothness, and dynamic feasibility requirements. In addition, since the movements of human partners or dynamic obstacles are highly uncertain, the trajectory generation process should also meet real-time requirements. To this end, a robust and efficient online trajectory generation method is designed to acquire a safe, smooth, and dynamically feasible trajectory in real time.
Many safety-related motion planning methods for dynamic environments have been developed at different control levels. The reactive planning method using the potential field concept has the advantages of low computational cost and high speed. Based on these characteristics, Flacco et al. [7] introduced a collision avoidance method based on the principle of infinite depth in the depth space. Nascimento et al. [8] proposed a safe contour collision avoidance method based on the finite depth principle by fusing an external vision sensor and robot body sensors information. Tulbure et al. [9] proposed a collision avoidance algorithm that maintains convergence to the goal by combining local reactive planning with global planning. Lin et al. [10,11] introduced a velocity-based physical human-robot interaction method for non-redundant robots, which allows the end-effector to track the human guidance trajectory while avoiding collisions of the links with obstacles. Different from the potential field method, the danger field is constructed based on the state of the robot itself rather than the state of obstacles [12], and indicates the degree of danger of the robot's current position and velocity to objects. Based on the concept of danger field, Refs. [13,14] developed novel danger assessment and control methods for the safety of human-robot coexistence. Compared with the danger field, the safety field is generated using the obstacle state information and represents the source of danger in a more complete form [15]. Methods based on potential field and safety/danger field only consider the local space to avoid obstacles but may cause the robot to get stuck in local minima. Moreover, when the robot encounters obstacles, it may retreat from the original task instead of finding a new trajectory to reach the goal.
There are approaches that formulate trajectory planning as an optimization problem to generate collision-free trajectories. Zucker et al. [16] proposed a trajectory optimization method based on covariant Hamiltonian optimization (CHOMP), which iteratively optimizes a balance function including smoothness and collision costs to improve the initial trajectory quality. Schulman et al. [17] developed an efficient motion planning method that formulates a sequential convex optimization method and considers collision-free constraint formulae associated with obstacles. Zanchettin et al. [18] proposed a motion planning method by combining the trajectory planning method with the optimized control strategy. Ragaglia et al. [19] proposed a collision avoidance framework that takes safety requirements as constraints and maximizes production efficiency as an optimization goal. Most of these methods may generate collision-free trajectories with high computational costs and low success rates.
Learning-based planning methods have been developed for collision-free trajectory generation. Qureshi et al. [20] proposed a motion planning network (MPNet) that generates an end-to-end collision-free path from the starting point to the goal by encoding the workspace. However, this method does not consider dynamic obstacle constraints. Xu et al. [21] proposed a motion planning method based on a recurrent neural network. It defines a set of level set functions and virtual fences to encode the workspace to achieve obstacle avoidance and solves the quadratic programming problem online by a recurrent neural network to track the reference trajectory. Song et al. [22] proposed a robot trajectory planning method using a radial basis neural network to improve trajectory planning accuracy, but it does not consider obstacle avoidance. Shen et al. [23] proposed a redundant manipulator collision avoidance method by combining deep reinforcement learning and gradient projection methods. However, the real-world performance of this method has not been shown. Although great progress has been made in learning-based planning methods, considerable challenges remain in highly dynamic environments. Liu et al. [24] proposed a human-robot collaboration framework, which combined a human motion prediction model and a task model based on finite state machine to improve the efficiency of human-robot collaboration. It improves collaboration efficiency by predicting human motion trajectories, and the current work solves the problem of human-robot interaction safety via a hierarchical online trajectory generation algorithm. Our research method and objective are different from previous work.
Another class of methods used for motion planning in dynamic environments is replanning, which can update the robot motion trajectory according to the changes of dynamic obstacles in the workspace. To this end, Hauser et al. [25] proposed an adaptive time-stepping architecture for real-time replanning, which dynamically adapts the planning time and improves the stability of the replanner. Sun et al. [26] proposed a high-frequency replanning method with kinodynamic RRTs, which computes multiple RRTs in parallel and executes the first action of the optimal motion plan. Otte et al. [27] proposed a asymptotically optimal and single-query sampling-based replanning algorithm, which continuously improves and repairs the search graph during navigation to obtain the shortest path. Völz et al. [28] developed a predictive path following controller that time-parameterizes the planned path and computes optimal control actions along the planned path, and a continuous replanning strategy is invoked at fixed time intervals to avoid dynamic obstacles. Pupa et al. [29] proposed a safety-aware kinodynamic planning method, which replans the nominal trajectory by calling the RRT algorithm multiple times and scales the robot velocity according to safety rules. Most recently, Covic et al. [30] proposed a motion planning algorithm (DRGBT) dedicated to the fast exploration of dynamic environments by defining an adaptive horizon and a replanning mechanism. Sampling-based path-planning algorithms are used as initial path generators in these replanning algorithms. Sampling-based initial path generators are usually asymptotically optimal but computationally expensive. Moreover, the random behavior of sampling-based methods can also lead to unpredictable performance, especially with a limited number of samples [31]. Search-based methods play an important role in replanning due to the consistency of search results [32]. However, most of the existing search-based replanning methods [33][34][35] are aimed at robot systems with non-fixed bases, which may not be directly applicable to multi-joint collaborative robot systems. Table 1 summarizes and compares the similarities and differences of related works from five aspects: danger source; obstacle characterization; real-time performance; whether the collision avoidance trajectory tends to the target point (such as not simply retreating from the main task); and constraints.
Motivated by the above methods, this paper proposes a complete and efficient online trajectory generation method to help a manipulator for autonomous planning in highly dynamic environments. It does not suffer from expensive computational burdens or complex data structures and is suitable for real-time application. Our approach starts with an efficient kinodynamic path search method using heuristic search and linear quadratic minimum time control, which finds a safe, feasible, and time-minimized initial trajectory on voxel grids. Links constraints are introduced into the path search process to avoid invalid paths where links collide with obstacles. A trajectory optimization method using the B-spline convex hull property is adopted to post-optimize the initial trajectory to increase the clearance between the initial trajectory and obstacles and improve the smoothness. To avoid the collisions between the links and obstacles and the robot self-collision, a constraint-relaxed links collision avoidance method is designed by solving a standard quadratic programming problem, which minimizes the deviation between the actual trajectory and the back-end optimized trajectory. A derivative control point adjustment method is designed to eliminate infeasible higher-order derivatives. Finally, the path search module and the trajectory optimization module are integrated into a receding horizon replanning framework using a horizon-limited replanning mechanism. Compared with the current state-of-the-art planning methods for fixed-base robots, the DRGBT algorithm and RRT X algorithm, and the advanced trajectory optimization methods, the CHOMP algorithm and the TrajOpt algorithm, our method can generate smoother, collision-free trajectories in shorter time with higher success rates. Numerous simulation experiments are performed to verify the effectiveness and robustness of the proposed method. Furthermore, in a real-world pick-place experiment, we also demonstrate that our method can effectively replan a new trajectory to avoid dynamic obstacles and converge to the item placement box. The contributions of this paper are summarized as follows: (1) A complete and effective real-time online trajectory generation framework is developed bottom-up, which mainly includes kinodynamic path search, B-spline trajectory optimization, and links collision avoidance optimization. (2) A path search method considering links constraints and a trajectory optimization method using the B-spline convex hull property are presented. The former transforms a position-only geometric search into an efficient kinodynamic search by state-space motion primitive generation and heuristic cost evaluation. The latter fully considers dynamic constraints and converges quickly to generate a safe, dynamically feasible, and smooth trajectory. (3) A constraint-relaxed links collision avoidance optimization method is adopted, which effectively avoids link collisions while tracking the optimized task space trajectory. The rest of this paper is organized as follows. The key points of the planning problem are stated in Section 2. Section 3 introduces the links-constrained kinodynamic path search method. Section 4 presents the trajectory optimization method. Section 5 elaborates on collision avoidance methods for both links and obstacles and robot self-collision, and the fusion process of links collision avoidance optimization and back-end trajectory optimization. Section 6 presents the experimental results. The conclusion is reported in Section 7.

Problem Statement
In this study, we solve an online trajectory generation problem for a manipulator in dynamic environments. Consider a human-robot collaboration application where an n-DOF manipulator runs from an initial configuration, q s , to a specified goal configuration, q g , and a dynamic obstacle, O, may appear on its motion path. Then, the robot's trajectory, q(t), is considered to be feasible and collision-free when where χ lk (q a ) represents the geometric line segment representation of the k-th link when the robot is in configuration q a , and χ li (q a ) is the geometric line segment representation of the links other than the k-th link and not adjacent to the k-th link. d(χ lk (q a ), O) represents the minimum distance between the link segments and the obstacles. d(χ lk (q a ), χ li (q a )) represents the minimum distance between the k-th link and the i-th link. d 0 is the safe distance threshold between the links and obstacles, and d l0 is the safe distance threshold of the self-collision. For this reason, the shared workspace is equipped with a monitoring system that detects obstacles and estimates the distance between obstacles and the robot. Several methods for real-time obstacle detection can be found in the literature [36][37][38]. We aim to design an efficient trajectory generation method to solve the above problems. There are several key points that need to be addressed: • A safe and feasible initial trajectory from the starting position to the goal needs to be searched. Traditional path-planning algorithms such as A*, Dijkstra, and the samplingbased method RRT usually do not consider the nonstatic initial state of the robot, so there are problems in replanning. As shown in Figure 1a, the geometric shortest path may turn sharply, which may lead to the failure of path parameterization. Since the replanning has non-general dynamic characteristics, it is necessary to use the kinodynamic planner to achieve a nonstatic initial state to ensure dynamic feasibility. • Since the initial trajectory search does not consider the distance cost, the initial trajectory tends to be close to obstacles (see Figure 1b). In addition, the uncertainty of dynamic obstacle motion may also make the initial trajectory unfeasible. Therefore, trajectory optimization and replanning strategies are necessary. • Collision avoidance of the robot links needs to be considered. As shown in Figure 1c, although the red trajectory indicated by "yellow star" is feasible for the end-effector, the robot links will collide with obstacles if the end-effector runs along with it.

Links-Constrained Kinodynamic Path Search
We propose an efficient links-constrained kinodynamic path search method for the path search of a fixed-base n-DOF manipulator in a dynamic environment. It fully considers the link constraints in the path search process and realizes an efficient kinodynamic search by applying a series of discrete control inputs. Different from the traditional A* algorithm, nodes are not searched along a straight line but use a set of short-duration motion primitives related to the robot state to generate the edges of the graph. In addition, unlike quadrotors and unmanned vehicles, whose bodies can move flexibly in three-dimensional space, multi-joint robots with fixed bases need to fully consider the constraints of obstacles on the links (see Figure 1c).
The path search process is summarized as Algorithm 1, and the mathematical principles of the key components of the search algorithm (motion primitive generation, search cost evaluation) are also explained in detail. We take the robot task space starting state and goal state, the safe distance threshold, and a set of discrete control inputs as input, and a safe, feasible, and time-optimal initial trajectory is searched as output. Following the A* algorithm, we denote open set and closed set as P and C, and current grid node as p cur . Given the current state, discrete control inputs u , and duration τ, the NodesExpand() function produces discrete motion primitives (see Section 3.1). Primitive nodes that end in the same grid with a non-minimum search cost will be removed. The interest points on the links corresponding to the node that satisfies the feasibility check needs to be collision detected by Equation (1). The positions of interest points on the links are obtained using the Kinematicsmodel() function, and the distance between interest points and obstacles are calculated using the euclidean distance in the Distance() function. Once the minimum distance between the interest point and the obstacle approaches the safe distance threshold d 0 , the collision signal will be output by the CollisionCheck() function (see Section 5), and the new extended primitive node is marked as infeasible. The minimum search cost from the starting state to the current state is calculated by minimizing the time cost and control cost. Heuristic search method using Pontryagin's minimum principle to minimize initial trajectory time (Section 3.2). The entire process continues until the robot successfully reaches the goal region.

Motion Primitives for Node Expansion
We consider an environment with dynamic obstacles, where a robot with a fixed base and n joints moves its end-effector from an initial state x s to a goal state x g . Let p e represent the end-effector position in the task space. The trajectory p e (t) executed by the end-effector can be decomposed of path-velocity decomposition [39], where s µ (t) is the time law of the given parametric geometric path p e (s), which is generated be the state of a dynamic system consisting of a position and its κ − 1 order derivative. Let u(t) = p e (κ) (t) be the control input and u(t) ∈ U := [−µ max , µ max ] 3 ⊂ R 3 . In this study, instead of using the control set U directly, we consider a lattice discretization U d ⊂ U following [40] for each axis, where each control vector u d ∈ R 3 will define a short-duration motion for the system. The discretized U d is obtained by choosing a number of samples −u max , − −1 u max , · · · , −1 l u max , u max along each axis with discrete steps ∇ u = u max , and results in M = (2 + 1) 3 motion primitives [41]. A simple double integral system is considered for each axis (e.g., κ = 2), and the discrete state space model can be defined as where τ represents the short-duration step. Finally, given the current state of the endeffector and a set of discrete control inputs, the motion primitives for node expansion can be obtained.

Search Cost Evaluation
The search cost from the starting node x s to the current node x c is calculated by minimizing the time cost and control cost, where the time cost ensures that the time is minimized, and the control cost makes the state trajectory as smooth as possible. The search cost function [40] is defined as follows, dt indicates the smoothness of the trajectory, and ρ ≥ 0 indicates the importance of the trajectory duration T relative to its smoothness. The cost of the motion primitive generated by the discrete control input u d and duration τ can be expressed as Finally, the total search cost from the initial state x s to the current The search cost from the current state x c to the goal state x g is obtained using a heuristic function, which is very useful for accelerating the search speed. Combined with Pontryagin's minimum principle [42], the heuristic function is designed to solve the minimum time cost trajectory, that is, where p µc and p µg represent the current and goal positions, respectively. v µc and v µg represent the current and goal velocities, respectively. C * (T) indicates the cost function. By solving the root T f of ∂C * (T) ∂T = 0, the minimum time cost and feasible initial trajectory can be obtained. C * (T f ) indicates the minimum search cost [41]. Finally, the total search cost from the initial state to the goal state is denoted as f c = g c + C * (T f ).

Trajectory Optimization
Although the initial trajectory generated by the kinodynamic path search is dynamically feasible and collision-free, it may be suboptimal due to grids discretization and distance constraints with obstacles are not considered. In this section, the trajectory optimization method using the B-spline convex hull property is adopted to improve the smoothness of the trajectory and increase the clearance between the initial trajectory and obstacles. To adjust the infeasible high-order derivatives, an infeasible derivative control point adjustment method is designed using the derivative B-spline-bounded sufficient condition [43] and convex hull property. Furthermore, the planned initial trajectory may also become invalid as dynamic obstacles may appear at any time. To this end, a recedinghorizon replanning strategy is utilized to generate a new trajectory to reach the goal.

B-Spline Curve Formulation
A B-spline curve c(t) [44] of degree k is defined by the linear combination of a set of control points {cp 0 , cp 1 , · · · , cp n } and the B-spline basis functions B i,k , i.e., where the B-spline basis functions B i,k can be computed by the fast and efficient de Boor algorithm [45] on a non-decreasing knot vector {t 0 , t 1 , · · · , t m }. The number of knots, the degree, and the number of control points satisfy the relationship m = n + k + 1. For a uniform B-spline, its knot vector is uniformly divided by defined as the s-th control point span including k + 1 continuous control points. The general matrix representation of B-spline is described in [46]. Then, the position and lth-order derivative [43] of the B-spline curve can be expressed as In addition, b is the basis vector, and M k denotes a blending matrix. When ld is equal to zero, the upper and lower equations of Equation (10) are equivalent.

Optimization Approach
One of the important properties of B-spline is the strong convex hull property. As shown in Figure 2a, all points on a B-spline curve lie within the union of all convex hulls consisting of n + 1 successive control points. More precisely, if t ∈ [t i , t i+1 ), then c(t) is in the convex hull of control points cp i−k , cp i−k+1 , cp i . Another important property of B-spline is that the d-th derivative of a k-degree B-spline is a (k − d)-degree B-spline on the original knot vector with a new set of (n − d + 1) control points cp 0 , cp 1 , · · · , cp n−d . For example, the velocity-spline curve shown in Figure 2b. Moreover, the control points of where each knot span ∆t is the same.
(a) (b) In the current work, three cost terms related to safety, smoothness, and feasibility are used to model the trajectory cost, where the safety term indicates the cost of approaching obstacles, the smooth term indicates the trajectory smoothness cost, and the feasible term indicates the dynamic feasibility cost. Then, the objective function is written as where C s represents smoothness cost, C c is collision cost, and C d is for feasibility cost. The weight coefficients ϕ s , ϕ c , and ϕ d trade off each cost term to minimize the final optimization cost.
(1) Smoothness costs: For the smoothness penalty term, an elastic band function is designed to describe the smoothness of the position control points, which only uses the geometric information of the control points without involving time information. Moreover, the smoothness of the B-spline trajectory can also be improved by minimizing the acceleration and jerk control points [47]. Then, the smooth term penalty function is defined as where the scaling factor δ = d i−1 d i−1 +d i ensures that the relative distance between two adjacent control points remains unchanged, and d i = cp i − cp i+1 is calculated between the control points. In fact, the first term of the smoothing function treats all control points as a deformable elastic band and behaves as an internal contractive force to make the trajectory as evenly distributed on the straight line as possible. The second and third terms smooth the whole trajectory by minimizing higher-order derivatives.
(2) Collision costs: Since the initial trajectory may be close to the obstacles, the collision penalty function can keep the control point away from the obstacles by the repulsive action [7]. Therefore, the collision penalty term is defined as where d(cp i ) is the minimum Euclidean distance between control points and the obstacles, f ε is the repulsive force at the safe distance threshold d 0 , k d represents the maximum magnitude of the repulsive force, and α(α 1) is a shape parameter. This design allows the repulsive force magnitude f (d(cp i )) to reach its maximum value k d when d(cp i ) = 0, while f (d(cp i )) approaches 0 when d(cp i ) > d 0 and no repulsive force is generated. To facilitate fast distance detection, the Euclidean distance field (EDF) of the occupancy volume is calculated by an efficient algorithm [48] with complexity O(n1), where n1 = N 3 is the number of voxel grids, and N represents the size of the volume along a single axis. Furthermore, the trilinear interpolation technique is adopted to enhance the detection accuracy of distance [33], which compensates for voxel grid discretization errors and is beneficial for numerical optimization [41].
(3) Feasibility costs: The higher-order derivative of a B-spline curve is also a B-spline with the convex hull property. In other words, if the derivative control points are bounded within the convex hull, expanded by the maximum allowed derivative, then the derivative-spline is also bounded [43]. Based on this property, we ensure the feasibility of the trajectory by designing a penalty function that constrains the higher-order derivatives of the control points as follows: where w v , w a , and w j are the weights of the penalty terms of velocity, acceleration, and jerk, respectively. The penalty term is defined as

Numerical Optimization Method
The numerical solution of Equation (13) is obtained using the L-BFGS method [49], which uses curvature information to construct a Hessian approximation from the nearest iteration. Curvature information from earlier iterations that are not associated with the behavior of the current iteration Hessian is discarded to save storage. In the current work, the objective function is explicit and has separability, so the L-BFGS method that approximates the inverse Hessian from gradient information usually converges quickly and is robust [50]. Due to low computing requirements and small memory consumption, L-BFGS is suitable for real-time applications. The L-BFGS numerical optimization process is elaborated as follows: Let the continuously differentiable unconstrained optimization problem be min x f (x). The updating for x follows the approximate Newton steps [50]: where α k is the step length and satisfy the Wolfe condition, H k ∇ f k is the search direction, and H k is updated at every iteration by means of the following formula: where Furthermore, the inverse Hessian approximation is implicitly stored by storing a finite number of vector pairs {s i , y i } to avoid high storage and operation costs when the number of variables is large. The product of H k ∇ f k is computed by a two-loop recursion updating algorithm [51], which performs an iterative inner product and vector sum operation over ∇ f k and the vector pair {s i , y i }. In fact, only curvature information from the m (between 3 and 20) most recent iterations is included in the set of vector pairs. The initial inverse Hessian H 0 k for L-BFGS updating is chosen to follow [50], i.e., where is the scaling factor that attempts to estimate the size of the true Hessian matrix along the most recent search direction. Finally, the L-BFGS algorithm can be summarized and rewritten as Algorithm 2. x k+1 = x k + α k p k (α k satisfy the Wolfe conditions) 5: if k > m then 6: Discard( s k−m , y k−m ); 7: end if 8:

Infeasible Derivative Control Points Adjustment Method
For a uniform B-spline, each knot span ∆t is equal. Collision penalty tends to force larger separations between local control points close to obstacles. The robot needs to move farther in the same amount of time, which means the robot needs to move faster. Therefore, derivative control points that do not meet the feasibility requirements may appear. To this end, an adjustment method based on the derivative B-spline-bounded sufficient condition [43] and the convex hull property is designed, which locally adjusts the infeasible derivative control points to within the maximum allowable bound.
As in Equation (10), the position coordinate c(ξ) of B-spline curve in the s-th control point span can be expressed as: Let the mapping matrix be C ld , which satisfies db d ld ξ = b (C ld ) ; then, the corresponding derivative control points can be derived following [32]: It can be seen from Equation (25) that the derivative curve is also a B-spline curve with S ld Y s as the control point span. If |S ld Y s | ≤ λ max ld 1 k×1 , then the derivative control points are completely contained in the range from −λ ld to λ ld . Using the above derivation and the B-spline curve convex hull property, an infeasible derivative control point adjustment method is designed by adjusting the infeasible derivative control points to be within the convex hull of the maximum allowable derivative expansion. An optional adjustment scaling factor η is set as follows: where sgn r inf The adjustment of infeasible acceleration and jerk follows a similar process as well. Finally, infeasible derivative control points can become feasible by scaling factor adjustment. Then, the whole derivative spline is feasible according to the B-spline convex hull property.

Local Replanning Strategy
In an open industrial cell, it may be inefficient to directly generate the global trajectory from the starting point to the goal, since the part of the trajectory occupied by dynamic obstacles is never executed. To improve efficiency, following [41,43], a replanning strategy using receding horizon framework is adopted to generate a trajectory from the start point to the goal segmentally by a predefined searching radius δ. In other words, the path search is only performed in a spherical region centered on the current position of the end-effector and with a radius δ. An illustrative example is shown in Figure 3. Once the motion primitive node exceeds the search radius, the search process will be stopped, then the trajectory optimization and infeasible derivative control points adjustment will be performed.
The replanning process is activated in both active and passive modes. In the active mode, the replanner is invoked at a regular interval and updates the trajectory with the latest environmental information. In passive mode, the replanner is activated by collision detection, i.e., once the current planned trajectory collides with obstacles, the replanner will be triggered to ensure that a new safe trajectory is available.

Collision Avoidance Optimization for the Robot Links
In this section, the collision avoidance problems between links and obstacles, and the robot itself, are discussed, respectively. The former avoids the collisions between the robot links and obstacles, and the latter avoids the robot from self-collision during motion. Collision detection needs to be performed first. To reduce CPU consumption, the robot is approximated as a simple 3D sphere and cylinder as shown in Figure 4a. The sphere is characterized by the center and radius, R1. The cylinder is characterized by the center, axis length, L, and radius, R2. For self-collision avoidance, collision detection can be divided into three cases, as shown in Figure 4b: • Collision detection between spheres (Case 1): The distance, d, between the centers of two spheres is measured to detect whether the two spheres collide. If d > (R1 + R2), where R1 and R2 are the radii of the two spheres, then the two spheres do not collide. • Collision detection between a cylinder and a sphere (Case 2): The center of the sphere is projected onto the axis of the cylinder to detect collisions between the cylinder and the sphere. Two cases need to be discussed. First, if the projection of the sphere center is inside the cylinder, the distance, d, between the sphere center and its projection on the cylinder axis is considered as the collision detection distance. If d > (R1 + R2), the cylinder and the sphere do not collide. Second, if the projection of the sphere center is outside the cylinder, the distance, d, between the sphere center and the nearest cylinder end will be used as collision detection distance. If d > (R1 + R2), then no collision occurs. • Collision detection between cylinders (Case 3): The cylinders are reduced to two axes to detect collisions between two cylinders. Again, two possible cases that need to be discussed. First, if the intersection of the two cylinders axes is inside the first cylinder, the distance, d, between the closest points is used as the collision detection distance. If d > (R1 + R2), the two cylinders do not collide. Second, if the intersection of two cylinders axes is outside the cylinders, then the closest distance is determined by the distance, d, between the ends of the two cylinders. If d > (R1 + R2), then no collision occurs. To avoid robot self-collision, the distances between the links that may collide need to be calculated. Denote the Cartesian space positions of the simplified geometric axes at configuration q ∈ R n as CS(q), and the Euclidean distance vector from P i to P j as d P i , P j : R 3 × R 3 → R 3 , then the self-collision distance, dl, between robot components that may collide, can be defined as dl = min d P i , P j s.t. P i , P j ∈ CS(q), i, j ∈ {Link 1 , Link 2 , · · · , Link 6 } (28) where the position of the cylinder axis and the position of the sphere center can be easily obtained by the forward kinematics model of the robot. Due to the structural constraints of the robot itself, some link components never collide (including adjacent links). Therefore, we divide the link collision states into two categories: never-in-collision state, N, and possible-collision state, M. To speed up self-collision detection, only components that may collide with other components are detected. Pairs of link components that require self-collision detection are listed as shown in Table 2. Since the robot motion interpolation is discretized, we need to ensure that the robot's next movement distance is less than the minimum self-collision detection distance, dl, to avoid self-collision. The Cartesian velocity vector along the contact normal n is calculated to predict the distance the robot will move next. The Cartesian velocity vector is calculated by where J chain is the Jacobian of the closest point closer to the end-effector. The main reason is that self-collisions are usually caused by end-effector manipulation tasks. Figure 4c shows the closest points cp 1 and cp 2 , where cp 2 is closer to the end-effector. Once the self-collision distance, dl, is detected and the closest point Cartesian velocity x chain is obtained, then x chain is projected onto the contact normal n and multiplied by the time step ∆t; the self-collision avoidance constraint is defined as where the contact normal n is the direction vector of the distance vector between the two closest points. d l0 is the safe distance threshold of the self-collision. The above formula shows that the distance of the closest point moving along the contact normal direction in time step ∆t is less than the self-collision distance. Several illustrative examples of self-collision avoidance are also shown in Figure 4c, which correspond to the three collision detection cases shown in Figure 4b. For the links collision avoidance, as with the self-collision method, spheres and cylinders surrounding the robot body are used to provide safety contours. As shown in Figure 5a, the interest points (yellow) distributed along the links are set for collision detection between the links and obstacles. Different from the traditional collision avoidance method based on the concept of artificial potential field, it only makes the robot avoid obstacles along the distance vector direction. In the current work, a constraint-relaxed links collision avoidance method is adopted by solving a standard quadratic programming problem, which minimizes the deviation between the actual trajectory and the back-end optimized trajectory under the constraints of links collision avoidance and self-collision avoidance. As shown in Figure 5b, the hemispherical area (dark orange) centered on the interest point C is the feasible set of collision avoidance. Specifically, the center ω r of the feasible set of single interest point can be configured as: where d represents the minimum distance vector between the interest point and obstacles. To keep the interest point C away from obstacles, the component of collision avoidance velocityẋ r of the interest point C along the vector ω r needs to be non-negative, that is where ω ra ∈ R 6×1 is the zero-filled augmented matrices of ω r . J c is the Jacobian of the interest point C. J ro indicates the Jacobian matrix after dimensionality reduction.
Since there may be an infinite number of collision avoidance velocities satisfying Equation (32) in the feasible set space, the optimal collision avoidance velocity command cannot be manually selected. In addition, the self-collision avoidance constraint Equation (30) also needs to be satisfied during the links collision avoidance process. To solve the above problems, we integrate multiple tasks into quadratic programming (QP) framework with inequality constraints, as follows: where the objective function minimizes the velocity deviation of the end-effector from actual to desired under the constraints.ẋ op is the desired velocity profile generated by the back-end optimization step, Jq is the end-effector actual velocity profile.q min andq max are the minimum and maximum allowable joint velocities, respectively. Equation (33) is a standard quadratic programming problem, and its Hessian matrix H = J T J is symmetric positive definite, so there is a global optimal solution. The optimal solution is composed of collision avoidance joint velocity and compensation joint velocity. The velocities of the joints before the constrained interest point are used to avoid the robot from colliding with obstacles, and the velocities of the following joints are used as compensation values to track the desired velocity profile. It is worth noting that Equation (33) involves the back-end trajectory optimization step and the links collision avoidance step, where the optimized task space trajectory is used as the desired trajectory. More specifically, if the links are not constrained by obstacles or have no risk of self-collision, the robot will normally execute the optimized task space trajectory. Instead, the link collision avoidance optimization will be activated to achieve link collision avoidance while tracking the optimized task space trajectory.

Simulation and Real-World Experiment Results
In this section, we first verify the effectiveness of the proposed algorithm in a variety of different simulation scenarios, which include static obstacles as well as dynamic obstacles. We benchmarked the proposed algorithm against existing state-of-the-art motion planners and trajectory optimization algorithms. We first compare our method with two state-ofthe-art motion planners for manipulator arms, RRT X [27] and the DRGBT [30]. Second, we compare our method with two state-of-the-art trajectory optimization algorithms, CHOMP [16] and TrajOpt [17], which are popular in the field of industrial robot trajectory optimization and integrated into the open-source motion planning framework MoveIt [52]. We choose these benchmark methods due to their superior performance, reproducibility, and code availability. Compared with the advanced motion planners RRT X and DRGBT, our method can generate a shorter and smoother path in shorter time with a higher success rate. Compared with the advanced trajectory optimization algorithms CHOMP and TrajOpt, our method can generate a smoother optimization trajectory with a higher success rate and is more suitable for real-time applications. Finally, we also demonstrate the effectiveness of our method on a real-world robotic pick-place task. More experimental details are also presented in the Supplementary Materials Video S1, and the download link is in the Supplementary Materials section of the paper.

Experimental Settings
The experiments were performed on a 6-DOF collaborative robot, the model of which is shown in Figure 6. The size of the robot workspace is 3 × 3 × 3 m, which is modeled as a 3D grid map with a resolution of 1 mm. Too large a resolution may increase the discrete error, and too small a grid resolution may increase the path search time. Therefore, the choice of grid resolution is a compromise between discrete error and path search time. The choice of grid resolution in the current work fully considers the two factors. The start and goal positions in Cartesian space are p s = (0.3406, −0.3647, 0.4318) and p g = (0.3289, 0.4763, 0.5000), respectively, which are predefined. The Euclidean distance is used as a safety metric. The safe distance threshold between the robot and the obstacles is d 0 = 0.14 m, and the self-collision safe distance threshold is d s = 0.05 m. A greater safety distance will cause the robot to move more conservatively but may increase the task execution time and trajectory length, thereby reducing work efficiency. Therefore, the setting of the safety distance is a compromise between safety and efficiency.  The weights of the trajectory optimization cost function are set to ϕ s = 8, ϕ c = 0.3, and ϕ d = 0.01. The weight of each optimization term indicates its relative importance. The selection of weights is a compromise between the cost of each optimization item to minimize the total optimization cost. The proposed algorithm already considers the safety and dynamic feasibility of the initial trajectory in the kinodynamic path search step, while the trajectory smoothness is not considered. Therefore, the smoothing cost is applied with greater weight in the trajectory optimization objective function, while the costs of safety and dynamic feasibility are given smaller weights, respectively. Similarly, we apply a greater weight to the jerk penalty term in the feasibility penalty function and smaller weights to the penalties of the velocity and acceleration, respectively. The weights of velocity, acceleration, and jerk in the feasibility penalty function are set as ω v = 0.01, ω a = 0.01 and ω j = 0.1. The repulsive force magnitude in the collision penalty function is k d = 0.1, α = 6. A cubic B-spline curve is used in the back-end trajectory optimization step, i.e., k = 3.

Our
All simulations are performed on a laptop with Intel Core i7-9750 CPU @ 2.6 GHz and 8 GB memory running Ubuntu 18.04 and ROS Melodic, and the programming language uses C++. The real-world experiments are carried out on a physical 6-DOF collaborative robot. The robotic workspace is surrounded by two Kinect cameras to detect workspace obstacles. The experimental parameter settings are the same as the above simulation settings. The online trajectory generation algorithm is deployed on an external PC, and the PC and the robot controller communicate through a 1 kHz network port.

Scenes with a Single Static and Dynamic Obstacle
In the first scenario, there is only one static obstacle in the workspace, but it is so close to the robot that the original trajectory may become infeasible (see Figure 6). The robot wants to run from the starting position to the goal position safely and efficiently, the planned trajectory should not only be collision-free to ensure safety but also shorten the planning time as much as possible to improve the robot's work efficiency. In the second scenario, a single obstacle reciprocates between the start and goal positions at a speed of 0.03 m/s, periodically blocking the robot's motion. At this time, the trajectory generation algorithm should ensure that the search path is collision-free and meets the real-time requirements. Figures 6 and 7 qualitatively show the experimental results of our method and the benchmark motion planners in a single static and dynamic obstacle scene, respectively.  From the qualitative experimental results in Figure 6, it can be seen that there are some sudden turning points in the planned paths of both the RRT X algorithm and the DRGBT algorithm, and the overall trajectories are not smooth, while the trajectory generated by our method is smoother and shorter in length. In terms of total runtime, our method only takes about 3.04 s, while the RRT X algorithm and the DRGBT algorithm take about 7.83 s and 5.84 s, respectively. Furthermore, as can be seen from the qualitative experimental results of a single dynamic obstacle shown in Figure 7, although both our method and the benchmark motion planners can search for collision-free paths, our method shows superiority and robustness. From the overall path of the search, the search path of our method is still smooth and short, while the search paths of the RRT X algorithm and DRGBT algorithm are tortuous and have large fluctuations. From the total runtime, our method takes only about 3.13 s, almost the same as the static scenario, which shows the robustness of our method. The total running times of the RRT X algorithm and the DRGBT algorithm are 13.97 s and 15.10 s, respectively, which are much larger than the time taken by our method.

Scenes with Multiple Static and Dynamic Obstacles
In the third case, two static obstacles parallel and perpendicular to the tabletop approach the robot. The start and goal positions are the same as in the first scene. Unlike the single static obstacle scenario, obstacles parallel to the tabletop impose constraints on the robot trajectory, reducing the robot's feasible space. In the fourth scenario, two dynamic obstacles perpendicular to each other and close to the robot reciprocate at a speed of 0.03 m/s between the starting position and the goal position, which imposes constraints on the motion trajectory in a more complex form. It is a challenging task as the robot needs to replan the motion trajectory in real time to quickly bypass obstacles in both vertical sections and then converge to the goal configuration. Figures 8 and 9 qualitatively show the experimental results of our method and the benchmark motion planners in scenes with multiple static and dynamic obstacles, respectively.   Goal pos. Figure 9. Visualization of robot motion trajectories in a multiple dynamic obstacles comparison experiment. The first row: the motion trajectory of the RRT X algorithm. The second row: the motion trajectory of the DRGBT algorithm. The third row: the motion trajectory of our algorithm. Figure 8 shows the path search results of our method and the benchmark motion planners in a scene with two static obstacles. It can be seen that our method outperforms the RRT X algorithm and the DRGBT algorithm in both total path length and total runtime. In terms of path length, our method generates a trajectory that bypasses obstacles from below the obstacle that parallels to the tabletop, greatly shortening the total running path length, in line with human intuition. Compared with our method, the trajectories generated by the RRT X algorithm and the DRGBT algorithm seem to have unnecessary deflections so that the total path length may be larger. In terms of total runtime, the total runtime of our method is significantly smaller than that of the RRT X algorithm and that of the DRGBT algorithm, with little change compared to the previous test scene time, which further demonstrates the robustness of our method. Figure 9 shows the path search results of our method and the benchmark motion planners in a scene with two dynamic obstacles. It can be seen that our method is superior to the RRT X algorithm and the DRGBT algorithm in both search path quality and total runtime. In terms of path quality, our method does not exhibit abrupt turns due to multiple dynamic obstacle constraints but stably converges to the goal. In terms of total runtime, our method takes only 3.43 s from the starting point to the goal point, while the RRT X algorithm and the DRGBT algorithm take about 11.81 s and 16.16 s, respectively. The total running time of both benchmark planners are more than three times that of our method.

Quantitative Evaluation and Analysis of Simulation Results
The path search algorithms are also quantitatively compared in terms of algorithm success rate, single iteration time, total runtime, and Cartesian space path length. Each algorithm was run 100 times in each scenario to precisely obtain each evaluation index. The RRT X algorithm is an asymptotically optimal single-query replanning algorithm that refines and repairs the same search graph using the obstacles or robot change information. The existing search graph is quickly reconstructed through a graph rewiring cascade to repair its shortest-path subtree to the target. Therefore, RRT X needs to continuously update and repair the global search graph according to changes in the environment or robot position. The DRGBT algorithm is based on an adaptive horizon setting through predefined C-space path target nodes, where each node is assigned a weight determined by relative distance and captured environmental changes. This setting requires the algorithm to perform constant distance queries to modify local paths. In fact, both the RRT X algorithm and the DRGBT algorithm are essentially sampling-based path-planning methods, which obtain geometric path information without including time information. However, due to limited sampling, the quality of the planned path is not ideal. We also observed some unpredictable stochastic behaviors, as shown by large fluctuations and redundancy in the planned trajectories in Figures 7-9. Since the path search costs of benchmark planners are evaluated without considering the control cost, the search paths may not be smooth. Our method takes into account the nonstatic initial state of the robot by integrating the control input over the duration to obtain the position, velocity, and acceleration state of each node. Since the search cost is evaluated by minimizing the control and time costs, the search path is time-minimized and exhibits good smoothness. Table 3 quantitatively shows the experimental results of each evaluation index in the four research scenarios. In terms of success rate, both the proposed method and the DRGBT algorithm can successfully reach the target point in each run in the first and third scenarios, while the success rate of our approach is higher than that of the DRGBT algorithm in the second and fourth scenarios. The RRT X algorithm can successfully reach the target point in each run in the first scenario, while the success rate in other scenarios is lower compared to our method and the DRGBT algorithm. In terms of single iteration time, we show the maximum time, average time, and runtime standard deviation of the proposed method and the benchmark motion planners in four scenarios, respectively. From the experimental results, the single iteration time of our method significantly outperforms the benchmark motion planners. This shows that our method can quickly adjust the local trajectories in a very short time, which also explains why our method has a higher success rate than the benchmark motion planners in the second, third, and fourth scenarios. The randomized behavior of the sampling-based planner may also be another factor for the low success rate of the RRT X and DRGBT algorithms in dynamic environments. In terms of total runtime, the proposed method is also smaller than the benchmark motion planners, which means that our method can reach the target point from the starting point faster. In terms of path lengths in Cartesian space, the search path of our method is significantly smaller than those of the benchmark motion planners. The main reason is that the randomized behavior of the RRT X algorithm and the DRGBT algorithm may result in unpredictable performance, especially with a limited number of samples.

Comparison of Trajectory Optimization
For trajectory optimization, we benchmarked our method against existing state-ofthe-art trajectory optimization algorithms, the CHOMP algorithm [16], and the TrajOpt algorithm [17], which are widely used for manipulator trajectory optimization. We created a simulated environment consisting of multiple staggered static obstacles, and the robot needs to move from a starting position to a goal position safely and efficiently. For fairness, our method and the benchmark optimization algorithms are validated in the same experimental environment. Figure 10 qualitatively shows the experimental results of our method and the benchmark optimization algorithms. Figure 10a shows the whole trajectory executed by the robot. It can be seen that the motion trajectory generated by our method is smoother, and the overall trajectory length is shorter. Figure 10b also shows the screenshots of the robot performing the complete experiment. We quantitatively compare our method with the benchmark optimization algorithms in terms of success rate, single computation time, and trajectory smoothness. We perform 300 experiments for each algorithm, and the average value of each evaluation index was counted as shown in Table 4. We observe that our method can obtain collision-free trajectories from the starting point to the target point in all experiments, while the CHOMP algorithm and TrajOpt algorithm have a failure rate of 27% and 17%, respectively. It is mainly because CHOMP directly generates the initial trajectory from the starting point to the target point without considering obstacles. Then, the CHOMP algorithm iteratively adjusts the initial trajectory, but the gradient descent may fall into the local minimum of the cost function. The TrajOpt algorithm represents a trajectory in discrete-time form, may require post-optimization for execution, and may not stay collision-free. Our method adopts a replanning strategy that optimizes only one segment of the trajectory at each time step to avoid local minima. In addition, the initial trajectory generated by our path search module is inherently collision-free, which further improves the success rate of the optimization algorithm. In terms of single computation time, our method only takes about 0.216 ms, while the CHOMP and TrajOpt algorithms require 56.7 ms and 69.2 ms, respectively. The main reason is the CHOMP algorithm requires multiple iterations to optimize the entire trajectory and the TrajOpt algorithm needs to evaluate the fine discretization costs of the trajectory. The proposed method optimizes the replanned trajectory only within the search horizon instead of optimizing all trajectory points, thus saving optimization time. The global trajectory may become infeasible due to the uncertainty in the motions of the dynamic obstacles, so optimizing the global trajectory may be ineffective. In terms of total optimization time, our method only takes about 0.0635 s, while the CHOMP and TrajOpt algorithms take about 5.574 s and 1.94 s, respectively-much more than the optimization time of our method. Although the single iteration time of CHOMP is shorter than that of TrajOpt, the total optimization time is longer, which may be due to the higher number of iterations than TrajOpt. Then, combined with the trajectory smoothness information, it can be found that our method can generate a smoother trajectory even with less optimization time.

Real-World Experiments
In this section, we validate the proposed method in the real world. A human partner holding an obstacle gradually approaches the moving robot in the current experimental setup. Figure 11 qualitatively shows the experimental results of online collision avoidance for physical robots. The first and second rows show the experimental results of the end-effector and links adjusting their trajectories online to avoid collision with obstacles, respectively. Figure 12 also quantitatively shows the trajectory changes of the end effector and a constrained interest point in the X, Y, and Z directions. From Figure 12a,b, it can be seen that the motion trajectories of the end effector and the constrained interest point are effectively adjusted and gradually converge to the target position (red dashed box).
In addition, we also deploy the proposed algorithm in a robot pick-place task. More specifically, when the robot picks up an item and places it in a placement box, the human partner interferes with the robot, and the robot need replan the motion trajectory to the placement box. Figure 13 shows a series of screenshots of the experimental process. Figure 13a indicates that the robot starts to pick an item; Figure 13b indicates that the human partner interferes with the robot when the robot is carrying the item; Figure 13c indicates that the robot adjusts its motion trajectory to avoid the human partner (red dotted box); and Figure 13d indicates that the robot successfully avoids the obstacle and transports the item to the placement box. Figure 13e also quantitatively shows the trajectory change process of the end-effector.

Conclusions
In this paper, we propose an efficient and complete online trajectory generation method to help a manipulator autonomous planning in dynamic environments. The overall framework is built bottom-up, and the trajectory generation problem is decoupled into front-end kinodynamic path search and back-end B-spline trajectory optimization modules. Given the current end-effector state, a series of discrete control inputs, and the links constraints, the front-end path search module generates a safe, smooth, and time-minimized initial trajectory. Then, a trajectory optimization method using the B-spline convex hull property is designed to increase the clearance between the trajectory and obstacles and improve the smoothness. To avoid the collision between the links and obstacles and the links themselves, the constraint-relaxed links collision avoidance method is integrated into the back-end optimization step by solving a standard quadratic programming problem. Finally, a complete real-time collision-free motion planning framework is developed to improve the safety and efficiency of robots working in unstructured dynamic environments.