Next Article in Journal
Dual-Branch CNN–Mamba Method for Image Defocus Deblurring
Previous Article in Journal
Large-Deflection Mechanical Modeling and Surrogate Model Optimization Method for Deformation Control of Flexible Pneumatic Structures
 
 
Font Type:
Arial Georgia Verdana
Font Size:
Aa Aa Aa
Line Spacing:
Column Width:
Background:
Article

A Human–Robot Skill Transfer Strategy with Task-Constrained Optimization and Real-Time Whole-Body Adaptation

State Key Laboratory of Robotics and System, Harbin Institute of Technology, Harbin 150001, China
*
Authors to whom correspondence should be addressed.
Appl. Sci. 2025, 15(6), 3171; https://doi.org/10.3390/app15063171
Submission received: 12 February 2025 / Revised: 8 March 2025 / Accepted: 10 March 2025 / Published: 14 March 2025

Abstract

:
Human–robot skill transfer enables robots to learn skills from humans and adapt to new task-constrained scenarios. During task execution, robots are expected to react in real-time to unforeseen dynamic obstacles. This paper proposes an integrated human–robot skill transfer strategy with offline task-constrained optimization and real-time whole-body adaptation. Specifically, we develop the via-point trajectory generalization method to learn from only one human demonstration. To incrementally incorporate multiple human skill variations, we encode initial distributions for each skill with Joint Probabilistic Movement Primitives (ProMPs) by generalizing the template trajectory with discrete via-points and deriving corresponding inverse kinematics (IK) solutions. Given initial Joint ProMPs, we develop an effective constrained optimization method to incorporate task constraints in Joint and Cartesian space analytically to a unified probabilistic framework. A double-loop gradient descent-ascent algorithm is performed with the optimized ProMPs directly utilized for task execution. During task execution, we propose an improved real-time adaptive control method for robot whole-body movement adaptation. We develop the Dynamical System Modulation (DSM) method to modulate the robot end-effector through iterations in real-time and improve the real-time null space velocity control method to ensure collision-free joint configurations for the robot non-end-effector. We validate the proposed strategy with a 7-DoF Xarm robot on a series of offline and real-time movement adaptation experiments.

1. Introduction

Recent advances in flexible manufacturing have heightened demands to release traditional industrial robots from repetitive tasks under controlled conditions. Robots are expected to complete tasks with humans in a shared workspace named human–robot coexistence [1]. Human–robot skill transfer plays a key role for the robot to learn new skills from humans and adapt to new, constantly changing task scenarios. In this context, Learning from Demonstration (LfD) [2,3,4,5,6] frameworks have been proposed, where robots learn tasks by observing and imitating human demonstrations. A core component of LfD are the Movement Primitives (MPs), which are well-established movement encoding methods with temporal and spatial modulation; complex tasks are accomplished by sequencing and combining different MPs [7].
For MPs learning, the basic-level capability refers to generalizing the task start, end, and intermediate points (via-points) under varying conditions while preserving the demonstration features, even for points outside the range (extrapolation ability); the research methods generally involve deterministic and probabilistic approaches. Deterministic methods, such as Dynamic Movement Primitives (DMPs) [8], model movements with forced nonlinear differential equations. DMPs generalize from a single demonstration but lack support for via-points or multiple trajectory statistics, and their extrapolation depends on start-to-goal relative distances. Probabilistic methods, such as Gaussian Mixture Models with Gaussian Mixture Regression (GMM-GMR) [9], Probabilistic Movement Primitives (ProMPs) [10], and Kernelized Movement Primitives (KMPs) [11], encode the mean and variance of multiple trajectories. GMM-GMR models the joint probability distributions using GMM and derives the task execution parameters via GMR, but struggles with via-point adaptation and extrapolation. ProMPs use basis functions and hierarchical Bayesian models to generalize via-points through conditional probability, but require extensive demonstrations and face extrapolation limits. KMPs avoid explicit basis functions, offering a non-parametric approach for high-dimensional tasks. They generalize via-points effectively and exhibit robust extrapolation, but suffer from numerical instability and high computational costs as timesteps increase. To combine deterministic and probabilistic approaches, Probabilistic Dynamic Movement Primitives (ProDMPs) [12] convert DMPs numerical integration into basis functions for encoding ProMPs, utilizing deep neural networks (NNs) for nonlinear trajectory conditioning. ProDMPs enhance via-point adaptation through NNs but exhibit limited extrapolation when via-points exceed the demonstration range.
The core level of MPs’ learning focuses on adapting MPs to unknown task scenarios, termed MPs adaptation. These scenarios involve task constraints, expressed as inequality equations in Joint and Cartesian space, including position/velocity limits, obstacles, and virtual walls. Extensive research on MPs adaptation has been conducted under various frameworks. Deterministic methods, such as DMPs, enhance obstacle avoidance by applying repulsive forces in the acceleration domain for multiple obstacles [13] and non-point obstacles [14]. Position limits are addressed through exogenous state transformations [15], while velocity limits are managed via time-governed differential equations [16]. DMPs compute efficiently as constraints are directly integrated, avoiding iterative processes. However, they struggle with simultaneous Joint and Cartesian space constraints, particularly under multiple repulsive forces, leading to local minima. DMPs also lack adaptability to nonlinear constraints and exhibit limited generalization, as they are tailored to specific trajectories, restricting flexibility in goal-reaching. Probabilistic methods, such as ProMPs, encode variations through covariance matrices to satisfy task constraints. For example, ref. [17] combines ProMPs with Gaussian multiplications, introducing primitives to handle obstacles, but this requires anticipating all possible situations. Alternatively, ref. [18] optimizes ProMPs by excluding regions using the Kullback–Leibler (KL) divergence and reward functions, with policies derived via Relative Entropy Policy Search (REPS) [19], while ref.  [20] minimizes the Mahalanobis distance to the original ProMPs. However, these methods focus solely on the end-effector, neglecting collisions for other links and constraints, like virtual walls or position/velocity limits. They are efficient but struggle to handle multiple combined constraints. To address these limitations, ref. [21] integrates multiple task constraints in Joint and Cartesian space into a unified probabilistic framework, optimizing ProMPs using the gradient ascent-descent algorithm. However, the covariance matrix of ProMPs basis functions can face symmetric positive-definite issues during iteration, and optimizing timesteps satisfying task constraints increases computational cost. KMPs replace explicit basis functions with a kernel matrix, extending to linear constraints (LC-KMPs) [22] via quadratic programming (QP). Further, ref. [23] handles nonlinear constraints (EKMPs) by approximating them with the Taylor expansion to solve with QP. While EKMPs improve efficiency over [21], they face numerical instability and high computational costs as timesteps grow, and their reliance on Taylor expansion for nonlinear constraints limits the adaptability due to inaccurate estimations.
When a robot adapts learned skills to new task-constrained scenarios, unexpected obstacles or human partners may appear during execution. Due to the significant replanning time required for MPs’ adaptation, real-time reactions are hindered, limiting applications to offline and static environments. Thus, real-time movement adaptation is crucial for dynamic environments, often categorized into global and local strategies. Global methods focus on convergence towards the goal. For instance, Rapidly Exploring Random Trees (RRT)* utilizes space-filling trees to find feasible paths with automatic sampling length [24], and Covariant Hamiltonian Optimization for Motion Planning (CHOMP) employs gradient optimization for motion planning [25]. These methods exhibit fast convergence but struggle in dynamic environments. Model Predictive Control (MPC) reduces adaptation to a finite time horizon [26], but defining cost functions is challenging. Reinforcement learning-based adaptation trains end-to-end strategies [27], but it lacks reliable convergence guarantees and requires extensive training. Local methods generate nonlinear control fields evaluated in real-time. Artificial Potential Fields (APF) [28] represent obstacles as repulsive fields, offering computational efficiency, but are prone to local minima. Harmonic Potential Functions (HPF) [29] ensure no critical points in free space but lack analytical solutions. Inspired by HPF, Dynamical System Modulation (DSM) [30] ensures convergence using fluid dynamics around obstacles, with stability proven via Lyapunov theory and extensions to handle multiple obstacles [31], real-time sensing [32], and rotation dynamics [33]. DSM demonstrates high efficiency for real-time obstacle avoidance in dynamic environments, providing closed-form solutions without local minima. However, DSM assumes no initial collisions and requires dynamic modulation in advance. If new positions from the last timestep encounter collisions during convergence to the goal, the robot may stop, limiting real-time performance. Additionally, DSM treats the robot as a zero-dimensional point, focusing only on the end-effector and ignoring collisions with other rigid links. In this context, null space velocity control [34] and task priority control [35] are proposed for non-end-effector real-time obstacle avoidance. However, they assume the obstacles to be stationary and only consider robot non-end-effector portions. The work [36] combines DSM with null space velocity control for whole-body adaptation, assigning retreat velocities to avoid collisions. However, this approach performs adaptation only once, and the retreat velocity norm limits non-end-effector competence.
This paper proposes a unified human–robot skill transfer strategy that integrates offline task-constrained optimization and real-time whole-body adaptation. The contributions are outlined as follows:
1.
Considering basic-level MPs learning, to facilitate human–robot skill transfer, we develop the via-point trajectory generalization method enabling the robot to learn and generalize smooth trajectories from only one human demonstration. To incrementally incorporate multiple human skill variations, we encode initial distributions for each skill with Joint ProMPs by generalizing the template trajectory with discrete via-points and deriving corresponding inverse kinematics (IK) solutions.
2.
Considering core-level MPs adaptation, given initial Joint ProMPs, we propose an effective task-constrained probabilistic optimization method integrating multiple task constraints in Joint and Cartesian space via a double-loop optimization. We decouple the ProMPs into Gaussians at each timestep, only optimizing those timesteps violating task constraints, and update ProMPs independently. Given optimized ProMPs, we propose the analytical movement adaptation method to utilize ProMPs directly for task execution with a threshold.
3.
Considering real-time movement adaptation, during task execution, we propose an improved robot whole-body movement adaptation method, combining offline task-constrained optimization. For the robot end-effector, we incorporate an offline-optimized trajectory for dynamic modulation and develop DSM by performing multiple modulation iterations until safe; for the robot non-end-effector, we integrate the offline-optimized Joint ProMPs to improve the real-time null space velocity control method to ensure collision-free joint configurations through iterations.
4.
We conduct both offline and online movement adaptation experiments with comparative results to validate the effectiveness of the proposed strategy.

2. Via-Point Trajectory Generalization and Initial Distributions Encoding

2.1. Learning New Skills in Cartesian Space with Via-Point Trajectory Generalization

To endow the robot with skill diversity for obstacle avoidance, the human provides three modes, namely, moving above, left, and right around the obstacle. For each mode, only one human demonstration is required, and we utilize a markerless vision capture system, the Kinect FORTH [37], to track human hand centroid movements with position and orientation sets x , y , z , o x , o y , o z , o w recorded in T timesteps. We adopt the Moving Average Filter (MAF) to ensure smoothness and transform to the robot base frame via hand-eye calibration.
To learn skills denoted as positions and quaternions in Cartesian space, we express the demonstration trajectory as an integration of linear elementary trajectory h and nonlinear shape modulation f, and the schematic diagram of the proposed via-point trajectory generalization is shown in Figure 1.
For 3D positions, where each degree of freedom (DOF) is independent, we derive position sets p with T timesteps in one DOF as follows:
y p = h p + f p
The elementary trajectory h p is a line directly connecting the start point p 1 and the end point p end , with the shape modulation f p defined as follows:
f p = p p 1 + r p 1 p end
where r is the scale factor denoting the proportion of projective points on the line h p ,
r = p 1 p . p 1 p end p 1 p end . p 1 p end
If no intermediate via-points are required to generalize, we directly generate new projective points between the new start point p 1 new and end point p end new according to r, and add learned shape modulation f p to generalize new trajectory sets p new :
p new = p 1 new + r p 1 new p end new + f p
Once a via-point t via , p via is needed, we scale the shape modulation with the coefficient k:
k = p via p 1 new + r p 1 new p end new f p 1
Subsequently, new trajectory sets p new are generalized upon:
p new = p 1 new + r p 1 new p end new + k f p
For quaternion sets, considering dimensions are affected by each other in orientation, we define the trajectory with:
y o = h o × f o
where o = o w , o x , o y , o z represents the quaternion sets, and × denotes quaternion multiplication. The elementary trajectory h o is derived using the spherical linear interpolation (SLERP) as follows:
h o = sin 1 α o β sin β h o 1 + sin α o β sin β h o end
where cos β = h o 1 . h o end , α o is the ratio of SLERP, ranging from 0 to 1. h o 1 and h o end are the start and end points of the elementary trajectory:
h o 1 = y o 1 × f o 1 1 , h o end = y o end × f o end 1
In order to learn f o from demonstration, we set α o to distribute uniformly over T timesteps and set f o 1 , f o end to 1 , 0 , 0 , 0 . Once new start quaternion sets o 1 new and end quaternion sets o end new are required during generalization, we compute a new elementary trajectory h o new using Equations (8) and (9), with learned shape modulation f o superimposed to generate a new orientation trajectory y o new using Equation (7). Note that there is no need to generalize via-points in orientation, as we can achieve via-points generalization in 3D positions while keeping the original orientations.

2.2. Model Joint Space Distributions with Probabilistic Movement Primitives

Probabilistic Movement Primitives (ProMPs) [10] are probabilistic methods based on hierarchical Bayesian models to encode variances in Joint or Cartesian space. To promote integrating all nonlinear task constraints for the robot end effector and other links analytically, we model the Joint space distributions with ProMPs. Given the positions and orientations x t , y t , z t , o t i = 1 N generalized in Section 2.1, we obtain all feasible joint sets q t i = 1 N = F IK x t , y t , z t , o t i = 1 N by the Jacobian inverse kinematic (IK) algorithm [38]. q t R D , t 0 , T , N is the number of demonstrations, and D is the joint DOFs.
ProMPs model joint positions as a linear combination of Gaussian basis functions Ψ t and a weight vector w i under the presence of a zero-mean Gaussian noise ε n ,
q i t = Ψ t T w i + ε n
Joint positions in one demonstration are expressed with the weight vector w i via linear ridge regression [39]. Using parameters θ = μ w , Σ w to estimate the weight vector w with Gaussian distributions p w ; θ = N w | μ w , Σ w , the probability of observing a trajectory τ given θ becomes:
p τ ; θ = p τ | w p w ; θ d w = t N q t Ψ t T μ w , Ψ t T Σ w Ψ t + Σ n
where μ w and Σ w denote the mean and covariance of the weight vector, which are derived by maximum likelihood estimation, and Σ n is the covariance of the Gaussian noise. When it comes to new task conditions Q * = q * , Σ q * , where q * and Σ q * are the expected mean and covariance of the joint positions at time t, the updated parameters θ new = μ w new , Σ w new can be computed based on Bayesian theory as follows:
p w Q * N Q * Ψ * T w , Σ q * p w
μ w new = μ w + Σ w Ψ * Σ q * + Ψ * T Σ w Ψ * 1 q * Ψ * T μ w Σ w new = Σ w Σ w Ψ * Σ q * + Ψ * T Σ w Ψ * 1 Ψ * T Σ w
where Ψ * denotes the basis function at time t. The Joint positions under new task conditions p new are sampled from Gaussian distributions as follows:
w new N μ w new , Σ w new p new = Ψ T w new

3. Offline Optimization and Adaptation Under Nonlinear Task Constraints

After generalizing feasible collision-free trajectories with initial ProMPs distribution encoded, we put forward the optimization method under various nonlinear task constraints. Given optimized ProMPs distribution, we present the movement adaptation method for task execution. We formulate the constrained optimization problem in Section 3.1, provide analytical expressions of task constraints in Section 3.2, and summarize the complete procedure in Section 3.3.

3.1. Problem Formulation

Given initial ProMPs distribution p 0 w = N w ; μ w 0 , Σ w 0 , the main objective is to obtain the optimized ProMPs distribution p w * = N w * ; μ w * , Σ w * , which is as close as possible to the original distribution while satisfying task constraints. We use the Kullback–Leibler divergence to represent the similarity between optimized and initial distributions owing to its analytical tractability, while expressing the constraints with cumulative distribution functions. However, in this paper, considering ProMPs model every timestep as an independent Gaussian distribution, we optimize the parameters of each timestep separately. Specifically, we convert the initial distribution p 0 w = N w ; μ w 0 , Σ w 0 to the mean and standard deviation μ t 0 , σ t 0 of each timestep t as follows:
μ t 0 = Ψ t T μ w 0 σ t 0 = sqrt Ψ t T Σ w 0 Ψ t
For each timestep t, the new mean and standard deviation μ t * , σ t * are optimized under task constraints separately. Let c k , t = C μ t , σ t denote the function of the k t h task constraints related to μ t , σ t , and F c k , t μ t , σ t denote the cumulative distribution function (CDF) of the random variable defined by c k , t , then we can rewrite the optimization problem as a Lagrangian function:
L μ t , σ t , λ k , t = D KL N q t ; μ t , σ t | | N q t 0 ; μ t 0 , σ t 0 + k , t λ k , t α k , t F c k , t μ t , σ t
where α k , t is the confidence level to represent the probability satisfying task constraints P μ t , σ t H c k , t α k , t , and H represents the one-side or two-side inequality constraints; the optimized mean and standard deviation μ t * , σ t * are derived by gradient ascent-descent methods as follows:
μ t * , σ t * , λ k , t * = max λ k , t 0 min μ t , σ t L μ t , σ t , λ k , t
Subsequently, given μ t * , σ t * , we implement another optimization step to obtain the optimized ProMPs distribution μ w * , Σ w * with:
μ w * = min μ w Ψ t T μ w μ t * Σ w * = min Σ w Ψ t T Σ w Ψ t σ t *
In Section 3.2, we formulate every task constraint by defining the function c k , t = C μ t , σ t related to μ t , σ t , expressing the inequality constraint function H and approximating the cumulative distribution function F c k , t μ t , σ t ; the optimized results are then numerically iterated based on automatic differentiation.

3.2. Constraints Definitions in Joint Space and Cartesian Space

A robot may encounter constraints in Joint space and Cartesian space at one time. Joint space constraints depend directly on μ t , σ t of each joint freedom, which usually refer to the motion range of the robot’s links. The Cartesian space constraints depend on the nonlinear forward kinematics function of μ t , σ t , standing for the 3D positions of the end effector or any other interest points on the robot links. In the following, we formulate several key constraints commonly used in obstacle avoidance, like the joint range limit, waypoints, hyperplane, and repellers.

3.2.1. Joint Range Limit

The joint range limit indicates the limit motion range in joint space based on the robot’s configuration or the user’s expectation. Considering the limit motion range is usually ensured with the forward kinematics (FK) function. In this paper, we focus on retaining the original joint range limit learned with ProMPs as much as possible. We formulate the constraint function as c k , t = q t μ t , σ t , and denote the two-side constraint with:
P μ t , σ t q k , t min < q t μ t , σ t < q k , t max α k , t
where k denotes the kth joint freedom, α k , t is the confidence level, and q k , t min and q k , t max are the minimum and maximum of the joint positions from demonstration, respectively. Due to the Gaussian distributions of the joint positions, we calculate the difference between the CDF of the maximum and minimum points as follows:
F c k , t μ t , σ t = F q k , t max μ t σ t F q k , t min μ t σ t
where F denotes the CDF of the Gaussian distributions.

3.2.2. Waypoints

Waypoints are Cartesian positions X ¯ t on the robot’s end effector or any other interest points that we expect the robot to reach at timestep t or a temporal interval T 0 , T . Waypoints are defined in the Cartesian space with the FK function T to associate with some DOFs of the joint positions. Based on the distance square between the expected points and the actual points, we denote the waypoint constraint function as follows:
c k , t = | T q t μ t , σ t X ¯ t | 2
And express the one-side constraint with:
P μ t , σ t | T q t μ t , σ t X ¯ t | 2 d k , t 2 α k , t
where d k , t is the distance threshold and α k , t is the confidence level. Due to the nonlinear nature of the constraint function, we can not derive the exact distribution of c k , t with Bayesian theory. To tackle the problem, the Unscented Transform [40] method is adopted in [21]. However, the initial factor deeply influences the estimation result—when the mean and standard deviation change, the factor can not reflect the variance of the constraint function precisely. Alternatively, in this paper, we utilize moment matching to estimate the c k , t distribution:
S = { c k , t q 1 , q 2 , , q D , q k { μ k 2 σ k , μ k + 2 σ k } } E c k , t Mean S V c k , t Var S
where D is the DOF number required to express the constraint function. We utilize 2x the standard deviation to constitute the set S and compute the mean and variance based on moment-matching. Given that the distribution of the random variable c k , t is a generalized- χ 2 , we approximate it with a simpler Gamma distibution to represent the CDF in an analytical way. The shape α and rate β of the Gamma distribution are derived as follows:
c k , t Γ α , β α = E c k , t 2 / V c k , t , β = E c k , t / V c k , t
The CDF of the Gamma distributions are then computed using F c k , t = F d k , t ; Γ α , β to represent the waypoints constraint probability via Equation (22).

3.2.3. Hyperplane

During motion, the robot is required to be confined to specific planes called the hyperplane or the virtual wall. Hyperplane constraints are defined in the Cartesian space related to the positions of the robot’s end effector or any other interest points. We can formulate a hyperplane constraint function as follows:
c k , t = n k , t T T q t μ t , σ t b k , t
where n k , t and b k , t are the normal and bias vectors of the hyperplane. The one-side inequality constraint is expressed with:
P μ t , σ t n k , t T T q t μ t , σ t b k , t 0 α k , t
The distribution of c k , t is a linear transformation of the robot’s 3D positions T q t μ t , σ t ; however, T q t μ t , σ t is nonlinear in μ t , σ t . We thus use our moment-matching method via Equation (23) to obtain the estimated mean E T and variance V T . The overall distribution of c k , t can be derived by:
c k , t N n k , t T E T b k , t , n k , t T V T n k , t
Subsequently, The CDF of c k , t is computed using Gaussian distributions to represent the hyperplane constraint probability via Equation (26). Note that we can impose hyperplane constraints at a timestep or a temporal interval, and combine several hyperplane constraints to represent more complex convex domains.

3.2.4. Repellers

The robot needs to avoid obstacles in the environment while performing tasks. Constraints between the robot and obstacles are called repeller constraints. Repellers constraints are imposed not only on the robot end effector but also on any other danger points on the robot link. In this paper, we choose three segments on the robot link in each direction: front, back, left and right, and use cylinders to envelop obstacles. Considering we only need to consider danger situations where the robot is apt to collide, we define the danger distance to express the repellers constraints. For three segments in four directions, we compute the shortest distance between segments on the robot link and the axis segment, with the minimum distance d min found afterward. If the minimum distance is greater than the safety radius R safe of obstacles, the danger distance d danger is 0, and if else, the danger distance is computed as d danger = R safe d min . We formulate the repellers constraint function as follows:
c k , t = 0 , d min R safe R safe d min T q t μ t , σ t , d min < R safe
Furthermore, we express the one-side inequality constraint to show the probability that the minimum distance is greater than the safety radius:
P μ t , σ t c k , t 0 α k , t
The distribution of c k , t is nonlinear in μ t , σ t . We thus utilize our moment-matching method with Equation (23) to obtain the mean and variance of c k , t . The CDF of c k , t is then computed using Gaussian distributions to denote the repellers constraint probability with Equation (29).

3.3. Optimization and Adaptation Procedure

3.3.1. Optimization Procedure

Based on the Lagrangian function in Equation (16) and constraints function defined in Section 3.2, we concretize the optimization procedure to obtain the optimal parameters. For notational brevity, we use the notation θ 0 = μ t 0 , σ t 0 and θ = μ t , σ t to denote the intial parameters and optimized parameters. We can rewrite the Lagrangian function in Equation (16) as follows:
L θ , λ k , t = D KL θ | | θ 0 + k , t λ k , t α k , t F c k , t θ
The term D KL θ | | θ 0 denoting the KL divergence between two univariate Gaussian distributions is further derived by:
D KL θ | | θ 0 = 1 2 + log σ 0 log σ + σ 2 + μ μ 0 2 2 σ 0 2
We express the optimized parameters derived in Equation (17) as follows:
θ * , λ k , t * = max λ k , t 0 min θ L θ , λ k , t
To solve the problem, we utilize a double-loop algorithm based on the gradient ascent-descent method. For the outer-loop, we optimize λ k , t with the gradient quasi-ascent steps. The Exponential Method of Multipliers (EMM) [41] method is suitable to solve the inequality constraints of the Lagrangian function via exponential factors. We can express the optimized λ k , t after each iteration as λ k , t s + 1 = λ k , t s . exp η k λ k , t L , where η k decides the convergence speed; the gradient of λ k , t is obtained by λ k , t L = α k , t F c k , t θ . For the inner-loop, we optimize θ with the gradient descent method. The LBFGS method [42] is tailored for solving nonlinear optimization problems with low computing and memory costs. We perform several LBFGS steps in θ , with the gradient θ L computed using the automatic differentiation frameworks in Tensorflow [43]. The convergence conditions are checked after each iteration of LBFGS to speed up convergence. After we obtain the optimized μ t * , σ t * , we directly perform several LBFGS steps to obtain the optimized μ w * , Σ w * based on Equation (18).

3.3.2. Adaptation Procedure

Subsequently, given optimized ProMPs distribution μ w * , Σ w * , we present the complete offline robot movement adaptation procedure. During the learning phase, the human first demonstrates three skills to avoid a single obstacle. The human hand centroid positions and orientations are equivalent to the robot end effector trajectories. For each mode, we generalize trajectory sets to the same start and end points, normalized to the same timestep T, to formulate the template trajectory. Aiming to generate feasible collision-free trajectory sets, we choose feature 3D via-points with equal intervals to generalize the initial trajectories (see Section 2.1). We check the collisions and fine-tune the via-points with threshold to construct initial collision-free trajectories x , y , z , o x , o y , o z , o w i = 1 N and obtain corresponding joint position sets q i = 1 N . We encode the initial joint distributions with ProMPs (see Section 2.2) as p 0 w = N w ; μ w 0 , Σ w 0 . To speed up optimization, we first find feature timesteps that actually violate task constraints, then impose task constraints at those timesteps t feature and optimize the ProMPs distributions (see Section 3.3.1). The optimized ProMPs p w * = N w * ; μ w * , Σ w * , the mean and standard deviation p q t = N q t ; μ t * , σ t * are treated as safe joint distributions.
During task execution, the robot first determines the start and end point by visual information, next randomly selects the running mode from three modes k a b o v e , l e f t , r i g h t , then samples the via-points with Gaussian distributions as p v i a = N μ x y z , σ x y z . The mean μ x y z and standard deviation σ x y z are computed from the initial via-points range. The trajectory is then generalized and fine-tuned to ensure safety according to the sampled via-point. The optimized μ t * and σ t * represent the joint limit range as q t safe μ t * 2 σ t * , μ t * + 2 σ t * with the initial joint value denoted as q t 0 = Ψ t T w * . The joint limited range and initial value are passed to the Jacobian IK algorithm to test whether the solution is calculated successfully. The succeed number of all feature timesteps decides whether the robot can perform the task successfully; that is, if the succeed number is more than a threshold, the robot directly runs the generalized trajectory, if else, the robot continues sampling until success. To obtain the threshold automatically, we perform equal space sampling from the initial via-points range, record the success result and corresponding succeed number, then determine the threshold at the turning point between success and failure. Note that if there are no feature timesteps that violate the task constraints, the robot only needs to generalize the trajectory without optimizing the distribution. Meanwhile, if any of the three modes can not be optimized, that is, the probability satisfying the task constraints always equals 0, we eliminate the mode from the optional modes. The complete offline optimization and adaptation procedure is outlined in Algorithm 1.
Algorithm 1 Robotic offline task constrained optimization and adaptation algorithm.
Input:
 •
Obstacles information obs x , obs y , obs z , R
 •
Mode template trajectory x , y , z , o x , o y , o z , o w k = 1 3
 •
Global start point and end point p 1 , p end
Output: Task execution trajectory x , y , z , o x , o y , o z , o w
 □
Generalize collision-free trajectories and encode initial ProMPs
 •
Choose and fine-tune via-points to generate N safe trajectories
 •
Obtain feasible joint sets q i = 1 N and encode with ProMPs μ w 0 , Σ w 0
 □
Optimize ProMPs under nonlinear task constraints
 •
Find feature timesteps t feature that violate task constraints
 •
Compute μ t 0 , σ t 0 using Equation (15)
 •
Derive c k , t , H and F c k , t μ t , σ t (see Section 3.2) to construct Equation (16)
 •
θ 0 θ 0 , λ k , t 0 defined in Equation (30)
 •
Optimisation 1: Learnable parameters 1:  μ t * , σ t * , λ k , t * in Equation (32)
repeat
  repeat
    θ s + 1 LBFGS L , θ L , θ s , λ k , t r
  until inner-loop-condition
   λ k , t r + 1 λ k , t r . exp η k α k , t F c k , t θ
until converged
 •
Optimisation 2: Learnable parameters 2:  μ w * , Σ w * in Equation (18)
repeat
   μ w s + 1 , Σ w s + 1 LBFGS μ w s , Σ w s , μ t * , σ t *
until converged
 •
Obtain optimized joint limit range and initial joint value q t safe μ t * 2 σ t * , μ t * + 2 σ t * , q t 0 = Ψ t T w *
 •
Determine the succeed threshold (see Section 3.3.2)
 □
Task Execution
repeat
 •
Sample modes k a b o v e , l e f t , r i g h t
 •
Sample via-points p v i a = N μ x y z , σ x y z
 •
Generalize and fine-tune trajectory
 •
Test succeed number of t feature via Jacobian IK
until success
 •
Run trajectory for task execution

4. Real-Time Whole-Body Movement Adaptation

Following offline movement adaptation under the static scene, the robot needs to react in real-time to immediate changes in the dynamic environment, for instance, multiple stationary or moving obstacles, new target points, etc. The analytical optimization process in Section 3 can not fulfill the real-time demands. In this context, real-time dynamic modulation upon velocity control provides a promising solution. To be specific, the offline task execution trajectory can be treated as an initial dynamic system; the objective is to obtain the modulated system for real-time robot whole-body movement adaptation including end-effector and other links (non-end-effector). We present the end-effector and non-end-effector adaptation methods in Section 4.1 and Section 4.2.

4.1. Real-Time End-Effector Adaptation

In this paper, we assume O convex obstacles around the robotic arm. Movement adaptation for the end-effector is achieved in Cartesian space by modulating the 3D positions while keeping the original orientations. Consider a state variable ξ R 3 denoting the end effector 3D positions; its temporal dynamics is governed by an autonomous (time-invariant) and continuous function f ξ as follows:
ξ ˙ = f ξ , f : R 3 R 3
Given the initial state ξ 0 , the trajectory can be iteratively computed by integrating f ξ according to:
ξ t = ξ t 1 + f ξ t 1 d t
where t is a positive integer and d t is the integration time step.
For each obstacle k k = 1 O , we define a distance function Γ ξ k with a continuous first-order partial derivative and monotonically increasing value away from the obstacle center ξ k c to distinguish the space around an obstacle into three regions:
Exterior points : H e = ξ k R 3 : Γ ξ k > 1 Boundary points : H b = ξ k R 3 : Γ ξ k = 1 Interior points : H i = ξ k R 3 : Γ ξ k < 1
To envelop obstacles, we construct the ellipsoid-shaped function Γ ξ k as follows:
Γ ξ k = i = 1 3 ξ k ξ k c / a i 2 p i k
where a i is the i-th axis length and p i k is a positive curvature parameter.
As shown in Figure 2, we parametrize the k-th obstacle (static or moving) as a vector P k = ξ k c , a k , ξ ˙ k L , ξ ˙ k R , where a k is the axis length, ξ ˙ k L and ξ ˙ k R are the linear and angular velocity with respect to the center ξ k c . Note that the static obstacle is a special case where the angular and linear velocities equal 0. Real-time obstacle avoidance for the end effector is achieved via a dynamic modulation matrix M ξ k R 3 × 3 given that the local modulation matrix will not bring in new extrema with a cheap computational cost and closed-form solution [44]. To be generic, we derive the modulated trajectory from the original dynamics system (DS) in Equation (33) as follows:
ξ ˙ k = M ξ k f ξ k ξ ˜ ˙ k + ξ ˜ ˙ k with ξ ˜ ˙ k = ξ ˙ k L + ξ ˙ k R × ξ ξ k c
where ξ ˜ ˙ k denotes the relative velocity induced by the obstacle motion. The modulation matrix M ξ k is composed of the orthogonal basis matrix E ξ k and the corresponding eigenvalue matrix D ξ k as follows:
M ξ k = E ξ k D ξ k E ξ k 1
For notational brevity, we define the relative position to the obstacle center ξ ˜ k = ξ ξ k c . Based on Equation (36), we derive the deflection hyperplane at point ξ with the normal vector n ξ ˜ k denoted as the distance function gradient d Γ ξ / d ξ :
n ξ ˜ k = Γ k ξ ˜ k ξ 1 , Γ k ξ ˜ k ξ 2 , Γ k ξ ˜ k ξ 3 T
Given the normal vector, we formulate the basis matrix as follows:
E ξ k = n ξ ˜ k , e 1 ξ ˜ k , e 2 ξ ˜ k
where the tangent vectors are orthonormal to the normal as follows:
e 1 ξ ˜ k = n ξ ˜ k 2 , n ξ ˜ k 1 , 0 e 2 ξ ˜ k = n ξ ˜ k × e 1 ξ ˜ k
We define the related diagonal eigenvalue matrix D ξ k to modify the normal and tangent directions independently with eigenvalues. The normal eigenvalue is selected to ensure impenetrability, that is, the eigenvalues should be zero on the obstacle surface. Meanwhile, the tangent eigenvalue should have the same magnitude in each direction to increase the velocity for moving around the obstacle. We formulate D ξ k as follows:
D ξ k = diag λ n ξ k , λ e ξ k , λ e ξ k
Considering the safety margin, reactivity, and tail-effect in [30], we define the eigenvalue as follows:
λ n ξ k = 1 1 Γ ξ k 1 ρ k n ξ ˜ k / η k T ξ ˙ k < 0 1 n ξ ˜ k / η k T ξ ˙ k 0 λ e ξ k = 1 + 1 Γ ξ k 1 ρ k
where η k is the safety magnitude for each direction, and ρ k is the reactivity parameter to determine the velocity modulation amplitude, that is, the larger the ρ k , the earlier a robot reacts to the obstacle. To remedy the tail-effect, we check n ξ ˜ k / η k T ξ ˙ k to evaluate whether the end effector is moving towards (negative) or away (positive) from the obstacle.
When it extends to multiple obstacles, as in Figure 2, we encapsulate all obstacles to create a single virtual obstacle by the weighted mean of each modulated DS in Equation (37). The benefit is that we only need to apply modulation once to ensure collision avoidance. Considering the impenetrability of the O obstacles and that the influence from other obstacles vanishes when approaching one obstacle boundary, we set the weight inversely proportional to the distance measure Γ ξ k 1 as follows:
w k ξ k = k = 1 , k i O Γ ξ k 1 Γ ξ k 1 + Γ ξ i 1
Given Equations (33) and (37), to integrate O obstacles, for simplicity, we first compute the weighted mean of the relative velocity induced by the obstacle motion as follows:
ξ ˜ ˙ all = k = 1 O exp 1 ρ k Γ ξ k 1 w k ξ k ξ ˙ k L + ξ ˙ k R × ξ ˜ k / η k
where we add the weight term w k rel = exp 1 ρ k Γ ξ k 1 to smooth the distance influence.
In this paper, the combined weighted mean of f ξ k refers to the init velocity ξ ˙ init computed from the offline task execution trajectory (see Section 3.3.2) with position differencing. To be specific, ξ ˙ init is a nonlinear function of ξ , which can be computed from the trajectory generalization method presented in Section 2.1 given the current position ξ . We further simplify Equation (37) as follows:
ξ ˙ mod = M ξ ˜ all ξ ˙ init ξ ˜ ˙ all + ξ ˜ ˙ all = q ξ ˜ all Q ξ ˜ all + ξ ˜ ˙ all
where Q ξ ˜ all and q ξ ˜ all represent the combined magnitude and direction of M ξ ˜ all ξ ˙ init ξ ˜ ˙ all .
For the magnitude, given separate equivalent magnitude Q ξ ˜ k for each obstacle from Equation (37) and the weight in Equation (44), we can obtain:
Q ξ ˜ all = k = 1 O w k ξ k Q ξ ˜ k
where Q ξ ˜ k is obtained from Equation (37):
M ξ k f ξ k ξ ˜ ˙ k = q ξ ˜ k Q ξ ˜ k
Concerning the combined directions, given each equivalent directions q ξ ˜ k , as [31], we define the function κ R 2 to project the 3D original space onto a 2D hyper-sphere κ space with radius π as follows:
κ ξ ˙ k , ξ ˜ = arc cos q ^ 1 ξ ˙ k q ^ 2 ξ ˙ k , q ^ 3 ξ ˙ k q ^ 2 ξ ˙ k + q ^ 3 ξ ˙ k , q ^ ξ ˙ k = R f T q ξ ˜ k
where R f is the projection matrix formulated from the unit vector n f ξ for the init vel ξ ˙ init along with two orthonormal bases e 1 f ξ , e 2 f ξ :
R f ξ = n f ξ , e 1 f ξ , e 2 f ξ
We then evaluate the weighted mean in this κ space as follows:
κ ξ all = k = 1 O w k ξ k κ ξ ˙ k , ξ ˜
with the combined direction q ξ all expressed back to the original 3D space as follows:
q ξ all = R f ξ cos κ ξ all , sin κ ξ all κ ξ all κ ξ all T
The final modulated velocity and position for the O obstacles can be computed with Equation (46) and velocity integration (Equation (34)). To sum up, the complete real-time robot end effector adaptation method comprises the following steps: First, we obtain the initial task execution trajectory with T timesteps and d t integration steps after the offline optimization and adaptation procedure in Section 3.3.2. Given the initial velocity at t = 0 with position differencing, the modulated velocity is computed from Equation (46) with the presence of dynamic obstacles. We then apply velocity integration (Equation (34)) to obtain the expected position for the next timestep. The offline trajectory beyond the current timestep is treated as a template trajectory. Given the current position and the changing target point, we utilize the proposed trajectory generalization method in Section 2.1 to generate the remaining trajectory and compute the velocity via position differencing. The velocity is treated as the initial velocity at the next timestep, and we apply modulation iteratively to obtain the whole modulated trajectory with the expected velocity for the robot end-effector. Note that if the modulated positons collide, we treat them as init positions and modulate them iteratively until safe.
During the real-time end-effector modulation process, we leverage the demonstration information from the offline optimization phase through trajectory generalization, preserving the human demonstration characteristics. The initial velocity at the next timestep is determined based on the generalized trajectory, rather than being arbitrarily chosen, which integrates offline optimization and online modulation. The end-effector modulation provides the modulated velocity and the desired position at each timestep, which are utilized for subsequent real-time non-end-effector adaptation.

4.2. Real-Time Non-End-Effector Adaptation

We ensure real-time obstacle avoidance for the robot end effector in Section 4.1; however, the potential collisions between other robot links (non-end-effector) and obstacles are not considered. To achieve real-time non-end-effector adaptation, given the expected velocity in Cartesian space, we utilize the robot joint redundancy with null-space velocity control. In particular, the 7-DoF manipulator is considered in this paper; its end effector velocity x v and joint velocity q v conform to:
x v = Jq v
where J is a 6 × 7 Jacobian matrix. Considering J is not a square matrix, given the expected end-effector velocity x v , the joint velocity q v is derived by:
q v = J # x v + N δ v
where J # is the generalized inverse and N is the null space matrix of J ; that is, whatever the value of the 7-dimensional joint velocity vector δ v , the resulted end-effector velocity N δ v remains 0. To derive J # , we minimize the norm error between J # x v and q v with the least square solution J + = J T JJ T 1 satisfying the Moor–Penrose inverse [45]. We rewrite Equation (54) with J + as follows:
q v = J + x v + I J + J δ v
The term I J + J δ v provides distinct joint velocity configurations resulting in the same end-effector velocity.
Considering multiple solutions for the joint velocity vector δ v in Equation (54), we aim to ensure that δ v retains the joint velocity distributions of the 7DoF ProMPs during offline optimization while minimizing the norm error between J # x v and q v in Equation (54). This allows us to integrate offline task-constrained optimization and real-time movement adaptation for robot whole-body control with Joint ProMPs. To derive the joint veloicty distributions, at the current time step t, the mean μ q v t and standard deviation σ q v t of the joint velocities are computed based on the joint positions and are expressed in the form of ProMPs basis functions as follows:
μ q v t = 1 d t μ q t + 1 μ q t σ q v t = 1 d t 2 σ q t + 1 + σ q t
where
μ q t + 1 = Ψ t + 1 T μ w * , μ q t = Ψ t T μ w * σ q t + 1 = Ψ t + 1 T Σ w * , σ q t = Ψ t T Σ w *
where μ q t , σ q t represents the optimized mean and standard deviation of the joint positions with μ w * , Σ w * denoting the optimized mean and covariance matrix of the 7DoF Joint ProMPs. To simplify the notation, we omit the superscript t. Given that the modulation is performed at each timestep, the mean and standard deviation of the joint velocities at each timestep are expressed with μ q v and σ q v .
To minimize the norm q v J # x v in Equation (54), while ensuring that the resulting joint velocity vector adheres to the optimized distribution at each time step, we define the loss function L ( δ v ) as follows:
L ( δ v ) = q v J # x v 2 + λ i = 1 7 ( q v i μ q v i ) σ q v i 2
We further substitute Equation (54) into the above equation to obtain:
L ( δ v ) = N δ v 2 + λ i = 1 7 N δ v + J # x v μ q v i σ q v i 2
where λ is the regularization term. By minimizing L ( δ v ) with respect to δ v , we obtain the joint velocity vector as follows:
δ v = N T N + λ N T Σ q v 1 N 1 N T μ q v J # x v
where Σ q v = diag ( [ σ q v 1 , σ q v 2 , , σ q v 7 ] ) 2 represents the diagonal covariance matrix composed of the standard deviations σ q v for the offline-optimized 7DoF joint velocities.
In addition to preserving the joint distribution, we also need to perform real-time collision detection between the robot’s links and obstacles with dynamic modulation to achieve null-space velocity control. Figure 3 demonstrates the null space velocity control strategy. We detect the closest point on the robot non-end-effector to each obstacle and enforce velocity x v 0 at that point if the point is within the obstacle safety margin d m . The vector d 0 starts from the obstacle surface to the point C 0 , with the same direction as x v 0 , and J 0 is the Jacobian matrix from the base to point C 0 . We further denote the end-effector velocity x v and the point C 0 velocity x v 0 with Equation (53):
x v = Jq v , x v 0 = J 0 q v
We substitute Equation (54) to x v 0 = J 0 q v with δ v derived as folllows:
δ v = J 0 N # x v 0 J 0 J # x v
To achieve a trade-off between maintaining consistency with the optimized distribution and ensuring a safe distance from the robot non-end-effector, we define the coefficients γ > 0 and ρ > 0 satisfying γ + ρ = 1 . We weight the joint velocity vectors from Equations (60) and (62) to obtain the combined joint velocity vector δ v * as follows:
δ v * = γ N T N + λ N T Σ q v 1 N 1 N T μ q v J # x v + ρ J 0 N # x v 0 J 0 J # x v
Taking q ˙ and x ˙ as control variables and substituting Equation (63) to q ˙ = J # x ˙ + N δ v * , we obtain:
q ˙ = J # x ˙ + γ N N T N + λ N T Σ q v 1 N 1 N T μ q v J # x v + ρ N J 0 N # x v 0 J 0 J # x v
Due to the idempotent and hermitian nature of N = I J + J in Equation (55), that is, J 0 N # = N J 0 N # , we simplify Equation (64) as follows:
q ˙ = J # x ˙ + γ N N T N + λ N T Σ q v 1 N 1 N T μ q v J # x v + ρ J 0 N # x v 0 J 0 J # x v
Given the expected end-effector velocity x v , we can utilize Equation (65) to design a feedback controller for non-end-effector adaptation. In this context, J # x ˙ is used to track x v , J 0 J # x v denote the point C 0 velocity attributed to x v , and J 0 N # is used to transform the Cartesian velocity x v 0 J 0 J # x v into the joint velocity. In this paper, considering the manipulator redundancy is 1, less than 3, we define the direction of x v 0 the same as vector d 0 as x v 0 = v 0 n 0 , n 0 = d 0 d 0 , where n 0 is the unit vector of d 0 , and v 0 is a scalar velocity. To obtain smooth and flexible motion for obstacle avoidance, we define v 0 as in Equation (66), where d m is the safety margin, and d n denotes the distance at which the obstacle begins to affect the manipulator to construct a smoothing factor ( 1 + cos π d 0 d m d n d m ) .
v 0 = λ v v n λ v = d m d 0 2 1 if d 0 < d m 1 2 1 + cos π d 0 d m d n d m d m d 0 2 1 if d m < d 0 < d n 0 if d 0 d n
Let x = T q denote the end effector pose obtained through the forward kinematics function T . ; the expected end effector pose can be derived with velocity integration:
x d = ξ ˙ mod + ξ ˙ mod d t = x v + x v d t
we then express the feed-back controller error e x as follows:
e x = x d T q
and further denote the resulted joint gradient d q as follows:
d q = J # e x + γ N N T N + λ N T Σ q v 1 N 1 N T μ q v J # x v + ρ J 0 N # x v 0 J 0 J # x v
We derive q with integration q = q + d q d t , and continue the iteration process until the error meets the stopping criterion to obtain the modulated joint positions q mod for real-time task execution.
To integrate the offline and real-time robot movement adaptation methods presented in Section 3 and Section 4, we summarize the overall control framework in Figure 4.
Given an offline task execution trajectory x , y , z , o t = 1 T obtained from Algorithm 1, the initial velocity ξ ˙ init t is computed using position differencing. In the presence of O dynamic obstacles, represented as ξ k c , a k , ξ ˙ k L , ξ ˙ k R k = 1 O , we derive the modulation matrix M ξ k R 3 × 3 and the weight coefficient w k ξ k for each obstacle. Additionally, we identify the closest point C 0 on the robot’s non-end-effector links and define the corresponding vector d 0 . The avoidance velocity x v 0 = v 0 n 0 , n 0 = d 0 d 0 is then determined accordingly from Equation (66).
For online end-effector adaptation, we compute the equivalent modulation magnitude and direction for each obstacle, denoted as q ξ ˜ k Q ξ ˜ k . These are then combined using weighted summation with w k ξ k , along with hyper-sphere projection κ R 2 , to obtain the overall modulation effect q ξ ˜ all Q ξ ˜ all . The resulting modulated velocity ξ ˙ mod serves as the desired end-effector velocity. The expected position x d at the next timestep is determined via velocity integration. Note that if x d collides, we treat it as the initial position and modulate it iteratively until safe.
For online non-end-effector adaptation, given the mean and diagonal covariance matrix μ q v , Σ q v from offline optimization with 7DoF Joint ProMPs, along with the null-space velocity component, Equation (69) is constructed to design the control diagram. Given the expected position x d , the system iterates through a control loop, minimizing the error defined in Equations (67)–(69) until the stopping criteria are met. If any joint configuration results in a collision, the nominal velocity norm v 0 is iteratively increased, and the control loop is repeated until a safe motion is achieved. The modulated joint positions q mod are then used for real-time task execution. Furthermore, at each timestep, the initial velocity for the next step is estimated using trajectory generalization based on the current position x d . This process repeats iteratively until all T timesteps are completed.

5. Experimental Validation

In this section, we validate the proposed human–robot skill transfer strategy within an industrial scenario where a robot learns different skills from the human to move around the obstacle and grasp the emergency stop. We first conduct human–robot skill transfer experiments. After the robot learns the skills, we evaluate our method in different static (offline) and dynamic (real-time) environments, and compare our method to other similar movement adaptation methods. For clarity, we summarize the definitions of all the parameters employed in the experiment section in Table 1.

5.1. Template Trajectory Generalization from Only One Human Demonstration

The Ufactory Xarm 7-DoF manipulator is selected to validate our method. A Kinect V2 camera is positioned on the side to track continuous human hand movements, while a Zed2i binocular camera is mounted overhead to capture task-related object poses. For a single obstacle scenario, the robot is taught three distinct skills: moving above, left, and right around the obstacle using the Kinect FORTH system. Each skill is demonstrated only once, with the human hand centroid trajectories recorded (see Figure 5a). The Kinect FORTH system captures real-time 6D poses of the human hand centroid and provides a colored model of the fingers, enabling precise trajectory acquisition.
For each skill, the trajectory is smoothed using a Moving Average Filter (MAF), normalized to 30 timesteps, and transformed into the robot base frame. It is further converted to the table frame at the tabletop surface. Utilizing our trajectory generalization method (see Section 2.1), the trajectory is generalized with the same start and end points, without via-points, to generate the template trajectory for each skill (see Figure 5b). To account for the inherent noise in the Kinect FORTH system, the trajectories are aligned to the same start and end point, ensuring consistency for subsequent offline task optimization. A suitable start point is set, and the end point is determined based on the emergency stop 3D positions detected by YOLO [46]. The template trajectory corresponds to the minimum Z-position of the robot gripper center.

5.2. Offline Movement Adaptation

Given the template trajectory, we validate the offline task-constrained adaptation algorithm (Algorithm 1) under a static scene with the presence of a new big bottle, as shown in Figure 6a. We randomly place the emergency stop and the bottle with 3D positions detected by YOLO. For each skill, the robot first generates collision-free end-effector trajectory sets with via-point trajectory generalization (see Section 2.1). We select the feature via-points corresponding to the max norm of shape modulation f p and set discrete via-points with equal spacing to construct the initial trajectory sets. To be specific, for the above skill, the X- and Y-positions are within the radius R obs = 39 mm around the obstacle with Z within ±100 mm around the template trajectory via-points; for the left and right skill, we fix X and set the Y-, Z-positions distributed within ±100 mm around. Meanwhile, the X-, Y-, and Z-positions are unioned based on the limited Y- and Z-ranges for task execution. We check the collisions and fine-tune the via-points with the threshold γ = 30 mm; the resultant collision-free trajectory sets are shown in Figure 6b. We further obtain collision-free joint sets with Jacobian IK and set 35 basis functions to encode the 7DoF joint distributions with ProMPs, as shown in Figure 6c.
The task constraints include several critical factors: (1) Joint Range Limit: Ensures each joint operates within its specified range to prevent mechanical damage or errors expressed as follows: q k , t min < q t < q k , t max , where k denotes the kth joint freedom, and q k , t min and q k , t max are the minimum and maximum of the joint positions from demonstration, respectively.
(2) Waypoint Constraints: Guarantee precise start and end positions in task space. The forward kinematics (FK) function T maps joint configurations to end-effector positions, with a precision requirement of 1mm: | T q 1 X ¯ 1 | 2 1 , | T q 30 X ¯ 30 | 2 1 , where X ¯ 1 , X ¯ 30 are the desired end-effector positions.
(3) Hyperplane Constraints: Define spatial boundaries for the robot motion. For instance, the Z-direction not up to 575 mm is constrained by the following: n z , t T T ( q t ) b z , t 0 , where n z , t = ( 0 , 0 , 1 ) , b z , t = ( 0 , 0 , 575 ) . Similarly, all constraints of Y and Z are defined as 100 < Y < 700 , 10 < Z < 575 .
(4) Repeller Constraints: Prevent collisions between robot links 1 to 6 and obstacles. For example, a safe radius R safe = 36.23 mm is enforced to avoid contact with a water bottle for the robot link 6: R safe d min T 6 q t 0 , where d min T 6 q t denotes the minimum distance between the robot link 6 and obstacles, and T 6 q t represents the FK function of the robot link 6.
For all constraints, we set α k , t = 0.95 , considering the Gaussian nature of 2× standard deviation. To speed up convergence, we set λ k , t 0 = 100 for LBFGS and check the convergence conditions after each iteration.
We determine the feature timesteps violating task constraints and merely optimize those timesteps. In this experiment, we discover that there are no task constraint violations for all timesteps of each skill; that is, the optimized ProMPs are the same as the initial ProMPs. The robot thus directly utilizes the generalized trajectory for task execution. During task execution, the robot selects randomly from three skills with via-points sampled from the original via-points range. We repeat 100 experiments with the robot task execution snapshots, as shown in Figure 6d. We run the offline movement adaptation algorithm for three skills parallelly. The total time is 1.64332 s with a 100% task success rate. The results demonstrate that the robot can generalize smooth trajectories and encode continuous ProMPs distributions with discrete feature via-points. During task execution, the robot exhibits flexibility with different skills to avoid obstacles.
During the above experiment, for all timesteps, we discover that the probability satisfying the task constraints always equals 1. The reason is that the waypoint constraints are ensured by template trajectory generalization with the same start and end point; the hyperplane constraints, namely, the limit Y and Z constraints for the end-effector, are ensured by the feature via-points range for trajectory generalization; the repellers constraints for robot link6 to link1 are fulfilled with the bottle height, that is, the robot link6 to link1 will not collide at all timesteps. Meanwhile, for the above and left skills, if we generalize the collision-free end-effector trajectory sets, the robot link6 to link1 will not collide given the robot configuration.
As a result, to testify the performance of ProMPs optimization and adaptation, we elevate the obstacle height with a danger scene (see Figure 7a) and constrain the robot to move right around the obstacle. We set the proper feature via-points range so that partial collision-free end-effector trajectory sets will clearly collide with the bottle at other robot links (see Figure 7b). We define the same task constraints mentioned above; the difference is that there are timesteps violating the repellers constraints where the robot link6 and link5 may collide. We generalize the collision-free end-effector trajectory sets with 40 discrete feature via-points by fixing the X-, Y- and descend Z-positions, and encode the original 7DoF ProMPs distributions p w 0 = N w 0 ; μ w 0 , Σ w 0 , p q t = N q t ; μ t 0 , σ t 0 . For the repellers constraints, we find the minimum distance from three segments in four directions and perform moment-based distribution estimation of the danger distance (see Section 3.2.4). Utilizing double-loop LBFGS and EMM optimization in Algorithm 1, we obtain the optimized ProMPs distributions p w * = N w * ; μ w * , Σ w * , p q t = N q t ; μ t * , σ t * satisfying task constraints, as shown in Figure 6c. The joint limit range q t safe μ t * 2 σ t * , μ t * + 2 σ t * and initial joint value q t 0 = Ψ t T w * are passed to the Jacobian IK to test the succeed number of the feature timesteps. To determine the threshold, we select the same via-points from the original range with the succeed number and obstacle avoidance result (success or failure) recorded; the corresponding relation is shown in Figure 6d. During task execution, we choose 100 uniform via-points to testify the optimized ProMPs; that is, if the succeed IK number of the generalized trajectories is more than a threshold, the robot will run the trajectory, otherwise, the robot will stop until success. The task execution result is shown in Figure 7e,f.
We run Algorithm 1 with a total time of 126.45 s. There are a total of 8 out of 30 timesteps violating the task constraints and the succeed number threshold is 7 given the optimized ProMPs distributions. The robot executes 39 trajectories from 100 feature via-points with a 100% success rate. The results illustrate that the robot can optimize ProMPs distributions under integrated nonlinear task constraints analytically. Considering the probability distribution essence, specific regions are excluded from the original ProMPs at the cost of losing some distributions satisfying the task constraints. Consequently, we determine the threshold with the same sparse and discrete via-points during trajectory generalization. The threshold and optimized ProMPs are then directly used for robot task execution by sampling from the continuous via-points range with Gaussian distributions. In a word, we provide a robot with the ability to perform tasks under multiple nonlinear constraints in a generic way; that is, there is no need to determine that the task constraints are satisfied each time when generalizing a new trajectory.

5.3. Real-Time Whole-Body Movement Adaptation

Following offline robot movement adaptation under static scenarios, we further evaluate our real-time whole-body movement adaptation method presented in Section 4 under the dynamic environment manifested as multiple stationary or moving obstacles, new target points, etc. Specifically, we first perform Algorithm 1 with the presence of a single obstacle (blue painting) and the emergency stop. The robot then randomly selects from three skills, namely, moving above, left, and right around the obstacle to generate the initial task execution trajectory. During task execution, the human gets involved in the process by suddenly moving a big bottle towards the gripper (see Figure 8b at t = 1.8 s), changing the obstacle and emergency stop positions in real-time (see Figure 8b at t = 6 s, 7.2 s, and 10.0 s, etc.).
We utilize YOLO to detect the object 3D positions and velocities at d t = 0.2 s intervals with an ellipsoid-shaped distance function Γ ξ k enveloped (see Equation (36)). We treat the emergency stop as the target object with other objects considered obstacles. To formulate Γ ξ k , we set the curvature parameter p i k = 10 and derive the center ξ k c , linear velocities ξ ˙ k L , and angular velocities ξ ˙ k R from YOLO. We obtain the normal vector n ξ ˜ k from Equation (39) and formulate the basis matrix E ξ k and the diagonal eigenvalue matrix D ξ k from Equations (40) and (42). In this paper, we model the robot gripper with the same ellipsoid-function to derive equivalent Γ ξ k and n ξ ˜ k from two ellipsoids. We set a 10 mm margin for each direction to compute the corresponding safety magnitude η k , define the reactivity parameter ρ k = 1 , and remedy the tail-effect via n ξ ˜ k / η k T ξ ˙ k to formulate Equation (43). Meanwhile, we disregard the impact of obstacles when Γ ξ k > 4.0 . Given the initial task execution trajectory as input, the robot performs whole-body adaptation for real-time modulation (see Figure 4). Concerning end-effector adaptation, we derive the modulation matrix M ξ k R 3 × 3 and weight coefficient w k ξ k for each dynamic obstacle with weighted magnitude Q ξ ˜ all and directions q ξ ˜ all computed from Equations (46)–(52) to derive the modulted velocity ξ ˙ mod . Regarding the non-end-effector adaptation, we detect the closest point C 0 with x v 0 defined to perform null-space velocity control (see Section 4.2).
We conduct 10 dynamic experiments for each of the three skills by randomly placing and changing the objects, including obstacles and emergency stop, in real-time to evaluate the robot movement adaptation competence; the results are shown in Figure 8. Considering space restrictions, we select typical modulation cases for specific timesteps, as shown in Figure 8a, and choose one episode of moving above around the obstacle with snapshots, as shown in Figure 8b.
During the above experiments, considering the detected minimum distance d 0 for the robot, link1 to link6 are always larger than the safety margin d m = 10 mm (see Figure 3), null-space velocity control for the robot non-end-effector is not required, and we only denote end-effector modulation, as shown in Figure 8a. The left, middle, and right columns denote the robot init positions, modulation process, and modulated positions, respectively.
In the first row, the human suddenly moves a big bottle towards the robot end-effector with relative X and Y linear velocities ξ ˙ 1 L x = 2.22 mm / s , ξ ˙ 1 L y = 117.13 mm / s , the distance functions for the big bottle Γ 1 ξ 1 = 1.6613 and blue painting Γ 2 ξ 2 = 9.6395 are derived. As Γ 2 ξ 2 = 9.6395 > 4 , we ignore the influence of the blue painting, and the robot only modulates in response to the big bottle. We define the weight term w 1 rel to smooth the distance influence and obtain the weighted relative velocity ξ ˜ ˙ all from Equation (45); the direction coincides with the normal vector n ξ ˜ 1 . Given the init velocity ξ ˙ init , we compute the diagonal eigenvalue matrix D ξ 1 and the basis matrix E ξ 1 to formulate the modulation matrix M ξ 1 . The modulated velocity ξ ˙ mod is then computed from Equation (46) with one single obstacle. We apply velocity integration to obtain the expected position for the next timestep, generalize new trajectory sets ξ next (see Section 2.1) at the current position, and compute the next velocity ξ ˙ next with position differencing, treating this as the initial velocity for subsequent modulation. After modulation, we discover that there is an evident backward movement for the robot in the right column.
In the second row, the init robot position after the last timestep modulation will collide with the big bottle, and the robot is subject to the combined effects of multiple obstacles. Distinct from [31], in this paper, if the robot init position will collide, we develop the method by iteratively modulating positions until they are collision-free. Specifically, to ensure the impenetrability of obstacles, we set the big bottle and blue painting distance function Γ 1 ξ 1 = 1.001 , Γ 2 ξ 2 = 2.4987 with the distance weights derived from Equation (44) as w 1 ξ 1 = 0.9993 , w 2 ξ 2 = 0.0007 . Given the normal vector n ξ ˜ 1 , n ξ ˜ 2 , we iteratively compute the modulation matrix M ξ 1 , M ξ 2 for each obstacle and derive the weighted magnitude Q ξ ˜ all and direction q ξ ˜ all from Equations (47)–(52) to obtain the modulated velocity ξ ˙ mod until safe. New trajectory sets ξ next are generalized and the next velocity ξ ˙ next is computed. Note that if the emergency stop position changes, the robot can adapt in real-time with our trajectory generalization method. Figure 8b shows robot dynamic modulation snapshots with multiple dynamic obstacles and changing targets (the emergency stop). The total running time is 16.8 s with 84 timesteps, and the average modulation time for a single timestep is 0.1356 s. The results indicate that the robot exhibits real-time adaptation proficiency under dynamic environments.
For the above end-effector modulation experiments, considering the minimum distances between the robot non-end-effector and obstacles are always larger than the safety margin d m = 10 mm , the non-end-effector adaptation proposed in Section 4.2 is not involved. Consequently, to further verify the mutual performance of end-effector and non-end-effector adaptation, we elevate the obstacle height and design a task scenario where the robot non-end-effector will collide with the big bottle at the beginning of task execution (see the first row of Figure 9a), and after the robot moves around, the human places the big bottle to a new position where the robot non-end-effector will collide when it tends to grasp the emergency stop (see the last row of Figure 9a).
The robot selects the move above skill to generate the initial task trajectory; the typical modulation process is shown in Figure 9a with task execution snapshots shown in Figure 9b. The left, middle, and right columns in Figure 9a denote the initial joint configurations, the modulation process, and the modulated joint configurations, respectively. To achieve real-time efficiency, we utilize cylinders to envelop obstacles and robot links with minimum distance detected by the Flexible Collision Library (FCL). To balance offline-optimized Joint ProMPs and null-velocity contol, we set λ = 1 , γ = 0.2 , ρ = 0.8 to formulate Equation (65), and we set d m = 10 , d n = 5 , v n = 10 mm / s to formulate Equation (66).
The first row illustrates the robot whole-body modulation at the beginning. Given the distance function Γ 1 ξ 1 = 1.7806 , Γ 2 ξ 2 = 8.9839 > 4 , the robot is modulated under the big bottle normal vector n ξ ˜ 1 with the expected end-effector modulated velocity ξ ˙ mod . We detect the collision at the current position. To ensure impenetrability of the obstacles, we set d 0 = 0.01 mm and iteratively apply null-space velocity control until safe. To be specific, we treat the robot collision link center as the control point C 0 and derive the Jacobian matrix J 0 at C 0 . We compute the retreat velocity v 0 from Equation (66) and obtain the enforced velocity x v 0 . We then utilize the control loop defined in Equation (69) to derive the modulated joint configuration q mod iteratively. Note that if the modulated joint configurations will collide, we increase the magnitude order of v 0 until safe. The middle column shows the initial and modulated joint configurations colored in purple and green, and the right column demonstrates that the robot non-end-effector heads away from the big bottle while keeping the expected end-effector positions.
The second row shows the robot end-effector modulation with the big bottle and blue painting Γ 1 ξ 1 = 2.5283 , Γ 2 ξ 2 = 2.7075 ; the robot non-end-effector will collide at this position with d 0 = 0.01 mm . We define the enforced velocity x v 0 to obtain the modulated joint configurations iteratively until safe, and the robot tends to move away from the big bottle when approaching the emergency stop to grasp. Figure 9b illustrates robot task execution snapshots; the total running time is 13.8 s with 69 timesteps, and the average modulation time for a single timestep is 0.1757 s. The results demonstrate that our method can simultaneously perform end-effector and non-end-effector modulation in real-time, allowing the robot to adapt joint configurations in confined spaces while keeping its expected end-effector poses.

5.4. Comparative Experiments

In this paper, we propose a combined strategy for robot offline and real-time movement adaptation based on human–robot skill transfer. To validate the performance, we conduct several comparative experiments for proposed methods in the strategy.

5.4.1. Offline Movement Adaptation

Concerning offline movement adaptation, KMPs were proposed to address the limitation of explicitly defining the basis functions in ProMPs. The KMPs method was first introduced in [11] by employing GMM/GMR to encode the demonstrated distribution and then deriving the mean and covariance of each timestep under via-points based on dual transformation and KL divergence. The most notable contribution was the derivation of a kernel matrix that depends only on each timestep, eliminating the need to explicitly define the basis functions and reducing the dimensionality of the parameters. Furthermore, they extended KMPs to incorporate linear constraints in [22] (LC-KMPs), allowing for the rapid acquisition of optimal Lagrange multipliers via quadratic programming (QP). However, this method can only be applied to linear constraints in Cartesian space. In [23], they further extended LC-KMPs to incorporate nonlinear constraints (EKMPs) by using Taylor expansion to convert the problem into a QP problem and solve the optimal Lagrange multipliers. This approach can handle various nonlinear constraints in Joint space. To account for all task constraints, especially including collisions between the robot’s other links and obstacles, optimization needs to be performed in Joint space. As a result, to further compare the performance, we select EKMPs from [23] and the ProMPs-based method from [21], which is similar to ours.
We take the danger scene (see Figure 7a) with the same task constraints to compare. For the three methods, we provide the same initial collision-free trajectory sets of the robot end-effector and obtain the corresponding Joint space distributions with Jacobian IK. Some joint distributions may lead to collisions between the robot links 1 to 6 and obstacles. The goal is to optimize these distributions to generate new joint configurations for task execution. We then execute the task 100 times and evaluate the methods based on several metrics as follows: number of optimization parameters, number of constraints, total iterations, optimization time, distribution loss, and task success rate. The results are shown in Table 2.
In [21], the number of optimization parameters mainly includes the weights and covariance matrices of the 35 basis functions in ProMPs. Due to the use of Cholesky decomposition, a total of 35 + 18 × 35 = 665 parameters needs optimization. For 30 timesteps, each timestep involves 7 joint range constraints, 4 hyperplane constraints, 6 repellers constraints (from robot links 1 to 6), and waypoints constraints for the start and end points, resulting in a total of ( 7 + 4 + 6 ) × 30 + 2 = 512 constraints.
In EKMPs [23], the optimization targets the Lagrange multipliers that define the task constraints, with a quadratic programming (QP) approach employed. As a result, the number of optimization variables is equal to the number of task constraints, which is 512. Using the optimal multipliers, a kernel matrix is constructed to derive the mean and covariance for each timestep.
In our method, before optimization, the overall ProMP distributions are decoupled into the mean and standard deviation for each timestep. We then identify timesteps that may violate the task constraints based on the mean and standard deviation at each timestep. We find that there are 8 timesteps violating the task constraints, and the waypoint constraints at the start and end points are already satisfied, resulting in ( 7 + 4 + 6 ) × 8 = 136 constraints. In addition to optimizing the mean and covariance of ProMPs, we also optimize the mean and standard deviation of the joint positions for the 8 timesteps with 665 + 14 × 8 = 777 parameters in total.
During optimization, we discover that the algorithm in [21] integrates all nonlinear task constraints with μ w , Σ w for each timestep. This integration necessitates the evaluation of symmetric positive definiteness for the covariance matrix Σ w during optimization, resulting in the most iterations and the longest optimization time. EKMPs [23], which involve large matrix computations, also encounter numerical instability. Specifically, as the number of timesteps and task constraints increases, the matrix size grows, and many matrix parameters become unstable during inversion. Some elements may also become redundant, which is a characteristic of the KMP formulation. Although QP optimization reduces the number of iterations, the extensive matrix computations required to calculate the mean and covariance at each timestep still lead to longer optimization times.
In contrast, our method essentially decouples the ProMP parameters to each timestep and identifies the feature timesteps that violate task constraints, optimizing the distribution for those specific timesteps separately. Notably, after optimizing the mean and standard deviation, the optimization of the mean and covariance of ProMPs becomes linear in terms of μ w and Σ w , which avoids introducing numerical instability. Here, we do not rely on ProMPs’ inherent ability to generate new parameters. Instead, we address the issue of ProMPs’ limited extrapolation ability through optimization. However, since our method utilizes TensorFlow’s automatic differentiation rather than a QP formulation, the total number of iterations is slightly higher than that of EKMP. Nonetheless, the optimization time in our method is significantly reduced due to improved numerical stability.
Furthermore, we evaluate the generalization capability of each algorithm. In this context, generalization refers to the ability to maintain the initial distribution when encountering different task constraints and to achieve a high success rate. Specifically, if the optimized distribution closely matches the original distribution and achieves a high success rate, the method is considered to have strong generalization. The results show that [21] retains more of the original distribution compared to EKMPs [23]. However, in [21], due to the use of the unscented transform, which is sensitive to the coefficients, the estimates for μ w and Σ w under task constraints are inaccurate. This leads to unfiltered joint distributions that violate the task constraints, resulting in task failures. Similarly, EKMP’s use of Taylor linear expansion does not separate out some failed distributions, leading to a lower task success rate.
Our method, however, after obtaining the optimized joint distributions, introduces a threshold to mitigate the distribution loss caused by the probabilistic characteristics. Specifically, the robot first generates task trajectories with via-point generalization, obtains the corresponding inverse kinematics solutions, and then evaluates the number of solutions within the optimized joint range. If this number exceeds the threshold, the robot executes the task. This approach reduces the distribution loss and improves success rates. The threshold determination process is obtained by sampling the via-points and recording the task execution results and the number of inverse solutions. However, due to the discrete nature of via-points, when the threshold intersects with continuous space, the optimization algorithm indicates task success, but the robot can fail. This issue requires more continuous via-point ranges for accurate task execution.
Overall, despite the higher number of optimization parameters, our method achieves reduced optimization time and higher success rates by decoupling the mean and Gaussian distribution for each timestep and employing a dual optimization process for the ProMP parameters. These results demonstrate the high generalization capability of our method. Our method exhibits efficiency at the cost of increasing optimization parameters. The optimization time cannot fulfill real-time needs; however, the robot is equipped with flexible skills incrementally augmented from human demonstrations before online task execution.

5.4.2. Real-Time Movement Adaptation

Concerning robot real-time movement adaptation based on imitation learning (IL), ref. [47] first proposes the method of learning stable nonlinear dynamical systems (SEDS) with GMM-GMR. They utilize GMM to learn the relation of the Cartesian positions and velocities. Given the current positions, they use GMR to generate the velocities for task execution. They further develop the method in [30,31], and ref. [32] to achieve real-time movement adaptation for the robot end-effector. In this paper, we compare our method to theirs (SEDS-GMR modulation) under the same dynamic environment (see Figure 8b). To be specific, the robot randomly generates the initial task execution trajectory from three skills (moving above, left, and right around obstacles). During task execution, the human gets involved by placing and changing objects in real-time. Given the initial task execution trajectories, we choose 6 Gaussian kernels and train the optimal GMM parameters with 1000 iterations offline to ensure accurate trajectory reproduction via GMR. We conduct 10 experiments for each skill with the qualitative and quantitative results shown in Figure 10 and Table 3. Figure 10a denotes 3D positions modulation when the robot selects to move right around the obstacle. We discover that the modulation remains consistent for both methods if the init 3D positions before modulation will not collide with the obstacle. However, if the collision occurs for the init positions (see Figure 10b,c), SEDS-GMR modulation will not ensure the process is collision-free despite the modulation matrix influence.
In contrast, our method iteratively performs modulation until collision-free; that is, if the modulated positions will collide, we treat them as init positions and modulate them until safe. Note that our method may fail in specific danger cases: if the robot iterates many times for modulation, during the motion process from the current position to the modulated position, collision will occur. However, in most cases, our method ensures a high success rate.
Table 3 shows several benchmarks for both methods. Our method exhibits a higher success rate. In addition, since SEDS-GMR modulation requires obtaining precise GMM parameters before task execution, the offline training time is longer. Our method generalizes the trajectories in real-time without training parameters (see Section 2.1) and computes the velocities via position differencing. Although our method involves iterating the modulation when encountering collisions, resulting in a relatively longer time for single-step modulation, it still meets real-time requirements. Moreover, our method outperfoms SEDS-GMR modulation in total task execution time, requiring fewer convergence timesteps and exhibiting a lower collision rate. The final position error effectively reflects the accuracy of the modulation in achieving the desired end-effector position while ensuring obstacle avoidance. Both methods achieve comparable final position accuracy, confirming that our approach maintains accuracy while ensuring obstacle avoidance. The results indicate that our method can complete real-time end-effector modulation while ensuring safety, and quickly generalize trajectories preserving human skill characteristics.
The aforementioned comparison experiments with SEDS-GMR only consider the robot end-effector given that the other robot links will not collide during task execution. To further evaluate the robot whole-body movement adaptation performance, we conduct comparison experiments under the scene in Figure 9. We select a commonly-used trajectory planner named Covariant Hamiltonian Optimization for Motion Planning (CHOMP) which iteratively optimizes the cost function including trajectory smoothness and collision cost; meanwhile, a null space velocity control method [36] similar to ours is involved.
As in Figure 9, we obtain the modulated robot end-effector trajectory with 69 timesteps. For all methods, we set the same expected 3D positions at each timestep and test the robot non-end-effector modulation performance. We repeat 30 experiments with the results shown in Figure 11 and Table 4. The first row of Figure 11 shows that CHOMP may occasionally generate erratic and random joint configurations when collisions are detected. Additionally, unnecessary spaces are utilized to reach the desired positions, leading to redundant planning. The null space control method in the second row employs a fixed retreat velocity x v 0 , leading to collisions when the velocity is insufficient to maneuver the robot non-end-effector away from the obstacles. In contrast, our method iteratively employs null space velocity control, gradually increasing x v 0 until collison-free to ensure safety, as shown in the last row.
Quantitative results are presented in Table 4. For each experiment, we compute the average single-step modulation time and the total joint configuration iterations, namely, the intermediate joint configurations traversed to the expected 3D positions. Additionally, the Cartesian modulation path lengths for the end-effector are derived. We discover that CHOMP achieves a 93.33% success rate with a shorter average single-step modulation time. However, due to its stochastic nature, it generates a large number of random trajectories, leading to significantly longer modulation paths and higher variability. As a result, the total task execution time is longer. The null space control method yields a lower success rate, as the enforced velocity could not ensure collision-free joint configurations. However, since it only performs a single iteration, the overall execution time is relatively short. Compared to these methods, our approach iterates multiple times to ensure joint configuration safety. While this results in a slightly longer overall execution time, it remains within a reasonable range and meets real-time requirements. Additionally, our method exhibits a lower collision rate, as the iterative process effectively prevents unsafe configurations. The final position error reflects the precision of our approach in maintaining accuracy while successfully completing the obstacle avoidance task. The results indicate that our method enables real-time robot whole-body movement adaptation with a high success rate. When potential collisions for the robot non-end-effector occur, our method can obtain joint configurations through iterations in real-time with minor deviations for the modulated path. By integrating offline optimization and online dynamic modulation, our method can adapt to different robotic tasks by incrementally augmenting skills from human demonstrations offline and performing real-time movement adaptation during task execution.

6. Discussion

6.1. Comparison to Previous Work

This paper proposes a human–robot skill transfer strategy including basic-level MPs learning, core-level MPs adaptation, and real-time robot whole-body adaptation. Considering basic-level MPs learning, we develop the via-point trajectory generalization method adapting to the task start, end, and via-points from only one human demonstration. Our approach ensures robust extrapolation beyond the demonstration range. By discretizing the via-points, we can efficiently generate a large set of trajectory sets without additional demonstrations. These skills are encoded using ProMPs, preserving the probabilistic properties. In other words, the proposed via-point generalization enhances the extrapolation capability of ProMPs [10], while retaining key probabilistic characteristics, like the mean and covariance.
Considering core-level MPs adaptation, we develop an effective task-constrained optimization method by combining multiple task constraints into a unified probabilistic framework. To account for all task constraints in Joint and Cartesian space simultaneously, we encode ProMPs in Joint space. We introduce a double-loop optimization method with ProMPs, where each timestep is first decoupled as the Gaussian distribution, and only those timesteps violating task constraints are optimized. After optimizing the Gaussians at each timestep, a second optimization is performed to obtain the optimized ProMPs. To mitigate the loss of distributions due to the probabilistic nature, we introduce a threshold to directly utilize the ProMPs for task execution with analytical movement adaptation.
To evaluate MPs adaptation performance under different task constraints, three performance indices are considered as follows: (i) Computational Efficiency denotes the number of parameters, iterations, and runtime required to derive the optimized MPs; (ii) Constraint Diversity Satisfaction refers to whether MPs can simultaneously handle multiple types of task constraints in both Joint and Cartesian space; and (iii) Generalization Capability evaluates whether the optimized MPs can retain the distribution of original MPs while successfully executing tasks across different task scenarios. Our method ensures the Constraint Diversity Satisfaction to handle multiple task constraints in Joint and Cartesian space. Compared to [21], our method avoids symmetric positive-definite issues during iterations by decoupling the Gaussians for each timestep and optimizing the ProMPs independently. Furthermore, unlike EKMPs [23], our method does not involve large matrix computations or numerical stability issues, resulting in improved Computational Efficiency. Additionally, we estimate nonlinear task constraints with different probability distributions, rather than first-order Taylor expansions, thus exhibiting robust generalization capabilities compared to EKMPs [23]. By incorporating a threshold that accounts for the probabilistic nature, our method exhibits robust generalization compared to [21].
Considering real-time movement adaptation, we propose an improved real-time robot whole-body control method integrating offline task-constrained optimization. For the robot end-effector, we incorporate the offline-optimized trajectory with via-point generalization for dynamic modulation, and develop DSM by performing multiple modulation iterations until safe if initial positions will collide, while satisfying real-time needs. For the robot non-end-effector, we integrate the offline-optimized Joint ProMPs to the real-time null space velocity control method and ensure collision-free joint configurations through iterations.

6.2. Limitations and Future Work

Although the proposed strategy has made some progress, it still faces certain limitations that require further investigation. The method demonstrates strong performance in environments with a moderate number of obstacles; however, its effectiveness in extremely confined spaces remains an area needing refinement. In such scenarios, joint configurations become highly constrained, and the iterative modulation process may lead to slower convergence or even failure to identify feasible solutions. Future efforts should prioritize enhancing the iterative adjustment process in confined environments by incorporating more detailed task constraints and refining the dynamic modulation strategies. These improvements would ensure more reliable performance in such challenging settings.
Another notable limitation lies in the iteration time for real-time modulation, particularly for applications demanding ultra-fast responses on commercially available robotic systems. Although our method achieves real-time performance in most cases, further reducing iteration times to meet the requirements of ultra-fast response applications remains an unresolved challenge. Potential avenues for improvement include leveraging deep learning techniques or heuristic approaches to reduce the computational overhead, as well as fine-tuning the interpolation times and refresh rates to enhance speed.
Moreover, although our method has been validated on a specific robotic arm, its generalization across different robotic platforms with varying degrees of freedom and kinematic configurations remains to be thoroughly evaluated. Expanding testing to include diverse robotic systems would provide a more comprehensive assessment of its applicability. Additionally, comparing our approach with other state-of-the-art algorithms in human–robot skill transfer, as well as evaluating its performance in environments with increased obstacle density and stricter motion constraints, would offer deeper insights into its versatility and scalability. Finally, exploring the extension of this strategy to more complex tasks, such as the manipulation of deformable objects or operation in environments with higher uncertainty, represents a promising direction for future research.

7. Conclusions

This paper presents a strategy for human–robot skill transfer with offline task-constrained optimization and real-time whole-body modulation. The human only needs to provide one demonstration to teach the robot one skill category. Continuous ProMPs distributions can be obtained with discrete via-points from the proposed via-point trajectory generalization method. Facing task constraints under new task scenarios, the objective is to optimize ProMPs by excluding specific regions. The proposed effective probabilistic optimization method can incorporate task constraints in Joint and Cartesian space conveniently to derive optimized ProMPs. We define the threshold enabling the robot to accomplish tasks directly without checking the task constraints for each timestep. Additionally, the human can incrementally demonstrate new skills to the robot when the current skills cannot complete the task.
Once the robot selects a specific skill for task execution, in the face of multiple dynamic obstacles, we propose a real-time adaptive control method for robot whole-body obstacle avoidance. The method solves the collision problem in DSM that may occur during the convergence towards a target. We utilize modulations iteratively to obtain the expected end-effector positions in real-time with no need to stop moving, which increases the success rate. Moreover, for non-end-effector obstacle avoidance, we propose the improved null-space velocity control method independent of manually defined retreat velocities by deriving safe joint configurations through iterations, ensuring a high success rate and stable behavior with reduced randomness.
The proposed strategy makes robot task execution more flexible. In the offline phase, we can provide the robot with different skills without adhering to specific task constraints. After ProMPs optimization under new task-constrained scenarios, the robot selects randomly from the feasible skills. During task execution, the proposed real-time whole-body adaptation method can modulate in real-time under dynamic environments, which benefits human-robot coexistence in manufacturing. The proposed strategy may fail in very narrow spaces. However, in most cases, the proposed strategy can avoid obstacles in advance with the weighted influence of multiple dynamic obstacles. The experimental results demonstrate that our method can ensure a high success rate while meeting real-time requirements. The strategy is general and can be applied to other redundant robotic arms.
Concerning the potential scalability of the proposed strategy to other frameworks that do not necessarily use ProMPs, in this paper, we first generalize a large set of collision-free trajectories through via-point generalization from a single human demonstration. To denote different ways of completing the task, we can use other probabilistic frameworks to encode the distribution. We decouple ProMPs into Gaussian distributions at each timestep, optimizing Gaussians for those time steps violating task constraints, and then obtain the optimized ProMPs separately. This approach can be extended to other probabilistic methods, such as using the GMM to model the overall Gaussian distributions and obtain the mean and standard deviation for each timestep with GMR. By applying our task-constrained optimization methods, the Gaussian distributions for each timestep can be optimized with new probability distributions derived by the characteristics of other probabilistic methods. For real-time movement adaptation, we integrate the optimized trajectory for end-effector dynamic modulation. This process can also be used with other MPs to generalize real-time trajectories. Meanwhile, we incorporate Joint ProMPs into the null space velocity control formulation. For other probabilistic methods, the formulation will integrate their probabilistic nature so that during real-time movement adaptation, the robot can preserve the offline-optimized distributions.
Future work will focus on enabling robotic arms to perform more complex tasks in extremely confined spaces, such as opening doors or manipulation of deformable objects, while avoiding obstacles in real-time.

Author Contributions

Conceptualization, G.D., X.Z. (Xizhe Zang) and X.Z. (Xuehe Zhang); Methodology, G.D.; Software, X.Z. (Xizhe Zang) and X.Z. (Xuehe Zhang); Validation, G.D.; Formal analysis, G.D.; Investigation, G.D.; Resources, X.Z. (Xizhe Zang), X.Z. (Xuehe Zhang), C.L. and J.Z.; Data curation, G.D.; Writing—original draft, G.D.; Writing—review & editing, G.D.; Visualization, G.D. and C.L.; Supervision, X.Z. (Xizhe Zang), X.Z. (Xuehe Zhang), C.L. and Y.Z.; Project administration, Y.Z. and J.Z.; Funding acquisition, Y.Z. and J.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research was founded by the Advanced Technology Marine Vessel Research Project (CBZ02N23-05), the National Outstanding Youth Science Fund Project of National Natural Science Foundation of China (52025054), and the Major Research Plan of the National Natural Science Foundation of China (92048301).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the authors on request.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Mukherjee, D.; Gupta, K.; Chang, L.H.; Najjaran, H. A survey of robot learning strategies for human-robot collaboration in industrial settings. Robot. Comput.-Integr. Manuf. 2022, 73, 102231. [Google Scholar] [CrossRef]
  2. Argall, B.D.; Chernova, S.; Veloso, M.; Browning, B. A survey of robot learning from demonstration. Robot. Auton. Syst. 2009, 57, 469–483. [Google Scholar] [CrossRef]
  3. Zhu, Z.; Hu, H. Robot learning from demonstration in robotic assembly: A survey. Robotics 2018, 7, 17. [Google Scholar] [CrossRef]
  4. Calinon, S. Learning from demonstration (programming by demonstration). Encycl. Robot. 2018, 1–8. [Google Scholar] [CrossRef]
  5. Correia, A.; Alexandre, L.A. A survey of demonstration learning. Robot. Auton. Syst. 2024, 182, 104812. [Google Scholar] [CrossRef]
  6. Zare, M.; Kebria, P.M.; Khosravi, A.; Nahavandi, S. A survey of imitation learning: Algorithms, recent developments, and challenges. IEEE Trans. Cybern. 2024, 54, 7173–7186. [Google Scholar] [CrossRef] [PubMed]
  7. Jaquier, N.; Zhou, Y.; Starke, J.; Asfour, T. Learning to sequence and blend robot skills via differentiable optimization. IEEE Robot. Autom. Lett. 2022, 7, 8431–8438. [Google Scholar] [CrossRef]
  8. Ijspeert, A.J.; Nakanishi, J.; Hoffmann, H.; Pastor, P.; Schaal, S. Dynamical movement primitives: Learning attractor models for motor behaviors. Neural Comput. 2013, 25, 328–373. [Google Scholar] [CrossRef]
  9. Calinon, S.; Guenter, F.; Billard, A. On learning, representing, and generalizing a task in a humanoid robot. IEEE Trans. Syst. Man Cybern. Part B (Cybernetics) 2007, 37, 286–298. [Google Scholar] [CrossRef]
  10. Paraschos, A.; Daniel, C.; Peters, J.; Neumann, G. Using probabilistic movement primitives in robotics. Auton. Robot. 2018, 42, 529–551. [Google Scholar] [CrossRef]
  11. Huang, Y.; Rozo, L.; Silvério, J.; Caldwell, D.G. Kernelized movement primitives. Int. J. Robot. Res. 2019, 38, 833–852. [Google Scholar] [CrossRef]
  12. Li, G.; Jin, Z.; Volpp, M.; Otto, F.; Lioutikov, R.; Neumann, G. ProDMP: A Unified Perspective on Dynamic and Probabilistic Movement Primitives. IEEE Robot. Autom. Lett. 2023, 8, 2325–2332. [Google Scholar] [CrossRef]
  13. Lauretti, C.; Cordella, F.; Zollo, L. A hybrid joint/Cartesian DMP-based approach for obstacle avoidance of anthropomorphic assistive robots. Int. J. Soc. Robot. 2019, 11, 783–796. [Google Scholar] [CrossRef]
  14. Pairet, È.; Ardón, P.; Mistry, M.; Petillot, Y. Learning generalizable coupling terms for obstacle avoidance via low-dimensional geometric descriptors. IEEE Robot. Autom. Lett. 2019, 4, 3979–3986. [Google Scholar] [CrossRef]
  15. Duan, A.; Camoriano, R.; Ferigo, D.; Calandriello, D.; Rosasco, L.; Pucci, D. Constrained DMPs for feasible skill learning on humanoid robots. In Proceedings of the 2018 IEEE-RAS 18th International Conference on Humanoid Robots (Humanoids), Beijing, China, 6–9 November 2018; pp. 1–6. [Google Scholar]
  16. Dahlin, A.; Karayiannidis, Y. Adaptive trajectory generation under velocity constraints using dynamical movement primitives. IEEE Control Syst. Lett. 2019, 4, 438–443. [Google Scholar] [CrossRef]
  17. Paraschos, A.; Lioutikov, R.; Peters, J.; Neumann, G. Probabilistic prioritization of movement primitives. IEEE Robot. Autom. Lett. 2017, 2, 2294–2301. [Google Scholar] [CrossRef]
  18. Koert, D.; Maeda, G.; Lioutikov, R.; Neumann, G.; Peters, J. Demonstration based trajectory optimization for generalizable robot motions. In Proceedings of the 2016 IEEE-RAS 16th International Conference on Humanoid Robots (Humanoids), Cancun, Mexico, 15–17 November 2016; pp. 515–522. [Google Scholar]
  19. Izadi, N.H.; Palhang, M.; Safayani, M. Layered Relative Entropy Policy Search. Knowl.-Based Syst. 2021, 223, 107025. [Google Scholar] [CrossRef]
  20. Koert, D.; Pajarinen, J.; Schotschneider, A.; Trick, S.; Rothkopf, C.; Peters, J. Learning intention aware online adaptation of movement primitives. IEEE Robot. Autom. Lett. 2019, 4, 3719–3726. [Google Scholar] [CrossRef]
  21. Frank, F.; Paraschos, A.; van der Smagt, P.; Cseke, B. Constrained probabilistic movement primitives for robot trajectory adaptation. IEEE Trans. Robot. 2021, 38, 2276–2294. [Google Scholar] [CrossRef]
  22. Huang, Y.; Caldwell, D.G. A linearly constrained nonparametric framework for imitation learning. In Proceedings of the 2020 IEEE International Conference on Robotics and Automation (ICRA), Paris, France, 31 May–31 August 2020; pp. 4400–4406. [Google Scholar]
  23. Huang, Y. Ekmp: Generalized imitation learning with adaptation, nonlinear hard constraints and obstacle avoidance. arXiv 2021, arXiv:2103.00452. [Google Scholar]
  24. Karaman, S.; Frazzoli, E. Sampling-based algorithms for optimal motion planning. Int. J. Robot. Res. 2011, 30, 846–894. [Google Scholar] [CrossRef]
  25. Zucker, M.; Ratliff, N.; Dragan, A.D.; Pivtoraiko, M.; Klingensmith, M.; Dellin, C.M.; Bagnell, J.A.; Srinivasa, S.S. Chomp: Covariant hamiltonian optimization for motion planning. Int. J. Robot. Res. 2013, 32, 1164–1193. [Google Scholar] [CrossRef]
  26. Bhardwaj, M.; Sundaralingam, B.; Mousavian, A.; Ratliff, N.D.; Fox, D.; Ramos, F.; Boots, B. Storm: An integrated framework for fast joint-space model-predictive control for reactive manipulation. In Proceedings of the Conference on Robot Learning, PMLR, Auckland, New Zealand, 14–18 December 2022; pp. 750–759. [Google Scholar]
  27. Cai, M.; Aasi, E.; Belta, C.; Vasile, C.I. Overcoming exploration: Deep reinforcement learning for continuous control in cluttered environments from temporal logic specifications. IEEE Robot. Autom. Lett. 2023, 8, 2158–2165. [Google Scholar] [CrossRef]
  28. Tulbure, A.; Khatib, O. Closing the loop: Real-time perception and control for robust collision avoidance with occluded obstacles. In Proceedings of the 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Las Vegas, NV, USA, 25–29 October 2020; pp. 5700–5707. [Google Scholar]
  29. Szulczyński, P.; Pazderski, D.; Kozłowski, K. Real-time obstacle avoidance using harmonic potential functions. J. Autom. Mob. Robot. Intell. Syst. 2011, 5, 59–66. [Google Scholar]
  30. Khansari-Zadeh, S.M.; Billard, A. A dynamical system approach to realtime obstacle avoidance. Auton. Robot. 2012, 32, 433–454. [Google Scholar] [CrossRef]
  31. Huber, L.; Billard, A.; Slotine, J.J. Avoidance of convex and concave obstacles with convergence ensured through contraction. IEEE Robot. Autom. Lett. 2019, 4, 1462–1469. [Google Scholar] [CrossRef]
  32. Huber, L.; Slotine, J.J.; Billard, A. Fast obstacle avoidance based on real-time sensing. IEEE Robot. Autom. Lett. 2022, 8, 1375–1382. [Google Scholar] [CrossRef]
  33. Huber, L.; Slotine, J.J.; Billard, A. Avoidance of Concave Obstacles through Rotation of Nonlinear Dynamics. IEEE Trans. Robot. 2023, 40, 1983–2002. [Google Scholar] [CrossRef]
  34. Seraji, H.; Bon, B. Real-time collision avoidance for position-controlled manipulators. IEEE Trans. Robot. Autom. 1999, 15, 670–677. [Google Scholar] [CrossRef]
  35. Petrič, T.; Žlajpah, L. Smooth continuous transition between tasks on a kinematic control level: Obstacle avoidance as a control problem. Robot. Auton. Syst. 2013, 61, 948–959. [Google Scholar] [CrossRef]
  36. Zheng, D.; Wu, X.; Pang, J. Real-time whole-body obstacle avoidance for 7-DOF redundant manipulators. In Proceedings of the 2021 33rd Chinese Control and Decision Conference (CCDC), Kunming, China, 22–24 May 2021; pp. 2626–2631. [Google Scholar]
  37. Oikonomidis, I.; Kyriazis, N.; Argyros, A.A. Efficient model-based 3D tracking of hand articulations using Kinect. In Proceedings of the British Machine Vision Conference, Dundee, UK, 29 August–2 September 2011; Volume 1, p. 3. [Google Scholar]
  38. Carpentier, J.; Mansard, N. Analytical derivatives of rigid body dynamics algorithms. In Proceedings of the Robotics: Science and Systems (RSS 2018), Pittsburgh, PA, USA, 26–30 June 2018. [Google Scholar]
  39. McDonald, G.C. Ridge regression. Wiley Interdiscip. Rev. Comput. Stat. 2009, 1, 93–100. [Google Scholar] [CrossRef]
  40. Wan, E.A.; Van Der Merwe, R. The unscented Kalman filter for nonlinear estimation. In Proceedings of the IEEE 2000 Adaptive Systems for Signal Processing, Communications, and Control Symposium (Cat. No. 00EX373), Lake Louise, AB, Canada, 4 October 2000; pp. 153–158. [Google Scholar]
  41. Bertsekas, D.P. Nonlinear programming. J. Oper. Res. Soc. 1997, 48, 334. [Google Scholar] [CrossRef]
  42. Berahas, A.S.; Takáč, M. A robust multi-batch L-BFGS method for machine learning. Optim. Methods Softw. 2020, 35, 191–219. [Google Scholar] [CrossRef]
  43. Abadi, M.; Agarwal, A.; Barham, P.; Brevdo, E.; Chen, Z.; Citro, C.; Corrado, G.S.; Davis, A.; Dean, J.; Devin, M.; et al. Tensorflow: Large-scale machine learning on heterogeneous distributed systems. arXiv 2016, arXiv:1603.04467. [Google Scholar]
  44. Kronander, K.; Khansari, M.; Billard, A. Incremental motion learning with locally modulated dynamical systems. Robot. Auton. Syst. 2015, 70, 52–62. [Google Scholar] [CrossRef]
  45. Barata, J.C.A.; Hussein, M.S. The Moore–Penrose pseudoinverse: A tutorial review of the theory. Braz. J. Phys. 2012, 42, 146–165. [Google Scholar] [CrossRef]
  46. Adarsh, P.; Rathi, P.; Kumar, M. YOLO v3-Tiny: Object Detection and Recognition using one stage improved model. In Proceedings of the 2020 6th International Conference on Advanced Computing and Communication Systems (ICACCS), Coimbatore, India, 6–7 March 2020; pp. 687–694. [Google Scholar]
  47. Khansari-Zadeh, S.M.; Billard, A. Learning stable nonlinear dynamical systems with gaussian mixture models. IEEE Trans. Robot. 2011, 27, 943–957. [Google Scholar] [CrossRef]
Figure 1. The proposed via-point trajectory generalization schematic. (a) 3D via-point generalization with obstacles. p and p new represent x, y, and z values over T time steps, with one via-point shown as an example. The blue cylinder denotes the obstacle, and the new via-point can be selected from above, left, or right of the obstacle for safe avoidance. (b) Orientation generalization using SLERP. At new orientations o 1 new and o 1 end , the orientation is generalized based on the quaternion multiplication of h o new and f o 1 , f o end .
Figure 1. The proposed via-point trajectory generalization schematic. (a) 3D via-point generalization with obstacles. p and p new represent x, y, and z values over T time steps, with one via-point shown as an example. The blue cylinder denotes the obstacle, and the new via-point can be selected from above, left, or right of the obstacle for safe avoidance. (b) Orientation generalization using SLERP. At new orientations o 1 new and o 1 end , the orientation is generalized based on the quaternion multiplication of h o new and f o 1 , f o end .
Applsci 15 03171 g001
Figure 2. Dynamic system modulation for robot end-effector.
Figure 2. Dynamic system modulation for robot end-effector.
Applsci 15 03171 g002
Figure 3. Null-space velocity control for robot non-end-effector.
Figure 3. Null-space velocity control for robot non-end-effector.
Applsci 15 03171 g003
Figure 4. The integrated offline and real-time robot movement adaptation control diagram.
Figure 4. The integrated offline and real-time robot movement adaptation control diagram.
Applsci 15 03171 g004
Figure 5. Template trajectory generalization from only one human demonstration. (a) Snapshots of human hand centroid tracking with the Kinect FORTH system. The first, middle, and last rows denote move above, left, and right around the obstacle, respectively. (b) The generalized template trajectory for three modes at the minimum Z-position of the robot gripper center with the same start point (symbol ‘o’) and end point (symbol ‘x’).
Figure 5. Template trajectory generalization from only one human demonstration. (a) Snapshots of human hand centroid tracking with the Kinect FORTH system. The first, middle, and last rows denote move above, left, and right around the obstacle, respectively. (b) The generalized template trajectory for three modes at the minimum Z-position of the robot gripper center with the same start point (symbol ‘o’) and end point (symbol ‘x’).
Applsci 15 03171 g005
Figure 6. Offline movment adaptation experiment with a new big bottle. (a) Experimental setup. (b) End-effector collision-free trajectory sets for three skills with via-point trajectory generalization. (c) Joint ProMPs distributions given discrete collision-free joint sets (taking the move above skill as an example). Solid lines denote mean values with shaded areas denoting ± 2 standard deviations. (d) Robot task execution snapshots: the first, middle, and last rows denote move above, left, and right around the obstacle.
Figure 6. Offline movment adaptation experiment with a new big bottle. (a) Experimental setup. (b) End-effector collision-free trajectory sets for three skills with via-point trajectory generalization. (c) Joint ProMPs distributions given discrete collision-free joint sets (taking the move above skill as an example). Solid lines denote mean values with shaded areas denoting ± 2 standard deviations. (d) Robot task execution snapshots: the first, middle, and last rows denote move above, left, and right around the obstacle.
Applsci 15 03171 g006
Figure 7. Offline movement adaptation experiment under a danger scene with the elevated obstacle. (a) Experimental setup. (b) Evident collision for other robot links when generalizing collision-free end-effector trajectory with certain via-points. (c) Optimized ProMPs distributions under nonlinear task constraints with LBFGS and EMM. We choose typical joint 2 and 4 distributions with obvious changes, the line and shaded areas denote the original and optimized mean and standard deviations. (d) Threshold determination based on the relationship between via-point Z-position and obstacle avoidance result. The succeed number threshold is denoted with a blue solid circle. Blue and red areas denote success and failure results. (e) Generalized task execution trajectory sets with 100 discrete via-points; red and blue lines and areas denote success and failure results. (f) Robot task execution snapshots with the critical trajectory between success and failure. The first row and last rows show different motion views.
Figure 7. Offline movement adaptation experiment under a danger scene with the elevated obstacle. (a) Experimental setup. (b) Evident collision for other robot links when generalizing collision-free end-effector trajectory with certain via-points. (c) Optimized ProMPs distributions under nonlinear task constraints with LBFGS and EMM. We choose typical joint 2 and 4 distributions with obvious changes, the line and shaded areas denote the original and optimized mean and standard deviations. (d) Threshold determination based on the relationship between via-point Z-position and obstacle avoidance result. The succeed number threshold is denoted with a blue solid circle. Blue and red areas denote success and failure results. (e) Generalized task execution trajectory sets with 100 discrete via-points; red and blue lines and areas denote success and failure results. (f) Robot task execution snapshots with the critical trajectory between success and failure. The first row and last rows show different motion views.
Applsci 15 03171 g007
Figure 8. Robot end-effector real-time movement adaptation experiments. (a) Typical modulation cases for specific timesteps. The left, middle, and right columns denote the robot init positions, modulation process, and modulated positions. We utilize ellipsoids to envelop the task objects and robot gripper. (b) One episode execution snapshots when the robot selects the skill of moving above around the obstacle with 0.2 s intervals for each adjacement timestep.
Figure 8. Robot end-effector real-time movement adaptation experiments. (a) Typical modulation cases for specific timesteps. The left, middle, and right columns denote the robot init positions, modulation process, and modulated positions. We utilize ellipsoids to envelop the task objects and robot gripper. (b) One episode execution snapshots when the robot selects the skill of moving above around the obstacle with 0.2 s intervals for each adjacement timestep.
Applsci 15 03171 g008
Figure 9. Robot whole-body online movement adaptation experiments. (a) Typical modulation cases for specific timesteps. The left, middle, and right columns denote the robot init joint configurations, modulation process, and modulated joint configurations. We utilize cylinders to envelop the task objects and robot links; the initial and modulated joints are colored in purple and green, respectively. (b) Task execution snapshots when the robot selects the skill of moving above around the obstacle with 0.2 s intervals for each adjacement timestep.
Figure 9. Robot whole-body online movement adaptation experiments. (a) Typical modulation cases for specific timesteps. The left, middle, and right columns denote the robot init joint configurations, modulation process, and modulated joint configurations. We utilize cylinders to envelop the task objects and robot links; the initial and modulated joints are colored in purple and green, respectively. (b) Task execution snapshots when the robot selects the skill of moving above around the obstacle with 0.2 s intervals for each adjacement timestep.
Applsci 15 03171 g009
Figure 10. Robot end-effector modulation comparison between SEDS-GMR and our method. Green and purple lines denote SEDS-GMR and our method, respectively. (a) Modulated 3D trajectories when the robot moves right. Arrows illustrate obstacle motion trends with cylinders denoting obstacle positions at different time, for instance, the emergency stop (the red cylinder) is moved to the new position at t = 8.4 s from the init position at t = 0.2 s. (b) End-effector trajectories when the robot moves left. (c) End-effector trajectories when the robot moves above. The snapshots show collision cases for SEDS-GMR during task execution.
Figure 10. Robot end-effector modulation comparison between SEDS-GMR and our method. Green and purple lines denote SEDS-GMR and our method, respectively. (a) Modulated 3D trajectories when the robot moves right. Arrows illustrate obstacle motion trends with cylinders denoting obstacle positions at different time, for instance, the emergency stop (the red cylinder) is moved to the new position at t = 8.4 s from the init position at t = 0.2 s. (b) End-effector trajectories when the robot moves left. (c) End-effector trajectories when the robot moves above. The snapshots show collision cases for SEDS-GMR during task execution.
Applsci 15 03171 g010
Figure 11. Robot non-end-effector modulation comparison experiments. The first, middle, and last rows denote task execution snapshots of CHOMP, null space control [36], and our method, respectively.
Figure 11. Robot non-end-effector modulation comparison experiments. The first, middle, and last rows denote task execution snapshots of CHOMP, null space control [36], and our method, respectively.
Applsci 15 03171 g011
Table 1. Definitions of all parameters employed in the experiment section.
Table 1. Definitions of all parameters employed in the experiment section.
Offline AdaptationDescription
f p Trajectory shape modulation for 3D positions
R obs The obstacle radius
X , Y , Z The 3D position sets for via-points range
γ The threshold for fine-tuning via-points
TThe forward kinematics function
d k , t The distance threshold to represent task constraints probability
n z , t , b z , t The hyperplane normal and bias vectors for the maximum Z
R obs The obstacle safe radius
α k , t The confidence level to represent the task constraints probability
λ k , t 0 The initial Lagrange multiplier
p w 0 = N w 0 ; μ w 0 , Σ w 0 Initial Gaussian distributions of ProMPs
p w * = N w * ; μ w * , Σ w * Optimized Gaussian distributions of ProMPs
p q t = N q t ; μ t 0 , σ t 0 The initial mean and standard deviation of each timestep
p q t = N q t ; μ t * , σ t * The optimized mean and standard deviation of each timestep
q t safe μ t * 2 σ t * , μ t * + 2 σ t * The optimized joint limit range
Ψ t T Gaussian basis functions
q t 0 = Ψ t T w * The initial joint value for Jacobian IK
d t The interval between adjacent timesteps
Γ ξ k The ellipsoid-shaped distance function
p i k , ξ k c The curvature parameter and the center of the distance function
ξ ˙ k L , ξ ˙ k R The linear and angular velocities of dynamic obstacles
Real-Time AdaptationDescription
n ξ ˜ k The normal vector of the deflection hyperplane at point ξ
E ξ k , D ξ k The basis and the diagonal eigenvalue matrix
η k , ρ k The safety magnitude and reactivity parameter for Γ ξ k
n ξ ˜ k / η k T ξ ˙ k Tail-effect definition for Γ ξ k
M ξ k R 3 × 3 , w k ξ k The modulation matrix and weighted coefficient for each dynamic obstacle
Q ξ ˜ all , q ξ ˜ all The weighted magnitude and directions for all dynamic obstacles
ξ ˙ init , ξ ˙ mod The initial and modulated velocity in Cartesian space
C 0 The closest point between the robot links and the obstacle
x v 0 , J 0 The retreat velocity and the Jacobian matrix at point C 0
d 0 The minimum detected distance between robot links 1 to 6 and the obstacle
d m The safety margin for dynamic obstacles
w 1 rel , ξ ˜ ˙ all The weight term and weighted relative velocity
ξ next , ξ ˙ next The next generalized trajectory sets and the next velocity computed
d n The distance at which the obstacle begins to affect the manipulator
v n The scalar velocity
q mod The modulated joint configuration
Table 2. Robot offline movement adaptation comparison. We set the same task constraints in the danger scene (Figure 7a). Given the initial collision-free end-effector trajectory sets, we obtain the corresponding Joint space distributions using Jacobian IK. Some Joint distributions may lead to collisions for robot links 1 to 6, and the goal is to optimize the distributions for task execution, which is then performed 100 times.
Table 2. Robot offline movement adaptation comparison. We set the same task constraints in the danger scene (Figure 7a). Given the initial collision-free end-effector trajectory sets, we obtain the corresponding Joint space distributions using Jacobian IK. Some Joint distributions may lead to collisions for robot links 1 to 6, and the goal is to optimize the distributions for task execution, which is then performed 100 times.
Algorithm in [21]EKMPs in [23]Our Method
Parameters for Optimization665512777
Constraints Formulas F c k , t Numbers512512136
Total Iteration Numbers2653664
Optimization Time (s)853.36728.58126.45
Distribution Loss76.48%70.32%84.36%
Task Success Rate92%91%99%
Table 3. Robot end-effector modulation comparision.
Table 3. Robot end-effector modulation comparision.
SEDS-GMR Modulation [30,31]Our Method
Task Success Rate73.33%93.33%
Offline Parameters Learning Time (s)24.6856 ± 0.15460.0904 ± 0.0148
Average Real-time Modulation Time (s)0.1238 ± 0.01670.1465 ± 0.0285
Total Task Execution Time (s)23.2 ± 116.8 ± 0.4
Collison Rate26.67%6.67%
Final Position Error (mm)0.52 ± 0.130.55 ± 0.09
Convergence TimeSteps116 ± 584 ± 2
Table 4. Robot non-end-effector modulation comparision.
Table 4. Robot non-end-effector modulation comparision.
CHOMPNull Space Control in [36]Our Method
Success Rate93.33%76.67%96.67%
Average Modulation Time (s)0.1434 ± 0.05630.1654 ± 0.01460.1757 ± 0.0086
Joint Configurations Iterations126 ± 2410 ± 220 ± 1
Modulation Path (m)9.8829 ± 1.23680.8204 ± 0.00460.8232 ± 0.0024
Total Task Execution Time (s)232.8 ± 60.2 13.2 ± 0.6 13.8 ± 0.4
Collision Rate6.67%23.33%3.33%
Final Position Error (mm)0.83 ± 0.160.66 ± 0.140.59 ± 0.06
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Ding, G.; Zang, X.; Zhang, X.; Li, C.; Zhu, Y.; Zhao, J. A Human–Robot Skill Transfer Strategy with Task-Constrained Optimization and Real-Time Whole-Body Adaptation. Appl. Sci. 2025, 15, 3171. https://doi.org/10.3390/app15063171

AMA Style

Ding G, Zang X, Zhang X, Li C, Zhu Y, Zhao J. A Human–Robot Skill Transfer Strategy with Task-Constrained Optimization and Real-Time Whole-Body Adaptation. Applied Sciences. 2025; 15(6):3171. https://doi.org/10.3390/app15063171

Chicago/Turabian Style

Ding, Guanwen, Xizhe Zang, Xuehe Zhang, Changle Li, Yanhe Zhu, and Jie Zhao. 2025. "A Human–Robot Skill Transfer Strategy with Task-Constrained Optimization and Real-Time Whole-Body Adaptation" Applied Sciences 15, no. 6: 3171. https://doi.org/10.3390/app15063171

APA Style

Ding, G., Zang, X., Zhang, X., Li, C., Zhu, Y., & Zhao, J. (2025). A Human–Robot Skill Transfer Strategy with Task-Constrained Optimization and Real-Time Whole-Body Adaptation. Applied Sciences, 15(6), 3171. https://doi.org/10.3390/app15063171

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Metrics

Back to TopTop