A Trajectory Optimisation-Based Incremental Learning Strategy for Learning from Demonstration

Abstract: The insufficient generalisation capability of conventional learning from demonstration (LfD) models necessitates redemonstrations. In addition, retraining the model can overwrite existing knowledge, making it impossible to perform previously acquired skills in new application scenarios. Neither is economical or efficient. To address these issues, in this study, a broad learning system (BLS) and a probabilistic roadmap (PRM) are integrated with dynamic movement primitive (DMP)-based LfD. Three key innovations are proposed in this paper: (1) Segmentation and extended demonstration: a 1D-based topology trajectory segmentation algorithm (1D-SEG) is designed to divide the original demonstration into several segments. Following the segmentation, a Gaussian probabilistic roadmap (G-PRM) is proposed to generate an extended demonstration that retains the geometric features of the original demonstration. (2) DMP modelling and incremental learning updating: BLS-based incremental learning for DMP (Bi-DMP) is performed based on the constructed DMP and the extended demonstration. With this incremental learning approach, the DMP is capable of self-updating in response to task demands, preserving previously acquired skills and updating them without training from scratch. (3) Electric vehicle (EV) battery disassembly case study: this study develops a solution suitable for EV battery disassembly and establishes a decommissioned-battery disassembly experimental platform. Unscrewing nuts and battery cell removal are selected to verify the effectiveness of the proposed algorithms on this platform. The effectiveness of the designed algorithms is measured by the success rate and the error of task execution. In the unscrewing nuts task, the success rate of the classical DMP is 57.14% and the maximum error is 2.760 mm. After the optimisation by 1D-SEG, G-PRM, and Bi-DMP, the success rate of the task is increased to 100% and the maximum error is reduced to 1.477 mm.


Introduction
Industrial robots have been designed to perform repetitive tasks, so they have been extensively used in the mass production of the automotive, aerospace, and marine industries. As the demand for customised production has dramatically increased in recent years, collaborative robots (cobots) have been actively developed to carry out customised tasks within unstructured environments [1]. Learning from demonstration (LfD) is an effective approach to facilitating cobots to fulfil the above purpose. LfD can enable cobots to efficiently realise customised operations based on human demonstrations without the need for detailed motion programming or reprogramming [2,3], as illustrated in Figure 1, where a cobot performs a pick-and-place task. To create a demonstration, a human operator drags the end-effector of the cobot to pick up an object from a start point (the yellow box in Figure 1) and place it at a target point (the marker in Figure 1). The demonstration (trajectory) is shown as the red curve in Figure 1. The demonstration is recorded and used to train an LfD-based learning model. To accomplish a similar task where the marker is moved to a new position, the trained LfD-based learning model can adaptively generate a new trajectory (generated trajectory), illustrated as the blue dashed line in Figure 1. Dynamic movement primitive (DMP) is an effective learning model for implementing LfD [4]. Unlike Gaussian mixture modelling (GMM), the DMP model only needs to learn from one demonstration, thereby minimising the difficulty of creating multiple demonstrations [5][6][7]. The mathematical representation of the DMP model includes a second-order spring-damping system and a force item [8]. The former ensures that the trajectory generated using the DMP model converges to the target without divergence. The latter, constructed from a series of radial basis function (RBF) kernels, is used to control the shape of the generated trajectory and manage the convergence process. The advantage of the DMP model is its robustness to perturbations [9]. However, the model exhibits some limitations as well. For instance, when the position of the marker illustrated in Figure 1 is moved far away from the previous demonstration area, the trajectory generated via DMP cannot work properly [10]. That is, the difference between the positions of the generated trajectory and the marker should not exceed δ, a threshold representing the maximum error allowable for the successful execution of the pick-and-place task. The situation is illustrated by the green dashed curve in Figure 1. To address the issue, the following considerations are given:

•	A straightforward idea is to create a new demonstration for the new situation. However, redemonstration is time-consuming, as the experimental environment needs to be set up again [3]. To mitigate the problem, it is beneficial to reuse part of the originally created demonstration to adaptively generate a new trajectory (namely, an extended demonstration) for the new target;
•	DMP is a one-shot learning model, which means that when the extended demonstration is learned, the knowledge gained from the previous demonstration is forgotten. Therefore, it is imperative to develop an incremental learning mechanism for the DMP model to improve its generalisation capability for various situations.
Various approaches for trajectory generation have been developed to complement demonstration data, such as task-parameterised Gaussian mixture models (TP-GMMs) [3], adversarial generative models [11], B-spline-based models [12], and PRM-based models [13]. These approaches can generate new trajectories flexibly. However, the lack of imitation and utilisation of prior knowledge (i.e., existing demonstrations) poses safety risks in LfD applications. In addition, some researchers have attempted to improve the performance of DMP [14]. For instance, neural networks and reinforcement learning [15][16][17] were used to enhance the learning performance of the force term in the DMP model. The development of incremental learning mechanisms for the DMP model has received attention as well [18][19][20]. However, the previous research suffers from high complexity and computational difficulty. A more detailed analysis is provided in Section 2.
Based on the above considerations, in this study, an improved DMP model with extended demonstration and incremental learning capabilities is designed. The capabilities are implemented based on a 1D-based topology trajectory segmentation algorithm (1D-SEG), a Gaussian probabilistic roadmap (G-PRM), and a broad learning system (BLS). By constructing an appropriate sampling strategy and loss function, G-PRM can generate an extended demonstration based on the features extracted from the segmented original demonstration via 1D-SEG. This extended demonstration and the force item of the DMP model are fed into the BLS to update the DMP model while storing previously learned skills (i.e., incremental learning). Three innovative characteristics of this study are presented below.

•	Segmentation and extended demonstration: 1D-SEG is combined with G-PRM to generate an extended demonstration by incorporating the features of the original demonstration, so that fewer demonstrations are required and the cost of data collection is minimised;
•	DMP modelling and incremental learning update: The BLS learns the difference between the extended demonstration and the original demonstration by incrementally increasing the number of network nodes (hereafter referred to as additional enhancement nodes). The force item of the previously constructed DMP model is updated with the results generated by the BLS;

•	Electric vehicle (EV) battery disassembly cases and an experimental platform were used to verify the effectiveness of the developed approach. Based on the approach, the successful disassembly of nuts and battery cells was achieved.
The remainder of this paper is structured as follows. Section 2 reviews the related work from the perspectives of trajectory generation and DMP-based optimisation. The key research gaps of the related work are summarised in Section 2. According to these gaps, segmentation and extended demonstration-based methods (namely, 1D-SEG and G-PRM) and BLS-based incremental learning for DMP (namely, Bi-DMP) are developed in Section 3. In Section 4, a disassembly experimental platform based on EV batteries and collaborative robots is presented. The effectiveness of the proposed algorithms is verified by three cases, i.e., pick-and-place, unscrewing nuts, and battery cell removal. A comprehensive discussion is given in Section 5. Finally, Section 6 concludes the research.

Trajectory Generation
Trajectory generation is an important function to support robotic applications. Zhu et al. [3] developed an LfD-enabled trajectory generation algorithm that combines Gaussian noise with a task-parameterised Gaussian mixture model (TP-GMM). This algorithm facilitates the creation of the generated trajectory by augmenting the original demonstration. Hu et al. [21] proposed a method based on multi-region cost learning. By dividing a demonstration (trajectory) into various sub-regions and establishing cost functions for each sub-region, this method can fine-tune and preserve the geometric features of the demonstration more accurately to optimise the generated trajectory. Peng et al. [11] designed a generative adversarial imitation learning (GAIL)-based algorithm to optimise generated trajectories. In this algorithm, reinforcement learning and generative adversarial networks were combined to allow for more diversity in trajectory generation.
In order to achieve local adjustment capability and meet the precision requirement of trajectory generation, Li et al. [12] proposed a trajectory generation method that combines a B-spline algorithm with a pruning algorithm. The pruning algorithm was used to create a trajectory, and the B-spline was applied to smooth the trajectory. However, the control points of the B-spline algorithm cannot accurately mimic the features of the created trajectory. To enhance the similarity of the generated trajectory to the original demonstration and enable local adjustment, sampling-based trajectory generation algorithms are highly promising choices, attributed to their simplicity, flexibility, and computational efficiency [13]. Hüppi et al. [13] and Zhou et al. [22] proposed trajectory generation methods based on PRM. PRM can generate trajectories through the Dijkstra algorithm and sampling points. The distribution of sampling points can affect both the algorithm's efficiency and the quality of the generated trajectory. It is therefore important to design a suitable sampling strategy.
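The roadmap-plus-Dijkstra pipeline of PRM described above can be sketched in a few lines. The following is a minimal, illustrative Python implementation (obstacle checking is omitted; the sampling region, node counts, and function name are our own assumptions, not taken from the cited works):

```python
import heapq
import math
import random

def prm_path(start, goal, n_samples=60, k=8, seed=0):
    """Toy PRM: random samples + k-NN roadmap + Dijkstra (no obstacles)."""
    rng = random.Random(seed)
    nodes = [tuple(start), tuple(goal)] + [
        (rng.uniform(0.0, 10.0), rng.uniform(0.0, 10.0)) for _ in range(n_samples)
    ]
    dist = lambda a, b: math.hypot(a[0] - b[0], a[1] - b[1])
    # Roadmap: undirected edges between each node and its k nearest neighbours.
    graph = {i: set() for i in range(len(nodes))}
    for i in range(len(nodes)):
        nearest = sorted(range(len(nodes)), key=lambda j: dist(nodes[i], nodes[j]))[1:k + 1]
        for j in nearest:
            graph[i].add(j)
            graph[j].add(i)
    # Dijkstra from the start node (index 0) to the goal node (index 1).
    best, prev = {0: 0.0}, {}
    pq = [(0.0, 0)]
    while pq:
        d, u = heapq.heappop(pq)
        if u == 1:
            break
        if d > best.get(u, float("inf")):
            continue
        for v in graph[u]:
            nd = d + dist(nodes[u], nodes[v])
            if nd < best.get(v, float("inf")):
                best[v], prev[v] = nd, u
                heapq.heappush(pq, (nd, v))
    path, u = [nodes[1]], 1  # walk predecessors back to the start
    while u != 0:
        u = prev[u]
        path.append(nodes[u])
    return path[::-1]
```

In this sketch, the quality of the returned path depends entirely on where the samples fall, which is exactly the observation motivating the Gaussian sampling strategy of G-PRM introduced later in the paper.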
However, several limitations persist when applying these approaches in manufacturing processes: (1) due to the lack of efficient trajectory segmentation methods, trajectory generation can only rely on the entire original demonstration, which may not be economical when trajectory adjustments are needed for just one sub-region; (2) mimicking the characteristics of the demonstration proves challenging, and the shape of generated trajectories is hard to control, because there is a lack of research on efficient sampling strategies and on similarity measurement between the original demonstration and the extended demonstration.

DMP-Based Optimisation
DMP has been widely used in LfD. Park et al. [23] and Ijspeert et al. [24] refined the classic DMP to accommodate demonstrations with the same start and target, as well as obstacle avoidance, which provides a basis for further optimisation. The force item in DMP is constructed from RBF kernels and weights, which play a crucial role in determining how well a demonstration is learned. Consequently, the optimisation of the force item has received substantial attention. Teng et al. [25] integrated the Takagi-Sugeno fuzzy system with DMP, indicating that a fuzzy inference system can estimate the force item. To better learn non-linear demonstrations, various researchers have used neural networks to optimise the force item. For instance, Si et al. [15] adopted radial basis function neural networks (RBFNNs) to model the force item in DMP, enabling the learning of both the position and orientation of demonstrations. To further improve the quality of DMP-generated trajectories, both Noohian et al. [16] and Kim et al. [17] not only replaced the force item with neural networks but also applied reinforcement learning for model training. The searching mechanisms in reinforcement learning can improve the quality of trajectories generated by DMP. Moreover, Davchev et al. [6] introduced a residual correction policy aimed at improving the generalisation of DMP in peg-in-hole applications, thereby refining the quality of trajectories generated by DMP. Notably, the aforementioned methods mainly focus on the improvement of the force item and the DMP training process. Due to a lack of incremental learning capacity, when confronted with a new demonstration, the DMP has to be retrained.
To enable incremental learning within DMP, several scholars have made preliminary explorations. Lu et al. [18,19] attempted to combine a BLS with the force item of DMP, aiming to equip DMP with incremental learning capability. The BLS has a flat network structure and a rapid update capability [20], which allows the DMP to be updated iteratively and efficiently. However, this approach is complex in its force-item construction. Simplifying the combination of BLS and DMP and reducing the computational complexity are necessary. In conclusion, optimising and enhancing the force item to endow DMP with incremental learning capabilities is crucial in real applications.
Therefore, the key research gaps of the related works can be summarised as follows:
•	Mimicking the geometric features of the original demonstration proves challenging. There is a lack of research in designing an efficient sampling strategy and similarity measurement to avoid sharp turns in generated trajectories and to enhance their resemblance to the demonstrations;
•	These approaches neglect to integrate an incremental learning function within DMP. When presented with a new demonstration, the DMP model has to be retrained. Meanwhile, there is a shortage of industrial applications leveraging incremental learning.

Research Methodology
The methodology in this research consists of the following steps: (i) segmentation and extended demonstration; (ii) DMP modelling and incremental learning update. The framework of this section is shown in Figure 2. Details are given below. Before introducing the methodology of this paper, it is worth stating the advantages of the designed method. Traditional trajectory segmentation methods mainly include subjective decision making based on trajectory geometric features [26,27] and adaptive segmentation based on neural networks [28]. In the optimised 1D-SEG developed in this paper, both the geometric features of the trajectory and a rigorous mathematical process are considered, which makes the segmentation results more objective. Compared to other trajectory generation algorithms, such as rapidly exploring random tree (RRT) [29], the advantage of PRM lies in the ease of optimising sampling points, as it constructs a roadmap that facilitates finding better trajectories in complex environments. In addition, the flat structure and rapid update strategy of the BLS do not require extensive training time, making it suitable for quickly building models.

Segmentation of Demonstration
Curvature serves as a quantitative measure of a trajectory's features; inspired by [30], a curvature-based trajectory segmentation algorithm has been designed. The details are as follows: as illustrated in Figure 3, the blue dashed curve represents the curvatures of a trajectory. According to the changing trend of the curvatures, the trajectory can be divided into sub-regions denoted P_i. Ma_i and Mi_i denote the local maximum and local minimum curvatures in the ith region; Ma_0 and Mi_3 are the global maximum and global minimum curvatures for the entire trajectory. The Euclidean distance between Ma_i and Mi_i in the ith region is defined as the curvature distance of the ith region, ||P_i||. To optimise the segmentation of the trajectory, a threshold ε is defined. If ||P_i|| < ε, the ith region and its Ma_i and Mi_i are eliminated from the segmentation process. The remaining local maximum and local minimum curvatures are selected as the segmentation points.
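As a rough illustration of this curvature-based rule, the sketch below computes a discrete curvature profile, finds its local extrema, and keeps only extrema pairs whose curvature distance exceeds ε. The function names and the value of ε are illustrative; the paper's exact 1D-SEG procedure may differ:

```python
import numpy as np

def curvature(traj):
    """Discrete curvature of a 2D trajectory given as an (N, 2) array."""
    dx, dy = np.gradient(traj[:, 0]), np.gradient(traj[:, 1])
    ddx, ddy = np.gradient(dx), np.gradient(dy)
    # parameterisation-invariant planar curvature formula
    return np.abs(dx * ddy - dy * ddx) / (dx**2 + dy**2 + 1e-12) ** 1.5

def segment_points(traj, eps=0.05):
    """Indices of curvature extrema kept as candidate segmentation points."""
    k = curvature(traj)
    # local maxima and minima of the curvature profile
    ext = [i for i in range(1, len(k) - 1)
           if (k[i] >= k[i - 1] and k[i] >= k[i + 1])
           or (k[i] <= k[i - 1] and k[i] <= k[i + 1])]
    keep = []
    # pair consecutive extrema into regions; discard regions whose
    # curvature distance is below eps (the ||P_i|| < eps rule)
    for a, b in zip(ext, ext[1:]):
        if abs(k[a] - k[b]) >= eps:
            keep.extend([a, b])
    return sorted(set(keep))
```

For a straight line the curvature profile is flat and no segmentation points survive, while a curved trajectory such as a sine wave yields extrema at its crests and inflections.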

Extended Demonstration via G-PRM
An extended demonstration will be created for the new target.The extended demonstration not only retains the features of the original demonstration but also reduces the computational complexity compared to traditional neural network algorithms.The process is detailed below.
After segmentation, the original demonstration is divided into several sub-demonstrations (specifically, sub-1 and sub-2, as illustrated in Figure 4b). Subsequently, the Euclidean distances from the segmentation point and from the original target point to the new target point are calculated as ||d1|| and ||d2||, respectively, as shown in Figure 4b. Since ||d2|| < ||d1||, sub-2 is closer to the new target point. Consequently, sub-2 is selected as the trajectory to be imitated for generating the extended demonstration, as shown in Figure 4b,c. For generating the extended demonstration, the PRM-based method is a preferred choice [31]. Figure 5 illustrates the fundamental processes of PRM. Initially, PRM generates sampling points (represented by blue dots) while avoiding obstacles, as depicted in Figure 5a. Subsequently, each sampling point is connected to its adjacent sampling points to form a roadmap, as illustrated in Figure 5b. Finally, to determine the shortest trajectory from position 1 to position 2, the Dijkstra algorithm is employed to search the roadmap. Figure 6 displays the flowchart of G-PRM, with two red blocks highlighting the improvements. To express the following equations more precisely, letters that represent sets and matrices are presented in bold.

Generation of Sampling Points
According to Figure 5, the sampling points in PRM are generated randomly, which makes it difficult to control the shape of the resulting trajectory. By using Gaussian distribution-based sampling points (Gaussian sampling points), the adjustment of the generated trajectory can be effectively facilitated.

The initialisation of the Gaussian sampling points relies on Gaussian noise and coordinate differences. The details are as follows: a Gaussian distribution is defined as N(μ, σ²), where μ is the mean and σ² represents the variance. Gaussian noise points drawn directly from N(μ, σ²) are defined as G = {g_1, g_2, …, g_sample}; each dimension of every element in G follows N(μ, σ²), as expressed in Equation (1). Here, sample denotes the number of noise points, and g_i = (g_ix, g_iy, g_iz), i ∈ [1, sample].

G needs further processing to yield the Gaussian sampling points S, which are distributed between the segmentation point and the new target point. The number of elements in S matches that in G; hence, S = {s_1, s_2, …, s_sample}, with s_i = (s_ix, s_iy, s_iz) and the same i ∈ [1, sample]. A time series T = {t_1, t_2, …, t_sample} serves as a control variable to calculate s_i from g_i based on the coordinate differences between the segmentation point p_seg = (x_s, y_s, z_s) and the new target point p_new = (x_n, y_n, z_n). The calculation of s_i from g_i and t_i is described in Equation (2).
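Since Equations (1) and (2) are not legible in this copy, the construction can only be sketched under an assumption: each Gaussian noise point is placed around the straight line interpolated, via the time series, between the segmentation point and the new target point. A minimal Python sketch (function and parameter names are ours):

```python
import numpy as np

def gaussian_sampling_points(p_seg, p_new, sample=100, mu=0.0, sigma=0.5, seed=0):
    """Spread Gaussian noise along the segment p_seg -> p_new."""
    rng = np.random.default_rng(seed)
    p_seg, p_new = np.asarray(p_seg, float), np.asarray(p_new, float)
    g = rng.normal(mu, sigma, size=(sample, p_seg.size))  # noise points G
    t = np.linspace(0.0, 1.0, sample)[:, None]            # time series T
    # each sampling point s_i sits near the interpolated line point
    return p_seg + t * (p_new - p_seg) + g
```

Concentrating the samples in a band around this line is what lets G-PRM control the shape of the trajectory that Dijkstra later extracts from the roadmap.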

Bias Optimisation
G-PRM can generate an initial extended trajectory based on S. To derive the bias B for updating S and the extended trajectory, two key processes are required: ➀ a Fréchet-distance-based [32] similarity criterion Sim is designed to measure the similarity between the extended trajectory and the imitated trajectory; ➁ scaling the two trajectories to the same scale facilitates the calculation of coordinate differences.

➀ To calculate the Fréchet distance between the two trajectories, three steps are involved. Step 1 (initialisation definitions): Fré(a, b) is defined as the Fréchet distance between the first a points of the extended trajectory and the first b points of the imitated trajectory. Both trajectories contain an equal number of points, with a, b ∈ (1, num], where num is the number of points on each trajectory. Since both trajectories are discrete, the computation of their Fréchet distance is based on num.

Step 2 (boundary conditions for the Fréchet distance): Fré(1, 1) in Equation (3) represents the Fréchet distance between the first points of the two trajectories; dis(•) denotes the distance between the two elements in the brackets; the ath point of the extended trajectory and the bth point of the imitated trajectory enter this recursion; max(•) returns the largest element in the brackets.

Based on the calculation results from Equation (4), the similarity Sim is calculated as shown in Equation (5), where x_i, y_i, and z_i represent the x, y, and z coordinates of the ith point of the imitated trajectory, and the corresponding coordinates of the ith point of the extended trajectory appear analogously. Finally, the bias B = {b_1, b_2, …, b_num} for updating S is calculated based on the results of ➀ and ➁. The initial bias in each dimension is zero; η is the learning factor, and k represents the current number of iterations. Equation (8) shows the calculation process, in which Δx_i, Δy_i, and Δz_i represent the coordinate differences between the two trajectories in the x, y, and z dimensions. To update S based on B, interpolation is necessary to align the number of elements in B with S; after interpolation, B is reformulated as B = {b_1, b_2, …, b_sample}. Subsequently, the updated S is generated through Equation (9), where each coordinate is calculated accordingly. The updated S is used to generate an updated extended trajectory via G-PRM; G is regenerated based on Equations (1) and (2) during each iteration, and dis(•) is then recalculated from the updated trajectories. Equations (6)-(9) are applied iteratively to continue updating the extended trajectory and B. Through this process, the similarity between the extended and imitated trajectories gradually increases. After reaching the pre-set number of iterations, bias optimisation is completed.
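The discrete Fréchet distance underlying the similarity criterion can be computed with the standard dynamic programme over the boundary conditions and recurrence referenced in Equations (3) and (4). A generic textbook implementation (not the paper's code) looks like this:

```python
import numpy as np

def discrete_frechet(P, Q):
    """Discrete Fréchet distance between two polylines P and Q."""
    P, Q = np.asarray(P, float), np.asarray(Q, float)
    n, m = len(P), len(Q)
    d = lambda a, b: np.linalg.norm(P[a] - Q[b])
    ca = np.full((n, m), -1.0)
    ca[0, 0] = d(0, 0)                      # boundary condition (cf. Eq. (3))
    for a in range(1, n):
        ca[a, 0] = max(ca[a - 1, 0], d(a, 0))
    for b in range(1, m):
        ca[0, b] = max(ca[0, b - 1], d(0, b))
    for a in range(1, n):                   # recurrence (cf. Eq. (4))
        for b in range(1, m):
            ca[a, b] = max(min(ca[a - 1, b], ca[a - 1, b - 1], ca[a, b - 1]),
                           d(a, b))
    return ca[-1, -1]
```

Two identical curves have distance zero, and two parallel curves offset by a constant gap have distance equal to that gap, which makes the measure a natural similarity criterion for iteratively pulling the extended trajectory towards the imitated one.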

Modelling of DMP
In this section, the original demonstration is used as an example to model the DMP. This study focuses on discrete DMPs represented in Cartesian space. The Cartesian-space representation exhibits advantages over the joint-space representation, as it allows for trajectory planning without considering the joints and their relative positions [33]. The classical DMP model is represented below [4,23,24]:

τ²ÿ = α_z(β_z(g − y) − τẏ) + f(x), (10)

where y represents the original demonstration, and ẏ and ÿ denote its first- and second-order derivatives; α_z and β_z (α_z, β_z > 0) are the damping and stiffness factors (normally α_z = 4β_z); τ is the timing parameter used to adjust the timestep of the demonstration; y_0 and g mark the start and the target of the demonstration, and the force item is scaled by the constant g − y_0; x is the phase variable computed by the canonical system expressed in Equation (11), used to calculate the value of the force item at each timestep; f(x) is the force item represented in Equation (12), whose weights w_i are calculated by locally weighted regression (LWR) [24]:

τẋ = −α_x x, (11)

f(x) = (Σ_{i=1}^{N} ψ_i(x) w_i / Σ_{i=1}^{N} ψ_i(x)) · x(g − y_0), with ψ_i(x) = exp(−h_i(x − c_i)²). (12)
In Equations (11) and (12), the initial value of x is a constant that is always set to one; τ is the same as that defined in Equation (10); N denotes the number of RBF kernels; ψ_i(x) represents the ith RBF kernel; c_i > 0 and h_i > 0 are the centre and width of the ith RBF kernel.
For computational convenience, Equation (11) is discretised as Equation (13) to calculate x:

x_{t+1} = x_t − (α_x/τ) x_t Δt, (13)

where Δt is the inverse of the number of the demonstration's timestep points. When a new target g_new is designated to replace g in Equation (10), each ÿ in the generated trajectory, denoted ÿ_t, is calculated by using Equation (10) and is updated with the change in x; t is a timestep within the generated trajectory.
The next coordinate y_{t+1} is calculated by Euler integration, as in Equation (14):

ẏ_{t+1} = ẏ_t + ÿ_t Δt, y_{t+1} = y_t + ẏ_{t+1} Δt. (14)
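Equations (10)-(14) together define a complete learn-and-rollout loop. The sketch below implements a one-dimensional classical DMP in this spirit: the force item is fitted with locally weighted regression and the system is integrated with an Euler scheme. Gain values, kernel placement, and function names are illustrative choices, not the paper's exact settings:

```python
import numpy as np

def dmp_rollout(y_demo, tau=1.0, g=None, n_rbf=30, a_z=25.0, a_x=1.0):
    """Learn a 1D DMP from y_demo, then roll out towards goal g."""
    b_z = a_z / 4.0                          # critical damping, a_z = 4 b_z
    n = len(y_demo)
    dt = 1.0 / n                             # Δt = 1 / number of timesteps
    y0, g_demo = y_demo[0], y_demo[-1]
    g = g_demo if g is None else g
    # phase variable from the canonical system
    x = np.exp(-a_x * np.arange(n) * dt / tau)
    # target force: Eq. (10) solved for f along the demonstration
    yd = np.gradient(y_demo, dt)
    ydd = np.gradient(yd, dt)
    f_target = tau**2 * ydd - a_z * (b_z * (g_demo - y_demo) - tau * yd)
    # RBF kernels and LWR weights
    c = np.exp(-a_x * np.linspace(0, 1, n_rbf))   # kernel centres in phase space
    h = n_rbf / c                                 # kernel widths
    psi = np.exp(-h * (x[:, None] - c) ** 2)      # shape (n, n_rbf)
    s = x * (g_demo - y0)
    w = np.array([(s * psi[:, i] * f_target).sum()
                  / ((s**2 * psi[:, i]).sum() + 1e-12) for i in range(n_rbf)])
    # Euler rollout towards the (possibly new) goal g
    y, yd_t = y0, 0.0
    out = [y]
    for t in range(1, n):
        f = (psi[t] @ w) / (psi[t].sum() + 1e-12) * x[t] * (g - y0)
        ydd_t = (a_z * (b_z * (g - y) - tau * yd_t) + f) / tau**2
        yd_t += ydd_t * dt
        y += yd_t * dt
        out.append(y)
    return np.array(out)
```

Calling `dmp_rollout(demo)` approximately reproduces the demonstration, while passing a new `g` rescales the learned shape towards the new target, which is exactly the generalisation behaviour discussed above.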

Bi-DMP for Incremental Learning Updating
In this section, the structure and mathematical formulation of Bi-DMP are constructed. The core task of Bi-DMP is to construct the incremental force item for DMP. Inspired by Lu et al. [18,19] and Chen et al. [20], the main structure of the BLS designed for DMP is shown in Figure 7, where the grey dots symbolise the output of the BLS. The hidden layer of the BLS is represented by blue, red, and orange dots: the blue dots denote mapping features, while the red and orange dots correspond to enhancement nodes and additional enhancement nodes, respectively. Therefore, the incremental force item is the hidden layer combined with the weight corresponding to Δf. Further explanations of the remaining labels in Figure 7 are provided in the following sections.
This paper postulates that the force item of the original demonstration is f_o, which has been computed and parameterised by LWR [24] and is represented in Equation (12). The extended demonstration is combined with sub-1 to form a complete trajectory (namely, the extended-demonstration-based complete demonstration), as shown in Figure 4c; it is used to calculate the target force item f_e. Both f_o and f_e are sets containing the force-item value for each point of the respective demonstrations. Replacing f_o with f_e in Equation (10), the initial Δf can be written as Equation (15). The key components of the hidden layer are the mapping features Z, the enhancement nodes H, and the additional enhancement nodes Ha. This paper assumes that n is the number of mapping features, m denotes the number of enhancement nodes, and p indicates the number of additional enhancement nodes, as depicted in Equations (16)-(18). The mapping-feature weight matrix is randomly initialised and subsequently fine-tuned via a sparse autoencoder [20], while the weight matrices of the enhancement nodes and additional enhancement nodes are initialised as orthogonal matrices; the remaining terms in Equations (16)-(18) represent biases, and φ and ξ are activation functions. As suggested in [20], φ can be a linear activation function, while ξ can be the Tansig function, as detailed in Equation (19). To determine the weight W_Δf corresponding to Δf, it is necessary to calculate the basis weight matrix W based on the mapping features and enhancement nodes; Equations (20)-(22) represent the computation process. In Equation (21), A denotes the matrix [Z|H], A⁺ represents the Moore-Penrose pseudo-inverse matrix of A, Aᵀ is the transpose of A, λ is the regularisation parameter, and I is an identity matrix.

Once W is obtained, Ha is incorporated into Equation (20) to approximate Δf with higher accuracy, and Equation (20) is rewritten as Equation (23), where A_h = [A|Ha] is the hidden-layer matrix. The advantage of incremental learning is that when Ha is added, there is no need to recalculate [A|Ha]⁺ from scratch, thereby accelerating the solution process and facilitating network expansion [20]. [A|Ha]⁺ is defined as A_h⁺ and is calculated by Equation (24). The temporary matrices D and B can be calculated according to Equation (25), where C = Ha − AD and C⁺ represents the Moore-Penrose pseudo-inverse matrix of C. Therefore, W_Δf can be calculated from Equation (23) as Equation (26).
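The incremental step can be made concrete with the standard broad-learning update: the pseudo-inverse of the widened hidden-layer matrix [A|Ha] is assembled from the previously computed A⁺ via the temporary matrices D, C, and B, so the inverse is never recomputed from scratch. A hedged sketch (variable names and the ridge regulariser value are ours, not the paper's):

```python
import numpy as np

def ridge_pinv(A, lam=1e-8):
    """Regularised Moore-Penrose pseudo-inverse: (lam*I + A^T A)^-1 A^T."""
    return np.linalg.solve(lam * np.eye(A.shape[1]) + A.T @ A, A.T)

def incremental_update(A, A_pinv, W, Ha, Y, lam=1e-8):
    """Widen the hidden layer with Ha and update weights without retraining."""
    D = A_pinv @ Ha                         # temporary matrix D
    C = Ha - A @ D                          # component of Ha outside col(A)
    if np.linalg.norm(C) > 1e-10:
        B = ridge_pinv(C, lam)              # generic case: B = C^+
    else:                                   # Ha already lies in col(A)
        B = np.linalg.solve(np.eye(D.shape[1]) + D.T @ D, D.T @ A_pinv)
    A_new = np.hstack([A, Ha])
    A_new_pinv = np.vstack([A_pinv - D @ B, B])
    W_new = np.vstack([W - D @ (B @ Y), B @ Y])
    return A_new, A_new_pinv, W_new
```

The block update reproduces (to numerical precision) the weights one would obtain by solving the ridge problem on the full widened matrix directly, while only requiring the small pseudo-inverse of C.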

Experiments and Case Studies
In simulations, trajectories from the LASA dataset [34] and authors' handwriting trajectories were adopted to validate the algorithm.

Extended Demonstration Based on 1D-SEG and G-PRM
Handwriting A-shaped trajectories and M/S-shaped trajectories from LASA are selected to validate 1D-SEG and G-PRM. Initially, 1D-SEG is applied to segment the original demonstrations based on their curvatures, as illustrated in Figures 8a,b and 9a. Subsequently, the trajectories to be imitated are calculated according to Section 3.1 and are depicted by the purple dashed curves in Figure 9b. Following the algorithm in Section 3.2, G-PRM generates the extended demonstrations (yellow curves in Figure 9b). More specifically, the red and green dots in Figure 9b represent the segmentation points and the new target points of the extended demonstrations, respectively. The blue dots are the target points of the original demonstrations.

Modelling of DMP
In this subsection, three trajectories are selected to construct DMP.As illustrated in Figure 10, the blue curve trajectories represent demonstrations, whereas the red dashed curve trajectories are the outcomes of DMP.The force item of the DMP will serve as a benchmark for the following incremental learning processes.

Case Study 1-Pick-and-Place-Based Bi-DMP
As the most common industrial step, pick-and-place is often used as a benchmark trajectory to validate algorithms. In this part of the study, operators of different ages and genders were selected to perform pick-and-place experiments to collect their demonstration trajectories. Specifically, the operators were asked to use a gripper to grasp an object, drag the robot to move, and place the object into an orange box. Figure 12 shows the details: Figure 12a shows the experimental platform, while Figure 12b presents the demonstrations from the operators. These demonstrations were collected via the TM collaborative robot, and the trajectories are shown in Figure 13a. It can be observed that there are discrepancies in the demonstration trajectories when performing the same task, due to factors such as gender, height, and arm span [35,36]. These factors should not be ignored, as operators prefer to work with a robot that matches their habits and physical characteristics in human-robot collaboration (HRC). For example, the trajectory demonstrated by a person of short stature may collide with the body of a person of tall stature. Therefore, using only one DMP to generalise this pick-and-place task is not appropriate. Bi-DMP, on the other hand, can model and retain the differences in all demonstrations.
In the experiment, the red curve trajectory is chosen as a benchmark demonstration for the construction of DMP, as shown in Figure 13b.The purple dashed curve in Figure 13c is the path generated by the DMP.In order to preserve the discrepancies between different operators, Bi-DMP is used to learn the two new trajectories based on the constructed DMP as shown in Figure 13d.The orange and blue dashed curves are generated by Bi-DMP.According to the above results, Bi-DMP can effectively extend the force item of DMP.Starting from the constructed DMP, the force item is continuously updated so that the differences in the demonstrations can be fully learned.

Experiment Platform and Problem Descriptions
Unscrewing nuts and removing battery cells are critical steps in EV battery disassembly applications. The experimental platform is shown in Figure 14a. A multi-purpose adaptor and a vacuum chuck are designed and employed to perform these tasks, as shown in Figure 14b and Figure 14c, respectively.
Due to the hazardous nature of the EV battery and the constraints of the working space, it is imperative for operators to minimise the duration of the demonstration. As a result, nuts closer to the original demonstration are easily disassembled, as illustrated by the red curves in Figure 14d. However, nuts positioned far from the demonstration pose challenges for disassembly due to generalisation issues, as illustrated by the red dashed trajectory in Figure 14d. In the case of battery cell removal, trajectories generated from the Bi-DMP previously used for unscrewing nuts exhibit a deviation due to the lack of specific demonstrations, which may cause unforeseen collisions with the working environment. The algorithms proposed in this paper address these problems.

Design and Use of End-Effectors
Figure 15a,b show the structural design and application of the end-effectors for unscrewing nuts and removing battery cells, respectively. The end-effector designed for unscrewing nuts is a multi-purpose adaptor, comprising a series of steel cylinders and a spring system, as depicted in Figure 15a. During the unscrewing process, the steel cylinders adaptively enclose the nuts, allowing for their removal via the rotation and friction induced by the multi-purpose adaptor. The end-effector for battery cell removal is a vacuum chuck. It consists of four chucks and pipelines, as shown in Figure 15b. To remove a battery cell, two chucks secure it, as highlighted in the red rectangular box in Figure 15b. Due to safety and payload considerations, only one battery cell should be removed at a time.

Demonstration and Implementation
To ensure the safety of the operator, only the nut at position No. 3 was unscrewed during the demonstration, as shown in Figure 16a. The demonstration adhered strictly to the disassembly rules: moving the multi-purpose adaptor from the start point to the target position (nut), unscrewing the nut, and then returning to the start point. The diameter of the multi-purpose adaptor is 25 mm and the length of the groove is 28.5 mm, as shown in Figure 16c. If the distance between the central area of the nut and the endpoint of the DMP-generated trajectory exceeds the threshold of 1.75 mm, the unscrewing task is likely to fail due to collisional interference, as highlighted by the red ellipse in Figure 16c. Figure 16b illustrates the trajectories generated by the DMP. Positions No. 1 to No. 8, corresponding to the light blue trajectories, completed the unscrewing task, whereas positions No. 9 to No. 14 (purple trajectories) failed due to excessive errors. The EV battery is therefore divided into a successful area and a failed area. Table 1 lists the error associated with each position. In this scenario, 1D-SEG and G-PRM are used to segment the demonstration and generate the extended demonstration. The outcomes of 1D-SEG are shown in Figure 17a. The new target point of the extended demonstration is selected at the centre of the failed nut positions, marked by the brown pentagram in Figure 17b, because this position provides better coverage of the working area. The yellow trajectory in Figure 17b shows the complete extended demonstration. Subsequently, Bi-DMP is trained based on the force item of the previously constructed DMP. The trajectories generated by Bi-DMP are used to re-execute the unscrewing task in the failed area, as shown in Figure 17c. According to the error analysis in Table 2, all errors are smaller than the threshold; therefore, the remaining tasks can be executed successfully. Figure 18 shows the implementation of unscrewing nuts.
Another significant application is the removal of battery cells. However, due to the height difference between the battery cells and the nuts, the shapes of the generated trajectories deviate, as shown by the purple trajectories in Figure 19a. To address this issue, the Bi-DMP previously used for unscrewing nuts must be updated. 1D-SEG and G-PRM are adopted to generate a new extended demonstration for battery cell removal based on the yellow trajectory shown in Figure 17b. The red trajectory in Figure 19c presents the new complete extended demonstration for removing the battery cell, whose new target point is positioned at the geometric centre of B6, as illustrated in Figure 19b. The light blue trajectories are generated by Bi-DMP for battery cell removal.
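The success criterion described above (trajectory endpoint within 1.75 mm of the nut centre, i.e. half the 3.5 mm clearance between the 25 mm adaptor and the 28.5 mm groove) translates directly into a classification of nut positions into the successful and failed areas. The function names and sample coordinates below are illustrative.

```python
import numpy as np

# Half of the 28.5 mm - 25 mm clearance reported in the text.
THRESHOLD_MM = 1.75

def endpoint_error_mm(trajectory, nut_centre):
    """Euclidean distance between the trajectory endpoint and the nut centre."""
    return float(np.linalg.norm(np.asarray(trajectory[-1], dtype=float)
                                - np.asarray(nut_centre, dtype=float)))

def classify_positions(trajectories, nut_centres, threshold=THRESHOLD_MM):
    """Split nut positions (1-indexed, as in Table 1) into successful
    and failed areas according to the endpoint error threshold."""
    ok, fail = [], []
    for idx, (traj, centre) in enumerate(zip(trajectories, nut_centres), start=1):
        (ok if endpoint_error_mm(traj, centre) <= threshold else fail).append(idx)
    return ok, fail
```

Applied to the errors in Table 1, this rule reproduces the split reported in the text: positions No. 1 to No. 8 fall in the successful area and No. 9 to No. 14 in the failed area.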
Figure 20 shows the implementation of the battery cell removal process. In summary, the above case studies address two critical tasks in EV battery disassembly, and the algorithms proposed in this paper meet the requirements of these applications.

Time Complexity Analysis
Time complexity measures the computational cost of an algorithm and serves as a reference when computational resources are limited or real-time performance is required. Compared with neural-network-based trajectory segmentation, the discrete curvature-based method computes faster because it does not require collecting large amounts of training data. In addition, the Gaussian distribution, the most widespread form of data distribution in nature, is used in the roadmap construction process of PRM. The fundamental algorithms involved in this paper are DMP, PRM, and BLS; the Gaussian mixture model (GMM), TP-GMM, and RRT have similar functionalities, and a CNN is taken as the example of neural-network-based trajectory segmentation [29]. O(·) denotes the time complexity function. According to the studies on time complexity by Liang et al. [37], Curry et al. [38], and Bianchini et al. [39], the time complexities of the aforementioned algorithms are qualitatively expressed in Table 3.
Table 3 orders the algorithms from the highest time complexity to the lowest, with a detailed analysis as follows. The convolution operations and the large number of parameters make the time complexity of the CNN the highest, especially for large-scale datasets and deep network structures; Table 3 lists the complexity of only a single CNN layer, and in actual computations the complexities of all layers in the network must be summed. The time complexity of GMM is influenced by the number of Gaussian clusters [1], the total number and dimensionality of the samples, and the number of iterations of the EM algorithm. When task parameters (TP) are added, the data dimensionality D increases, making the complexity of TP-GMM slightly higher than that of GMM with the same parameters. The time complexity of the BLS mainly depends on the number of samples, the data dimensionality, and the number of BLS nodes; although the BLS relies on large-scale matrix operations, its complexity is lower than that of a CNN. The time complexity of DMP is mainly determined by the number of samples and the data dimensionality. The time complexities of PRM and RRT are related to the number of sampling points/knots; they are similar, while PRM has an advantage in trajectory generation due to the construction of the roadmap.
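The qualitative ranking above can be checked numerically by evaluating dominant-term cost models for representative parameter values. The expressions below are standard textbook dominant terms using the variable names defined for Table 3; they are illustrative assumptions and may differ in detail from the exact entries of Table 3 (constants and lower-order terms are dropped).

```python
# Assumed dominant-term cost models (constants dropped; illustrative only).
# Variables follow the Table 3 notation: N samples, D input dimension,
# F kernel size, C kernel count, L layers, I EM iterations, K clusters,
# TP added task-parameter dimensions, E BLS nodes, n points, k knots.
complexity = {
    "CNN (one layer)": lambda p: p["N"] * p["D"] * p["F"] ** 2 * p["C"] * p["L"],
    "TP-GMM":          lambda p: p["I"] * p["K"] * p["N"] * (p["D"] + p["TP"]) ** 2,
    "GMM":             lambda p: p["I"] * p["K"] * p["N"] * p["D"] ** 2,
    "BLS":             lambda p: p["N"] * p["D"] * p["E"],
    "DMP":             lambda p: p["N"] * p["D"],
    "PRM":             lambda p: p["n"] * p["k"],
}

def ranking(params):
    """Return algorithm names sorted from highest to lowest modelled cost."""
    return sorted(complexity, key=lambda name: complexity[name](params), reverse=True)
```

For moderate parameter choices this reproduces the ordering discussed in the text: CNN above the GMM family, the GMM family above BLS, and DMP and PRM cheapest.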

Trajectory Generation Analysis among Different Algorithms
The most prominent LfD methods are the GMM-based and DMP-based algorithms. Table 4 compares four algorithms, GMM, TP-GMM, DMP, and Bi-DMP, on the unscrewing nuts task depicted in Figure 16. For model construction, the GMM-based algorithms require a larger number of demonstrations, whereas DMP requires only one; in terms of cost, DMP is therefore the more economical choice. In the unscrewing nuts task, GMM and TP-GMM perform differently from DMP and Bi-DMP. GMM can generalise the distribution of demonstrations and then generate trajectories via Gaussian mixture regression (GMR) to perform tasks. However, since GMM cannot adapt to dynamic environments, it can only perform the demonstrated task, i.e. unscrewing a nut from a fixed position. TP-GMM gains generalisation capability by adding frames of reference (frames), but due to the sensitivity of the frames' orientation and the inherent uncertainty of the probabilistic model, its success rate is sub-optimal in high-precision applications. DMP, for its part, cannot complete all the unscrewing tasks because of generalisation issues. In contrast, DMP enhanced with a BLS accomplishes all the tasks, achieving the highest success rate. Because of differences in programming logic and computer hardware configuration, algorithm running times vary; overall, the GMM-based methods run longer than the DMP-based methods because the K-means and EM algorithms used in GMM computations require multiple iterative operations, especially the EM algorithm. As previously mentioned, GMM and TP-GMM are probabilistic models, so their generated trajectories carry uncertainty, with maximum errors and difference fluctuations greater than those of DMP. Here, the difference is defined as the sum of the point-wise Euclidean distances between the generated trajectory and the demonstration trajectory for the same target. Particularly for TP-GMM, significant deviations (very large maximum error and difference) can occur if the frames are incorrectly specified. Therefore, for the issues discussed in this paper, the DMP-based methods are more suitable.
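The "difference" metric defined above, the sum of point-wise Euclidean distances between the generated trajectory and the demonstration for the same target, can be written compactly; the function name is illustrative.

```python
import numpy as np

def trajectory_difference(generated, demonstration):
    """Sum of point-wise Euclidean distances between two equally sampled
    trajectories aimed at the same target (the 'difference' in Table 4)."""
    g = np.asarray(generated, dtype=float)
    d = np.asarray(demonstration, dtype=float)
    if g.shape != d.shape:
        raise ValueError("trajectories must share the same sampling")
    return float(np.linalg.norm(g - d, axis=1).sum())
```

A trajectory identical to the demonstration scores 0; probabilistic regeneration (as with GMM/GMR) yields a fluctuating positive score, which is the behaviour the comparison in Table 4 quantifies.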
Table 5 shows the differences in trajectory generation between G-PRM and the classic PRM. In the figures within the table, the blue curve represents the imitated trajectory, the purple dashed line is the trajectory generated by the classic PRM, and the red curve is the trajectory generated by G-PRM. The cyan dots are Gaussian-distributed sampling points used to construct the roadmap; the blue dot is the starting point, while the red and purple dots represent the target point and the new target point, respectively. Since the classical PRM lacks imitation capability, its generated trajectory is merely a curve connecting the starting point and the new target point, with a similarity to the blue curve of less than 60%; despite resampling in each iteration, this similarity does not improve. The G-PRM designed in this paper optimises the sampling points to continuously improve the similarity of the generated trajectory (red curve) to the blue trajectory, and its final similarity is 34.04% higher than that of the trajectory generated by the classical PRM.
The study has some limitations. The incremental learning process in Bi-DMP currently handles one task at a time, which may limit its efficiency in complex applications requiring simultaneous task learning; expanding the BLS structure to support multitask incremental learning could significantly enhance its utility. In addition, the method should be applied to more complex industrial applications, such as turbine and engine assembly, to fully understand its potential and limitations.
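The Gaussian sampling step that distinguishes G-PRM from the classic PRM can be sketched as drawing roadmap candidates from Gaussians centred on the reference trajectory, so that sampled points stay near the shape to be imitated. This is a minimal sketch only: G-PRM's iterative optimisation of the sampling points is omitted, and the similarity mapping below is an illustrative choice, not the paper's similarity measure.

```python
import numpy as np

rng = np.random.default_rng(0)

def gaussian_samples(reference, sigma, per_point=5):
    """Draw roadmap candidates from Gaussians centred on each point of the
    reference trajectory (the sampling idea of G-PRM, greatly simplified)."""
    ref = np.asarray(reference, dtype=float)
    noise = rng.normal(0.0, sigma, (len(ref), per_point, ref.shape[1]))
    return ref[:, None, :] + noise

def similarity(candidate, reference, scale):
    """Map the mean point-wise deviation to a [0, 1] score; larger deviation
    means lower similarity. The linear mapping is an assumption."""
    dev = np.linalg.norm(np.asarray(candidate, dtype=float)
                         - np.asarray(reference, dtype=float), axis=1).mean()
    return max(0.0, 1.0 - dev / scale)
```

Uniform sampling over the whole workspace, as in the classic PRM, produces candidates with no relation to the reference shape, which is why its generated curve stays below 60% similarity in Table 5, while Gaussian-biased sampling keeps candidates close to the imitated trajectory.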

Conclusions
In summary, this paper proposed a systematic framework aimed at improving the application scope of DMPs and BLSs. The research is organised into three main contents: (1) segmentation and extended demonstration: as a series of demonstration pre-processing methods, 1D-SEG and G-PRM are designed to segment the original demonstration and to generate the extended demonstration; these methods effectively preserve the geometrical characteristics of the demonstration and simplify the calculation. (2) DMP modelling and incremental learning updating: Bi-DMP is trained based on the extended demonstration along with the force item of the constructed DMP; the incremental learning mechanism of Bi-DMP enables the DMP to continually update without training from scratch. (3) EV battery disassembly case study: this study established a decommissioned battery disassembly experimental platform, and the unscrewing nuts and battery cell removal tasks have shown that the proposed algorithms effectively support industrial applications.
Institutional Review Board Statement: Not applicable.
Citation: Wang, Y.; Li, W.; Liang, Y. A Trajectory Optimisation-Based Incremental Learning Strategy for Learning from Demonstration.

Figure 1. LfD-based learning for a pick-and-place application.

Figure 2. The framework of the research methodology.

Figure 7. The structure of the BLS for DMP.

Figure 9. The results of G-PRM on handwriting demonstration trajectories. (a) Segmentation points on demonstrations. (b) G-PRM-based extended demonstrations.

Figure 11. The geometric differences between the original and extended demonstrations and the learning performance of Bi-DMP; the red dashed curves represent the trajectories generated via Bi-DMP.

Figure 15. The usage of different end-effectors. (a) The usage of the multi-purpose adaptor. (b) The usage of the vacuum chuck.

Figure 16. Measurement of successful and failed areas. (a) Demonstration process. (b) Generated trajectory via DMP. (c) The successful and failed areas in unscrewing nuts.

Figure 19. G-PRM and Bi-DMP results for battery cell removal. (a) Generated trajectories via the former Bi-DMP. (b) The new target point of the extended demonstration for battery cell removal. (c) Generated trajectories via the new Bi-DMP.
Notation for Table 3: O(N · D · F² · C · L) for a one-layer CNN, where N: the number of samples; D: the dimension of the input data (feature number); F: the size of the convolution kernel; C: the number of convolution kernels; L: the number of layers; I: the number of iterations of the expectation-maximisation (EM) algorithm; K: the number of Gaussian clusters; k: the number of sampling knots; n: the number of sampling points; E: the number of BLS nodes.
Author Contributions: Conceptualisation, Y.W. and W.L.; methodology, Y.W.; software, Y.W.; validation, Y.W. and Y.L.; formal analysis, Y.W.; data curation, Y.W.; writing, original draft preparation, Y.W.; writing, review and editing, W.L. and Y.L.; visualisation, Y.W.; supervision, W.L.; project administration, W.L. All authors have read and agreed to the published version of the manuscript.
Funding: This research was sponsored by the National Natural Science Foundation of China (Project No. 51975444), the International Cooperative Project of the Ministry of Science and Technology of China (Project No. G2022013009), the Science and Technology Commission of Shanghai Municipality (Project No. 23010503700), the Engineering and Physical Sciences Research Council, UK (Project No. EP/N018524/1), the China Scholarship Council (CSC No. 202106950049), and the China Postdoctoral Science Foundation (Project No. 2023M741426).

Table 1. Error between the central area of the nut and the target point generated by DMP.

Table 2. Error between the central area of the nut and the target point generated by Bi-DMP.

Table 3. Time complexity analysis for different algorithms.

Table 4. Quantitative comparison of various algorithms.

Table 5. Quantitative comparison of PRM-based algorithms.