Industrial Robot Trajectory Tracking Control Using Multi-Layer Neural Networks Trained by Iterative Learning Control

Fast and precise robot motion is needed in many industrial applications. Most industrial robot motion controllers allow externally commanded motion profiles, but the trajectory tracking performance is affected by the robot dynamics and joint servo controllers, to which users have no direct access and about which they have little information. The performance is further compromised by time delays in transmitting the external command as a setpoint to the inner control loop. This paper presents an approach for combining neural networks and iterative learning controls to improve the trajectory tracking performance for a multi-axis articulated industrial robot. For a given desired trajectory, the external command is iteratively refined using a high-fidelity dynamical simulator to compensate for the robot inner-loop dynamics. These desired trajectories and the corresponding refined input trajectories are then used to train multi-layer neural networks to emulate the dynamical inverse of the nonlinear inner-loop dynamics. We show that with a sufficiently rich training set, the trained neural networks generalize well to trajectories beyond the training set as tested in the simulator. In applying the trained neural networks to a physical robot, the tracking performance still improves but not as much as in the simulator. We show that transfer learning effectively bridges the gap between simulation and the physical robot. Finally, we test the trained neural networks on other robot models in simulation and demonstrate the possibility of a general purpose network. Development and evaluation of this methodology are based on the ABB IRB6640-180 industrial robot and ABB RobotStudio software packages.


Introduction
Robots have been widely utilized for industrial tasks including assembly, welding, painting, packaging, and labeling. In many cases they are controlled to track a given trajectory by external motion command interfaces, which are available for many industrial robot controllers, including the MotoPlus of Yaskawa Motoman, low-level interface (LLI) for Stäubli, robot sensor interface (RSI) of Kuka, and externally guided motion (EGM) of ABB. Although high-precision industrial robots have been well established for decades in manufacturing and fabrication applications that require precise motion control, such as aerospace assembly, laser scanning, and operation in crowded and unstructured environments, it still remains a challenge to track a given trajectory accurately. In general, a trade-off exists between reduction of cycle time and improvement of tracking accuracy for industrial robots.
Trajectory tracking control of robot manipulators is a well-studied problem [1]. The typical control architecture is based on joint torque control, which is challenging to implement on industrial robots. The controller structure usually consists of a linear stabilizing feedback controller and a feedforward compensation for trajectory tracking. Asymptotically perfect tracking is achieved by adaptive control [2], given that the robot dynamics are expressed in the linear-in-parameter form. However, accurate dynamics information for a robot is rarely available in practice. In this case, either simplified reduced-order models are identified [3], or machine learning techniques including wavelet networks, Gaussian Processes, and fuzzy logic systems [4] are utilized to approximate a sophisticated model for controller design. Iterative learning control (ILC) [5,6] relaxes the requirement of model information, but the learned representation is not transferable between different trajectories. Recently, neural network (NN)-based [7] controller design has attracted a great deal of attention with its potential ability to generalize beyond the training set. A detailed review of neural-learning control is presented in Section 2.
The goal of this work was to design an outer-loop feedforward controller to compensate for the inner-loop dynamics for the trajectory tracking control of an articulated industrial robot. Our approach, as illustrated in Figure 1, combines iterative learning control with a multi-layer neural network (MNN). ILC finds the commanded robot motion corresponding to a collection of desired robot joint trajectories. These input/output trajectory pairs are then used to train the inverse dynamics of the inner control loop (with desired robot joint trajectory as the input and commanded robot motion as the output) represented as an MNN (Neural Network I, NN-I, in Figure 1). As the inner loop may be non-minimum-phase (from time delay or non-collocated input/output map due to joint or structural flexibility), the MNN needs to be able to approximate a non-causal system to avoid instability. We used a robot dynamics simulator from a robot vendor (RobotStudio from ABB) to rapidly generate a large set of training trajectories. The simulator was also used to guide the training process and evaluate the performance of the trained MNN. When the NN feedforward control is applied to a physical robot, the tracking performance invariably degrades due to the discrepancy between simulation and reality. To narrow this reality gap, the output layer of NN-I was adjusted using a moderate amount of additional data obtained from the physical robot. This process is similar to transfer learning [8].
Using the proposed approach, we have demonstrated significantly improved tracking performance on a physical robot ABB IRB 6640-180 using NN compensation. We have also applied the trained NNs (NN-I) to two other robot models in simulation and both show improved tracking. Compared to state-of-the-art neural-learning controller design, the contributions of this paper are as follows. Transfer learning: We narrowed the reality gap by updating the output layer of the simulation-trained MNN using the data from the physical robot. • Demonstration: The improved performance using our approach was shown through simulation and experimental trials with a 6-degree-of-freedom (dof) nonlinear industrial robot. • Generalizability: Different robots from the same vendor exhibit similar inner-loop dynamics. We showed that the MNN trained for one robot improves the performance on other robot models from the same vendor as well.
The rest of the paper is organized as follows. Section 2 summarizes related work on neural learning for robot trajectory tracking control and highlights our contributions. Section 3 states the problem, followed by the methodology in Section 4. The simulation and experimental results are presented in Section 5. Finally, Section 6 concludes the paper and adds some new insight.

Related Work
Numerous techniques have been proposed to control uncertain nonlinear dynamical systems using universal approximators such as Gaussian processes [9] and neural networks [10][11][12]. In a general framework of neural-learning control, NNs are used to either approximate the robot forward dynamics or inverse dynamics for controller design [13], and the control problem falls within the context of either reinforcement learning [14,15], adaptive control [16], or optimal control [17]. Radial basis function (RBF) NNs, multi-layer feedforward NNs, and recurrent neural networks (RNNs) have been explored for trajectory tracking control [18][19][20][21][22]. The concept of incorporating fuzzy logic into NN control has also grown into a popular research topic [23][24][25][26].
Radial basis function NNs are a popular choice in neural adaptive controller design since they naturally express the dynamics approximation into a linear-in-parameter form [27]. In [28], an adaptive control strategy in joint torque level using RBF NNs for dynamics approximation was proposed to compensate for the unknown robot dynamics and an unknown payload. However, typically, a large amount of parameters are required to approximate robot dynamics accurately by using the RBF NNs, and it is a non-trivial task to select the centers and shapes of radial basis function kernels. In addition, users usually only have access to the joint position or velocity setpoint control for many industrial robots. In [29], a neural adaptive control technique was introduced for a trilateral teleoperation system with dual-master/single-slave manipulators. In [30], a NN-based sliding mode adaptive controller achieved robust trajectory tracking control for robot manipulators subject to uncertainties. However, torque control was required and only simulation results were presented. An adaptive controller based on a two-layer feedforward perceptron NN was proposed in [31,32] for robot manipulator control. The adaptation of the unknown parameters was derived by the Lyapunov stability analysis and Taylor series expansion. However, the extension to more layers becomes more complex. Adaptive neural control methods have also been investigated for humanoid robots [33,34]. Other proposed control frameworks that use general function approximators for linearizing controller design can be found in [35][36][37].
There have been several approaches that use NNs to approximate a system inverse for dynamical compensation [38][39][40]. None of these approaches consider the possible noncausality in stable dynamical inversion. In [41], a double-NN architecture was introduced, with one NN for the nonlinear system identification and another one learning the system inverse. However, the structure was complex and led to more parameters. It has also been shown that RNN outperforms Gaussian processes [9] in the performance of learning robot inverse dynamics with linear time complexity, given sufficient training data [42].
Our focus is on inner-loop dynamics compensation rather than torque-level control, which is more readily applicable to current industrial robots. The inner-loop dynamics are stable and nearly diagonal, but may be non-minimum-phase due to time delay or unstable zero dynamics. The use of an MNN for robot control in this setting is new and unique.

Problem Statement and Approach
Consider an n-dof robot arm under closed-loop joint servo control with input u ∈ R n , typically either the joint velocity commandq c or the joint position command q c , and the output q ∈ R n , the measured joint position. The inner-loop dynamics, G, is a sampled-data nonlinear dynamical system possibly with communication transport delay and quantization effects. Our goal is to find a feedforward compensator G † (as shown in Figure 2) such that, for a given desired joint trajectory q d , the feedforward control u = G † q d would result in asymptotic tracking: q = Gu → q d , as t → ∞. As G is assumed unknown, we will use a multi-layer neural network to approximate G † . The plant G may be non-minimum-phase (e.g., due to time delay or unstable zero dynamics from joint flexibility), so the causal inverse would be unstable. In this case, a stable inverse exists but is non-causal. Therefore, the neural network approximation of G † must be able to represent the possible non-causal property.
Locally, near a given configuration, G is a linear system that may be represented by an auto-regressive/moving-average (ARMA) model in the sampled-data implementation. As the coefficients in the ARMA model may depend on the robot pose, a global representation of G is a nonlinear ARMA (NARMA) model. Similarly, G † may also be represented by a stable but possibly non-causal NARMA model. In this case, a multi-layer feedforward network with a window of past and future inputs is a natural candidate to approximate G †. Note that since the desired trajectory q d is assumed to be known, future q d may be used in the non-causal implementation. It is also possible to directly use a recurrent neural network (RNN) to directly capture the dynamics in G †. A bidirectional recurrent neural network (BRNN) may be needed to capture the non-causal behavior.
We used an ABB IRB6640-180 robot as a test platform. IRB6640-180 is a robot manipulator with six revolute joints, a 180 kg load capacity and a maximum reach of 2.55 m. It has high static repeatability (0.07 mm) and path repeatability (1.06 mm) [43]. In this paper, we assume the robot internal dynamics and delays are deterministic and repeatable, which is reasonable considering the high accuracy and repeatability of the robot. The ABB controller has an optional externally guided motion (EGM) module [44], which provides an external interface of commanded joint angles, u = q c , and joint angle measurements at a 4 ms rate. ABB also offers a high-fidelity dynamical simulator, RobotStudio [45], with the same EGM feature included. We used this simulator to implement ILC and in turn generate a large amount of training data for G † implemented as a multi-layer NN.
EGM sends the commanded joint angle q c ∈ R 6 to the low-level servo controller as a setpoint, and reads the actual joint angle vector, q ∈ R 6 , measured by the encoders. Ideally, G should be the identity matrix. However, because of the robot dynamics and the inner control loop, G is a nonlinear dynamical system.

Methodology
Though Gu may be implemented in simulation and physical experiments, finding G † is challenging because G is difficult to characterize analytically. Our approach to the approximation of G † does not require the explicit knowledge of G and only uses Gu for a specific input u. The key steps include the following.

1.
Model-free gradient descent-based iterative refinement: For a specific desired trajectory q d , we approximate G by a linear time-invariant system G(s). The gradi- As shown in [46], this can be obtained using Gu for a suitably chosen u.

2.
Approximation of dynamical system using a feedforward neural network: The first step generates multiple (q d , u) trajectory pairs corresponding to u = G † q d . We would like to approximate G † by a feedfoward neural network and train the weights using (q d , u) from iterative refinement. To enable a feedforward net to approximate a dynamical system, we use a batch of q d to generate a batch of u. Furthermore, the q d batch is shifted with respect to u to remove the effect of transient responses.

3.
Improvement of neural network based on physical experiments: We perform the first two steps using a dynamical simulator of G (in our case, RobotStudio). When the trained NN approximation of G † is applied to the physical system, tracking errors may increase due to the discrepancy between the simulator and the physical robot. Instead of repeating the process all over again using the actual robot, we just retrain the output layer weights of the NN.
The rest of this section describes these steps in greater details.

Iterative Refinement
Consider a single-input/single-output (SISO) linear time-invariant system with transfer function G(s), input u, and output y. We use the underline of a signal to denote the trajectory of the signal over [0, T] for a given T. Given a desired trajectory y d , the goal is to find an optimal input trajectory u so that the 2 -norm of the output error, y − y d , is minimized. Given an initial guess of the input trajectory u (0) , we may use gradient descent to update u: where G * (s) is the adjoint of G(s), e y := (y (k) − y d ), and y (k) = G(s)u (k) . The step size α k may be selected using a line search to ensure the maximum rate of descent. Let the state space parameters of a strictly proper G(s) be (A, would be unstable and G * (s)e y cannot be directly computed. For a stable computation of the gradient descent direction, we use the technique described in [46,47]. We propagate backward in time with time-reversed e y to stably compute the time-reversed gradient direction. The result is then reversed in time to obtain the gradient descent direction forward in time. The key insight here is to perform the gradient generation using Ge y instead of an analytical model of G. As a result, no explicit model information is needed in the iterative input update. The process of one iteration of gradient descent based on Ge y is shown in Figure 2 of [47] and described below.
(a) Apply u (k) to the system at a specific configuration and obtain the output y (k) = Gu (k) and corresponding tracking error e (k) (c) Define the augmented input u (k) (t) = u (k) (t) + e y R (t). Let y (k) = Gu (k) (t) with the system at the same initial configuration as in step (a).
(d) Compute e (t) = y (k) (t) − y (k) (t), and reverse it in time again to obtain the gradient direction G * (s)e y (t) = e (T − t).
The above iterative refinement algorithm is easily applied to a single robot joint with G given by RobotStudio. The first guess of u (0) is simply chosen to be the desired output trajectory y d . The convergence and robustness analysis of the gradient-based ILC can be referred to in [47]. We may also fit the response of RobotStudio using a third-order system (an underdamped second-order system cascaded with a low-pass filter) within the linear regime of the response: Within the linear regime, G(s) may be used instead of G to speed up the iteration (as RobotStudio is more time-consuming than computing a linear response). Table 1 compares the tracking errors y − y d in RobotStudio, with using RobotStudio or the fitted linear model to generate the gradient descent direction. As the amplitude and frequency of the desired output is still in the linear regime, both approaches work well. Figure 3 shows the trajectory tracking improvement of a sinusoidal trajectory. Figure 3a reports the uncompensated case (i.e., y (0) = Gy d ). RobotStudio output indicates an initial dead time and transient, but matches well with the linear response afterwards. Because of the RobotStudio dynamics, there is a significant tracking error in both amplitude and phase. Figure 3b shows the input u and output y after 8 iterations. After the initial transient, y and y d are almost indistinguishable. The input u has amplitude reduction and phase lead compared to y d (also u (0) ) to compensate for the amplitude gain and phase lag of RobotStudio for the input y d .
Time (second) 0 5 10 15 Angle (degree)   The inner-loop control for industrial robots typically tightly regulates each joint, and the system is almost diagonal as seen from the outer loop. Therefore, the single-axis algorithm may be directly extended to the multi-axis case by using the same iterative refinement procedure applied to each axis separately.

Feedforward Neural Network
Despite the significantly improved tracking performance, iterative refinement finds u = G † q d only for a local specific q d . Our aim is to obtain a global approximation of G † directly by learning from a set of (u, q d ) obtained from iterative refinement. To this end, we use multi-layer NNs to represent G † . Due to the decoupled nature of G (command of joint i only affects the motion of joint i), we have n separate NNs, one for each joint. We train these NNs from scratch using a large number of sample trajectories covering the robot workspace, and the corresponding required inputs are obtained by iterative refinement as described in Section 4.1.
We hypothesize that G may be represented as a nonlinear ARMA model. We use a feedforward net to approximate this system, similar to [48]. At a given time t, the input of the NN is a segment of the desired joint trajectory q d (t) : The output is also chosen as a segment of u: u := {u(τ) : τ ∈ [t, t + T]}. Updating a segment of the outer loop control instead of a single time sample reduces computation by avoiding frequent invocation of the NN. The choice of the future segment of q d is to allow approximation of non-causal behavior needed in inverting the non-minimum phase or strictly proper G.
We explored the NN architectures, particularly the number of hidden layers and the number of nodes in each layer, by experiment. The selection of T (identically, the input dimension of NNs) should guarantee that the input of NNs contains enough information for the NNs to model the dynamics, and at the same time should not be too large or it requires a lot more training data. By testing, we chose T to correspond to 25 samples (in the case of EGM with a 4 ms sampling rate, T = 0.1 s). The input and output layers therefore have 50 and 25 nodes, respectively. Table 2 lists the results of mean-squared testing error on unseen testing data with different architectures. After testing, we selected a fully-connected network including two hidden layers, and each hidden layer contained 100 nodes, as illustrated in Figure 4. To cover the workspace of the robot as much as possible, the robot started at a random configuration and each trajectory was composed of 3000 samples, or 12 s. A large amount of trajectories, including sinusoids, sigmoids, and trapezoids, were collected to train the NNs in TensorFlow on a NVIDIA GeForce GTX 1050 Ti GPU card, completing in around 45 min. The main Python file ran on a desktop with an Intel Core i7-7700 CPU with 4 cores and 8 logical processors. We used a per-GPU batch size of 256, and used L 2 norm regularization to avoid overfitting, and the rectified linear unit (ReLU) function to introduce nonlinearity for the NNs. A fixed learning rate of 10 −4 was employed for the training using AdamOptimizer. Using trajectories with different velocity and acceleration profiles, we were able to activate and collect data from both linear and nonlinear regions of EGM.

Transfer Learning
A reality gap exists between RobotStudio and the physical robot due to model inaccuracies and external physical load (such as grippers and cameras) on the physical robot. RobotStudio captures the dynamics of the physical robot very well for a trajectory with low velocity, as in Figure 5a. However, for trajectories with large velocity and acceleration, discrepancies between RobotStudio and physical robot responses become obvious (mainly in the output magnitude) as shown in Figure 5b.
To narrow down the reality gap between the simulator and the physical plant, we applied the iterative refinement procedure on the physical robot for 20 different trajectories and used the results to fine-tune the output layer weights of the trained NNs. The trained NNs are essentially regarded as a feature extractor. As only the amplitudes of the response showed discrepancies (as shown in Figure 5), it was reasonable to only fine-tune the output layer weights using data from the physical robot, with the parameters of the first two layers of NNs fixed.

Single-Joint Motion Tracking
As multiple sinusoids were used in the training of the NN (with discrete angular frequencies ranging from 0.5 to 15 rad/s), we tested its generalization ability on a chirp (multi-sinusoids) that was not part of the training set (with continuous angular frequencies ranging from 0.628 to 1.884 rad/s). Figure 6 illustrates the result for joint 1 of the simulated robot with NN compensation. Figure 6a shows the uncompensated output with the robot output lagging behind the desired output. The nonlinear effects are denoted by the deviation of the linear model output from the RobotStudio output. Figure 6b shows that, by compensating for the lag effect and amplitude discrepancies, the command input generated by the NN drives the output to the desired output closely after a brief initial transient. To test the generalization capability of the trained network beyond the sinusoidal trajectory, Figure 7 shows the tracking result of a random trajectory generated from a uniform distribution for joint 1 of the simulated robot. The trained NN works for compensating for the lag effect and amplitude degradation, though not as well as for the chirp signal.  Figure 8 compares the tracking of a 6-dimensional sinusoidal joint trajectory (ω = 5 rad/s, and with different amplitudes and phases from the training data) without and with NN compensation. Table 3 lists the corresponding tracking errors in terms of 2 and ∞ norms (using error vectors at each sampling instant). The ∞ error was calculated by ignoring the initial transient.

Multi-Joint Motion Tracking
To further test the generalization capability of the trained NNs, we conducted another test of tracking 6-dimensional joint trajectories planned by MoveIt! for a large-structure assembly task [49] in RobotStudio. The comparison of tracking performance without and with NN compensation is shown in Figure 9. Table 4 reports the corresponding tracking errors.  Angle (radian)  We also evaluated the performance of the trained NNs on a Cartesian square trajectory (in the x-y plane, with a z constant of 1.9 m) with bounded velocity and acceleration (maximum velocity of 1 m/s and a maximum acceleration of 0.75 m/s 2 ). During the motion, the orientation of the robot end-effector remained fixed. Figure 10 shows that the trained NNs effectively corrected the tracking errors and Figure 11 highlights the tracking performance in three Cartesian directions. The corresponding 2 norms of tracking errors in three Cartesian directions are summarized in Table 5.  Table 5.

Comparison with MoveL Command
We also compared the NN-based compensation with built-in ABB motion command MoveL, which is used to move the tool center point linearly to a given destination. Considering the 0.6∼0.7 s initial dead time of EGM, we compared the tracking of a straight line with a total movement time of 5 s, using MoveL, EGM without NN compensation, and EGM with NN compensation, as well as running an extra 3 ILC iterations using the input generated by the NN as the initial guess u (0) . The corresponding RMSE tracking errors are summarized in Table 6. The built-in motion command MoveL achieves accurate tracking, but it is for a fixed path and cannot be modified on the fly. So it is not applicable to real-time collision avoidance or sensor-based motion control. EGM, on the other hand, allows for sensorguided motion with a high sampling rate, but the tracking performance is compromised by the delay and internal dynamics. Table 6 indicates that the NN compensation produces a similar level of tracking performance compared with MoveL, and at the same time it retains the advantages of the sensor-guided motion of EGM.

Physical Experiment Results on IRB6640-180
We evaluated the tracking performance of the same chirp signal (in Figure 6) for joint 1 of the physical robot, as demonstrated in Figure 12. The command input was filtered by the NNs through transfer learning. Figure 12a shows the uncompensated output with the robot output lagging behind the desired output. Figure 12b shows that, by compensating for the lag effect, the command input generated by the NNs drives the output to the desired output closely after a brief initial transient. We found that for trajectories with low velocity profiles (like this chirp signal and the sinusoid in Figure 5a), there exists a negligible difference between the filtered inputs of NNs trained by simulation data and NNs obtained by transfer learning.
However, for trajectories with large velocity and acceleration profiles, transfer learning plays a key role in generating an optimal input to reduce trajectory tracking errors for the physical robot. Figure 13 illustrates the tracking of a sinusoidal signal with an angular frequency of 8 rad/s (with different amplitude and phase from the training data). The comparison between Figures 13b,c demonstrates that transfer learning is necessary to correctly compensate for the phase lagging and magnitude degradation for the physical robot given an aggressive motion profile. Table 7 summarizes the corresponding 2 norms of tracking errors.

RobotStudio Simulation Results for Other Robots
One natural question that arises is whether the trained NNs also work on different robot models. We applied the trained NNs by simulation data to two robot models IRB120 and IRB6640-130 in RobotStudio, and tested their tracking on a chirp trajectory (the same one as we used in Section 5.1.1) and a sinusoidal signal with an angular frequency of 8 rad/s and a magnitude of 5 degrees, as shown in Figure 14.
The figure demonstrates that the trained NNs effectively compensate for the lag effect and amplitude discrepancies for these two robot models. For tracking of the chirp signal, NNs compensate for the delay well for these robots and a possible explanation for this could be that EGM has the same time delay for all robot models. For tracking of the sinusoid signal with obvious nonlinear effects, NNs compensate for both the delay and the magnitude degradation well, since their inner-loop dynamics have similar behavior to the IRB6640-180 robot. Figure 14. The trained NNs using IRB6640-180 also improve the tracking accuracy of a chirp and a sinusoidal joint trajectories for IRB120 and IRB6640-130 robots in RobotStudio, possibly because these robots have similar inner loop dynamics.

Conclusions and Future Work
In this paper, the possibility of combining multi-layer NNs and ILC to achieve highperformance tracking of a 6-dof industrial robot was explored. Large amounts of data for NN training were collected by ILC in a high-fidelity physical simulator. The trained NNs generalized well to unseen trajectories and tracking performance was improved significantly. Transfer learning was adopted to narrow down the reality gap, and the generality of the NNs was further explored on two different robot models.
Future work will include the development of predictive motion and force controllers based on the trained NNs. Moreover, the tracking accuracy could be further improved by introducing feedback corrections. Finally, it will be worthwhile to compare the NNs with other neural-learning controllers, and verify the possibility of the trained NNs serving as a general dynamics compensator.