Trajectory Modeling by Distributed Gaussian Processes in Multiagent Systems

This paper considers a trajectory modeling problem for a multi-agent system by using Gaussian processes. The Gaussian process, as a typical data-driven method, is well suited to characterize model uncertainties and perturbations in a complex environment. To address model uncertainties and noise disturbances, a distributed Gaussian process is proposed to characterize the system model by using local information exchange among neighboring agents, in which a number of agents cooperate without central coordination to estimate a common Gaussian process function based on local measurements and data received from neighbors. In addition, both the continuous-time system model and the discrete-time system model are considered: we design a control Lyapunov function to learn the continuous-time model, and a distributed model predictive control-based approach is used to learn the discrete-time model. Furthermore, we apply a Kullback-Leibler average consensus fusion algorithm to fuse the local prediction results (mean and variance) of the desired Gaussian process. The performance of the proposed distributed Gaussian process is analyzed and verified by two trajectory tracking examples.


Introduction
Trajectory tracking is a common problem in control and robotics, and trajectory-generation systems represent a large class of dynamical physical models. In the past few decades, various control schemes have been investigated, most of which can be considered a subset of computed torque control laws [1]. Generally speaking, in order to track trajectories, one needs to know the system model, such as the kinematic model, the observation model, and the motion model [2]. However, in many practical applications, one usually cannot obtain this model information/knowledge, or the system model is dynamical and difficult to characterize. The system model is often subject to a high degree of uncertainty, nonlinearity, and dependency, which makes it difficult to model accurately. Therefore, traditional modeling methods are no longer suitable for actual dynamical environments [3]. More recently, data-driven approaches have been getting more and more attention in many fields, such as the control and machine learning communities [4,5]. Since data-driven methods can train the system model with high efficiency and precision, they have become a popular choice for system modeling [6,7]. In particular, the Gaussian process (GP) is the most representative one, and it has been successfully applied in many fields.
A research frontier in the realm of GP is the trajectory modeling issue. Due to its capability to tackle complex perturbations, uncertainties, dependencies, and nonlinearities, GP is becoming a popular choice in various systems, such as in solar power forecasting [8], permanent magnet spherical motors [9], iterative learning control [10], and in swarm kinematic model [11]. In particular, GP has been proven to be effective in improving the learning accuracy and the learning effectiveness of uncertainties and dependencies in low

Related Works
Gaussian process-based modeling and trajectory tracking have been widely investigated and applied over the past two decades. In the first place, most studies focus on the centralized GP and the multi-input-multi-output GP, and they are developed to meet the needs of engineering applications in learning and control, such as GP-based tracking control, state-space model learning, and their applications to trajectory tracking. For example, Beckers et al. studied stable Gaussian process-based tracking control of Lagrangian systems [1]. Umlauft et al. learned stable Gaussian process state-space models [27], while Mohammad et al. learned stable nonlinear dynamical systems with Gaussian mixture models [5]. In addition, Pushpak et al. designed control barrier functions for unknown nonlinear systems using Gaussian processes [28]. Umlauft et al. considered human motion tracking with stable Gaussian process state-space models in ref. [29] and proposed an uncertainty-based control Lyapunov approach for control-affine systems modeled by Gaussian processes [30]. They also derived uniform error bounds of Gaussian process regression with application to safe control [31]. Even though Gaussian process-based trajectory tracking and control are becoming research hotspots, they focus mainly on one agent and are seldom applied to multi-agent systems. In the second place, distributed and centralized GPs are flourishing in data-driven learning algorithms for multi-agent systems. Generally speaking, the main research results are organized as follows: (1) For contributions to models and theories, the unknown map function was modeled as a GP under a zero-mean assumption, and distributed parametric and nonparametric Gaussian regressions were proposed by using the Karhunen-Loeve expansion in refs. [13,14]. To scale GP to large datasets, Deisenroth et al. 
introduced a robust Bayesian committee machine, a practical and scalable product-of-experts model for large-scale distributed GP regression [32]. To address the hyperparameter optimization problem in big data processing, Xie et al. proposed an alternative distributed GP hyperparameter optimization scheme using the efficient proximal alternating direction method of multipliers [33]. Multiple-task GP

Contributions
More recently, GP has been widely used to model tracking systems and applied to track targets in real-world environments, such as speed racing (quadrotors) [58], trajectory tracking for wheeled mobile robots [59], 3D people tracking [60], and Simultaneous Localization and Mapping (SLAM) [61,62]. However, these applications focus on one agent, which ignores the advantages of multi-agent systems. After surveying these related references, we find that the trajectory tracking problem is mainly solved by control methods, not data-driven methods, and that existing works focus on one agent, not multi-agent collaboration. In addition, the existing GP-based learning algorithms are limited by their training manners. Motivated by the above discussion, we investigate the distributed GP to learn the trajectory system model in this paper. More specifically, the main contributions of the paper are four-fold.
(1) Compared with GP in Lagrangian systems [1,58], this paper considers a general state-space model for both discrete time and continuous time.
(2) Compared with the centralized GP in the state-space model [27-30], PD control and model predictive control are combined with GP to achieve tracking-system modeling and estimation, which makes the estimation error globally uniformly bounded.
(3) Compared with existing centralized multiple-GP approaches, such as collaborative GP [32-35,63,64], this paper achieves the estimation in a distributed GP manner, and we apply Kullback-Leibler (KL) average consensus to fuse the local training results of the GPs, which is different from the Wasserstein metric for measuring GPs [65].
(4) Compared with the centralized GP without a performance bound [32-35,63,64] or with only a Kullback-Leibler average analysis [26], this paper analyzes the probabilistic globally ultimate bound of the distributed GP.

Paper Structure
The remainder of the paper is organized as follows. Section 2 introduces some preliminaries, including notations, graph theory, Gaussian process, and Kullback-Leibler average consensus. Section 3 states the considered systems. Section 4 designs the local control strategy and proposes a Kullback-Leibler (KL) average consensus to fuse the local predictions of GP. Section 5 provides two tracking experiments. Finally, Section 6 concludes the paper.

Notation
Throughout the paper, vectors and vector-valued functions are denoted by bold characters. Matrices are denoted by capital letters. trace(·), log(·), det(·), ⟨·, ·⟩, ‖·‖, ⊕, ⊙, N(·, ·), and GP(·, ·) denote, respectively, the trace operation, the logarithm operation, the determinant operation, the inner product, the 2-norm of a matrix or vector, the addition operation of probability density functions (PDFs), the multiplication operation of PDFs, a Gaussian distribution, and a Gaussian process.

Moreover, ẋ, ẍ, x̂, and x̄ denote, respectively, the first-order differential operation on x, the second-order differential operation, the prediction, and the mean of x. In addition, D_KL(p‖q), E, V, and I_n denote, respectively, the KL divergence/distance between probabilities p and q, the expectation operation, the variance operation, and an n-by-n identity matrix. In addition, dx/du, ∂x/∂u, and O(·) denote, respectively, the derivative operation, the partial derivative operation, and the complexity.

Graph Theory
A graph is defined as G = (N, ε), where N is a set of nodes and ε ⊆ N × N is a set of edges. In particular, graph G is undirected iff (u, v) ∈ ε ⇔ (v, u) ∈ ε for all u, v ∈ N. The order of G is |N| and the size of G is |ε|. Further, let N_i = {j ∈ N : (j, i) ∈ ε} denote the set of neighbors of node i.
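These definitions can be made concrete with a small sketch (the node labels, the edge set, and the Metropolis rule for the consensus weights a_ij used later are all illustrative choices, not specified by the paper):

```python
# Hypothetical undirected path graph on nodes {0, 1, 2, 3}; each edge stored once.
edges = {(0, 1), (1, 2), (2, 3)}

def neighbors(i, edges):
    """N_i = {j in N : (j, i) in E}, treating the graph as undirected."""
    return {j for (u, v) in edges for j in (u, v) if i in (u, v) and j != i}

def metropolis_weight(i, j, edges):
    """One common choice of consensus weight a_ij for j in N_i:
    a_ij = 1 / (1 + max(d_i, d_j)), with d_i = |N_i|."""
    d_i, d_j = len(neighbors(i, edges)), len(neighbors(j, edges))
    return 1.0 / (1 + max(d_i, d_j))
```

For the path graph above, neighbors(1, edges) returns {0, 2}, so the graph has order 4 and size 3.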

Definition 1. ([66]).
A Gaussian process is a collection of random variables, any finite number of which have a joint Gaussian distribution.
A Gaussian process is completely specified by its mean function and covariance function. We define the mean function m(x) : x ∈ χ → R and the covariance function (kernel) k(x, x′), and denote the Gaussian process as f(x) ∼ GP(m(x), k(x, x′)). When f : χ → R^n is an n-dimensional map, the GP can be denoted by f_j(x) ∼ GP(m(x), k(x, x′)), j ∈ {1, . . . , n}.
Then, the Gaussian process can be organized as [29]

f(x) = [f_1(x), . . . , f_n(x)]ᵀ.

In addition, the covariance function (kernel) measures the similarity between any two states/variables x, x′ ∈ χ; common kernel functions include the linear kernel, the squared-exponential (SE) kernel, the polynomial kernel, the Gaussian kernel, and the Matérn kernel.

Assumption 1. Suppose the measurement equation is y = f(x) + ε, where y ∈ R^m is the observed vector, x ∈ X ⊂ R^n is the state vector defined in a compact set X, and ε is the measurement noise obeying a Gaussian distribution with zero mean and variance σ²I_n (denoted by ε ∼ N(0, σ²I_n)).
In addition, f is the unknown mapping function (f : χ → R^n) and is assumed to be a GP (denoted by f ∼ GP(0, k_θ(x, x′))). Here, k_θ(x, x′) is a kernel function with respect to hyper-parameters θ, K_XX = k_θ(X, X) denotes the covariance matrix of the set X, and K_xX = k_θ(x, X) denotes the covariance matrix between x and X. Generally speaking, given a training set D = (X, y) (input X and output y) [29], the log-likelihood can be computed by

log p(y|X, θ) = −(1/2) yᵀ(K_XX + σ²I_n)⁻¹y − (1/2) log det(K_XX + σ²I_n) − (n/2) log 2π. (2)

Then, when a new input x* is introduced, the posterior prediction of the Gaussian process [29] is

µ(x*) = K_x*X (K_XX + σ²I_n)⁻¹ y,   σ²(x*) = k_θ(x*, x*) − K_x*X (K_XX + σ²I_n)⁻¹ K_Xx*. (3)

To summarize, the likelihood maximization of (2) is performed to compute gradients for training, and the mean and covariance functions (3) are used for fast predictions.
More specifically, given an arbitrary new test input x* ∈ χ and conditioning on the dataset D described above, the prediction response y* is jointly Gaussian distributed with the training set:

[y; y*] ∼ N( 0, [K_XX + σ²I_n, K_Xx*; K_x*X, k_θ(x*, x*)] ). (4)

For j = 1, . . . , n, the posterior distribution corresponding to f_j(·) at x* yields a Gaussian distribution with mean and covariance

µ_j(x*) = K_x*X (K_XX + σ²I_n)⁻¹ y_j,   σ_j²(x*) = k_θ(x*, x*) − K_x*X (K_XX + σ²I_n)⁻¹ K_Xx*. (5)

Furthermore, in order to learn the hyper-parameters θ_j for a chosen kernel, we can use the maximum likelihood estimate based on Bayes' rule,

θ_j* = arg max_θ log p(y_j | X, θ), (6)

which can be solved by gradient-based approaches [29].
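The prediction equations above can be sketched numerically for a zero-mean GP with an SE kernel (a minimal illustration; the hyper-parameter values and function names are ours, not the paper's):

```python
import numpy as np

def se_kernel(A, B, ell=1.0, sf=1.0):
    """Squared-exponential kernel k(x, x') = sf^2 exp(-||x - x'||^2 / (2 ell^2))."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return sf ** 2 * np.exp(-0.5 * d2 / ell ** 2)

def gp_posterior(X, y, Xs, sigma=0.1, ell=1.0, sf=1.0):
    """Posterior mean and variance of a zero-mean GP at test inputs Xs,
    conditioned on training data (X, y) with noise variance sigma^2."""
    K = se_kernel(X, X, ell, sf) + sigma ** 2 * np.eye(len(X))
    Ks = se_kernel(Xs, X, ell, sf)                      # K_{x*, X}
    L = np.linalg.cholesky(K)                           # stable alternative to a direct inverse
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    mu = Ks @ alpha                                     # posterior mean
    v = np.linalg.solve(L, Ks.T)
    var = np.diag(se_kernel(Xs, Xs, ell, sf)) - (v ** 2).sum(0)  # posterior variance
    return mu, var
```

Hyper-parameter learning would then maximize the log-likelihood over (ell, sf, sigma) with a gradient-based optimizer.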

Kullback-Leibler Average Consensus Algorithm
This section introduces the consensus/fusion algorithm for GPs. A Gaussian process is a Gaussian probability density over its mean function and covariance function. Therefore, the fusion of GPs is in fact the fusion of probabilities, which raises a question: how can consensus/fusion of probabilities be achieved among multiple agents?
Before proceeding, we first introduce some definitions. Let P = { p : R^n → R | ∫ p(x) dx = 1 and p(x) ≥ 0, ∀x ∈ R^n } denote the set of probabilities (PDFs) over R^n, and let p_i(·) ∈ P (i ∈ N) denote the local probability/PDF of agent i.

Definition 3.
(Kullback-Leibler divergence [26]). In statistics, the Kullback-Leibler divergence D_KL(p‖q) (also called relative entropy) is a statistical distance: a measure of how the probability distribution p(·) differs from the probability distribution q(·). For distributions p(·) and q(·) of a continuous random variable x, it is defined as

D_KL(p‖q) = ∫ p(x) log( p(x) / q(x) ) dx.

Definition 4. (Probabilistic operations [26]). Define the addition ⊕ and multiplication ⊙ operators over probabilities p(·) and q(·), for a variable x and a real constant a, as

(p ⊕ q)(x) = p(x) q(x) / ∫ p(y) q(y) dy,   (a ⊙ p)(x) = [p(x)]^a / ∫ [p(y)]^a dy.

Then, we attempt to find a Kullback-Leibler average consensus/fusion algorithm over probabilities obtained by multiple agents.
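For Gaussian densities, the KL divergence admits a well-known closed form, which is what makes the Gaussian fusion below tractable; a small sketch (illustrative only):

```python
import numpy as np

def kl_gauss(mu0, S0, mu1, S1):
    """Closed-form D_KL( N(mu0, S0) || N(mu1, S1) ) for n-dimensional Gaussians."""
    n = len(mu0)
    S1inv = np.linalg.inv(S1)
    d = mu1 - mu0
    return 0.5 * (np.trace(S1inv @ S0) + d @ S1inv @ d - n
                  + np.log(np.linalg.det(S1) / np.linalg.det(S0)))
```

The divergence is zero iff the two densities coincide and is asymmetric in its arguments, which is why the KL average below is defined as a minimizer rather than a simple mean of densities.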
Motivated by this, we define the weighted KLA p̄(·) among the probabilities {p_i}_{i=1}^N as

p̄ = arg inf_{p∈P} Σ_{i∈N} a_i D_KL(p ‖ p_i),

where a_i ≥ 0 denotes the weight of agent i and satisfies Σ_{i∈N} a_i = 1.
Then, the average consensus/fusion problem is to achieve

lim_{l→∞} p_i^l(·) = p̄(·) (9)

for all agents i ∈ N, where l is the consensus step and p̄ represents the asymptotic KLA with uniform weights. Second, we attempt to find the solution of the average consensus p̄ in (9). Based on [26], the solution is

p̄(x) = Π_{i∈N} [p_i(x)]^{a_i} / ∫ Π_{i∈N} [p_i(y)]^{a_i} dy,

with a_i = 1/|N|. In addition, the local consensus of agent i at the l-th consensus step can be obtained by

p_i^{l+1}(·) = ⊕_{j∈N} ( a_ij ⊙ p_j^l(·) ),

where a_ij is the consensus weight satisfying a_ij ≥ 0 and Σ_{j∈N} a_ij = 1, and a_ij represents the (i, j)-th component of the consensus matrix A (a_ij = 0 if j ∉ N_i). Therefore, when the consensus algorithm is initialized by p_i^0(·) = p_i(·), we finally obtain the consensus (9) as l → ∞. Third, for the special Gaussian case, the local probability p_i(·) takes the form

p_i(·) = N(·; µ_i, Σ_i), (14)

where µ_i ∈ R^n and Σ_i ∈ R^{n×n} denote the mean vector and the covariance matrix, respectively. In this case, the KLA can be obtained by operating directly on the means and the covariances instead of the probabilities. The following lemma states the KLA on Gaussian distributions.

Lemma 1 ([26]). Suppose the local probabilities take the Gaussian form (14), with corresponding weights a_i. Then the weighted KLA p̄(·) = N(·; µ̄, Σ̄) can be calculated by directly fusing the means µ_i and the covariance matrices Σ_i as

Σ̄⁻¹ = Σ_{i∈N} a_i Σ_i⁻¹,   Σ̄⁻¹ µ̄ = Σ_{i∈N} a_i Σ_i⁻¹ µ_i.

Lemma 1 indicates that the consensus/fusion of Gaussian probabilities can directly operate on their means and covariance matrices. Note that a Gaussian process is indeed a Gaussian probability. Therefore, the KLA consensus/fusion on GPs can be obtained directly by fusing the mean functions and the covariance functions.
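The Gaussian-case fusion and the consensus iteration can be sketched as follows (a toy illustration; the row-stochastic weight matrix A is an assumption of the sketch):

```python
import numpy as np

def kla_fuse(mus, Sigmas, w):
    """Weighted KLA of Gaussians, computed in information (inverse-covariance) form."""
    infos = [np.linalg.inv(S) for S in Sigmas]
    info = sum(a * J for a, J in zip(w, infos))            # fused inverse covariance
    vec = sum(a * J @ m for a, J, m in zip(w, infos, mus))
    S_bar = np.linalg.inv(info)
    return S_bar @ vec, S_bar

def consensus_step(mus, Sigmas, A):
    """One consensus iteration: agent i fuses all terms with weights A[i, :]
    (in a distributed implementation only neighbor entries are nonzero)."""
    fused = [kla_fuse(mus, Sigmas, A[i]) for i in range(len(mus))]
    return [m for m, _ in fused], [S for _, S in fused]
```

Iterating consensus_step with a doubly stochastic weight matrix on a connected graph drives every agent's (mean, covariance) pair toward the uniform KLA.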

Uniform Error Bounds
This section analyzes the probabilistic uniform error bounds.

Definition 5. ([31]). ∀x ∈ X, if there exists a function η(x) such that ‖µ(x) − f(x)‖ ≤ η(x), then, on a compact set X ⊂ R^n, the GP has a uniformly bounded error. A probabilistic uniform error bound is one that holds with a probability of at least 1 − δ for any δ ∈ (0, 1).

Definition 6.
(Lipschitz constant of the kernel [64]). The Lipschitz constant of a differentiable covariance kernel k(·, ·) on the compact set X is

L_k = max_{x, x′ ∈ X} ‖ ∂k(x, x′)/∂x ‖.

Next, we show that the posterior prediction (3) of the GP is continuous. Given the continuous unknown f with Lipschitz constant L_f and the Lipschitz continuous kernel k with Lipschitz constant L_k, we then have the following theorem.

Theorem 1. ([31]). The posterior mean µ(·) and posterior standard deviation σ(·) are continuous with Lipschitz constant L_µ and modulus of continuity ω_σ(·), respectively, and, for any τ ∈ R⁺ and δ ∈ (0, 1), with

β(τ) = 2 log( M(τ, X) / δ ),   γ(τ) = (L_µ + L_f) τ + √β(τ) ω_σ(τ),

where M(τ, X) denotes the τ-covering number of X. In addition, ∀x ∈ X, it follows that, with probability of at least 1 − δ,

‖ f(x) − µ(x) ‖ ≤ √β(τ) σ(x) + γ(τ). (19)

Proof of Theorem 1. The proof is given in Appendix A.
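Assuming the bound takes the standard form from [31] (square root of beta times the posterior standard deviation, plus gamma), it can be evaluated numerically once the constants are known; a sketch with purely illustrative inputs:

```python
import numpy as np

def beta(tau, delta, M):
    """beta(tau) = 2 log(M(tau, X) / delta), M the tau-covering number of X."""
    return 2.0 * np.log(M / delta)

def error_bound(sigma_x, tau, delta, M, L_mu, L_f, omega_sigma):
    """Probabilistic uniform error bound sqrt(beta(tau)) * sigma(x) + gamma(tau)."""
    b = beta(tau, delta, M)
    gamma = (L_mu + L_f) * tau + np.sqrt(b) * omega_sigma
    return np.sqrt(b) * sigma_x + gamma
```

The bound tightens where the posterior standard deviation sigma(x) is small, i.e., near the training data.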

Asymptotic Analysis
The asymptotic analysis of the error bound (19) in the limit N → ∞ is summarized in the following theorem.

Theorem 2. ([31]).
Given a GP defined by a continuous covariance kernel function k with Lipschitz constant L_k, and an infinite data stream of measurements (X_t, y_t) of the continuous unknown map f with Lipschitz constant L_f and maximum absolute value f_max, let the first N measurements inform the posterior predictions µ_N(·) and σ_N(·) of the GP. If there exists a positive constant such that the posterior variance decays at a corresponding rate, then the error bound (19) converges to zero as N → ∞.
Proof of Theorem 2. The proof is given in Appendix B.

Problem Formulation
The trajectories are generated by a continuous dynamical system (21), where x ∈ χ ⊂ R^n denotes the state (location) in a compact set χ, u ∈ U ⊆ R^n denotes the control input, ω denotes the process noise with ω ∼ N(0, σ_ω² I), and the initial state is x(0) = x₀. We have N agents/sensors connected by a network to acquire measurements (location or velocity). In particular, sensor i measures

y_j^i = f(x_j) + ε_j^i,

where y_j^i is the observed vector of sensor i (i = 1, . . . , N) at the j-th step (j = 1, . . . , N), and ε_j^i is the measurement noise with ε_j^i ∼ N(0, σ² I). Suppose that a training data set D of trajectories is given; D contains the states (current locations) and the measurements. The nonlinear map function f : χ → R^n is unknown and is assumed to be a Gaussian process. In addition, the following assumption is satisfied.
Assumption 2. Suppose f(x) is Lipschitz continuous and has a bounded RKHS (reproducing kernel Hilbert space) norm with respect to a fixed common kernel k, i.e., ‖f‖_k < ∞.
The objective is to find an estimate f̂ of f for which the output trajectory x tracks the desired trajectory x_d such that the tracking error e = x − x_d = [e₁, e₂]ᵀ vanishes over time, i.e., lim_{t→∞} ‖e‖ = 0. Since the noises ω and ε_j^i and the uncertain dynamics affect the system and the control, we use multiple agents to eliminate the influence of the stochastic uncertainty; i.e., given the local estimates f̂_i, the goal is also to fuse them and find a fused/consensus f̄ such that the uncertainty also vanishes over time.

Control Design and Analysis
Classical control uses static feedback gains. Low feedback gains are desirable to avoid saturation of the actuators and to provide good noise suppression. However, the considered unknown dynamics require a sufficiently large feedback gain to keep the tracking error under a defined limit. After performing a training procedure, we use the mean function of the GP to adapt the gains. For this purpose, the uncertainty of the GP and multiple agents are employed to scale the feedback gains.
Before proceeding, the following natural assumptions and lemmas are given, under which the trajectory x(t) of the dynamics (21) is globally ultimately bounded.
Next, we design the controller and the control law such that stability and high-performance tracking are achieved. The controller is designed as (23), where f̂ is the model estimate of the nonlinear dynamics f, obtained from the posterior mean function µ_N of the GP trained on the data set D, and ρ is the control law.
In addition, ρ is designed as a Proportional-Derivative (PD)-type controller (24), where r = k_p e₁ + e₂ is the filtered state with ṙ = f(x) − f̂(x) − k_c r, the filter gain k_p ∈ R⁺, and the control gain k_c ∈ R⁺.
Given the above controller, one needs to verify the effectiveness of the model estimate f̂ and the choices of the parameters k_c and k_p. The following theorem states the control law with guaranteed boundedness of the tracking error.

Theorem 3. Consider the system (21), where f satisfies Assumption 2 and admits a Lipschitz constant L_f. If Assumption 3 is satisfied, then the controller (23) with f̂ = µ_N and the control law (24) guarantee that the tracking error is globally ultimately bounded and converges to a ball whose radius is determined by β and γ given in Theorem 1, ∀x ∈ X, with a probability of at least 1 − δ, δ ∈ (0, 1).

Proof of Theorem 3.
The proof is given in Appendix C.
Remark 1. From Theorem 3, it can be seen that trajectory tracking with high probability is achieved with the proposed GP-based controller. Compared with most existing results where only uniformly ultimate boundedness of the trajectory tracking errors was achieved [1,68], the proposed control law ensures high control precision in the presence of the estimation errors from GP.
Proof of Remark 1. The proof is given in Appendix C.

Consensus
The aforementioned control law focuses on one agent/sensor. Since the noises ω and ε_j^i affect the measurements and the dynamical system, the proposed controller will fluctuate across agents. Furthermore, since the dynamics are uncertain, the proposed controller may also differ across agents. Therefore, this section fuses the local results and makes them reach a consensus; i.e., given the local f̂_i (with µ_N^i and Σ_N^i), the goal is to fuse them and find a fused/consensus f̄ (with µ̄_N and Σ̄_N) such that the uncertainty and the disturbance vanish over time. In this way, the controllers (23) and control laws (24) of different agents can reach a consensus based on f̄.
More specifically, after training the local f̂_i using GP, node i sends the result to its neighbors N_i. After collecting the training results from its neighbors, it performs the following dynamic consensus/fusion step. Given weights a_i satisfying a_i ≥ 0 and Σ_{i∈N_i} a_i = 1, based on the Kullback-Leibler average consensus given in Section 2.4, the desired weighted KLA takes the Gaussian form f̄(·) = N(·; µ̄, Σ̄), in which the fused mean function µ̄ and the fused covariance function Σ̄ can be calculated by

Σ̄⁻¹ = Σ_{i∈N_i} a_i (Σ^i)⁻¹,   Σ̄⁻¹ µ̄ = Σ_{i∈N_i} a_i (Σ^i)⁻¹ µ^i,

while the global/centralized fusion uses i = 1, . . . , N. The flowchart is given in Figure 1.
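In practice, each agent evaluates its posterior mean and variance on a common grid of test inputs, so the fusion above reduces to elementwise operations (a sketch; the names are illustrative):

```python
import numpy as np

def fuse_predictions(means, variances, a):
    """Pointwise KLA fusion of the agents' GP predictions over shared test inputs.
    means, variances: per-agent arrays; a: fusion weights summing to one."""
    inv_bar = sum(a_i / v for a_i, v in zip(a, variances))   # fused precision
    var_bar = 1.0 / inv_bar
    mu_bar = var_bar * sum(a_i * m / v for a_i, m, v in zip(a, means, variances))
    return mu_bar, var_bar
```

Agents with smaller posterior variance (more informative local data) automatically receive more influence on the fused mean.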

After obtaining the consensus mean function, the controllers of different agents can be designed to be a unified controller.

Remark 2.
The main advantage of the distributed method lies in the fact that local nodes may receive only part of the training data, or even suffer missing data. Through information interaction and the consensus algorithm, neighboring nodes can make predictions faster while keeping high accuracy, which also avoids processing failure caused by data loss or node/sensor failure.

GP-Based Model Predictive Control for Discrete-Time System
The preceding discussion considers the continuous-time system. In an actual physical system, we usually need to discretize the model. This section designs the control strategy for the discrete-time case by using GP-based model predictive control (MPC).
First, the considered system (21) is assumed to be discrete-time and is modeled by a GP, where the control tuple x̃_k = [x_kᵀ, u_kᵀ]ᵀ and the state difference δx_k = x_{k+1} − x_k are designed, respectively, as the training input and the training target. Given the training data set D = {x̃_k, y_k = δx_k}_{k=1,...,N}, according to (4) and (5), at a new input x̃_*, we obtain the mean and covariance functions (26), where K is defined in (4). Therefore, (26) is used to predict the next step. By using the moment matching approach [46], the mean function and covariance function of the training target at time k can be calculated, and at time k + 1 the mean and covariance functions are updated accordingly; for more details, please refer to [46]. Then, based on (6), we learn the hyper-parameters θ. A distributed GP-based MPC scheme is presented to address this problem. First, we design the objective function (29) with the stage cost

F_l = (x_{k+l} − p_{k+l})ᵀ Q (x_{k+l} − p_{k+l}) + u_{k+l}ᵀ R u_{k+l},

where p is the desired trajectory (desired state), Q and R are positive definite weight matrices, and L is the prediction horizon (and also the control horizon). According to the GP in Section 2, the expected cost (30) can be rewritten in terms of the predicted means and covariances. Next, to address the optimization problem (29), a gradient-based method is used: set J = Σ_l E[F_l]; using the chain rule, the gradient of J with respect to the controls can be calculated by propagating ∂F_l/∂µ_l, ∂F_l/∂Σ_l, and ∂F_l/∂u_{l−1} through the GP prediction equations, each of which is easy to calculate. Finally, the gradient-based algorithm is formulated as Algorithm 1.
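The gradient-based optimization of the control sequence can be sketched as follows, with the GP one-step mean prediction abstracted as a callable and the analytic chain-rule gradients replaced by finite differences for brevity (all names and values are illustrative, and the predicted covariance term of the expected cost is omitted):

```python
import numpy as np

def rollout_cost(predict, x0, u_seq, ref, lam):
    """Horizon cost: tracking error plus control effort (scalar-state sketch)."""
    x, J = x0, 0.0
    for u, p in zip(u_seq, ref):
        x = predict(x, u)            # GP posterior-mean one-step prediction
        J += (x - p) ** 2 + lam * u ** 2
    return J

def gpmpc_controls(predict, x0, ref, lam=0.01, iters=300, lr=0.1, eps=1e-5):
    """Gradient descent on the open-loop control sequence over the horizon."""
    u = np.zeros(len(ref))
    for _ in range(iters):
        g = np.zeros_like(u)
        for l in range(len(u)):      # central finite-difference gradient
            du = np.zeros_like(u)
            du[l] = eps
            g[l] = (rollout_cost(predict, x0, u + du, ref, lam)
                    - rollout_cost(predict, x0, u - du, ref, lam)) / (2 * eps)
        u -= lr * g
    return u
```

In the paper the gradients are instead obtained analytically via the chain rule through the GP prediction equations, which is considerably cheaper for long horizons.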

Remark 3.
Similarly, due to the stochastic uncertainty caused by the noises and model perturbations, we can use multiple agents to address this problem. The consensus/fusion algorithm is given above and is similar to that of the continuous-time system, so we do not repeat it here.

Remark 4.
The GP has been widely applied in various real-world applications such as quadrotor tracking, 3D people tracking, localization and mapping, and control-based application models. These applications have attracted much attention from engineers and researchers. As for the limitations, in our opinion, the first is that the model needs real-world data to achieve good training and application. The second is that the dynamics must be Gaussian distributed or approximately Gaussian distributed.

Simulations
To evaluate the performance and to verify the effectiveness of the proposed algorithms, this section provides two trajectory tracking examples: one is the trajectory tracking of a robotic manipulator, and the other is the trajectory tracking of an unmanned quadrotor. All simulations are conducted on a computer with a 2.6 GHz Intel(R) Core(TM) i7-5600U CPU and MATLAB R2015b.

Trajectory Tracking of Robotic Manipulator
First, we consider the trajectory of a Puma 560 robot arm manipulator in the x-y-z plane with 6 degrees of freedom (DoFs), which is shown in Figure 2. The Puma 560 robot was designed to have approximately the dimensions and reach of a human worker. It also has a spherical joint at the wrist, just as humans have. Roboticists use terms like waist, shoulder, elbow, and wrist when describing serial-link manipulators. For the Puma, these terms correspond, respectively, to joints 1, 2, 3, and 4-6, as shown in Figure 2.

For the considered robot arm, τ₁, τ₂, and τ₃ are the control torques of the motors controlling the joint angles φ, θ, ψ. The trajectory of the robotic manipulator can be controlled by these torques. The motion can be described by the following Lagrangian system [1]

H(q) q̈ + C(q, q̇) q̇ + G(q) = τ + κ(q̃),

where q denotes the generalized coordinates with time derivatives q̇, q̈, and τ denotes the generalized input. H(q) is the mass matrix, C(q, q̇) is the Coriolis matrix, and G(q) is the potential energy term. An additional unknown dynamic κ(q̃) (trained to obtain its form), which depends on q̃ = [q̈ᵀ, q̇ᵀ, qᵀ]ᵀ, affects the system as a generalized force; one can refer to [2] for details. The process methodology is illustrated in Figure 3.
The tracking error is the error between the actual joint angles or velocities and the desired values. The following controllers are tested: (1) the computed torque (CT) controller; (2) the adaptive controller with regressor vectors defined as in [70], where m̂₃, Î_xx3, Î_yy3, and Î_zz3 are the unknown parameters; and (3) the proposed PD torque controller. From Figure 4, it can be seen that the PD controller is able to hold the position of the robot arm at the desired joint angles, with λ = 30 and K_d = 20 being the gains associated with holding the respective positions. Convergence is achieved in about 10 s. The first couple of runs can be used for tuning, after which the robot should have good repeatability. In addition, from the position and velocity plots of the computed torque controller (Figure 5) and the adaptive controller (Figure 6), both controllers achieve convergence of the parameters to the desired values. However, the proposed PD torque controller has a quicker convergence time and smaller gains than the computed torque controller. The error results in Table 1 further verify its effectiveness. Therefore, the proposed PD torque controller is better suited for the considered application. We can also conclude that the distributed GP effectively eliminates the uncertainty and disturbance caused by the system model and the noises.
To further compare with multi-agent processing methods, the following approaches are tested: (1) independent GP (IGP), where a model is trained independently for each input [6]; (2) combined GP (CGP), where one agent trains a GP by combining data across inputs [34]; and (3) the proposed distributed GP with the BIC (Bayesian Information Criterion). The training results (interpolation and extrapolation) in terms of NMSE (normalized mean square error) versus the number of training data points are demonstrated in Figure 7. Note that IGP and CGP, i.e., the existing GPs, are centralized methods, in contrast to the proposed distributed training method. In Figure 7, the first row displays the interpolation training results for the three methods. As we can see, for joint 1 the proposed distributed GP achieves the best performance for any number of training points.
For joints 4 and 6, the performance of the proposed GP is close to that of IGP and better than that of CGP as the number of training points increases. The second row displays the training results of the extrapolation setting for the three methods. For joint 1, the proposed distributed GP achieves the best performance for small numbers of training points (<500). However, when the number of training points increases further, CGP becomes better than the proposed distributed GP (although they remain very close). For joints 4 and 6, the performance of the proposed GP is close to that of IGP and better than that of CGP as the number of training points increases. To sum up, the proposed distributed GP can reach the performance of the centralized methods and is close to (and sometimes outperforms) the existing state-of-the-art centralized combined and multi-task methods.
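The NMSE metric used for the Figure 7 comparison can be sketched as follows. Normalizing the mean square error by the variance of the targets is one common convention; the paper's exact normalizer may differ, so this is an illustrative assumption.

```python
import numpy as np

def nmse(y_true, y_pred):
    """Normalized mean square error: MSE divided by the variance of the targets.

    Values near 0 indicate an accurate model; a value of 1 corresponds to a
    predictor no better than the constant mean of the targets.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean((y_true - y_pred) ** 2) / np.var(y_true)

print(nmse([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))  # perfect prediction -> 0.0
```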

Trajectory Tracking of an Unmanned Quadrotor
This section tests the proposed distributed GP-based model predictive control (GPMPC) of an unmanned quadrotor. The trajectory of an unmanned quadrotor is generated by a discrete-time Euler-Lagrange dynamical system [2]. The goal is to track its positions (X, Y, Z) and Euler angles (φ, θ, ψ). To compare with the state-of-the-art controllers, the efficient MPC (EMPC) [71] and the efficient nonlinear MPC (ENMPC) [72] are also tested in the simulations. The parameters are selected as L = 5 and Q = R = diag (1, 1, 1).
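The receding-horizon objective behind the GPMPC setup above can be sketched as a finite-horizon quadratic tracking cost. This is a generic MPC stage cost under the reported weights, not the paper's exact formulation: the function name and the assumption that the cost is a plain sum of quadratic state-error and input terms are illustrative.

```python
import numpy as np

def mpc_cost(x_pred, u_seq, x_ref, Q=np.eye(3), R=np.eye(3)):
    """Finite-horizon quadratic cost: sum_k e_k^T Q e_k + u_k^T R u_k.

    x_pred: (L, 3) states predicted over the horizon (e.g., by the GP model),
    u_seq:  (L, 3) control inputs, x_ref: (L, 3) reference trajectory.
    Q = R = diag(1, 1, 1) matches the weights reported in the text; the
    horizon length corresponds to L = 5 in the simulations.
    """
    cost = 0.0
    for xk, uk, rk in zip(x_pred, u_seq, x_ref):
        e = xk - rk                      # tracking error at step k
        cost += e @ Q @ e + uk @ R @ uk  # state-error and input penalties
    return cost
```

At each sampling instant, the MPC optimizer would minimize this cost over `u_seq` and apply only the first input, with `x_pred` supplied by the learned GP model in place of an analytic dynamics model.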
In the first scenario, the unmanned quadrotor tracks a "Lorenz" trajectory with Gaussian white noise (zero mean and unit variance), which is shown in Figure 8. To train the system model, we use the efficient MPC design proposed in [71]. One hundred seventy measurements, states, and controls are used to train the GP. The data from the rotational system are within the range [0, 1], the angle ϕ is within the range [−1.6, 1.6], and the input is within the range [−4 × 10⁻⁸, 7 × 10⁻⁸]. The training of the GP takes 5 s, and 10 agents are used for training. The values of the mean squared error (MSE) obtained by the GP are very small: the MSE for the positions is 4.3618 × 10⁻⁴, close to that of the stable GPMPC (GPMPC1) [73] with 4.3498 × 10⁻⁴, and the MSE for the angles is 1.5743 × 10⁻⁸, also close to GPMPC1 with 4.3030 × 10⁻⁹. This indicates that the proposed distributed GP (GPMPC2) is efficient and well trained, as illustrated in Figure 9 (with different confidence levels). Note that the stable GPMPC (GPMPC1) is the best recent method and is also a single-agent method. Therefore, the very close training results indicate that the proposed distributed GP achieves a good training performance. The position and attitude tracking results are demonstrated in Figure 10, and the tracking errors are displayed in Figure 11 and Table 2. As we can see from Figures 10 and 11, the proposed distributed GP learns the system model well and tracks the trajectories with high precision, close to the state-of-the-art controllers. This also indicates that, as long as training sets are provided, we can track the trajectory without model knowledge (model-free), i.e., the proposed GP can learn the system model well.
In addition, as long as multiple agents are introduced, the model uncertainties and noise disturbances can be eliminated and suppressed. Furthermore, the covariances on the positions and attitudes obtained by the different GP models (the stable GPMPC (GPMPC1) [73] and the proposed distributed GPMPC (GPMPC2)) are displayed in Figure 12. This indicates that the proposed distributed GP can also reach the performance of the state-of-the-art GP model.
In the second scenario, the unmanned quadrotor tracks an "Elliptical" trajectory with Gaussian white noise (zero mean and unit variance), which is shown in Figure 13. The tracking performance and the tracking errors are shown in Figures 14 and 15 and Table 3. From Figures 13–15, we can see that the proposed distributed GP learns the trajectory model effectively; the result is very close to the desired reference trajectory and close to the state-of-the-art controllers. The covariance results obtained by GPMPC1 and GPMPC2 are shown in Figure 16, which further verifies the effectiveness of the proposed distributed GP.
Figure 13. Elliptical trajectory tracking.
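The Kullback–Leibler average consensus fusion of the local predictions (means and variances) can be sketched for the scalar case. The KL average of Gaussian densities is computed in information form: the fused precision is the average of the local precisions and the fused information vector is the average of the local information vectors. The sketch below shows the fixed point of the consensus iteration (the global average); in the actual distributed setting each agent would reach it by repeatedly averaging with its neighbors.

```python
import numpy as np

def kl_average(mus, variances):
    """Kullback-Leibler average of scalar Gaussian predictions.

    mus, variances: local GP predictive means and variances from each agent.
    Fusion is done in information (canonical) form: average the precisions
    1/var and the information values mu/var, then convert back.
    """
    prec = 1.0 / np.asarray(variances, dtype=float)  # local precisions
    info = prec * np.asarray(mus, dtype=float)       # local information values
    fused_prec = prec.mean()
    fused_var = 1.0 / fused_prec
    fused_mu = info.mean() * fused_var
    return fused_mu, fused_var

mu, var = kl_average([1.0, 3.0], [1.0, 1.0])
print(mu, var)  # equal-confidence agents -> mean 2.0, variance 1.0
```

Agents with smaller predictive variance (higher confidence) pull the fused mean toward their own estimate, which is the desired behavior when fusing heterogeneous local GP predictions.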

Conclusions
This paper used the Gaussian process to learn the trajectory model, and a distributed GP-based model learning strategy was proposed. For the continuous- and discrete-time systems, we designed a GP-based PD controller and a GP-based MPC controller, respectively. To address the model uncertainties and noise disturbances, a distributed multi-agent system was used to train the model. In addition, since data-driven algorithms need a large number of training samples, the distributed GP model can also address this problem by using a Kullback–Leibler average consensus fusion criterion.
The proposed GP can solve the practical model-free problem as long as the training data sets are given. Since the considered multi-agent system is interconnected and is only used to eliminate the model uncertainties and noise disturbances, future research will mainly focus on the efficiency of the distributed Gaussian process and the robustness of the multi-agent network. In the future, we will also focus on the deployment of the proposed approach on unmanned aerial vehicles (UAVs) and its use in UAV detection and localization; UAV racing remains a challenging problem to overcome.

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix A Proof of Theorem 1
Proof. First, we prove the Lipschitz constant of $\mu(\cdot)$ and the modulus of continuity of $\Sigma(\cdot)$. At two different states $x$ and $x'$, the norm of the difference of $\mu(\cdot)$ is
$$\|\mu(x) - \mu(x')\| = \|(K_{xX} - K_{x'X})\,\alpha\|, \qquad \alpha = (K_{XX} + \sigma^2 I)^{-1} y.$$
Then, based on the Lipschitz continuity of the kernel and the Cauchy–Schwarz inequality, we have
$$\|\mu(x) - \mu(x')\| \le L_k \sqrt{N}\, \|\alpha\|\, \|x - x'\|,$$
which proves that $\mu$ is Lipschitz continuous.
Similarly, a corresponding bound holds for $\Sigma(\cdot)$ due to the Lipschitz continuity of the kernel $k$ (and hence of $K$). Therefore, the modulus of continuity $\omega_\Sigma$ can be obtained by combining the above bounds and taking the square root. □
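The quantities appearing in the proof above, the weight vector $\alpha = (K_{XX} + \sigma^2 I)^{-1} y$ and the posterior mean and variance, can be computed numerically. A minimal numpy sketch, assuming a squared-exponential kernel (the paper does not fix a specific kernel here, so the kernel choice and hyperparameter values are illustrative):

```python
import numpy as np

def rbf(A, B, ell=0.2, sf=1.0):
    """Squared-exponential kernel k(x, x') = sf^2 exp(-|x - x'|^2 / (2 ell^2))."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return sf**2 * np.exp(-0.5 * d2 / ell**2)

def gp_predict(X, y, Xs, noise=0.05):
    """GP posterior: mu = K_{*X} alpha with alpha = (K_{XX} + s^2 I)^{-1} y."""
    K = rbf(X, X) + noise**2 * np.eye(len(X))
    Ks = rbf(Xs, X)
    alpha = np.linalg.solve(K, y)          # the alpha vector from the proof
    mu = Ks @ alpha                        # posterior mean
    v = np.linalg.solve(K, Ks.T)
    var = np.diag(rbf(Xs, Xs) - Ks @ v)    # posterior variance
    return mu, var

# Illustrative 1-D data set: noiseless samples of a smooth target function.
X = np.linspace(0.0, 1.0, 20)[:, None]
y = np.sin(2 * np.pi * X[:, 0])
mu, var = gp_predict(X, y, X)
print(float(np.mean((mu - y) ** 2)))  # small training MSE
```

Because the kernel is Lipschitz and $\alpha$ is a fixed vector once the data are given, the smoothness of `mu` as a function of the test input follows directly, which is the content of Theorem 1.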

Appendix B Proof of Theorem 2
Proof. According to Theorem 1, given $\beta_N(\tau) = 2\log\frac{M(\tau,\mathbb{X})\pi^2 N^2}{3\delta}$ and $N > 0$, the bound holds with a probability of at least $1 - \delta/2$. In addition, given the distance $r = \max_{x, x' \in \mathbb{X}} \|x - x'\|$, we can obtain the trivial bound $M(\tau, \mathbb{X}) \le (1 + r/\tau)^n$. Then, we can obtain
$$\beta_N(\tau) \le 2n \log(1 + r/\tau) + 4 \log(\pi N) - 2 \log(3\delta).$$
On the one hand, the Lipschitz constant $L_\mu$ is bounded. On the other hand, since $f$ is bounded by $f_{\max}$ and $K_{X_N X_N}$ is positive semi-definite, $(K_{X_N X_N} + \sigma^2 I)^{-1} y_N$ is bounded, where the vector $\varphi_N$, composed of $N$ variables, is a Gaussian disturbance with zero mean and covariance $\sigma^2$. This indicates that $\|\varphi_N\|^2/\sigma^2$ obeys a chi-square distribution, i.e., $\|\varphi_N\|^2/\sigma^2 \sim \chi^2_N$. Note that the corresponding bound holds with a probability of at least $1 - \eta_N$. Then, by using the union bound over all $N > 0$ and setting $\eta_N = \log\frac{\pi^2 N^2}{3\delta}$, we can obtain that the bound holds with a probability of at least $1 - \delta/2$. Therefore, the Lipschitz constant $L_\mu$ of the posterior mean function $\mu_N(\cdot)$ is bounded. In addition, since $\eta_N$ increases logarithmically with the number of training samples $N$, it follows that $L_\mu \in \mathcal{O}(N)$ with a probability of at least $1 - \delta/2$. Furthermore, based on (17) and $(K_{X_N X_N} + \sigma^2 I_N)^{-1}$, we can bound the modulus of continuity $\omega_{\Sigma_N}(\cdot)$. According to (37), the uniform bound holds with a probability of at least $1 - \delta$. Note that if the error is designed to vanish, the above bound must converge to 0 as $N \to \infty$. This indicates that $\tau(N)$ must decrease faster than $\mathcal{O}((N \log N)^{-1})$. □

Appendix C Proof of Theorem 3
Proof. According to Lemmas 2 and 3, and recalling that the noise $w$ is a stationary Gaussian process, we use the following Lyapunov candidate. According to Theorem 1, the model error is bounded with probability $\Pr\{\dot{V}(x) < 0\} \ge 1 - \delta$. According to Lemma 3, we can then obtain global ultimate boundedness. □