Cooperative Visual-SLAM System for UAV-Based Target Tracking in GPS-Denied Environments: A Target-Centric Approach

Autonomous tracking of dynamic targets using Unmanned Aerial Vehicles (UAVs) is a challenging problem with practical applications in many scenarios. In this context, a fundamental aspect that must be addressed is the estimation of the positions of the aerial robots and the target in order to control the flight formation. For non-cooperative targets, their position must be estimated using the on-board sensors. Moreover, for estimating the position of UAVs, global position information may not always be available (GPS-denied environments). This work presents a cooperative visual-based SLAM (Simultaneous Localization and Mapping) system that allows a team of aerial robots to autonomously follow a non-cooperative target moving freely in a GPS-denied environment. One of the contributions of this work is to propose and investigate the use of a target-centric SLAM configuration to solve the estimation problem, which differs from the well-known World-centric and Robot-centric SLAM configurations. In this sense, the proposed approach is supported by theoretical results obtained from an extensive nonlinear observability analysis. Additionally, a control system is proposed for maintaining a stable UAV flight formation with respect to the target. In this case, the stability of the control laws is proved using Lyapunov theory. In an extensive set of computer simulations, the proposed system demonstrated the potential to outperform other related approaches.


Introduction
Nowadays, Unmanned Aerial Vehicles (UAVs), computer vision techniques, and flight control systems have received great attention from the robotics research community. One consequence of this interest is the development of systems with a high degree of autonomy. UAVs are very versatile platforms. In particular, aerial robots with rotary wings, such as quadcopters, allow great flexibility of movement, which makes them very useful for several tasks and applications [1,2]. Multi-robot systems have also received great attention from the robotics research community. This attention is motivated by the inherent versatility of such systems for performing tasks that would be difficult for a single robot. The use of several robots can have advantages such as increased robustness, better performance, and higher efficiency [3,4].
In this context, one important research problem is the control and coordination of a team of UAVs flying in formation with respect to a non-cooperative moving target. In particular, a fundamental task that must be addressed to control the flight formation is the estimation of the positions of the UAVs and the moving target. For most applications, GPS (Global Positioning System) still represents the main alternative for addressing the localization problem of UAVs. Nevertheless, the use of GPS presents some drawbacks, for instance, in scenarios where GPS signals are jammed intentionally [5] or where the precision error is substantial and operability is poor due to multipath propagation (natural and urban canyons [6,7]). In addition, there exist scenarios (e.g., indoor) where GPS is completely unavailable. Moreover, for non-cooperative targets, their position has to be estimated using on-board sensors.
In such scenarios, when considering a team of aerial robots flying in formation with respect to a particular target, the absolute pose (e.g., World-centric configuration) is less important than the relative position information between the aerial vehicles and the target (e.g., Robot-centric configuration). In this case, sensors such as range sensors (laser or sonar) and radio-frequency (RF) tag-like sensors can help to estimate the required position information (see [8][9][10][11]). However, these kinds of sensors are expensive and sometimes heavy, and their use in outdoor environments is somewhat limited. Moreover, some of these sensors need the cooperation of the moving target. Active laser systems (e.g., LiDAR [12]) represent a very interesting sensing technology: they can operate under any visibility condition (i.e., both day and night, unlike cameras) and can directly provide 3D measurements of the surrounding environment. On the other hand, LiDAR is generally expensive, can overweight the system for certain applications, and has moving parts that can induce some errors. Stereo systems [13] and depth cameras [14] can also obtain 3D information about the target and the environment; however, these systems require the measured objects to be near the sensor, which considerably limits their range of application.
In the above context, an increasing number of works has focused on the use of cameras to develop navigation systems based on visual information that can operate in periods or circumstances where GPS is unavailable. Cameras are well suited for use in embedded systems, and they can also be used for estimating the relative position of the target.

Related Work
In this work, a cooperative visual-based SLAM scheme is proposed for addressing the problem of estimating the relative position of the aerial robots with respect to a target, in order to control the flight formation with respect to that same target.
In the literature, different approaches can be found that use visual information to carry out the control of UAVs (visual servoing): [15][16][17][18][19]. In addition, Refs. [20][21][22][23] are examples of approaches where the position of a target is estimated in a probabilistic manner by means of the fusion of monocular measurements from two or more cameras. These works focus only on the estimation of the target position and assume that an ideal control scheme leads the robots towards the moving target. In [24][25][26][27][28], different control schemes are presented for addressing the problem of maintaining a desired flight formation with respect to a moving target. A compilation of different techniques for addressing the flight formation control problem is presented in [29]. One of the underlying assumptions in these works is that the positions of the vehicles are available, either globally or locally referenced.
In [30][31][32], methods for the tracking, control, and estimation of a target using a UAV with an on-board monocular camera are presented. In these works, it is assumed that the geometry of the target is known or that its movement is restricted to a plane. In [33,34], tracking control schemes using multi-robot and single-robot configurations are presented. These works make the strong assumption that the relative distances of the vehicles with respect to the target are known. In [35], a scheme for tracking control and estimation of a target using small UAVs with on-board monocular cameras is presented. In this work, it is assumed that measurements of the altitude of the vehicles with respect to the target are available. Another single-robot scheme is presented in [36]. This work relies on stereo vision; however, as will be shown later, this sensor is unreliable for the problem stated here. In [37], a multi-robot system based on a LiDAR is presented.
In [38], a method to control a space robot that follows a non-cooperative target using a vision system is presented. The estimation of the position of the target is carried out using an adaptive unscented Kalman filter based on measurements obtained from a homography. In this case, to obtain the homography, coplanar visual landmarks must be extracted from the target.
In [39], a single-robot SLAM scheme, where the state of the dynamic target is included in the system state, is presented. In this case, the position of the robot, the map, and the target are estimated using a constrained local submap filter (CLSF) based on an EKF (Extended Kalman Filter) configuration. In [40], the problem of cooperative localization and target tracking with a team of moving robots is addressed. The problem is modeled as a least-squares minimization problem and solved using sparse optimization. Static known landmarks are used to define a common reference frame for the robots and the targets. However, in many applications, it is complicated or even impossible to have prior knowledge about the position of landmarks. In [41], an approach is presented for estimating the position of objects tracked by a team of mobile robots and for using such objects for better self-localization. In this case, the relationships among objects are used to estimate the objects' positions given a map. However, geometric knowledge of the objects is required in order to localize the robots. In [42], a control scheme for commanding the formation with respect to a target is presented. In this case, local sensor information is used to estimate the relative position and orientation of the robots with respect to the target (target-centric configuration).

Objectives and Contributions
Recently, some works, such as [43], have presented the development of observers and controllers for relative estimation and circumnavigation of a target using bearing-only measurements or range-with-bearing measurements. In that work, the movement of the target is restricted to a plane. In addition, one of the assumptions is that range measurements with respect to the target are available, which is very difficult to achieve for non-cooperative targets. In [44], a Gaussian sum FIR filter (GSFF) for estimating the position of a target is presented. In this work, both the estimation and the tracking of the target are carried out only in a two-dimensional space. In [45], a UAV-based target tracking and recognition system is presented. In this work, a geographic information system (GIS) is used to provide geo-location, environmental, and contextual information. The work in [46] presents a distributed control strategy applied to a team of agents to create a prescribed formation whose centroid tracks a moving target. In this work, one of the assumptions is that some relative position measurements between the agents and the target are available.
To avoid the restrictions of the previous approaches, in the present work, the use of a cooperative visual-based SLAM scheme is studied for addressing the problem of estimating the relative position of the aerial robots with respect to a non-cooperative dynamic target in GPS-denied environments. The general idea is to use a priori unknown static natural landmarks, randomly distributed in the environment, as reference points for locating the UAVs with respect to a dynamic target moving freely in 3D space. This objective is achieved using only monocular measurements of the target and landmarks, as well as measurements of the altitude differential among UAVs. The proposed scheme does not require any geometric knowledge of the moving target or the landmarks. In addition, no cooperative behavior of the target is assumed.
The configuration of the proposed cooperative visual-based SLAM system is based on the standard EKF-SLAM (Extended Kalman Filter SLAM) methodology. In this context, it is extremely important to provide the proposed system with properties such as observability and robustness to ill-constrained initial conditions. These properties play a fundamental role in the convergence of the filter, as shown in [47,48]. Therefore, an extensive nonlinear observability test is carried out in order to analyze the system. Novel and important theoretical results derived from this analysis are presented.
In this work, two innovations are proposed to improve the observability properties of the system, and thus its performance. Firstly, the use of a target-centric SLAM configuration is proposed instead of the use of a more common World-centric or Robot-centric SLAM configuration. In the target-centric SLAM configuration, the system state is parameterized with respect to the target position. Secondly, measurements of altitude differential between pairs of UAVs obtained from altimeters are integrated into the system.
Another key difference from most related works, which typically address only one side of the dual estimation-control problem, is that, in addition to the proposed SLAM estimation technique, a flight formation control scheme based on a backstepping approach [49] is also proposed to carry out the formation of quadcopters with respect to the target. In this case, the stability of the control laws is proved using Lyapunov theory. In simulations, the state estimated by the SLAM system is used as feedback to the proposed control scheme to test the closed-loop performance of both the estimator and the controller.

Paper Outline
The document is organized in the following manner: Section 2 presents the general system specifications and mathematical models. Section 3 presents the nonlinear observability analysis. In Section 4, the proposed SLAM method is described. Section 5 presents the proposed control system. Section 6 shows the numerical simulation results, and, finally, Section 7 gives the conclusions and final remarks of this work.

System Specification
In this section, the mathematical models used in this work are introduced. Firstly, the model used for representing the dynamics of the relative position of the j-th UAV-camera system with respect to the target is described. Then, the model representation of the relative position of the i-th landmark with respect to the target is described. In addition, the measurement models used in this work are presented: (i) the camera projection model and (ii) the relative altitude model. Finally, the dynamic model of a quadcopter is presented as well.
In applications involving aerial vehicles, attitude and heading (roll, pitch, and yaw) estimation is well handled by available systems (e.g., [50,51]). In particular, in this work, it is assumed that the camera is always pointing toward the ground. Under this assumption, the system state can be simplified by removing the variables related to attitude and heading (which are provided by the AHRS), and the problem can therefore focus on relative position estimation. It is also important to note that, with this assumption, the mathematical formulation is kept simple enough to make the observability analysis of Section 3 a tractable problem. In practice, the downward-looking camera assumption can be easily addressed, for instance, with the use of a servo-controlled camera gimbal.
Regarding the visual sensors required for implementing the proposed system, the only assumption is that a set of visual features of the environment, and one visual feature of the target, can be detected and tracked consistently using any available computer vision algorithm.

Dynamics of the System
Let us consider the following continuous-time model describing the dynamics of the proposed system (see Figure 1): where the state vector x is composed of the position and linear velocity of each UAV-camera system with respect to the target, the position of each landmark with respect to the target, and the linear velocity of the target, with i = 1, ..., n_1 and j = 1, ..., n_2, where n_1 and n_2 are, respectively, the number of landmarks included in the map and the number of UAV-camera systems.
Additionally, let x_t = [x_t, y_t, z_t]^T be the position of the target with respect to the world reference system W. Moreover, W R_c_j ∈ SO(3) is the rotation matrix that transforms from the world coordinate reference system W to the coordinate reference system C of the j-th camera. Recall that the rotation matrix W R_c_j is known and constant, assuming the use of the servo-controlled camera gimbal.

Camera Measurement Model for the Target
Let us consider the projection of the target onto the image plane of a camera. In this case, it is assumed that some visual feature points can be extracted from the target by means of an available computer vision algorithm such as [53][54][55][56][57][58]. Using the pinhole model, the following expression is defined: where the terms represent the position (in meters) of the target with respect to the coordinate reference system C of the j-th camera.
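As a minimal sketch, the pinhole projection of a point expressed in the camera frame can be computed as below; the intrinsic parameter values are illustrative assumptions, not the paper's calibration:

```python
import numpy as np

def project_pinhole(p_c, fx, fy, cx, cy):
    """Project a 3D point p_c = [x, y, z] (meters, camera frame, z > 0)
    onto the image plane (pixels) using the pinhole model."""
    x, y, z = p_c
    assert z > 0, "point must lie in front of the camera"
    u = fx * x / z + cx
    v = fy * y / z + cy
    return np.array([u, v])

# Example: a target 10 m below a downward-looking camera,
# offset 1 m along x, with illustrative intrinsics.
uv = project_pinhole(np.array([1.0, 0.0, 10.0]),
                     fx=500.0, fy=500.0, cx=320.0, cy=240.0)
```

The same model applies to landmark projections; only the relative position fed to the projection changes.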

Altitude Differential Measurement Model
In this work, the altitude differential between a pair of UAVs is used as a filter update measurement. In this case, the altitude differential between the j-th UAV and the n-th UAV, obtained from the on-board altimeters of the two vehicles, is defined by:

Dynamic Model of the Quadcopter
This section presents the dynamic model of a quadcopter, which is composed of a rigid structure and four rotors. The plant has six degrees of freedom: three for translation and three for rotation (see Figure 1). Using the Euler-Lagrange formalism and the parametrization with respect to the Tait-Bryan angles, the model can be defined similarly to [59]. Let x_q^j = [x_j, y_j, z_j]^T be the position (in meters) of the j-th quadcopter with respect to the reference system W. Let σ_j = [φ_j, θ_j, ψ_j]^T be the Tait-Bryan angles (in radians) of the j-th quadcopter with respect to the reference system W. Let m_j be the mass (in kg) of the j-th quadcopter. Let µ_j be the thrust factor (in N·s²) of the j-th quadcopter. Let g be the gravity constant (in m/s²). Let u_q^j be the total force (in N) of the j-th quadcopter, supplied by the four rotors along the z-axis with respect to the coordinate reference system Q. Let J_p^j be the total inertia moment (in N·m·s²) of the j-th quadcopter due to the rotors. Let d_j be the drag factor (in N·m·s²) of the j-th quadcopter. Let I_x^j, I_y^j, and I_z^j be the inertia moments about each axis (in N·m·s²) of the j-th quadcopter. Let τ_φ^j, τ_θ^j, and τ_ψ^j be the torques (in N·m) of the j-th quadcopter, supplied by the rotors about each axis. Finally, let Ω_j be the overall propeller rotational speed (in rad/s) of the j-th quadcopter.
The inputs are defined by: where m = {1, ..., 4}. Let l_j be the distance (in m) from the center of mass of the j-th quadcopter to the center of one rotor. Let f_m^j be the thrust force (in N) supplied by the m-th rotor of the j-th quadcopter. In addition, let w_m^j be the angular velocity (in rad/s) of the m-th rotor of the j-th quadcopter.
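As an illustration, a commonly used "+"-configuration mapping from the four rotor speeds to the total thrust and body torques can be sketched as follows. The sign conventions and parameter values are assumptions for this sketch and may differ from the input definitions of [59]:

```python
import numpy as np

def rotor_speeds_to_wrench(w, mu, d, l):
    """Map the four rotor angular speeds w (rad/s) to the total thrust u_q
    and the body torques, using a common '+' configuration mixing.
    mu: thrust factor, d: drag factor, l: arm length."""
    w2 = np.square(w)
    u_q = mu * w2.sum()                              # total thrust (N)
    tau_phi = l * mu * (w2[3] - w2[1])               # roll torque (N*m)
    tau_theta = l * mu * (w2[2] - w2[0])             # pitch torque (N*m)
    tau_psi = d * (w2[0] - w2[1] + w2[2] - w2[3])    # yaw (drag) torque (N*m)
    return u_q, np.array([tau_phi, tau_theta, tau_psi])

# Hover check: equal rotor speeds produce pure thrust and zero torques.
u_q, tau = rotor_speeds_to_wrench(np.array([400.0] * 4),
                                  mu=1e-5, d=1e-6, l=0.25)
```

With all rotors at the same speed, the torques cancel and only the collective thrust remains, which matches the intuition behind the input definitions.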

Observability Analysis
In this section, the nonlinear observability properties of the proposed system are studied. Observability is an inherent property of a dynamic system and plays an important role in the accuracy and stability of its estimation process, which has important consequences in the context of SLAM and the convergence of the EKF.
A system is defined as observable if the initial state x_0, at any initial time t_0, can be determined given the state transition model ẋ = f(x), the observation model y = h(x) of the system, and observations z[t_0, t] from time t_0 to a finite time t. In Hermann and Krener [60], it is demonstrated that a nonlinear system is locally weakly observable if the observability rank condition rank(O) = dim(x) is verified, where O is the observability matrix.

Observability Matrix
The observability matrix O can be computed from: where L_f^s h is the s-th-order Lie derivative [61] of the scalar field h with respect to the vector field f. In this work, for computing Equation (13), zero-order and first-order Lie derivatives are used for each kind of measurement.
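As a concrete illustration, the rank condition built from zero- and first-order Lie derivatives can be checked numerically. The sketch below uses finite-difference Jacobians and a toy constant-velocity system; both are illustrative assumptions, not the paper's model:

```python
import numpy as np

def num_jac(g, x, eps=1e-6):
    """Finite-difference Jacobian of a vector-valued function g at x."""
    g0 = g(x)
    J = np.zeros((len(g0), len(x)))
    for k in range(len(x)):
        dx = np.zeros_like(x)
        dx[k] = eps
        J[:, k] = (g(x + dx) - g0) / eps
    return J

def observability_matrix(f, h, x):
    """Stack the gradients of the zero-order Lie derivative L0 = h(x)
    and the first-order Lie derivative L1 = (dh/dx) f(x)."""
    L1 = lambda x: num_jac(h, x) @ f(x)
    return np.vstack([num_jac(h, x), num_jac(L1, x)])

# Toy system: state x = [p, v], dynamics p' = v, v' = 0, measurement y = p.
f = lambda x: np.array([x[1], 0.0])
h = lambda x: np.array([x[0]])
O = observability_matrix(f, h, np.array([2.0, 1.0]))
rank = np.linalg.matrix_rank(O)  # rank(O) = dim(x) -> locally weakly observable
```

For the toy system the stacked gradients form the identity, so the rank condition holds; for the paper's system, the analogous construction yields Equation (30).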
In the case of the measurements given by a monocular camera, according to Equations (3) and (1), the following zero-order Lie derivative is defined for the projections of landmarks:

For the same kind of measurement, the following first-order Lie derivative can be defined:

In the case of the measurements of the target given by a monocular camera, according to Equations (5) and (1), the following zero-order Lie derivative is defined for the projections of the target:

For the same kind of measurement, the following first-order Lie derivative can be defined:

In the case of the measurement of the altitude differential, according to Equations (7) and (1), the zero-order Lie derivative is defined for j < n (the index of the j-th camera is less than the index of the n-th camera) and for j > n (the index of the j-th camera is greater than the index of the n-th camera), with a common expression holding for both cases. The first-order Lie derivative is defined analogously for j < n and for j > n.

Using the Lie derivatives described above, the observability matrix for the proposed system in Equation (1) can be defined as in Equation (30).

Theoretical Results
Three different cases of system configurations were analyzed. The idea is to study how the observability of the system is affected by the availability (or unavailability) of the different types of measurements. The cases to be considered are:
• Case 1: The following measurements are available: (i) monocular measurements of the projection of landmarks over each UAV-camera system, and (ii) multiple (two or more) monocular measurements of the projection of the target over each UAV-camera system.
• Case 2: The following measurements are available: (i) monocular measurements of the projection of landmarks over each UAV-camera system, (ii) a single monocular measurement of the projection of the target over a UAV-camera system, and (iii) the altitude differential measurements between UAV-camera systems.
• Case 3: The following measurements are available: (i) monocular measurements of the projection of landmarks over each UAV-camera system, (ii) multiple (two or more) monocular measurements of the projection of the target over each UAV-camera system, and (iii) the altitude differential measurements between UAV-camera systems.

Case 1
For the first case, considering only the respective derivatives in the observability matrix, Equation (30), the maximum rank of the observability matrix is lower than dim(x) = 3n_1 + 6n_2 + 3, where n_1 is the number of landmarks being measured, n_2 is the number of robots, and 3 is the number of states of the linear velocity of the target. In this case, n_1 is multiplied by 3, since this is the number of states per landmark, and n_2 is multiplied by 6, since this is the number of states per robot. The unobservable modes are spanned by the right nullspace basis N_1 of the observability matrix O, given in Equation (31). It is straightforward to verify that N_1 spans the right nullspace of O (i.e., O N_1 = 0). From Equation (31), it can be seen that the unobservable modes cross through all states; therefore, all states are unobservable. It should be noted that adding higher-order Lie derivatives to the observability matrix does not improve this result.

Case 2
For the second case, the maximum rank of the observability matrix is also lower than dim(x). In this case, the unobservable modes are spanned by the right nullspace basis N_2 of the observability matrix O, given in Equation (32). The states t x_c^j, t v_c^j involved in the right nullspace basis N_2 belong to the j-th robot that observes the target. It is straightforward to verify that N_2 spans the right nullspace of O. From Equation (32), it can be seen that the unobservable modes cross through all states; therefore, all states are unobservable. It should be noted that adding higher-order Lie derivatives to the observability matrix does not improve this result.

Case 3
For the third case, the maximum rank of the observability matrix is 3n_1 + 6n_2 + 3. Given that dim(x) = 3n_1 + 6n_2 + 3, the system under the third case is locally weakly observable, because rank(O) = dim(x).

Remarks to the Theoretical Results
The results of the observability analysis are summarized in Table 1.

• Having altitude differential measurements between two UAV-camera systems is a necessary condition to obtain the previous results (see Figure 2). In this case, the results do not improve when adding more altitude differentials.
• It is necessary to have at least two monocular measurements of the target in order to obtain the previous results (see Figure 2).
• To obtain the previous results, it is necessary to link the members of the multi-UAV system through at least two measurements in common (see Figure 2). That means that: (i) a robot needs to share the observation of at least two landmarks with any other robot; or (ii) a robot needs to share the observation of at least one landmark and the target with any other robot; or (iii) a robot needs to share the observation of at least one landmark with any other robot while measuring its altitude differential with that same robot; or (iv) a robot needs to share the observation of the target with any other robot while measuring its altitude differential with that same robot.
Figure 2. Minimum requirements to obtain the results of the observability analysis for the proposed system in Case 3.
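The four linking conditions above can be encoded as a simple predicate over a pair of robots; the function name and the boolean encoding below are illustrative assumptions:

```python
def pair_is_linked(shared_landmarks, both_see_target, have_alt_diff):
    """Check whether a pair of robots satisfies at least one of the
    linking conditions (i)-(iv) from the observability remarks.
    shared_landmarks: number of landmarks observed by both robots;
    both_see_target: True if both robots observe the target;
    have_alt_diff: True if an altitude differential links the pair."""
    if shared_landmarks >= 2:                       # condition (i)
        return True
    if shared_landmarks >= 1 and both_see_target:   # condition (ii)
        return True
    if shared_landmarks >= 1 and have_alt_diff:     # condition (iii)
        return True
    if both_see_target and have_alt_diff:           # condition (iv)
        return True
    return False
```

Note that a single shared landmark alone is not enough: a second common measurement (landmark, target, or altitude differential) is always required.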

EKF-Based SLAM
In this work, the system state in Equation (2) is estimated using the EKF-based SLAM methodology [62,63]. Figure 3 shows the architecture of the proposed system. In this case, a ground-based centralized architecture is assumed, where the most computationally demanding processes, such as the visual processing and the main estimation process, are carried out on a ground-based computer. Algorithm 1 outlines the proposed EKF-SLAM-based method.

Algorithm 1: EKF-SLAM target-Centric Multi-UAV
Data: x̂_0, P_0, Q, R, i R_j, th_1, th_2 (where th_1 is an arbitrary threshold for the minimum number of features allowed within the map, and th_2 is an arbitrary threshold for the maximum number of sample steps allowed without measurements for a map feature)

System Prediction
At every step k, the estimated system state x̂ takes a step forward through the following discrete model: where t a_c^j and a_t represent unknown linear accelerations that are assumed to have a Gaussian distribution with zero mean. Moreover, let n ∼ N(0, Q) be the noise vector affecting the state, ∆t the time differential, and k the sample step. In this work, a Gaussian random process is used for propagating the velocity of the vehicle. The proposed scheme is independent of the kind of aircraft and, therefore, is not restricted to the use of a specific dynamic model. An Extended Kalman Filter (EKF) propagates the system state x̂ over time as follows: The state covariance matrix P takes a step forward using [47]:
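A minimal sketch of this prediction step for a single robot's target-relative state, under the same constant-velocity model with random-acceleration noise, could look as follows (the state layout and numerical values are illustrative):

```python
import numpy as np

def ekf_predict(x, P, Q, dt):
    """EKF prediction for a target-relative state
    x = [px, py, pz, vx, vy, vz]: constant-velocity model, where the
    unknown relative acceleration is modeled as zero-mean Gaussian noise."""
    F = np.eye(6)
    F[:3, 3:] = dt * np.eye(3)    # position integrates velocity
    x_pred = F @ x                # state propagation
    P_pred = F @ P @ F.T + Q      # covariance propagation
    return x_pred, P_pred

x0 = np.array([1.0, 2.0, 3.0, 0.5, 0.0, -0.5])
x1, P1 = ekf_predict(x0, np.eye(6), 0.01 * np.eye(6), dt=0.1)
```

In the full system, the same propagation is applied jointly to all robots, the landmarks (which are static relative to the target frame only up to the target's motion), and the target velocity states.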

Measurement Updates
Assuming that, for the current sample step, a set of measurements is available, the filter is updated with the Kalman update equations as follows [47]: where z is the vector of current measurements, h is the vector of predicted measurements, and r ∼ N(0, R) is the noise vector that affects the measurements.
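The standard EKF correction can be sketched as below; the scalar example at the bottom (direct noisy observation of the first state) is purely illustrative:

```python
import numpy as np

def ekf_update(x, P, z, h, H, R):
    """Standard EKF correction: z is the measurement vector, h the
    predicted measurement, H the measurement Jacobian, R the noise
    covariance."""
    S = H @ P @ H.T + R              # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)   # Kalman gain
    x_new = x + K @ (z - h)          # state correction
    P_new = (np.eye(len(x)) - K @ H) @ P
    return x_new, P_new

# Scalar example: state [p, v], direct observation of p with unit noise.
x = np.array([0.0, 1.0])
P = np.eye(2)
H = np.array([[1.0, 0.0]])
x_new, P_new = ekf_update(x, P, np.array([0.4]), np.array([0.0]),
                          H, np.array([[1.0]]))
```

The same update is applied with the appropriate h and H for each measurement type (landmark projections, target projections, and altitude differentials).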

Visual Updates (Landmarks)
When a set of visual measurements of the landmarks is available at the current sample step, the system is updated with these measurements. In this case:

Visual Updates (Target)
When a set of visual measurements of the target is available at the current sample step, the system is updated with these measurements. In this case:

Altitude Differential Updates
When a set of altitude differential measurements between pairs of UAVs is available at the current sample step, the system is updated with these measurements. In this case:

Map Features Initialization
The system state x is augmented with new map features using a cooperative strategy. The 3D relative position with respect to the target of the new map features is estimated using a pseudo-stereo system formed by the monocular cameras mounted on a pair of UAVs that observe common landmarks. In this case, when a new potential landmark is observed by two cameras, then it is initialized employing a linear triangulation.
The state of the new feature is computed using the a posteriori values obtained in the correction stage of the EKF. According to Equation (3), the following expression can be defined in homogeneous coordinates [52]: where i γ_c^j is a scale factor. Additionally, it is defined: Using Equation (45), and considering the projections onto any two UAV-cameras, a linear system can be formed in order to estimate t x_a^i: where D_i† is the Moore-Penrose pseudo-inverse of the matrix D_i. When a new landmark is initialized, the system state x is augmented by: In addition, the new covariance matrix P_new is computed by: where ∆J is the Jacobian of the initialization function, and i R_j is the measurement noise covariance matrix for (i u_c^j, i v_c^j).
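The pseudo-stereo initialization can be illustrated with a generic two-view linear triangulation. The synthetic camera matrices below, and the SVD-based solution, are illustrative stand-ins; the paper instead solves a related linear system via the Moore-Penrose pseudo-inverse of D_i:

```python
import numpy as np

def triangulate_linear(P1, P2, uv1, uv2):
    """Linear (DLT) triangulation of one feature from two camera
    projection matrices P1, P2 (3x4) and image observations uv1, uv2.
    Solved here via SVD of the homogeneous system D X = 0."""
    rows = []
    for P, (u, v) in ((P1, uv1), (P2, uv2)):
        rows.append(u * P[2] - P[0])
        rows.append(v * P[2] - P[1])
    D = np.array(rows)
    X = np.linalg.svd(D)[2][-1]   # right-singular vector of smallest value
    return X[:3] / X[3]           # de-homogenize

# Synthetic pseudo-stereo pair: two cameras 1 m apart along x,
# identity intrinsics (normalized image coordinates).
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])
pt = np.array([0.5, 0.2, 4.0])
proj = lambda P: (P @ np.append(pt, 1.0))[:2] / (P @ np.append(pt, 1.0))[2]
est = triangulate_linear(P1, P2, proj(P1), proj(P2))
```

With noise-free observations, the triangulated point coincides with the true one; in the filter, the triangulated value seeds the new landmark state and its covariance is obtained through the initialization Jacobian.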

Map Management
It is well known that, due to the nature of the Kalman filter in SLAM, the system state can eventually reach a size that makes it impossible to maintain real-time performance for a given hardware configuration. In this sense, the present work is mainly intended to address the problem of navigation with respect to a target (local navigation). Therefore, features that are left behind by the movement of the cameras are removed from the system state and covariance matrix. This strategy prevents the system state from growing to an unmanageable size that would seriously affect computational performance.
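As a sketch, removing a stale landmark amounts to deleting its entries from the state vector and the corresponding rows and columns of the covariance matrix; the state layout (landmarks first, three states each) is an assumption of this sketch:

```python
import numpy as np

def remove_feature(x, P, idx):
    """Delete the 3 state entries of landmark idx from the state vector x
    and the matching rows/columns of the covariance matrix P."""
    start = 0  # offset of the first landmark within x (assumed layout)
    first = start + 3 * idx
    keep = np.setdiff1d(np.arange(len(x)), np.arange(first, first + 3))
    return x[keep], P[np.ix_(keep, keep)]

x = np.arange(9.0)                 # three landmarks of 3 states each
P = np.diag(np.arange(1.0, 10.0))
x2, P2 = remove_feature(x, P, 1)   # drop the middle landmark
```

Deleting rows and columns (rather than zeroing them) keeps the remaining cross-correlations intact, which is what preserves filter consistency after pruning.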

Control System
This section presents the control scheme that carries out the flight formation of the UAVs with respect to the target. While the proposed control scheme is presented assuming that the UAVs are quadcopters, the methodology can be easily extended to other aerial configurations.

Dynamic Model Flight Formation
The dynamic model used for representing the flight formation is based on the leader-follower scheme [64]. In this case, it is desired to maintain the j-th UAV at a distance (t x_q^j, t y_q^j) from the target in the x-y plane, as well as at an altitude differential t z_q^j from the target (see Figure 4). Given the above considerations, the expression in Equation (52) can be defined. Differentiating Equation (52) twice with respect to time and using Equation (8), the dynamics of the formation t x_q^j can be obtained:

Control Scheme
The control scheme is designed to make the flight formation converge to the desired values. The control system is divided into two subsystems (see Figure 5): (i) the flight formation control, which involves the translational dynamics of the quadcopter and the target; and (ii) the rotational control. Because the reference signal of the rotational subsystem is obtained from the flight formation control subsystem, it is assumed that the dynamics of the rotational subsystem are faster than those of the flight formation subsystem.
A master-slave system is defined by: The overall control scheme is based on the backstepping control technique [49], and each control-loop subsystem is designed using the so-called exact linearization by state feedback technique [49]. The state values required by the flight formation control subsystem are obtained from the estimator described in Section 4. The attitude of the j-th UAV is obtained assuming the availability of an on-board AHRS. To obtain the control laws using continuous-time analysis, it is assumed that the estimated values are passed through a zero-order hold (ZOH) (see Figure 5). Figure 5. Control scheme diagram.

Flight Formation Subsystem Control
Since the estimated state of the relative position of the j-th UAV with respect to the target is defined with respect to the reference system C (see Sections 2 and 4), it is necessary to apply a transformation to the estimated state tx c j for obtaining tx q j , which, in turn, is necessary to obtain the control laws.
Therefore, the following equation is defined: Let q d c j be the translation vector (in meters) from the coordinate reference system Q to the coordinate reference system C. Note that q d c j is assumed to be known and constant.
Firstly, a control law for the altitude differential t z_q^j of the flight formation subsystem is developed. Considering the dynamics of t z_q^j given in Equation (53), with control input u_q^j, the sliding variable in Equation (56) is defined, where the error signal is e_z^j = t ẑ_q^j − t z_q^j_d, and t z_q^j_d is the desired value. Differentiating Equation (56) and substituting into Equation (53), Equation (57) is obtained. The following control law is proposed: where k_z^j, λ_z^j, and κ_z^j are positive gains, assuming that θ̂_j, φ̂_j ∈ [−π/2, π/2]. Now, consider the longitudinal and lateral differentials t x_q^j and t y_q^j of the flight formation subsystem. By substituting Equation (58) into Equation (53), for the first two states, Equation (59) is obtained. Taking φ_j and θ_j as control inputs in Equation (59), the sliding variables in Equation (60) are defined, and the following control laws are proposed: where the corresponding gains are positive. To implement the control laws in Equations (58) and (62), it is necessary to know the accelerations of the target, which are estimated by differentiating the estimated velocities of the target obtained from the EKF, using the following differentiator based on the super-twisting algorithm [65] (see Figure 5): where ẋ_t is the velocity of the target estimated by the differentiator, v̂_t is the velocity of the target estimated by the EKF, and K_1 and K_2 are positive definite diagonal matrices of adequate dimensions.
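The target-acceleration estimate via the super-twisting differentiator can be sketched as follows in the scalar case; the gains and the Euler discretization are illustrative choices, not the paper's tuning:

```python
import numpy as np

def supertwisting_diff(v_meas, dt, k1=6.0, k2=8.0):
    """Discrete super-twisting (Levant) differentiator: given samples of an
    estimated velocity signal v_meas, estimate its derivative (the
    acceleration). z0 tracks the signal, z1 tracks its derivative."""
    z0, z1 = v_meas[0], 0.0
    acc = []
    for v in v_meas:
        e = z0 - v
        z0 += dt * (-k1 * np.sqrt(abs(e)) * np.sign(e) + z1)
        z1 += dt * (-k2 * np.sign(e))
        acc.append(z1)
    return np.array(acc)

# Velocity ramp v(t) = 0.5 t: the estimated acceleration settles near 0.5.
t = np.arange(0.0, 10.0, 0.01)
acc = supertwisting_diff(0.5 * t, dt=0.01)
```

Unlike a plain finite difference, the super-twisting structure attenuates the chattering induced by noisy EKF velocity estimates while still converging to the true derivative.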
To prove the stability of the flight formation subsystem, the following theorem is used:

Theorem 1. Consider the dynamics in Equations (57) and (61) under the control laws in Equations (58) and (62). Then, the origin $s_{tj} = 0$ of Equations (56) and (60) is asymptotically stable.

Proof. Consider the Lyapunov candidate function $V_t = \frac{1}{2} s_{tj}^T s_{tj}$, with $s_{tj} = [s_{jx} \;\; s_{jy} \;\; s_{jz}]^T$. Differentiating gives $\dot{V}_t = s_{tj}^T \dot{s}_{tj}$ (Equation (66)). Substituting Equations (58) and (62) into Equation (66), it is obtained that $\dot{V}_t < 0$ for all $s_{tj} \neq 0$. Therefore, the origin $s_{tj} = 0$ of Equations (56) and (60) is asymptotically stable. □
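The altitude channel of this result can be illustrated numerically. The sketch below assumes, for illustration only, a PID-like sliding variable $s = \dot{e} + \lambda e + \kappa \int e\,dt$ (consistent with the three gains $k_{jz}$, $\lambda_{jz}$, $\kappa_{jz}$), a double-integrator altitude model, and a closed loop that reduces to $\dot{s} = -k\,\mathrm{sign}(s)$; the gain values are hypothetical, not taken from the paper:

```python
import numpy as np

# Hypothetical gains and setup (not from the paper)
dt, k, lam, kap = 0.001, 2.0, 1.5, 0.5
z, z_dot, z_d = 0.0, 0.0, 1.0       # start 1 m below the desired altitude
e_int = 0.0
for _ in range(int(10.0 / dt)):     # 10 s of simulated flight
    e = z - z_d
    e_int += dt * e
    s = z_dot + lam * e + kap * e_int            # sliding variable
    u = -lam * z_dot - kap * e - k * np.sign(s)  # cancels s-dynamics, adds switching term
    z_dot += dt * u                 # double-integrator altitude model
    z += dt * z_dot
```

Along these dynamics $\dot{s} = -k\,\mathrm{sign}(s)$, so $s$ reaches zero in finite time and the altitude error then decays on the sliding surface, as the Lyapunov argument predicts.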

Rotational Subsystem Control
For the rotational subsystem, whose dynamics are given in Equation (9) with control input $\tau_j$, the sliding variable in Equation (69) is defined, where the error signal is $e_{rj} = \sigma_j - \sigma_{jd}$ and $\sigma_{jd}$ is the desired value. Differentiating Equation (69) and substituting into Equation (9) yields Equation (70). The control law in Equation (71) is then proposed, where $\lambda_{rj}$, $\kappa_{rj}$, and $K_r$ are positive definite diagonal matrices of adequate dimensions. It is straightforward to show that $B_j^{-1}$ exists.
To prove the stability of the rotational subsystem, the following theorem is used:

Theorem 2. Consider the dynamics in Equation (70) under the control law in Equation (71). Then, the origin $s_{rj} = 0$ of Equation (69) is asymptotically stable.

Proof. Consider the Lyapunov candidate function $V_r = \frac{1}{2} s_{rj}^T s_{rj}$. Differentiating gives $\dot{V}_r = s_{rj}^T \dot{s}_{rj}$ (Equation (73)). Substituting Equation (71) into Equation (73), it is obtained that $\dot{V}_r < 0$ for all $s_{rj} \neq 0$. Therefore, the origin $s_{rj} = 0$ of Equation (69) is asymptotically stable. □

Closed Loop Stability
In order to prove the closed-loop stability of the whole system in Equation (54), the following theorem is used:

Theorem 3. Consider the closed-loop system in Equation (54) under the control laws in Equations (58), (62), and (71). Then, the origin $s_{tj} = s_{rj} = 0$ is Globally Asymptotically Stable (GAS).

Proof. Consider the Lyapunov candidate function $V = V_t + V_r$. From Equations (67) and (74), it can be seen that $\dot{V} = \dot{V}_t + \dot{V}_r < 0$ for all $(s_{tj}, s_{rj}) \neq 0$. Therefore, the origin $s_{tj} = s_{rj} = 0$ of the closed-loop system is Globally Asymptotically Stable (GAS). □

Simulation Setup
In this section, the proposed cooperative visual-SLAM system for UAV-based target tracking is validated through computer simulations, using a MATLAB implementation of the proposed scheme. To this aim, a simulation environment was developed, composed of 3D landmarks randomly distributed over the ground (see Figure 6). To execute the tests, two quadcopters (Quad 1 and Quad 2), equipped with the set of sensors required by the proposed method, are simulated. It is assumed that the environment contains enough landmarks that at least a subset of them can be observed in common by the cameras of both UAVs.
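The random landmark environment can be emulated in a few lines. The number of landmarks, the area size, and the height range below are assumed values for illustration, not the parameters used in the paper:

```python
import numpy as np

rng = np.random.default_rng(7)
n_landmarks = 200                   # assumed count (not from the paper)
area = 60.0                         # side of the square ground region, meters (assumed)
# 3D landmarks scattered over the ground plane with small height variations
landmarks = np.column_stack([
    rng.uniform(-area / 2, area / 2, n_landmarks),   # x
    rng.uniform(-area / 2, area / 2, n_landmarks),   # y
    rng.uniform(0.0, 0.5, n_landmarks),              # z: near-ground heights
])
```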
The measurements from the sensors are emulated at a frequency of 10 Hz. To emulate the system uncertainty, Gaussian noise is added to the measurements: noise with σc = 4 pixels is added to the camera measurements, and noise with σa = 25 cm is added to the measurements of the altitude differential between the two quadcopters obtained from the altimeters. It is important to note that the noise considered for emulating the monocular measurements is larger than the typical noise of real monocular measurements. In this way, errors in camera orientation caused by the gimbal are accounted for, in addition to sensor imperfection.
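The noise emulation described above can be sketched as follows, using the stated standard deviations (σc = 4 pixels, σa = 25 cm); the helper names are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)
SIGMA_C = 4.0     # camera measurement noise, pixels (from the simulation setup)
SIGMA_A = 0.25    # altimeter differential noise, meters (25 cm)

def noisy_pixel(uv):
    """Camera projection corrupted with N(0, SIGMA_C^2) per pixel coordinate."""
    return uv + rng.normal(0.0, SIGMA_C, size=2)

def noisy_altitude_diff(dz):
    """Altitude differential between two quadcopters, with altimeter noise added."""
    return dz + rng.normal(0.0, SIGMA_A)
```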
In the simulations, the target was moved along a predefined trajectory (see Figure 6). The parameter values used for the quadcopters and the cameras are shown in Table 2; the quadcopter parameters are similar to those presented in [66], and the camera parameters to those presented in [17]. Figure 6 shows the trajectories of the three elements composing the flight formation. To highlight them, the position of each element of the formation is indicated at different instants of time.

Convergence and Stability Tests
In a series of tests, the convergence and stability of the proposed cooperative visual-SLAM system, in open-loop, were evaluated under the three different observability conditions described in Section 3.2. In this case, it is assumed that there is a control system able to maintain the aerial robots flying in formation with respect to the target.
Two different kinds of tests were performed, varying the initial conditions of the UAVs and of the map features. The resulting figures plot the estimated relative position of Quad 1 with respect to the target for each reference axis (row plots), with each column of plots showing the results obtained for one observability case. Note that, for the sake of clarity, only the estimated values for Quad 1 are presented; the results for Quad 2 are quite similar.
In both figures, it can be observed that the observability property and the initial conditions play a preponderant role in the convergence and stability of the EKF-SLAM. For several applications, at least the initial position of the UAVs can be assumed to be known; in SLAM, however, the position of the map features must be estimated online. This outcome confirms the importance of good feature-initialization techniques in visual-SLAM. As can be expected, the performance of the EKF-SLAM is considerably better when the system is observable. Note that, for observability case 3, and despite a noticeable error drift, the filter still exhibits a certain stability in the worst case of feature initialization, provided the initial conditions of the UAVs are known.

Comparative Study
Using the same simulation setup, for this series of tests, the performance of the proposed system, for estimating the relative position of the UAVs with respect to the moving target, was evaluated through a comparative study. Table 3 summarizes the characteristics of five UAV-based target-tracking methods used in this comparative study. Table 3. Characteristics of the methods used in the comparative study.

| Method | System Configuration | Parametrization | Visual Measurements | Other Measurements |
|---|---|---|---|---|
| Proposed | Multi-UAV-SLAM | Target-centric | Monocular | Altitude differential |
| (1) | Single-UAV-SLAM | Target-centric | Monocular | - |
| (2) | Multi-UAV-SLAM | Robot-centric | Monocular | Altitude differential |
| (3) | Single-UAV | Robot-centric | Stereo-vision | - |
| (4) | Multi-UAV | Robot-centric | Pseudo-stereo | GPS |
| (5) | Single-UAV | Robot-centric | Monocular | Range |

Some remarks about the methods used in the comparison are in order. Method (1) represents a purely standard monocular-SLAM approach; feature initialization is based on [67]. Because the metric scale cannot be retrieved using only monocular vision, it is assumed that the positions of the landmarks seen in the first frame are perfectly known. Method (2) is similar to the proposed method, but the system is robot-centric parametrized instead of target-centric. Method (3) represents a standard visual-stereo tracking approach; in this case, the moving target position is obtained directly from a stereo system with a baseline of 15 cm. Method (4) is based on the approach presented in [68]; the UAVs are equipped with GPS, and the moving target position is estimated through the pseudo-stereo vision system composed of the monocular cameras of each UAV. Gaussian noise with σg = 20 cm is added to the GPS measurements. Method (5) estimates the target position using monocular and range measurements to the target; in this case, some cooperative-target scheme is assumed for obtaining the range measurements (technology for obtaining such measurements is presented in [69]). Gaussian noise with σr = 20 cm is added to the range measurements. Note that methods (3), (4), and (5) are not SLAM methods. Methods (3) and (5) provide only the relative estimate of the target position with respect to the UAV; method (4), thanks to the GPS, also provides global position estimates of the target and the UAVs.
In the case of the SLAM methods, and to carry out a more realistic evaluation, the data association problem was also accounted for. To emulate failures of the visual data association process, 5% of the total number of visual correspondences were forced to be outliers in a random manner. Table 4 shows the number of outliers introduced into the simulation due to the data association problem; outliers are considered both for the visual data association in each camera and for the cooperative visual data association. Figure 9 shows the results obtained from each method for estimating the relative position of Quad 1 with respect to the target. For the sake of clarity, the results are shown in two columns of plots, where each row of plots represents a reference axis. Table 5 summarizes the Mean Squared Error (MSE) of the estimated relative position of Quad 1 with respect to the target in the three axes. Regarding the performance of the SLAM methods for estimating the feature map, Table 6 shows the total (sum over all landmarks) Mean Squared Error of the estimated landmark positions, while Table 7 shows the total Mean Squared Error of the initial estimated landmark positions. According to these results, the relative position of the UAV with respect to the moving target was recovered fairly well by the following methods: the proposed Target-centric Multi-UAV SLAM approach, (2) Robot-centric Multi-UAV SLAM, (4) Multi-UAV Pseudo-stereo, and (5) Single-UAV monocular-range.
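The forced 5% outlier injection can be emulated as in the following sketch. Representing the data association as an array of landmark indices is an assumed simplification, not the paper's actual data structure:

```python
import numpy as np

def inject_outliers(matches, outlier_ratio=0.05, rng=None):
    """Force a fraction of visual correspondences to be wrong, emulating
    failures of the data association process.
    matches: (N,) array of landmark indices assigned to image features."""
    if rng is None:
        rng = np.random.default_rng()
    matches = matches.copy()
    n_out = int(round(outlier_ratio * len(matches)))
    bad = rng.choice(len(matches), size=n_out, replace=False)
    # reassign each selected correspondence to a random (likely wrong) landmark
    matches[bad] = rng.integers(0, matches.max() + 1, size=n_out)
    return matches
```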
It is important to note that method (4) relies on GPS and is thus not suitable for GPS-denied environments. Method (5) relies on range measurements to the target, which can often be difficult to obtain, or which require a cooperative-target scheme that is not feasible for several kinds of applications. Regarding the other approaches, method (1) has no direct three-dimensional information about the target (only monocular measurements), and thus its estimates are poor. In the case of method (3), it is well known that, with a fixed stereo rig, the quality of the measurements degrades considerably as the distance to the target increases.
Thus far, the above results suggest that both the proposed approach and method (2) (both SLAM systems) exhibit good performance in challenging conditions: (i) GPS-denied environments, (ii) a non-cooperative target, and (iii) long distance to the target. The only difference between these two methods is the parametrization of the system (Target-centric vs. Robot-centric). To investigate whether one parametrization offers an advantage over the other, another test was carried out.
In this case, using the same simulation setup, the robustness of the two methods was tested against the loss of visual contact with the moving target by one UAV during a certain period. Figure 10 shows the estimated relative positions of Quad 1 with respect to the target obtained with both systems. Note that, between seconds 45 and 75 of the simulation, Quad 1 provides no monocular measurements of the target. Only the estimated values for Quad 1 are presented, but the results for Quad 2 are very similar.
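The mechanism behind this robustness test can be illustrated with a generic EKF step that simply skips the correction when the target measurement is unavailable, so the state is propagated by prediction alone and its uncertainty grows. This is a simplified single-filter sketch, not the paper's full cooperative SLAM filter:

```python
import numpy as np

def ekf_step(x, P, F, Q, z=None, h=None, H=None, R=None):
    """One EKF cycle. If the measurement z is unavailable (e.g. loss of
    visual contact with the target), only the prediction is applied."""
    # Prediction
    x = F @ x
    P = F @ P @ F.T + Q
    if z is not None:                      # correction only when measured
        y = z - h(x)                       # innovation
        S = H @ P @ H.T + R
        K = P @ H.T @ np.linalg.inv(S)     # Kalman gain
        x = x + K @ y
        P = (np.eye(len(x)) - K @ H) @ P
    return x, P
```

Under the Target-centric parametrization, the remaining UAV's measurements of the target continue to correct the shared target state during such gaps, which is consistent with the observed robustness.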
Given these results, it can be seen that, contrary to method (2), the proposed system is robust to the loss of visual contact with the target by one of the quadcopters. This is a clear advantage of the proposed Target-centric parametrization.

Control Simulations' Results
A set of simulations was also carried out to test the whole proposed system in an estimation-control closed loop. In this case, to maintain a stable flight formation with respect to the moving target, the estimates obtained from the proposed visual-based estimation system are used as feedback to the control scheme described in Section 5.
In this test, fixed desired values were set for the formation vectors $^t x^q_{jd}$. Figure 12 shows the real and estimated relative position of Quad 1 (left column) and Quad 2 (right column) with respect to the target. Table 8 summarizes the Mean Squared Error (MSE), in each axis, of the estimated relative position of Quad 1 and Quad 2 with respect to the target.

Table 8. Total Mean Squared Error in the estimated relative position of Quad 1 and Quad 2 with respect to the target.

According to these results, the control system was able to maintain a stable flight formation with respect to the target along the entire trajectory, using the proposed visual-based estimation system as feedback.

Conclusions
This research presents a cooperative visual-based SLAM system that allows a team of aerial robots to autonomously follow a non-cooperative target moving freely in a GPS-denied environment. The relative position of each UAV with respect to the target is estimated by fusing monocular measurements of the target and the landmarks, using the standard EKF-based SLAM methodology.
The observability property of the system was investigated through an extensive nonlinear observability analysis; new theoretical results were presented and the observability conditions were derived. To improve the observability properties of the system, and thus its performance, a Target-centric SLAM configuration is proposed. In addition, measurements of the altitude differential between pairs of UAVs, obtained from altimeters, are integrated into the system for the same purpose. Besides the proposed estimation system, a control scheme was proposed to control the UAV flight formation with respect to the moving target; the stability of the control laws is proved using Lyapunov theory. An extensive set of computer simulations was performed to validate the proposed scheme. According to the simulation results, the proposed system shows excellent and robust performance in estimating the relative position of each UAV with respect to the target. The results also suggest that the proposed approach performs better than other related methods. Moreover, with the proposed control laws, the contributed SLAM system demonstrates good performance in closed loop. On the other hand, although computer simulations are useful for evaluating the full statistical consistency of the methods, they can still neglect important practical issues that appear when the methods are applied in real scenarios. Accordingly, future work will focus on experiments with real data in order to fully evaluate the applicability of the proposed approach.

Funding: This research has been funded by Project DPI2016-78957-R, Spanish Ministry of Economy, Industry and Competitiveness.

Conflicts of Interest: The authors declare no conflict of interest.