Visual-Based SLAM Configurations for Cooperative Multi-UAV Systems with a Lead Agent: An Observability-Based Approach

In this work, the problem of the cooperative visual-based SLAM for the class of multi-UA systems that integrates a lead agent has been addressed. In these kinds of systems, a team of aerial robots flying in formation must follow a dynamic lead agent, which can be another aerial robot, vehicle or even a human. A fundamental problem that must be addressed for these kinds of systems has to do with the estimation of the states of the aerial robots as well as the state of the lead agent. In this work, the use of a cooperative visual-based SLAM approach is studied in order to solve the above problem. In this case, three different system configurations are proposed and investigated by means of an intensive nonlinear observability analysis. In addition, a high-level control scheme is proposed that allows to control the formation of the UAVs with respect to the lead agent. In this work, several theoretical results are obtained, together with an extensive set of computer simulations which are presented in order to numerically validate the proposal and to show that it can perform well under different circumstances (e.g., GPS-challenging environments). That is, the proposed method is able to operate robustly under many conditions providing a good position estimation of the aerial vehicles and the lead agent as well.


Introduction
In recent years, advances in technology and miniaturization of flight control systems have contributed to the growth of interest for the Unmanned Aerial Vehicles (UAVs). The UAVs are very versatile in their movements making them suitable for a large number of applications [1,2]. For many applications, it is necessary that two or more UAVs perform a flight formation with respect to a lead agent, which can be another UAV, a human, or another kind of mobile robot or vehicle (multi-UAV systems with a lead agent). In general, in these kinds of systems, the lead agent will be the only member of the formation that can move freely through its environment. To control the flight formation, it is necessary to have knowledge about the states (localization) of the aerial robots as well as the state of the lead agent.
In different scenarios, autonomous navigation capability is a primary mission requirement, which in outdoor environments it is typically fulfilled by integrating a Global Positioning System (GPS). However, several mission profiles require a multi-UAV system to operate in GPS-challenging or GPS-denied environments. In these kinds of environments (e.g., natural or urban canyons, or mixed indoor-outdoor scenarios) multipath or shadowing of the GPS satellite signal creates serious difficulties for GPS receivers to yield a reliable position. Therefore, a sort of additional sensory information (e.g., visual information) should be integrated into the system in order to improve accuracy and robustness. In a practical scenario, it is clear that during a flight mission there can exist periods of time where the whole set of (or some) UAVs moves indistinctly through a GPS-challenging or a GPS-denied environment.
In the above context, more works have appeared focusing on the use of cameras in order to develop navigation systems based on visual information that can operate when GPS is partially available or denied. Moreover, even when the position sensor is available, the visual information can be fused in order to improve the accuracy and robustness of the system. Also, cameras are well suited for their use in embedded systems. In this work, the use of a cooperative visual-based SLAM (Simultaneous Localization and Mapping) scheme is studied for addressing the problem of estimating at the same time the location of both the aerial robots and the lead agent. The general idea is that the aerial vehicles, flying in formation with respect to the lead agent, carry out the task of self-localization as well as locating the lead agent. The above scheme allows flexibility and freedom for the lead agent to carry out its own mission without the restriction of fulfilling the task of localizing itself (See Figure 1).

Related Work
The visual-based SLAM methods make use of visual features as landmarks and can be used for addressing the problem of the state estimation of robots. In this scenario, the robot operates in a priori unknown environment using only angular measurements obtained from cameras, to simultaneously building a map of its surroundings which it is used at the same time to track its position. Currently, there are two main approaches for implementing vision-based SLAM systems: (i) Filtering-based methods ( [3][4][5]) and (ii) the optimization-based methods ( [6,7] ).
In the literature, there are several works that present SLAM methods applied to UAVs, such as in [8], where a Visual SLAM method is developed in outdoors, in partially structured environments. In [9], the estimation of the UAV state obtained through a Visual SLAM method is used as feedback to the control laws that stabilize the aircraft. In [10], a Visual SLAM method is developed for GPS-challenging environments. In [11], the estimation of the UAV state obtained through a Visual SLAM method is used to carry out emergency landings. In [12], a sensor fusion method for Visual SLAM through the integration of a monocular camera and a 1D-laser range-finder is presented. In [13], an approach based on graph-SLAM and loop closure detection for online mapping of unknown outdoor environments, using a small UAV, is proposed. In [14], a Inertial-Visual SLAM is proposed. In this method, inertial measurements are integrated into the system in order to recover the metric scale of the estimations. On the other hand, the performance of this method can be affected by the dynamic error bias which is common to low-cost MEMS sensors. In [15], an EKF-based method is proposed in order to perform visual odometry with an unmanned aircraft. This method makes use of inertial sensors, a monocular downward facing camera and a range sensor (sonar altimeter). Unlike vision-based SLAM, in visual odometry approaches, there is not a mapping process. Furthermore, in those approaches, the operating altitude of the UAV is limited by the operating range of the sonar. More recently, another approach that addresses the problem of visual-based navigation in GPS-denied environments ( [16][17][18]) has been appearing.
Multi-robot systems have also received great attention from the robotics research community. This attention is motivated by the inherent versatility that those systems are performing tasks that could be difficult to realize by a single robot. The use of several robots can have advantages like cost reductions, more robustness, better performance and efficiency ( [19,20]). In the case of the SLAM problem, in [21,22] a centralized architecture is used where all vehicles send their sensor data to a unique Kalman filter. In [15,23,24] the idea of combining monocular SLAM with cooperative, multi-UAV information to improve navigation capabilities in GPS-challenging environments is presented. In [25], the idea of combining monocular SLAM with cooperative, human-robot information to improve navigation capabilities is presented. In [26], a visual-based cooperative localization method is proposed. According to the analysis presented in this work, the proposed system is completely observable. However, in this case, only distances and the relative orientations between the robots are estimated. This fact can be a clear drawback for applications where the global measurements of the system are required (e.g., absolute position).
As will be seen later, an important UAV technology that it is needed in order to implement the architectures proposed in this work is the cooperative visual-based detection and tracking of mobile targets. In this case, there are several works that deal with these kinds of problems. For instance, [27] presents a vision-based target detection and localization system, that makes use of different capabilities of aerial and ground unmanned vehicles as a cooperative team. In [28], a visual-based approach that allows a UAV to detect and track a cooperative flying vehicle autonomously using a monocular camera is presented. The algorithms are based on template matching and morphological filtering. In [29], a sophisticated vision-aided flocking system for UAVs is presented, which is able to operate in GPS-denied unknown environments, for missions of exploring and searching. In [30], it is presented a cooperative vision-based estimation and tracking system for objects of interest that are located on or near the ground. In the work in [31], an automatic cooperative tracking of targets system, using two quadrotors UAVs equipped with stereo vision systems, is presented. The system includes vision-based algorithms for searching and detecting of the target on the video stream. The research in [32] addresses the topic of vision-based target detection and tracking using a team of UAVs for maritime border surveillance. In this work, a method on how to integrate the perception into the control loop using two distinct teams of UAVs that are cooperatively tracking the same target is presented.

Objectives and Contributions
In this work, three configurations of cooperative SLAM for multi-UAV systems containing a lead agent are presented and analyzed. In all cases, each aerial robot is equipped with a monocular camera on board. The main differences between the system configurations have to do with the set of sensors used in each case (additional to the monocular cameras), as well as the circumstances they operate in a suitable manner.
The idea of presenting different system configurations, instead of a single "integrated" proposal, has to do the easiness of the mathematical analysis. However, also, this will allow a higher flexibility and modularity in the system implementation, since the features of each configuration are complementary and not exclusive. In other words, the authors' proposal should facilitate the implementation of an integrated system according to requirements (or availability) of hardware and the circumstances where the system will operate.
The configurations of the cooperative SLAM system proposed in this paper are based on the standard EKF-based SLAM methodology. In this context, it is extremely important to provide the proposed system with properties such as observability and robustness to errors in initial conditions since the properties mentioned above have a fundamental role in the convergence of the filter, as shown in [33,34]. Therefore, an extensive nonlinear observability test is carried out in order to analyze the system configurations. In this case, novel and important theoretical results and considerations are presented from the observability analysis. Also, in this work, the initialization process of new map features is carried out in a cooperative way. The 3D position of the new map features is estimated by means of a pseudo-stereo system formed by monocular cameras mounted on a pair of UAVs that observe common landmarks. This allows the landmark to be initialized with less uncertainty and lesser error. The pseudo-stereo system allows to initialize landmarks at distances farther than using stereo systems with a rigid baseline [35] or delayed monocular initialization methods. The above feature allows to the proposed cooperative system to have better performance in environments where the landmarks are far from the measurement system, contrary to SLAM approaches based on depth cameras, standard stereo systems or sonars.

Paper Outline
The document is organized in the following manner: Section 2 presents the general system specifications and mathematical models. Section 3 introduces the system configurations proposed in this work. Section 4 presents the nonlinear observability analysis. In Section 5 the proposed method is described. Section 7 shows the numerical simulations results, and finally, in Section 8 the conclusions of this work are presented.

System Specification
In this section, the different mathematical models used in this work are introduced. Those models are: the model used for representing the dynamics of a camera carried by a quadcopter, the model used for representing the dynamics of the lead agent, the representation of the landmarks as map features, the camera projection, and the GPS, altimeter, and range measurement models.

Dynamics of the System
In applications like aerial vehicles, the attitude and heading (roll, pitch, and yaw) estimation is well handled by available systems (e.g., [36,37]). In particular, in this work is assumed that the orientation of the camera always points toward the ground. In practice, the foregoing assumption can be easily addressed with the use of a servo-controlled camera gimbal. Considering the above aspects, the system state can be simplified by removing the variables related to attitude and heading which are provided by the attitude and heading reference system (AHRS). Therefore, the problem will be focused on the position estimation. Let consider the following continuous-time model describing the dynamics of the proposed system (see Figure 2): where the state vector x is defined by: with i = 1, ..., n 1 and j = 1, ..., n 2 , where n 1 and n 2 are respectively the number of landmarks included into the map and the number of UAV-camera systems. In this work, the term landmarks will be used to refer to natural features of the environment which are detected and tracked from the images acquired by the cameras.  Additionally, where W R c j ∈ SO3 is the rotation matrix, that transforms from the world coordinate reference system W to the coordinate reference system C of the j-th camera. Recall that the rotation matrix W R c j is known and constant, by the assumption of the use of the servo-controlled camera gimbal.

Camera Measurement Model for the Projection of the Lead Agent
Let consider the projection of the lead agent over the image plane of a camera. In this case, it is assumed that some visual feature points can be extracted from the lead agent by means of some available computer vision algorithm like [39][40][41][42] or [43].
Using the pinhole model (see Figure 3) the following expression can be defined: represent the position (in meters) of the lead agent with respect to the coordinate reference system C of the j-th camera. Additionally,

GPS Measurement Model
Let consider a GPS carried by the lead agent. From the GPS, measurements of the global position of the lead agent are obtained, therefore it can be defined:

Altimeter Measurement Model
Let consider an altimeter carried by the j-th quadcopter. From the altimeter, measurements of the altitude of the j-th quadcopter are obtained, therefore it is defined:

Range Measurement Model
Let consider a range sensor, from which measurements of the relative distance of the j-th quadcopter respect to the lead agent can be obtained. Therefore, it is defined: For a practical implementation, several techniques like [44] or [45] can be used in order to obtain these kinds of range measurements.

System Configurations
In this section, three different configurations of cooperative visual-based SLAM for Multi-UAV systems with a lead agent are introduced:

First Configuration
The first configuration takes into account the possibility of visual contact of at least one of the robots with the lead agent (See Figure 1). The system in its first configuration uses as inputs the following measurements: (i) the measurements obtained of the GPS carried by the lead agent, (ii) the monocular measurements obtained from the projection of the landmarks over each UAV-camera system, and (iii) the monocular measurements obtained from the projection of the lead agent over an UAV-camera system that has visual contact with it.

Second Configuration
The second configuration takes into account a scenario where there is no visual contact from any robot with the lead agent. In this case, the system is assumed to have the following sensor measurements: (i) the measurements obtained of the GPS carried by the lead agent, (ii) the monocular measurements obtained from the projection of the landmarks over each UAV-camera system, (iii) the measurements of the range of a robot with respect to the lead agent, and iv) the measurements obtained of an altimeter carried by one of the robots.

Third Configuration
The third configuration takes into account the scenarios where the GPS measurements are unavailable (GPS-denied environments). In Section 4, it will be analysed that in the absence of GPS the system will be partially observable for any possible configuration. Therefore, the objective will be to reduce the number of unobservable variables and modes of the system. In this case, in order to maximize the observability property of the system, in the absence of GPS, the third configuration is obtained by combining the first two configurations. The system in its third configuration is composed of the following measurements: (i) the monocular measurements obtained from the projection of the landmarks over each UAV-camera system, (ii) the monocular measurements obtained from the projection of the lead agent over an UAV-camera system that has visual contact with it, (iii) the measurements of the range of a robot with respect to the lead agent, and (iv) the measurements obtained by an altimeter carried by one of the robots. Table 1 shows a summary of the three configurations according to the kind of measurements assumed to be available in each case. Table 1. System configurations summary.

Measurements of Lead Agent
Altimeter on UAV

GPS on Lead Agent
Regarding the practical scenarios where each configuration can be useful the following can be summarized: The first or second configuration can be considered for any scenario where the GPS measurements (even with low quality) are available, for instance outdoor or medium-dense environments. In particular, the second configuration will be useful in periods or circumstances where the lead agent is not seen by the monocular cameras carried by the UAVs. On the other hand, the third configuration will be more useful in any scenario or circumstance where there is no GPS availability, for instance in indoor or highly dense urban areas.
It is important to note that in a practical scenario, as it has been said before, some or all the UAVs can evolve in environments with low GPS coverage or even with no coverage. In this sense, from the implementation point of view, it is very straightforward to conceive an integrated system by simply including all the sensory inputs whenever they are available. If the minimum requirements established for each configuration are accomplished, then the integrated system will cope indistinctly with GPS-challenging or GPS-denied environments, with a performance subject to the observability criteria defined for each scenario.

Observability Analysis
As was previously mentioned, observability is a property of dynamic systems, because it has an important role in the convergence of the EKF. In this section, the observability properties of each cooperative SLAM configuration defined in Section 3 are studied. In the work of Hermann and Krener [46], it is demonstrated that a non-linear system is locally weakly observable if the observability rank condition rank(

Observability Matrices
In the case of the first configuration, the observability matrix O 1 O 1 O 1 can be computed from: where L s f h is the s-th-order Lie Derivative [47], of the scalar field h with respect to the vector field f. In (10) the zero-order and first-order Lie Derivatives are used for each measurement.
In the case of the measurements given by a monocular camera, for the projections of the landmarks, according to (3) and (1), the following zero-order Lie derivative can be defined: where For the same kind of measurement, the following first-order Lie Derivative can be defined: where and In the case of the measurement given by a monocular camera, for the projections of the lead agent, according to (5) and (1), the following zero-order Lie derivative can be defined: For the same kind of measurement, the following first-order Lie Derivative can be defined: and In the case of the measurement given by a GPS carried for the lead agent, according to (7) and (1), the following zero-order Lie derivative can be defined: where I is the identity matrix. For the same kind of measurement, the following first-order Lie Derivative can be defined: For the second configuration, the observability matrix O 2 O 2 O 2 can be computed as follows: In this case, from the measurement given by a range sensor, according to (9) and (1), the following zero-order Lie derivative can be defined: For the same kind of measurement, the following first-order Lie Derivative can be defined: In the case of the measurement given by an altimeter carried by the j-th quadcopter, according to (8) and (1), the following zero-order Lie derivative can be defined: For the same kind of measurement, the following first-order Lie Derivative can be defined: The third configuration of the system takes into account circumstances where the GPS can be unavailable. In this case, the observability matrix O 3 O 3 O 3 can be computed as follows:

Theoretical Results
For the sake of presentation, a single observability matrix O t O t O t is expanded including all the Lie derivatives obtained from each system configuration. In this case, in order to differentiate the Lie derivatives that belong to each kind of system configuration, the rows of the matrix (31) are differentiated by colors: The rows that are common to all configurations are indicated in blue. The rows that are common to the first and second configuration are indicated in red. The rows that are common to the first and third configuration are indicated in orange. The rows that are common to the second and third configuration are indicated in black. For instance, according to the above, the observability matrix O 2 O 2 O 2 (second configuration) will be composed of selecting the blue and red rows from . . . . . .
According to (10) and (31), the rank of the observability where n 1 is the number of landmarks being measured, n 2 is the number of robots and 6, is the number of states of the lead agent. In this case, n 1 is multiplied by 3, since this is the number of states per landmark given by the Euclidean parametrization, n 2 is multiplied by 6, since this is the number of states per robot given by its global position and its derivatives. Given that dim(x x x) = 3n 1 + 6n 2 + 6, then, the system under its first configuration is locally weakly observable, because rank( Now, according to (23) and (31), the rank of the observability Therefore, the proposed system under the second configuration is also locally weakly observable.
For the third configuration, according to (30) and (31), the maximum rank of the observability ). In this case, the unobservable modes are spanned by the right nullspace basis of the observability matrix O 3 O 3 O 3 , therefore: It is straightforward to verify that the right nullspace basis of (32) it can be seen that the system is partially observable and that the unobservable modes cross with the states that correspond to the global position in x and y of the robots, the landmarks and the lead agent; these states are unobservable. An important conclusion is that all the vectors of the right null space basis are orthogonal with the rest of the states and therefore these states are completely observable.
It is very important to note, that in the absence of GPS measurements the system cannot be completely observable, given the absence of global position measurements within the system. This result is consistent with the one that can be expected from any SLAM system, which is world-centric but does not include global measurements. Therefore, in the case of the third configuration, the objective was to obtain the best possible observability result, that is, to reduce to a greater extent the number of unobservable states. It is worth noting that the first and second configurations obtain a less favorable result with respect to the third configuration in terms of the observability property when GPS measurements are not considered (see Table 2).
Some important remarks about the former observability analysis can be extracted: • In the case of the first and third configurations, a single robot having visual contact with the lead agent represents a sufficient condition to obtain the previous results (see Figure 4).

•
In the case of the second and third configurations, only a single range sensor, used for measuring the relative distance of the j-th quadcopter respect to the lead agent, is needed in order to obtain the previous results (see Figure 4).

•
In the case of the second and third configurations, only a single altimeter sensor, used for measuring the altitude of the j-th quadcopter, is needed in order to obtain the previous results (see Figure 4).

•
In the case of the third configuration, if two or more robots have visual contact with the lead agent, range measurements are not necessary in order to obtain the previous results.

•
For all the configurations, in order to obtain the previous results, it is necessary to link the members of the multi-UAV system through the measurements of the landmarks (see Figure 4). In other words, a robot needs to share the observation of at least two landmarks with another robot.

•
In the case of the third configuration, adding Lie Derivatives of higher order to the observability matrix does not improve the results.

EKF-Cooperative Monocular SLAM
The main goal of the proposed method is to estimate the system state using the EKF-based SLAM methodology. Figure 5 shows the architecture of the proposed system under its three configurations.

EKF-SLAM
Using (1), the following discretized system state model a can be defined: For configuration 1, the system measurements are defined according to (3), (5) and (7), as: For configuration 2, the system measurements are defined according to (3), (7), (8) and (9), as: For configuration 3, the system measurements are defined according to (3), (5), (8) and (9), as: where a h and α c α c α c j represent unknown linear accelerations that are assumed to have Gaussian distribution with zero mean. Moreover, let n k ∼ N N N (0, Q k ), r 1k ∼ N N N (0, R 1k ), r 2k ∼ N N N (0, R 2k ) and r 3k ∼ N N N (0, R 3k ) be the noise vectors that affect the state and the measurements, which are assumed to be mutually uncorrelated. Let ∆t be the differential of time and k the sample step. In this work, a Gaussian random process is used for propagating the velocity of the vehicle. The proposed scheme is independent of the kind of aircraft and therefore is not restricted by the use of a specific dynamic model. The prediction stage of the EKF is defined by: The correction stage of the EKF is defined by: with and P is the covariance matrix of the system state and K is the Kalman gain.

Initialization of Map Features
The initialization process of new map features is carried out in a cooperative way. The 3D position of the new map features is estimated by means of a pseudo-stereo system formed by the monocular cameras mounted on a pair of UAVs that observe common landmarks. In this case, the process of initialization is carried out when a new potential landmark is observed by two cameras; if this condition is fulfilled then the landmark can be initialized by means of a linear triangulation.
The state of the new feature is computed using the a posteriori values obtained in the correction stage of the EKF. According to (3), the following expression can be defined in homogeneous coordinates: where i γ j c is a scale factor. Additionally, it is defined: Using (47), and considering the projection onto two any UAV cameras, a linear system can be formed in order to estimate x a i : where D i † is the Moore Penrose right pseudo-inverse matrix of D i , and when a new landmark is initialized, the system state x is augmented by: And the new covariance matrix P new is computed by: where ∆J is the Jacobian for the initialization function, and i R j is the measurement noise covariance matrix for ( i u j c , i v j c ).

Control Flight Formation
This section presents the high-level control scheme that allows to carry out the formation of the UAVs with respect to the lead agent. The kinematic model is based on the leader-follower scheme presented in [48].
The kinematic model of the j-th UAV can be described as: Let x q j = x i y i z i T be the position (in meters) of the j-th UAV, with respect to the coordinate reference system W (See Figure 6). Let v j x and v j y be the linear velocity (in m/s) components in the x and y directions of the j-th UAV, with respect to the coordinate reference system Q (located in the center of mass of the aerial vehicle). Let v j z be the linear velocity (in m/s) component in the z direction of the j-th UAV, with respect to the coordinate reference system W. Let ψ j be the yaw angle (in radians) of the j-th UAV, with respect to the coordinate reference system W. Let ω j be the angular velocity (in radians/s) for the yaw angle of the j-th UAV, with respect to the coordinate reference system W.
It is desired to maintain the j-th UAV to a distance λ j x , λ j y from the lead agent (See Figure 6). Let λ j x , λ j y be the coordinates of the lead agent with respect to the coordinate reference system Q. It is also desired to maintain the j-th UAV at an altitude differential λ j z from the lead agent. Given all these considerations, the following can be defined: Differentiating (54) with respect to time, using (1), (53) and (54), the following can be obtained: Finally, the dynamics of the formation can be defined as follows: In the following, it is assumed that there exist uncertainty and disturbances coupled to the input of the system. Therefore, Equation (56) can be defined as: where ∆ g is a bounded and unknown uncertainty and disturbances term satisfying ∆ g ≤ , being a positive constant. The control scheme is designed in order to allow that the proposed dynamics of the formation converges to the desired values. In this case, the sliding mode control technique [49] is proposed to be used for the development of a robust controller. The state values required by the control system are obtained from the estimator described in Section 5. The yaw angle of the j-th UAV is obtained through an AHRS onboard the aerial vehicle. To obtain the control laws by means of an analysis in continuous time it is assumed that the estimated value is passed through a zero order hold (ZOH) (See Figure 7). Since the estimated state of the position of the UAV is defined with respect to the reference system C (see Sections 2 and 5), it is necessary to apply a transformation to the estimated statex j c for obtaininĝ x j q , which in turn, is necessary to obtain the control laws. Therefore, the following equation is defined: where q d c j is the translation vector (in meters) from the coordinate reference system Q to the coordinate reference system C. Please note that q d c j is assumed to be known and constant.
Firstly, the next equation is defined: where e λ =λ − λ d , and λ d are the desired values and K 1 is a positive definite diagonal matrix of adequate dimensions. Then, deriving (60) and substituting the dynamics defined in (58) as well as taking u as the control input, the following equation can be obtained: The following control law is proposed: where K 2 is a positive definite diagonal matrix of adequate dimensions.
To prove the stability of the flight formation system dynamics, the following Lyapunov candidate function is proposed: substituting (62) in (64), it is obtained: where α = λ min (K 2 ). So if α is chosen such that α > , it is assured thatV λ is negative definite. Therefore, the formation system dynamics will reach and remain in the surface s λ = 0 in a finite time.

Simulation Setup
The performance of the proposed navigation architecture has been assessed by means of numerical simulations. To this aim, a simulation environment has been developed, The environment is composed of 3D landmarks, randomly distributed over the ground (See Figure 8). To execute the tests, two Quadcopters, equipped with the set of sensors required by the proposed method, are emulated to follow (flying in formation) a lead agent moving freely in the environment. For the first set of tests, only the estimations obtained from the SLAM system will be evaluated, and thus, it will assumed that a control system exists capable of maintaining a perfect flying formation of the aerial robots with respect to the lead agent. For the second set of tests, the system state estimated by the SLAM will be used as feedback to the control scheme proposed in Section 6, in order to evaluate the closed-loop performance of the system. In computer simulations, it is assumed that the initial condition of the UAVs and the lead agent states are known with certainty. The measurements from the sensors are emulated to be taken with a frequency of 10 Hz. The intrinsic parameters used for the cameras are f In simulations, it is also assumed that there exist enough landmarks in the environment that allow a subset of them to be observed in common by the cameras of the UAVs. To emulate the system uncertainty, the following Gaussian noise is added to measurements: Gaussian noise with σ c = 3 pixels is added to the measurements given by the cameras. Gaussian noise with σ a = 50 cm is added to the measurements given by the altimeter and Gaussian noise with σ r = 50 cm is added to the measurements given by the range sensor. Gaussian noise with σ h = 0.05 radians is added to the measurements given by the AHRS system. In the case of GPS measurements, a Gaussian noise with σ g = 1.5 m is added. It is important to note that the noise considered for emulating GPS measurements is approximately three times bigger than the typical magnitude of the noise of a real GPS. In this case, since a local coordinate reference system is used, instead of a Geo-referenced one, then the most important source of error of the GPS that affects the local estimations is the random noise. The reason for emulating such extremely poor GPS performance is for showing that the proposed system is robust enough for dealing with very uncertain position data.
In practical applications, there are several factors that affect and degrade the performance of the Visual SLAM systems, the problem of data association is one of the main problems that appear when implementing the Visual SLAM algorithms. In addition, in cooperative visual systems, the data association problem is extended from the single-image case to the multiple-image case. Also, the gimbals used for stabilizing the cameras can be subject to small errors in its operation.
To take into account the above practical considerations, the following aspects are also considered in computer simulations: (i) outliers for the visual data association in each camera; (ii) outliers for the cooperative visual data association; (iii) perturbations in the orientation of the cameras. To emulate the failures of the visual data association process, 5% of the total number of visual correspondences are forced to be outliers in a random manner. In this case, each outlier is modeled by means of a measurement error e m = e 2 u + e 2 v pixels. In this case, e u and e v are the errors in the coordinate u and v (in pixels) of the projection of the landmark over the image of the camera, given by the false value in the correspondence with respect to the real value in the correct correspondence, and e m ∈ [0, 15] with a continuous uniform distribution. The errors in cameras orientation, due to the gimbal assumption, are emulated by perturbation of the actual orientation by means of the following sinusoidal function e r = 0.04 sin(t · 0.3) radians. Table 3 shows the number of failures introduced into the simulation due to the data association problem. Table 3. Number of failures introduced into the simulation.

SLAM Simulation Results
In the tests presented in this section, only the estimation problem is addressed, and thus, it is assumed that a control system exists capable of maintaining the flying formation of the aerial robots with respect to the lead agent.
In this case, the trajectories followed for the Quad 1, the Quad 2 and the lead agent are given by the following parametric function: x j c (t) = x h (t) = 100 cos(t · 0.015) 1 + sin 2 (t · 0.015) y j c (t) = y h (t) = 100 sin(t · 0.015) cos(t · 0.015) 1 + sin 2 (t · 0.015) with the following initial conditions: Figure 8 shows the trajectories followed by the lead agent, the Quad 1 and the Quad 2.

Comparative Study
A comparative study has been carried out in order to have an insight into the performance of the proposed system under its three configurations. In this case, the comparison is carried out with respect to a single-UAV SLAM system. Therefore, the study also allows to observe the advantages and drawbacks of multi-UAV systems compared with single robot systems. The computer simulation setup for the three proposed cooperative SLAM configurations, both for the case of multi-robot and single robot are described below:

1.
For the first system configuration, the features initialization process is carried out as shown in Section 5, having Quad 1 visual contact with the lead agent. In the case of the single-UAV system, only one quadcopter (Quad 1) is used for obtaining measurements instead of two. In this latter case, the features initialization process is carried out as shown in [50]. In both systems, the GPS is carried by the Quad 1.

2.
For the second system configuration, the simulation setup is similar to the one used for testing the first configuration, but in this case, the lead agent is maintained outside the field of view of the robots. In addition, range measurements are obtained from the Quad 1 with respect to the lead agent, and an altimeter is carried by the Quad 1.

3.
For the third system configuration, the simulation setup is similar to the one used for testing the second system configuration, but in this case, no GPS measurements are considered. Instead, visual measurements of the lead agent obtained from the Quad 1 are considered. Figure 9 shows the real and estimated trajectory obtained from the Multi-UAV system and the single-UAV system. For better comparison purposes note that the flying trajectory is divided into three stages: (i) during the time 0 to 70 s, the system is tested under its first configuration, (ii) during the time 70 to 140 s, the system is tested under its second configuration, and (iii) during the time 140 to 210 s, the system is tested in under the third configuration. Please note that, in this case, for the sake of clarity, for the case of the proposed Multi-UAV system, only the estimated trajectory of the Quad 1 is presented. The results of the estimated state of the Quad 2 are very closely similar to those presented for the Quad 1.

Configuration 1 Configuration 2 Configuration 3 Configuration 1 Configuration 2 Configuration 3
Estimated with Multi−UAV system Estimated with Single−UAV system Actual lead agent trajectory Actual Quad 1 trajectory Figure 9. Estimated state of the lead agent (left plots) and of the Quad 1 (right plots), for both kind of system, under the three configurations. Figure 10 shows the evolution over time of the absolute value of the error for the estimated trajectory of the lead agent (e h ) and the Quad 1 (e 1 ) with respect to the global reference system W (i.e.,  Table 4 summarizes the Mean Squared Error (MSE) for the estimated position in the three axes of the lead agent and of the Quad 1. In this case, Table 4 shows the MSE obtained during each system configuration and the MSE obtained for the whole trajectory. According to the above results, it can be seen that the Multi-UAV system presents a better performance. Table 5 provides an insight into the performance of the proposed method for estimating the location of the 3D landmarks composing the environment. In this case, the total (sum of all) of the Mean Squared Errors for the estimated position of the landmarks is presented. Also, the total of the Mean Squared Errors for the initial estimated position of the landmarks is presented. Please note that the results are presented for each coordinate of the reference frame W. Please note that Tables 4 and 5 show the errors obtained during each system configuration as well as the errors obtained for the whole trajectory. The results show that the proposed Multi-UAV method has a much better performance than the single-UAV system, regarding the error obtained in the estimation of landmarks position. To evaluate the statistical consistency of each method, the average NEES (Normalized Estimation Error Squared [51]) over n 3 Monte Carlo runs was computed, as it is proposed in [52]. The NEES is estimated as follows: In addition, the average NEES is computed from: (68) Figure 11 shows the average NEES over 50 Monte Carlo runs obtained for each system (multi-UAV and Single-UAV). The average NEES is calculated taking into account the eighteen variables that define the complete state of the UAVs and the lead agent (position and linear velocity). It is very interesting to note how the consistency of the filter considerably degenerates in the case of the single-robot system under its third configuration, this effect happens when the system is not fully observable. On the other hand, for the multi-UAV system case, the consistency of the filter remains practically stable. Given the previous results, the proposed Multi-UAV system presents a good performance in its three configurations, despite all the failures and disturbances introduced into the system. The above study provides a good insight into the robustness of the proposed system. Steps Average NEES over 50 runs Multi−UAV system Single−UAV system Configuration 1 Configuration 2 Configuration 3 Steps Figure 11. Average NEES obtained with both systems.

GPS-Denied Test
A set of simulations tests were carried out with the intention of evaluating the performance of each system configuration under GPS-denied environments. For this test, the flying trajectory is divided into two stages: (i) during time period 0 to 105 s, the vehicles move with GPS availability, (ii) during time period 105 to 210 s, the vehicles move with without GPS availability. Throughout the trajectory, the state of the system is estimated independently using each system configuration.
The following aspects are considered in this test: For configuration 1, altimeter measurements are also included, since, in practice, almost every UAV is equipped with this kind of sensor. For the configuration 3, during the first stage of the trajectory, GPS measurements are also included. These previous considerations will allow evaluating the performance of configuration 3 both in GPS-available environments and in GPS-denied environments. Figure 12 shows for each configuration the evolution of the error over the time for the estimated position of the Quad 1 and the lead agent too, (i.e., e h r = ( . Given the above results, it can be observed that the three configurations yield good results in GPS-available environments, and it is also confirmed that configuration 3 provides good results in GPS-denied environments. However, the performance of the configuration 1 and 2 clearly degenerates when there is no GPS availability. Although configuration 1 uses at least two monocular measurements of the lead agent allowing to obtain three-dimensional information from the lead agent, is interesting to note that its performance is not so good in GPS-denied environments as expected. On the other hand, the better three-dimensional information is obtained by the sensory fusion of a monocular measurement and a range measurement with respect to the lead agent (as in configuration 3). The above results are produced by the fact that the 3D reconstruction from monocular measurements is more sensitive to noise.

Control Simulations Results
A set of simulations was also carried out, where the SLAM estimates are used as feedback to the control laws involved in commanding the formation of the UAVs with respect to the lead agent (see Section 6). This will allow observing the performance of both the estimation and control in a closed-loop manner.
Based on the same simulation setup used previously, now, the trajectory followed by the UAVs is given by a kinematic model with control inputs. The initial conditions of the position of the quadcopters are the same used previously. In the case of the yaw angle of the aerial vehicles, the initial conditions are as follows: ψ 1 = 1 and ψ 2 = 1 2 . The vector λ d , that defines the desired values for the flight formation is: In this test, the trajectory is divided into three stages, each one for testing each system configuration. Figure 13 shows the trajectories of the three elements composing the flight formation. Figure 14 shows the evolution of the error with respect to the desired values λ d which define the flight formation.  Figure 15 shows, for each configuration, the evolution of the error over the time for the estimated position of the Quad 1 as well as the lead agent, when these estimates are used as feedback to the control laws.
Given the previous results, it can be observed that the estimation with the three configurations and the control have good performance in closed-loop.

Conclusions
In this work, the problem of the cooperative visual-based SLAM for the class of multi-UAV systems that integrate a lead agent has been addressed. For this purpose, three system configurations were proposed and investigated by means of an intensive nonlinear observability analysis. The main differences between the system configurations have to do with the set of sensors used in each case, additional to the monocular cameras carried by each aerial vehicle, as well as the circumstances they operate in a suitable manner. The objective of each configuration is to ensure that the system can be able to maximize the property of observability under different scenarios. In this case, several theoretical results were obtained from the analysis; for instance, sufficient conditions required for obtaining the observability results were established.
The first or second configuration can be considered for any scenario where the GPS measurements (even low quality measurements) are available. In particular, the second configuration will be useful in circumstances where the lead agent is not seen by cameras carried by the UAVs. The third configuration will be more useful in any scenario or circumstance where there is no GPS availability. The proposed structure based on different system configurations facilitates the mathematical and experimental analysis. However, from the implementation point of view, it will be more convenient to use an integrated system which includes all the sensory inputs whenever they are available.
The proposed system configurations were implemented using the standard EKF-based SLAM methodology. In this case, the initialization process of new map features is carried out in a cooperative way. In addition, a high-level control scheme was proposed allowing the control of the UAVs formation with respect to the lead agent.
An extensive set of computer simulations was performed in order to validate the proposed scheme, for instance: (i) the proposed multi-UAV system was compared against a single-UAV system, (ii) the consistency of the filter was investigated by means of the average NEES test, (iii) the performance of each system configuration were evaluated under GPS-denied environments, and (iv) the performance of both the estimation and the control were evaluated by analyzing the behavior of each system configuration in closed-loop.
In the above cases, several aspects regarding applicability in real scenarios of the proposed approach were considered, for instance, the association problem of visual data for each camera and the pseudo-stereo matching as well, and also the errors induced by the assumption of the camera gimbal were emulated in simulations.
Based on the results of the simulations, it can be observed how the proposed method improves the estimation of the state by considering the multi-robot cooperative scheme. Also, it is shown that the estimates produced by the proposed SLAM system can be used directly as feedback to the high-level control laws that command the flight formation. The simulation results also show that the proposed scheme is able to offer a good performance, even in the absence of GPS measurements.
However, although computer simulations are useful for evaluating the full statistical consistency of the methods, they can still neglect important practical issues that appear when the methods are used in real scenarios. In this sense, it is important to note that future work could be focused on developing experiments with real data in order to fully evaluate the applicability of the proposed approach.