Robust Formation Maintenance Methods under General Topology Pursuit of Multi-Agent Systems

In this article, methods of formation maintenance for a group of autonomous agents under a general topology pursuit scheme are discussed. Unlike rendezvous or geometric formation, general topology pursuit allows the group of agents to autonomously form trochoid patterns, which are useful in civilian and military applications. However, this type of topology is established by designing a marginally stable system that may be sensitive to parameter variations. To overcome this drawback, the linear fixed gains are turned into a dynamic version in this paper. By implementing a disturbance observer controller, the systems are shown to maintain their formation despite disturbances or uncertainties. The effectiveness of the presented method is also compared with model reference adaptive control and integral sliding mode control under gain uncertainties. The capabilities of the controllers are demonstrated and supported through simulations.


Introduction
A multi-agent system (MAS) is made up of a fleet of agents that collaborate under a set of designed rules. In fields such as space-based applications, smart grids, and machine learning, multi-agent systems employ numerous networked autonomous agents to complete complicated tasks, where local interactions among the agents are used to achieve an overall goal. Agents can be ships [1], unmanned vehicles [2], cars [3], or satellites, for which numerous Earth-observation projects requiring MAS have been conducted in recent years [4,5].
Out of the various tasks required of MAS, formation control has witnessed ample development. Formation control is defined as the coordinated supervision of a group of agents so that they follow a predefined trajectory while maintaining a desired spatial pattern or accomplishing designed objectives such as target circling, formation-shape maintenance, shape reorganization, agent-fault detection, or consensus [6][7][8]. Consensus, or agreement, has been actively studied using the so-called cyclic pursuit (CP) protocol. Cyclic pursuit is a simple leaderless distributed control law for a group of n agents, where each agent uses its neighboring agents' information to compute its velocity or acceleration vector. These autonomous groups use local interactions to render global formations.
Amid the studies of cyclic pursuit, circular pattern generation and preservation is an active topic among researchers. In [9], by measuring a desirable relative phase angle between neighbors, the suggested controller directs a team to circumnavigate an arbitrary distribution of target points at a specified radius from the targets. The authors in [10] present a control method based on the chasing agent's bearing-angle knowledge and preserve the inter-agent distances in a heterogeneous manner. The study conducted in [11] examines the circular formation control problem of MAS in order to achieve any predetermined phase distribution. The proposed control law drives all of the agents to a circle and arranges them in places dispersed on the circle based on preset relative phases. In [12], the challenge of surrounding and monitoring a moving object using a fleet of unicycle-like vehicles is addressed. Using inter-agent communication, the developed controller guides the vehicles into an equally spaced formation along a circle, the center of which follows the target's motion. The authors of [13] used a team of under-actuated vehicles to guarantee that a moving target is circled uniformly with the required radius, velocity, and inter-vehicle spacing. In these geometric formations, the agents attain and maintain a preset relative distance; by nature, the closed-loop system that generates a geometric formation is intrinsically stable.
Outside the limits of circular patterns and geometric formations lies the class of trochoid motifs. A trochoid can be defined as the curve traced out by a point fixed to a circle as the circle rolls along a straight line (the point may be on, inside, or outside the circle; see Figure 1). In addition to their aesthetic appeal, these trajectories are helpful for a variety of civil and military purposes, as every point of an annular area in a plane can be covered by them. As a result, a group of agents may conduct activities such as search, exploration, surveillance, patrolling, and monitoring, as well as civilian usages such as cleaning and grass mowing [14][15][16]. Multi-agent systems using a protocol to generate these trochoid behaviors are marginally stable per se. Pavone and Frazzoli showed that n agents eventually converge to a single point, converge to a circle, or diverge following a logarithmic spiral pattern under specific conditions on the value of the gain (the word gain refers to a tuning parameter) [17]. This leads the overall network to have purely imaginary-axis eigenvalues. The requirement of possessing imaginary-axis eigenvalues is also the backbone of Tsiotras' paper, where it was shown that a marginally stable system is a simple and elegant way to generate artistic patterns [18]. Juang carried the analysis further and added a new rigidity gain, also leading to a configuration with multiple imaginary-axis eigenvalues, which renders a class of epicycle patterns. This configuration was used in a dynamic coverage scheme, where agents stay within a certain area and ensure a minimum time between two passages over the same point [19,20]. Moses et al. continued the investigation of geometric pattern formation by manipulating the locations of the eigenvalues of the overall system while keeping multiple imaginary-axis eigenvalues.
Further, the law achieving trochoidal pattern formation was generalized to render such patterns under a generalized topology pursuit (GTP), and the pattern formation was extended to three-dimensional space [21,22]. The benefit of this expansion was illustrated by demonstrating that, given general graph topologies, the agents may span areas of varying circular radii.
The crucial point is to ensure the stability of the gains, as they are the gatekeepers of the network formation. The criterion of having purely imaginary eigenvalues, with all the remaining ones in the left half of the complex plane, is decisive in trochoid pattern generation. Such a marginally stable configuration relies only on the selection of the gain values. The aforementioned gains are fixed and can be subject to uncertainties, which is a limitation. Thus arises the challenge of maintaining the marginal stability of such networks under bounded disturbances on the gains.
Control techniques such as adaptive control or dynamic-gain approaches were developed for MAS. In [23], the authors employed time-varying feedback coupling gains in order to synchronize the evolution of a complex network. In addition, the work conducted in [24] deals with the synchronization of MAS by using weights based on local information. In [25], the study presents a modified follower control rule in which each agent follows its neighbor along a line of sight rotated by a fixed offset angle and with a dynamic control gain. Examples of dynamical control by the application of adaptive approaches to cyclic pursuit have previously been covered. The researchers in [26,27] use model reference adaptive control on a MAS under a generalized cyclic pursuit (GCP) scheme to keep the formation notwithstanding the uncertainties. Further, an integral sliding mode controller (ISC) for GCP was developed with the aim of sustaining the motion [28]. Practical situations were also covered in [29], where the authors compared a PID controller with an ISC to maintain the formation of a MAS subjected to constant and time-varying disturbances or commands. The conclusion was that the ISC presents better stability in case of communication failure.
In this paper, we develop an approach based on disturbance observer-based control (DOB) to ensure the rejection of uncertainties and formation maintenance, as DOB is one of the most extensively utilized robust motion control methods. It can be employed to estimate the torque disturbance of a DC servo speed control system, and its performance has been thoroughly investigated [30][31][32][33][34]. Using a nonlinear controller, the authors in [35] also used a DOB to perform trajectory tracking of a quadrotor in the presence of external disturbances. Previous methods to ensure the stability of the gains were developed in [26,28], where ISC and MRAC were tested. These methods can also be applied to the presented GTP and thus will be tested. In the present study, we extend the work conducted in [26,28] and show that these methods can be employed in a more general class of communication graphs. Through simulations, we compare the performances of DOB, ISC, and MRAC and draw conclusions about the best trade-off.

Notation 1. Bold mathematical letters denote matrices and/or vectors. The subscripts n and 3n indicate the size of the square matrices. The subscript f indicates the final value of the variable of interest. The superscript T means transpose. The bold letter i stands for the imaginary number such that i^2 = −1; it should not be mistaken for i, which represents an agent's identification number. The words pattern/formation are interchangeable, as they are used in the same sense.

Problem Description
With the aim of helping the reader, a brief overview of graph theory is provided. For a network of n agents communicating with each other, a weighted directed graph G can be outlined mathematically by a node set V that represents the agents, an edge set E ⊆ V × V that indicates the interactions between the agents, and a weighted adjacency matrix A = [a_ij] ∈ R^{n×n} that stores the weights of the edges. In a directed graph, edges link two vertices asymmetrically, while, in an undirected graph, edges link two vertices symmetrically. If agent j can obtain information from agent i, then it is said that an edge (i, j) exists. The weighted adjacency matrix A is defined such that a_ij is positive if (j, i) ∈ E and 0 otherwise. The set of interacting neighbors of agent i is denoted by N_i = {j ∈ V : (i, j) ∈ E}. A directed path of graph G is a series of edges that connects a series of vertices. In a directed graph, a directed spanning tree exists if at least one node has a directed path to all other nodes. There are several definitions of the Laplacian matrix in the literature. Here, it is defined as the n × n real matrix

L = D − A,

where D = diag(d_1, . . . , d_n) is the diagonal degree matrix, in which d_i = Σ_{j∈N_i} a_ij is the weighted in-degree of vertex v_i. The Laplacian is symmetric for undirected graphs, while it is not for directed graphs [36].
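As a concrete illustration, the Laplacian can be assembled directly from the weighted adjacency matrix. The sketch below (the `laplacian` helper and the 3-node directed cycle are illustrative, not from the paper) checks the zero row sums that give every Laplacian its zero eigenvalue.

```python
import numpy as np

def laplacian(A):
    """Graph Laplacian L = D - A, with D = diag of weighted in-degrees
    (row sums of the adjacency matrix, directed-graph convention)."""
    return np.diag(A.sum(axis=1)) - A

# Illustrative 3-node directed cycle: each agent listens to one neighbor.
A = np.array([[0., 1., 0.],
              [0., 0., 1.],
              [1., 0., 0.]])
L = laplacian(A)

# Zero row sums give the zero eigenvalue shared by every Laplacian.
assert np.allclose(L.sum(axis=1), 0)
assert np.min(np.abs(np.linalg.eigvals(L))) < 1e-9
```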
In order to generate three-dimensional trochoidal patterns, consider a group of n leaderless agents whose interactions are based on a weighted directed graph, with Laplacian L, that includes a directed spanning tree. Agents are modeled according to the single-integrator kinematics

ṗ_i = v_i, (4)

where p_i ∈ R^3 denotes the position of the i-th agent in Cartesian coordinates, and v_i is the following control law [22]:

v_i = g_f R(θ_f) Σ_{j∈N_i} a_ij (p_j − p_i), (5)

where g_f > 0 is a constant gain and R(θ_f) is the proper rotation matrix by angle θ_f around the axis [a_x, a_y, a_z]^T, a unit vector with a_x^2 + a_y^2 + a_z^2 = 1:

R(θ_f) = [ c_f + a_x^2(1 − c_f)        a_x a_y(1 − c_f) − a_z s_f   a_x a_z(1 − c_f) + a_y s_f ;
           a_y a_x(1 − c_f) + a_z s_f   c_f + a_y^2(1 − c_f)        a_y a_z(1 − c_f) − a_x s_f ;
           a_z a_x(1 − c_f) − a_y s_f   a_z a_y(1 − c_f) + a_x s_f   c_f + a_z^2(1 − c_f) ], (6)

where c_f = cos θ_f and s_f = sin θ_f. Equation (5) can be written as the network

ẋ = A x, (7)

in which x ∈ R^{3n} is the stacked position vector, and A ∈ R^{3n×3n} is the state matrix

A = −g_f (L ⊗ R(θ_f)). (8)

As the behavior of the system relies on the locations of the eigenvalues, conditions on the gains that place the eigenvalues at desired locations were developed. As L is an asymmetric matrix associated with a weighted directed graph having a directed spanning tree, it possesses one zero eigenvalue, and the rest are nonzero eigenvalues with positive real parts [37]. Thereby, the negative of the Laplacian has one zero eigenvalue, and the rest are nonzero eigenvalues with negative real parts. Denote the eigenvalues of −L by μ_i, i = 1, 2, . . . , n, with μ_1 = 0 and arg(μ_i) the corresponding angle of the i-th eigenvalue, arranged in increasing order of angle. To obtain the desired eigenvalue positions, name two of the eigenvalues μ_a and μ_b such that [22] (9a) there are no additional eigenvalues on the line connecting μ_1 and μ_a, and (9b) there are no eigenvalues on the line connecting μ_a and μ_b. The three-dimensional rotation matrix R(θ_f) in (6) has the three eigenvalues

1, e^{iθ_f}, e^{−iθ_f}. (10)

From the properties of the Kronecker product, the eigenvalues of −L ⊗ R(θ_f) are

μ_i, μ_i e^{iθ_f}, μ_i e^{−iθ_f}, (11)

where i = 1, . . . , n; see Appendix A for details on the Kronecker product. Finally, the eigenvalues of A in (8) are

λ = g_f μ_i, g_f μ_i e^{iθ_f}, g_f μ_i e^{−iθ_f}, (12)

for i = 1, 2, . . . , n.
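The spectrum of the stacked state matrix can be verified numerically. The sketch below assumes a rotation about the z-axis (the special case a = [0, 0, 1]^T) and an illustrative 3-agent cycle Laplacian; it checks that the eigenvalues of A are exactly g_f μ_i and g_f μ_i e^{±iθ_f}.

```python
import numpy as np

def rot_z(theta):
    """Rotation by theta about the z-axis: the special case a = [0, 0, 1]^T."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.],
                     [s,  c, 0.],
                     [0., 0., 1.]])

# Illustrative 3-agent directed-cycle Laplacian (any digraph with a
# spanning tree would do).
L = np.array([[ 1., -1.,  0.],
              [ 0.,  1., -1.],
              [-1.,  0.,  1.]])
g_f, theta_f = 1.0, np.pi / 3

# Stacked state matrix of the network  x_dot = A x.
A = -g_f * np.kron(L, rot_z(theta_f))

# Predicted spectrum: g_f * mu_i * {1, e^{+i theta_f}, e^{-i theta_f}},
# with mu_i the eigenvalues of -L.
mu = np.linalg.eigvals(-L)
expected = np.concatenate([g_f * mu,
                           g_f * mu * np.exp(1j * theta_f),
                           g_f * mu * np.exp(-1j * theta_f)])
evA = np.linalg.eigvals(A)
for lam in expected:
    assert np.min(np.abs(evA - lam)) < 1e-8
```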
Then, according to ([22], Lemmas 4.1 and 4.2), to reach the desired eigenvalue locations and thus the desired motion, critical values of θ_f and g_f must be computed from μ_a and μ_b, and the two pairs of purely imaginary eigenvalues will be distinct under the conditions given therein. These two pairs of imaginary-axis eigenvalues are denoted λ±_a and λ±_b. A representation of the eigenvalue repartition is drawn in Figure 2.

Figure 2. Evolution of the eigenvalues' repartition while building the network.
Equation (12) reveals that the eigenvalue repartition depends solely on θ_f, which is the key value of the network formation and will be referred to as the angle or gain, as its role can be assimilated to a gain value. In the fixed versions presented in [19,22], any perturbation of any agent's gain would lead to an unstable formation with no way to correct its value. The overall system (7) under uncertain gains is represented by

ẋ = A(θ_f + δθ) x, (15)

implying that the problem can be turned into a dynamic version of θ_f to reject δθ. In the footsteps of [26,28], we develop a disturbance observer controller to deal with uncertainties, and we extend the previous MRAC method to this general network topology. The benefit is the insurance of formation maintenance despite perturbations possibly encountered in an agent's gain.
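The fragility that motivates the dynamic gains can be seen numerically: at the critical angle, the rightmost network eigenvalues sit on the imaginary axis, while a small offset in θ_f can push the spectrum into the right half-plane. The sketch assumes a z-axis rotation, g_f = 1, and a 3-agent cycle Laplacian, all illustrative choices.

```python
import numpy as np

def rot_z(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.], [s, c, 0.], [0., 0., 1.]])

# Illustrative 3-agent cycle Laplacian, z-axis rotation, g_f = 1.
L = np.array([[ 1., -1.,  0.],
              [ 0.,  1., -1.],
              [-1.,  0.,  1.]])
mu = np.linalg.eigvals(-L)
mu_a = mu[np.argmax(mu.imag)]          # a nonzero complex eigenvalue of -L
theta_f = np.pi / 2 - np.angle(mu_a)   # rotate mu_a onto the imaginary axis

def spectral_abscissa(theta):
    """Largest real part of the network eigenvalues for a given angle."""
    return np.max(np.linalg.eigvals(-np.kron(L, rot_z(theta))).real)

# At the critical angle, the rightmost eigenvalues lie on the imaginary axis.
assert abs(spectral_abscissa(theta_f)) < 1e-7
# A small offset in the gain can push eigenvalues into the right half-plane,
# destroying the marginal stability that sustains the pattern.
assert max(spectral_abscissa(theta_f + 0.05),
           spectral_abscissa(theta_f - 0.05)) > 1e-3
```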

Proposed Methods
In the following, different methods are proposed. We start with the development of the proposed disturbance observer controller, then review the integral sliding controller for this specific network [28], and finally reformulate the MRAC method to match the problem specification. The two former are of robust type, while the latter is of adaptive type. In DOB-based robust control, internal and external disturbances are estimated using the known dynamics and observable gains of the agents, and system robustness is easily accomplished by feeding back the disturbance estimates [34]. The sliding mode control is a two-part controller design: the first part is the design of a sliding surface to meet the design requirements, and the second part relates to choosing a control rule that makes the switching surface attractive. By adding an integral term, the system trajectory always starts from the sliding surface, meaning that the reaching phase is eliminated and robustness over the whole state space is assured [38]. Model reference adaptive control (MRAC) is a direct adaptive strategy with some adjustable controller parameters and an adjusting mechanism to adjust them [39]. These controllers are structurally different, and their functioning will be recalled. A first glance at the results is shown in Table 1.

To address the issue of being unable to directly measure disturbances in order to correct them in open loop, disturbance observers that estimate disturbances from observable state variables in the plant and a model of its dynamics were created. The feedback controller is implicitly synthesized to achieve robustness by utilizing disturbance estimates rather than their real values. It is feasible to forgo the measurement of disturbances by using all available prior knowledge about the plant model and current measurements of its inputs and outputs.
An algorithm for estimating the state of the plant and the disturbances was added to the control system for this purpose. The DOB provides a feed-forward compensation term to directly attenuate the disturbances in the control system; by choosing that structure for the observer, one has control over how accurate the estimate is. In the present system, the DOB is applied directly to act on the gain of every agent and can be stated in the following theorem:

Theorem 1. Consider a group of n agents modeled by single integrators under the control law (5), in which the gain of each agent follows the dynamics

θ̈_i = u_i + δ_θi, (16)

with θ_f the critical angle computed from the eigenvalues μ_a and μ_b satisfying property (9), δ_θi a bounded time-varying disturbance, and u_i the controller defined as

u_i = −κ_p (θ_i − θ_f) − κ_v θ̇_i − δ̂_θi, (17)

where κ_p and κ_v are positive constant gains appropriately chosen for the heading angle to reach θ_f in the desired time, and δ̂_θi is the estimated disturbance from the observer. Then, at steady state, the eigenvalues of (8) possess two pairs of imaginary-axis eigenvalues, and the network (7) keeps its marginal stability despite uncertainties in the agents' gains.

Proof of Theorem 1. Writing (16) in state-space form yields

ż_i = Λ z_i + Π (u_i + δ_θi), (18)

with

z_i = [θ_i − θ_f, ω_i]^T, Λ = [0 1; 0 0], Π = [0, 1]^T, (19)

and δ_θi the bounded disturbance, a function of ω_i and t. The disturbances include perturbations caused by parameter changes, unmodeled dynamics, and external disturbances, and they are assumed to satisfy the matching condition. In addition, assume that, as t increases, the derivatives of the disturbances tend to constants. The controller u_i is separated into two components such that

u_i = u_0 + u_δi, (20)

where u_0 is the nominal controller and u_δi the disturbance compensation peculiar to agent i. Consider first the case where no disturbance occurs (δ_θi = 0) to define u_0. It is intended to bring θ_i towards θ_f as fast as possible. Thus, we select

u_0 = K z_i = −κ_p (θ_i − θ_f) − κ_v ω_i, (21)

with κ_p, κ_v > 0 constants. Equation (21) is a simple PD controller and straightforward to understand. As uncertainties occur, the following u_δi is proposed. Ideally, u_δi = −δ_θi; yet, as the disturbances are not measurable, we design the disturbance attenuation u_δi to resist the disturbances as

u_δi = −δ̂_θi. (22)

The following disturbance observer is adopted [40]:

δ̂_θi = q_i + H z_i,
q̇_i = −H Π (q_i + H z_i) − H (Λ z_i + Π u_i), (23)

where δ̂_θi is the disturbance estimate, q_i the internal variable of the observer, and H the observer gain matrix to be designed. The disturbance estimation error is defined as

e_δi = δ_θi − δ̂_θi. (24)

Combining (22)–(24), the closed-loop system is governed by

[ż_i; ė_δi] = [Λ + ΠK, Π; 0, −HΠ] [z_i; e_δi] + [0; 1] δ̇_θi, (25)

where the feedback control gain K = −[κ_p, κ_v] is selected such that Λ + ΠK is Hurwitz, and the observer gain matrix is selected such that −HΠ is Hurwitz; hence, it can be shown that the closed-loop system (25) is bounded-input, bounded-output (BIBO) stable. The design parameter H is freely selected, and thus the stabilization of Λ + ΠK and −HΠ is ensured. In addition, if the disturbances tend to constants, it can be shown that the closed-loop system is asymptotically stable with appropriately chosen parameters K and H. The disturbances are rejected and, with time, θ_i(∞) → θ_f for all i, giving the desired gains, thus the eigenvalue repartition, and hence the desired pattern and the maintained formation.
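A minimal sketch of this mechanism on one agent's gain channel: a double integrator with a constant matched disturbance, a PD nominal control, and the observer structure described in the proof. All numerical values (κ_p, κ_v, H, the disturbance level) are illustrative choices, not the paper's.

```python
import numpy as np

theta_f = np.pi / 3                    # target angle (illustrative)
kp, kv = 4.0, 4.0                      # kappa_p, kappa_v of the PD part
Lam = np.array([[0., 1.], [0., 0.]])   # Lambda of the state-space form
Pi = np.array([0., 1.])                # Pi (input direction)
H = np.array([0., 10.])                # observer gain: -H @ Pi = -10 is Hurwitz

dt, T = 1e-3, 10.0
z = np.array([-theta_f, 0.0])          # [theta - theta_f, omega], theta(0) = 0
q = 0.0                                # internal observer variable
for _ in range(int(T / dt)):
    delta = 0.8                        # constant matched disturbance on the gain
    d_hat = q + H @ z                  # disturbance estimate delta_hat
    u = -kp * z[0] - kv * z[1] - d_hat # PD nominal control + DOB compensation
    q += dt * (-(H @ Pi) * (q + H @ z) - H @ (Lam @ z + Pi * u))
    z += dt * (Lam @ z + Pi * (u + delta))

assert abs(z[0]) < 1e-2                # theta converged to theta_f
assert abs((q + H @ z) - 0.8) < 1e-2   # the observer recovered the disturbance
```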
The proposed law is composed of a PD controller coupled with a disturbance observer. Two reasons led to this choice. First, a PID controller could be used; however, we assume the disturbances to be time-varying, and PID is poor at rejecting time-varying disturbances [29].
Second, the controller used in [28] is also composed of a PD controller (considered as nominal control) augmented by an ISC. This purposely makes the comparison between controllers fairer and more precise. As the disturbances are rejected, A(θ_i(∞)) → A(θ_f), and the overall network depicts the desired patterns. As a remark, the form A(θ_i(t)) can be applied to any system with fixed gains. In [18], the authors show that their network also depicts epicycles and trochoids, and this formation relies on the eigenvalues of A. Moreover, if the Laplacian is taken as the circulant matrix [41]

L = circ(1, −1, 0, . . . , 0), (26)

the result would be the one presented in [19], as (26) is a special case of [22]. A representation of the used DOB is depicted in Figure 3.

Disturbance rejection is also one of the main features of sliding mode control. This nonlinear control method alters the dynamics of a system by the application of high-frequency switching control. The state feedback control law switches from one continuous structure to another based on the current position in the state space; hence, sliding mode control is a variable structure control method. The functioning of this controller applied to the network is as follows. Consider the system under (16) in the same conditions, with the disturbance attenuation chosen as

u_δi = −m_i sign(s_i), s_i = σ_i + z_i, (27)

where m_i is a positive value, s_i is the sliding surface, σ_i may be designed as a linear combination of the system's angle dynamics, and z_i is designed as an integral term.
The key difference from the DOB lies in the design of (27), which is expressed in terms of the sliding surface. As a result, the network (7) possesses imaginary-axis eigenvalues and keeps its marginal stability despite uncertainties in the gains [28].
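For comparison, a generic integral sliding mode loop on the same gain channel can be sketched as follows; the surface starts at zero (no reaching phase), and the switching term overpowers a bounded disturbance. The surface design below is a standard textbook form under illustrative numbers, not the exact one of [28].

```python
import numpy as np

theta_f = np.pi / 3               # target angle (illustrative)
kp, kv = 4.0, 4.0                 # nominal PD gains
Lam = np.array([[0., 1.], [0., 0.]])
Pi = np.array([0., 1.])
G = np.array([0., 1.])            # surface gradient; G @ Pi = 1 is invertible
m = 1.5                           # switching gain, above the disturbance bound

dt, T = 1e-4, 5.0
z = np.array([-theta_f, 0.0])     # [theta - theta_f, omega], theta(0) = 0
s0 = G @ z                        # offset so that s(0) = 0: no reaching phase
integ = 0.0                       # running integral of G (Lam z + Pi u0)
for _ in range(int(T / dt)):
    u0 = -kp * z[0] - kv * z[1]             # nominal PD part
    s = G @ z - s0 - integ                  # integral sliding surface
    u = u0 - m * np.sign(s)                 # switching term rejects disturbance
    integ += dt * (G @ (Lam @ z + Pi * u0))
    z += dt * (Lam @ z + Pi * (u + 0.8))    # constant matched disturbance 0.8

assert abs(z[0]) < 0.02           # theta reached theta_f despite the disturbance
```

Because the surface is zero from the start, the trajectory slides immediately and the equivalent control cancels the disturbance, at the cost of high-frequency switching in u.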
On the other hand, adaptive control can also be used to sustain the motion. We expand the MRAC of [26] to the three-dimensional case and to general topology pursuit. MRAC is an adaptive controller that uses the information gathered during closed-loop operation to adjust itself and improve its performance. The key difference between adaptive and linear controllers is the adaptive controller's ability to adjust itself to handle unknown model uncertainties. MRAC forces the output of the actual plant to track the output of a reference model driven by the same reference input. An uncertain plant, defined by

ẋ = A x + B v, (28)

where A and B are uncertain matrices, needs to match a reference model, defined by

ẋ_m = A_m x_m + B_m v, (29)

where v is a reference signal, through a proper choice of the control law. The block diagram is depicted in Figure 4.

The method expanded here transforms each agent's control law to match the MRAC form. Developing (5) for agent i, one obtains

ṗ_i = −g_f d_i R(θ_f) p_i + g_f R(θ_f) Σ_{j∈N_i} a_ij p_j, (30)

where d_i is the weighted in-degree of agent i. The main idea is to consider the neighboring agents' positions as command signals and express (30) as

ṗ_i = A_m p_i + B_m v_i, (31)

with

A_m = −g_f d_i R(θ_f), B_m = g_f R(θ_f), v_i = Σ_{j∈N_i} a_ij p_j. (32)

Equations (31) and (32) represent the perfect behavior of each agent. As explained, uncertainties can occur in the gains, and the state-space equation for agent i then contains a model uncertainty Ξ_i, which is unknown but bounded. If the uncertainties were known, the control law could be chosen in terms of fixed gains such that substituting it into the uncertain dynamics recovers (31); this yields the matching conditions that ensure perfect tracking. Yet, as Ξ_i is unknown, the fixed control gains are replaced by their estimates K̂_p and K̂_v, and the tracking error e_i between the corrected and perfect positions evolves in terms of the gain estimation errors K̃_p = K̂_p − K_p and K̃_v = K̂_v − K_v. A Lyapunov analysis shows that, if the adaptive laws are chosen as

dK̂_p/dt = −Γ_p p_i e_i^T P B_m, dK̂_v/dt = −Γ_v v_i e_i^T P B_m,

for a matrix P = P^T > 0 satisfying the algebraic Lyapunov equation

A_m^T P + P A_m = −Q

for some Q = Q^T > 0, where the rates of adaptation satisfy Γ_p = Γ_p^T > 0 and Γ_v = Γ_v^T > 0, then the closed-loop error dynamics are uniformly stable [26,39,42]. Henceforth, by ensuring the stability of the gains, the eigenvalues of the network reach and stay at their desired locations. In contrast to robust control techniques, under MRAC, uncertain plant parameters are directly identified and compensated instead of seeking the best compromise between performance and robustness. To compare the effectiveness of the controllers, simulations are conducted in the next section.
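The adaptation mechanism can be illustrated on a scalar toy plant (not the agent dynamics above): the plant parameter a is unknown to the controller, yet the adaptive gains drive the tracking error toward zero. All numbers are illustrative.

```python
import numpy as np

# True plant x_dot = a x + b u, with a unknown to the controller (and
# destabilizing here); reference model x_m_dot = a_m x_m + b_m r.
a, b = 1.0, 1.0
a_m, b_m = -2.0, 2.0
gamma = 10.0                           # rate of adaptation (Gamma)

dt, T = 1e-3, 30.0
x, x_m = 0.0, 0.0
k_x, k_r = 0.0, 0.0                    # adaptive gain estimates
for k in range(int(T / dt)):
    r = np.sin(0.5 * k * dt)           # persistently exciting reference
    u = k_x * x + k_r * r
    e = x - x_m                        # tracking error
    k_x += dt * (-gamma * x * e)       # adaptive laws from the Lyapunov analysis
    k_r += dt * (-gamma * r * e)
    x += dt * (a * x + b * u)
    x_m += dt * (a_m * x_m + b_m * r)

assert abs(x - x_m) < 0.1              # plant output tracks the reference model
```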

Simulations
Take n = 5 agents under the Laplacian given in (44). Negating (44), obtaining its eigenvalues, identifying μ_a and μ_b per (9), and substituting into (16) gives the numerical gain values; the reference patterns as well as the eigenvalue repartition are shown in Figure 5. To analyze the efficiency of the controllers, the disturbances δ_θi are selected as

δ_θi(t) = δ_i sin(ω_i t) + w + U(t_i), (48)

where δ_i and ω_i are constants proper to each agent, w is white noise, and U(t_i) is a step disturbance occurring at time t_i. The simulation time is t_sim = 100 s. Uncertainties in θ_i(0) are also taken into consideration. In the rest of the simulation, the obtained results are shown only for agent 1, as the rest of the network behaves in the same fashion. The uncertainties occurring in agent 1's gain are depicted in Figure 6. We also compare the results with the ISC using Equations (13), (14) and (24) from [28] under the same nominal control and disturbances as those of the DOB. To match the dimensions of the problem, the gains of the nominal controllers of the ISC and of the DOB are chosen identically; we also select Q = 5 and corresponding adaptation values for the MRAC.

To compare the different controllers, the pattern of agent 1 is depicted in three dimensions in Figure 7, and the evolution of the x, y, and z components is depicted in Figure 8. MRAC differs from the robust controllers in that it does not need prior knowledge of the bounds of uncertain or time-varying parameters; robust control guarantees that, if the changes stay within given bounds, the control law need not be changed, while in adaptive control the control law changes itself. It can be seen that the three controllers handle the uncertainties in their gains and show very satisfactory results. Every agent obeys the trajectory law (5), which, in its developed version, contains a 3 × 3 gain matrix for each agent; this matrix represents the gains' evolution under uncertainties and correction.
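The nominal (disturbance-free) network of this section can be sketched in a few lines. The paper's Laplacian (44) is not reproduced here, so a directed 5-cycle stands in, with a z-axis rotation and g_f = 1; the sketch checks marginal stability and the boundedness of the resulting trajectories.

```python
import numpy as np

# Illustrative 5-agent setup: a directed cycle stands in for the paper's
# Laplacian (44), and the rotation axis is assumed to be [0, 0, 1]^T.
n = 5
L = np.eye(n) - np.roll(np.eye(n), -1, axis=1)   # directed-cycle Laplacian

def rot_z(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.], [s, c, 0.], [0., 0., 1.]])

mu = np.linalg.eigvals(-L)
mu_a = mu[np.argmax(mu.imag)]            # eigenvalue with largest imaginary part
theta_f = np.pi / 2 - np.angle(mu_a)     # rotate mu_a onto the imaginary axis
A = -np.kron(L, rot_z(theta_f))          # g_f = 1

# Marginal stability: no eigenvalue in the open right half-plane.
ev = np.linalg.eigvals(A)
assert np.max(ev.real) < 1e-8

# Trajectories therefore stay bounded: integrate x_dot = A x from a random start.
rng = np.random.default_rng(0)
x0 = rng.standard_normal(3 * n)
x, dt = x0.copy(), 1e-3
for _ in range(20_000):                  # 20 s of forward-Euler integration
    x = x + dt * (A @ x)
assert np.linalg.norm(x) < 2 * np.linalg.norm(x0)
```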
The entries of this gain matrix are denoted accordingly for agent 1. As the focus is on the gains' variation, their evolution is plotted for agent 1 in Figure 9.
Figure 9. Gains evolution of agent 1 (reference, MRAC, ISC, and DOB).
As seen, MRAC presents large changes in the gains' values, while those of ISC and DOB tend to constants. Therefore, MRAC presents very satisfactory results at the price of high-frequency changes in the gains.
As they are robust controllers, DOB and ISC are compared for the rest of the experiment. As observed in Figure 8, the ISC and DOB present very similar results, as their curves overlap. The main difference is found in the evolution of θ_i(t) and in the control effort required; see Figure 10. It is observed that, under the same nominal controller, the ISC requires a larger control effort.
A final comparison is the evolution of the eigenvalue positions. As expected, both controllers bring the eigenvalues to their desired places, as seen in Figure 11; the circle marks the origin of each trajectory, while the cross marks the final position. The eigenvalues clearly reach the desired positions and maintain their designed values. As a result, the formation pattern is maintained despite uncertainties in the gains. The agents' angles reach the desired final value with an acceptable control effort. Both robust controllers ensure the stability of the gains; the ISC has the drawback of requiring a higher control effort than the DOB. For the same nominal controller u_0, the choice between ISC and DOB is a designer's trade-off, as both controllers present very similar behaviors that evolve in the same manner. Table 2 presents a quantitative comparison; for clarity, control effort 1 is the highest, the other values are expressed relative to it, and the error is computed between the perfect position of agent i and its corrected position. The DOB controller presents the advantage of requiring less control effort than MRAC under the same disturbances and for similar performance (see Table 2); hence, DOB is a better choice. In conclusion, the MRAC controller presents excellent results in trajectory and formation maintenance, slightly better than agents under ISC or DOB; however, it induces high-frequency changes in the gains' values. The agents under ISC or DOB present very similar results, the difference being in the required control effort, which is lower for the DOB.

Conclusions and Future Work
A group of n agents under general topology pursuit may exhibit a trochoid-like pattern. To do so, a careful selection of their gains was previously designed, with the drawback of yielding a marginally stable network. In practice, it is desirable to come up with a design that is robust against variations and disturbances. In this paper, previous methods using MRAC or ISC were reviewed, and the DOB scheme was developed. Comparisons between these three controllers for formation maintenance were made, and trade-offs were analyzed. MRAC presents the best trajectory result but the highest control cost. ISC and DOB are robust controllers and present analogous results, DOB having the advantage of a lower control effort. By using these controllers, the formation is sustained in the presence of uncertainties.
However, there are some limitations, as the model used is a single integrator, meaning that there are no physical constraints. Future research can include unicycle or quadrotor models to be closer to reality. Moreover, this type of dynamic gain control can be applied to similar MASs with structurally unstable networks, e.g., the systems in [18].

Appendix A. Kronecker Product
The Kronecker product "⊗" is an operation on two matrices of arbitrary size resulting in a block matrix [43]. If W is an m × n matrix with elements w_ij, i = 1, . . . , m, j = 1, . . . , n, and R is a p × q matrix, then the Kronecker product W ⊗ R is the mp × nq block matrix

W ⊗ R = [ w_11 R … w_1n R ; ⋮ ⋱ ⋮ ; w_m1 R … w_mn R ].

Suppose that W and R are square matrices of size n and p, respectively. Let μ_1, . . . , μ_n be the eigenvalues of W and ρ_1, . . . , ρ_p be the eigenvalues of R; then the eigenvalues of W ⊗ R are

λ_ij = μ_i ρ_j, i = 1, . . . , n, j = 1, . . . , p.
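This eigenvalue property can be checked numerically with a small illustrative pair of matrices:

```python
import numpy as np

W = np.array([[2., 1.],
              [0., 3.]])               # eigenvalues 2 and 3
R = np.array([[0., -1.],
              [1.,  0.]])              # eigenvalues +i and -i

kron_ev = np.linalg.eigvals(np.kron(W, R))
# Every product mu_i * rho_j must appear in the spectrum of W (x) R.
for m_ in np.linalg.eigvals(W):
    for r_ in np.linalg.eigvals(R):
        assert np.min(np.abs(kron_ev - m_ * r_)) < 1e-9
```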