In this section, the control algorithm that guides the movements of the agents is presented. Originally, the algorithm was designed for search tasks [16], and it was shown how it could be configured to adapt to different scenarios [17]. Moreover, the algorithm was modified to carry out surveillance tasks in other works [18,19]. Given that the version of the algorithm used in this work does not differ substantially from past implementations, only a general summary is discussed here.
3.1. Behavior-Based Control
The behavior-based control algorithm is composed of six behaviors. Each time an agent reaches a cell, it has to decide which cell to visit next. At that moment, each behavior evaluates the 8 surrounding cells and assigns a value to each of them. All these values are combined by a final decision module, which applies a weight to each of them, sums them up, and selects the cell with the highest score.
Pheromones
The pheromones behavior is the main one and is in charge of leading the agents to areas with higher age. Similarly to the equivalent behavior implemented in past works, the discretization cells generate pheromones as time goes by. The concentration of pheromones spreads over the scenario following the diffusion equation, with a diffusion coefficient to be set. As the agents observe the area, they remove a certain percentage of the pheromone concentration. The main difference between the implementation in this work and the past ones is that there is only one layer of pheromones. In Figure 2, the pheromone behavior has been labeled with 1. The output of the behavior is related to the concentration of pheromones inside each search cell.
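The bookkeeping of this behavior can be sketched as follows; the grid representation, function names, and all parameter values (generation rate, diffusion coefficient, removal fraction) are illustrative assumptions, not values taken from the text:

```python
import numpy as np

def update_pheromones(field, dt, generation_rate=1.0, diffusion=0.1):
    """One time step of the pheromone field: uniform generation plus an
    explicit finite-difference step of the diffusion equation."""
    # Every cell generates pheromones as time goes by.
    field = field + generation_rate * dt
    # Discrete Laplacian with edge-replicated (zero-flux) boundaries.
    padded = np.pad(field, 1, mode="edge")
    laplacian = (padded[:-2, 1:-1] + padded[2:, 1:-1] +
                 padded[1:-1, :-2] + padded[1:-1, 2:] - 4.0 * field)
    return field + diffusion * laplacian * dt

def observe(field, cell, removal_fraction=0.9):
    """An agent observing a cell removes a percentage of its concentration."""
    field = field.copy()
    field[cell] *= (1.0 - removal_fraction)
    return field
```

Areas left unobserved accumulate concentration over time, so the behavior's per-cell output naturally steers agents toward cells with higher age.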
Coincident cells
It may happen that two close-enough agents decide to go to the same cell, in which case a collision might occur. To avoid this situation, a coincident cells behavior has been implemented. This behavior penalizes cells that have already been selected by other agents as their next cells. In Figure 2, the coincident cells behavior has been labeled with 2.
Energy saving
The energy saving behavior has two main purposes. On the one hand, it tries to minimize the energy spent by avoiding changes in the flying direction. On the other hand, given the dynamics of the quadcopters, keeping the current direction helps in following the established tracks between the centers of the cells. Therefore, this behavior penalizes movements that imply larger changes in the flying direction. In Figure 2, the energy saving behavior has been labeled with 3.
Diagonal movement
The surveillance space has been divided into cells that are inscribed inside the sensor footprint. Because of this, when an agent moves in the x or y direction, part of its sensor footprint overlaps those of the adjacent cells. However, if the complete area were swept following diagonal movements, there would be no overlap, which would be more efficient. For this reason, this behavior reinforces diagonal movements. In Figure 2, the diagonal movement behavior has been labeled with 4.
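The efficiency of diagonal moves follows from a little geometry; the symbol r below denotes the footprint radius (a symbol assumed here for illustration, not taken from the text):

```latex
% Square cell inscribed in a circular footprint of radius r:
s = r\sqrt{2} \quad \text{(cell side)}, \qquad
s\sqrt{2} = 2r \quad \text{(cell diagonal)}
% Axis-aligned move: footprint centers separated by
s = r\sqrt{2} \approx 1.41\,r < 2r
\;\Rightarrow\; \text{consecutive footprints overlap.}
% Diagonal move: footprint centers separated by
s\sqrt{2} = 2r
\;\Rightarrow\; \text{consecutive footprints are exactly tangent (no overlap).}
```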
Keep distance
Based on a virtual potential field, this behavior keeps the distance between the agents appropriate. Inspired by intermolecular attractive-repulsive forces, it creates a virtual force whose sign and magnitude depend on the distance to each agent, scoring the surrounding cells accordingly. In Figure 2, the keep distance behavior has been labeled with 5.
Keep velocity
Inspired by how bird flocks behave, this behavior aims to keep the velocities of the agents similar. The idea is that if some agents are flying in a given direction (presumably because they are traveling to areas with higher ages), the surrounding agents are encouraged to follow that direction. In Figure 2, the keep velocity behavior has been labeled with 6.
Final decision
Once each surrounding cell has been rated by each behavior, the final decision module applies a weight to each value and sums them up:

S_g = \sum_{i=1}^{6} \beta_i \, s_{i,g}

where the \beta_i coefficients are the weights associated with each behavior, s_{i,g} is the value assigned by behavior i to cell g, and the subindex g indicates the cell, g = 1, \ldots, 8. The module then selects the cell with the highest score, which is passed to the low level control and collision avoidance module.
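A compact sketch of this weighted-sum decision over the 8 surrounding cells; the behavior names, scores, and weights in the example are hypothetical placeholders, not the paper's configuration:

```python
def select_next_cell(behavior_scores, weights):
    """behavior_scores: dict mapping behavior name -> list of 8 scores
    (one per surrounding cell); weights: dict mapping behavior name ->
    weighting coefficient. Returns the index of the best cell."""
    n_cells = 8
    totals = [0.0] * n_cells
    for behavior, scores in behavior_scores.items():
        w = weights[behavior]
        for g in range(n_cells):
            totals[g] += w * scores[g]   # weighted sum of all behaviors
    # Final decision: the cell with the highest combined score.
    return max(range(n_cells), key=lambda g: totals[g])
```

For instance, a strong pheromone score on one cell can dominate a weaker diagonal-movement bonus elsewhere, depending entirely on the chosen weights.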
3.2. Low Level Control and Collision Avoidance
Once the goal cell has been selected, the low level control generates a velocity command in the global frame to fly from the current position to it. Therefore, this module requires the current position of the agent in the global frame O. The velocity is then transformed into the body frame, and a PID controller is applied to finally generate the commands.
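These two steps can be sketched as follows, assuming a planar yaw rotation between frames and illustrative PID gains (the class and function names are not from the text):

```python
import math

def to_body_frame(vx_o, vy_o, yaw):
    """Rotate a velocity command from the global frame O into the body
    frame (yaw in radians; only the planar rotation is modeled here)."""
    vx_b =  math.cos(yaw) * vx_o + math.sin(yaw) * vy_o
    vy_b = -math.sin(yaw) * vx_o + math.cos(yaw) * vy_o
    return vx_b, vy_b

class PID:
    """Minimal PID loop on a single velocity channel."""
    def __init__(self, kp=1.0, ki=0.0, kd=0.0):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error = 0.0

    def step(self, setpoint, measurement, dt):
        error = setpoint - measurement
        self.integral += error * dt
        derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative
```

One such controller per velocity channel would close the loop between the commanded and measured body-frame velocities.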
Although the coincident cells behavior (see Figure 2, labeled with 2) may avoid some collisions between the quadcopters, in some other situations an additional mechanism must be implemented to avoid an impact. In Figure 3, two examples of this type of situation have been represented.
To resolve such situations, and seeking fast reactivity and simplicity, another approach has been implemented, based on two mechanisms: a virtual force and a cell exchange protocol.
Virtual force
As extensively done in the past [20], the first mechanism to avoid collisions between agents is a virtual force. It is meant to keep a safety distance between the agents while they head to their respective goal cells, interfering as little as possible and avoiding unwanted deadlocks.
If an agent A detects a conflict with agent B, it first calculates its velocity relative to the other agent:

\mathbf{v}_{A}^{B} = \mathbf{v}_{A}^{O} - \mathbf{v}_{B}^{O}

where the subscript refers to the agent and the superscript to the reference frame. The reference frame attached to B has been defined centered on it and with its axes parallel to the inertial frame, named O. Secondly, a virtual force \mathbf{F}_{AB} is created, which has normal and tangential components:

\mathbf{F}_{AB} = k \, v_{nom} \left( \frac{d_c}{d_{AB}} \right)^{\sigma} \left( \hat{\mathbf{n}} + \hat{\mathbf{t}} \right)

where v_{nom} is the nominal speed of the agents, k is a correction factor, d_c is a characteristic distance, and \sigma is a parameter that controls how fast the force increases as the agents get closer. Note that the resultant force \mathbf{F}_{AB} has speed dimensions. The normal and tangential vectors, \hat{\mathbf{n}} and \hat{\mathbf{t}}, are calculated by:

\hat{\mathbf{n}} = \frac{\mathbf{p}_{A} - \mathbf{p}_{B}}{\left\| \mathbf{p}_{A} - \mathbf{p}_{B} \right\|}, \qquad \hat{\mathbf{t}} \perp \hat{\mathbf{n}}
Note that the virtual force over agent B due to the presence of agent A is exactly equal but with the contrary direction, \mathbf{F}_{BA} = -\mathbf{F}_{AB}. If N agents are involved in the conflict, the total force is simply calculated as the summation of the virtual forces:

\mathbf{F}_{A} = \sum_{n=1}^{N} \mathbf{F}_{An}

where \mathbf{F}_{An} is the virtual collision force caused by the presence of agent n on agent A. Once the total force is calculated, thanks to the conversion factor k, it can be added to the velocity command to correct it and avoid a collision:

\mathbf{v}_{cmd}' = \mathbf{v}_{cmd} + \mathbf{F}_{A}
Although this method does not guarantee that a group of agents will never reach a deadlock situation, experience shows that deadlocks rarely happen during the mission, even when the density of agents is very high. In any case, in the event of a deadlock, after a reasonable time has elapsed without reaching the destination cell, the agents evaluate the surrounding cells again, selecting another one and thus solving the deadlock. In Table 2, the selected values of the parameters of the virtual force equations are shown.
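The mechanism described above, a repulsive term in speed units that grows as the agents close in and is added directly to the velocity command, can be sketched as follows; the functional form, names, and parameter values are assumptions for illustration, not the paper's exact equations or Table 2 values:

```python
import math

def virtual_force(p_a, p_b, v_nom=0.15, k=1.0, d_char=0.5, gamma=2.0):
    """Repulsive virtual force (in speed units) on agent A due to agent B.
    The magnitude grows as (d_char / d)^gamma when the agents get closer."""
    dx, dy = p_a[0] - p_b[0], p_a[1] - p_b[1]
    d = math.hypot(dx, dy)
    nx, ny = dx / d, dy / d                    # unit vector from B towards A
    mag = k * v_nom * (d_char / d) ** gamma    # k keeps speed dimensions
    return mag * nx, mag * ny

def corrected_command(v_cmd, p_a, neighbors):
    """Sum the virtual forces from all conflicting agents and add the
    total to the velocity command."""
    fx, fy = 0.0, 0.0
    for p_b in neighbors:
        f = virtual_force(p_a, p_b)
        fx += f[0]
        fy += f[1]
    return v_cmd[0] + fx, v_cmd[1] + fy
```

By construction the force is antisymmetric: the force on B due to A is the exact opposite of the force on A due to B.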
Cell exchange protocol
In some situations, although the virtual force would avoid a collision between agents, their movements would imply deviating too far from the trajectory between cells. In some of those cases, an exchange of the goal cells between the agents would solve the conflict without wasting time and energy avoiding the collision. Take as an example the situations shown in Figure 3a,b; if both agents exchanged their goal cells, the collision would be avoided and both agents would be able to travel straight towards their new goal cells. Therefore, a cell exchange protocol would be beneficial in these cases.
Such a protocol should not rely on a complex communication system, because keeping communications simple is key to preserving the scalability of the complete swarm. Therefore, the cell exchange protocol must not be based on a negotiation method (with bidirectional communications) but on individual decision making. The proposed cell exchange mechanism can be divided into two different procedures: capturing the other agent's cell and resolving coincident cells.
The first procedure is activated when an agent detects a potential collision in the near future with another agent. It then forecasts whether the collision would be avoided if both cells were exchanged. If so, it automatically captures the other agent's goal cell and broadcasts it. To avoid recursive exchanges when more than 2 agents are involved in the collision, each agent keeps a list of agents with which the cell has been exchanged in the past (exchanged_agents_list). If the other agent's identification number is in the list, the cell will not be captured, because it was already captured in the past. That list is cleared when a goal cell is reached. As will be shown in Section 3.3, all the other agents are informed about the new and the old goal cells through the broadcast messages (that is, the agent informs about the new goal cell it is traveling to and the old cell that has been exchanged).
Note that when an agent A decides to capture the goal cell of agent B, the latter does not automatically realize that this has happened, although it may reach the conclusion by itself that it should capture the goal cell of agent A to avoid the collision (as agent A did). In any case, the agents must be provided with a mechanism to solve the situation in which two (or more) agents fly to the same cell.
Let us suppose that A decides to capture the goal cell of B and broadcasts the information once the decision has been taken. When B receives the message, it detects that both agents are flying to the same cell, and then B triggers the second procedure (the resolve coincident cells procedure). First, B checks whether A is in its own exchanged_agents_list. Since it is not, B then checks that A is broadcasting an alternative cell (agent A's old goal cell) and takes it as its new goal cell. Then B broadcasts the information, effectively completing the exchange of cells.
It could happen that, once A has captured B's goal cell, A processes its own information and realizes that it is sharing the same goal cell with B (B did not have enough time to execute the resolve coincident cells procedure). Agent A would then check whether B is in its exchanged_agents_list and, given that it is, would do nothing, waiting for B to take A's old goal cell.
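The bookkeeping of the two procedures can be sketched as follows; message passing and collision forecasting are stubbed out, and the class and method names (other than exchanged_agents_list, which is from the text) are illustrative:

```python
class Agent:
    """Minimal sketch of the cell exchange state kept by each agent."""
    def __init__(self, agent_id, goal_cell):
        self.id = agent_id
        self.goal_cell = goal_cell
        self.old_goal_cell = None
        self.exchanged_agents_list = set()

    def capture_cell(self, other):
        """Procedure 1: capture the other agent's goal cell, unless the
        cells were already exchanged with that agent (avoids recursion)."""
        if other.id in self.exchanged_agents_list:
            return False                     # already exchanged; do nothing
        self.old_goal_cell = self.goal_cell  # broadcast alongside the new goal
        self.goal_cell = other.goal_cell
        self.exchanged_agents_list.add(other.id)
        return True

    def resolve_coincident(self, other):
        """Procedure 2: on learning that both agents share a goal cell,
        take the other agent's broadcast old goal cell."""
        if (self.goal_cell == other.goal_cell
                and other.id not in self.exchanged_agents_list):
            self.goal_cell = other.old_goal_cell
            self.exchanged_agents_list.add(other.id)

    def reach_goal(self):
        """The exchange list is cleared when a goal cell is reached."""
        self.exchanged_agents_list.clear()
```

Because each agent decides unilaterally and only broadcasts the result, no bidirectional negotiation round is needed.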
3.4. Optimization
The complete control algorithm contains 13 parameters to be configured, whose optimal values depend on the scenario variables [17]: the surveillance area per agent, the number of agents, the nominal speed, the radius of the sensor footprint, and the area shape factor. Then, for each tuple of scenario variables, the algorithm must be optimized to find a set of values that maximizes the evaluation variable (in this case, the efficiency defined as per Equation (10)).
In past works [17], a genetic algorithm was used for the optimization of the former algorithm, showing good results compared with other methods, such as Bayesian optimization. Therefore, the same method has been used in this work, with the following characteristics:
Chain of genes: a vector made up of the 13 parameters of the algorithm. Each of the genes is normalized within a range of valid values.
Population: 100 members.
Initial population: randomly generated.
Fitness function: to evaluate each member, Equation (10) is used. The duration of the surveillance has been set to 600 s. The efficiency is averaged over 5 simulations with different initial positions.
Crossover: 50 new members are generated. The parents are paired using the roulette-wheel technique, with a probability proportional to the efficiency value. The genes of the offspring are created by applying a weighted sum of the parents' genes, gene by gene. The weighting coefficient is a random number between 0 and 1.
Next generation selection: the new members are evaluated and the best 100 (from the total population of 100 parents and 50 offspring) are selected for the next generation.
Stopping criteria: the optimization is stopped when one of these criteria is met:
- Maximum number of generations (20) has been reached.
- Maximum number of generations (5) without an improvement higher than 10% in the best member has been reached.
- Maximum number of generations (5) without an improvement higher than 10% in the mean efficiency of the population has been reached.
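The characteristics listed above can be sketched as one generation step of the genetic algorithm; the fitness function is a stand-in for the real 5-simulation average, and everything except the stated sizes (13 genes, 100 members, 50 offspring) is an illustrative assumption:

```python
import random

N_GENES, POP, OFFSPRING = 13, 100, 50

def fitness(member):
    """Stand-in for the real fitness (efficiency averaged over 5
    simulations of 600 s); a toy function so the sketch runs."""
    return sum(member) / N_GENES

def roulette_pick(population, scores):
    """Roulette-wheel selection: probability proportional to fitness."""
    r = random.uniform(0.0, sum(scores))
    acc = 0.0
    for member, s in zip(population, scores):
        acc += s
        if acc >= r:
            return member
    return population[-1]

def crossover(parent_a, parent_b):
    """Each offspring gene is a weighted sum of the parents' genes,
    with an independent random weight in [0, 1] per gene."""
    child = []
    for ga, gb in zip(parent_a, parent_b):
        w = random.random()
        child.append(w * ga + (1.0 - w) * gb)
    return child

def next_generation(population):
    scores = [fitness(m) for m in population]
    children = [crossover(roulette_pick(population, scores),
                          roulette_pick(population, scores))
                for _ in range(OFFSPRING)]
    everyone = population + children
    everyone.sort(key=fitness, reverse=True)
    return everyone[:POP]     # elitist: best 100 of the 150 survive
```

Because parents compete alongside their offspring for survival, the best member can never get worse between generations, which is what makes the "no improvement over 5 generations" stopping criteria meaningful.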
Note that, for this work, only 6 scenarios must be optimized (4, 6, and 8 agents, at 0.10 and 0.15 m/s), which requires a reasonable amount of time. (The complete process took around 3 days on a personal computer. Considering 15 generations on average for each of the 6 scenarios, with 50 members to be evaluated and 5 simulations with different initial conditions each time, each simulation takes approximately 11.5 s. The time compression of the simulations is thus 600 s/11.5 s ≈ 52.) However, one may argue that if the algorithm is to be used for a broad range of scenario types (with different footprint radii or area sizes, for instance), the optimization process would become unfeasible in terms of computational time. For example, considering three values of the area shape factor (0.3, 0.6, and 0.9) would multiply the number of optimizations by 3. In [17] it is explained in detail how this can be successfully done in a reasonable amount of time, so that the algorithm can adapt to any tuple of scenario variables within a previously known range of values. In Appendix A, a table with the final values of each of the 13 parameters has been presented for each scenario.