This section presents a detailed explanation of the multi-UAV exploration strategy for the time-invariant distribution case; the extension to the time-varying case is provided in the next section. Given the number of UAVs deployed for wildlife monitoring, the proposed exploration strategy determines the trajectory of each UAV k in the team. The OT-based multi-agent exploration strategy is developed under the consideration that the agents have limited energy to carry out the monitoring mission with the given reference spatial distribution. This limited energy also limits the total flight time of the agents, which can be converted into a total number of UAV points for each agent through the specified velocity and discrete-time interval. Here, it is assumed that all agents start with identical energy levels; therefore, the number of UAV points is the same across all agents. Given that agent k has a fixed number of points, each UAV point is assumed to carry a uniform weight at any discrete time. The weight assigned to each UAV point describes the time-averaged behavior of the UAVs.
Similar to the weights of the UAV points, weights are uniformly assigned to the sample points of the given reference distribution: with a given number of sample points, each sample point initially carries an equal weight. Unlike the weights of the UAV points, the weights of the sample points are time-dependent and decrease over time. This is because a sample point located close to a UAV position can be considered visited; hence, the sample point loses its weight (priority) as the UAVs explore the given domain, which is reflected by the time-varying weight. This weight change for the sample points is governed by the weight update law, which will be explained later in detail.
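As a concrete illustration, a minimal sketch of this weight assignment is given below. It assumes each agent carries `n_points` UAV points and the reference distribution is represented by `m` sample points; normalizing each set of weights to sum to one is an assumption made here only for the purpose of the sketch.

```python
import numpy as np

def initialize_weights(n_points: int, sample_points: np.ndarray):
    """Uniform weights for UAV points and sample points (assumed normalization)."""
    uav_point_weight = 1.0 / n_points        # weight of each UAV point, constant over time
    m = sample_points.shape[0]
    sample_weights = np.full(m, 1.0 / m)     # equal initial sample-point weights; these decay as points are visited
    return uav_point_weight, sample_weights

# Example: 200 sample points drawn from a reference distribution over a 100 m x 100 m domain
rng = np.random.default_rng(0)
samples = rng.uniform(0.0, 100.0, size=(200, 2))
w_uav, w_samples = initialize_weights(n_points=500, sample_points=samples)
```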
Consider a team of agents deployed for wildlife monitoring. In the beginning (at the initial time step), all the UAV points of each agent are accumulated at its current position. The UAVs move to new locations in the next discrete-time step based on the proposed exploration strategy (which will be explained later in this section); each of them then leaves one UAV point at its previous location while taking all the remaining UAV points with it to the new location. In this case, each of the previous UAV positions carries the weight of a single UAV point, and the weight at each new UAV position is that of all the remaining points. The schematic for this concept is illustrated in Figure 3.
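For concreteness, one way to write this bookkeeping, assuming each agent carries $n_p$ UAV points whose weights are normalized to sum to one, is the following; the symbols $n_p$ and $t$ are introduced here only for illustration.

```latex
% Illustrative bookkeeping of UAV point weights (n_p and t are introduced only here).
\[
  w_{\text{visited}} = \frac{1}{n_p}
  \qquad\text{and}\qquad
  w_{\text{current}}(t) = \frac{n_p - t}{n_p},
\]
% i.e., one point of weight 1/n_p stays at each previously visited position and the
% remaining n_p - t points travel with the UAV after t steps.
```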
The following assumption is provided to generalize this UAV point update policy.
3.2.1. A Three-Stage Approach
During the monitoring mission, each agent follows a three-stage approach consisting of next goal point determination, weight update, and weight information exchange and update. Each stage is explained in detail as follows.
Given that the agents are located at their current positions at any discrete-time step T, each agent determines its goal position for the next time step as follows. The agent creates a circle centered at its current location with a given initial radius. The radius of the circle is increased incrementally by a radius increment until the agent finds h sample points within the circle. Then, the agent generates all possible trajectories connecting the sample points found in the circle, starting from its current position. To generate these trajectories, each agent builds its own tree structure representing all candidate trajectories formed by connecting the sample points in the circle, starting from the current agent position. For h sample points within the search circle, a total of h! trajectories (one per ordering of the points) can be generated by each agent in the tree structure. A schematic of the process of determining one possible trajectory is illustrated in Figure 4a, and the complete tree structure is presented in Figure 4b.
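A minimal sketch of the expanding-circle search is shown below; the initial radius, the radius increment, and the value of h are illustrative parameters, not the ones used in the paper.

```python
import numpy as np

def find_points_in_circle(agent_pos, sample_points, sample_weights, h, r0=5.0, dr=5.0):
    """Grow a circle around the agent until it contains h sample points with
    positive weight; return the indices of the h closest such points."""
    positive = sample_weights > 0.0
    h = min(h, int(positive.sum()))          # guard: fewer than h positive-weight points may remain
    dist = np.linalg.norm(sample_points - agent_pos, axis=1)
    r = r0
    while np.count_nonzero(positive & (dist <= r)) < h:
        r += dr                              # enlarge the search circle by the radius increment
    inside = np.where(positive & (dist <= r))[0]
    return inside[np.argsort(dist[inside])[:h]]
```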
The sequence of sample points in the lth candidate trajectory can be denoted by its ordered list of sample point indices, where j indicates the sample point index and l is an index that represents a specific candidate trajectory in the tree structure. In the illustrative example provided in Figure 4b, the sequence of sample points in the third trajectory (l = 3) can be read off the corresponding branch of the tree.
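The tree of candidate trajectories is equivalent to enumerating all orderings of the h sample points found in the circle; a minimal sketch of this enumeration, using index tuples to stand in for the branches of the tree, is given below.

```python
from itertools import permutations

def candidate_trajectories(point_indices):
    """Enumerate all h! candidate trajectories over the sample points in the circle.

    Each trajectory is an ordered tuple of sample-point indices, i.e., one branch
    of the tree rooted at the current agent position."""
    return list(permutations(point_indices))

# Example: with h = 3 points found in the circle, 3! = 6 candidate trajectories are generated.
print(candidate_trajectories([7, 12, 30]))
```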
Once the tree is completed, the cost corresponding to each candidate trajectory is calculated, where the cost function is defined in (5) to determine the local-optimal trajectory for the kth agent. The cost in (5) depends on the sample points found within the circle and on the weight information of those sample points known to agent k. The cost function in (5) is defined in this way to ensure that each agent follows a trajectory with a shorter travel length, in terms of the total Euclidean distance, that also connects the sample points with high weights in the circle first, in order to drive the agent towards high-priority sample points.
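The exact form of the cost is given in (5); purely as an illustration, the sketch below uses a simple surrogate of the same flavor, namely the Euclidean length of each leg divided by the weight of the sample point it reaches, so that routes reaching high-weight points via short legs score better.

```python
import numpy as np

def trajectory_cost(agent_pos, trajectory, sample_points, sample_weights, eps=1e-9):
    """Illustrative stand-in for the cost in Eq. (5): total Euclidean leg length,
    with each leg divided by the weight of the sample point it reaches, so that
    short travel toward high-weight (high-priority) points yields a lower cost."""
    cost = 0.0
    prev = agent_pos
    for j in trajectory:                     # visit the sample points in the given order
        leg = np.linalg.norm(sample_points[j] - prev)
        cost += leg / (sample_weights[j] + eps)
        prev = sample_points[j]
    return cost
```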
Given the definition of the h-step trajectory from the current time step to h steps ahead for agent k, the candidate trajectories for the agent can be obtained from the tree structure. From these candidates, the h-step local-optimal trajectory is determined by (6). Each agent takes the first point of the h-step local-optimal trajectory as its goal point for the next time step and then heads toward that location with the given UAV dynamics.
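Combining the previous sketches, the selection in (6) amounts to a minimization of the trajectory cost over all branches of the tree, after which the first sample point of the winning branch becomes the next goal. The sketch below reuses the hypothetical helpers `find_points_in_circle`, `candidate_trajectories`, and `trajectory_cost` defined above, and assumes at least one positive-weight sample point remains.

```python
def next_goal_point(agent_pos, sample_points, sample_weights, h):
    """Pick the next goal as in Eq. (6): enumerate the candidate trajectories,
    score each one with the illustrative cost above, and return the first
    sample point of the minimum-cost trajectory."""
    idx = find_points_in_circle(agent_pos, sample_points, sample_weights, h)
    best = min(candidate_trajectories(idx),
               key=lambda traj: trajectory_cost(agent_pos, traj, sample_points, sample_weights))
    return sample_points[best[0]]            # goal position for the next time step
```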
After arriving at a new location, which may differ from the next goal point, each agent updates its own weight information of the sample points according to the weight update law in (7). In (7), the optimal transport plan for agent k at the current time step describes how the weight at the new agent position is distributed to the sample points. The optimal transport plan is obtained as the solution of the LP problem in (8).
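The precise formulation appears in (8); based on the constraints described next, one consistent reading is the following semi-discrete transport problem, where $\gamma_{kj}$ is the weight sent from the new position of agent $k$ to sample point $j$, $d_{kj}$ is the corresponding Euclidean distance, $\mu_k$ is the weight the agent is allowed to distribute at this step, and $w_j(T)$ is the current weight of sample point $j$; these symbols are introduced only for this sketch.

```latex
% Illustrative reconstruction of the LP in (8); symbols are assumptions (see text).
\begin{align*}
  \min_{\gamma_{k1},\dots,\gamma_{kM}}\; & \sum_{j=1}^{M} d_{kj}\,\gamma_{kj}
      && \text{(total transport cost)} \\
  \text{s.t.}\;\;                        & \gamma_{kj} \ge 0,\quad j=1,\dots,M,
      && \text{(non-negative plan)} \\
                                         & \sum_{j=1}^{M} \gamma_{kj} = \mu_k,
      && \text{(mass conservation)} \\
                                         & \gamma_{kj} \le w_j(T),\quad j=1,\dots,M.
      && \text{(sample-point capacity)}
\end{align*}
```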
The optimal solution of the LP problem (8) specifies how much weight should be distributed from the new position of agent k to each sample point. Although all the current and future UAV points are concentrated at the new agent position, agent k is allowed to distribute only the weight assigned to the current step to the sample points. This is mainly because the future UAV points are still undetermined; therefore, agent k can only distribute the weight of the future UAV points at future time steps. The first constraint in (8) ensures that the transport plan from the new agent position to the sample points takes non-negative values. The second constraint guarantees that the law of mass conservation is satisfied, meaning that the total weight distributed from the new position of agent k and the total weight received by the sample points must be equal. The last constraint guarantees that the transport plan does not exceed the maximum weight capacities of the sample points and of the UAV point. After calculating the optimal solution of (8), the weight of the sample points is updated by agent k using (7).
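Since (8) is a standard linear program, it can be solved with an off-the-shelf solver; the sketch below uses `scipy.optimize.linprog` with the illustrative symbols introduced above (distances as costs, an equality constraint for mass conservation, and per-point capacity bounds).

```python
import numpy as np
from scipy.optimize import linprog

def transport_plan_lp(agent_pos, sample_points, sample_weights, mu_k):
    """Solve the illustrative LP form of (8): move mass mu_k from the agent's
    position to the sample points at minimum total distance, without exceeding
    each sample point's remaining weight (feasible when mu_k <= sum of weights)."""
    d = np.linalg.norm(sample_points - agent_pos, axis=1)       # transport costs
    m = sample_points.shape[0]
    res = linprog(c=d,
                  A_eq=np.ones((1, m)), b_eq=[mu_k],            # mass conservation
                  bounds=[(0.0, w) for w in sample_weights],    # 0 <= gamma_j <= w_j
                  method="highs")
    return res.x                                                # optimal plan gamma
```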
Since the new UAV location for agent k is a single point, the analytical solution of (8) can be obtained by the following proposition.
Proposition 1. The optimal solution of the LP problem (8) is obtained by repeatedly assigning the maximum permissible weight to the closest sample point with positive weight until the weight remaining at the new UAV position becomes zero. Proof. Given the new position of agent k at the current time step, the optimal transport plan for agent k is to deliver the maximum permissible weight to the closest sample points with positive weights, in order of increasing distance, as long as the weight remaining to be distributed is positive. □
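A minimal sketch of this greedy analytical solution is given below; for a single source point it returns the same plan as the LP solver sketched above (variable names are again illustrative).

```python
import numpy as np

def transport_plan_greedy(agent_pos, sample_points, sample_weights, mu_k):
    """Analytical solution of the single-source LP (Proposition 1): fill the closest
    positive-weight sample points first, each up to its remaining weight, until the
    agent's distributable weight mu_k is exhausted."""
    gamma = np.zeros(len(sample_points))
    remaining = mu_k
    order = np.argsort(np.linalg.norm(sample_points - agent_pos, axis=1))
    for j in order:                              # closest sample points first
        if remaining <= 0.0:
            break
        send = min(remaining, sample_weights[j]) # maximum permissible weight for this point
        gamma[j] = send
        remaining -= send
    return gamma
```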
Once the weight update of the sample points is completed by all agents, this information is shared with the central agent, which receives the individual weight information from all agents and transmits a common value back to them at every time step. The update of the common weight is given by (10). This common weight information is transmitted to all agents at each time step. By sharing the common weight information, each UAV knows which areas have already been covered by the other UAVs; thus, the team of UAVs can explore the given spacious domain effectively.
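The precise fusion rule is given by (10); purely as a placeholder assumption, the sketch below uses an element-wise minimum over the agents' individual weight vectors, which is consistent with weights that can only decrease as sample points are visited.

```python
import numpy as np

def fuse_common_weights(individual_weights):
    """Central-agent fusion (placeholder for Eq. (10)): keep, for each sample point,
    the smallest weight reported by any agent, since weights only decrease as the
    corresponding sample points are visited."""
    return np.min(np.vstack(individual_weights), axis=0)

# Example: two agents report their individual weight vectors after a time step
common = fuse_common_weights([np.array([0.0, 0.3, 0.5]), np.array([0.2, 0.3, 0.1])])
print(common)        # -> [0.  0.3 0.1]
```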
3.2.2. Algorithm
The formal algorithm of the OT-based multi-UAV exploration strategy is presented in Algorithm 1.
Algorithm 1 Multi-Agent Exploration Algorithm
1:  initialize the agent positions, the sample points and their initial weights, the number of agents N, the initial search radius and radius increment, the horizon h, and the total mission time
2:  while the mission time has not elapsed do
3:      each agent implements the following
4:      for each agent k do
5:          initialize the circle's radius with the initial radius
6:          while fewer than h sample points with positive weight lie within the circle do
7:              increase the radius by the radius increment
8:          end while
9:          calculate the cost function associated with all possible candidate trajectories
10:         obtain the h-step local-optimal trajectory and the next goal position from (6)
11:         update the UAV position with the given UAV dynamics toward the calculated next goal position
12:         update the individual weight by (7)
13:     end for
14:     the central agent
15:         receives the individual weight information from all agents
16:         updates the common weight from (10)
17:         transmits the common weight to all corresponding agents
18:     each agent receives the common weight from the central agent and
19:         replaces its individual weight with the common weight
20: end while
At the beginning of the exploration, all parameters are initialized as in the first line of Algorithm 1. At each time step, each agent creates a circle centered at the current UAV position and increases the circle radius r by the radius increment until there are h sample points with positive weight in the set of sample points located within the search circle of radius r centered at the current UAV position. Next, a tree structure is generated by each agent for all possible trajectories connecting the positive-weight sample points located in the search circle, starting from the current UAV position. Then, the cost of each trajectory is calculated from (5) and the next goal position is determined using (6). Once the next goal point is determined, the agent heads towards its goal point using its motion controller and moves to a new location. After reaching the new location, each agent distributes its permissible amount of weight to the sample points and updates its weight information using (7). Then, the central agent receives the updated individual weight information from all agents, updates the common weight information from (10), and transmits the common weight information to all agents. These procedures are performed at every time step T until the total mission time is reached.