Article

Cooperative Detection of Multiple Targets by the Group of Mobile Agents

1 Department of Industrial Engineering, Tel-Aviv University, Tel-Aviv 6997801, Israel
2 LAMBDA Laboratory, Tel-Aviv University, Ramat Aviv 6997801, Israel
3 Department of Industrial Engineering, Ariel University, Ariel 40700, Israel
* Author to whom correspondence should be addressed.
Entropy 2020, 22(5), 512; https://doi.org/10.3390/e22050512
Submission received: 26 February 2020 / Revised: 13 April 2020 / Accepted: 28 April 2020 / Published: 30 April 2020
(This article belongs to the Special Issue Applications of Information Theory to Industrial and Service Systems)

Abstract

The paper considers the detection of multiple targets by a group of mobile robots that perform under uncertainty. The agents are equipped with sensors with positive and non-negligible probabilities of detecting the targets at different distances. The goal is to define the trajectories of the agents that lead to the detection of the targets in minimal time. The suggested solution follows the classical Koopman approach applied to an occupancy grid, while the decision-making and control schemes are based on information-theoretic criteria. Sensor fusion within each agent and over the agents is implemented using a general Bayesian scheme. The presented procedures follow the expected information gain approach and utilize the "center of view" and the "center of gravity" algorithms. These methods are compared with a brute-force learning method. The performance of the procedures is analyzed using numerical simulations.

1. Introduction

Methods of search and detection address various problems of finding hidden objects and chasing after targets [1]. Studies in this field were initiated in 1942 as a part of the mission to detect submarines in the Atlantic [2] and were later extended to various applications and scenarios.
In particular, the search problem addresses the activity of the searcher up to capturing the target at its location, and it often results in an optimal search policy or in an effective movement trajectory of the searcher. The detection problem, in contrast, focuses on recognizing the target's location without necessarily reaching it physically, and it usually results in a cost-effective distribution of the search efforts [2]. For an overview of the field and related problems, see, e.g., [3,4,5]. In recent decades, with the development of mobile robots and multi-robot systems, methods of search and detection have been extended to groups of autonomous agents, so current studies also address communication and collective decision-making under uncertainty [6,7].
In this paper, we consider the problem of detecting multiple targets by a group of mobile agents. This problem is a direct extension of the classical Koopman setting, which aims at the detection of hidden objects [2,5,8]. However, in contrast to Koopman's formulation, we assume that the detection process is conducted by a small, finite number of indivisible agents that start in certain locations, move over the domain, and explore it until all the targets are detected. Also, we assume that the agents are equipped with sensors that can detect the existence of a target in certain locations, yet with both false positive and false negative errors. A simple version of this problem was considered in 2012 by Israel et al. [9] in the framework of search in shadowed space. In the same year, Chernikhovsky et al. [10] considered a similar search problem with erroneous sensors and showed that the detection algorithm terminates in a finite number of steps.
The problem of searching for static or moving targets by a finite and usually small number of agents appears in various applications, both military and civil. The standard taxonomies of this problem are often based on the targets' and the search agents' abilities, such as their mobility, their knowledge about the activity of the other party, and their level of cooperation [4,7,11]. Many search algorithms are classified with respect to the optimization principles that govern the motion of the agents in the group. In particular, since a global optimization of the agents' motions requires unreasonable time and computation power, search algorithms are often implemented using different heuristics, mostly informational ones or ones that mimic animal foraging [6,7,12].
In this paper, we consider a detection process under several assumptions that are usually considered separately. Following the basic Koopman formulation, we consider a probabilistic search scheme in which the search agents know only the targets' location probabilities, while Koopman's exponential random search formula, which defines the detection probabilities, is assumed to be applicable. At the same time, we assume that the group of search agents includes a small, finite number of members and that the agents are indivisible; this assumption implies that the problem cannot be treated solely as a search-effort distribution problem, but also requires methods of swarm navigation and control.
In practice, communication among the agents as well as information processing can be organized at different levels: from peer-to-peer networks to a scheme with a central station that receives information from all the agents in the group [6,7]. In the first case, each agent obtains information from its neighbors and makes decisions based on such local information, while in the second case, a central station defines the agents' motion based on global information. Algorithms of swarm control use both approaches, although applications based on a central station are usually restricted by computation power and communication constraints. Also, in most military applications, the use of a central station is challenging for security reasons. In the suggested techniques, we assume the existence of a central station that holds a global probability map, which, on the one hand, allows a theoretical assessment of the effectiveness of the suggested methods and, on the other hand, is required for defining strict termination criteria for the detection process. Nonetheless, as we demonstrate in the paper, the use of a global map by a central station for the navigation of the group of search agents is less effective than the use of local maps accompanied by peer-to-peer communication.
Finally, we continue a line of practical research [10,13], yet in contrast to some known methods of group search and detection, we consider a more realistic situation in which both false positive and false negative detection errors exist. Another practical assumption is that each agent may use a variety of sensors.
In this study, we consider the detection of a number of static targets; however, the developed algorithms can be further modified for the detection of mobile targets, which is out of the scope of this paper. The objective of the presented research is to introduce methods of control of mobile agents acting within a group such that the targets are detected in a minimal period of time. Notice that the agents are not required to catch the targets, that is, to physically reach their locations, but rather to detect the locations of the targets using their on-board sensors.
The suggested solution follows the occupancy grid approach, where the map of the targets’ candidate points is created simultaneously with the detection process and the agents’ motion [14,15]. The implemented sensor-fusion scheme follows a general Bayesian scheme [16] with varying sensitivity of the sensors.
The algorithm implements three different levels of the agent’s knowledge about the targets’ location:
  • A global map that represents the information that is available to the group of agents and is obtained by fusion of information which is available to each agent.
  • A local map that represents the information available to every single agent and is obtained by fusion of information obtained by the agent’s sensors.
  • A sensor map which is obtained by a single sensor.
The above maps are also called probability maps since they represent the information on the targets' locations by a probability distribution, often using a colored heat map to indicate the probability that a target is located in each cell of the map.
The algorithm was implemented with different decision-making objectives based on:
  • The expected information gain by the agent’s next step.
  • The location of the center of view, which indicates a future agent’s location that, given the sensors’ capabilities, is expected to yield a maximal modification of the probability map. Formally, this approach relies on the expected information gain procedure which is applied to the global map instead of focusing on the close neighborhood of the agent.
  • The location of the center of gravity of the map, which is the first moment of the targets’ location probabilities.
In detection by a single agent, it was found that all three procedures provide similar results. However, since the center of view approach uses additional information about the sensors' capabilities, in certain cases it demonstrates better performance than the other two algorithms. Also notice that if the sensors are errorless and identical, then the center of view and the center of gravity approaches result in the same detection times.
In collective detection by a group of agents, it was found that in all three algorithms, the use of an individual local map by each agent results in shorter detection times than when using the global map. Further studies of these scenarios demonstrated that, due to the similar decisions governed by the use of a global map, the agents move toward the same areas instead of dividing their efforts over the space and investigating different areas simultaneously. These results agree with recent theoretical considerations of the altruistic and egoistic behavior of search agents in groups [17] and form a basis for further considerations of the problem of "division of labor" in groups of autonomous agents.
The algorithm was implemented in the Python programming language, and the code can be used directly for solving real-world tasks of target detection by groups of mobile agents.

2. Scenarios of Cooperative Detection

The considered detection problem follows the general Koopman scenario [2] (see also [5,8]) with an additional consideration of the agents' motion toward designated locations. Formally, using the occupancy grid approach [14,15], the problem is defined as follows.
Let $C = \{c_1, c_2, \ldots, c_n\}$ be a finite set of cells such that $C$ represents a grid over a closed two-dimensional domain, and consider a set of $m$, $1 \le m \le n$, mobile agents $A_j$, $j = 1, 2, \ldots, m$, searching for hidden targets in the domain. For simplicity, we assume that each agent, as well as each target, can occupy only a single cell of the grid.
The state of a cell $c_i$, $i = 1, 2, \ldots, n$, is defined as a discrete random variable taking values $s_i = s(c_i) \in \{0, 1\}$, such that $s_i = 0$ implies that the cell $c_i$ does not contain any target, while $s_i = 1$ implies that the cell $c_i$ contains a target. In case we need to stress the time $t$ of the sensing, we will use the notation $s_i^t = s(c_i, t)$; otherwise, we omit it. Note that, at any time $t$ and for each cell $c_i$, these two events are mutually exclusive and exhaustive, i.e.,

$\Pr\{s_i = 0\} + \Pr\{s_i = 1\} = 1$.  (1)
Each agent $A_j$ is equipped with a variety of sensors $𝕤_{jk}$, $k = 1, 2, \ldots, l$, that provide, not necessarily accurate, information about the states of the cells $s_i$, $i = 1, 2, \ldots, n$, relative to the agent's distance, with respect to Koopman's exponential random search formula [2],

$\Pr\{\text{target detected in } c_i \mid \text{target located in } c_i\} = \exp[-\theta(d(c_i, c_j), \tau)]$,  (2)

where $\theta(d(c_i, c_j), \tau)$ represents the search effort applied to the cell $c_i$ with respect to the distance $d(c_i, c_j)$ between this cell and the agent's location $c_j$ over the observation period $\tau$. It is assumed that the detection probability increases as the distance $d(c_i, c_j)$ gets shorter and as the period $\tau$ gets longer.
In order to formalize the possibility of both false positive and false negative detection errors, let us assume that the domain includes both true and dummy targets that broadcast signals indicating their presence in the domain cells. The signals sent by the true targets are considered as true alarms, and the signals that are sent by the dummy targets are considered as false alarms that represent the false-positive errors.
Then, with respect to Koopman’s search Formula (2), the probability of a perceived alarm is defined as follows:
$\Pr\{\text{alarm perceived} \mid \text{alarm sent}\} = \exp[-d(c_i, c_j)/\lambda_{jk}]$,  (3)

where $\lambda_{jk} = \lambda(𝕤_{jk})$ is the sensitivity of the sensor $𝕤_{jk}$ installed on agent $A_j$. The dependence of the detection probability on the observation period $\tau$ is accounted for in the updates of the probability map, as defined below.
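To make the distance dependence concrete, the following minimal sketch evaluates the perception probability of Equation (3); the function and variable names are ours and are not taken from the authors' implementation.

```python
import numpy as np

def perception_probability(dist, sensitivity):
    """Pr{alarm perceived | alarm sent} = exp(-d / lambda), Equation (3).
    `dist` is the cell-to-agent distance d(c_i, c_j); `sensitivity` is lambda_jk."""
    return np.exp(-np.asarray(dist, dtype=float) / float(sensitivity))

# A sensor with sensitivity lambda = 10 perceives an alarm sent from a cell
# 5 distance units away with probability exp(-0.5), approximately 0.61.
print(perception_probability(5.0, 10.0))
```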
The probabilities $p_i = p(c_i) = \Pr\{\text{alarm sent from } c_i\}$ of sending alarms from cells $c_i \in C$, $i = 1, 2, \ldots, n$, are defined by the probability map that represents the information about the targets' locations in the domain. Moreover, we assume that the agents can share information about the targets' locations as they have been perceived by the sensors.
The activity of the agents is outlined as follows. The agents start with some initial probability map
$P(t) = \{p_1(t), p_2(t), \ldots, p_n(t)\}$,

that defines the initial probabilities $p_i(t)$ of detecting the targets in the cells $c_i \in C$, $i = 1, 2, \ldots, n$, at time $t$.
At time $t$, the agents $A_j$, $j = 1, 2, \ldots, m$, are located in the cells $c_j(t)$ and obtain the signals (either true or false alarms) sent from the cells in which targets may be located. The probability of receiving a signal sent from the cell $c_i$ with probability $p_i$ is defined by the Koopman Formula (3).
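As an illustration of this signal-collection step, the sketch below samples which of the sent alarms (true or false) are actually perceived by a sensor; it is an assumed simulation harness, with names and the random seed chosen by us, not the authors' code.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # assumed seed, for reproducibility only

def perceive_signals(alarms_sent, dists, sensitivity):
    """Return a 0/1 array of perceived signals: each sent alarm (true or false)
    reaches the sensor with probability exp(-d / lambda), per Koopman Formula (3)."""
    alarms_sent = np.asarray(alarms_sent, dtype=int)
    p_perceive = np.exp(-np.asarray(dists, dtype=float) / sensitivity)
    return alarms_sent * (rng.random(alarms_sent.shape) < p_perceive).astype(int)

# Three cells: a true alarm nearby, a false alarm far away, and a silent cell.
print(perceive_signals(alarms_sent=[1, 1, 0], dists=[2.0, 15.0, 5.0], sensitivity=10.0))
```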
After receiving the signals, the $j$th agent $A_j$ updates the sensor probability maps, $k = 1, 2, \ldots, l$,

$P^{sensor}(j, k, t) = \{p^{sensor}_{s_1=1}(j, k, t), \; p^{sensor}_{s_2=1}(j, k, t), \ldots, p^{sensor}_{s_n=1}(j, k, t)\}$,

where

$p^{sensor}_{s_i=1}(j, k, t) = \Pr\{\text{sensor } 𝕤_{jk} \text{ identifies a true target in } c_i \text{ at time } t\}$.
The resulting sensor probability maps $P^{sensor}(j, k, t)$, $k = 1, 2, \ldots, l$, are combined into the agent probability map

$P^{agent}(j, t) = \{p^{agent}_{s_1=1}(j, t), \; p^{agent}_{s_2=1}(j, t), \ldots, p^{agent}_{s_n=1}(j, t)\}$,

where the probabilities $p^{agent}_{s_i=1}(j, t)$ of the targets' locations in the cells $c_i$, $i = 1, 2, \ldots, n$, from the agent's point of view, are specified by fusion of the sensors' probability maps $P^{sensor}(j, k, t)$, $k = 1, 2, \ldots, l$.
Finally, a global probability map

$P^{global}(t) = \{p^{global}_{s_1=1}(t), \; p^{global}_{s_2=1}(t), \ldots, p^{global}_{s_n=1}(t)\}$,

which defines the probabilities $p^{global}_{s_i=1}(t)$ of the targets' locations in the cells $c_i$, $i = 1, 2, \ldots, n$, as they are known to the group of agents, is obtained by fusing the agents' probability maps $P^{agent}(j, t)$ over all the agents $A_j$, $j = 1, 2, \ldots, m$.
In the presented algorithms, the probability maps both at the sensors’ level and at the agents’ level are fused using a simple Bayesian scheme. Exact equations for calculating each probability map are presented in the next sections.
The general scenario of the targets’ detection by a group of mobile agents is outlined as follows. At each step, each agent makes a decision regarding its own next movement. The agent’s decision is based on either the local or the global probability maps (or both of them) as obtained at this step.
After making its decision, the agent takes a movement step in the chosen direction. Upon completion of the step and arrival at the required cell, the agent observes the cells of the domain using its sensors, obtains true or false information about the targets' locations, and updates the probability maps accordingly.
Then, the detection process continues in a step-by-step manner, following the updated probability maps and the agents' current locations.
Our goal is to define the trajectories of the agents over the domain such that all the targets are detected in minimal time. Notice again that we do not require the agents to physically arrive at the exact targets' locations, but rather to detect the locations of the targets with some level of certainty.
It is clear that the formulated problem follows the general Koopman scenario [2], as defined in the framework of probabilistic search [5,16]. Since, in the general case, the computational complexity of finding the optimal solution is $O(n^m)$, we are interested in a practically computable near-optimal solution. In the next section, we consider several heuristic approaches and reasonable assumptions that lead to such a solution.

3. Sensor Fusion and Updating Schemes over the Probability Maps

As indicated above, we assume that each agent $A_j$ is equipped with several sensors $𝕤_{jk}$, $k = 1, 2, \ldots, l$, that independently provide, not necessarily accurate, information about the cells' states $s_i$, $i = 1, 2, \ldots, n$. In the used framework of the occupancy grid, sensor fusion is conducted as follows.
Let, for example, $𝕤_{j1}$ and $𝕤_{j2}$ be two independent sensors installed on agent $A_j$, and let $\tilde{s}_{j1}(c_i, t)$ and $\tilde{s}_{j2}(c_i, t)$ be the signals obtained by these sensors at time $t$. Then, the probability that the target is located in the cell $c_i$, that is, that the state is $s_i(t) = 1$, is defined by the Bayes rule as follows (see also [15]):

$\Pr\{s_i(t) = 1 \mid \tilde{s}_{j1}(c_i, t) = 1, \tilde{s}_{j2}(c_i, t) = 1\} = \dfrac{\Pr\{\tilde{s}_{j2}(c_i, t) = 1 \mid s_i(t) = 1\} \times \Pr\{s_i(t) = 1 \mid \tilde{s}_{j1}(c_i, t) = 1\}}{\sum_{s_i(t)} \Pr\{\tilde{s}_{j2}(c_i, t) = 1 \mid s_i(t)\} \times \Pr\{s_i(t) \mid \tilde{s}_{j1}(c_i, t) = 1\}}$,

where the sum is taken over all possible values of $s_i(t)$; in the considered case, these values are $s_i(t) \in \{0, 1\}$.
An extension of this equation to the $l$ onboard sensors of agent $A_j$ results in the probabilities

$p^{agent}_{s_i=1}(j, t) = \dfrac{\prod_{k=1}^{l} p^{sensor}_{s_i=1}(j, k, t)}{\prod_{k=1}^{l} p^{sensor}_{s_i=1}(j, k, t) + \prod_{k=1}^{l} \big(1 - p^{sensor}_{s_i=1}(j, k, t)\big)}$,  (5)

of the targets' locations in the cells $c_i$, $i = 1, 2, \ldots, n$, as determined by the agent $A_j$ using its sensors. This equation is based on the approach known as the "independent opinion pool" [15], under the assumption that the sensors are conditionally independent and that their reliabilities and accuracies are equivalent.
Similarly, the location probabilities of the different agents can be fused into the global probabilities

$p^{global}_{s_i=1}(t) = \dfrac{\prod_{j=1}^{m} p^{agent}_{s_i=1}(j, t)}{\prod_{j=1}^{m} p^{agent}_{s_i=1}(j, t) + \prod_{j=1}^{m} \big(1 - p^{agent}_{s_i=1}(j, t)\big)}$,  (6)

of the targets' locations in the cells $c_i$, $i = 1, 2, \ldots, n$, as determined by the group of agents. Notice that such a definition requires a central unit that receives data from each agent and computes the global probability map using the probabilities obtained from all the agents.
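Both fusion steps, over the sensors of one agent (Equation (5)) and over the agents (Equation (6)), reduce to the same cell-wise "independent opinion pool" rule, which the following sketch implements; the function name and the toy numbers are ours.

```python
import numpy as np

def independent_opinion_pool(prob_maps):
    """Fuse several occupancy probability maps cell by cell (Eqs. (5) and (6)):
    fused p = prod(p_k) / (prod(p_k) + prod(1 - p_k))."""
    maps = np.asarray(prob_maps, dtype=float)      # shape: (number_of_maps, n_cells)
    num = np.prod(maps, axis=0)                    # product of the "occupied" probabilities
    return num / (num + np.prod(1.0 - maps, axis=0))

# Two sensor maps over three cells; agreement on the middle cell strengthens the belief.
sensor_maps = [[0.2, 0.7, 0.5],
               [0.3, 0.8, 0.5]]
agent_map = independent_opinion_pool(sensor_maps)
print(agent_map)   # approximately [0.097, 0.903, 0.5]

# The same function fuses the agents' maps into the global map (Equation (6)).
```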
The presented equations use the probabilities $p^{sensor}_{s_i=1}(j, k, t)$ of the targets' locations in the cells $c_i$, $i = 1, 2, \ldots, n$, as determined by the sensors $𝕤_{jk}$, $k = 1, 2, \ldots, l$, installed on the agents $A_j$, $j = 1, 2, \ldots, m$. These probabilities form the sensor probability maps $P^{sensor}(j, k, t)$, which are updated as follows.
At the initial time $t = 0$, the probabilities $p^{sensor}_{s_i=1}(j, k, t)$ are specified with respect to some initial distribution; if no information is available, these probabilities can be drawn from a uniform distribution. Then, these probabilities are updated using the Bayesian approach as follows.
As indicated above, let $\tilde{s}_{jk}(c_i, t)$ be the signal obtained about cell $c_i$ by the sensor $𝕤_{jk}$ of agent $A_j$ at time $t$. Recall that in the considered scenario, $\tilde{s}_{jk}(c_i, t) = 1$ implies that cell $c_i$ is occupied by a target, while $\tilde{s}_{jk}(c_i, t) = 0$ implies that cell $c_i$ is empty, both according to the sensor $𝕤_{jk}$.
Then, the state probabilities of cell $c_i$, updated by the sensor outputs, are:
  • if a signal is perceived, that is, $\tilde{s}_{jk}(c_i, t) = 1$, then, considering that the static target was located in the cell at time $t-1$, the true positive probability is

    $p^{sensor}_{s_i=1}(j, k, t) = \Pr\{s_i(t) = 1 \mid \tilde{s}_{jk}(c_i, t) = 1\} = \dfrac{\Pr\{s_i(t-1) = 1\} \times \Pr\{\tilde{s}_{jk}(c_i, t) = 1 \mid s_i(t) = 1\}}{\sum_{s_i(t)} \Pr\{s_i(t-1)\} \times \Pr\{\tilde{s}_{jk}(c_i, t) = 1 \mid s_i(t)\}}$,  (7)

  • otherwise, when $\tilde{s}_{jk}(c_i, t) = 0$, the false negative probability is

    $p^{sensor}_{s_i=1}(j, k, t) = \Pr\{s_i(t) = 1 \mid \tilde{s}_{jk}(c_i, t) = 0\} = \dfrac{\Pr\{s_i(t-1) = 1\} \times \Pr\{\tilde{s}_{jk}(c_i, t) = 0 \mid s_i(t) = 1\}}{\sum_{s_i(t)} \Pr\{s_i(t-1)\} \times \Pr\{\tilde{s}_{jk}(c_i, t) = 0 \mid s_i(t)\}}$.  (8)
These equations define an updating scheme of the probability map of the sensor given new observations. They include the probability $\Pr\{\tilde{s}_{jk}(c_i, t) = 1 \mid s_i(t) = 1\}$ that the sensor perceives a signal given that the target is in the cell $c_i$, and the probability $\Pr\{\tilde{s}_{jk}(c_i, t) = 0 \mid s_i(t) = 1\}$ that the sensor does not perceive a signal from the cell $c_i$ given that the target is in that cell.
In order to define these probabilities, denote by $\tilde{a}(c_i, t)$ an alarm signal that is sent about cell $c_i$ at time $t$. The value $\tilde{a}(c_i, t) = 1$ implies, truly or not, that the cell is occupied, and the value $\tilde{a}(c_i, t) = 0$ implies, truly or not, that the cell is empty. Then, implementing Koopman's Formula (3), one obtains the following:

$\Pr\{\tilde{s}_{jk}(c_i, t) = 1 \mid s_i(t) = 1\} = \Pr\{\tilde{a}(c_i, t) = 1 \mid s_i(t) = 1\} \times \exp[-d(c_i, c_j)/\lambda_{jk}]$,  (9)

$\Pr\{\tilde{s}_{jk}(c_i, t) = 1 \mid s_i(t) = 0\} = 1 - \Pr\{\tilde{s}_{jk}(c_i, t) = 1 \mid s_i(t) = 1\}$,  (10)

where, as above, $d(c_i, c_j)$ is the distance between the cell $c_i$ and the agent's location $c_j$, and $\lambda_{jk} = \lambda(𝕤_{jk})$ is the sensitivity of the sensor $𝕤_{jk}$ installed on the agent $A_j$.
These equations enable calculating the occupancy probabilities at each time $t$, given the probabilities at the previous time $t-1$ and the information obtained by the sensors at time $t$. As indicated above, at the initial time $t = 0$, the probabilities are defined based on topographic data and prior information or, in the worst case, can be specified by a uniform distribution over the occupancy grid.
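A minimal sketch of this update step is given below. It applies Equations (7)–(10) to a whole sensor map at once; the parameter p_alarm_if_target stands for the factor Pr{alarm sent | target present} that appears in Equation (9), and all names and defaults are our assumptions rather than the authors' code.

```python
import numpy as np

def update_sensor_map(prior, signals, dists, sensitivity, p_alarm_if_target=1.0):
    """One Bayesian update of a sensor probability map (Equations (7)-(10)).

    prior   : Pr{s_i = 1} for every cell at time t-1
    signals : 1 where the sensor perceived a signal at time t, 0 elsewhere
    dists   : distance from the agent's current cell to every cell
    """
    prior = np.asarray(prior, dtype=float)
    signals = np.asarray(signals, dtype=int)
    dists = np.asarray(dists, dtype=float)

    # Pr{signal perceived | target in the cell}: Equation (9), using Formula (3).
    p_sig_given_target = p_alarm_if_target * np.exp(-dists / sensitivity)
    # Pr{signal perceived | cell empty}: defined as the complement, Equation (10).
    p_sig_given_empty = 1.0 - p_sig_given_target

    # Likelihood of the actual observation (signal or no signal) under each hypothesis.
    like_target = np.where(signals == 1, p_sig_given_target, 1.0 - p_sig_given_target)
    like_empty = np.where(signals == 1, p_sig_given_empty, 1.0 - p_sig_given_empty)

    # Bayes rule, cell by cell (Equations (7) and (8)).
    return prior * like_target / (prior * like_target + (1.0 - prior) * like_empty)
```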
The above defined process of sensors’ fusion is illustrated in Figure 1.
The sensors receive signals $\tilde{s}_{jk}(c_i, t)$ from the environment. Part of these signals are positive signals from the targets, indicating the real locations of the targets, while others are false alarms (i.e., false positive errors) that correspond to false locations of the targets. Based on the received signals, a local sensor map is created for each sensor (see Equations (7) and (8)). Then, each agent integrates its sensor maps into a local agent map (see Equation (5)). Finally, a global map is created by integrating the agent maps (see Equation (6)). Such a hierarchical structure allows us to consider the maps of each level separately and, consequently, to define a more effective calculation process that uses only the maps required for the current computations.

4. Agents’ Policies and Decision Making

In this section, we define the behavior of the group of agents taking actions in a gridded domain aiming to detect hidden targets. The agents detect the targets by their on-board sensors such that the sensors can identify the targets from certain nonzero distances. The goal is to define the trajectories of the agents such that they detect the targets in minimal time.
Formally, this problem is defined as follows. Denote by $\tau_j(T) = (c_j(0), c_j(1), \ldots, c_j(T))$ the trajectory of the agent $A_j$ starting from its initial cell $c_j(0)$ and up to the cell $c_j(T)$ occupied at time $T$. Located at cell $c_j(t)$, the agent makes a decision regarding its next location, following a certain policy $\pi_j(P)$ that prescribes how to choose the next cell $c_j(t+1)$ given a probability map $P$. For simplicity and tractability, we assume that the policy $\pi_j(P)$ of each agent $A_j$ does not depend on time and, for any $t$, is specified by the applied probability map $P$ that holds the aggregated information on the locations of the targets as a function of the past movements of the agents. The result of applying the policy $\pi_j(P)$ is an action $𝕒_j(t)$ that controls the agent's movement from the current cell $c_j(t)$ to the next cell $c_j(t+1)$. More precisely, the policy is a function $\pi_j: P \to 𝕒(A_j)$, where $𝕒(A_j)$ is the set of possible actions of the agent $A_j$, and an action is defined by a function $𝕒_j(t): C \to C$ that specifies the choice of the agent's positions. Assuming that the actions provide an unambiguous choice of the agent's cells, the required solution is to define the functions $\pi_j$.
Assume that there are $\xi$ targets, $\xi < n$, distributed over the domain, and recall that the probability $p^{global}_{s_i=1}(t)$ defined by Equation (6) is the probability of detecting a target in the cell $c_i$, $i = 1, 2, \ldots, n$, by the group of agents. Both the number of targets $\xi$ and the global probability $p^{global}_{s_i=1}(t)$ are unknown to the agents; in real situations, the former value either cannot be obtained or requires additional efforts, while the specification of the latter value requires a central unit that obtains data from all the agents, a requirement which can be practically challenging in many applications. However, we use these values as parameters for simulations and sensitivity analysis to demonstrate that better results can be provided by a separate usage of the agents' probability maps, such that the use of a central unit is often unnecessary.
Denote by $T_\theta(p \mid \pi_1(P), \pi_2(P), \ldots, \pi_m(P))$ the time required to detect the target $\theta$, $\theta = 1, 2, \ldots, \xi$, with probability $p$ given the agents' policies $\pi_1(P), \pi_2(P), \ldots, \pi_m(P)$. Then, the goal is to define such policies that result in minimal time for detecting the last target, that is,

$(\pi_1^*(P), \pi_2^*(P), \ldots, \pi_m^*(P)) = \operatorname{argmin}_{(\pi_1(P), \pi_2(P), \ldots, \pi_m(P))} \max_{\theta = 1, 2, \ldots, \xi} T_\theta(p)$.  (11)
Note that this is an NP-hard problem that cannot be solved directly by conventional linear or integer mathematical programming [4,6]. In order to approximate the policies $\pi_1^*(P), \pi_2^*(P), \ldots, \pi_m^*(P)$ in a tractable manner, we evaluate three different approaches:
  • Maximizing the expected information gain ($EIG$) locally, over the cells that are reachable by each of the agents in a single move;
  • Heading the agents toward the center of view ($COV$), that is, the point that provides the maximal expected information gain over all the cells in the domain;
  • Heading the agents toward the center of distribution, also known as the center of gravity ($COG$), which is defined by the first moment of the probability map.
Notice that the last approach is a greedy heuristic that requires minimal computation efforts, while the first two approaches are more complicated heuristics that require the computation of the possibilities in the local or the global neighborhoods of each agent.
For the purpose of comparison, we also consider a case in which static agents remain in their initial places, and the case of an agent that accumulates signals received from the targets while being governed by the brute-force learning rule. We apply the latter to a single agent only, since this case is extremely demanding computationally.
The expected information gain $EIG(j, k, t)$ for each sensor $𝕤_{jk}$ of the agent $A_j$ at time $t$ is defined by the Kullback–Leibler (KL) divergence between the sensor probability map $P^{sensor}(j, k, t \mid 𝕒_j)$, obtained after executing a chosen action $𝕒_j$ by the agent $A_j$, and the sensor probability map $P^{sensor}(j, k, t \mid O)$, obtained without executing any action, where such a null action is denoted by $O$. Then, the expected information gain $EIG(j, k, t)$ is defined as

$EIG(j, k, t) = D_{KL}\big(P^{sensor}(j, k, t \mid 𝕒_j) \,\|\, P^{sensor}(j, k, t \mid O)\big)$,  (12)

where $D_{KL}(p \| q) = \sum_x p(x) \log\big(p(x)/q(x)\big)$ and the logarithm is taken to base $2$; thus, the distance $D_{KL}$ is represented by the average number of bits. Certainly, instead of the KL distance, which is a pseudo-metric in the space of probability distributions, other information-theoretic measures can be used. In particular, the $EIG$ can be defined by the Jensen–Shannon divergence, that is, the average of the KL distances $\tfrac{1}{2} D_{KL}(p \| M) + \tfrac{1}{2} D_{KL}(q \| M)$, $M = \tfrac{1}{2}(p + q)$, or by the use of the Ornstein or Rokhlin metrics (for the application of such metrics to search problems, see [4]). Nevertheless, here we use the conventional definition of the $EIG$, and since the heuristics do not need the metric properties of the space of probability distributions, we do not require the distance function to be a formal metric.
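The divergence itself is a one-line computation. The sketch below implements the KL distance in bits and, for completeness, the Jensen–Shannon alternative mentioned above; a small epsilon guards against zero probabilities, which is our own implementation choice.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """D_KL(p || q) = sum_x p(x) * log2(p(x) / q(x)), in bits (Equation (12))."""
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    return float(np.sum(p * np.log2(p / q)))

def js_divergence(p, q):
    """Jensen-Shannon divergence: the symmetrized average of two KL distances."""
    m = 0.5 * (np.asarray(p, dtype=float) + np.asarray(q, dtype=float))
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)
```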
At the agent's level, the expected information gain $EIG(j, t)$ is defined by the sum of the expected information gains of its sensors $𝕤_{jk}$, that is,

$EIG(j, t) = \sum_{k=1}^{l} EIG(j, k, t)$.  (13)

Similarly, for the group of $m$ agents, the expected information gain $EIG(t)$ at time $t$ is

$EIG(t) = \sum_{j=1}^{m} EIG(j, t)$.  (14)
Notice that instead of calculating the KL distances for each agent based on its own map, as well as calculating the global probability map over all the agents (Equations (5) and (6), respectively), the expected information gains of the higher levels are calculated as the sums of the expected information gains of the lower levels. Thus, the $EIG$ of the group is calculated as the sum of the $EIG$s of the agents, and the $EIG$ of an agent is calculated as the sum of the $EIG$s of its sensors. Such a definition follows the additive property of information and leads to essentially simpler computations.
Using the $EIG$s, the agent's decision-making follows the maximization of the $EIG$ measure, that is,

$𝕒^*(t) = \operatorname{argmax}_{𝕒_j} EIG(t)$.  (15)
For example, while located in any cell $c_i$, an agent can choose one of nine movement possibilities: a step forward, backward, left, right, left-forward, left-backward, right-forward, right-backward, or staying in its current cell. Then, by Equation (15), the agent chooses the movement that results in obtaining the maximum expected information gain about the targets' locations.
The sensor probabilities $p^{sensor}_{s_i=1}(j, k, t \mid 𝕒_j)$ given the agent's action $𝕒_j$ can be defined by using either the global probability map $P^{global}(t)$ or the agent probability map $P^{agent}(j, t)$. In the former case, the sensor probabilities are:

$p^{sensor}_{s_i=1}(j, k, t \mid 𝕒_j) = p^{global}_{s_i=1}(t) \times \Pr\{\tilde{a}(c_i, t) = 1 \mid s_i(t) = 1\} \times \exp[-d(c_i, c(𝕒_j))/\lambda_{jk}]$,  (16)

$p^{sensor}_{s_i=1}(j, k, t \mid O) = p^{global}_{s_i=1}(t) \times \Pr\{\tilde{a}(c_i, t) = 1 \mid s_i(t) = 1\} \times \exp[-d(c_i, c(O))/\lambda_{jk}]$,  (17)

while in the latter case these probabilities are defined as follows:

$p^{sensor}_{s_i=1}(j, k, t \mid 𝕒_j) = p^{agent}_{s_i=1}(j, t) \times \Pr\{\tilde{a}(c_i, t) = 1 \mid s_i(t) = 1\} \times \exp[-d(c_i, c(𝕒_j))/\lambda_{jk}]$,  (18)

$p^{sensor}_{s_i=1}(j, k, t \mid O) = p^{agent}_{s_i=1}(j, t) \times \Pr\{\tilde{a}(c_i, t) = 1 \mid s_i(t) = 1\} \times \exp[-d(c_i, c(O))/\lambda_{jk}]$,  (19)

where $c(𝕒_j)$ is the agent's location after conducting the action $𝕒_j$, and $c(O)$ is the agent's location if it decides not to conduct any action, i.e., to stay in its current location.
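Putting Equations (15)–(19) together, a one-step $EIG$ decision can be sketched as follows. The helper kl_divergence is the one defined above; coords is assumed to be an $(n, 2)$ array of cell coordinates, prob_map the corresponding per-cell occupancy probabilities (agent or global), and p_alarm_if_target the factor Pr{alarm sent | target present}; all names are ours.

```python
import numpy as np

# Nine movement alternatives: the eight neighbouring cells and "stay" (the null action O).
MOVES = [(dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1)]

def expected_sensor_map(prob_map, coords, vantage_cell, sensitivity, p_alarm_if_target=1.0):
    """Sensor map expected when observing from `vantage_cell` (Equations (16)-(19))."""
    d = np.linalg.norm(coords - np.asarray(vantage_cell, dtype=float), axis=1)
    return prob_map * p_alarm_if_target * np.exp(-d / sensitivity)

def best_move_by_eig(prob_map, coords, agent_cell, sensitivity, grid_shape):
    """Choose the one-step move with maximal expected information gain (Equation (15))."""
    base = expected_sensor_map(prob_map, coords, agent_cell, sensitivity)  # null action O
    best_move, best_gain = (0, 0), -np.inf
    for dx, dy in MOVES:
        cand = (agent_cell[0] + dx, agent_cell[1] + dy)
        if not (0 <= cand[0] < grid_shape[0] and 0 <= cand[1] < grid_shape[1]):
            continue  # border cells offer fewer than nine alternatives
        gain = kl_divergence(expected_sensor_map(prob_map, coords, cand, sensitivity), base)
        if gain > best_gain:
            best_move, best_gain = (dx, dy), gain
    return best_move, best_gain
```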
A second approach to govern the agent's action implements the center of view ($COV$) measure, aiming at the grid point that provides the maximal expected information gain over all the cells in the domain. In other words, the difference between the information which the agent obtains in its current position and the information which it expects to obtain while being located at the $COV$ point reaches its maximum. Formally, this means that in contrast to the $EIG$, which is calculated over the neighboring locations (that is, the eight points around the agent's current location and the current point itself), the $COV$ is based on the $EIG$ calculated over all the points in the domain. Thus, instead of calculating the sensor probabilities by Equations (16)–(19) using the distances $d(c_i, c(𝕒_j))$, for the $COV$ calculation the sensor probabilities are defined using the distances $d(c_i, c_\eta)$ between the cell $c_i$ and the other points in the domain, $c_\eta$, $\eta = 1, 2, \ldots, n$, which are considered as candidate locations of the $COV$. In parallel to Equations (16) and (18), in this case we have
$p^{sensor}_{s_i=1, G}(j, k, t \mid c_\eta) = p^{global}_{s_i=1}(t) \times \Pr\{\tilde{a}(c_i, t) = 1 \mid s_i(t) = 1\} \times \exp[-d(c_i, c_\eta)/\lambda_{jk}]$,  (20)

$p^{sensor}_{s_i=1, A}(j, k, t \mid c_\eta) = p^{agent}_{s_i=1}(j, t) \times \Pr\{\tilde{a}(c_i, t) = 1 \mid s_i(t) = 1\} \times \exp[-d(c_i, c_\eta)/\lambda_{jk}]$.  (21)

If the agent chooses to stay in its current location, then the distance is defined as above by $d(c_i, c(O))$, and the sensor probabilities are calculated by Equations (17) and (19).
By the use of these sensor probabilities, the $EIG_\eta$ is defined in parallel to the $EIG$:

$EIG_\eta(j, k, t) = D_{KL}\big(P^{sensor}(j, k, t \mid c_\eta) \,\|\, P^{sensor}(j, k, t \mid O)\big)$,  (22)

$EIG_\eta(j, t) = \sum_{k=1}^{l} EIG_\eta(j, k, t)$,  (23)

$EIG_\eta(t) = \sum_{j=1}^{m} EIG_\eta(j, t)$,  (24)

and the $COV$ is the point at which $EIG_\eta$ reaches its maximum, that is,

$COV(t) = \operatorname{argmax}_{c_\eta} EIG_\eta(t)$.  (25)
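A direct, if brute, implementation of Equation (25) scans all candidate cells; the sketch below reuses the helpers expected_sensor_map and kl_divergence defined earlier and keeps our assumed names.

```python
import numpy as np

def center_of_view(prob_map, coords, agent_cell, sensitivity):
    """COV: the cell c_eta whose expected sensor map differs most (in the KL sense)
    from the map expected under the null action, i.e., staying at `agent_cell`
    (Equations (20)-(25))."""
    base = expected_sensor_map(prob_map, coords, agent_cell, sensitivity)  # null action O
    gains = np.array([
        kl_divergence(expected_sensor_map(prob_map, coords, cell, sensitivity), base)
        for cell in coords
    ])
    best = int(np.argmax(gains))
    return tuple(int(v) for v in coords[best]), float(gains[best])
```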
Finally, for the center of gravity ($COG$), which is the first moment of the probability map, the following calculations are used. In a two-dimensional domain, the location of each cell $c_i$, $i = 1, 2, \ldots, n$, is defined by two coordinates, $c_i = (x_i, y_i)$. In addition, recall that $s_i = s(c_i) \in \{0, 1\}$ stands for the state of the cell $c_i$. Then, the coordinates of the $COG$ along the axes are

$COG_x(t) = \sum_{i=1}^{n} x_i \times p^{global}_{s_i=1}(t) \Big/ \sum_{i=1}^{n} p^{global}_{s_i=1}(t)$,  (26)

$COG_y(t) = \sum_{i=1}^{n} y_i \times p^{global}_{s_i=1}(t) \Big/ \sum_{i=1}^{n} p^{global}_{s_i=1}(t)$,  (27)

and the final location of the $COG$ is obtained by rounding the values $COG_x(t)$ and $COG_y(t)$ to the closest integers, that is,

$COG(t) = \big([COG_x(t)], [COG_y(t)]\big)$.  (28)

Notice that since we consider only the states $s_i = 1$, the sum of the probabilities in the denominator differs from unity and varies with time and with the number of targets.
Since both here and in the previous case the desired points $COG(t)$ and $COV(t)$ can be located far from the agent's current location, the agent moves toward these points step by step and changes its direction according to the changes of the coordinates of $COG(t)$ and $COV(t)$.
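The $COG$ target point of Equations (26)–(28) is a weighted average of the cell coordinates; a short sketch, with names of our choosing, is given below.

```python
import numpy as np

def center_of_gravity(prob_map, coords):
    """COG: first moment of the occupancy probabilities, rounded to the nearest
    grid cell (Equations (26)-(28))."""
    w = np.asarray(prob_map, dtype=float)
    xy = np.asarray(coords, dtype=float)
    cog = (w[:, None] * xy).sum(axis=0) / w.sum()   # (COG_x, COG_y)
    return tuple(int(round(v)) for v in cog)

# Example: three cells on a line; the cell with the highest probability pulls the COG.
print(center_of_gravity([0.1, 0.2, 0.7], [(0, 0), (1, 0), (2, 0)]))   # (2, 0) after rounding
```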

5. Policies Control and Brute Force Learning

In order to control the obtained policies, we apply a simple look-backward method. Based on this method, the control of the agents' policies is conducted as follows.
Let the global probability map at time $t-1$ be $P^{global}(t-1)$, and assume that, following the chosen policies $\pi_j(P^{global}(t-1))$, $j = 1, 2, \ldots, m$, the agents made their decisions and conducted the corresponding actions. Then, at time $t$, each of the agents is located in a new cell, and following the observations from these cells, the global probability map $P^{global}(t)$ is constructed. The value

$V_\pi(t) = D_{KL}\big(P^{global}(t) \,\|\, P^{global}(t-1)\big)$,  (29)

is the actual information gain that was obtained by the actions defined by the policy $\pi = (\pi_1, \pi_2, \ldots, \pi_m)$.
The defined decision-making process and control method are illustrated in Figure 2.
Based on the policy $\pi$, at time $t$ each agent makes a decision regarding its action. After the action and the corresponding movement, the agent observes the environment, and following the obtained sensor maps, the agent map and the global map are refined. In parallel, in order to control the agent's policy, the value of the information gain $V_\pi(t)$ is calculated.
The calculated information gain $V_\pi(t)$ indicates the efficiency of the applied policy and is used as a comparative measure of the agents' decisions: its value accumulated up to some time $T$ gets larger as the agents' policies become more efficient in terms of the quantity of information obtained about the targets' locations. Accordingly, the best policy can be defined as follows:
$\pi^*(T) = \operatorname{argmax}_{\pi} \sum_{t=1}^{T} V_\pi(t)$,  (30)

where $\pi^*(T)$ denotes the best policy among all the combinations of the agents' policies $\pi_j$, $j = 1, 2, \ldots, m$, that can be applied to the available global maps up to time $T$.
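In code, the look-backward control value of Equation (29) and its accumulation in Equation (30) reduce to one KL evaluation per step; the sketch reuses kl_divergence from above, and the names are ours.

```python
def actual_information_gain(global_map_t, global_map_prev):
    """V_pi(t) = D_KL(P_global(t) || P_global(t-1)), Equation (29)."""
    return kl_divergence(global_map_t, global_map_prev)

def accumulated_information_gain(global_maps):
    """Sum of V_pi(t) over t = 1, ..., T for a sequence of global maps (Equation (30))."""
    return sum(actual_information_gain(curr, prev)
               for prev, curr in zip(global_maps[:-1], global_maps[1:]))
```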
The learning measure is also based on the actual information gain, but in this case, it is defined over the actions $𝕒 = (𝕒_1, 𝕒_2, \ldots, 𝕒_m)$ that were chosen by the agents, namely:

$V_𝕒(t) = D_{KL}\big(P^{global}(t \mid 𝕒) \,\|\, P^{global}(t \mid O)\big)$,  (31)

where $P^{global}(t \mid 𝕒)$ stands for the global probability map obtained after performing the actions of all the agents, and $P^{global}(t \mid O)$ is the global probability map obtained if all the agents stay at their current locations.
The selection of actions is conducted as follows. Assume that at time $t$ the agents are at their locations and observe a certain global probability map $P^{global}(t)$. Then, for each combination $𝕒 = (𝕒_1, 𝕒_2, \ldots, 𝕒_m)$ of their actions and for the null action $O = (O_1, O_2, \ldots, O_m)$, the global maps $P^{global}(t \mid 𝕒)$ and $P^{global}(t \mid O)$ are obtained, and the information gain $V_𝕒(t)$ for the combination $𝕒$ is specified. The best combination of actions is defined by the maximal value of $V_𝕒(t)$, that is,

$𝕒^*(t) = \operatorname{argmax}_{𝕒} V_𝕒(t)$.  (32)
It is clear that this is brute-force learning, which requires a consideration of all possible combinations of actions of all the agents, with a large number of iterations. Thus, in practice, such a policy cannot be performed for a large number of agents and actions. However, for a single agent, this learning step can be performed in a relatively short time with available computation resources.
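The enumeration itself is short to express, even though its cost grows as $9^m$ for $m$ agents with nine moves each; the sketch below assumes a caller-supplied evaluate_joint callback that builds $P^{global}(t \mid 𝕒)$ and returns $V_𝕒(t)$ of Equation (31), and it reuses the MOVES list defined earlier. All names are our own.

```python
import itertools

def brute_force_joint_action(num_agents, evaluate_joint):
    """Brute-force learning step (Equations (31) and (32)): enumerate every
    combination of per-agent moves and keep the one with the maximal gain V_a(t)."""
    best_combo, best_gain = None, float("-inf")
    for combo in itertools.product(MOVES, repeat=num_agents):
        gain = evaluate_joint(combo)   # assumed callback: D_KL(P_global(t|a) || P_global(t|O))
        if gain > best_gain:
            best_combo, best_gain = combo, gain
    return best_combo, best_gain
```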
In the considered work, the brute force learning is used as a reference for the evaluation of the suggested methods.

6. Numerical Simulations and Analysis

Numerical simulations were implemented using the Python programming language and were executed on a regular PC with an Intel i5-8265U processor. In all the cases, unless specified otherwise, the run times of the algorithms are indicated by the numbers of iterations. Also, in all the tables, the policies are measured with respect to the defined approaches: expected information gain ($EIG$), center of view ($COV$), and center of gravity ($COG$), as presented in Section 4.
In the simulations, the search is conducted over a gridded square domain of size $n = n_x \times n_y$ cells, and it is assumed that each agent and each target can occupy only one cell of the domain. Different setups included different numbers $m$ of agents and different numbers $\xi$ of targets. Also, we assume that there are two types of sensors, and each agent $A_j$ is equipped with two sensors $𝕤_{j1}$ and $𝕤_{j2}$ of different types with corresponding sensitivities $\lambda_{j1}$ and $\lambda_{j2}$. Both true alarms and false alarms are sent with respect to the sensors' types and are perceived separately by each of the two sensor types.
Following the goal of finding such policies that result in a minimal time of detecting the last target (see Equation (11)), we determined the maximal simulation time by

$T_{\max}(p) = \max_{\theta = 1, 2, \ldots, \xi} T_\theta(p)$,  (33)

where, as indicated above, $T_\theta(p)$ is the time required to detect the target $\theta$, $\theta = 1, 2, \ldots, \xi$, with probability $p$. Then, the policies that result in a minimal time $T_{\max}(p)$ given probability $p$ are the best policies. In the simulations, we used the probability $p = 0.95$.
To reduce systematic errors, the results presented below were obtained by averaging the outcomes of five repeated trials, each of which contained thirty sessions, executed with the same parameters, with the same initial locations of both the searchers and the targets. The true and false alarms in the sessions were generated by the same uniform distribution with a random seed for each trial.

6.1. Detection by a Single Agent

Let us start with a small illustrative example of detection by a single agent $A_1$. In order to simulate the brute-force learning defined by Equations (31) and (32), we considered a small domain of size $n = 20 \times 20 = 400$ cells. The broadcast of false alarms was distributed uniformly over the domain, and the frequency of sending false alarms from all 400 cells was 100 false alarms per second for each type of sensor, that is, on average $1/4$ alarms per second from each cell for each type of sensor. The sensitivities of the sensors are $\lambda_{11} = \lambda_{12} = 10$.
In the first setting, the single agent $A_1$ was detecting $\xi = 3$ targets located in the cells with coordinates $c_1 = (11, 16)$, $c_2 = (0, 14)$, and $c_3 = (7, 1)$; the starting position of the agent was $c(0) = (20, 8)$. The results of the simulation trials are summarized in Table 1.
As expected, the best result, which leads to the minimum of $T_{\max}(0.95)$, is obtained by the brute-force learning, while the times obtained by the policies based on the expected information gain ($EIG$), the center of view ($COV$), and the center of gravity ($COG$) are close to this best result. Notice that since the detection is conducted by a single agent, the results obtained by the $COV$ and $COG$ policies are equal.
Figure 3 illustrates the activity of a single agent detecting three targets using the center of view ($COV$) policy.
The values of the accumulated information gain (see Equation (30)),

$V_\pi(T) = \sum_{t=1}^{T} V_\pi(t)$,

which characterizes the effectiveness of a policy, are presented in Table 2 for the simulated policies. The time $T = 17$ is the minimal time of detecting the last target based on the brute-force policy.
With respect to the detection time, the maximal accumulated information gain $V_\pi(T)$ is obtained by the brute-force learning policy, while the $EIG$, $COV$, and $COG$ policies result in accumulated information gains that are close to the maximum.
In the next simulation, a single agent $A_1$ detects $\xi = 5$ targets located in the cells with the following coordinates: $c_1 = (11, 16)$, $c_2 = (0, 14)$, $c_3 = (7, 1)$, $c_4 = (5, 3)$, and $c_5 = (11, 15)$; the starting position of the agent is $c(0) = (19, 8)$. The results of the simulation trials are summarized in Table 3.
Similar to the previous case, the best result is obtained by the brute-force learning, while the times obtained by the policies based on the center of view ($COV$) and the center of gravity ($COG$) are close to this best result. However, for this case with a larger number of targets, one can already notice that the policy based on the expected information gain ($EIG$) is worse than the other three policies.
The relations between the values of the accumulated information gain in this case are again similar to those in the case of detecting three targets. The best result is provided by the brute-force learning policy; the results of the $COV$ and $COG$ policies are close to those of the brute force, while the $EIG$ policy is less effective. The worst results are obtained by the static agent.

6.2. Detection by Multiple Agents

Now let us consider detection by a group of agents that can share information and, consequently, use each other's probability maps as well as a global probability map, which represents the knowledge of the group.
Since in this case we did not consider the brute-force learning, in the simulations we used a larger domain of size $n = 40 \times 40 = 1600$ cells. As mentioned above, the broadcast of false alarms was distributed uniformly over the domain; however, the frequency of sending false alarms from all 1600 cells was 400 false alarms per second for each type of sensor, that is, $1/4$ alarms per second on average from each cell for each type of sensor. It is assumed that each agent $A_j$ is equipped with sensors of two types, whose sensitivities are denoted by $\lambda_{j1}$ and $\lambda_{j2}$.
Recall that located in a cell, each agent can choose one of nine alternatives, while if the agent is located at the border of the domain, then the number of alternatives is smaller due to boundary conditions of the map.
In addition to the considered policies, which are based on (i) the expected information gain ($EIG$), (ii) the center of view ($COV$), and (iii) the center of gravity ($COG$), in the following simulations we also distinguish decision-making under several scenarios: (i) relying on the agent probability map versus (ii) relying on the global probability map, as well as (i) selection of actions by each agent separately or (ii) selection of actions mutually by all the agents in the group. The use of the maps and the selection of the actions for the different policies are summarized in Table 4.
The table reflects the fact that when applying the $EIG$ policy, from its current cell the agent can either move to one of the eight neighboring cells or stay in its current location; when applying the $COV$ policy, the agent can choose any one of the $n$ cells as the desired center of view; and when applying the $COG$ policy, the agent considers only one cell, the $COG$ cell.
For consistency, let us start with the same setting as in the previous simulations, namely the detection of $\xi = 5$ targets located at the cells with coordinates $c_1 = (4, 34)$, $c_2 = (6, 23)$, $c_3 = (37, 3)$, $c_4 = (32, 13)$, and $c_5 = (2, 5)$. These simulations are conducted for $m = 2$ agents, each equipped with two sensors with respective sensitivities $\lambda_{j1} = \lambda_{j2} = 10$, $j = 1, 2$. The initial positions of the agents are $c_1(0) = (25, 3)$ and $c_2(0) = (20, 9)$. The results of the simulations when using different policies are presented in Table 5 (cf. Table 3).
Figure 4 illustrates the activity of two agents detecting five targets when using the expected information gain ($EIG$) with the agent map/agent action policy.
As expected, the worst results are obtained by the static agents. The best results are provided by the policies based on the center of view ($COV$) and the center of gravity ($COG$); as indicated above, these policies result in the same detection times. Finally, the $EIG$ policy, as seen above, results in a longer search than the $COV$ and $COG$ policies, yet still performs better than the static agent policy.
In addition, notice that the decision-making policies that are based on the agent map provide significantly better results than the policies based on the global map. In other words, in the detection tasks, more information is not always better, unless actions between the agents can be synchronized. The reason for this result is the following. Relying on a single global map, all agents aim at the same preferable regions with the higher probabilities of detecting the targets while ignoring the regions with the lower probabilities. However, because of the existence of both false positive and false negative errors, the targets can appear in those ignored regions, to which the agents return only after the unsuccessful detection in the preferable regions. All these movements waste a lot of time. In contrast, while using the agent maps, each agent considers its local region and continues to the other regions only after unsuccessful detection in its close neighborhood. In such a manner, the agents divide the task and conduct the detection process in parallel in different regions. At the same time, the global map is used for terminating the detection for all the agents.
Finally, notice that a better choice of actions is provided by applying the group action. However, since it requires strong computation power without a significant improvement of the detection time, this approach is less attractive for practical tasks.
The accumulated information gain for this simulated detection of $\xi = 5$ targets by $m = 2$ agents is presented in Table 6; the CPU times required for such detection are presented in Table 7.
The obtained results support the previous observations. The highest accumulated information gain is obtained by the $COV$ and $COG$ policies, the $EIG$ policy obtains worse results, and the lowest gain is obtained by the group of static agents. Also, better results are achieved by the decision-making policies based on the agent maps, and a better choice of actions is provided by the use of the group action.
In order to examine the relation between the detection efficiency and the sensors' sensitivity, a similar detection scenario was simulated for agents equipped with sensors of different sensitivities, $\lambda_{1k} = 12$ and $\lambda_{2k} = 8$, $k = 1, 2$. Table 8 presents the detection times $T_{\max}(0.95)$ and the accumulated information gains $V_\pi(100)$ for this scenario.
A comparison of the obtained times and information gains with the results presented in Table 5 and Table 6, respectively, shows that, in general, the change of the sensors' sensitivities preserves the already observed trends in the efficiencies of the policies. At the same time, it stresses the advantage of the group action relative to the actions taken by each agent separately.
Finally, let us consider the dependence of the accumulated information gain on the detection time. An example of such a dependence is shown in Figure 5, where we used the results of the previous simulation of detecting $\xi = 5$ targets by $m = 2$ agents with different sensor sensitivities, $\lambda_{1k} = 12$ and $\lambda_{2k} = 8$, $k = 1, 2$, following the $COV$ policy with the agent action choice.
It is seen that in the beginning of the search process, the policy based on the agent map accumulates information faster than the policy based on the global map. However, as the search process continues, the accumulated information gain obtained by the global map policy converges to a value which is significantly greater than the one to which the agent map policy converges.
For validation of the presented results, further simulations were conducted for different settings. The obtained detection times and information gains demonstrate the same trends in the policies' efficiency. In addition, it was found that as the number of agents gets larger, the difference between the best $COV$ policy and the nearly best $COG$ and $EIG$ policies increases.

7. Discussion

The paper presents three heuristic techniques for the navigation of autonomous agents searching for hidden targets in the presence of false positive and false negative errors. Two of these heuristics are based on the expected information gain calculated over the local neighborhood of each agent (the $EIG$ policy) or relative to the center of view (the $COV$ policy), and the third heuristic uses the center of gravity (the $COG$ policy) of the targets' location probabilities. In order to make decisions regarding their next movements, the agents use either their own probability maps or a global probability map.
The simulations show that in short-term detection processes, the policies based on the agent map outperform the policies based on the global map, both when the agents' movements are not centrally synchronized (individual decision-making) and when the agents' actions are fully synchronized (collective decision-making in the group). However, in long-term detections, the information gain accumulated during the search process when using the global map policy was significantly larger than that accumulated when using the agents' maps policy. Probably, the reason for this result is the following: while using the global map, the agents calculate the information gain also taking into account irrelevant information based on the false alarms, whereas while using the agents' maps, the influence of such alarms is lower and, consequently, the accumulated information gain is lower as well.
Detection using the $EIG$ policy demonstrated lower efficiency (in terms of detection time) than the $COV$ and $COG$ policies when using both the agents' and the global probability maps. The main reason for this result is the following. On the one hand, the $EIG$ policy does not always recognize what the next step should be, since the differences between the expected information gain obtained by staying in the current cell and that obtained by moving to a neighboring cell are extremely small and cannot be used for a reasonable selection of the actions. On the other hand, in order to reveal the center of view, the $COV$ policy requires the agent to check all the cells in the domain and, consequently, succeeds in finding a significant change in the expected information gain.
When using sensors with equal sensitivities, the $COV$ and $COG$ policies result in close or even equal detection times. Therefore, since the $COG$ policy has a much lower computational complexity than the $COV$ policy, it should be preferred when the agents are equipped with similar sensors. However, if the sensitivities of the sensors are different, then the $COV$ policy is significantly better.
As expected, a decision-making heuristic that relies on group actions leads to better performance than one that relies on single agents' actions, yet the former requires much greater computation efforts. In order to shorten the running time, the number of calculations can be decreased by using a probability threshold: at each step of the computation, the cells with probabilities lower than the threshold are ignored, so the number of calculations is reduced without significantly influencing the quality of the search results.
Finally, notice that the results obtained using the presented techniques are close to the results obtained by the optimal brute-force learning method. Such a comparison both validates the suggested methods and demonstrates that, for detection over large domains, where due to intractable computational complexity the brute-force learning cannot be used, these methods can provide sub-optimal results with reasonable computation efforts and in reasonable running time.

8. Conclusions

In the paper, we considered the problem of detection of multiple targets by a group of mobile agents, which directly extends the classical Koopman search problem [2]. In contrast to many known algorithms, we addressed detection with both false positive and false negative detection errors.
The suggested solution implements three different levels of the agent’s knowledge about the targets’ locations: information that is available to the group of agents, information available to a single agent, and information obtained by a single on-board sensor of an agent.
For these settings, we considered three decision-making policies based on different considerations of the expected information gain, which can be obtained by the agent in its next step. Namely, the policies considered a local neighborhood of the agent, a location of the “center of view” from which the agents can obtain maximum information using their sensors, and a location of the “center of gravity” of the targets’ probability map.
The results obtained with the suggested policies were compared against the results of the worst policy, in which the agents are static, and, where tractable, against the best policy obtained by brute-force learning.
Simulations of the suggested solutions demonstrate that, among the constructed policies, the best results are obtained by the policy based on the center of view. Close results are provided by the policy based on the center of gravity, and the worst results, though sometimes satisfactory, were achieved by the policy based on the expected information gain over a local neighborhood of the agent.
In addition, it was found that, in the considered problem with both false positive and false negative detection errors, decision-making policies based on the agents' maps provide significantly better results than policies based on the global map.
The best search policy under the considered settings was obtained by relying on group actions, but at the expense of a high computational complexity and without a significant improvement of the detection time relative to the suggested heuristics. This observation makes the group-action approach less attractive to implement.
Finally, it was demonstrated that the policies based on the agents' maps are more effective for detection within a given short period of time, whereas in long-term detection the policies based on the global map result in better outcomes in terms of accumulated information gain.
The constructed algorithms and software can form a basis for further development of the proposed methods, as well as of other methods related to probabilistic search and detection. These methods can be applied directly in various fields, such as smart cities, military applications, and autonomous vehicles.

Author Contributions

Conceptualization, I.B.-G. and E.K.; methodology, B.M.; software, B.M.; formal analysis, B.M. and E.K.; investigation, B.M.; writing—original draft preparation, B.M. and E.K.; writing—review and editing, I.B.-G.; supervision, I.B.-G. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Nahin, P.J. Chases and Escapes: The Mathematics of Pursuit and Evasion; Princeton University Press: Princeton, NJ, USA, 2007.
2. Koopman, B.O. Search and Screening; Operation Evaluation Research Group Report 56; Center for Naval Analysis: Rosslyn, VA, USA, 1946.
3. Frost, J.R.; Stone, L.D. Review of Search Theory: Advances and Applications to Search and Rescue Decision Support; US Coast Guard Research and Development Center: Groton, CT, USA, 2001.
4. Kagan, E.; Ben-Gal, I. Probabilistic Search for Tracking Targets; Wiley & Sons: Chichester, UK, 2013.
5. Stone, L.D. Theory of Optimal Search; Academic Press: New York, NY, USA, 1975.
6. Kagan, E.; Ben-Gal, I. Search and Foraging: Individual Motion and Swarm Dynamics; Taylor & Francis: Boca Raton, FL, USA, 2015.
7. Kagan, E.; Shvalb, N.; Ben-Gal, I. (Eds.) Autonomous Mobile Robots and Multi-Robot Systems: Motion-Planning, Communication and Swarming; Wiley & Sons: Chichester, UK, 2019.
8. Washburn, A.R. Search and Detection; ORSA Books: Arlington, VA, USA, 1989.
9. Israel, M.; Khmelnitsky, E.; Kagan, E. Search for a mobile target by ground vehicle on a topographic terrain. In Proceedings of the 27th IEEE Convention of Electrical and Electronic Engineers in Israel, Eilat, Israel, 14–17 November 2012.
10. Chernikhovsky, G.; Kagan, E.; Goren, G.; Ben-Gal, I. Path planning for sea vessel search using wideband sonar. In Proceedings of the 27th IEEE Convention of Electrical and Electronic Engineers in Israel, Eilat, Israel, 14–17 November 2012.
11. Robin, C.; Lacroix, S. Multi-robot target detection and tracking: Taxonomy and survey. Auton. Robot. 2015, 40, 729–760.
12. Senanayake, M.; Senthooran, I.; Barca, J.C.; Chung, H.; Kamruzzaman, J.; Murshed, M. Search and tracking algorithms for swarms of robots: A survey. Robot. Auton. Syst. 2016, 75, 422–434.
13. Kagan, E.; Goren, G.; Ben-Gal, I. Algorithm of search for static or moving target by autonomous mobile agent with erroneous sensor. In Proceedings of the 27th IEEE Convention of Electrical and Electronic Engineers in Israel, Eilat, Israel, 14–17 November 2012.
14. Elfes, A. Sonar-based real-world mapping and navigation. IEEE J. Robot. Autom. 1987, 3, 249–265.
15. Elfes, A. Occupancy grids: A stochastic spatial representation for active robot perception. In Proceedings of the 6th Conference on Uncertainty in Artificial Intelligence, New York, NY, USA, 27–29 July 1990; pp. 136–146.
16. Stone, L.D.; Barlow, C.A.; Corwin, T.L. Bayesian Multiple Target Tracking; Artech House: Boston, MA, USA, 1999.
17. Hassoun, M.; Kagan, E. On the right combination of altruism and randomness for swarms of homogeneous distributed autonomous agents. Swarm Intell. 2020, to appear.
Figure 1. The process of sensors' fusion.
Figure 2. Schema of the agent's decision-making and control.
Figure 3. Activity of a single agent in the three-targets setting. (a) The targets' positions (red squares) and the agent's initial location (green square). (b) Map of 100 false alarms per second for each type of sensor (white color indicates a false alarm). (c) The agent's trajectory and final position.
Figure 4. Activity of two agents in the five-targets setting. (a) The targets' positions (red squares) and the agents' initial locations (green squares). (b) Map of 400 false alarm signals per second for each type of sensor (white color indicates a false alarm). (c) The agents' trajectories and final positions.
Figure 5. Dependence of the accumulated information gain V_π(T) on the detection time T for the COV policy with agent action choice; l = 5 targets, m = 2 agents, and sensors' sensitivities λ_1k = 12 and λ_2k = 8, k = 1, 2.
Table 1. Times required for the detection of the last among l = 3 targets with probability p = 0.95 by a single agent implementing different policies.

Detection Policy     | First Target | Second Target | Third Target | T_max(0.95)
---------------------|--------------|---------------|--------------|------------
Static agent         | 15           | 72            | 15           | 72
EIG                  | 15           | 19            | 11           | 19
COV                  | 8            | 18            | 15           | 18
COG                  | 8            | 18            | 15           | 18
Brute force learning | 10           | 17            | 14           | 17
Table 2. Accumulated information gain V_π(T) in the detection of l = 3 targets by a single agent for the times T = 10 and T = 17.

Detection Policy     | V_π(T = 10) | V_π(T = 17)
---------------------|-------------|------------
Static agent         | 3.6         | 4.1
EIG                  | 4.2         | 6.1
COV                  | 4.4         | 6.4
COG                  | 4.4         | 6.4
Brute force learning | 4.6         | 6.8
Table 3. Times required for the detection of the last among l = 5 targets with probability p = 0.95 by a single agent implementing different policies.

Detection Policy     | First Target | Second Target | Third Target | Fourth Target | Fifth Target | T_max(0.95)
---------------------|--------------|---------------|--------------|---------------|--------------|------------
Static agent         | 14           | 94            | 29           | 12            | 9            | 94
EIG                  | 9            | 32            | 20           | 13            | 7            | 32
COV                  | 14           | 28            | 22           | 10            | 8            | 28
COG                  | 14           | 28            | 21           | 10            | 8            | 28
Brute force learning | 14           | 25            | 17           | 18            | 11           | 25
Table 4. Number of alternatives in the decision-making and action choice for a group of m agents acting in a domain with n cells.

Decision Making / Action Choice | EIG | COV | COG
--------------------------------|-----|-----|----
Agent map / agent action        | 9m  | mn  | m
Global map / agent action       | 9m  | mn  | 1
Global map / group action       | 9^m | n^m | 1
Table 5. Times required for the detection of the last among l = 5 targets with probability p = 0.95 by m = 2 agents implementing different policies.

Detection Policy               | First Target | Second Target | Third Target | Fourth Target | Fifth Target | T_max(0.95)
-------------------------------|--------------|---------------|--------------|---------------|--------------|------------
Static agents                  | 400          | 90            | 47           | 27            | 70           | 400
EIG, agent map / agent action  | 95           | 37            | 51           | 18            | 60           | 95
EIG, global map / agent action | 143          | 123           | 53           | 28            | 109          | 143
EIG, global map / group action | 138          | 116           | 35           | 28            | 103          | 138
COV, agent map / agent action  | 88           | 33            | 43           | 18            | 59           | 88
COV, global map / agent action | 126          | 85            | 35           | 28            | 81           | 126
COV, global map / group action | 108          | 85            | 37           | 32            | 61           | 108
COG, agent map                 | 87           | 33            | 43           | 18            | 59           | 87
COG, global map                | 126          | 89            | 35           | 28            | 81           | 126
Table 6. Accumulated information gain V_π(T) in the detection of l = 5 targets by m = 2 agents for the times T = 75 and T = 100.

Detection Policy               | V_π(T = 75) | V_π(T = 100)
-------------------------------|-------------|-------------
Static agents                  | 4.4         | 6.5
EIG, agent map / agent action  | 9.9         | 13.6
EIG, global map / agent action | 5.7         | 7.9
EIG, global map / group action | 5.8         | 8.7
COV, agent map / agent action  | 10.8        | 14.2
COV, global map / agent action | 8.4         | 11.7
COV, global map / group action | 9.2         | 12.6
COG, agent map                 | 10.9        | 14.2
COG, global map                | 8.4         | 11.7
Table 7. CPU time (s) for the detection of l = 5 targets with probability p = 0.95 by m = 2 agents.

Detection Policy               | CPU Time (s)
-------------------------------|-------------
Static agents                  | 240
EIG, agent map / agent action  | 75
EIG, global map / agent action | 180
EIG, global map / group action | 850
COV, agent map / agent action  | 220
COV, global map / agent action | 350
COV, global map / group action | 3700
COG, agent map                 | 25
COG, global map                | 32
Table 8. Detection times and accumulated information gain in the detection of l = 5 targets by m = 2 agents with different sensor sensitivities λ_1k = 12 and λ_2k = 8, k = 1, 2.

Detection Policy               | T_max(p = 0.95) | V_π(T = 100)
-------------------------------|-----------------|-------------
Static agents                  | 300             | 5.1
EIG, agent map / agent action  | 81              | 13.6
EIG, global map / agent action | 144             | 6.2
EIG, global map / group action | 129             | 9.0
COV, agent map / agent action  | 63              | 12.9
COV, global map / agent action | 109             | 11.7
COV, global map / group action | 96              | 12.4
COG, agent map                 | 67              | 12.9
COG, global map                | 109             | 11.9
