# An Information-Motivated Exploration Agent to Locate Stationary Persons with Wireless Transmitters in Unknown Environments


## Abstract


## 1. Introduction

- We present a discretized system model and formulation for the problem of searching for an unknown number of wireless transmitters in a possibly large and unknown area with obstacles, including a radio model that captures large-scale propagation characteristics such as path loss. To the best of our knowledge, the search problem described in this paper has not been widely considered in the literature.
- We describe the design of a novel search algorithm, which is loosely based on the information-theoretic concept of empowerment and which makes only limited assumptions about the properties of the wireless channel. Our algorithm makes explicit the uncertainty present in the problem setting by modelling important properties of transceivers (e.g., their transmit power and the frequency of signal/pulse emissions) and of wireless propagation as finite-range probability distributions, which may also encode any prior knowledge.
- We conduct a performance analysis of our algorithm and compare it against two baseline algorithms, one of which conducts a search along a “dense” path (note that we have not found any algorithm in the literature which addresses the same problem). In this analysis we assess the impact of important system parameters.

## 2. Related Work

## 3. System Model

#### 3.1. Search Area

**obstacles** present in a patch, and we assume that an obstacle occupies a patch either completely or not at all. The agent cannot enter a patch occupied by an obstacle, and no wireless transmitter can be placed there. To reduce model complexity, we do not consider the case of a transmitter being obstructed by, for example, rubble.

#### 3.2. Transmitters and Wireless Propagation

1. There can be zero, one, or more persons who may need to be rescued, each carrying a transmitter.
2. There is at most one transmitter and person pair in each patch.
3. Each of these persons is equipped with a wireless transmitter, which frequently emits wireless signals on one of a well-known set of radio frequencies. Each transmitter can use one of a pre-defined set of wireless technologies (e.g., WiFi, Bluetooth, or a cellular technology).
4. A particular wireless transmitter transmits its beacons with a transmit power $p$ (in dBm) taken from a finite set of allowable transmit powers $\mathcal{P} = \{p_1, p_2, \dots, p_{N_P}\}$, with all values given in dBm. The set $\mathcal{P}$ is known a priori to the agent.
5. A particular wireless transmitter transmits its beacons frequently, with an average beacon transmission rate of $\tau$ beacons per second, where $\tau$ is taken from a finite set of allowable beacon transmission rates $\mathcal{T} = \{\tau_1, \dots, \tau_{N_R}\}$, with all values given in Hz. The set $\mathcal{T}$ is known a priori to the agent.
6. It is not known a priori to the agent how many persons or transmitters there are, nor which transmit powers and beacon generation rates they use.

1. Suppose a transmitter is located in patch $(x,y)$ (for the sake of definiteness, at the centre of the patch) and an agent is located in patch $(x',y')$, again at the centre.
2. When there is an obstacle in the direct line-of-sight path between transmitter and agent, the signal is completely blocked and the agent does not hear anything. This is a worst-case assumption.
3. Otherwise, the channel between a transmitter in patch $(x,y)$ and an agent in patch $(x',y')$ follows a modification of the standard log-distance model [32] which accounts for the reference distance (assumed here to be one metre). In this model, the total path loss (in dB) at a distance $d$ (in metres), for a given path loss exponent $\gamma$ and a given path loss $L$ at the reference distance, is

   $$h(d \mid \gamma, L) = \begin{cases} L + 10 \cdot \gamma \cdot \log_{10}(d), & \text{if } d \ge 1 \\ L, & \text{otherwise} \end{cases}$$

   We assume that neither the path loss exponent $\gamma$ nor the path loss at the reference distance, $L$, is known a priori to the agent. However, as an approximation, we assume that both are taken from finite sets of eligible values. In particular, the path loss exponent $\gamma$ is taken from the set $\mathcal{G} = \{\gamma_1, \dots, \gamma_{N_E}\}$, where $N_E$ is the number of allowed path loss exponents, and the reference path loss $L$ is taken from the set $\mathcal{L} = \{L_1, \dots, L_{N_L}\}$, where $N_L$ is the number of allowed path loss values. The sets $\mathcal{G}$ and $\mathcal{L}$ are known a priori to the agent. This model does not include a shadowing term (often modelled as lognormal fading).
4. As a result, when the transmitter uses transmit power $p$ (in dBm) and the distance between transmitter and agent is $d$, the received signal power at the agent (in dBm) is

   $$P_r(d \mid p, \gamma, L) = p - h(d \mid \gamma, L)$$

   and, for a noise power $N_0$ (in dBm), the signal-to-noise ratio (in dB) is

   $$S(d \mid p, \gamma, L) = P_r(d \mid p, \gamma, L) - N_0$$
5. Finally, we assume that while the details of wireless transmission and propagation (transmit power $p$, beacon generation rate $\tau$, path loss exponent $\gamma$, reference path loss $L$) are not known to the agent, there exists some threshold distance $R_{max} > 0$ between transmitter and agent beyond which the agent is guaranteed not to detect any transmission of a wireless signal. This value $R_{max}$ is known to the agent a priori.
6. Notation: given a patch $(x',y')$, denote by $\mathcal{N}(x',y')$ the set of all patches $(x,y) \ne (x',y')$ whose Euclidean distance from $(x',y')$ is smaller than or equal to $R_{max}$, where the distance between two patches refers to the distance between their centre points.
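The propagation model in items 3 to 6 can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function names and the `patch_size` parameter (the side length of a patch in metres, used to turn patch offsets into distances) are hypothetical, and the line-of-sight blocking of item 2 is not modelled.

```python
import math

def path_loss_db(d, gamma, L):
    """h(d | gamma, L): log-distance path loss in dB with a 1 m reference distance."""
    if d >= 1.0:
        return L + 10.0 * gamma * math.log10(d)
    return L  # closer than the reference distance (e.g. d = 0 within the same patch)

def received_power_dbm(d, p, gamma, L):
    """P_r(d | p, gamma, L) = p - h(d | gamma, L); p and the result are in dBm."""
    return p - path_loss_db(d, gamma, L)

def snr_db(d, p, gamma, L, n0_dbm):
    """S(d | p, gamma, L) = P_r(d | p, gamma, L) - N_0, in dB."""
    return received_power_dbm(d, p, gamma, L) - n0_dbm

def neighbourhood(xp, yp, width, height, r_max, patch_size=1.0):
    """N(x', y'): all other patches whose centre lies within r_max metres of (x', y').

    patch_size (hypothetical) is the side length of one patch in metres.
    """
    result = []
    for x in range(width):
        for y in range(height):
            if (x, y) != (xp, yp) and patch_size * math.hypot(x - xp, y - yp) <= r_max:
                result.append((x, y))
    return result
```

For example, with $p = 20$ dBm, $\gamma = 2$, $L = 50$ dB and $d = 10$ m, the path loss is 70 dB and the received power is $-50$ dBm; with a noise power of $-90$ dBm this gives an SNR of 40 dB.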

#### 3.3. Search Agent

- A **downward sensor ($\mathit{D}$)**, such as a camera, is mounted at the bottom of the agent and inspects the patch the agent is currently on. In particular, the downward sensor can determine with certainty whether or not there is a person/transmitter in the current patch.
- A **vicinity sensor ($\mathit{B}$)**, such as a LIDAR, allows the agent to determine its Moore neighbourhood, i.e., to determine reliably for all eight neighbouring patches (suitably modified for boundary patches) whether or not an obstacle is present in them. The vicinity sensor, however, does not give any information about the presence or absence of transmitters in the neighbouring patches. The purpose of this sensor is to detect trees, buildings, power lines and large geographic features.
- A **radio sensor ($\mathit{R}$)** or **radio receiver** is equipped with an omnidirectional antenna. We assume that the receiver has no demodulation circuitry for specific technologies (as this would add further weight to the agent, shortening its flight time) and can only detect the presence or absence of energy in certain predefined frequency bands. In other words, we can perform signal detection, but we do not attempt actual demodulation and extraction of data.

- Uses its downward sensor to check whether a transmitter/person is present or not in patch $({x}^{\prime},{y}^{\prime})$.
- Uses the vicinity sensor to determine its Moore neighbourhood [35].
- Uses its omnidirectional antenna to detect wireless signals.
- Updates its internal model according to the update function, decides which patch to visit next according to the action function, and then moves there.

- $D \in \{0,1\}$ is the output of the downward sensor, where $D=0$ indicates that there is no transmitter in the current patch $(x',y')$ and $D=1$ indicates that there is.
- $B$ is the output of the vicinity sensor; it is an eight-tuple of Boolean flags indicating for each of the eight neighbouring patches whether or not an obstacle (or block) is present in that patch.
- $R \in \mathbb{N}_0$ is the output of the radio sensor; it gives the number of radio beacons that were detected during the time $T_s$ the agent listened for radio signals while in patch $(x',y')$.
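A possible representation of the sensor readings $S_t = (D, B, R)$ is sketched below, assuming a rectangular grid of `width` × `height` patches with integer coordinates. The names `SensorReading`, `moore_neighbourhood` and `sense` are hypothetical, and the beacon count `R` is simply passed in, since the detection model that would produce it is not part of this sketch. Unlike the fixed eight-tuple in the text, `B` is stored here as a mapping from neighbour coordinates to obstacle flags, which handles boundary patches naturally.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple

Patch = Tuple[int, int]

def moore_neighbourhood(x: int, y: int, width: int, height: int) -> List[Patch]:
    """Up to eight neighbours of (x, y); fewer at the boundary of the search area."""
    return [
        (x + dx, y + dy)
        for dx in (-1, 0, 1)
        for dy in (-1, 0, 1)
        if (dx, dy) != (0, 0) and 0 <= x + dx < width and 0 <= y + dy < height
    ]

@dataclass
class SensorReading:
    D: int                # downward sensor: 1 if a transmitter is in the current patch
    B: Dict[Patch, bool]  # vicinity sensor: obstacle flag for each Moore neighbour
    R: int                # radio sensor: beacons detected during the listening time T_s

def sense(pos: Patch, transmitters: Set[Patch], obstacles: Set[Patch],
          width: int, height: int, beacons_heard: int) -> SensorReading:
    """Builds S_t for a simulated agent standing at pos (hypothetical helper)."""
    x, y = pos
    B = {n: n in obstacles for n in moore_neighbourhood(x, y, width, height)}
    return SensorReading(D=int(pos in transmitters), B=B, R=beacons_heard)
```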

#### 3.4. Performance Measure

## 4. Algorithm

#### 4.1. Overall Structure

1. Move to the centre of a patch $(x',y')$ and collect the sensor readings $S_t$ as described in Section 3.3.
2. Update an internal model representing the current belief about the presence or absence of transmitters and obstacles in all the patches the agent is able to observe from position $(x',y')$. More precisely, the model state from the previous round, $M_{t-1}$, is combined with the sensor readings of round $t$, $S_t$, to give an updated model $M_t = g(M_{t-1}, S_t)$, where the function $g(\cdot)$ is the **model update function**.
3. After updating the internal model, an action $A_t$ is chosen from the actions currently available in patch $(x',y')$. The available actions are moves into one of the neighbouring patches that are not occupied by an obstacle. More precisely, to calculate an action $A_t$ we apply an **action function** $f(\cdot)$ to the current model state $M_t$ and the current position: $A_t = f(M_t, (x',y'))$.
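The three-step cycle can be summarised as a generic loop. The sketch below assumes that sensing, the model update function $g$ and the action function $f$ are supplied as callables; the function name `run_agent` and the `max_steps` stopping rule are hypothetical, since this part of the text does not specify when the search ends.

```python
def run_agent(model, start, sense, g, f, max_steps):
    """Sense -> update -> act loop (hypothetical driver).

    sense(pos) -> S_t; g(M_{t-1}, S_t) -> M_t; f(M_t, pos) -> next patch.
    Returns the final model and the list of visited patches.
    """
    pos = start
    trajectory = [pos]
    for _ in range(max_steps):
        s_t = sense(pos)        # step 1: sensor readings in the current patch
        model = g(model, s_t)   # step 2: model update function
        pos = f(model, pos)     # step 3: action function selects the next patch
        trajectory.append(pos)
    return model, trajectory
```

A toy usage that records visited patches and always moves one patch to the right: `run_agent(frozenset(), (0, 0), lambda p: p, lambda m, s: m | {s}, lambda m, p: (p[0] + 1, p[1]), 3)` ends at patch `(3, 0)` after visiting three patches.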

#### 4.1.1. Search Phase

#### 4.1.2. Discovery Phase

1. Clone our current model $M$ into $M'$, ignoring all patches that lie outside $R_{max}$ (these can be zeroed).
2. Set all values in $M'$ that are not 0 or 1 (maximum certainty) to $0.5$ (maximum uncertainty), indicating that we are unsure whether a transmitter is located in the corresponding patch.
3. Use a minimal implementation of the IEB algorithm in which only the predicted information within the area enclosed by $R_{max}$ is used to plan actions. During this search, the agent does not act on detected signals for model $M'$. Signals heard while performing the search are processed by the global model $M$, but they are not acted upon until the local discovery search is completed.
4. Search until all values in model $M'$ are either 0 or 1, indicating that each position has been searched.
5. Update model $M$ with the resulting belief values found during the limited-area search performed with $M'$, such that the values within the area enclosed by $R_{max}$ are updated.
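Steps 1, 2, 4 and 5 of the discovery phase can be sketched as follows, assuming the belief model is stored as a dictionary from patch coordinates to belief values. The function names and the `patch_size` parameter (patch side length in metres) are hypothetical, and the local IEB search of step 3 is omitted.

```python
import math

def make_discovery_model(M, centre, r_max, patch_size=1.0):
    """Steps 1-2: build the local model M' from M around the found transmitter."""
    cx, cy = centre
    local = {}
    for (x, y), belief in M.items():
        if patch_size * math.hypot(x - cx, y - cy) > r_max:
            local[(x, y)] = 0.0     # outside R_max: zeroed, i.e. ignored
        elif belief in (0.0, 1.0):
            local[(x, y)] = belief  # already certain: keep
        else:
            local[(x, y)] = 0.5     # maximum uncertainty
    return local

def discovery_complete(local):
    """Step 4: the local search ends once every belief is 0 or 1."""
    return all(b in (0.0, 1.0) for b in local.values())

def merge_back(M, local, centre, r_max, patch_size=1.0):
    """Step 5: copy the local results into M for patches within R_max."""
    cx, cy = centre
    for (x, y), b in local.items():
        if patch_size * math.hypot(x - cx, y - cy) <= r_max:
            M[(x, y)] = b
    return M
```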

#### 4.2. Model Structure

- A real number ${M}_{x,y}$ representing the current belief that a transmitter is in patch $(x,y)$, hence ${M}_{x,y}\in [0,1]$. The value 0 represents the absolute certainty of there being no transmitter on this patch, and 1 represents absolute certainty of there being a transmitter on this patch (because it has been discovered through the downward sensor). For the case of zero probability, there can actually be two different reasons: the first reason is that we have detected an obstacle on that patch (recall that one of our assumptions is that transmitters and obstacles never share the same patch—see Section 3.1). The second reason is that there is no obstacle, but we have visited this patch in the past and have used our downward sensor to confirm that there is no transmitter. In doing this, we assume that our downward sensor is absolutely reliable, i.e., it makes no error in confirming the absence or presence of a transmitter.
- A Boolean flag ${O}_{x,y}$ which is only meaningful if ${M}_{x,y}=0$ and which indicates which of the previous two cases applies. Specifically, we set ${O}_{x,y}=1$ if there is an obstacle in patch $(x,y)$, and ${O}_{x,y}=0$ if there is no obstacle. This information about the presence or absence of obstacles is mainly used in the generation of possible paths that the UAV can take.

- For large values of $\left|\mathcal{M}\right|$ a requirement for normalization would make most of the numbers ${M}_{x,y}$ very small, possibly leading to numerical difficulties.
- We do not have to carry out computations for the purpose of normalization after each update of the ${M}_{x,y}$ values.
- It is not obvious how normalization can be given a suitable interpretation if several transmitters are allowed.

#### 4.3. Model Update Function

- Set $M_{x',y'} = D$ to record the absence or presence of a transmitter in the current patch. Furthermore, if $D=0$ (i.e., no transmitter), also set $O_{x',y'} = 0$, since the agent occupies this patch and it therefore contains no obstacle.
- If one or more of the entries in the B-component of the sensor readings indicates an obstacle in the respective neighbour patch $(x,y)$, set ${M}_{x,y}=0$, since there is an obstacle in that patch and no transmitter. Also, set ${O}_{x,y}=1$.
- If $D=0$, i.e., if no transmitter was detected in the current patch, we update the belief value $M_{x,y}$ of every patch $(x,y) \in \mathcal{N}(x',y')$ with $M_{x,y} \notin \{0,1\}$ in a Bayesian way (described below), taking the $R$ component of the sensor readings into account.
- If $D=1$, i.e., if we indeed have found a transmitter on the current patch, then all values ${M}_{x,y}$ for $(x,y)\in \mathcal{N}({x}^{\prime},{y}^{\prime})$ are updated to be $0.5$. This represents maximum uncertainty and incentivises the agent to search in this local area, allowing for the discovery phase described in Section 4.1.2.
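The deterministic parts of this update (everything except the Bayesian radio update) can be sketched as follows, assuming dictionaries for the belief values $M_{x,y}$ and the obstacle flags $O_{x,y}$. The function name is hypothetical, and so is one interpretation choice: when $D = 1$, only patches whose belief is not already 0 or 1 are reset to 0.5, mirroring step 2 of the discovery phase, so that known obstacles and already-searched patches are not forgotten.

```python
def deterministic_update(M, O, pos, D, B, neighbourhood):
    """Non-Bayesian parts of the model update function g (hypothetical helper).

    M: dict patch -> belief in [0, 1]; O: dict patch -> obstacle flag (1 = obstacle).
    D: downward-sensor output; B: dict Moore neighbour -> True if it holds an obstacle.
    neighbourhood: the patches N(x', y') within R_max of pos.
    Returns True if the Bayesian radio update still has to be applied (D == 0).
    """
    M[pos] = float(D)
    if D == 0:
        O[pos] = 0  # the agent stands on this patch, so it holds no obstacle
    for patch, blocked in B.items():
        if blocked:
            M[patch] = 0.0  # obstacle patches never hold a transmitter
            O[patch] = 1
    if D == 1:
        # Assumption: patches already known with certainty (0 or 1) keep their value.
        for patch in neighbourhood:
            if M.get(patch) not in (0.0, 1.0):
                M[patch] = 0.5
    return D == 0
```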

**configuration** and write it compactly as a tuple $\mathbf{a} = (p, \tau, \gamma, L)$. We refer to $\mathcal{C} = \mathcal{P} \times \mathcal{T} \times \mathcal{G} \times \mathcal{L}$ as the **configuration space**. Furthermore, let $T_{x,y}(\mathbf{a})$ denote the event that a transmitter of configuration $\mathbf{a}$ is located in patch $(x,y)$. Assuming a particular configuration $\mathbf{a} = (p, \tau, \gamma, L) \in \mathcal{C}$, the probability that the agent in patch $(x',y')$ detects an individual beacon sent by a transmitter in patch $(x,y)$ is given by

**extended configuration space** $\mathcal{C}' = \mathcal{C} \cup \{\partial\}$, where $\partial$ denotes the event that there is actually no transmitter in patch $(x,y)$. We clearly have

- We assume that the current value of $M_{x,y}$ represents our starting belief that there is a transmitter in patch $(x,y)$. We furthermore assume that all configurations $\mathbf{a} \in \mathcal{C}$ (which notably does not include the configuration $\partial$ of there being no transmitter in this patch) are equally likely. With this in mind, we initialise our belief vector over the extended configuration space as follows:

  $$\Pr[T_{x,y}(\mathbf{a})] = \begin{cases} 1 - M_{x,y}, & \text{if } \mathbf{a} = \partial \\ \dfrac{M_{x,y}}{|\mathcal{C}|}, & \text{otherwise} \end{cases}$$
- With this choice of prior probabilities over $\mathcal{C}'$, we evaluate the Bayesian update Equation (1) for all $\mathbf{a} \in \mathcal{C}'$. Denote the result for the specific configuration $\mathbf{a} \in \mathcal{C}'$ by $U_{\mathbf{a}}$.
- Update the overall probability of finding a transmitter in patch $(x,y)$ to

  $$M'_{x,y} = \begin{cases} 1 - U_{\partial}, & \text{if } b = 0 \\ \min\{0.5,\, 1 - U_{\partial}\}, & \text{if } b > 0 \end{cases}$$

  When we actually do hear a beacon (i.e., $b > 0$), the case of there being no transmitter (configuration $\partial$) is ruled out and $1 - U_{\partial}$ becomes one, which is not meaningful in our setup, since the transmitter could also be in some other patch. Hence, in this case we limit the updated belief value to $0.5$.
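The sketch below illustrates this update for a single patch. Because it needs a concrete likelihood, it assumes two models that are not taken from the text: a beacon is detected with probability $q = 1$ if its SNR under the radio model of Section 3.2 reaches a hypothetical threshold `snr_min_db` (and $q = 0$ otherwise), and the number $b$ of beacons detected during the listening time $T_s$ is Poisson distributed with mean $\tau T_s q$. Under configuration $\partial$ no beacon can be heard from the patch. The noise floor `n0_dbm`, the threshold, and all function names are hypothetical, and line-of-sight blocking is ignored; only the uniform prior over $\mathcal{C}'$ and the final mapping to $M'_{x,y}$ follow the text.

```python
import itertools
import math

def config_space(P, T, G, L):
    """Configuration space C = P x T x G x L; each a = (p, tau, gamma, L)."""
    return list(itertools.product(P, T, G, L))

def threshold_detect_prob(a, d, n0_dbm=-95.0, snr_min_db=10.0):
    """Assumed per-beacon detection probability q: 1 if the SNR reaches snr_min_db."""
    p, _tau, gamma, L = a
    h = L + 10.0 * gamma * math.log10(d) if d >= 1.0 else L  # path loss in dB
    return 1.0 if (p - h) - n0_dbm >= snr_min_db else 0.0

def poisson_pmf(k, lam):
    """Pr[K = k] for K ~ Poisson(lam)."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def bayesian_patch_update(m_xy, b, d, configs, t_s, detect_prob=threshold_detect_prob):
    """Posterior belief M'_{x,y} for one patch after detecting b beacons (sketch).

    Assumes b ~ Poisson(tau * t_s * q) under configuration a, and b = 0 under
    the no-transmitter configuration.
    """
    # Prior over C' = C u {no transmitter}: 1 - M for "none", M / |C| for each a.
    prior_none = 1.0 - m_xy
    prior_each = m_xy / len(configs)
    joint_none = prior_none * (1.0 if b == 0 else 0.0)
    joint = [prior_each * poisson_pmf(b, a[1] * t_s * detect_prob(a, d)) for a in configs]
    evidence = joint_none + sum(joint)
    if evidence == 0.0:
        return m_xy  # sketch choice: no configuration explains b, keep the old belief
    u_none = joint_none / evidence  # posterior U of the no-transmitter configuration
    return 1.0 - u_none if b == 0 else min(0.5, 1.0 - u_none)
```

With the default hypothetical noise floor and threshold, $\mathcal{P} = \{10, 20\}$ dBm, $\mathcal{T} = \{1, 0.1\}$ Hz, $\mathcal{G} = \{2, 4\}$, $\mathcal{L} = \{50\}$ dB, a prior belief of 0.5, a patch 10 m away and $T_s = 10$ s, hearing no beacon lowers the belief to roughly 0.155, while hearing one beacon caps it at 0.5. For a patch so distant that no configuration could be heard, the belief stays at its prior value.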

#### 4.4. Action Function

#### 4.4.1. Generation of Candidate Paths

1. Initialise $A_{best}$ to a random action sequence whose first action $a_t$ leads to a visitable patch. Set $A_{count}$ to zero, as no paths have been generated yet.
2. For each available action $a_t$:
   - Generate a random candidate path $A$ starting with $a_t$.
   - Ensure the path is valid (inside the environment bounds and not passing through the known location of an obstacle) by stepping through each action, starting at $a_t$, and checking that it leads to a legal state. Actions that lead to an invalid state are re-drawn at random until no path conflicts are detected.
   - Evaluate the effectiveness of the path (see Section 4.4.2). If its expected information gain is better than that of the current best path, set $A_{best} = A$.
   - Increment $A_{count}$.
3. If $A_{count} = A_{max}$, end the search and select $A_{best}$; its first action $a_0$ becomes the agent's next action. If $A_{count} < A_{max}$, repeat from the previous step.
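A sketch of this procedure is given below. It assumes that the agent moves between Moore neighbours (eight possible moves), represents a path as a list of moves, and takes the scoring function (for instance, the expected information gain of Section 4.4.2) as a callable. Re-drawing an illegal move at random is implemented by shuffling the eight moves and taking the first legal one, which picks uniformly among the legal moves; if none exists, the path is simply truncated. All names are hypothetical.

```python
import random

# Assumed move set: one step into any of the eight Moore neighbours.
MOVES = [(dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1) if (dx, dy) != (0, 0)]

def step(pos, move):
    return (pos[0] + move[0], pos[1] + move[1])

def is_free(patch, width, height, obstacles):
    """Inside the environment bounds and not a known obstacle."""
    x, y = patch
    return 0 <= x < width and 0 <= y < height and patch not in obstacles

def walk(start, path):
    """Patches visited when the moves in path are applied from start."""
    positions, pos = [], start
    for move in path:
        pos = step(pos, move)
        positions.append(pos)
    return positions

def random_valid_path(start, first_move, n, width, height, obstacles, rng):
    """Random n-step path starting with first_move; illegal moves are re-drawn."""
    path, pos = [first_move], step(start, first_move)
    for _ in range(n - 1):
        candidates = MOVES[:]
        rng.shuffle(candidates)  # random order = uniform choice among legal moves
        for move in candidates:
            if is_free(step(pos, move), width, height, obstacles):
                break
        else:
            break  # dead end: keep the shorter path found so far
        path.append(move)
        pos = step(pos, move)
    return path

def best_path(start, n, a_max, width, height, obstacles, score, seed=0):
    """Keeps the best of a_max random candidate paths according to score(path)."""
    rng = random.Random(seed)
    actions = [m for m in MOVES if is_free(step(start, m), width, height, obstacles)]
    if not actions:
        return None
    a_best = random_valid_path(start, rng.choice(actions), n, width, height, obstacles, rng)
    best_score, a_count = score(a_best), 0
    while a_count < a_max:
        for first in actions:
            candidate = random_valid_path(start, first, n, width, height, obstacles, rng)
            value = score(candidate)
            if value > best_score:
                a_best, best_score = candidate, value
            a_count += 1
            if a_count == a_max:
                break
    return a_best
```

The first move of the returned path plays the role of $a_0$; in a full agent, `score` would apply the candidate path to a clone of the belief model and return the predicted entropy reduction.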

#### 4.4.2. Calculating IEB of a Single Candidate Path

1. We use a quantity related to the total uncertainty in $M_t$ as the basis for comparison. The quantity introduced below is a variant of the well-known information-theoretic notion of entropy [40], modified to account for working with a belief vector instead of a probability distribution.
2. Apply the action sequence $A$ to a cloned model $M'_t$, producing $M'_{t+n}$, where for each location the agent visits we assume that maximum information gain is achieved (i.e., each visited location in $M'_{t+n}$ is set to zero). This model represents the expected result of having performed the actions.
3. Evaluate the entropy of $M'_{t+n}$.
4. Calculate the expected information gain $I$ of the action sequence $A$.

1. Calculate the entropy of our current internal model. (Strictly speaking, $H(\cdot)$ is not an entropy, since it operates on a belief vector rather than on probabilities; we nevertheless adopt the same notation, as the calculation and its purpose are otherwise the same. In the remainder of the paper, we simply refer to entropy and model entropy.)

   $$H(M) = -\sum_{(x,y)\in\mathcal{M}} M_{x,y} \log M_{x,y}$$
2. When the agent visits a given location, we assume that all information in that patch is observed. Therefore, we zero the positions the agent visits along the action sequence $A$ (by setting their $M_{x,y}$ values to zero, which amounts to assuming that we do not find a transmitter in these patches) and store the result in $M'$.
3. Calculate the entropy of the predicted internal model:

   $$H(M') = H(M \mid A) = -\sum_{(x,y)\in\mathcal{M}} M'_{x,y} \log M'_{x,y}$$
4. The expected information gain is the difference between the entropy of the current model and that of the predicted model:

   $$I(M; M') = H(M) - H(M \mid A)$$
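These four steps translate almost directly into code. The sketch below assumes the belief vector is a dictionary from patches to values, uses the natural logarithm (the base is not fixed in the text) and treats $0 \log 0$ as 0; the function names are hypothetical.

```python
import math

def model_entropy(M):
    """H(M) = -sum M_xy log M_xy over a belief map, with 0 log 0 taken as 0."""
    return -sum(m * math.log(m) for m in M.values() if m > 0.0)

def predicted_model(M, visited):
    """M': copy of M with every visited patch set to zero (no transmitter found)."""
    M_pred = dict(M)
    for patch in visited:
        if patch in M_pred:
            M_pred[patch] = 0.0
    return M_pred

def expected_information_gain(M, visited):
    """I(M; M') = H(M) - H(M | A) for the patches visited by an action sequence A."""
    return model_entropy(M) - model_entropy(predicted_model(M, visited))
```

For two undecided patches with belief 0.5, visiting one of them yields a gain of $0.5 \ln 2 \approx 0.347$, whereas visiting a patch whose belief is already 0 or 1 yields no gain at all.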

#### 4.4.3. Evaluating IEB for Action Sequences

#### 4.4.4. Horizon Problem

- Our model $M$ stores belief values. For each belief value $M_{x,y}$ we determine its entropy value $H(M_{x,y})$ and use these entropy values as input for computing a standard computer-vision data structure, the so-called summed-area table [42]. We do not discuss the details of this data structure here due to lack of space; its main use in our paper is to calculate, for an arbitrary rectangular area, the sum of the entropies of all patches contained within that area.
- We use the ability to query the summed entropy of rectangular areas in the following iterative algorithm (quad-tree search): we start by subdividing the overall rectangular environment into four (first-order) quadrants and query the summed-area table to identify the first-order quadrant with the largest summed entropy. This quadrant is then subdivided into four (second-order) quadrants; we again identify the second-order quadrant with the largest summed entropy, subdivide it further, and so on. The algorithm stops when the quadrant size is reduced to a single patch and returns the coordinates of that patch.
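The horizon search can be sketched with a summed-area table, in which any rectangular sum costs four table look-ups. The sketch assumes an integer `width` × `height` grid and a dictionary of beliefs covering every patch; when a quadrant is only one patch wide in some direction, it is split along the other direction only. Function names are hypothetical, and ties are broken in favour of the first quadrant examined.

```python
import math

def entropy_grid(M, width, height):
    """E[x][y] = -M_xy log M_xy (0 when M_xy = 0); M maps (x, y) -> belief."""
    return [[-M[(x, y)] * math.log(M[(x, y)]) if M[(x, y)] > 0.0 else 0.0
             for y in range(height)] for x in range(width)]

def summed_area_table(E):
    """S[i][j] = sum of E[x][y] for x < i and y < j (zero-padded first row/column)."""
    w, h = len(E), len(E[0])
    S = [[0.0] * (h + 1) for _ in range(w + 1)]
    for i in range(w):
        for j in range(h):
            S[i + 1][j + 1] = E[i][j] + S[i][j + 1] + S[i + 1][j] - S[i][j]
    return S

def rect_sum(S, x0, y0, x1, y1):
    """Summed entropy of the half-open rectangle [x0, x1) x [y0, y1), in O(1)."""
    return S[x1][y1] - S[x0][y1] - S[x1][y0] + S[x0][y0]

def quadtree_peak(S, width, height):
    """Repeatedly descend into the quadrant with the largest summed entropy."""
    x0, y0, x1, y1 = 0, 0, width, height
    while x1 - x0 > 1 or y1 - y0 > 1:
        xm = (x0 + x1) // 2 if x1 - x0 > 1 else x1  # do not split a 1-wide side
        ym = (y0 + y1) // 2 if y1 - y0 > 1 else y1
        quadrants = [(a, b, c, d)
                     for a, c in ((x0, xm), (xm, x1))
                     for b, d in ((y0, ym), (ym, y1))
                     if a < c and b < d]
        x0, y0, x1, y1 = max(quadrants, key=lambda q: rect_sum(S, *q))
    return (x0, y0)
```

The table only has to be rebuilt when the belief model changes, which matches the caching described in Section 4.5.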

#### 4.5. Computational Complexity

- IEB normal operation: in this case, we apply our standard algorithm (in both the search and discovery phases). The process is as follows:
   - Model update: we update the agent’s internal belief values. Whether or not a signal is heard, we must update all internal values within $R_{max}$ to reflect this, so the update cost is $O(R_{max}^2)$.
   - Calculate next action: for a fixed number of $n$-step paths we calculate the potential entropy gain. Each path is checked for feasibility, which costs $O(n)$, so for $k$ paths we evaluate $O(k \cdot n)$ actions. The computation becomes more involved when a candidate path turns out to be infeasible: in this case, we use a randomised backtracking method to find a feasible path, and as a result of backtracking we may have to inspect approximately $4n$ patches to form a path in the worst case.
- Deprived entropy case: in this case, our $n$-step lookahead algorithm is unable to determine near-future information gains, and no immediate action is favourable. This can occur, for example, if all local patches have been explored and the agent can say with certainty that there is no local information to be gathered. In this case, we perform our horizon search algorithm. The process is as follows:
   - Quad-tree calculation: we have to calculate the summed-area table and perform the quad-tree search. This is $O(n^2)$, as the agent’s entire belief vector needs to be considered. Once calculated, the result can be cached and reused.
   - Find peak entropy area: here, we calculate the location of a single patch lying within the area of largest entropy. Because of the previous calculation, this search is $O(\sqrt{n})$.

## 5. Baseline Algorithms

**Random walk**: at each time step, the agent selects an action uniformly at random from the set $A$ of currently possible actions. This agent also works in obstructed environments. As an approximate upper bound on the time the random walk needs to cover all patches, we consider the so-called cover time, which for a two-dimensional $n \times n$ torus is asymptotically $O(n^2 (\log n)^2)$ [43]. (The cover time strictly applies to searching the entirety of a torus-shaped environment, so this complexity should be treated with care and used only to give an approximate order of magnitude.) This is of course a simplification of our problem space, as the action space $A$ can be reduced by invalid actions arising from the boundaries of the designated environment or from obstacles.

**Lawn mower**: the agent performs a “lawn mower”-like sweep, planning a path that goes through each patch in the open field without considering the radio information. This agent is unable to navigate around obstacles and is therefore restricted to the open-field scenario. The upper bound for an open field is therefore $O(n^2)$.
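Both baselines are simple enough to sketch directly. The version below is illustrative only: it assumes the random walk moves into any of the free Moore neighbours (the move set is not stated here) and implements the lawn mower as a column-by-column boustrophedon sweep, which visits each of the $n^2$ patches of an open $n \times n$ field exactly once, consistent with the $O(n^2)$ bound above. The function names are hypothetical.

```python
import random

def lawn_mower_path(width, height):
    """Boustrophedon sweep visiting every patch of an obstacle-free field once."""
    path = []
    for x in range(width):
        ys = range(height) if x % 2 == 0 else range(height - 1, -1, -1)
        path.extend((x, y) for y in ys)
    return path

def random_walk_step(pos, width, height, obstacles, rng):
    """One random-walk step: uniform choice among the free Moore neighbours."""
    x, y = pos
    options = [(x + dx, y + dy)
               for dx in (-1, 0, 1) for dy in (-1, 0, 1)
               if (dx, dy) != (0, 0)
               and 0 <= x + dx < width and 0 <= y + dy < height
               and (x + dx, y + dy) not in obstacles]
    return rng.choice(options) if options else pos
```

For a 3 × 4 field, `lawn_mower_path(3, 4)` starts with `(0, 0), (0, 1), (0, 2), (0, 3), (1, 3)`, so consecutive patches are always adjacent.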

## 6. Simulation Setup

## 7. Results

#### 7.1. Baseline

#### 7.2. Varying Transmitters

- Average, where $\mathcal{L}$ = $50\phantom{\rule{3.33333pt}{0ex}}\mathrm{dB}$, $\mathcal{G}$ = $\overline{\left\{2,4\right\}}$, $\mathcal{P}$ = $\overline{\left\{10\phantom{\rule{3.33333pt}{0ex}}\mathrm{mW},100\phantom{\rule{3.33333pt}{0ex}}\mathrm{mW}\right\}}$, $\mathcal{T}$ = $\overline{\left\{1\phantom{\rule{3.33333pt}{0ex}}\mathrm{Hz},0.1\phantom{\rule{3.33333pt}{0ex}}\mathrm{Hz}\right\}}$.
- $\gamma =2$, where $\mathcal{L}$ = $50\phantom{\rule{3.33333pt}{0ex}}\mathrm{dB}$, $\mathcal{G}$ = 2, $\mathcal{P}$ = $\overline{\left\{10\phantom{\rule{3.33333pt}{0ex}}\mathrm{mW},100\phantom{\rule{3.33333pt}{0ex}}\mathrm{mW}\right\}}$, $\mathcal{T}$ = $\overline{\left\{1\phantom{\rule{3.33333pt}{0ex}}\mathrm{Hz},0.1\phantom{\rule{3.33333pt}{0ex}}\mathrm{Hz}\right\}}$.
- $\gamma =4$, where $\mathcal{L}$ = $50\phantom{\rule{3.33333pt}{0ex}}\mathrm{dB}$, $\mathcal{G}$ = 4, $\mathcal{P}$ = $\overline{\left\{10\phantom{\rule{3.33333pt}{0ex}}\mathrm{mW},100\phantom{\rule{3.33333pt}{0ex}}\mathrm{mW}\right\}}$, $\mathcal{T}$ = $\overline{\left\{1\phantom{\rule{3.33333pt}{0ex}}\mathrm{Hz},0.1\phantom{\rule{3.33333pt}{0ex}}\mathrm{Hz}\right\}}$.
- $p=0.1$, where $\mathcal{L}$ = $50\phantom{\rule{3.33333pt}{0ex}}\mathrm{dB}$, $\mathcal{G}$ = $\overline{\left\{2,4\right\}}$, $\mathcal{P}$ = $100\phantom{\rule{3.33333pt}{0ex}}\mathrm{mW}$, $\mathcal{T}$ = $\overline{\left\{1\phantom{\rule{3.33333pt}{0ex}}\mathrm{Hz},0.1\phantom{\rule{3.33333pt}{0ex}}\mathrm{Hz}\right\}}$.
- $p=0.01$, where $\mathcal{L}$ = $50\phantom{\rule{3.33333pt}{0ex}}\mathrm{dB}$, $\mathcal{G}$ = $\overline{\left\{2,4\right\}}$, $\mathcal{P}$ = $10\phantom{\rule{3.33333pt}{0ex}}\mathrm{mW}$, $\mathcal{T}$ = $\overline{\left\{1\phantom{\rule{3.33333pt}{0ex}}\mathrm{Hz},0.1\phantom{\rule{3.33333pt}{0ex}}\mathrm{Hz}\right\}}$.
- $\tau =1.0$, where $\mathcal{L}$ = $50\phantom{\rule{3.33333pt}{0ex}}\mathrm{dB}$, $\mathcal{G}$ = $\overline{\left\{2,4\right\}}$, $\mathcal{P}$ = $\overline{\left\{10\phantom{\rule{3.33333pt}{0ex}}\mathrm{mW},100\phantom{\rule{3.33333pt}{0ex}}\mathrm{mW}\right\}}$, $\mathcal{T}$ = $1\phantom{\rule{3.33333pt}{0ex}}\mathrm{Hz}$.
- $\tau =0.1$, where $\mathcal{L}$ = $50\phantom{\rule{3.33333pt}{0ex}}\mathrm{dB}$, $\mathcal{G}$ = $\overline{\left\{2,4\right\}}$, $\mathcal{P}$ = $\overline{\left\{10\phantom{\rule{3.33333pt}{0ex}}\mathrm{mW},100\phantom{\rule{3.33333pt}{0ex}}\mathrm{mW}\right\}}$, $\mathcal{T}$ = $0.1\phantom{\rule{3.33333pt}{0ex}}\mathrm{Hz}$.
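The transmit powers in these scenarios are given in mW, whereas the radio model of Section 3.2 works in dBm. The sketch below converts them and enumerates the resulting configuration spaces, assuming each overlined set lists the eligible values of the corresponding parameter and reading $\mathcal{L} = 50$ dB as the single-element set $\{50\}$; the helper names are hypothetical. For example, 10 mW and 100 mW correspond to 10 dBm and 20 dBm, and the "Average" scenario then has $|\mathcal{C}| = 2 \cdot 2 \cdot 2 \cdot 1 = 8$ configurations.

```python
import math
from itertools import product

def mw_to_dbm(p_mw):
    """Converts a power in milliwatts to dBm."""
    return 10.0 * math.log10(p_mw)

def scenario_configs(P_mw, T_hz, G, L_db):
    """Configuration space C = P x T x G x L with transmit powers converted to dBm."""
    return list(product([mw_to_dbm(p) for p in P_mw], T_hz, G, L_db))

average = scenario_configs([10.0, 100.0], [1.0, 0.1], [2.0, 4.0], [50.0])
gamma_2 = scenario_configs([10.0, 100.0], [1.0, 0.1], [2.0], [50.0])
p_01 = scenario_configs([100.0], [1.0, 0.1], [2.0, 4.0], [50.0])
```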

#### 7.3. Varying Obstacles

## 8. Discussion

## 9. Conclusions

## Author Contributions

## Funding

## Conflicts of Interest

## References

- Eun, J.; Song, B.D.; Lee, S.; Lim, D.E. Mathematical investigation on the sustainability of UAV logistics. Sustainability **2019**, 11, 5932.
- Mayer, S.; Lischke, L.; Woźniak, P.W. Drones for Search and Rescue. In Proceedings of the 1st International Workshop on Human-Drone Interaction, Glasgow, UK, 4–9 May 2019; Ecole Nationale de l’Aviation Civile [ENAC]: Toulouse, France, 2019.
- Doherty, P.; Rudol, P. A UAV search and rescue scenario with human body detection and geolocalization. In Australasian Joint Conference on Artificial Intelligence; Springer: Berlin/Heidelberg, Germany, 2007; pp. 1–13.
- Silvagni, M.; Tonoli, A.; Zenerino, E.; Chiaberge, M. Multipurpose UAV for search and rescue operations in mountain avalanche events. Geomat. Nat. Hazards Risk **2017**, 8, 18–33.
- Erdos, D.; Erdos, A.; Watkins, S.E. An Experimental UAV System for Search and Rescue Challenge. IEEE Aerosp. Electron. Syst. Mag. **2013**, 28, 32–37.
- Merwaday, A.; Guvenc, I. UAV assisted heterogeneous networks for public safety communications. In Proceedings of the Wireless Communications and Networking Conference Workshops (WCNCW), New Orleans, LA, USA, 9–12 March 2015.
- Sa, I.; Hrabar, S.; Corke, P. Inspection of Pole-Like Structures Using a Vision-Controlled VTOL UAV and Shared Autonomy. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2014), Chicago, IL, USA, 14–18 September 2014.
- Tsouros, D.C.; Bibi, S.; Sarigiannidis, P.G. A review on UAV-based applications for precision agriculture. Information **2019**, 10, 349.
- Gevaert, C.M.; Suomalainen, J.; Tang, J.; Kooistra, L. Generation of Spectral-Temporal Response Surfaces by Combining Multispectral Satellite and Hyperspectral UAV Imagery for Precision Agriculture Applications. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. **2015**, 8, 3140–3146.
- Ghamry, K.A.; Kamel, M.A.; Zhang, Y. Cooperative Forest Monitoring and Fire Detection Using a Team of UAVs–UGVs. In Proceedings of the 2016 International Conference on Unmanned Aircraft Systems (ICUAS), Arlington, VA, USA, 7–10 June 2016.
- Cooper, J.; Goodrich, M.A. Towards combining UAV and sensor operator roles in UAV-enabled visual search. In Proceedings of the 2008 3rd ACM/IEEE International Conference on Human-Robot Interaction (HRI), Amsterdam, The Netherlands, 12–15 March 2008; pp. 351–358.
- Li, B.; Jiang, Y.; Sun, J.; Cai, L.; Wen, C.Y. Development and Testing of a Two-UAV Communication Relay System. Sensors **2016**, 16, 1696.
- der Bergh, B.V.; Chiumento, A.; Pollin, S. LTE in the Sky: Trading Off Propagation Benefits with Interference Costs for Aerial Nodes. IEEE Commun. Mag. **2016**, 54, 44–50.
- Baker, C.A.B.; Ramchurn, S.; Teacy, W.L.; Jennings, N.R. Planning Search and Rescue Missions for UAV Teams. In Proceedings of the Twenty-Second European Conference on Artificial Intelligence (ECAI’16), The Hague, The Netherlands, 29 August–2 September 2016; IOS Press: The Hague, The Netherlands, 2016; pp. 1777–1778.
- Viseras, A.; Wiedemann, T.; Manss, C.; Magel, L.; Mueller, J.; Shutin, D.; Merino, L. Decentralized multi-agent exploration with online-learning of gaussian processes. In Proceedings of the 2016 IEEE International Conference on Robotics and Automation (ICRA), Stockholm, Sweden, 16–21 May 2016; pp. 4222–4229.
- AbdulCareem, M.A.; Gomez, J.; Saha, D.; Dutta, A. RFEye in the Sky. IEEE Trans. Mob. Comput. **2020**.
- Shahidian, S.A.A.; Soltanizadeh, H. Optimal trajectories for two UAVs in localization of multiple RF sources. Trans. Inst. Meas. Control **2016**, 38, 908–916.
- Ramirez-Paredes, J.P.; Doucette, E.A.; Curtis, J.W.; Gans, N.R. Urban target search and tracking using a UAV and unattended ground sensors. In Proceedings of the 2015 American Control Conference (ACC), Chicago, IL, USA, 1–3 July 2015; pp. 2401–2407.
- Newaz, A.A.R.; Jeong, S.; Lee, H.; Ryu, H.; Chong, N.Y.; Mason, M.T. Fast radiation mapping and multiple source localization using topographic contour map and incremental density estimation. In Proceedings of the 2016 IEEE International Conference on Robotics and Automation (ICRA), Stockholm, Sweden, 16–21 May 2016; pp. 1515–1521.
- Ho, Y.H.; Chen, Y.R.; Chen, L.J. Krypto: Assisting Search and Rescue Operations Using Wi-Fi Signal with UAV. In Proceedings of the DroNet ’15, Florence, Italy, 18 May 2015; Association for Computing Machinery: New York, NY, USA, 2015; pp. 3–8.
- Artemenko, O.; Dominic, O.J.; Andryeyev, O.; Mitschele-Thiel, A. Energy-aware trajectory planning for the localization of mobile devices using an unmanned aerial vehicle. In Proceedings of the 2016 25th International Conference on Computer Communication and Networks (ICCCN), Waikoloa, HI, USA, 1–4 August 2016; pp. 1–9.
- Alotaibi, E.T.; Alqefari, S.S.; Koubaa, A. LSAR: Multi-UAV Collaboration for Search and Rescue Missions. IEEE Access **2019**, 7, 55817–55832.
- Perez-Carabaza, S.; Bermudez-Ortega, J.; Besada-Portas, E.; Lopez-Orozco, J.A.; de la Cruz, J.M. A multi-uav minimum time search planner based on aco r. In Proceedings of the Genetic and Evolutionary Computation Conference, Berlin, Germany, 15–19 July 2017; pp. 35–42.
- Perez-Carabaza, S.; Besada-Portas, E.; Lopez-Orozco, J.A.; de la Cruz, J.M. A real world multi-UAV evolutionary planner for minimum time target detection. In Proceedings of the Genetic and Evolutionary Computation Conference 2016, Denver, CO, USA, 20–24 July 2016; pp. 981–988.
- San Juan, V.; Santos, M.; Andújar, J.M. Intelligent UAV map generation and discrete path planning for search and rescue operations. Complexity **2018**, 2018, 6879419.
- Capitan, J.; Merino, L.; Ollero, A. Decentralized cooperation of multiple uas for multi-target surveillance under uncertainties. In Proceedings of the 2014 International Conference on Unmanned Aircraft Systems (ICUAS), Orlando, FL, USA, 27–30 May 2014; pp. 1196–1202.
- Waharte, S.; Trigoni, N. Supporting Search and Rescue Operations with UAVs. In Proceedings of the 2010 International Conference on Emerging Security Technologies, Canterbury, UK, 6–7 September 2010; pp. 142–147.
- Klyubin, A.; Polani, D.; Nehaniv, C. Empowerment: A Universal Agent-Centric Measure of Control. In Proceedings of the 2005 IEEE Congress on Evolutionary Computation, Edinburgh, UK, 2–5 September 2005; Volume 1, pp. 128–135.
- Salge, C.; Glackin, C.; Polani, D. Changing the Environment Based on Empowerment as Intrinsic Motivation. Entropy **2014**, 16, 2789–2819.
- Lanillos, P.; Besada-Portas, E.; Pajares, G.; Ruz, J.J. Minimum time search for lost targets using cross entropy optimization. In Proceedings of the 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, Algarve, Portugal, 7–12 October 2012; pp. 602–609.
- Barry, D.; Willig, A.; Woodward, G. Empowerment-Driven Single Agent Exploration for Locating Multiple Wireless Transmitters. In Proceedings of the AI 2018: Advances in Artificial Intelligence, Wellington, New Zealand, 11–14 December 2018; Springer: Wellington, New Zealand, 2018; Volume 11320, pp. 29–37.
- Rappaport, T.S. Wireless Communications—Principles and Practice; Prentice Hall: Upper Saddle River, NJ, USA, 2002.
- Hou, Z.; Xiong, S. On Model-Free Adaptive Control and Its Stability Analysis. IEEE Trans. Autom. Control
**2019**, 64, 4555–4569. [Google Scholar] [CrossRef] - Roman, R.C.; Precup, R.E.; Petriu, E.M. Hybrid data-driven fuzzy active disturbance rejection control for tower crane systems. Eur. J. Control
**2021**, 58, 373–387. [Google Scholar] [CrossRef] - Ménard, A.; Marceau, D.J. Exploration of Spatial Scale Sensitivity in Geographic Cellular Automata. Environ. Plan. Plan. Des.
**2005**, 32, 693–714. [Google Scholar] [CrossRef] - Ay, N.; Bertschinger, N.; Der, R.; Güttler, F.; Olbrich, E. Predictive information and explorative behavior of autonomous robots. Eur. Phys. J. B
**2008**, 63, 329–339. [Google Scholar] [CrossRef] [Green Version] - Scheunemann, M.M. Autonomous and Intrinsically Motivated Robots for Sustained Human-Robot Interaction. Ph.D. Thesis, Computer Science, University of Hertfordshire, Hertfordshire, UK, 2021. [Google Scholar]
- Karlin, S.; Taylor, H.M. A First Course in Stochastic Processes, 2nd ed.; Academic Press: San Diego, CA, USA, 1975. [Google Scholar]
- Salge, C.; Guckelsberger, C.; Canaan, R.; Mahlmann, T. Accelerating Empowerment Computation with UCT Tree Search. In Proceedings of the Conference on Computational Intelligence and Games, Maastricht, The Netherlands, 14–17 August 2018; pp. 165–172. [Google Scholar]
- Shannon, C. A Mathematical Theory of Communication. Bell Syst. Tech. J.
**1948**, 27, 379–423, 623–656. [Google Scholar] [CrossRef] [Green Version] - Choset, H.; Lynch, K.; Hutchinson, S.; Kantor, G.; Burgard, W.; Kavraki, L.; Thrun, S. Principles of Robot Motion—Theory, Algorithms and Implementation; MIT Press: Cambridge, MA, USA, 2005. [Google Scholar]
- Crow, F. Summed-area tables for texture mapping. In Proceedings of the ACM SIGGRAPH 84, Minneapolis, MN, USA, 23–27 July 1984. [Google Scholar]
- Levin, D.A.; Peres, Y.; Wilmer, E.L. Markov Chains and Mixing Times; Americal Mathematical Society: Providence, RI, USA, 2009. [Google Scholar]
- Kozlova, A.; Brown, J.A.; Reading, E. Examination of representational expression in maze generation algorithms. In Proceedings of the 2015 IEEE Conference on Computational Intelligence and Games (CIG), Tainan, Taiwan, 31 August–2 September 2015; pp. 532–533. [Google Scholar] [CrossRef]
- Qi, J.; Song, D.; Shang, H.; Wang, N.; Hua, C.; Wu, C.; Qi, X.; Han, J. Search and Rescue Rotary-Wing UAV and Its Application to the Lushan Ms 7.0 Earthquake. J. Field Robot.
**2016**, 33, 290–321. [Google Scholar] [CrossRef] - Stone, L.; Keller, C.; Kratzke, T.; Strumpfer, J. Search for the Wreckage of Air France Flight AF 447. Stat. Sci.
**2014**, 29, 69–80. [Google Scholar] [CrossRef] [Green Version]

**Figure 1.** Time taken to find all transmitters within the environment. Error bars indicate ${\sigma}^{2}$ standard deviation.

**Figure 2.** Time taken to find all transmitters within the environment with varied transmitter properties. Error bars indicate ${\sigma}^{2}$ standard deviation.

**Figure 3.** Average time taken to find a single transmitter within an environment with an increasing number of randomly placed obstacles. Error bars indicate ${\sigma}^{2}$ standard deviation.

Symbol | Value | Description |
---|---|---|
v | 0.5 m/s | The agent velocity |
${T}_{s}$ | 4 s | The time taken to move from one patch to an adjacent patch |
l | 2 m | The length of a given patch |
n | 10 | The lookahead value of the IEB algorithm |
paths | 5000 | The number of sampled paths for each iteration |
max ticks | 500,000 | Maximum number of ticks for the IEB simulation (derived experimentally) |
${L}_{w}$ | 500 patches | Environment width |
${L}_{h}$ | 500 patches | Environment height |
total patches | 250,000 | Total number of patches in the environment (${L}_{w} \times {L}_{h}$) |
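Two entries in the parameter table follow from the others: ${T}_{s} = l / v = 2\,\text{m} / 0.5\,\text{m/s} = 4\,\text{s}$, and the patch total is ${L}_{w} \times {L}_{h} = 500 \times 500 = 250{,}000$. The Python sketch below gathers the parameters and checks these relationships. It is not the authors' simulator. The class and field names are hypothetical, and it assumes that ${L}_{w}$ and ${L}_{h}$ are counted in patches, which is the reading that makes the patch total consistent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SimulationParameters:
    """Parameters from the table (hypothetical names, not the authors' code)."""

    agent_velocity_m_per_s: float = 0.5  # v
    patch_length_m: float = 2.0  # l
    env_width_patches: int = 500  # L_w, assumed to be measured in patches
    env_height_patches: int = 500  # L_h, assumed to be measured in patches
    lookahead: int = 10  # n, lookahead of the IEB algorithm
    sampled_paths: int = 5000  # paths sampled per iteration
    max_ticks: int = 500_000  # cap on simulation ticks

    @property
    def patch_transit_time_s(self) -> float:
        """T_s = l / v: time to move from one patch to an adjacent one."""
        return self.patch_length_m / self.agent_velocity_m_per_s

    @property
    def total_patches(self) -> int:
        """Number of grid patches, L_w * L_h."""
        return self.env_width_patches * self.env_height_patches

    @property
    def side_lengths_m(self) -> tuple[float, float]:
        """Physical width and height in metres, assuming L_w and L_h count patches."""
        return (
            self.env_width_patches * self.patch_length_m,
            self.env_height_patches * self.patch_length_m,
        )


params = SimulationParameters()
print(params.patch_transit_time_s, params.total_patches)  # 4.0 250000
print(params.side_lengths_m)  # (1000.0, 1000.0)
```

If this assumption holds, the simulated search area is 1000 m by 1000 m. Because the class is frozen, a variation such as a faster agent would be written as a new instance, for example `SimulationParameters(agent_velocity_m_per_s=1.0)`, which yields ${T}_{s} = 2\,\text{s}$.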


© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Barry, D.; Willig, A.; Woodward, G.
An Information-Motivated Exploration Agent to Locate Stationary Persons with Wireless Transmitters in Unknown Environments. *Sensors* **2021**, *21*, 7695.
https://doi.org/10.3390/s21227695
