# Autonomous Search for a Diffusive Source in an Unknown Structured Environment

Defence Science and Technology Organisation, 506 Lorimer Street, Melbourne, VIC 3207, Australia

Author to whom correspondence should be addressed.

Received: 16 December 2013 / Revised: 28 January 2014 / Accepted: 29 January 2014 / Published: 10 February 2014

(This article belongs to the Special Issue Entropy Methods in Guided Self-Organization)

The paper presents a framework for autonomous search for a diffusive emitting source of a tracer (e.g., aerosol, gas) in an environment with an unknown map of randomly placed and shaped obstacles. The measurements of the tracer concentration are sporadic, noisy and without directional information. The search domain is discretised and modelled by a finite two-dimensional lattice. The links in the lattice represent the traversable paths for emitted particles and for the searcher. A missing link in the lattice indicates a blocked path due to an obstacle. The searcher must simultaneously estimate the source parameters, the map of the search domain and its own location within the map. The solution is formulated in the sequential Bayesian framework and implemented as a Rao-Blackwellised particle filter with entropy-reduction motion control. The numerical results demonstrate the concept and its performance.

The search for an emitting source of particles, chemicals, odour or radiation, based on sporadic clues or intermittent measurements, has attracted a great deal of interest lately. In the biological context, the search is studied to understand animal behaviour in the search for food or mates [1–3], to model the biochemical reactions in cells (the search for specific DNA sequences by transcription factors [2,4], genetic-based therapy [5]) and to simulate intracellular transport (viral trafficking in a microtubule network [6,7]).

Industrial applications mainly focus on rescue operations with the goal of localising hazardous pollutants, such as chemical leaks [8–10] or radioactive sources [11–13]. The search strategies can be broadly divided into two categories. The first includes conventional methods, which are guided by the positive concentration gradient (“chemotaxis”) [14] and its variants [15]. Vergassola et al. [16] formulated another search strategy (referred to as “infotaxis”), which is driven by the information gain or entropy reduction (for a comprehensive review, see [17]). Information-gain guided search has been successfully applied in the context of finding a weak source in a turbulent flow (e.g., drug or leak emitting chemicals [16,17]) and localising radioactive point sources [12,13]. The crucial advantage of “infotaxis” over “chemotaxis” is that the former can be used even when the estimation of the concentration gradient is infeasible, which is always the case in the presence of sporadic or intermittent measurements.

While all works referenced above deal with search in an open domain (without obstacles) or assuming that a precise map of the search domain (with obstacles) is a priori available, in this paper, we focus on autonomous search for a diffusive emitting source in a domain with randomly placed and shaped obstacles (forbidden areas), whose structure (the map) is unknown. The problem is of importance, for example, in the localisation of dangerous leaks in collapsed buildings, inside tunnels or mines. The searcher senses in a probabilistic manner both the structure of the search domain (e.g., the presence or absence of obstacles, walls, blocked passages) and the level of concentration of tracer particles. The objective of the search is to navigate through the unknown environment for the purpose of source localisation in the shortest possible time. Once the source is localised, the coordinates of the source relative to its starting position (or the path to the source) need to be reported.

This is not a trivial task for several reasons. First, the measurements of the tracer particle concentration are sporadic, noisy and without directional information. Furthermore, the emission rate of the source is typically unknown (hence, the concentration measurement cannot easily be related to the distance between the source and the searcher). Finally, the searcher needs to explore the domain, create its partial map (which must include the starting point and the source) and localise itself relative to this map. This partial map is important, for example, in order to guide the rescue team to the source or to help the searcher retreat to its starting position (simple obstacle avoidance methods clearly would be insufficient for this purpose). Among the search schemes that are intended to deal with the sporadic measurements and that directly address the balance between exploitation of the information accumulated during the search and exploration of the environment, “infotaxis” is the most efficient, exhibiting the lowest average search time and the highest reliability in source reaching [18]. For this reason, we adopt an information-driven search strategy in the paper.

The searcher operates in a fully autonomous manner: it senses the environment (the concentration of a tracer; the position of obstacles) and, after processing the sensor data (which is inherently uncertain, due to the noise in perception and actuation), it decides where to move next in order to collect new measurements. Its motion control is not noise-free, as it may occasionally fail to execute correctly. Hence, the searcher may unknowingly move to a position different from the one intended. The probabilistic models of searcher motion and sensor measurements are assumed to be known.

In the paper, we consider a two-dimensional search domain. The coordinates of the initial position of the searcher, as well as the border of the search area (relative to the initial position) are given as input parameters. In order to fulfil its mission (i.e., find the source and report its coordinates relative to its starting position), the searcher must carry out simultaneous estimation at three levels: (1) estimation of source parameters (its location in 2D and its release rate); (2) estimation of the map of the search area; and (3) estimation of the searcher position within the estimated map. Estimation at levels (2) and (3) has been studied extensively in robotics under the term grid-based simultaneous localisation and mapping (SLAM) [19]. The primary mission in all SLAM publications is an accurate mapping of the area. The primary mission of our searcher, however, is to localise the source, while SLAM is only a necessary component of the solution.

The search domain is discretised, as, for example, in [8], and modelled by a finite two-dimensional lattice. With a sufficiently fine resolution of the lattice, the emitting source can be considered to be in one of the nodes of the lattice. The links (edges) of the lattice represent the traversable paths for emitted particles (tracer) and for the searcher. Missing links in the lattice indicate blocked paths due to walls or obstacles. This is a very general model applicable to searches at various scales, from inside buildings and tunnels, to within cells of living organisms [2]. The fraction of missing links in the lattice is assumed to be below the percolation threshold p_{c} (for the adopted lattice structure p_{c} = 1/2 [20,21]), so that long-range connectivity is satisfied [20]. Using the absorbing Markov chains technique [22], we can compute exactly the mean concentration level in any node of the lattice, that is, at any point of the search domain with obstacles.

Since the structure (map) of the search domain is unknown, the searcher must rely on a theoretical model of concentration measurement, which is independent of the map. An approximation of such a model is derived in an analytic form using conformal mapping [23].

The only related work that deals with autonomous search in an unknown structured environment is [24]. While [24] presents a plethora of interesting experimental results, the algorithms are based on heuristics. The contribution of our paper is a theoretically sound framework for autonomous search for a diffusive source in an unknown environment. The mathematical models of tracer distribution (for known and unknown maps), as well as the models of measurements and motion dynamics, are derived or precisely specified. Estimation of source parameters, the map and the searcher location within the map is carried out in the optimal sequential Bayesian framework, implemented using a Rao-Blackwellised particle filter. Finally, the searcher motion is driven by the maximisation of the information gain (i.e., entropy reduction), which, on average, results in the shortest search time.

The paper is organised as follows. Mathematical models of tracer distribution, measurements and searcher motion are described in Section 2. The autonomous search problem is formulated and its conceptual solution provided in Section 3. Full technical details of the proposed search algorithm are presented in Section 4, with numerical results given in Section 5. Finally, conclusions of this study are summarised in Section 6.

The concentration of a tracer at any point in the search domain is governed by the diffusive equation, which, in the steady state, reduces to the Laplace equation [25]:

$${D}_{0}\Delta \theta ={A}_{0}\delta (x-X,y-Y).$$

Here, D_{0} is the diffusion coefficient of tracer in the environment, Δ is the Laplace operator, θ is the mean (time-averaged) tracer concentration, δ is the Dirac delta function, A_{0} is the release-rate of the tracer source and X, Y are the coordinates of the source in a two-dimensional Cartesian coordinate system. We remark here that D_{0} is treated as an aggregated parameter, whose value approximately captures the main diffusion processes in the system. Depending on the problem context, this may be molecular diffusion, turbulent diffusion, flow-induced diffusion, confined (compartmented structure) diffusion, etc.

For convenience, we adopt a circular search area of radius R_{0}, centred on the origin of the coordinate system, that is, for every point inside the search area, $r=\sqrt{{x}^{2}+{y}^{2}}\le {R}_{0}$. Assuming that the tracer source is undetectable outside the search domain, we can impose the absorbing boundary condition θ(r = R_{0}) = 0. The traditional approach to the computation of the tracer concentration, θ, at every point of the search domain is via analytical or numerical solution of Equation (1). This, however, is a nontrivial task when the search domain is a structure of complex topology (due to obstacles, compartment walls, random openings, etc.).

We therefore adopt an alternative approach, where the continuous model of the tracer diffusion process is replaced with a random walk on a square lattice, adopted as a discrete model of the search area. Discretisation is illustrated in Figure 1 for a search area centred on the origin of the coordinate system, with the radius R_{0} = 9. The length of each link (edge, bond) in the lattice determines the resolution of discretisation and, in this example, is adopted as a unit length. The source, assumed to be located at one of the nodes of the lattice, is emitting particles that travel through the lattice according to the random walk model [26].

The obstacles in the search domain (the regions through which the tracer cannot pass) are simply modelled as missing links (or clusters of missing links) in the square lattice. Figure 2 shows an example of such a model: this incomplete lattice is obtained by removing fraction p ≈ 0.35 of the links in the complete lattice shown in Figure 1. Note that all nodes in the incomplete lattice are connected. On average, this will be the case if the fraction of missing links in the incomplete grid of Figure 2 is below the percolation threshold, p_{c}; above the percolation threshold (p > p_{c}), the lattice becomes fragmented. The framework of percolation theory enables the analytical description of statistical properties of such a lattice [20,21].

This section explains how to compute the mean concentration of tracer particles in each node of the incomplete grid (such as the one shown in Figure 2), which represents a discretised model of the search area with obstacles.

For a given incomplete grid, the mean concentration can be computed using the absorbing Markov chain technique [22]. Neglecting the spatial approximation of the search domain (due to discretisation) and under the assumption that the distribution of particles has reached the steady state, the absorbing Markov chain provides an exact solution for the quantity of source material at each location.

We can regard the random walk of tracer particles through the incomplete grid (e.g., Figure 2) as a Markov chain whose states are the nodes of the grid. The Markov chain is specified by the transition matrix, **T**; each element of this matrix is the probability of the transition from state s_{i} to state s_{j} (i.e., a particle moves from node i to node j): T_{ij} = P{s_{j}|s_{i}}. How does one construct **T** given the incomplete grid? First note that we distinguish two types of states in this Markov chain: absorbing states (corresponding to the nodes on the boundary of the grid) and transient states. For an absorbing state, s_{i}, T_{ii} = 1 and T_{ij} = 0 if j ≠ i. Suppose a transient state, s_{i}, corresponds to node i in the incomplete grid, which has connections (links) with nodes j_{1},…, j_{m}, where for a square grid m ≤ 4. Then, T_{ij_{1}} = · · · = T_{ij_{m}} = 1/m and T_{ij} = 0 for j ∉ {j_{1},…, j_{m}}.

Suppose there are r absorbing states and t transient states. If we order the states so that the absorbing states come first (before the transient states), then the transition matrix takes the canonical form:

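$$\mathbf{T}=\left[\begin{array}{cc}{\mathbf{I}}_{r}& \mathbf{0}\\ \mathbf{R}& \mathbf{Q}\end{array}\right]$$

where **I**_{r} is the r × r identity matrix, **0** is an r × t zero matrix, **R** is a t × r matrix of transition probabilities from transient to absorbing states and **Q** is a t × t matrix of transition probabilities between transient states. The fundamental matrix of the absorbing Markov chain is [22]

$$\mathbf{F}={({\mathbf{I}}_{t}-\mathbf{Q})}^{-1}$$

Its element F_{ij} is the expected number of visits to transient state s_{j} by a particle released from transient state s_{i}, before the particle is absorbed. With the source of release rate A_{0} placed at node i, the mean concentration at node j is therefore θ_{j} = A_{0} · F_{ij}.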

Figure 3 shows the mean concentration of tracer particles for the search area modelled by the incomplete grid of Figure 2, with the source placed at (X, Y) = (0, 7) and with A_{0} = 12. Notice from Figure 3 how the concentration depends on the distance from the source and the structure of the grid, plotted in Figure 2.
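The absorbing Markov chain computation described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names `solve` and `mean_concentration`, the data layout (a list of nodes, a list of existing links, a set of absorbing nodes) and the toy one-dimensional lattice are all assumptions made for the example. Only the row of **F** belonging to the source node is computed, by solving a linear system rather than inverting **I**_{t} − **Q**.

```python
def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [x - f * y for x, y in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def mean_concentration(nodes, links, absorbing, source, release_rate):
    """Return {transient node: A_0 * F[source, node]} for an incomplete lattice.

    Assumes `source` is a transient node and that every transient node can
    reach the boundary (otherwise I_t - Q is singular).
    """
    neighbours = {node: [] for node in nodes}
    for a, b in links:  # an existing link is traversable in both directions
        neighbours[a].append(b)
        neighbours[b].append(a)

    transient = [node for node in nodes if node not in absorbing]
    index = {node: i for i, node in enumerate(transient)}
    t = len(transient)

    # Build A = (I_t - Q)^T directly, where Q[i][j] = 1/m for linked transient nodes.
    A = [[1.0 if r == c else 0.0 for c in range(t)] for r in range(t)]
    for node in transient:
        m = len(neighbours[node])
        for nb in neighbours[node]:
            if nb in index:  # moves into absorbing nodes belong to R, not Q
                A[index[nb]][index[node]] -= 1.0 / m

    # Row of F for the source: f^T (I_t - Q) = e_source^T.
    e = [0.0] * t
    e[index[source]] = 1.0
    f = solve(A, e)
    return {node: release_rate * f[index[node]] for node in transient}


# Toy example: five nodes in a line, both end nodes absorbing, source in the middle.
nodes = [0, 1, 2, 3, 4]
links = [(0, 1), (1, 2), (2, 3), (3, 4)]
theta = mean_concentration(nodes, links, absorbing={0, 4}, source=2, release_rate=1.0)
```

For a two-dimensional incomplete lattice, the same function applies with nodes given as (x, y) tuples and `links` listing only the links that have not been removed.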

Two types of measurements are collected by the searcher. Sensor 1 measures the concentration of tracer particles as a count of particles received during the sampling interval. Assuming the so-called “dilution” limit (limit of low concentrations), the tracer fluctuations follow the Poisson distribution [16], that is, a concentration measurement at node j of the grid is a random sample drawn from

$$n\sim \mathcal{P}(n;\lambda )=\frac{{\lambda}^{n}}{n!}{e}^{-\lambda}$$

The searcher sequentially estimates the source parameters without knowing the map of the search area. Hence, the measurement model based on the mean concentration λ= A_{0} · **F**_{ij} cannot be used in estimation (recall that matrix **F** is formed using knowledge of the structure of the incomplete lattice). Assuming that the fraction of missing links in the lattice is smaller than the percolation threshold, p_{c}, the expected concentration of tracer particles in any node, j, of the incomplete lattice can be computed approximately using the property of conformal invariance of the Laplace equation (see Appendix 6 for details). Suppose the source of release rate A_{0} is placed at a node of the grid, positioned at coordinates (X, Y). Then, the mean (time and ensemble averaged) concentration at node j, positioned at (x_{j}, y_{j}), can be approximated as

$${\langle \theta \rangle}_{j}\approx -\frac{A}{2}\log ({R}^{2})$$

where

$${R}^{2}={R}_{0}^{2}\frac{{({x}_{j}-X)}^{2}+{({y}_{j}-Y)}^{2}}{{({x}_{j}Y-{y}_{j}X)}^{2}+{({R}_{0}^{2}-{x}_{j}X-{y}_{j}Y)}^{2}}.$$

Note that this model is independent of the structure of the incomplete lattice. In summary, estimation will be carried out using Sensor 1 measurement model in Equation (4), where mean λ = 〈θ〉_{j} is approximated by Equations (5) and (6). The actual concentration measurements also follow the model in Equation (4), but with λ =θ_{j} = A_{0} · **F**_{ij}. This is how we simulate measurements in Section 5.
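As an illustration, the map-independent approximation of the mean concentration can be evaluated directly from the node and source coordinates. The sketch below is an assumption-laden example rather than the authors' code: the function name `approx_mean_concentration` and the choice to return infinity when the searcher sits exactly on the source (where the logarithm diverges) are ours.

```python
import math


def approx_mean_concentration(x, y, X, Y, A, R0):
    """Approximate mean concentration at (x, y) for a source of rate A at (X, Y).

    Implements -(A / 2) * log(R^2) with R^2 as defined in the text, for a
    circular search area of radius R0 with an absorbing boundary.
    """
    num = (x - X) ** 2 + (y - Y) ** 2
    den = (x * Y - y * X) ** 2 + (R0 ** 2 - x * X - y * Y) ** 2
    if num == 0.0:
        return math.inf  # assumption: the model is singular at the source itself
    R2 = R0 ** 2 * num / den
    return -0.5 * A * math.log(R2)


# Source at (0, 7) with A = 12 in a search area of radius 9.
centre = approx_mean_concentration(0.0, 0.0, 0.0, 7.0, 12.0, 9.0)
edge = approx_mean_concentration(9.0, 0.0, 0.0, 7.0, 12.0, 9.0)
```

On the boundary r = R_{0} the expression gives R² = 1, so the concentration vanishes, in agreement with the absorbing boundary condition; inside the search area R² < 1 and the concentration is positive.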

The searcher moves and explores the search domain in order to find the source. The source parameter estimation is carried out using the map-independent measurement model in Equation (5), which does not require discretisation of the search domain on a square lattice (as in Figure 1). Nevertheless, we keep discretisation for the searcher in order to model its motion paths and to facilitate its grid-based SLAM functionality. Thus, we assume that the searcher travels within the search area along the paths represented by the links of the incomplete grid as in Figure 2. As it travels, it stops at the nodes along its path to sense the environment, i.e., to collect measurements.

Sensor 2 is a simple binary detector of the presence or absence of the links (paths) visible from the node in which the searcher is currently placed. It reports on the presence/absence of the primary and secondary neighbouring links.

A link in a grid of Figure 2 is defined by a quadruple (x_{1}, y_{1}, x_{2}, y_{2}), where (x_{1}, y_{1}) and (x_{2}, y_{2}) are the coordinates of the nodes it connects. In order to explain what we mean by primary and secondary links, consider for example the node at location (−3, −4) indicated by “o” in Figure 2; the zoomed-in segment is shown in Figure 4. The primary links from this node are the connecting links towards east, west, up and down from (−3, −4), plotted in red in Figure 4; for example, ℓ_{1} = (−3, −4, −2, −4). The status of a link, ℓ, denoted m(ℓ), is a binary variable with the convention that m(ℓ) = 1 means that the link ℓ exists. According to Figures 2 and 4, we have: m(ℓ_{1}) = 1, m(ℓ_{2}) = 1, m(ℓ_{3}) = 0, m(ℓ_{4}) = 1. Existing links are shown by solid lines in Figure 4, while non-existing links are plotted as dotted lines.

The secondary links from the node at (−3, −4) in Figure 2 represent second neighbouring links in direction of east, west, up and down from (−3, −4), that is, ℓ_{5} = (−2, −4, −1, −4), and so on. According to Figures 2 and 4, the status of secondary links is: m(ℓ_{5}) = 1, m(ℓ_{6}) = 0, m(ℓ_{7}) = 0, m(ℓ_{8}) = 1; existing secondary links are indicated by solid green lines in Figure 4. A secondary link is observable if the connecting primary link to it exists in the graph. In Figures 2 and 4, for example, ℓ_{5}, ℓ_{6} and ℓ_{8} are observable, but ℓ_{7} is not, because m(ℓ_{3}) = 0.

Let an observation (supplied by sensor 2) about the presence or absence of a link, ℓ, be a binary value z(ℓ) ∈ {0, 1}, where z(ℓ) = 0 means that link ℓ is observed as absent and z(ℓ) = 1 that it is observed as present. The performance of sensor 2 can be described by two detection matrices, one for the primary links, the other for observable secondary links. Each detection matrix, Π, has the form

$$\Pi =\left[\begin{array}{cc}P(z=0|m=0)& P(z=0|m=1)\\ P(z=1|m=0)& P(z=1|m=1)\end{array}\right]$$

which, written in terms of the probability of detection p_{d} = P(z = 1|m = 1) and the probability of false alarm p_{fa} = P(z = 1|m = 0), becomes

$$\Pi =\left[\begin{array}{cc}1-{p}_{fa}& 1-{p}_{d}\\ {p}_{fa}& {p}_{d}\end{array}\right].$$

Suppose the searcher is in node i at discrete time k − 1. Let the set of admissible control vectors for the next move be defined as $\mathcal{U}$_{k} = {·, →, ←, ↑, ↓}, meaning that the searcher can stay where it is, or move one unit length to the right, to the left, up or down. After processing measurements from its sensors, the searcher chooses control ${\mathbf{u}}_{k}^{*}\in {\mathcal{U}}_{k}$, intending to arrive at time k at node j. However, due to control noise or unmodelled exogenous effects [19], control ${\mathbf{u}}_{k}^{*}$ is executed correctly only with probability 1 − p_{e}; with probability p_{e}, the searcher will actually execute a different control ${\mathbf{u}}_{k}^{\prime}\in {\mathcal{U}}_{k}\backslash \left\{{\mathbf{u}}_{k}^{*}\right\}$.
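The noisy motion model can be simulated as follows. This is only a sketch under stated assumptions: the text does not say how the erroneous control is chosen among the remaining four, so the example draws it uniformly; the names `CONTROLS` and `execute_control` are hypothetical, and the check that the corresponding link actually exists in the lattice is deliberately left out.

```python
import random

# The five admissible controls: stay, right, left, up, down (unit steps).
CONTROLS = {
    "stay": (0, 0),
    "right": (1, 0),
    "left": (-1, 0),
    "up": (0, 1),
    "down": (0, -1),
}


def execute_control(position, intended, p_e, rng=random):
    """Sample the next searcher position given the intended control.

    With probability 1 - p_e the intended control is executed; with
    probability p_e a different control is executed instead (assumed here
    to be uniformly distributed over the other four controls).
    """
    if rng.random() < p_e:
        others = [u for u in CONTROLS if u != intended]
        executed = rng.choice(others)
    else:
        executed = intended
    dx, dy = CONTROLS[executed]
    return (position[0] + dx, position[1] + dy), executed


rng = random.Random(0)
new_position, executed = execute_control((0, 0), "up", p_e=0.1, rng=rng)
```

The function returns the executed control only for bookkeeping in a simulation; the searcher itself does not observe it.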

The searcher has at its disposal the probabilistic models of sensor measurements and dynamic models. Prior knowledge also includes: (1) the coordinates of its initial position; (2) the length of each link in the square lattice; and (3) the boundary of the circular search area (defined by its centre and radius). The described prior translates into knowledge of the full grid, such as the one shown in Figure 1. Searcher motion is restricted to this full grid.

The objective of the searcher is to estimate in the shortest possible time the coordinates of the emitting source, as well as the partial map describing the path from its starting (entry) point to the estimated location of the source.

The described problem can be cast in the sequential Bayesian estimation framework as a nonlinear filtering problem. Let us first define the state vector, which consists of three parts:

- (1) The coordinates of the searcher position at discrete time k = 1, 2, . . . are denoted by **p**_{k} = [x_{k} y_{k}]^{⊺}.
- (2) The status (presence/absence) of each link in the complete grid (such as the one shown in Figure 1). The status of link ℓ_{j}, where j = 1, . . . , L and L is the total number of links in the complete grid, is m(ℓ_{j}) = m_{j} ∈ {0, 1}. The notation P(m_{j} = 1) refers to the probability that link ℓ_{j} is present. The map at time k is fully specified by vector **m**_{k} = [m_{1,k}, . . . , m_{L,k}]^{⊺}. The time index is introduced because we allow the map of the search area to occasionally change, e.g., an open door can close. The assumption is that the statuses of links are mutually independent, i.e., m_{j,k} is independent of m_{i,k} for i ≠ j.
- (3) The parameter vector of the source is denoted by **s** = [X Y A]^{⊺}.

The complete state vector is then defined as ${\mathbf{y}}_{k}={\left[{\mathbf{p}}_{k}^{\intercal}\;\,{\mathbf{m}}_{k}^{\intercal}\;\,{\mathbf{s}}^{\intercal}\right]}^{\intercal}$, where **p**_{k} and **m**_{k} are discrete state variables, while **s** is a continuous state vector. Dynamics of the state, **y**_{k}, are described by two transitional densities: p(**m**_{k}|**m**_{k−1}) specifies the evolution of the map over time, while p(**p**_{k}|**p**_{k−1}, **u**_{k}) characterises the searcher motion model. The observation models of the searcher are specified by two likelihood functions: g_{1}(n_{k}|**p**_{k}, **m**_{k}, **s**) characterises sensor 1, which provides the count of particles n_{k} at position **p**_{k} coming from the source in state **s** through the map **m**_{k}; g_{2}(**z**_{k}|**p**_{k}, **m**_{k}) refers to sensor 2 and describes the observation **z**_{k} of the status of the links in **m**_{k} visible from the searcher in location **p**_{k}. Let us denote observations at time k by a vector ${\zeta}_{k}={\left[{n}_{k}\;\,{\mathbf{z}}_{k}^{\intercal}\right]}^{\intercal}$. Finally, the prior probability density function (pdf) of the state is denoted by p(**y**_{0}).

The goal in the sequential Bayesian framework is to estimate any subset or property of the sequence of states **y**_{0:k} := (**y**_{0}, . . . , **y**_{k}) given the observation sequence **ζ**_{1:k} := (**ζ**_{1}, . . . , **ζ**_{k}) and the control sequence **u**_{1:k} := (**u**_{1}, . . . , **u**_{k}). Any such estimate is completely specified by the joint posterior distribution p(**y**_{0:k}|**ζ**_{1:k}, **u**_{1:k}). This posterior satisfies the following recursion:

$$p({\mathbf{y}}_{0:k}|{\zeta}_{1:k},{\mathbf{u}}_{1:k})=\frac{g({\zeta}_{k}|{\mathbf{y}}_{k})\,p({\mathbf{y}}_{k}|{\mathbf{y}}_{k-1},{\mathbf{u}}_{k})}{p({\zeta}_{k}|{\zeta}_{1:k-1})}\,p({\mathbf{y}}_{0:k-1}|{\zeta}_{1:k-1},{\mathbf{u}}_{1:k-1})$$

where the transitional density factorises as

$$p({\mathbf{y}}_{k}|{\mathbf{y}}_{k-1},{\mathbf{u}}_{k})=p({\mathbf{m}}_{k}|{\mathbf{m}}_{k-1})\,p({\mathbf{p}}_{k}|{\mathbf{p}}_{k-1},{\mathbf{u}}_{k})$$

and the likelihood function as

$$g({\zeta}_{k}|{\mathbf{y}}_{k})={g}_{1}({n}_{k}|{\mathbf{p}}_{k},{\mathbf{m}}_{k},\mathbf{s})\,{g}_{2}({\mathbf{z}}_{k}|{\mathbf{p}}_{k},{\mathbf{m}}_{k})$$

Recursion in Equation (9) involves intractable integrals in the denominator. In order to solve it, we adopt a numerical approximation based on the sequential Monte Carlo method [27]. Before going into details, notice that factorization expressed by Equations (10) and (11) imposes a structure, which can be conveniently represented by a dynamic Bayesian network (DBN) [28] shown in Figure 5. The circles in Figure 5 represent random variables: white circles are hidden variables; gray circles are observed variables. Arrows indicate dependencies. Arrows that are plotted by dashed lines are explained next.

The particle count measurement, n_{k}, depends on the map, **m**_{k}; hence, its likelihood is formulated as g_{1}(n_{k}|**p**_{k}, **m**_{k}, **s**). The searcher, however, does not know the map (it estimates it only partially as it travels through the search area), and hence, we have introduced the approximate measurement model expressed by Equations (4)–(6). The searcher will therefore process count observations, n_{k}, using the likelihood function, which is independent of **m**_{k} and denoted by g̃_{1}(n_{k}|**p**_{k}, **s**), rather than g_{1}(n_{k}|**p**_{k}, **m**_{k}, **s**). We indicate this fact by drawing the arrow from **m**_{k} to n_{k} in Figure 5 by a dashed line.

The computation of the posterior pdf for a structured complex system, such as the one shown in Figure 5, can be factorised and consequently made computationally and statistically more efficient. Technical details will be given in Section 4.

After processing the measurements received at time k − 1, the searcher needs to select the next control vector, **u**_{k}, which will change its position to **p**_{k} ∼ p(**p**_{k}|**p**_{k−1}, **u**_{k}). The problem of selecting **u**_{k} can be formulated as a partially-observed Markov decision process [29], whose elements are: (1) the set of admissible control vectors $\mathcal{U}$_{k}; (2) the current information state, expressed by the predicted pdf p(**y**_{k}|**ζ**_{1:k−1}, **u**_{1:k−1}, **u**_{k}), where **u**_{k} ∈ $\mathcal{U}$_{k}; and (3) the reward function associated with each control **u**_{k} ∈ $\mathcal{U}$_{k}. In the paper, we adopt motion control based on a single step ahead strategy; this myopic approach is suboptimal in the presence of randomly missing links, but is computationally easier to implement and faster to run. The control vector is then selected as

$${\mathbf{u}}_{k}=\arg \underset{\mathbf{v}\in {\mathcal{U}}_{k}}{\max }\,\mathbb{E}\left\{\mathcal{D}\left(\mathbf{v},p({\mathbf{y}}_{k}|{\zeta}_{1:k-1},{\mathbf{u}}_{1:k-1},\mathbf{v}),{\zeta}_{k}(\mathbf{v})\right)\right\}$$

Considering that the primary mission of the search is source localisation (map estimation is of secondary importance), the reward function at time k is adopted as the information gain between: (1) the predicted pdf over the state subspace (**s**, **p**_{k}) and (2) the updated pdf over (**s**, **p**_{k}), using the count measurement n_{k}. The two distributions are denoted π_{0}(**s**, **p**_{k}|**u**_{k}) = p(**s**|n_{1:k−1}, **u**_{1:k}) p(**p**_{k}|**p**_{k−1}, **u**_{k}) and π_{1}(**s**, **p**_{k}|n_{k}, **u**_{k}) = ξ g̃_{1}(n_{k}|**p**_{k}, **s**) π_{0}(**s**, **p**_{k}|**u**_{k}), respectively, where ξ is a normalisation constant. The information gain between the two distributions is measured using a special case of the Rényi divergence, known as the Bhattacharyya distance [30]:

$$\mathcal{D}({\mathbf{u}}_{k})=-2\log \int \sqrt{{\pi}_{1}(\mathbf{s},{\mathbf{p}}_{k}|{n}_{k},{\mathbf{u}}_{k})\cdot {\pi}_{0}(\mathbf{s},{\mathbf{p}}_{k}|{\mathbf{u}}_{k})}\,d\mathbf{s}\,d{\mathbf{p}}_{k}$$
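To make the reward concrete, the sketch below evaluates it for a weighted particle approximation. It rests on assumptions that are not stated in the text: π_{0} is represented by particles with normalised weights w_{i}, π_{1} lives on the same particles with weights proportional to w_{i} times the Poisson likelihood, so the integral reduces to a sum of √(w_{1,i} · w_{0,i}); the expectation over the future count n_{k} is truncated at a hypothetical `n_max`. The function names are ours.

```python
import math


def poisson_pmf(n, lam):
    """Poisson probability of count n for mean lam (lam must be positive)."""
    return math.exp(n * math.log(lam) - lam - math.lgamma(n + 1))


def renyi_reward(weights, lams, n):
    """-2 log sum_i sqrt(w1_i * w0_i) for a hypothesised count n.

    weights: normalised prior weights w0_i of the particles
    lams:    predicted mean count for each particle (all > 0)
    """
    like = [poisson_pmf(n, lam) for lam in lams]
    norm = sum(w * g for w, g in zip(weights, like))
    updated = [w * g / norm for w, g in zip(weights, like)]
    bc = sum(math.sqrt(w1 * w0) for w1, w0 in zip(updated, weights))
    return -2.0 * math.log(bc)


def expected_reward(weights, lams, n_max=50):
    """Average the reward over counts n = 0..n_max drawn from the predictive pmf."""
    total = 0.0
    for n in range(n_max + 1):
        p_n = sum(w * poisson_pmf(n, lam) for w, lam in zip(weights, lams))
        if p_n > 0.0:
            total += p_n * renyi_reward(weights, lams, n)
    return total


# Two equally weighted hypotheses predicting different mean counts.
informative = expected_reward([0.5, 0.5], [1.0, 5.0])
# Both hypotheses predict the same mean count: a measurement teaches nothing.
uninformative = expected_reward([0.5, 0.5], [3.0, 3.0])
```

In a one-step-ahead controller, `expected_reward` would be evaluated once per admissible control, with `lams` computed for the position that control is expected to reach, and the control with the largest value selected.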

The proposed search algorithm, formulated as a DBN with observer control, can be implemented efficiently as a Rao-Blackwellised particle filter (RBPF) [31] with sensor control. Rao-Blackwellisation is a technique for analytical marginalisation of a part of the state vector. Its purpose is to reduce the dimension of the state space in which a Monte Carlo estimation needs to be carried out, in order to improve the computational and statistical efficiency of the particle filter [31,34].

The idea of the RBPF is as follows. Suppose it is possible to divide the components of the hidden state vector, **y**_{k}, into two groups, α_{k} and β_{k}, such that the following two conditions are satisfied:

- **C-1:** p(**y**_{k}|**y**_{k−1}, **u**_{k}) = p(**α**_{k}|**β**_{k−1:k}, **α**_{k−1}) · p(**β**_{k}|**β**_{k−1}, **u**_{k});
- **C-2:** the conditional posterior distribution p(**α**_{k}|**β**_{0:k}, **ζ**_{1:k}, **u**_{1:k}) is analytically tractable.

Then, we need only to estimate the posterior p(**β**_{0:k}|**ζ**_{1:k}, **u**_{1:k}), meaning that we have reduced the dimension of the space for Monte Carlo estimation from dim(**y**_{k}) to dim(**β**_{k}). In the described DBN, shown in Figure 5, in order to satisfy conditions C-1 and C-2, the state vector, **y**_{k}, can be partitioned as follows:

$${\alpha}_{k}={\left[{\mathbf{m}}_{k}^{\intercal}\;\,A\right]}^{\intercal}$$

$${\beta}_{k}={\left[{\mathbf{p}}_{k}^{\intercal}\;\,X\;\,Y\right]}^{\intercal}$$

We are interested only in the filtering posterior density, which can now be factorised as follows:

$$p({\alpha}_{k},{\beta}_{0:k}|{\zeta}_{1:k},{\mathbf{u}}_{1:k})=p({\alpha}_{k}|{\beta}_{0:k},{\zeta}_{1:k},{\mathbf{u}}_{1:k})\cdot p({\beta}_{0:k}|{\zeta}_{1:k},{\mathbf{u}}_{1:k})$$

The pdf p(**β**_{0:k}|**ζ**_{1:k}, **u**_{1:k}) is approximated by a random sample ${\left\{{\beta}_{0:k}^{(i)}\right\}}_{i=1}^{N}$. Subsequently, one can compute analytically (for each sample ${\beta}_{0:k}^{(i)}$):

$$p({\alpha}_{k}|{\beta}_{0:k}^{(i)},{n}_{1:k},{\mathbf{z}}_{1:k},{\mathbf{u}}_{1:k})=p({\mathbf{m}}_{k}|{\mathbf{z}}_{1:k},{\beta}_{0:k}^{(i)})\cdot p(A|{n}_{1:k},{\beta}_{0:k}^{(i)})$$

where $p({\mathbf{m}}_{k}|{\mathbf{z}}_{1:k},{\beta}_{0:k}^{(i)})={\mathbf{q}}_{k}$ is a vector of probabilities of existence for each link in the random grid and $p(A|{n}_{1:k},{\beta}_{0:k}^{(i)})$ is approximated by a Gamma distribution with shape parameter η_{k} and scale parameter θ_{k}, i.e., $\mathcal{G}$(A; η_{k}, θ_{k}).

Hence, each particle corresponds to a set:

$$\left({\beta}_{0:k}^{(i)},{\mathbf{q}}_{k},{\eta}_{k},{\theta}_{k}\right)$$

Let us first discuss the analytic recursive formula for the computation of **q**_{k}, following the ideas of the grid-based SLAM [19]. Note that

$${\mathbf{q}}_{k}=p({\mathbf{m}}_{k}|{\mathbf{z}}_{1:k},{\beta}_{0:k}^{(i)})=\frac{{g}_{2}({\mathbf{z}}_{k}|{\mathbf{m}}_{k},{\beta}_{k}^{(i)})\,p({\mathbf{m}}_{k}|{\mathbf{z}}_{1:k-1},{\beta}_{0:k-1}^{(i)})}{\sum _{{\mathbf{m}}_{k}}{g}_{2}({\mathbf{z}}_{k}|{\mathbf{m}}_{k},{\beta}_{k}^{(i)})\,p({\mathbf{m}}_{k}|{\mathbf{z}}_{1:k-1},{\beta}_{0:k-1}^{(i)})}$$

where the predicted map distribution is

$$p({\mathbf{m}}_{k}|{\mathbf{z}}_{1:k-1},{\beta}_{0:k-1}^{(i)})=\sum _{{\mathbf{m}}_{k-1}}p({\mathbf{m}}_{k}|{\mathbf{m}}_{k-1})\,p({\mathbf{m}}_{k-1}|{\mathbf{z}}_{1:k-1},{\beta}_{0:k-1}^{(i)})$$

The update of probability vector, **q**_{k}, is then carried out as follows. Recall from Equation (15) that particle ${\beta}_{k}^{(i)}$ specifies the location of the searcher at time k, ${\mathbf{p}}_{k}^{(i)}={\left[{x}_{k}^{(i)}\;\,{y}_{k}^{(i)}\right]}^{\intercal}$. Each component of vector **z**_{k} is then an observation of existence of a primary or a secondary link from location ${\mathbf{p}}_{k}^{(i)}$. Let q_{j,k−1} be a component of vector **q**_{k−1}, denoting the posterior probability that link ℓ_{j} exists at time k − 1, i.e., ${q}_{j,k-1}=p({m}_{j,k-1}|{\mathbf{z}}_{1:k-1},{\beta}_{0:k-1}^{(i)})$. Recall also that, since the presence or absence of links is assumed independent, ${\mathbf{q}}_{k-1}={\prod}_{j=1}^{L}{q}_{j,k-1}$. According to Equation (20), link j existence probability is predicted as

$${q}_{j,k|k-1}=p({m}_{j,k}=1|{m}_{j,k-1}=0)\cdot (1-{q}_{j,k-1})+p({m}_{j,k}=1|{m}_{j,k-1}=1)\cdot {q}_{j,k-1}.$$

Let z be the component of vector **z**_{k} that refers to link ℓ_{j}, according to the current position of the searcher,
${\mathbf{p}}_{k}^{(i)}$. Then, based on Equation (19), the existence probability of link j is updated as

$${q}_{j,k}=\left\{\begin{array}{ll}\dfrac{{p}_{d}\,{q}_{j,k|k-1}}{{p}_{d}\,{q}_{j,k|k-1}+{p}_{fa}(1-{q}_{j,k|k-1})}& \text{if}\ z=1\\[2ex] \dfrac{(1-{p}_{d})\,{q}_{j,k|k-1}}{(1-{p}_{d})\,{q}_{j,k|k-1}+(1-{p}_{fa})(1-{q}_{j,k|k-1})}& \text{if}\ z=0\end{array}\right.$$

$${\mathbf{q}}_{k}=\psi ({\mathbf{q}}_{k-1},{\beta}_{k}^{(i)},{\mathbf{z}}_{k})$$
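The prediction and update of the link existence probabilities can be illustrated with a minimal Python sketch. The function names and the dictionary of observations are illustrative assumptions, not part of the paper; the transition probabilities 0.999 and 0.001 are the values quoted later in the numerical section, and a single pair (p_d, p_fa) is used for all observed links for brevity.

```python
def predict_link(q_prev, p_birth=0.001, p_survive=0.999):
    """Prediction step: probability that a link exists at time k given q at k-1."""
    return p_birth * (1.0 - q_prev) + p_survive * q_prev


def update_link(q_pred, z, p_d, p_fa):
    """Bayes update of the existence probability from a binary observation z."""
    if z == 1:
        num = p_d * q_pred
        den = num + p_fa * (1.0 - q_pred)
    else:
        num = (1.0 - p_d) * q_pred
        den = num + (1.0 - p_fa) * (1.0 - q_pred)
    # If the observation has zero probability under the model, keep the prediction.
    return num / den if den > 0.0 else q_pred


def psi(q_prev, observations, p_d, p_fa):
    """Predict every link, then update only the links observed from the current node.

    q_prev: list of existence probabilities, one per link (hypothetical layout)
    observations: dict {link index j: z in {0, 1}} (hypothetical layout)
    """
    q = [predict_link(qj) for qj in q_prev]
    for j, z in observations.items():
        q[j] = update_link(q[j], z, p_d, p_fa)
    return q


# Three links, all initially uncertain; link 1 is observed as present
# by a secondary-link sensor (p_d = 0.8, p_fa = 0.1).
q_new = psi([0.5, 0.5, 0.5], {1: 1}, p_d=0.8, p_fa=0.1)
```

Links that are not observable from the current node are only predicted, so their probabilities drift very slowly under the near-identity transition matrix.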

Let us describe next the analytic recursion for the update of the parameters, η_{k} and θ_{k}, of Equation (18). At time k − 1, the posterior of emission rate A is modeled by a gamma distribution:

$$A|{n}_{1:k-1},{\beta}_{0:k-1}^{(i)}\sim \mathcal{G}(A;{\eta}_{k-1},{\theta}_{k-1}).$$

Sensor 1 provides at time k a count measurement, n_{k}, which plays the key role in the update of parameters η_{k−1} and θ_{k−1}. Recall that the likelihood function of this measurement,
${\tilde{g}}_{1}({n}_{k}|{\beta}_{k}^{(i)},A)$, is a Poisson distribution with parameter (mean)
${\lambda}_{k}^{(i)}$, rather than A. Fortunately,
${\lambda}_{k}^{(i)}$ is linearly related to emission rate A, that is

$${\lambda}_{k}^{(i)}=A\cdot c({\beta}_{k}^{(i)})$$

$${c}_{k}^{(i)}=-\frac{1}{2}\left(2\hspace{0.17em}\text{log}\hspace{0.17em}{R}_{0}+\text{log}\frac{{({x}_{k}^{(i)}-{X}^{(i)})}^{2}+{({y}_{k}^{(i)}-{Y}^{(i)})}^{2}}{{({x}_{k}^{(i)}{Y}^{(i)}-{y}_{k}^{(i)}{X}^{(i)})}^{2}+{({R}_{0}^{2}-{x}_{k}^{(i)}{X}^{(i)}-{y}_{k}^{(i)}{Y}^{(i)})}^{2}}\right)$$
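As a numerical illustration of Equation (25), the coefficient can be evaluated directly from the searcher and source coordinates. The sketch below is a minimal Python transcription; the function name and the handling of the singular point (searcher exactly at the source) are our own assumptions.

```python
import math


def concentration_coefficient(x, y, X, Y, R0):
    """Coefficient c of Equation (25) for searcher (x, y) and source (X, Y).

    c is positive inside the disc of radius R0, zero on its boundary and
    grows without bound as the searcher approaches the source.
    """
    num = (x - X) ** 2 + (y - Y) ** 2
    den = (x * Y - y * X) ** 2 + (R0 ** 2 - x * X - y * Y) ** 2
    if num == 0.0:
        return math.inf  # searcher located exactly at the source (assumed convention)
    return -0.5 * (2.0 * math.log(R0) + math.log(num / den))


# Source at (0, 7) in a search area of radius 9, searcher at the centre.
c_centre = concentration_coefficient(0.0, 0.0, 0.0, 7.0, 9.0)
# Searcher on the boundary of the search area: the coefficient vanishes.
c_boundary = concentration_coefficient(9.0, 0.0, 0.0, 7.0, 9.0)
```

The expression is symmetric in the searcher and source positions, as expected for a Green's function of the Laplace equation.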

In the proposed algorithm for the update of parameters η_{k−1} and θ_{k−1}, we use the following two properties of the Gamma distribution.

- (1) Scaling property [32]: if X ∼ $\mathcal{G}$(η, θ), then for any c > 0, cX ∼ $\mathcal{G}$(η, cθ).

- (2) The Gamma distribution is the conjugate prior of the Poisson distribution [33]: if λ ∼ $\mathcal{G}$(η, θ) is a prior distribution and n is a sample from the Poisson distribution with parameter λ, then the posterior is

$$\lambda \sim \mathcal{G}(\eta +n,\theta /(1+\theta )).$$

Given the Gamma posterior of A at time k − 1 and the scaling property, the prior distribution of ${\lambda}_{k}^{(i)}$ is

$${\lambda}_{k}^{(i)}|{n}_{1:k-1},{\beta}_{0:k}^{(i)}\sim \mathcal{G}\left(\lambda ;{\eta}_{k-1},{c}_{k}^{(i)}\cdot {\theta}_{k-1}\right)$$

Using measurement n_{k} and the conjugate prior property, the posterior distribution is

$${\lambda}_{k}^{(i)}|{n}_{1:k},{\beta}_{0:k}^{(i)}\sim \mathcal{G}\left(\lambda ;{\eta}_{k-1}+{n}_{k},\frac{{c}_{k}^{(i)}\cdot {\theta}_{k-1}}{1+{c}_{k}^{(i)}{\theta}_{k-1}}\right)$$

Since we are after the updated parameters of Gamma distribution of A (rather than ${\lambda}_{k}^{(i)}$), again, using the scaling property, we have

$$A|{n}_{1:k},{\beta}_{0:k}^{(i)}\sim \mathcal{G}\left(A;{\eta}_{k-1}+{n}_{k},\frac{{\theta}_{k-1}}{1+{c}_{k}^{(i)}{\theta}_{k-1}}\right)$$

From Equations (24) and (28), we can summarise the analytic expressions for the update of η_{k} and θ_{k} as follows

$${\eta}_{k}={\eta}_{k-1}+{n}_{k}$$

$${\theta}_{k}=\frac{{\theta}_{k-1}}{1+{c}_{k}^{(i)}{\theta}_{k-1}}.$$
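The recursion for the Gamma parameters amounts to two lines of arithmetic per particle. The following Python sketch is an illustration only; the function name is an assumption, and the prior values η_{0} = 15 and θ_{0} = 1 are those used in the numerical section.

```python
def update_gamma(eta_prev, theta_prev, n_k, c_k):
    """Conjugate update of the Gamma(shape, scale) posterior of emission rate A.

    n_k: count measured at time k
    c_k: coefficient of Equation (25) for the particle under consideration
    """
    eta = eta_prev + n_k
    theta = theta_prev / (1.0 + c_k * theta_prev)
    return eta, theta


# A zero count lowers the posterior mean eta * theta of A,
# whereas a large count raises it.
eta_zero, theta_zero = update_gamma(15.0, 1.0, n_k=0, c_k=0.25)
eta_high, theta_high = update_gamma(15.0, 1.0, n_k=10, c_k=0.25)
```

Note that the scale parameter can only decrease, so the posterior of A narrows with every measurement, including zero counts.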

Recursive estimation of p(β_{0:k}|**ζ**_{1:k}, **u**_{1:k}) is implemented using a particle filter. If we use the transitional prior as the proposal distribution, i.e.,

$$q({\beta}_{0:k}|{\varsigma}_{1:k},{\mathbf{u}}_{1:k})=p({\beta}_{k}|{\beta}_{k-1},{\mathbf{u}}_{k})\,p({\beta}_{0:k-1}|{\varsigma}_{1:k-1},{\mathbf{u}}_{1:k-1})$$

then the importance weights are

$${w}_{k}\propto p({\varsigma}_{k}|{\varsigma}_{1:k-1},{\beta}_{0:k})$$

For our problem, Expression (32) can be evaluated as

$$\begin{array}{rl}{w}_{k}\propto & \int p({\varsigma}_{k},{\alpha}_{k}|{\varsigma}_{1:k-1},{\beta}_{0:k})\,d{\alpha}_{k}\\ = & \int {\tilde{g}}_{1}({n}_{k}|A,{\beta}_{k})\,p(A|{n}_{1:k-1},{\beta}_{0:k-1})\,dA\times \sum _{{\mathbf{m}}_{k}}{g}_{2}({\mathbf{z}}_{k}|{\mathbf{m}}_{k},{\mathbf{p}}_{k})\,p({\mathbf{m}}_{k}|{\mathbf{z}}_{1:k-1},{\beta}_{0:k-1})\end{array}$$

$${\mathbf{q}}_{k|k-1}=\sum _{{\mathbf{m}}_{k-1}}p({\mathbf{m}}_{k}|{\mathbf{m}}_{k-1}){\mathbf{q}}_{k-1}.$$

The components of vector **q**_{k|k−1}, i.e., q_{j,k|k−1}, were specified by Equation (21). The integral that features in Equation (34) can also be computed analytically. This integral equals

$$\begin{array}{rl}\mathcal{I}= & \int {\tilde{g}}_{1}({n}_{k}|A,{\beta}_{k})\,p(A|{n}_{1:k-1},{\beta}_{0:k-1})\,dA\\ = & \int \mathcal{P}({n}_{k};{\lambda}_{k}=c({\beta}_{k})\cdot A)\,\mathcal{G}(A;{\eta}_{k-1},{\theta}_{k-1})\,dA\end{array}$$

According to Bayes' rule, the posterior of A satisfies

$$\mathcal{G}\left(A;{\eta}_{k-1}+{n}_{k},\frac{{\theta}_{k-1}}{1+c({\beta}_{k}){\theta}_{k-1}}\right)=\frac{\mathcal{P}({n}_{k};{\lambda}_{k}=c({\beta}_{k})\cdot A)\,\mathcal{G}(A;{\eta}_{k-1},{\theta}_{k-1})}{\int \mathcal{P}({n}_{k};{\lambda}_{k}=c({\beta}_{k})\cdot A)\,\mathcal{G}(A;{\eta}_{k-1},{\theta}_{k-1})\,dA}$$

The denominator on the right-hand side is the integral $\mathcal{I}$. Hence, for any A > 0,

$$\mathcal{I}=\frac{\mathcal{P}({n}_{k};{\lambda}_{k}=c({\beta}_{k})\cdot A)\,\mathcal{G}(A;{\eta}_{k-1},{\theta}_{k-1})}{\mathcal{G}(A;{\eta}_{k-1}+{n}_{k},{\theta}_{k-1}/(1+c({\beta}_{k}){\theta}_{k-1}))}$$

$${\tilde{w}}_{k}=\phi ({\beta}_{k},{\mathbf{q}}_{k-1},{\eta}_{k-1},{\theta}_{k-1},{n}_{k},{\mathbf{z}}_{k})$$
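The count-measurement factor of the weight, i.e., the integral of Equation (38), can be evaluated in two equivalent ways, as the following Python sketch illustrates. The closed form in `log_marginal_count` is the standard Poisson-Gamma (negative binomial) marginal, which we state here as a known result rather than as a formula taken from the paper; `log_marginal_ratio` evaluates the ratio of Equation (38) at an arbitrary value of A. All function names are illustrative, and c > 0 is assumed in the ratio form.

```python
import math


def log_poisson(n, lam):
    """Log of the Poisson probability of count n with mean lam > 0."""
    return n * math.log(lam) - lam - math.lgamma(n + 1)


def log_gamma_pdf(a, eta, theta):
    """Log of the Gamma density with shape eta and scale theta at a > 0."""
    return ((eta - 1.0) * math.log(a) - a / theta
            - math.lgamma(eta) - eta * math.log(theta))


def log_marginal_count(n_k, c_k, eta, theta):
    """Log of the integral of Poisson(n; c*A) * Gamma(A; eta, theta) over A."""
    ct = c_k * theta
    if ct <= 0.0:
        # With a zero mean count, only n = 0 has non-zero probability.
        return 0.0 if n_k == 0 else -math.inf
    count_term = n_k * math.log(ct) if n_k > 0 else 0.0
    return (math.lgamma(eta + n_k) - math.lgamma(n_k + 1) - math.lgamma(eta)
            + count_term - (eta + n_k) * math.log1p(ct))


def log_marginal_ratio(n_k, c_k, eta, theta, a=1.0):
    """Equation (38) evaluated at an arbitrary A = a > 0 (result is independent of a)."""
    theta_post = theta / (1.0 + c_k * theta)
    return (log_poisson(n_k, c_k * a) + log_gamma_pdf(a, eta, theta)
            - log_gamma_pdf(a, eta + n_k, theta_post))


log_I = log_marginal_count(3, 0.25, 15.0, 1.0)
```

Working with logarithms avoids underflow when the counts or the shape parameter become large; the full unnormalised weight additionally contains the link-observation factor of Equation (34).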

Importance weights determine in a probabilistic manner which particles will survive (and possibly multiply) during the resampling step of the RBPF.

Suppose the posterior distribution at time k − 1, p(**y**_{k−1}|**ζ**_{1:k−1}, **u**_{1:k−1}), is approximated by a set of particles

$${\mathcal{Y}}_{k-1}={\left\{\left({\beta}_{k-1}^{(i)},{\mathbf{q}}_{k-1}^{(i)},{\eta}_{k-1}^{(i)},{\theta}_{k-1}^{(i)}\right)\right\}}_{i=1}^{N}$$

The question is how to compute the information gain of Equation (13) for each **u** ∈ $\mathcal{U}$_{k}, based on particles $\mathcal{Y}$_{k−1}. We adopt the ideal measurement approximation for this, that is, in hypothesizing the future count measurement (resulting from action **u**), we assume: (1) action **u** will be carried out correctly, that is, the transitional density p(**p**_{k}|**p**_{k−1}, **u**_{k}) will be replaced by the deterministic mapping **p**_{k} = **p**_{k−1} + **u**_{k}; and (2) the measurement count will be equal to the mean of g̃_{1}(n_{k}|A, β_{k}), that is, λ_{k} (rounded off to the nearest integer).

Since we are after the expected value of the gain, that is, 𝔼{$\mathcal{D}$(**u**)}, we will create an ensemble of “future ideal measurements”
${\left\{{n}_{k}^{(j)}\right\}}_{j=1}^{M}$. Expectation is then approximated by a sample mean, i.e.,

$$\mathbb{E}\left\{\mathcal{D}(\mathbf{u})\right\}\approx \frac{1}{M}\sum _{j=1}^{M}{\mathcal{D}}^{(j)}(\mathbf{u})$$

The ensemble of “future ideal measurements”
${\left\{{n}_{k}^{(j)}\right\}}_{j=1}^{M}$ is created as follows. For each action **u**, choose, at random, a set of M particle indices i_{j} ∈ {1, . . . , N}, j = 1, . . . , M. Action **u** is then expected to move the searcher to location
${\mathbf{p}}_{k}^{({i}_{j})}={\mathbf{p}}_{k-1}^{({i}_{j})}+\mathbf{u}$. Then a “future ideal measurement” is
${n}_{k}^{(j)}=\lfloor {A}^{({i}_{j})}\cdot {c}^{({i}_{j})}\rceil $, where c^{(i_j)}, as a function of
${\mathbf{p}}_{k}^{({i}_{j})}$ and
${\mathbf{p}}_{s}^{({i}_{j})}$, was defined by Equation (25), A^{(i_j)} ∼ $\mathcal{G}$(A; η_{k−1}, θ_{k−1}), and ⌊·⌉ denotes the nearest integer function.

It remains to explain how to compute the gain $\mathcal{D}$^{(j)}(**u**) based on
${n}_{k}^{(j)}$. Distribution π_{0}(**s**, **p**_{k}|**u**), which features in Equation (13), can be approximated using the particle set $\mathcal{Y}$_{k−1} as follows:

$${\pi}_{0}(\mathbf{s},{\mathbf{p}}_{k}|\mathbf{u})\approx \mathcal{G}(A;{\eta}_{k-1},{\theta}_{k-1})\sum _{i=1}^{N}{w}_{k-1}^{(i)}\delta ({\mathbf{p}}_{s}-{\mathbf{p}}_{s}^{(i)},{\mathbf{p}}_{k}-{\mathbf{p}}_{k}^{(i)})$$

The posterior after the hypothesised measurement ${n}_{k}^{(j)}$ follows from Bayes' rule:

$${\pi}_{1}(\mathbf{s},{\mathbf{p}}_{k}|\mathbf{u},{n}_{k}^{(j)})=\frac{{\tilde{g}}_{1}({n}_{k}^{(j)}|{\mathbf{p}}_{k},{\mathbf{p}}_{s},A)\,{\pi}_{0}(\mathbf{s},{\mathbf{p}}_{k}|\mathbf{u})}{\int {\tilde{g}}_{1}({n}_{k}^{(j)}|{\mathbf{p}}_{k},{\mathbf{p}}_{s},A)\,{\pi}_{0}(\mathbf{s},{\mathbf{p}}_{k}|\mathbf{u})\,d\mathbf{s}\,d{\mathbf{p}}_{k}}.$$

Substitution of Equations (41) and (42) into Equation (13) leads to

$${\mathcal{D}}^{(j)}(\mathbf{u})\approx -2\,\log \frac{{\sum}_{i=1}^{N}{w}_{k-1}^{(i)}{\mathcal{J}}^{(i)}({n}_{k}^{(j)})}{{\left[{\sum}_{i=1}^{N}{w}_{k-1}^{(i)}{\mathcal{I}}^{(i)}({n}_{k}^{(j)})\right]}^{1/2}}$$

where ${\mathcal{I}}^{(i)}$ is given by Equation (38), evaluated for particle i, and

$${\mathcal{J}}^{(i)}({n}_{k})=\int {\left[\mathcal{P}({n}_{k};{\lambda}_{k}^{(i)}=c({\beta}_{k}^{(i)})A)\right]}^{1/2}\times \mathcal{G}(A;{\eta}_{k-1}^{(i)},{\theta}_{k-1}^{(i)})\,dA.$$

The integral in Equation (44) can be evaluated numerically.
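One possible numerical evaluation of the integral in Equation (44), together with the gain of Equation (43), is sketched below in Python. The composite Simpson rule, the integration range (prior mean plus twelve standard deviations) and the number of steps are our own illustrative choices, not those of the paper; the sketch also assumes c > 0 and η > 1, so that the integrand vanishes at A = 0.

```python
import math


def sqrt_poisson_times_gamma(a, n, c, eta, theta):
    """Integrand of Equation (44): sqrt(Poisson(n; c*a)) * Gamma(a; eta, theta)."""
    if a <= 0.0:
        return 0.0  # valid limit for eta > 1 (assumed)
    log_p = n * math.log(c * a) - c * a - math.lgamma(n + 1)
    log_g = ((eta - 1.0) * math.log(a) - a / theta
             - math.lgamma(eta) - eta * math.log(theta))
    return math.exp(0.5 * log_p + log_g)


def integral_J(n, c, eta, theta, steps=2000):
    """Composite Simpson rule on [0, mean + 12 standard deviations] of the prior."""
    upper = eta * theta + 12.0 * math.sqrt(eta) * theta
    h = upper / steps  # steps must be even for Simpson's rule
    total = (sqrt_poisson_times_gamma(0.0, n, c, eta, theta)
             + sqrt_poisson_times_gamma(upper, n, c, eta, theta))
    for i in range(1, steps):
        weight = 4.0 if i % 2 else 2.0
        total += weight * sqrt_poisson_times_gamma(i * h, n, c, eta, theta)
    return total * h / 3.0


def bhattacharyya_gain(weights, J_values, I_values):
    """Equation (43): D = -2 log( sum_i w_i J_i / sqrt(sum_i w_i I_i) )."""
    num = sum(w * j for w, j in zip(weights, J_values))
    den = math.sqrt(sum(w * i for w, i in zip(weights, I_values)))
    return -2.0 * math.log(num / den)


J = integral_J(n=3, c=0.25, eta=15.0, theta=1.0)
```

Since the square of each J^{(i)} cannot exceed the corresponding I^{(i)} (Jensen's inequality), the computed gain is non-negative, which provides a simple sanity check on an implementation.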

The pseudo-code of one cycle of the search algorithm is presented in Algorithm 1. The input consists of the particle set, $\mathcal{Y}$_{k−1}, defined by Equation (40). Selection of the control vector, **u**_{k} (line 2 of Algorithm 1), is described in Algorithm 2.

The steps of Algorithm 1 are explained first. Estimation of the state vector via the RBPF is carried out in lines 4–18. According to Equation (15), random vector ${\beta}_{k-1}^{(i)}$ consists of ${\mathbf{p}}_{k-1}^{(i)}$, X^{(i)} and Y^{(i)}. Since the source location, (X^{(i)}, Y^{(i)}), is static, only the component ${\mathbf{p}}_{k-1}^{(i)}$ is propagated to future time k in line 6. In line 7, Equation (39) is applied to compute the unnormalised weight of each particle. The map, represented by the probability of existence of each link, is updated in line 8, based on Expression (23). The parameters of the Gamma distribution are updated in lines 9–11. The weights assigned to each quadruple (${\beta}_{k}^{(i)}$, ${\mathbf{q}}_{k}^{(i)}$, ${\eta}_{k}^{(i)}$, ${\theta}_{k}^{(i)}$) are normalised in line 14. Resampling of particles is carried out in lines 15–18. The particles for source position ${\mathbf{p}}_{s}^{(i)}$ are not restricted to the grid nodes and, after the resampling step, their diversity is improved by regularisation [34]. Finally, the output is the particle set, $\mathcal{Y}$_{k}.

1: | Input: $\mathcal{Y}$_{k−1} |

2: | Run Algorithm 2 to select the control vector u_{k} |

3: | Apply control u_{k} and collect measurements z_{k}, n_{k} |

4: | $\overline{\mathcal{Y}}$_{k} = ∅; $\mathcal{Y}$_{k} = ∅ |

5: | for i =1, . . . ,N do |

6: | Draw ${\mathbf{p}}_{k}^{(i)}~p({\mathbf{p}}_{k}|{\mathbf{p}}_{k-1}^{(i)},{\mathbf{u}}_{k})$ |

7: | ${\tilde{w}}_{k}^{(i)}=\phi ({\beta}_{k}^{(i)},{\mathbf{q}}_{k-1}^{(i)},{\eta}_{k-1}^{(i)},{\theta}_{k-1}^{(i)},{n}_{k},{\mathbf{z}}_{k})$ |

8: | ${\mathbf{q}}_{k}^{(i)}=\psi ({\mathbf{q}}_{k-1}^{(i)},{\beta}_{k}^{(i)},{\mathbf{z}}_{k})$ |

9: | ${\eta}_{k}^{(i)}={\eta}_{k-1}^{(i)}+{n}_{k}$ |

10: | Compute constant ${c}_{k}^{(i)}$ as a function of ${\beta}_{k}^{(i)}$ using Equation (25) |

11: | ${\theta}_{k}^{(i)}={\theta}_{k-1}^{(i)}/(1+{c}_{k}^{(i)}{\theta}_{k-1}^{(i)})$ |

12: | ${\overline{\mathcal{Y}}}_{k}={\overline{\mathcal{Y}}}_{k}\cup \left\{({\beta}_{k}^{(i)},{\mathbf{q}}_{k}^{(i)},{\eta}_{k}^{(i)},{\theta}_{k}^{(i)})\right\}$ |

13: | end for |

14: | ${w}_{k}^{(i)}={\tilde{w}}_{k}^{(i)}/{\sum}_{j=1}^{N}{\tilde{w}}_{k}^{(j)},\ \text{for}\ i=1,\dots ,N$ |

15: | for i = 1, . . . ,N do |

16: | Select index j_{i} ∈ {1, . . . ,N} with probability ${w}_{k}^{({j}_{i})}$ |

17: | ${\mathcal{Y}}_{k}={\mathcal{Y}}_{k}\cup \left\{({\beta}_{k}^{({j}_{i})},{\mathbf{q}}_{k}^{({j}_{i})},{\eta}_{k}^{({j}_{i})},{\theta}_{k}^{({j}_{i})})\right\}$ |

18: | end for |

19: | Output: $\mathcal{Y}$_{k} |
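The weight normalisation and resampling steps (lines 14–18 of Algorithm 1) can be sketched in Python as follows. This is a plain multinomial resampler under our own assumptions: particles are represented as dictionaries, which is a hypothetical layout, and the regularisation of the source position mentioned above is omitted.

```python
import random


def normalise(unnormalised):
    """Line 14: scale the unnormalised weights so that they sum to one."""
    total = sum(unnormalised)
    return [w / total for w in unnormalised]


def resample(particles, weights, rng=random):
    """Lines 15-18: draw N indices with replacement, index j with probability w^(j)."""
    n = len(particles)
    indices = rng.choices(range(n), weights=weights, k=n)
    # Copy each selected particle so that duplicates can evolve independently later.
    return [dict(particles[j]) for j in indices]


# Three hypothetical particles; only the second has a non-zero weight.
particles = [{"id": 0}, {"id": 1}, {"id": 2}]
weights = normalise([0.0, 2.0, 0.0])
survivors = resample(particles, weights, rng=random.Random(0))
```

Particles with negligible weight are unlikely to be selected, while particles with large weight tend to be duplicated, which is the behaviour described for the resampling step of the RBPF.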

The selection of a control vector, described by Algorithm 2, starts with postulating the set, $\mathcal{U}$_{k}, in line 2. For every **u** ∈ $\mathcal{U}$_{k}, the algorithm anticipates j = 1, . . . , M future measurements
${n}_{k}^{(j)}$ (line 9) and accordingly computes a sample of the reward $\mathcal{D}$^{(j)}(**u**) (line 14). The expected reward is then a sample mean (line 16). Finally, the optimal one-step ahead control is selected in line 18.

It has been observed in simulations that one-step ahead control can sometimes lead to situations where the searcher position oscillates indefinitely between two or three nodes of the lattice. In order to overcome this problem, we adopt the following heuristic: if a node has been visited more than three times in the last 10 search steps, the next motion control vector is selected at random. While a multi-step ahead searcher control would be preferable to the adopted heuristic, it would also be computationally more demanding. Multi-step ahead searcher control remains to be explored in future work.
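The heuristic can be illustrated with a short Python sketch. The class name, the encoding of the five controls as displacement pairs, and the choices to count the current node among the last 10 visits and to include the "stay" control in the random draw are our own assumptions; the text above does not specify these details.

```python
import random
from collections import Counter, deque

# Assumed encoding of the admissible controls: stay, right, left, up, down.
CONTROLS = [(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)]


class LoopBreaker:
    """Replaces the information-driven control by a random one when looping."""

    def __init__(self, window=10, max_visits=3, rng=None):
        self.recent = deque(maxlen=window)  # nodes visited in the last `window` steps
        self.max_visits = max_visits
        self.rng = rng or random.Random()

    def choose(self, node, best_control):
        """Record the current node and return the control to apply next."""
        self.recent.append(node)
        if Counter(self.recent)[node] > self.max_visits:
            return self.rng.choice(CONTROLS)
        return best_control


breaker = LoopBreaker(rng=random.Random(0))
# The searcher alternates between two nodes; "best" stands for the
# control selected by Algorithm 2.
controls = [breaker.choose(node, "best") for node in [(0, 0), (1, 0)] * 3]
```

In this example the first six steps keep the information-driven control; a seventh visit to node (0, 0) would be its fourth within the window and would trigger a random control.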

1: | Input: $\mathcal{Y}$_{k−1} |

2: | Create the set of admissible controls $\mathcal{U}$_{k} = {·,→,←,↑,↓} |

3: | for every u ∈ $\mathcal{U}$_{k} do |

4: | for j = 1, . . . ,M do |

5: | Choose at random particle index i_{j} ∈ {1, . . . ,N} |

6: | ${\mathbf{p}}_{k}^{({i}_{j})}={\mathbf{p}}_{k-1}^{({i}_{j})}+\mathbf{u};$ |

7: | Compute ${c}_{k}^{({i}_{j})}$ using ${\mathbf{p}}_{k}^{({i}_{j})}$ and ${\mathbf{p}}_{s}^{({i}_{j})}$ via Equation (25) |

8: | Adopt ${A}^{({i}_{j})}={\eta}_{k-1}^{({i}_{j})}\cdot {\theta}_{k-1}^{({i}_{j})}$ |

9: | ${n}_{k}^{(j)}=\lfloor {A}^{({i}_{j})}\cdot {c}_{k}^{({i}_{j})}\rceil $ |

10: | for i = 1, . . . ,N do |

11: | Compute ${\mathcal{I}}^{(i)}({n}_{k}^{(j)})$ via Equation (38) |

12: | Compute ${\mathcal{J}}^{(i)}({n}_{k}^{(j)})$ via Equation (44) |

13: | end for |

14: | Compute $\mathcal{D}$^{(j)}(u) using Equation (43) |

15: | end for |

16: | Estimate 𝔼{$\mathcal{D}$(u)} as a sample mean of
${\left\{{\mathcal{D}}^{(j)}(\mathbf{u})\right\}}_{j=1}^{M}$ |

17: | end for |

18: | Select control vector u_{k} ∈ $\mathcal{U}$_{k} using Equation (12) |

We applied the described search algorithm to the search area modelled by the random grid shown in Figure 2. Prior knowledge available to the searcher is illustrated by Figure 1: the radius of the search area is R_{0} = 9; the centre is **c** = (0, 0), and the total number of potential links in the complete grid modelling the search area is L = 572. The parameters of the emitting source to be estimated are: X = 0, Y = 7 and A_{0} = 12. The searcher initial position is **p**_{0} = (9, −4).

Dynamic model p(**m**_{k}|**m**_{k}_{−1}) is a 2 × 2 transitional probability matrix with diagonal and off-diagonal elements 0.999 and 0.001, respectively, meaning the changes in the status of the links are very rare. This ensures a stable structure of the search domain, because the count measurement model is valid in a steady state. Dynamic model p(**p**_{k}|**p**_{k}_{−1}, **u**_{k}) can be expressed as

$$p({\mathbf{p}}_{k}|{\mathbf{p}}_{k-1},{\mathbf{u}}_{k})=(1-{p}_{e})\,\delta ({\mathbf{p}}_{k}-{\mathbf{p}}_{k-1}-{\mathbf{u}}_{k})+\sum _{\mathbf{v}\in {\mathcal{U}}_{k}\setminus \{{\mathbf{u}}_{k}\}}\frac{{p}_{e}}{|{\mathcal{U}}_{k}|-1}\,\delta ({\mathbf{p}}_{k}-{\mathbf{p}}_{k-1}-\mathbf{v})$$
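Drawing a sample from this motion model (as required in line 6 of Algorithm 1) can be sketched in Python as follows. The encoding of the controls as displacement pairs and the function name are illustrative assumptions, and the value of p_{e} is left as a free parameter.

```python
import random

# Assumed encoding of the admissible controls: stay, right, left, up, down.
CONTROLS = [(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)]


def sample_position(p_prev, u, p_e, rng=random):
    """Draw p_k: apply u with probability 1 - p_e, otherwise one of the other controls.

    Each of the remaining controls is chosen with equal probability
    p_e / (|U| - 1), matching the mixture weights of the motion model.
    """
    if rng.random() < p_e:
        others = [v for v in CONTROLS if v != u]
        u = rng.choice(others)
    return (p_prev[0] + u[0], p_prev[1] + u[1])


rng = random.Random(1)
# With p_e = 0 the control is always executed correctly.
p_next = sample_position((9, -4), (0, 1), p_e=0.0, rng=rng)
```

With p_{e} = 0 the model reduces to the deterministic mapping used in the ideal measurement approximation.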

The parameters of detection matrices, which define the likelihood function g_{2}(**z**_{k}|**p**_{k}, **m**_{k}), are as follows: for primary observable links, p_{d} = 1 and p_{fa} = 0; for secondary observable links, p_{d} = 0.8 and p_{fa} = 0.1.

The RBPF used N = 4000 particles, with M = 400 samples used in the averaging of the information gain. The particle set, $\mathcal{Y}$_{0}, at the initial time is created as follows:
${\mathbf{p}}_{0}^{(i)}={\mathbf{p}}_{0}$, for all i = 1, . . . , N particles; the source location vector is drawn from a uniform distribution over a circle with centre **c** and radius R_{0}, i.e.,
${\mathbf{p}}_{s}^{(i)}\sim {\mathcal{U}}_{\text{Circle}(\mathbf{c},{R}_{0})}({\mathbf{p}}_{s})$; link existence probabilities are set to q_{j,0} = 0.5, for all j = 1, . . . , L links; finally, the parameters of the initial Gamma distribution $\mathcal{G}$(A; η_{0}, θ_{0}) were selected as η_{0} = 15 and θ_{0} = 1.
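The initialisation of the particle set can be sketched in Python as follows. The dictionary layout of a particle and the function name are hypothetical; the default values (η_{0} = 15, θ_{0} = 1, L = 572, q_{j,0} = 0.5) are those stated in the text. Uniform sampling over the disc uses the square root of a uniform variate for the radius, which is a standard technique rather than a detail given in the paper.

```python
import math
import random


def init_particles(n, p0, R0, centre=(0.0, 0.0), eta0=15.0, theta0=1.0,
                   n_links=572, rng=random):
    """Create the initial particle set Y_0."""
    particles = []
    for _ in range(n):
        # The square root makes the density uniform over the area of the disc.
        r = R0 * math.sqrt(rng.random())
        phi = 2.0 * math.pi * rng.random()
        particles.append({
            "p": p0,                                    # known initial searcher position
            "source": (centre[0] + r * math.cos(phi),
                       centre[1] + r * math.sin(phi)),  # candidate source location
            "q": [0.5] * n_links,                       # link existence probabilities
            "eta": eta0,                                # Gamma shape parameter
            "theta": theta0,                            # Gamma scale parameter
        })
    return particles


# A small set for illustration; the paper uses N = 4000.
particle_set = init_particles(50, p0=(9, -4), R0=9.0, rng=random.Random(0))
```

Each particle carries its own copy of the link probability vector, since the map estimate is conditioned on the trajectory of that particle.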

We terminate the search algorithm when the searcher steps on the source. At this point, we compare the true source location with the current estimate of the posterior distribution of the searcher position, approximated by particles ${\left\{{\mathbf{p}}_{k}^{(i)}\right\}}_{i=1}^{N}$. If the true source position is contained in the support defined by ${\left\{{\mathbf{p}}_{k}^{(i)}\right\}}_{i=1}^{N}$, the search is considered successful.

Figure 6 illustrates a typical run of the search algorithm. The true path of the searcher on this run is shown in Figure 6a. It took the searcher 53 time steps to reach the source. During the search, the motion control vector failed to execute correctly on two occasions. The final estimate of the map (i.e., of existing links of the square lattice) is shown in Figure 6b. This figure shows only the links whose probability of existence is higher than 0.6. The blue circles in Figure 6b indicate the posterior distribution of the searcher final position. Its true position, which is the same as the source position, is included in the support of this posterior, meaning that the search was successful. Moreover, on this occasion, the maximum a posteriori (MAP) estimate of the searcher final position coincides with the truth. Figure 6c shows the measured values of the count number, n_{k}, along the path. As we discussed in the Introduction, the measurements are sporadic, especially in the beginning, when the distance between the searcher and the source is large: among the first ten count measurements, only three indicated a non-zero tracer concentration. An avi video file, illustrating a single run of the algorithm, is supplied with this paper.

Figure 7 illustrates two search paths on a much bigger lattice, with the fraction of missing links p = 0.20. The source parameters were X = 0, Y = 7 and A_{0} = 16; all other parameters were the same as above. The duration of two searches was 84 and 103 time steps, respectively.

The average performance of the search algorithm has been assessed via Monte Carlo runs using the smaller scale model of the search area (and its corresponding parameters), shown in Figure 6. If the search on a particular run was successful, its corresponding search time is used in averaging. A run is declared unsuccessful if the source has not been found after k = 100 discrete-time steps. We also keep the statistics on the success rate of the search. The results obtained via averaging over 100 Monte Carlo runs are presented in Table 1 for three different locations of the source, i.e., (X, Y) = (0, 7), (0, 1), (2, −5). The three locations correspond to the shortest path distances (from the searcher initial position **p**_{0} = (9, −4) to the source) of 20, 14 and eight unit lengths, respectively. All other parameters were the same as described above for the illustrative run. As expected, the results in Table 1 indicate that the search is quicker and more reliable (i.e., with a higher success rate) for a source which is closer to the searcher initial position.

Table 2 presents the results for a source at location (0, 7), but with three different values of the source release-rate, i.e., A_{0} = 8, 12, 16. The results indicate that the search is quicker for a source characterised by a higher release rate. The explanation of this trend is as follows. Initially, when the searcher is far from the source, its measurements of tracer concentration are very small, typically zero, hence uninformative. During this phase of the search, the searcher effectively moves according to a “diffusive” (or random walk) model, which is slower than the so-called “ballistic” movement associated with an information-driven search [2]. The random walk phase is longer for a weaker source, which contributes to the overall longer search time in this case. As a specific numerical example, we have also validated that a purely random search never manages to find the source at (0, 7) in the given time frame of 100 discrete time steps.

The paper considers a very difficult problem of autonomous search for a diffusive point source of tracer in an environment whose structure is unknown. Sequential estimation and motion control are carried out in highly uncertain circumstances with the state space, including, in addition to the source parameters, the map of the search area and the searcher position within the map. The paper develops mathematical models of measurements, formulates the sequential Bayesian solution (in the form of a Rao-Blackwellised particle filter) and proposes an information-driven motion control of the searcher. The numerical results demonstrate the concept, indicating high success rates in comparison with a random walk. A gradient-based search (“chemotaxis”) would be inappropriate in this application, because the computation of a gradient is infeasible in the presence of intermittent measurements and geometric constraints.

There are many areas for further research and improvements of the concepts introduced in this paper. One direction is to explore the potential benefits of the analytical results available from percolation theory in the theoretical analysis of searcher performance. Another is to investigate more efficient particle filters for source parameter estimation (being a deterministic part of the state space) and search strategies that “look” multiple steps ahead (rather than a one step myopic search). It is also desirable to further explore the coordination of multiple networked searchers with decentralised estimation and motion control. Finally, it would be important to practically validate the proposed autonomous searcher in experimental trials.

The authors declare no conflicts of interest.

An approximate model of mean concentration, independent of the grid structure, was introduced in Section 2.3. This model is a solution of Laplace Equation (1) for a circular search area in the absence of obstacles, with a boundary condition θ (r = R_{0}) = 0, but using different values of parameters. More specifically, the obstacles in the search area are incorporated in this model via homogenization (volume/ensemble averaging) of the diffusion equation (similar to the effective media approach [35,36]), so that Equation (1) is replaced with

$$D\Delta \langle \theta \rangle ={A}_{0}\delta (x-X,y-Y)$$

In line with the above comments, we will use Equation (46) as a foundation for the measurement model that is independent of the structure of the search domain. The solution of Equation (46) for a tracer source located at the center of circle (X = Y = 0) is given by [23]:

$$\langle \theta \rangle =\frac{A}{2}\hspace{0.17em}\text{log}[(z{z}^{*})/{R}_{0}^{2}]$$

$$\langle \theta \rangle =(k/2)\hspace{0.17em}\text{log}[(w{w}^{*})/{R}_{0}^{2}]$$

The required conformal transformation is the well-known Möbius map (see [23]):

$$w(z)=\frac{{R}_{0}^{2}(z-Z)}{z{Z}^{*}-{R}_{0}^{2}}$$

We point out that the model is not restricted to a circular search area. According to the theory of analytic functions, a conformal mapping to the circle always exists for an arbitrary simply connected domain, and it can be computed analytically or numerically [23].

1. Viswanathan, G.M.; Afanasyev, V.; Buldyrev, S.V.; Murphy, E.J.; Prince, P.A.; Stanley, H.E. Levy flight search patterns of wandering albatrosses. Nature **1996**, 381, 413–415.
2. Bénichou, O.; Loverdo, C.; Moreau, M.; Voituriez, R. Intermittent search strategies. Rev. Mod. Phys. **2011**, 83, 81–129.
3. Hein, A.M.; McKinley, S.A. Sensing and decision-making in random search. Proc. Natl. Acad. Sci. USA **2012**, 109.
4. Coppey, M.; Benichou, O.; Voituriez, R.; Moreau, M. Kinetics of target site localization of a protein on DNA: A stochastic approach. J. Biophys. **2004**, 87, 1640–1649.
5. Bressloff, P.C.; Newby, J. Filling of a Poisson trap by a population of random intermittent searchers. Phys. Rev. E **2012**, 85, 031909.
6. Holcman, D. Modeling DNA and Virus Trafficking in the Cell Cytoplasm. J. Stat. Phys. **2007**, 127, 471–494.
7. Bressloff, P.C.; Newby, J. Stochastic models of intracellular transport. J. Chem. Phys. **2013**, 85, 135–196.
8. Farrell, J.A.; Pang, S.; Li, W. Plume mapping via Hidden Markov methods. IEEE Trans. Syst. Man Cybern. **2003**, 33, 850–863.
9. Li, W.; Farrell, J.A.; Pang, S.; Arrieta, R.M. Moth-inspired chemical plume tracking on an autonomous underwater vehicle. IEEE Trans. Robot. **2006**, 22, 292–307.
10. Oyekan, J.; Hu, H. A novel bio-controller for localizing pollution sources in a medium peclet environment. J. Bionic Eng. **2010**, 7, 345–353.
11. Klimenko, A.V.; Priedhorsky, W.C.; Hengartner, N.W.; Borozin, K.N. Efficient strategies for low-level nuclear searches. IEEE Trans. Nucl. Sci. **2006**, 53, 1435–1442.
12. Ristic, B.; Gunatilaka, A. Information driven localisation of a radiological point source. Inf. Fusion **2008**, 9, 317–326.
13. Ristic, B.; Morelande, M.; Gunatilaka, A. Information driven search for point sources of gamma radiation. Signal Process. **2010**, 90, 1225–1239.
14. Ishida, H.; Nakayama, G.; Nakamoto, T.; Morizumi, T. Controlling a Gas/Odor plume tracking robot based on transient responses of gas sensors. IEEE Sens. J. **2005**, 5, 537–545.
15. Dhariwal, A.; Sukhatme, G.S.; Requicha, A.A.G. Bacterium-Inspired Robots for Environmental Monitoring. In Proceedings of IEEE International Conference on Robotics and Automation (ICRA’04), New Orleans, LA, USA, 26 April–1 May 2004; pp. 1436–1443.
16. Vergassola, M.; Villermaux, E.; Shraiman, B.I. ‘Infotaxis’ as a strategy for searching without gradients. Nature **2007**, 445, 406–409.
17. Iacono, G.L. A comparison of different searching strategies to locate sources of odor in turbulent flows. Adapt. Behav. **2010**, 18, 155–170.
18. Masson, J.B. Olfactory searches with limited space perception. Proc. Natl. Acad. Sci. USA **2013**, 110.
19. Thrun, S.; Burgard, W.; Fox, D. Probabilistic Robotics; MIT Press: Cambridge, MA, USA, 2005.
20. Ben-Avraham, D.; Havlin, S. Diffusion and Reaction in Fractals and Disordered Systems; Cambridge University Press: Cambridge, UK, 2000.
21. Torquato, S. Random Heterogeneous Materials: Microstructure and Macroscopic Properties; Springer: New York, NY, USA, 2002.
22. Kemeny, J.G.; Snell, J.L. Finite Markov Chains; Van Nostrand Reinhold Company: New York, NY, USA, 1960.
23. Prosperetti, A. Advanced Mathematics for Applications; Cambridge University Press: Cambridge, UK, 2011.
24. Marjovi, A.; Marques, L. Multi-robot olfactory search in structured environments. Robot. Auton. Syst. **2011**, 59, 867–881.
25. Selvadurai, A.P.S. Partial Differential Equations in Mechanics 1; Springer: Berlin, Germany, 2000.
26. Burioni, R.; Cassi, D. Random walks on graphs: Ideas, techniques and results. J. Phys. A Math. Gen. **2005**, 38.
27. Doucet, A., de Freitas, J.F.G., Gordon, N.J., Eds.; Sequential Monte Carlo Methods in Practice; Springer: New York, NY, USA, 2001.
28. Dean, T.; Kanazawa, K. A model for reasoning about persistence and causation. Comput. Intell. **1989**, 5, 142–150.
29. Chong, E.; Kreucher, C.; Hero, A. POMDP Approximation Using Simulation and Heuristics. In Foundations and Applications of Sensor Management; Hero, A., Castanon, D., Cochran, D., Kastella, K., Eds.; Springer: New York, NY, USA, 2008; Chapter 8.
30. Kailath, T. The divergence and Bhattacharyya distance measures in signal selection. IEEE Trans. Commun. Tech. **1967**, 15, 52–60.
31. Doucet, A.; de Freitas, N.; Murphy, K.P.; Russell, S.J. Rao-Blackwellised Particle Filtering for Dynamic Bayesian Networks. In Proceedings of the 16th Conference on Uncertainty in Artificial Intelligence, Stanford, CA, USA, 30 June–3 July 2000; pp. 176–183.
32. Hazewinkel, M. Gamma-Distribution. Encyclopedia of Mathematics. Available online: http://www.encyclopediaofmath.org/index.php?title=Gamma-distribution&oldid=24074 (accessed on 10 February 2014).
33. Gelman, A.; Carlin, J.B.; Stern, H.S.; Dunson, D.B.; Vehtari, A.; Rubin, D.B. Bayesian Data Analysis, 3rd ed.; CRC Press: Boca Raton, FL, USA, 2003.
34. Ristic, B.; Arulampalam, S.; Gordon, N. Beyond the Kalman Filter: Particle Filters for Tracking Applications; Artech House: Boston, MA, USA, 2004.
35. Nicholson, C. Diffusion and related transport mechanisms in brain tissue. Rep. Prog. Phys. **2001**, 64, 815–884.
36. Novak, I.L.; Kraikivski, P.; Slepchenko, B.M. Diffusion in cytoplasm: Effects of excluded volume due to internal membranes and cytoskeletal structures. Biophys. J. **2009**, 97, 758–767.
37. Pisani, L. Simple expression for the tortuosity of porous media. Transp. Porous Media **2011**, 88, 193–203.

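**Table 1.** Search performance for three source locations, averaged over 100 Monte Carlo runs.

Source location | Shortest path length | Mean number of search steps | Success rate [%] |
---|---|---|---|
(0, 7) | 20 | 42.1 | 94 |
(0, 1) | 14 | 34.0 | 95 |
(2, −5) | 8 | 28.8 | 99 |

**Table 2.** Search performance for a source at (0, 7) with three values of the release rate.

Release rate A_{0} | Mean number of search steps | Success rate [%] |
---|---|---|
8 | 49.5 | 78 |
12 | 42.1 | 94 |
16 | 38.2 | 93 |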

© 2014 by the authors; licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/).