Autonomous search for a diffusive source in an unknown environment

The paper presents an approach to olfactory search for a diffusive emitting source of tracer (e.g. aerosol, gas) in an environment with unknown map of randomly placed and shaped obstacles. The measurements of tracer concentration are sporadic, noisy and without directional information. The search domain is discretised and modelled by a finite two-dimensional lattice. The links is the lattice represent the traversable paths for emitted particles and for the searcher. A missing link in the lattice indicates a blocked paths, due to the walls or obstacles. The searcher must simultaneously estimate the source parameters, the map of the search domain and its own location within the map. The solution is formulated in the sequential Bayesian framework and implemented as a Rao-Blackwellised particle filter with information-driven motion control. The numerical results demonstrate the concept and its performance.


I. INTRODUCTION
The search for an emitting source of particles, chemicals, odour, or radiation, based on sporadic clues or intermittent measurements, has attracted a great deal of interest lately. The topic is important for search and rescue operations with the goal to localise dangerous pollutants, such as chemical leaks and radioactive sources. In biology, the search is studied to model animal bahaviour in search for food or mates [1]- [3]. Bio-inspired search for underwater sources of pollution have been studied in [4]- [6]. A robot for gas/odour plume tracking guided by the increase in the concentration gradient has been proposed in [7]. "Infotaxis" [8] is a search strategy based on entropy-reduction maximisation which has been developed in the context of finding a weak source in a turbulent flow (e.g. drug or leak emitting chemicals, for a comprehensive review see [9]). Information-gain driven search for radioactive point sources has been studied in [10]. In all these applications the search domain is either open (without obstacles) or a precise map of the search domain (with obstacles) is available.
In this paper we focus on autonomous olfactory search for a diffusive emitting source of tracer (e.g. aerosol, gas, heat, moisture) in a domain with randomly placed and shaped obstacles (forbidden areas), whose structure (the map) is unknown. The problem is of importance for example in localisation of dangerous leaks in collapsed buildings, inside tunnels or mines. The searcher senses in a probabilistic manner both the structure of the search domain (e.g. the presence or absence of obstacles, walls, blocked passages) and the level of concentration of tracer particles. The objective of the search is to localise and report the coordinates of the source in a shortest possible time. This is not a trivial task for several reasons. First, the emission rate of the source is typically unknown. Furthermore, the measurements of tracer particle concentration are sporadic, noisy and without directional information. Since the structure (map) of the search domain is unknown, the searcher needs to explore the domain and create its map. The searcher motion is fully autonomous: it senses the environment and after processing this uncertain information sequentially makes decisions on where to move next in order to collect new measurements. Its motion control, however, is not fully reliable as it may occasionally fail to execute correctly. The probabilistic model of searcher motion is assumed known.
In the paper we restrict to the search in a two-dimensional domain. The coordinates of the searcher initial position, as well as the border of the search area (relative to the initial position) are given as input parameters. In order to fulfil its mission, the searcher has to find the source and report its coordinates relative to its initial position. This in turn requires simultaneous estimation at three levels: (1) estimation of source parameters (its location in 2D and its release rate); (2) estimation of the map of the search area and (3) estimation of the searcher position within the estimated map. Estimation at levels (2) and (3) has been studied extensively in robotics under the term grid-based simultaneous localisation and mapping (SLAM) [11]. The primary mission in all SLAM publications is an accurate mapping of the area. The primary mission of our searcher, however, is to localise the source, while SLAM is only a necessary component of the solution.
The only related work which deals with olfactory search in an unknown structured environment is [12]. While this paper presents a plethora of experimental results, the algorithms are based on heuristics. Our approach, however, is theoretically sound in the sense that its mathematical models are precisely defined, estimation is carried out in the sequential Bayesian framework and the searcher motion control is driven by information gain.
The search domain is discretised, as for example in [4], and modelled by a finite two-dimensional lattice. With sufficiently fine resolution of the lattice, the emitting source can be considered to be in one of the nodes of the lattice. The links (bonds, edges) of the lattice represent the traversable paths for emitted particles (tracer) and for the searcher. Missing links in the lattice indicate blocked paths due to the walls or obstacles. This is a very general model applicable to searches at various scales, from inside buildings and tunnels, to within cells of living organisms [2]. The percentage of missing links in the lattice is assumed to be above the percolation threshold p c (for the adopted lattice structure p c = 1/2 [13], [14]) so that long-range connectivity is satisfied [13]. Using the absorbing Markov chains technique [15], we can compute exactly the mean concentration level in any node of the lattice, that is at any point of the search domain with obstacles.
Since the structure (map) of the search domain is unknown, the searcher must rely on a theoretical model of concentration measurement which is independent of the this map. Such a model is derived in the paper in the analytic form and used in the search.
The search itself consist of algorithms for sequential estimation and motion control. We adopt the framework of optimal sequential Bayesian estimation with information-driven motion control. Implementation is carried out using a Rao-Blackwellised particle filter.
The paper is organised as follows. Mathematical models of measurements and searcher motion are described in Sec.II. The olfactory search problem is formulated and its conceptual solution provided in Sec.III. Full technical details of the proposed search algorithm are presented in Sec.IV, with numerical results given in Sec.V. Finally, conclusions of this study are summarised in Sec. VI.

A. Model of environment
The concentration of a tracer at any point of the search domain is governed by the diffusive equation, which in the steady state reduces to the Laplace equation [16]: Here D 0 is the diffusion coefficient of tracer in the environment, ∆ is the Laplace operator, θ is the mean (timeaveraged) tracer concentration, δ is the Dirac delta function, A 0 is the release-rate of the tracer source, and X, Y are the coordinates of the source in a two-dimensional Cartesian coordinate system. For convenience we adopt a circular search domain of radius R 0 , centred at the origin of the coordinate system, that is for every point inside the search area, r = x 2 + y 2 ≤ R 0 . Assuming that the tracer source is undetectable outside the search domain, we can impose the absorbing boundary condition θ(r = R 0 ) = 0. The traditional approach to the computation of the tracer concentration θ at every point of the search domain, is via analytical or numerical solution of (1). This, however, is a non-trivial task when the search domain is a structure of complex topology (due to obstacles, compartments walls, random openings, etc).
We therefore adopt an alternative approach, where the continuous model of the tracer diffusion process is replaced with a random walk on a square lattice, adopted as a discrete model of the search area. Discretisation is illustrated in Fig.1 for a search area centred at the origin of the coordinate system, with the radius R 0 = 9. The length of each link (edge, bond) in the lattice determines the resolution of discretisation, and in this example is adopted as a unit length. The source, assumed to be located at one of the nodes of the lattice, is emitting particles which travel through the lattice according to the random walk model [17]. The obstacles in the search domain (the regions through which the tracer cannot pass) are simply modelled as missing links (or clusters of missing links) in the square lattice. Fig.2 shows an example of such a model: this incomplete lattice is obtained by removing fraction p = 0.35 of the links in the complete lattice shown in Fig.1. Note that all nodes in the incomplete lattice (grid) are connected. On average this will be the case if the fraction of missing links in the incomplete grid of Fig.2 is below the percolation threshold p c ; above the percolation threshold (p > p c ) the lattice becomes fragmented. The framework of percolation theory enables analytical description of statistical properties of such a lattice [13], [14].

B. Model of tracer distribution
This section explains how to compute the mean concentration of tracer particles in each node of the incomplete grid (such as the one shown in Fig.2) which represents a discretised model of the search area with obstacles.
For a given incomplete grid, the mean concentration can be computed using the absorbing Markov chain technique [15]. Neglecting the spatial approximation of the search domain (due to discretisation) and under the assumption that the distribution of particles has reached the steady state, the absorbing Markov chain technique provides an exact solution for the quantity of source material at each location.
We can regard the random walk of tracer particles through the incomplete grid (e.g. Fig.2) as a Markov chain whose states are the nodes of the grid. The Markov chain is specified by the transition matrix T; each element of this matrix is the probability of transition from state s i to state s j (i.e. a particle move from node i to node j): T ij = P {s j |s i }. How to construct T given the incomplete grid? First note that we distinguish two types of states in this Markov chain: absorbing states (corresponding to the nodes on the boundary of the grid) and transient states. For an absorbing state s i , T ii = 1 and T ij = 0, if j = i. Suppose a transient state s i corresponds to node i in the incomplete grid, which has connections (links) with nodes j 1 , . . . , j m , where for a square grid m ≤ 4. Then T ij1 = · · · = T ijm = 1/m and T ij = 0 for j / ∈ {j 1 , . . . , j m }. Suppose there are r absorbing states and t transient states. If we order the states so that the absorbing states come first (before the transient states), then the transition matrix takes the canonical form: where I is r × r identity matrix, Q is the t × t matrix that describes transitions between transient states, R is a t × r matrix that describes the transitions from transient to absorbing states and 0 is an r×t matrix of zeros. The fundamental matrix of the absorbing Markov chain [15], represents the expected number of visits to a transient state s j starting from a transient state s i (before being absorbed). This matrix will be used in simulations to compute the mean particle concentration in any node of the incomplete grid. Suppose an emitting source is placed at node i, which is not on the boundary. The source is releasing tracer particles at a constant rate A 0 . Then the expected concentration of tracer particles in any other node j of the incomplete grid (which is not on the boundary) is given by The concentration scales linearly with the release rate A 0 as a direct consequence of the linearity of Laplace equation (1). Fig.3 shows the mean concentration of tracer particles for the search area modelled by incomplete grid of Fig.2, with the source placed at (X, Y ) = (0, 7) and with A 0 = 12. Notice how the concentration depends on the distance from the source and the structure of the grid, plotted in Fig.2.

C. Sensor models and motion model
Two types of measurements are collected by the searcher. Sensor 1 measures the concentration of tracer particles as a count of particles received during the sampling time. Assuming the so-called 'dilution' limit (limit of low concentrations) the tracer fluctuations follow the Poisson distribution [8], that is a concentration measurement at node j of the grid is a random sample drawn from n ∼ P(n; λ) = λ n n! e −λ (4) where λ = θ j = A 0 · F ij . The Poisson distribution mimics the intermittency of concentration measurements [8].
The searcher sequentially estimates the source parameters without knowing the map of the search area. Hence the measurement model based on the mean concentration λ = A 0 ·F ij cannot be used in estimation (recall that matrix F is formed based on the structure of the incomplete grid). Assuming that the fraction of missing links in the incomplete grid is smaller than the percolation threshold p c , the expected concentration of tracer particles in any node j of the incomplete grid can be computed approximately using the property of conformal invariance of the Laplace equation (see Appendix for details). Suppose the source of release rate A 0 is placed at a node of the grid, positioned at coordinates (X, Y ). Then the mean (time and ensemble averaged) concentration at node j, positioned at (x j , y j ) can be approximated as: which depends on the fraction of missing links in the incomplete grid, see Appendix), and Note that this model is independent of the structure of the incomplete grid. In summary, estimation will be carried out using Sensor 1 measurement model based on (4), where mean λ = θ j is approximated by (5), (6). The actual concentration measurements will be simulated according to (4), but with The searcher moves and explores the search domain in order to find the source. The source parameter estimation is carried out using the map-independent measurement model (5), which does not require discretisation of the search domain on a square lattice (as in Fig.1). Nevertheless, we keep discretisation for the searcher in order to model its motion paths and to facilitate its grid-based SLAM functionality. Thus we assume that the searcher travels within the search area along the paths represented by the links of the incomplete grid as in Fig.2. As it travels, it stops at the nodes along its path to sense the environment, i.e. to collect measurements. Sensor 2 is a simple binary detector of the presence or absence of the links (paths) visible from the node in which the searcher is currently placed. It reports on the presence/absence of the primary and secondary neighbouring links.
A link in a grid of Fig.2, is defined by a quadruple (x 1 , y 1 , x 2 , y 2 ), where (x 1 , y 2 ) and (x 2 , y 2 ) are the coordinates of the nodes it connects. In order to explain what we mean by primary and secondary links, consider for example the node at location (−3, Let an observation (supplied by sensor 2) about the presence or absence of a link ℓ, be a binary value z(ℓ) ∈ {0, 1}, where z(ℓ) = 0 means link ℓ is absent and z(ℓ) = 1 is the opposite. The performance od sensor 2 can be described by two detection matrices, one for the primary links, the other for secondary links. Each detection matrix Π has a form where P (z = 1|m = 1) = p d and P (z = 1|m = 0) = p f a are the probability of correct detection and the probability of false detection p f a , respectively. The columns of matrix Π add up to 1, and hence (7) can be written as: Suppose the searcher is in node i at discrete-time k − 1. Let the set of admissible controls vectors for the next move be defined as U k = {·, ↑, →, ↓, ←}, meaning that the searcher can stay where it is, or move one unit length up, right, down or left. After processing measurements from its sensors, the searcher decides to choose control u * k ∈ U k and hence to be at time k at node j. This control, however, is executed correctly only with probability 1−p e . Due to control noise or unmodeled exogenous effects [11], with probability p e the searcher will actually execute control u ′ k ∈ U k \ {u * k }.

III. THE PROBLEM AND ITS CONCEPTUAL SOLUTION
The searcher has at its disposal the probabilistic models of sensor measurements and dynamic models. Prior knowledge also includes: (1) the coordinates of its initial position; (2) the length of each link in the square lattice; (3) the boundary of the circular search area (defined by its centre and radius). The described prior translates into knowledge of the complete grid such as the one shown in Fig.1. The searcher cannot move outside the complete grid.
The objective of the searcher is to estimate in the shortest possible time the coordinates of the emitting source, as well as the partial map describing the path from its starting (entry) point to the estimated location of the source. This partial map is important, for example, in order to guide the rescue team to the source or if the searcher has to retreat to its starting position.

A. Sequential Bayesian estimation
The described problem can be cast in the sequential Bayesian estimation framework as a nonlinear filtering problem. Let us first define the state vector, which consists of three parts: 1) The coordinates of the searcher position at discrete-time k = 1, 2, . . . are denoted by p k = [x k y k ] ⊺ . 2) The status (presence/absence) of each link in the complete grid (such as the one shown in Fig.1). The status of link ℓ j , where j = 1, . . . , L, and L is the total number of links in the complete grid, is m(ℓ j ) = m j ∈ {0, 1}.
The notation P (m j = 1) refers to the probability that the link is present. The map at time k is fully specified by vector m k = [m 1,k , . . . , m L,k ] ⊺ . The time index is introduced because we allow the map of the search area to occasionally change, e.g. an open door can close. The assumption is that the statuses of links are mutually independent, i.e. m j,k is independent from m i,k for i = j.

3) The parameter vector of the source is denoted by
The complete state vector is then defined as Dynamics of the state y k is described by two transitional densities: p(m k |m k−1 ) specifies the evolution of the map over time, while p(p k |p k−1 , u k ) characterises the searcher motion model. The observation models of the searcher are specified by two likelihood functions: g 1 (n k |p k , m k , s) characterises sensor 1, which provides the count of particles n k at p k from the source in state s through the map m k ; g 2 (z k |p k , m k ) refers to sensor 2 and describes the observation z k of the status of the links in m k visible from the searcher in location p k . Let us denote observations and controls at time k by a vector Finally, the prior probability density function (PDF) of the state is denoted by p(y 0 ).
The goal in the sequential Bayesian framework is to estimate any subset or property of the sequence of states y 0:k := (y 0 , . . . , y k ) given observation sequence ζ 1:k := (ζ 1 , . . . , ζ k ) and the control sequence u 1:k := (u 1 , . . . , u k ), which is completely specified by the joint posterior distribution p(y 0:k |ζ 1:k , u 1:k ). This posterior satisfies the following recursion: where is the transitional density, and is the measurement likelihood function.
In general it is impossible to solve analytically the recursive equation (9). Instead we will formulate a numerical approximation using the sequential Monte Carlo method [18]. Before going into details, notice that factorization expressed by (10) and (11) imposes a structure which can be conveniently represented by a dynamic Bayesian network (DBN) [19] shown in Fig.4. The circles in Fig.4   The dynamic Bayesian network representing the dependency between the random variables which feature in the described inference problem Count measurement n k depends on the map m k , hence the likelihood of count measurement is formulated as g 1 (n k |p k , m k , s). The searcher, however, does not know the map (it estimates it only partially as it travels through the search area) and hence we have introduced the approximate measurement model expressed by (4)- (6). The searcher will therefore process count observations n k using the likelihood function which is independent of m k , and denoted bỹ g 1 (n k |p k , s), rather than g 1 (n k |p k , m k , s). We indicate this fact by drawing the arrow from m k to n k in Fig.4 by a dashed line.
The computation of the posterior PDF for a structured complex system such as the one shown in Fig.4 can be factorised and consequently made computationally and statistically more efficient. Technical details will be given in Sec.IV.

B. Information driven motion control
After processing the measurements received at time k − 1, the searcher needs to select the next control vector u k which will change its position to p k ∼ p(p k |p k−1 , u k ). The problem of selecting u k can be formulated as a partiallyobserved Markov decision process [20], whose elements are: (1) the set of admissible control vectors U k ; (2) the current information state, expressed by the predicted PDF p(y k |ζ 1:k−1 , u 1:k−1 , u k ), where u k ∈ U k ; (3) the reward function associated with each control u k ∈ U k . In the paper we adopt motion control based on a single step ahead strategy; this myopic approach is suboptimal in the presence of randomly missing links, but is computationally easier to implement and faster to run. The control vector is then selected as: where D(u, p, ζ) is the reward function. Note that the reward depends on future measurement ζ k = [n k z ⊺ k ] ⊺ , which would be acquired after control u ∈ U k had been applied. Since the decision has to be made before the actual control is applied, the expectation E is taken in (12) with respect to the prior measurements PDF.
Considering that the primary mission of the search is source localisation (map estimation is of secondary importance), the reward function at time k is adopted as the information gain between: (1) the predicted PDF over the state subspace (s, p k ) and (2) the updated PDF over (s, p k ), using the count measurement n k . The two distributions are denoted π 0 (s, p k |u k ) = p(s|n 1:k−1 , u 1:k )p(p k |p k−1 , u k ) and π 1 (s, p k |n k , u k ) = ξg 1 (n k |p k , s) π 0 (s, p k |u k ), respectively, where ξ is a normalisation constant. The information gain between the two distributions is measured using the Bhattacharya distance [21]. The reward function is thus adopted as: π 1 (s, p k |n k , u k ) · π 0 (s, p k |u k ) ds dp k (13) where we dropped unnecessary arguments in notation for D.

IV. THE SEARCH ALGORITHM
The proposed search algorithm, formulated as a DBN with observer control, can be implemented efficiently as a Rao-Blackwellised particle filter (RBPF) [22] with sensor control. The idea of the RBPF is as follows. Suppose it is possible to divide the components of the hidden state vector y k into two groups, α k and β k , such that the following two conditions are satisfied: C-1: p(y k |y k−1 , u k ) = p(α k |β k−1:k , α k−1 )·p(β k |β k−1 , u k )
Then we need only to estimate the posterior p(β 0:k |ζ 1:k , u 1:k ), meaning that we reduced the dimension of the space in which Monte Carlo estimation needs to be carried out, from dim(y k ) to dim(β k ). As argued in [22], this potentially improves computational and statistical efficiency of the particle filter.
In the described DBN, shown in Fig,4, in order to satisfy conditions C-1 and C-2, the state vector y k can be partitioned as follows: We are interested only in the filtering posterior density, which can now be factorised as follows: p(α k , β 0:k |ζ 1:k , u 1:k ) = p(α k |β 0:k , ζ 1:k , u 1:k ) · p(β 0:k |ζ 1:k , u 1:k ) The PDF p(β 0:k |ζ 1:k , u 1:k ) is approximated by a random sample {β Subsequently one can compute analytically (for each sample β where • p(m k |z 1:k , β (i) 0:k ) = q k is a vector of probabilities of existence for each link in the random grid and • p(A|n 1:k , β (i) 0:k ) is approximated by a Gamma distribution with shape parameter η k and scale parameter θ k , i.e. G (A; η k , θ k ).
Hence each particle corresponds to a set: where q k , η k , θ k are the sufficient statistics of p(α k |β (i) 0:k , n 1:k , z 1:k , u 1:k ). Keep in mind that q k , η k , θ k depend on a particular sequence (particle) β

A. Recursive formulae for sufficient statistics
Let us first discuss the analytic recursive formula for the computation of q k , following the ideas of the grid-based SLAM [11]. Note that The update of probability vector q k is then carried out as follows. Recall from (15) that particle β (i) k specifies the location of the searcher at time k, p Each component of vector z k is then an observation of existence of a primary or a secondary link from location p (i) k . Let q j,k−1 be a component of vector q k−1 , denoting the posterior probability that link ℓ j exists at time k − 1, i.e. q j,k−1 = p(m j,k−1 |z 1:k−1 , β (i) 0:k−1 ). Recall also that since the link statuses are assumed independent, then q k−1 = L j=1 q j,k−1 . According to (20), link j existence probability is predicted as: Let z be a component of vector z k which refers to link ℓ j , according to the current position of the searcher, p (i) k . Then based on (19) we update the link j existence probability as: where p d and p f a , introduced in (8), are the elements of the appropriate detection Π matrix (primary or secondary) of (8). Equations (19)- (22) can be summarised as: Let us describe next the analytic recursion for the update of the parameters η k and θ k of (18). At time k − 1, the posterior of emission rate A is modeled by a gamma distribution: Sensor 1 provides at time k a count measurement n k , which plays the key role in the update of parameters η k−1 and θ k−1 . Recall that the likelihood function of this measurement, g 1 (n k |β where the constant c(β with X (i) and Y (i) , according to (15), being the components of particle β (i) k . In the proposed algorithm for the update of parameters η k−1 and θ k−1 we use the following two properties of Gamma distribution: 1) Scaling property [23]: if X ∼ G(η, θ) then for any c > 0, cX ∼ G(η, cθ). 2) Gamma distribution is the conjugate prior of Poisson distributions [24]: if λ ∼ G(η, θ) is a prior distribution and n is a sample from the Poisson distribution with parameter λ, then the posterior is λ ∼ G(η + n, θ/(1 + θ)).
Given β k of (25) and express the prior distribution of λ (i) k−1 as: Using measurement n k and the conjugate prior property, the posterior distribution is: Since we are after the updated parameters of Gamma distribution of A (rather than λ (i) k ), again using the scaling property we have: We summarise from (24) and (28) the analytic expressions for the update of η k and θ k :
The components of vector q k|k−1 , i.e. q j,k|k−1 , were specified by (21). The integral which features in (34) can also be computed analytically. This integral equals: where P(n; λ) is the Poisson distribution introduced in (4).
Recall the explanation presented in Sec.IV-A about the update of the parameters of the Gamma distribution, summarised by (26)- (28). Effectively we have shown there that: where the integral in the denominator is I, see (36). Hence, the integral can be expressed as: and is computed for an arbitrary chosen value of A > 0. Based on (34), let us summarise the expression for an unnormalised importance weight as: Importance weights determine in a probabilistic manner which particles will survive (and possibly multiply) during the resampling step of the RBPF.

C. Information gain
Suppose the posterior distribution at time k − 1, p(y k−1 |ζ 1:k−1 , u 1:k−1 ), is approximated by a set of particles: where random sample β  (15). The weights of the particles in Y k−1 are equal, because sensor control is carried out after resampling, i.e. w (i) The question is how to compute the information gain (13) for each u ∈ U k , based on particles Y k−1 . We adopt the ideal measurement approximation for this, that is, in hypothesizing the future count measurement (resulting from action u), we assume: (1) action u will be carried out correctly, that is the transitional density p(p k |p k−1 , u k ) will be replaced by deterministic mapping: p k = p k−1 + u k , and (2) the measurement count will be equal to the mean ofg 1 (n k |A, β k ), that is λ k (rounded off to the nearest integer).
Since we are after the expected value of the gain, that is E{D(u)}, we will create an ensemble of "future ideal measurements" {n . Expectation is then approximated by a sample mean, i.e.
where D (j) (u) was computed using n (j) k . The ensemble of "future ideal measurements" {n (j) k } M j=1 is created as follows. For each action u, choose at random a set of M particle indices i j ∈ {1, . . . , N }, j = 1, . . . , M . Action u is then expected to move the searcher to location p k−1 +u. Then a "future ideal measurement" is n was defined by (25), A (ij ) ∼ G(A; η k−1 , θ k−1 ), and ⌊·⌉ denotes the nearest integer function.
It remains to explain how to compute the gain D (j) (u) based on n (j) k . Distribution π 0 (s, p k |u k ), which features in (13), can be approximated using the particle set Y k−1 as follows: Substitution of (41) and (42) into (13) leads to: where I (i) (n k ) is computed via (38) and Integral (44) can be evaluated numerically.

D. Implementation
The pseudo-code of one cycle of the search algorithm is presented in Algorithm 1. The input consists of the particle set Y k−1 , defined by (40). Selection of the control vector u k (line 2 of Algorithm 1) is described in Algorithm 2.
Explanation of the steps in Algorithm 1 are described first. Estimation of the state vector via the RBPF is carried out in lines 4-18. According to (15), random vector β k−1 , X (i) and Y (i) . Since the source location, (X (i) , Y (i) ), is static, only the component p s are not restricted to the grid nodes and after the resampling step, their diversity is improved by regularisation [25]. Finally, the output is the particle set Y k .
The selection of a control vector, described by Algorithm 2, starts with postulating the set U k in line 2. For every u ∈ U k , Algorithm 1 The searcher algorithm 1: Input: Y k−1 2: Run Algorithm 2 to select the control vector u k 3: Apply control u k and collect measurements z k , n k 4: Y k = ∅; Y k = ∅ 5: for i = 1, . . . , N do 6: Draw p k as a function of β (i) k using (25) 11:  It has been observed in simulations that one step ahead control can sometimes lead to situations where the observer position switches eternally between two or three nodes of the lattice. In order to overcome this problem, we adopt a heuristic as follows: if a node has been visited in the last 10 search steps more than 3 times, the next motion control vector is selected at random. While a multi-step ahead searcher control would be preferable than adopted heuristic, it would also be computationally more demanding. Multi-step ahead searcher control remains to be explored in future work.

A. An illustrative run
We applied the described search algorithm to the search area modelled by the random grid shown in Fig.2. Prior knowledge available to the searcher is illustrated by Fig.1: the radius of the search area is R 0 = 9; the centre is c = (0, 0) and the total number of potential links in the complete grid modelling the search area is L = 572. The parameters of the emitting source to be estimated are: X = 0, Y = 7 and A 0 = 12. The searcher initial position is p 0 = (9, −4).
Dynamic model p(m k |m k−1 ) is is a 2×2 transitional probability matrix with diagonal and off-diagonal elements 0.999 and 0.001, respectively 1 . Dynamic model p(p k |p k−1 , u k ) can be expressed as: where in simulations we used the value p e = 0.04. The parameters of detection matrices, which define the likelihood function g 2 (z k |p k , m k ), are as follows: for primary observable links, p d = 1 and p f a = 0; for secondary observable links p d = 0.8 and p f a = 0.1.
The RBPF used N = 4000 particles with M = 400 samples used in the averaging of information gain. The particle set Y 0 at initial time is created as follows: p (i) 0 = p 0 , for all i = 1, . . . , N particles; the source location vector is drawn from a uniform distribution over a circle with centre c and radius R 0 , i.e. p (i) s ∼ U Circle(c,R0) (p s ); link existence probabilities are set to q j,0 = 0.5, for all j = 1, . . . , L links; finally, the parameters of initial Gamma distribution G(A; η 0 , θ 0 ) were selected as η 0 = 15 and θ 0 = 1.
We terminate the search algorithm when the searcher steps on the source. At this point we compare the true source location with the current estimate of the posterior distribution of the searcher position, approximated by particles {p . If the true source position is contained in the support defined by {p , the search is considered successful. Fig. 5 illustrates a typical run of the search algorithm. The true path of the searcher on this run is shown in Fig.5.(a). It took the searcher 53 time steps to reach the source. During the search, the motion control vector failed to execute correctly on two occasions. The final estimate of the map (i.e of existing links of the square lattice) is shown in Fig.5.(b). This figure shows only the links whose probability of existence is higher than 0.6. The blue circles in Fig.5.(b) indicate the posterior distribution of the searcher final position. Its true position, which is the same as the source position, is included in the support of this posterior, meaning that the search was successful. Moreover, on this occasion, the MAP estimate of the searcher final position coincides with the truth. Fig.5.(c) shows the measured values of the count number n k along the path. As we discussed in introduction, the measurements 1 The structure of the search domain must be stable (recall that the count measurement model is valid for a steady-state), hence the changes in the statuses of links are very rare. are sporadic, especially in the beginning, when the distance between the searcher and the source is large: among the first ten count measurements, only three indicated a non-zero tracer concentration.
An avi video file, illustrating a single run of the algorithm, is supplied with the submission of this paper.

B. Monte Carlo runs
The average performance of the search algorithm in the described scenario has been assessed via Monte Carlo runs. If the search on a particular run was successful, its corresponding search time is used in averaging. A run is declared unsuccessful if the source has not been found after k = 100 discretetime steps. We also keep the statistics on the success rate of the search. The results obtained via averaging over 100 Monte Carlo runs are presented in Table I for three different locations of the source, i.e. (X, Y ) = (0, 7), (0, 1), (2, −5). The three locations correspond to the shortest path distances (from the searcher initial position p 0 = (9, −4) to the source) of 20, 14 and 8 unit lengths, respectively. All other parameters were the same as described above for the illustrative run. As expected, the results in Table I indicate that the search is quicker and more reliable (i.e. with a higher success rate) for a source which is closer to the searcher initial position. Table II presents the results for a source at location (0, 7) but with three different values of the source release-rate, i.e. A 0 = 8, 12, 16. The results indicate that the search is quicker for a source characterised by a higher release rate. The explanation of this trend is as follows. Initially, when the searcher is far from the source, its measurements of tracer concentration are very small, typically zero, hence uninformative. During this phase of the search, the searcher effectively moves according to a 'diffusive' (or random walk) model, which is slower that the so-called 'ballistic' movement associated with information driven search [2]. The random walk phase is longer for a weaker source, which contributes to the overall longer search time in this case. As a specific numerical example, we have also validated that a purely random search never manages to find the source at (0, 7) in the given time-frame of 100 discrete-time steps.

VI. SUMMARY
The paper considers a very difficult problem of autonomous search for a diffusive point source of tracer in an environment whose structure is unknown. Sequential estimation and motion control are carried out in highly uncertain circumstances with the state space including, in addition to the source parameters,  the map of the search area and the searcher position within the map. The paper develops mathematical models of measurements, it formulates the sequential Bayesian solution (in the form of a Rao-Blackwellised particle filter) and proposes an information driven motion control of the searcher. The numerical results demonstrate the concept, indicating high success-rates in comparison with random walk. There are many areas for further research and improvements of the concepts introduces in this paper. One direction is to explore the potential benefits of analytical results available from the percolation theory in carrying out olfactory search. Another is to investigate more efficient particle filters for source parameter estimation (being a deterministic part of the state space). Finally, the search strategies based on multiple steps ahead (rather than myopic search) are expected to improve the performance in a domain with obstacles. APPENDIX An approximate model of mean concentration, independent of the grid structure, was introduced in Sec.II-C. This model is a solution of the Laplace equation (1) for a circular search area in the absence of obstacles, with a boundary condition θ(r = R 0 ) = 0, but using different values of parameters. More specifically, the obstacles in the search area are incorporated in this model via homogenization (volume/ensemble averaging) of the diffusion equation (similar to the effective media approach [26], [27]), so that (1) is replaced with where D is the re-scaled diffusivity that accounts for such a homogenization, and θ is the time/ensemble averaged tracer concentration. The new (often called effective) diffusivity D is related to 'unobstructed' diffusivity D 0 of (1) via the formula D = f c D 0 . The scaling parameter 0 ≤ f c ≤ 1 (known as tortuosity [14], [28]) describes the effect of obstacles (their shape and packing density [14], [27]). According to (46), the decrease of the effective diffusivity of the tracer due to the presence of obstacles has the same effect as an appropriate increase of source release-rate (i.e. A = A 0 /f c ), with unchanged diffusivity in (46) (i.e. D = D 0 ), where parameters D 0 , A 0 correspond to their values in an unobstructed space, see (1). We arrive at a conclusion that the effect of obstacles can be approximately incorporated with imprecision (overestimation) of the source release-rate, without any effect on the source position. Since the main goal of the search algorithm is to find the source, then such inaccuracy in releaserate estimation becomes irrelevant for the performance of the algorithm. This means that as the first approximation for adopted measurements model we can still use (46) with known diffusivity D = D 0 and some unknown A. Estimation of A 0 (if required) can be implemented retrospectively based on a theoretical model for f c [28]). For the lattice models an expression for f c can be derived analytically by employing the framework of percolation theory, resulting in the expression f c = (1 − p/p c ) α , where p c = 1/2 (percolation threshold on square lattice), p is the fraction of missing links in the incomplete square lattice and α = 1.30 [13], [14]. If the number of missing links is small, we can adopt approximation f c ≈ 1 and A ≈ A 0 . In line with the above comments we will use (46) as a foundation for the measurements model that is independent of the structure of the search domain. The solution of (46) for a tracer source located at the center of circle (X = Y = 0), is given by [29]: where z = x + iy is the complex coordinate and z * is its complex-conjugate. To find the solution for configurations other than the circular domain with the source in the centre, we employ the property of conformal invariance of the Laplace equation [29]. We illustrate this method with a source placed inside the circular domain, but away from its centre (that is at coordinates (X, Y ) s.t. √ X 2 + Y 2 < R 0 ). If we can find a conformal transformation ω(z) that maps an arbitrary position of the source (X, Y ) back to the center of the circular domain, then we can still use the solution (47), but with the substitution z → ω(z). Therefore, for an arbitrary position of the source inside the search area √ X 2 + Y 2 < R 0 we can write θ = (κ/2) log[(ww * )/R 2 0 ].
The required conformal transformation is the well-known Möbius map (see [29]) where Z = X + iY . After straightforward calculations we arrive at the solution given by (5) and (6). We point out that the model is not restricted to a circular search area. According to the theory of analytical functions, a conformal mapping to the circle always exists for arbitrary simply connected domain, and therefore can be computed analytically or numerically [29].