Article

Scalable Gas Sensing, Mapping, and Path Planning via Decentralized Hilbert Maps

1 Sibley School of Mechanical and Aerospace Engineering, Cornell University, Ithaca, NY 14853, USA
2 Department of Aeronautics and Astronautics, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
3 Department of Aerospace Engineering and Mechanics, University of Minnesota, Minneapolis, MN 55455, USA
* Author to whom correspondence should be addressed.
Sensors 2019, 19(7), 1524; https://doi.org/10.3390/s19071524
Submission received: 18 December 2018 / Revised: 15 March 2019 / Accepted: 24 March 2019 / Published: 28 March 2019

Abstract

This paper develops a decentralized approach to gas distribution mapping (GDM) and information-driven path planning for large-scale distributed sensing systems. Gas mapping is performed using a probabilistic representation known as a Hilbert map, which formulates the mapping problem as a multi-class classification task and uses kernel logistic regression to train a discriminative classifier online. A novel Hilbert map information fusion method is presented for rapidly merging the information from individual robot maps using limited data communication. A communication strategy that implements data fusion among many robots is also presented for the decentralized computation of GDMs. New entropy-based information-driven path-planning methods are developed and compared to existing approaches, such as particle swarm optimization (PSO) and random walks (RW). Numerical experiments conducted in simulated indoor and outdoor environments show that the information-driven approaches proposed in this paper far outperform other approaches, and avoid mutual collisions in real time.

1. Introduction

Fugitive emissions and the dispersion of pollutants in the atmosphere are significant concerns affecting public health as well as climate change. The accidental release of hazardous gases from both urban and industrial sources is responsible for a variety of respiratory illnesses and environmental concerns [1,2]. Many sensors are currently fabricated and deployed both indoors and outdoors for air quality data collection and communication. However, obtaining a spatial representation of a gas distribution is a challenging problem because existing mapping and fusion algorithms do not scale to networks of potentially hundreds of mobile and stationary sensors. The decentralized mapping and path-planning methods presented in this paper are applicable to very large-scale sensor systems, and as such can potentially be used to assess air quality, classify safe and hazardous areas based on the concentration of the harmful gases, and even localize fugitive emissions [3].
Due to the fundamental mechanisms of atmospheric gas dispersion, auxiliary tools are required to ensure early detection and to respond with appropriate countermeasures by planning future measurements efficiently in both space and time. Methods for obtaining gas distribution maps (GDMs) have recently been developed along two lines of research [4]. In the first, a stationary sensor network is used to collect and fuse measurements to estimate the position of a source, typically requiring expensive and time-consuming calibration and data-recovery operations [5]. Stationary sensors, however, are typically neither effective nor sufficiently expedient for gas sensing in response to critical emissions with high source rates. The second line of research, pursued in this paper, involves the use of mobile sensors, such as terrestrial robots equipped with gas sensors, which can be controlled to rapidly and efficiently collect and fuse measurements over time. The latter approach provides flexible sensor configurations that can respond to measurements online and thus can be applied to mapping gas distributions in unknown and complex environments.
Mobile gas sensing can be performed by a single robot [2,3,6,7,8,9] or by networks of robots, also referred to as multi-agent systems (MAS) [10,11,12,13,14]. Compared to single robots, MAS present several advantages, including an increased probability of success and improved overall operational efficiency [14], but they also present additional technical challenges. In addition to requiring the solution of the GDM problem as a dynamic optimization problem, multi-agent path planning, coordination, communication, and fusion can become intractable as the number of robots increases [4,15,16,17,18,19,20].
MAS path planning and coordination can be achieved via centralized or decentralized methods, depending on the underlying communication infrastructure [21]. Centralized methods require persistent communication between a central station and every agent in the network, such that the central station can process and fuse all sensor measurements and use them to plan, or re-plan, the robot paths in a coordinated and collaborative fashion. Decentralized methods allow each robot to process its own measurements individually and then to communicate and fuse them with the measurements of a subset of collaborating robots, such as its nearest neighbors, with established connectivity. As a result, the performance of centralized methods depends entirely on the reliability of the communication protocols under the operating conditions. Because fragile communication links are common in many hazardous scenarios, reliance on persistent communication with the central station, and the associated power consumption, can hinder or even prevent the applicability of MAS over large regions of interest that require repeated long-distance data transmission. Nevertheless, most existing GDM methods to date rely on centralized methods and algorithms [10,12,13].
One of the main challenges in developing decentralized GDM methods lies in solving the data fusion problem for neighboring robots, such that each robot can build its own representation of the GDM based on local measurements but also update it incrementally as new information is obtained from neighboring robots. Considering the limited communication bandwidth and computing resources of most autonomous robots, it is also impractical to expect each sensor to share all of its raw measurement data with its neighbors. For this reason, the decentralized approach to gas source localization presented in [11] shares only the largest gas concentration and the corresponding source position estimate among neighbors. However, this decentralized approach cannot be extended to high-performance GDM representations, such as Gaussian process (GP) mixture models [8] or kernel-based models [3,22], and fusing GDMs obtained by different robots would result in redundant and expensive computations that potentially operate on the same raw data repeatedly.
The Hilbert map is an alternative GDM representation consisting of a probability map learned from local concentration measurements [7,23]. The novel GDM representation developed in this paper uses kernel logistic regression (KLR) to express the probability that the gas concentration at a given position belongs to a predefined range as a Hilbert map function. With this approach, new decentralized fusion algorithms can be developed that present several advantages, including decentralized fusion and a complete agent-level GDM representation that can be updated for decentralized decision making. Additionally, the information fusion operations are implemented via simple summations, and only the local Hilbert map needs to be shared among neighboring robots at every measurement update. As a result, at the end of the task, information about the gas concentration distribution over the entire region of interest can be delivered to the client or operator even if only a few robots complete the mission. Two GDM-based path-planning methods, the entropy-based artificial potential field (EAPF) and the entropy-based particle swarm optimization (EPSO) algorithms, are presented in this paper, and the simulation results show that they significantly outperform existing algorithms.

2. Problem Formulation

Consider the problem of optimally planning the trajectories of a distributed sensing system comprised of $N$ cooperative robots engaged in GDM over a large region of interest (ROI), denoted by $\mathcal{W} \subset \mathbb{R}^2$. In general, the gas concentration at a position $\mathbf{x} \in \mathcal{W}$ at time $t$ is modeled as a random variable $C(\mathbf{x}, t)$; thus, the gas concentration over the ROI is a random field. For simplicity, the gas distribution is considered constant over the time interval of interest and is therefore modeled as a spatial function $c(\mathbf{x})$ defined over $\mathcal{W}$. The approach can be extended to time-varying concentrations by augmenting the dimensionality of the GDM.
Let $\mathcal{U} \subset \mathbb{R}^m$ denote the space of admissible actions or controls. The dynamics of each robot are governed by the stochastic differential equation (SDE)

$$\dot{\mathbf{x}}_n(t) = \mathbf{f}[\mathbf{x}_n(t), \mathbf{u}_n(t), t] + \mathbf{G}\mathbf{w}(t), \qquad \mathbf{x}_n(T_0) = \mathbf{x}_{n0}, \quad n = 1, \ldots, N \tag{1}$$

where $\mathbf{x}_n(t) \in \mathcal{W}$ denotes the $n$th robot's position at time $t$, $\mathbf{u}_n(t) \in \mathcal{U}$ denotes the $n$th robot's action or control, and $\mathbf{x}_{n0}$ denotes the robot's initial condition at the initial time $T_0$. The robot dynamics in (1) are characterized by additive Gaussian white noise, denoted by $\mathbf{w}(t) \in \mathbb{R}^2$, and $\mathbf{G} \in \mathbb{R}^{2 \times 2}$ is a known constant matrix. In this paper, the position of the $n$th robot is assumed to be obtained by an on-board GPS and is denoted by $\hat{\mathbf{x}}_n(t)$, where $(\hat{\cdot})$ denotes an estimated value.
All $N$ robots are equipped with identical metal oxide sensors in order to cooperatively map a time-constant gas distribution, where the number and positions of gas plumes are unknown a priori. Because the reaction surface of a single gas sensor is very small ($\approx 1\ \mathrm{cm}^2$), a single measurement from a gas sensor can only provide information about a very small area. Therefore, to increase the resolution of the GDM, each robot carries a small metal oxide sensor array instead of a single sensor. The area covered by this sensor array is called the field of view (FOV) of the robot, denoted by $\mathcal{S}(\mathbf{x}_n) \subset \mathcal{W}$. Because the geometry of the sensor array is fixed, the position of each metal oxide sensor can be approximated from the robot position measurement and is denoted by $\hat{\mathbf{x}}_{n,m}$, $m = 1, \ldots, M$, where $M$ is the number of sensors on board the $n$th robot. For example, for a $5 \times 5$ sensor array, the number of sensors is $M = 25$.
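As an illustration of how the FOV measurement positions can be obtained, the following sketch approximates the sensor positions $\hat{\mathbf{x}}_{n,m}$ of a square array centered on the robot's estimated position, assuming the 10 cm sensor spacing used in the simulations of Section 6; the function name and centering convention are illustrative assumptions rather than details from the paper.

```python
import numpy as np

def sensor_array_positions(x_hat_n, side=5, spacing=0.10):
    """Approximate positions x_hat_{n,m} of a side x side metal oxide sensor
    array centered on the estimated robot position x_hat_n (2D, meters)."""
    offsets = (np.arange(side) - (side - 1) / 2.0) * spacing
    dx, dy = np.meshgrid(offsets, offsets)
    # Returns an (M, 2) array with M = side**2 sensor positions.
    return np.asarray(x_hat_n) + np.column_stack([dx.ravel(), dy.ravel()])

# Example: the 25 measurement positions for a robot estimated at (3.0, 4.0) m.
print(sensor_array_positions([3.0, 4.0]).shape)  # (25, 2)
```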
The gas concentration measurement obtained by the $m$th sensor can be modeled as the sum of the actual concentration and measurement noise,

$$\hat{c}_{n,m}(\mathbf{x}_n) = c(\mathbf{x}_{n,m}) + \hat{v}(\mathbf{x}_{n,m}) \tag{2}$$

where $\hat{v}(\mathbf{x}_{n,m})$ is a realization of the random measurement noise $V(\mathbf{x}_{n,m})$. Furthermore, the concentration measurement $\hat{c}_{n,m}(\mathbf{x}_n)$ can be treated as a sample of the random variable $C(\mathbf{x}_{n,m})$, defined as

$$C(\mathbf{x}_{n,m}) \triangleq c(\mathbf{x}_{n,m}) + V(\mathbf{x}_{n,m}) \tag{3}$$

Then, the concentration measurement $\hat{c}_{n,m}(\mathbf{x}_n)$ can be considered an observation of the random field $C(\cdot)$ at position $\mathbf{x}_{n,m}$.
Because the actual gas concentration $c(\mathbf{x})$ and the distribution of the measurement noise $V(\mathbf{x})$ are unknown, the objective is to approximate the random field $C(\cdot)$ and then use this approximation to estimate the gas concentration map. The cost function of the $n$th robot over a fixed time interval $[t_0, t_f]$ can then be expressed in terms of the uncertainty of the future concentration estimates, conditioned on past measurements, and the robot control usage, i.e.,

$$J_n = \int_{\mathcal{W}} H\!\left[C_n\!\left(\mathbf{x} \mid \mathcal{M}_n^{(T_f)}\right)\right] d\mathbf{x} + \int_{t_0}^{t_f} \mathbf{u}_n(t)^{\mathsf{T}} \mathbf{R}\, \mathbf{u}_n(t)\, dt \tag{4}$$

where $H[C_n(\mathbf{x} \mid \mathcal{M}_n^{(T_f)})]$ is the entropy of the random variable $C_n(\mathbf{x} \mid \mathcal{M}_n^{(T_f)})$ approximated by the $n$th robot given all past measurements $\mathcal{M}_n^{(T_f)}$, $\mathbf{R}$ is a positive definite matrix that weighs the importance of the elements of the control input $\mathbf{u}_n(t)$, and the superscript "$\mathsf{T}$" denotes the matrix transpose. The optimal planning problem is to find the time histories of the robot state $\mathbf{x}_n(t)$ and control $\mathbf{u}_n(t)$, for all $n = 1, \ldots, N$, that minimize the cost function $J_n$ in (4) over $[t_0, t_f]$, subject to (1).
Let the time interval $[t_0, t_f]$ be discretized into $T_f$ equal time steps of length $\Delta t = (t_f - t_0)/T_f$, and let $t_k = t_0 + k\Delta t$ denote the discrete time with index $k = 1, \ldots, T_f$. Then the objective function $J_n$ can be rewritten as

$$J_n = \int_{\mathcal{W}} H\!\left[C_n\!\left(\mathbf{x} \mid \mathcal{M}_n^{(T_f)}\right)\right] d\mathbf{x} + \sum_{k=1}^{T_f} \mathbf{u}_{n,k}^{\mathsf{T}} \mathbf{R}\, \mathbf{u}_{n,k} \tag{5}$$

where the subscript $k$ indicates the $k$th time step and the superscript "$(T_f)$" indicates the history up to the $T_f$th step. The term $\mathcal{M}_n^{(T_f)} = \bigcup_{k=1}^{T_f} \mathcal{M}_{n,k}$ denotes the measurement history up to the $T_f$th step, and $\mathcal{M}_{n,k}$ denotes the measurements obtained by the $n$th robot at the $k$th time step.

3. Representation of GDM

Several methods are available to approximate the probability distribution of the continuous random variable $C_n(\mathbf{x} \mid \mathcal{M}_n^{(T)})$, including kernel density estimation (KDE) [24,25] and Gaussian process regression (GPR) [26]. In the KDE method, the learned probability density function (PDF) is assumed to be a weighted summation of many parameterized kernel functions, where the weight coefficients and the kernel parameters are learned from the raw measurement data set. In the GPR method, the approximated PDF is assumed to be Gaussian, and a large matrix, the Gram matrix, must be computed from the raw measurement data. For both methods, fusing two different measurement data sets and updating the approximated PDF are computationally demanding, because all computations must be repeated on a massive amount of raw data; it is difficult to obtain the updated parameters and coefficients directly from the previous ones.
The method in this paper overcomes this hurdle through an efficient fusion approach developed by approximating the continuous probability distribution with a discrete probability distribution. Denote the range of the concentration over the whole ROI by $\mathcal{R} = [L_c, H_c]$, and divide this range into $L$ concentration intervals denoted by $\mathcal{R}_1 = [L_1, H_1), \ldots, \mathcal{R}_l = [L_l, H_l), \ldots, \mathcal{R}_L = [L_L, H_L]$, where $H_l = L_{l+1}$ for $l = 1, \ldots, L-1$, $L_1 = L_c$, and $H_L = H_c$. The cutoff coefficients $\{(L_l, H_l)\}_{l=1}^{L}$ specify the concentration intervals of interest. Instead of approximating the PDF, this paper models the probabilities $\{p_l(\mathbf{x})\}_{l=1}^{L}$ that the gas concentration at every position $\mathbf{x} \in \mathcal{W}$ belongs to each concentration interval, i.e.,

$$p_l(\mathbf{x}) \triangleq P\big(C(\mathbf{x}) \in \mathcal{R}_l\big), \quad l = 1, \ldots, L \tag{6}$$

where $P(\cdot)$ denotes the probability operator. Let $f_c(\cdot)$ denote the PDF of the concentration $C(\mathbf{x})$ and $\Delta L_l = H_l - L_l$ denote the length of the $l$th concentration interval. The relationship between $p_l(\mathbf{x})$ and $f_c(\cdot)$ can be expressed as

$$p_l(\mathbf{x}) \triangleq P\big(C(\mathbf{x}) \in \mathcal{R}_l\big) = \int_{\mathcal{R}_l} f_c(c)\, dc \tag{7}$$

where

$$f_c(L_l) = \lim_{\Delta L_l \to 0} \frac{p_l(\mathbf{x})}{\Delta L_l} \tag{8}$$

Therefore, the discrete distribution $\{p_l(\mathbf{x})\}_{l=1}^{L}$ can be used to describe the probability distribution of the continuous random variable $C(\mathbf{x})$ as $L \to \infty$. More importantly, it can be used directly for decision making, where the concentration intervals correspond to different levels of hazardous gas concentration. The Hilbert map method, developed by Fabio Ramos and Lionel Ott in [27] to model obstacle occupancy, is modified here to approximate the discrete distribution $\{p_l(\mathbf{x})\}_{l=1}^{L}$ over the entire ROI.
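To make the discretization concrete, the sketch below converts a raw concentration measurement into the $L$ binary labels used for classification in Section 3.1, i.e., $y_l = 1$ if the measurement falls in $\mathcal{R}_l$ and $y_l = 0$ otherwise. The cutoff values shown are the indoor cutoffs of Section 6.1 and serve only as an example.

```python
import numpy as np

# Example cutoffs (Section 6.1): R_1 = [0, 30), R_2 = [30, 70), R_3 = [70, 100].
edges = np.array([0.0, 30.0, 70.0, 100.0])

def interval_labels(c_hat, edges):
    """Binary labels y_1, ..., y_L for a concentration measurement c_hat:
    y_l = 1 iff c_hat lies in the l-th interval R_l."""
    L = len(edges) - 1
    y = np.zeros(L, dtype=int)
    idx = np.searchsorted(edges, c_hat, side="right") - 1
    idx = min(idx, L - 1)          # the last interval is closed on the right
    if 0 <= idx and c_hat <= edges[-1]:
        y[idx] = 1
    return y

print(interval_labels(42.0, edges))  # [0 1 0]
```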

3.1. Mapping with KLR

A Hilbert map is a continuous probability map developed by formulating the mapping problem as a binary classification task. Let $\mathbf{x} \in \mathcal{W}$ be any point in $\mathcal{W}$ and let $Y \in \{0, 1\}$ be a categorical random variable such that

$$Y = \begin{cases} 1, & \text{if } C(\mathbf{x}) \in \mathcal{R} \\ 0, & \text{if } C(\mathbf{x}) \notin \mathcal{R} \end{cases} \tag{9}$$

where the realization of $Y$ is denoted by $y$. The Hilbert map describes the probabilities $P(Y = 1 \mid \mathbf{x}) = p(\mathbf{x})$ and $P(Y = 0 \mid \mathbf{x}) = 1 - p(\mathbf{x})$ at the position $\mathbf{x}$.

Consider the training measurement data set $\mathcal{M} = \{(\mathbf{x}_m, y_m)\}_{m=1}^{M}$, where $\mathbf{x}_m \in \mathcal{W}$ indicates the measurement position, the Boolean variable $y_m \in \{1, 0\}$ indicates whether the concentration measurement $\hat{c}(\mathbf{x}_m)$ belongs to the range $\mathcal{R}$ (where 1 = yes and 0 = no), and $M$ is the number of measurements. The probability $P(Y \mid \mathbf{x})$ is defined as
$$p(\mathbf{x}) = P(Y = 1 \mid \mathbf{x}) = 1 - \frac{1}{1 + e^{f(\mathbf{x})}} = \frac{e^{f(\mathbf{x})}}{1 + e^{f(\mathbf{x})}} \tag{10}$$

where

$$1 - p(\mathbf{x}) = P(Y = 0 \mid \mathbf{x}) = \frac{1}{1 + e^{f(\mathbf{x})}} \tag{11}$$

and $f \in \mathcal{H}$ is a Hilbert function defined on $\mathcal{W}$. Here, $\mathcal{H}$ is a reproducing kernel Hilbert space (RKHS) associated with a kernel $k(\cdot, \cdot)$. The kernel mapping, denoted by $k(\mathbf{x}, \cdot) = \varphi(\mathbf{x})$ with $\varphi(\mathbf{x}) \in \mathcal{H}$, is an injective map from $\mathcal{W}$ to $\mathcal{H}$ [28,29,30,31,32]. According to the kernel trick [28,29,30,31,32], the evaluation of the Hilbert function can be expressed in the form of an inner product,

$$f(\mathbf{x}) = \langle f, \varphi(\mathbf{x}) \rangle_{\mathcal{H}} \tag{12}$$

where $\langle \cdot, \cdot \rangle_{\mathcal{H}}$ indicates the inner product in the RKHS. To learn the Hilbert map, the loss function $J_{\mathcal{H}}$ is defined as
$$J_{\mathcal{H}} = \sum_{m=1}^{M} \ell_m + \frac{\lambda}{2}\|f\|_{\mathcal{H}}^2 = \sum_{m=1}^{M} \left[\ln\!\left(1 + e^{f(\mathbf{x}_m)}\right) - y_m f(\mathbf{x}_m)\right] + \frac{\lambda}{2}\|f\|_{\mathcal{H}}^2 \tag{13}$$

where $\ell_m = -\ln P(Y = y_m \mid \mathbf{x}_m)$ is the negative log-likelihood (NLL) of the datum $(\mathbf{x}_m, y_m)$, and $\lambda$ is the regularization coefficient, a small user-defined positive scalar, $\lambda \ll 1$. The gradient of the loss function with respect to $f$ is then expressed as

$$\mathbf{g} = \frac{\partial J_{\mathcal{H}}}{\partial f} = \sum_{m=1}^{M} \left[\frac{e^{f(\mathbf{x}_m)}}{1 + e^{f(\mathbf{x}_m)}}\varphi(\mathbf{x}_m) - y_m \varphi(\mathbf{x}_m)\right] + \lambda f = \sum_{m=1}^{M} \left[P(Y = 1 \mid \mathbf{x}_m) - y_m\right]\varphi(\mathbf{x}_m) + \lambda f = \boldsymbol{\Phi}(\mathbf{p} - \mathbf{y}) + \lambda f \tag{14}$$

where $\boldsymbol{\Phi} = \left[\varphi(\mathbf{x}_1), \ldots, \varphi(\mathbf{x}_M)\right]$, $\mathbf{p} = \left[p(\mathbf{x}_1), \ldots, p(\mathbf{x}_M)\right]^{\mathsf{T}} = \left[\frac{e^{f(\mathbf{x}_1)}}{1 + e^{f(\mathbf{x}_1)}}, \ldots, \frac{e^{f(\mathbf{x}_M)}}{1 + e^{f(\mathbf{x}_M)}}\right]^{\mathsf{T}}$, and $\mathbf{y} = \left[y_1, \ldots, y_M\right]^{\mathsf{T}}$. In addition, the Hessian operator is expressed as
$$\begin{aligned}
\mathbf{H} = \frac{\partial \mathbf{g}}{\partial f} &= \frac{\partial}{\partial f}\left[\frac{e^{f(\mathbf{x}_1)}}{1 + e^{f(\mathbf{x}_1)}}, \ldots, \frac{e^{f(\mathbf{x}_M)}}{1 + e^{f(\mathbf{x}_M)}}\right]\boldsymbol{\Phi}^{\mathsf{T}} + \lambda \mathbf{I} \\
&= \left[\frac{\partial}{\partial f}\frac{e^{f(\mathbf{x}_1)}}{1 + e^{f(\mathbf{x}_1)}}, \ldots, \frac{\partial}{\partial f}\frac{e^{f(\mathbf{x}_M)}}{1 + e^{f(\mathbf{x}_M)}}\right]\boldsymbol{\Phi}^{\mathsf{T}} + \lambda \mathbf{I} \\
&= \left[\frac{e^{f(\mathbf{x}_1)}}{1 + e^{f(\mathbf{x}_1)}}\frac{1}{1 + e^{f(\mathbf{x}_1)}}\varphi(\mathbf{x}_1), \ldots, \frac{e^{f(\mathbf{x}_M)}}{1 + e^{f(\mathbf{x}_M)}}\frac{1}{1 + e^{f(\mathbf{x}_M)}}\varphi(\mathbf{x}_M)\right]\boldsymbol{\Phi}^{\mathsf{T}} + \lambda \mathbf{I} \\
&= \boldsymbol{\Phi}\boldsymbol{\Lambda}\boldsymbol{\Phi}^{\mathsf{T}} + \lambda \mathbf{I}
\end{aligned} \tag{15}$$

where $\mathbf{I}$ is an identity operator (or matrix) on $\mathcal{H}$, so that $\mathbf{I}h = h$ for any function $h \in \mathcal{H}$. Here, $\boldsymbol{\Lambda}$ is an $M \times M$ diagonal matrix defined as

$$\boldsymbol{\Lambda} = \mathrm{diag}\!\left[\mathbf{p} \odot (\mathbf{1}_M - \mathbf{p})\right] \tag{16}$$

where $\odot$ denotes the element-wise product and $\mathbf{1}_M = [1, \ldots, 1]^{\mathsf{T}}$ is an $M \times 1$ vector with all elements equal to one.
Using the Newton-Raphson method, the Hilbert function is updated iteratively,

$$\begin{aligned}
f_{i+1} &= f_i - \mathbf{H}_i^{-1}\mathbf{g}_i = f_i - \left(\boldsymbol{\Phi}\boldsymbol{\Lambda}_i\boldsymbol{\Phi}^{\mathsf{T}} + \lambda \mathbf{I}\right)^{-1}\mathbf{g}_i \\
&= f_i - \left(\boldsymbol{\Phi}\boldsymbol{\Lambda}_i\boldsymbol{\Phi}^{\mathsf{T}} + \lambda \mathbf{I}\right)^{-1}\left[\boldsymbol{\Phi}(\mathbf{p}_i - \mathbf{y}) + \lambda f_i\right] \\
&= f_i - \left(\boldsymbol{\Phi}\boldsymbol{\Lambda}_i\boldsymbol{\Phi}^{\mathsf{T}} + \lambda \mathbf{I}\right)^{-1}\boldsymbol{\Phi}(\mathbf{p}_i - \mathbf{y}) - \lambda\left(\boldsymbol{\Phi}\boldsymbol{\Lambda}_i\boldsymbol{\Phi}^{\mathsf{T}} + \lambda \mathbf{I}\right)^{-1} f_i \\
&\approx f_i - \left(\boldsymbol{\Phi}\boldsymbol{\Lambda}_i\boldsymbol{\Phi}^{\mathsf{T}} + \lambda \mathbf{I}\right)^{-1}\boldsymbol{\Phi}(\mathbf{p}_i - \mathbf{y}) \\
&= f_i - \boldsymbol{\Phi}\mathbf{R}_i(\mathbf{p}_i - \mathbf{y})
\end{aligned} \tag{17}$$

where $\mathbf{R}_i = \left(\boldsymbol{\Lambda}_i\boldsymbol{\Phi}^{\mathsf{T}}\boldsymbol{\Phi} + \lambda \mathbf{I}_M\right)^{-1}$, $\mathbf{I}_M$ is an $M \times M$ identity matrix, and the subscript "$i$" indicates the $i$th iteration of learning the function $f$. The Hilbert function $f(\mathbf{x})$ is evaluated iteratively as follows,

$$f_{i+1}(\mathbf{x}) = \langle f_{i+1}, \varphi(\mathbf{x})\rangle_{\mathcal{H}} = f_i(\mathbf{x}) - \varphi^{\mathsf{T}}(\mathbf{x})\boldsymbol{\Phi}\mathbf{R}_i(\mathbf{p}_i - \mathbf{y}) = f_i(\mathbf{x}) - \mathbf{k}^{\mathsf{T}}\mathbf{R}_i(\mathbf{p}_i - \mathbf{y}) \tag{18}$$

where $\mathbf{k} = \boldsymbol{\Phi}^{\mathsf{T}}\varphi(\mathbf{x}) = \left[k(\mathbf{x}_1, \mathbf{x}), \ldots, k(\mathbf{x}_M, \mathbf{x})\right]^{\mathsf{T}}$.
It can be seen from (18) that the evaluation of $f_{i+1}(\mathbf{x})$ only depends on the evaluations of $f_i(\mathbf{x})$ and $k(\mathbf{x}_m, \mathbf{x})$, $m = 1, \ldots, M$. Therefore, the evaluation of $f_i(\mathbf{x})$ at each iteration is not needed if $\mathbf{x} \notin \{\mathbf{x}_1, \ldots, \mathbf{x}_M\}$. Instead, $f_i(\mathbf{x})$ can be calculated at the last iteration directly from the evaluations $f_i(\mathbf{x}_m)$, such that

$$f_i(\mathbf{x}) = f_0(\mathbf{x}) - \mathbf{k}^{\mathsf{T}}\sum_{j=1}^{i}\mathbf{R}_j(\mathbf{p}_j - \mathbf{y}) = f_0(\mathbf{x}) - \mathbf{k}^{\mathsf{T}}\mathbf{S}_i \tag{19}$$

where $f_0(\mathbf{x})$ is the initial function evaluation at $\mathbf{x}$, prior to learning the function from $\mathcal{M}$. The term

$$\mathbf{S}_i \triangleq \sum_{j=1}^{i}\mathbf{R}_j(\mathbf{p}_j - \mathbf{y}) \tag{20}$$

only depends on the measurement data $\mathcal{M}$ and can be updated iteratively.
Furthermore, consider $Q$ collocation points, $\mathcal{X}^c = \{\mathbf{x}_q^c \in \mathcal{W}\}_{q=1}^{Q}$, distributed over the ROI with a uniform spatial interval and labeled by the superscript "$c$". Let $\boldsymbol{\Phi}^c = \left[\varphi(\mathbf{x}_1^c), \ldots, \varphi(\mathbf{x}_Q^c)\right]$ denote the feature matrix of the collocation points. The evaluations of the function $f$ at all collocation points can be updated by

$$\mathbf{f}_{i+1} = \mathbf{f}_i - \boldsymbol{\Phi}^{c\,\mathsf{T}}\boldsymbol{\Phi}\mathbf{R}_i(\mathbf{p}_i - \mathbf{y}) \tag{21}$$

where $\mathbf{f}_i = \boldsymbol{\Phi}^{c\,\mathsf{T}} f_i = \left[f_i(\mathbf{x}_1^c), \ldots, f_i(\mathbf{x}_Q^c)\right]^{\mathsf{T}}$. According to (19), the evaluations $\mathbf{f}_i = \boldsymbol{\Phi}^{c\,\mathsf{T}} f_i$ can be updated directly by

$$\mathbf{f}_i = \mathbf{f}_0 - \boldsymbol{\Phi}^{c\,\mathsf{T}}\boldsymbol{\Phi}\mathbf{S}_i \tag{22}$$

where $\mathbf{f}_0$ comprises the initial function evaluations at the collocation points, prior to learning the function from the measurement data $\mathcal{M}$.
In summary, instead of learning the function $f$ or its coefficients as in traditional KLR or GPR [26], we update the evaluations of $f(\mathbf{x})$ at the collocation points $\mathcal{X}^c$. Given the Hilbert function $f$, the evaluations at the collocation points provide the Hilbert map, $\mathbf{f}$, defined as

$$\mathbf{f} \triangleq \boldsymbol{\Phi}^{c\,\mathsf{T}} f = \left[f(\mathbf{x}_1^c), \ldots, f(\mathbf{x}_Q^c)\right]^{\mathsf{T}} \tag{23}$$
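For concreteness, the update rules (17)-(23) reduce to a few dense matrix operations. The sketch below is a minimal illustrative implementation, assuming a Gaussian kernel (the kernel used in Section 6, with $\sigma = 2$ m) and a zero initial map $\mathbf{f}_0 = \mathbf{0}$; the function names are ours, not the authors'.

```python
import numpy as np

def gaussian_kernel(A, B, sigma=2.0):
    """Gram matrix of k(a, b) = exp(-||a - b||^2 / (2 sigma^2)) for 2D points."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / sigma**2)

def klr_hilbert_map(X, y, Xc, lam=1e-2, sigma=2.0, iters=10):
    """Newton-Raphson KLR update of the Hilbert map evaluated at the
    collocation points Xc (Q, 2), from measurements (X, y), per Eqs. (17)-(23)."""
    M = len(y)
    K = gaussian_kernel(X, X, sigma)     # Phi^T Phi      (M x M)
    Kc = gaussian_kernel(Xc, X, sigma)   # Phi_c^T Phi    (Q x M)
    f_meas = np.zeros(M)                 # f_i evaluated at the x_m
    S = np.zeros(M)                      # S_i of Eq. (20)
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-f_meas))                 # p_i
        Lam = np.diag(p * (1.0 - p))                      # Lambda_i, Eq. (16)
        R = np.linalg.inv(Lam @ K + lam * np.eye(M))      # R_i, Eq. (17)
        step = R @ (p - y)
        f_meas -= K @ step                                # Eq. (18) at the x_m
        S += step                                         # Eq. (20)
    return -Kc @ S                                        # Eq. (22) with f_0 = 0

# Usage: f_map = klr_hilbert_map(X, y, Xc) returns the Hilbert map of Eq. (23).
```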

3.2. Temporal Update of Hilbert Map

The previous subsection develops a method for learning the Hilbert map from a measurement data set $\mathcal{M}$ obtained by the sensors during one time interval. This subsection considers the update of the Hilbert map according to the next data set, $\mathcal{M}_k = \{(\mathbf{x}_{k,m}, y_{k,m})\}_{m=1}^{M_k}$, obtained during the $k$th time interval. Here, $M_k$ is the number of measurements in the $k$th time interval, and $\mathcal{X}_k = \{\mathbf{x}_{k,m}\}_{m=1}^{M_k}$ and $\mathcal{Y}_k = \{y_{k,m}\}_{m=1}^{M_k}$ are the measurement positions and the corresponding classification estimates, respectively. To learn the next Hilbert map, the loss function over a period $T$ can be expressed as

$$J_T = \sum_{k=1}^{T}\gamma(T-k)\sum_{m=1}^{M_k}\ell_{k,m} + \frac{\lambda}{2}\|f\|_{\mathcal{H}}^2 \tag{24}$$

where $\lambda$ is the regularization coefficient as in (13), $\gamma(T-k)$ is the "forgetting factor", and $\ell_{k,m} = -\ln P(Y = y_{k,m} \mid \mathbf{x}_{k,m})$ is the NLL of the datum $(\mathbf{x}_{k,m}, y_{k,m})$. Similarly to (14), the gradient is expressed as

$$\mathbf{g}_T = \frac{\partial J_T}{\partial f} = \sum_{k=1}^{T}\gamma(T-k)\sum_{m=1}^{M_k}\frac{\partial \ell_{k,m}}{\partial f} + \lambda f = \left[\boldsymbol{\Phi}_1, \ldots, \boldsymbol{\Phi}_T\right]\boldsymbol{\Gamma}_T\left[(\mathbf{p}_1 - \mathbf{y}_1)^{\mathsf{T}}, \ldots, (\mathbf{p}_T - \mathbf{y}_T)^{\mathsf{T}}\right]^{\mathsf{T}} + \lambda f = \tilde{\boldsymbol{\Phi}}_T\boldsymbol{\Gamma}_T\left[(\mathbf{p}_1 - \mathbf{y}_1)^{\mathsf{T}}, \ldots, (\mathbf{p}_T - \mathbf{y}_T)^{\mathsf{T}}\right]^{\mathsf{T}} + \lambda f \tag{25}$$

where $\boldsymbol{\Phi}_k = \left[\varphi(\mathbf{x}_{k,1}), \ldots, \varphi(\mathbf{x}_{k,M_k})\right]$, $\mathbf{p}_k = \left[p(\mathbf{x}_{k,1}), \ldots, p(\mathbf{x}_{k,M_k})\right]^{\mathsf{T}} = \left[\frac{e^{f(\mathbf{x}_{k,1})}}{1 + e^{f(\mathbf{x}_{k,1})}}, \ldots, \frac{e^{f(\mathbf{x}_{k,M_k})}}{1 + e^{f(\mathbf{x}_{k,M_k})}}\right]^{\mathsf{T}}$, and $\mathbf{y}_k = \left[y_{k,1}, \ldots, y_{k,M_k}\right]^{\mathsf{T}}$ for $k = 1, \ldots, T$, and $\tilde{\boldsymbol{\Phi}}_T = \left[\boldsymbol{\Phi}_1, \ldots, \boldsymbol{\Phi}_T\right]$. Furthermore, $\boldsymbol{\Gamma}_T$ is the diagonal matrix obtained by placing the vector $\left[\gamma(T-1)\mathbf{1}_{M_1}^{\mathsf{T}}, \ldots, \gamma(T-k)\mathbf{1}_{M_k}^{\mathsf{T}}, \ldots, \gamma(0)\mathbf{1}_{M_T}^{\mathsf{T}}\right]^{\mathsf{T}}$ on the diagonal of a zero matrix. In addition, as with (15), the Hessian operator can be expressed as
$$\mathbf{H}_T = \frac{\partial \mathbf{g}_T}{\partial f} = \left[\boldsymbol{\Phi}_1, \ldots, \boldsymbol{\Phi}_T\right]\boldsymbol{\Gamma}_T\tilde{\boldsymbol{\Lambda}}_T\left[\boldsymbol{\Phi}_1^{\mathsf{T}}, \ldots, \boldsymbol{\Phi}_T^{\mathsf{T}}\right]^{\mathsf{T}} + \lambda\mathbf{I} = \tilde{\boldsymbol{\Phi}}_T\boldsymbol{\Gamma}_T\tilde{\boldsymbol{\Lambda}}_T\tilde{\boldsymbol{\Phi}}_T^{\mathsf{T}} + \lambda\mathbf{I} \tag{26}$$

where

$$\tilde{\boldsymbol{\Lambda}}_T = \begin{bmatrix} \boldsymbol{\Lambda}_1 & & \mathbf{0} \\ & \ddots & \\ \mathbf{0} & & \boldsymbol{\Lambda}_T \end{bmatrix} \tag{27}$$

and $\boldsymbol{\Lambda}_k = \mathrm{diag}\!\left[\mathbf{p}_k \odot (\mathbf{1}_{M_k} - \mathbf{p}_k)\right]$.
Then, the update rule for learning the Hilbert function $f$ at the $i$th iteration is given by

$$\begin{aligned}
f_{T,i+1} &= f_{T,i} - \mathbf{H}_{T,i}^{-1}\mathbf{g}_{T,i} \\
&\approx f_{T,i} - \left(\tilde{\boldsymbol{\Phi}}_T\boldsymbol{\Gamma}_T\tilde{\boldsymbol{\Lambda}}_{T,i}\tilde{\boldsymbol{\Phi}}_T^{\mathsf{T}} + \lambda\mathbf{I}\right)^{-1}\tilde{\boldsymbol{\Phi}}_T\boldsymbol{\Gamma}_T\left[(\mathbf{p}_{1,i} - \mathbf{y}_1)^{\mathsf{T}}, \ldots, (\mathbf{p}_{T,i} - \mathbf{y}_T)^{\mathsf{T}}\right]^{\mathsf{T}} \\
&= f_{T,i} - \tilde{\boldsymbol{\Phi}}_T\tilde{\mathbf{R}}_i\boldsymbol{\Gamma}_T\left[(\mathbf{p}_{1,i} - \mathbf{y}_1)^{\mathsf{T}}, \ldots, (\mathbf{p}_{T,i} - \mathbf{y}_T)^{\mathsf{T}}\right]^{\mathsf{T}}
\end{aligned} \tag{28}$$

where $\tilde{\mathbf{R}}_i = \left(\boldsymbol{\Gamma}_T\tilde{\boldsymbol{\Lambda}}_{T,i}\tilde{\boldsymbol{\Phi}}_T^{\mathsf{T}}\tilde{\boldsymbol{\Phi}}_T + \lambda\mathbf{I}_{M_{tol}}\right)^{-1}$ and $M_{tol} = \sum_{k=1}^{T}M_k$. According to the above equation, the function $f$ is expressed in terms of the set of all the measurement positions, $\bigcup_{k=1}^{T}\mathcal{X}_k$, and the corresponding coefficients.
To reduce the computational load, the forgetting factor $\gamma(T-k)$ is modeled by

$$\gamma(T-k) = \begin{cases} 1, & \text{if } T - k < \tau \\ 0, & \text{otherwise} \end{cases} \tag{29}$$

such that each robot stores only the measurements obtained during the past $\tau$ time steps. By setting $\tau = 1$, the update rule (28) for learning the function $f$ at the $i$th iteration becomes

$$f_{T,i+1} = f_{T,i} - \left(\boldsymbol{\Phi}_T\boldsymbol{\Lambda}_{T,i}\boldsymbol{\Phi}_T^{\mathsf{T}} + \lambda\mathbf{I}\right)^{-1}\boldsymbol{\Phi}_T(\mathbf{p}_{T,i} - \mathbf{y}_T) = f_{T,i} - \boldsymbol{\Phi}_T\mathbf{R}_{T,i}(\mathbf{p}_{T,i} - \mathbf{y}_T) \tag{30}$$

where $\mathbf{R}_{T,i} = \left(\boldsymbol{\Lambda}_{T,i}\boldsymbol{\Phi}_T^{\mathsf{T}}\boldsymbol{\Phi}_T + \lambda\mathbf{I}_{M_T}\right)^{-1}$.
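A sketch of the sequential update (30) with $\tau = 1$ follows, reusing gaussian_kernel and the overall structure of the sketch in Section 3.1. Because only the collocation evaluations are stored, the prior value of $f$ at each new measurement position is approximated here by that of the nearest collocation point; this interpolation choice is our simplifying assumption, not a detail specified in the paper.

```python
import numpy as np

def temporal_update(f_map, Xc, X_new, y_new, lam=1e-2, sigma=2.0, iters=10):
    """One tau = 1 temporal update of the Hilbert map (Eq. (30)), using only
    the newest measurement batch (X_new, y_new).
    f_map: (Q,) current map at collocation points Xc (Q, 2)."""
    # Prior function values at the new measurement positions (nearest point).
    d2 = ((X_new[:, None, :] - Xc[None, :, :]) ** 2).sum(-1)
    f_meas = f_map[d2.argmin(axis=1)].astype(float)
    K = gaussian_kernel(X_new, X_new, sigma)
    Kc = gaussian_kernel(Xc, X_new, sigma)
    S = np.zeros(len(y_new))
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-f_meas))
        R = np.linalg.inv(np.diag(p * (1.0 - p)) @ K + lam * np.eye(len(y_new)))
        step = R @ (p - y_new)                  # R_{T,i} (p_{T,i} - y_T)
        f_meas -= K @ step
        S += step
    return f_map - Kc @ S   # updated map at the collocation points
```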

3.3. Approximation of GDM

The previous subsections present a method for learning the Hilbert function $f$ from the available measurement data, comprising the measurement positions $\mathcal{X}_k = \{\mathbf{x}_{k,m}\}_{m=1}^{M_k}$ and the corresponding classification estimates $\mathcal{Y}_k = \{y_{k,m}\}_{m=1}^{M_k}$, for $k = 1, \ldots, T$, which indicate whether the concentration measurement at each measurement position belongs to the range $\mathcal{R}$ defined in (9). If $L$ classification estimates are obtained from the same concentration measurements, indicating whether each measurement belongs to each of the $L$ concentration intervals $\mathcal{R}_1, \ldots, \mathcal{R}_L$, respectively, then the learning method described in Section 3.2 can be used to approximate $L$ Hilbert functions, $f_1, \ldots, f_L$. Furthermore, the probabilities in (6) can be evaluated at all the collocation points $\mathcal{X}^c = \{\mathbf{x}_q^c\}_{q=1}^{Q}$, such that

$$p_l(\mathbf{x}_q^c) = 1 - \frac{1}{1 + e^{f_l(\mathbf{x}_q^c)}}, \quad q = 1, \ldots, Q \text{ and } l = 1, \ldots, L \tag{31}$$
Now, consider the Hilbert functions $\{f_l\}_{l=1}^{L}$ approximated locally by each robot. Although all the concentration intervals are mutually exclusive and collectively exhaustive, i.e.,

$$\mathcal{R}_i \cap \mathcal{R}_j = \emptyset \tag{32}$$

$$\bigcup_{i=1}^{L}\mathcal{R}_i = [L_c, H_c], \quad i, j = 1, \ldots, L \text{ and } i \neq j \tag{33}$$
it is not guaranteed that the sum of all the learned probabilities $\{p_l(\mathbf{x}_q^c)\}_{l=1}^{L}$ is equal to one, i.e., that $\sum_{l=1}^{L}p_l(\mathbf{x}_q^c) = 1$. Therefore, the learned probabilities must be normalized to obtain the discrete distribution $\boldsymbol{\pi}(\mathbf{x}_q^c) = \left[\pi_1(\mathbf{x}_q^c), \ldots, \pi_L(\mathbf{x}_q^c)\right]^{\mathsf{T}}$. Each component of the discrete distribution is calculated by

$$\pi_l(\mathbf{x}_q^c) = \frac{p_l(\mathbf{x}_q^c)\prod_{1 \le l' \ne l \le L}\left[1 - p_{l'}(\mathbf{x}_q^c)\right]}{p_0(\mathbf{x}_q^c)} \propto \frac{p_l(\mathbf{x}_q^c)}{1 - p_l(\mathbf{x}_q^c)} \tag{34}$$

where $p_0(\mathbf{x}_q^c)$ is a normalization term. Let $\bar{\mathbf{c}} = \left[\bar{c}_1, \ldots, \bar{c}_L\right]^{\mathsf{T}}$ denote the midpoints of the concentration intervals, where $\bar{c}_l = (H_l + L_l)/2$. Then, the expectation of the random variable $C(\mathbf{x}_q^c)$ can be expressed as

$$E\left[C(\mathbf{x}_q^c)\right] = \boldsymbol{\pi}(\mathbf{x}_q^c)^{\mathsf{T}}\bar{\mathbf{c}} \tag{35}$$

where $E(\cdot)$ is the expectation operator.
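The following sketch composes (31), (34), and (35) into an expected-concentration map from the $L$ learned Hilbert maps of a single robot; the midpoints shown correspond to the indoor cutoffs of Section 6.1 and are purely illustrative.

```python
import numpy as np

def expected_concentration(f_maps, c_bar, eps=1e-12):
    """f_maps: (L, Q) Hilbert-map evaluations f_l(x_q^c); c_bar: (L,) interval
    midpoints. Returns the (Q,) expected concentration map E[C(x_q^c)]."""
    p = 1.0 / (1.0 + np.exp(-f_maps))            # Eq. (31)
    odds = p / np.clip(1.0 - p, eps, None)       # Eq. (34): pi_l proportional to the odds
    pi = odds / odds.sum(axis=0, keepdims=True)  # normalize over the L intervals
    return pi.T @ np.asarray(c_bar)              # Eq. (35)

# With completely uninformative maps (f = 0), every point returns the mean midpoint.
print(expected_concentration(np.zeros((3, 4)), [15.0, 50.0, 85.0]))  # all 50.0
```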

4. Information Fusion

In this section, the problem of information fusion between neighboring robots is considered, where each robot builds its own Hilbert function locally using a decentralized approach. Assume that all the robots use the same collocation points $\mathcal{X}^c$. To limit the amount of communicated data, the evaluations of the Hilbert functions at the collocation points are shared among the robots, instead of the coefficients and parameters of the Hilbert functions.

4.1. Hilbert Map Fusion

Consider two robots that have each learned their respective Hilbert functions, $f_1$ and $f_2$, from two different measurement data sets, $\mathcal{M}_1$ and $\mathcal{M}_2$, respectively. Then, the fused Hilbert function, $f_F$, can be obtained based on the following theorem.
Theorem 1.
Let $f_1(\mathbf{x})$ and $f_2(\mathbf{x})$ be two Hilbert functions defined on a workspace $\mathcal{W}$ and approximated by two robots based on their own measurement data sets, $\mathcal{M}_1$ and $\mathcal{M}_2$, respectively. These two Hilbert functions can be applied to calculate the conditional probability that the concentration is in the range $\mathcal{R}$ given the corresponding measurement data sets, as follows:

$$p_1(\mathbf{x} \mid \mathcal{M}_1) = 1 - \frac{1}{1 + e^{f_1(\mathbf{x})}} \tag{36}$$

$$p_2(\mathbf{x} \mid \mathcal{M}_2) = 1 - \frac{1}{1 + e^{f_2(\mathbf{x})}} \tag{37}$$

Then, the fused conditional probability $p_F(\mathbf{x} \mid \mathcal{M}_1, \mathcal{M}_2)$ can be expressed as

$$p_F(\mathbf{x} \mid \mathcal{M}_1, \mathcal{M}_2) = 1 - \frac{1}{1 + e^{f_F(\mathbf{x})}} \tag{38}$$

where $f_F(\mathbf{x})$ is the fused Hilbert function. In addition, the fused Hilbert function $f_F(\mathbf{x})$ can be calculated from the Hilbert functions $f_1(\mathbf{x})$ and $f_2(\mathbf{x})$ as follows,

$$f_F(\mathbf{x}) = f_1(\mathbf{x}) + f_2(\mathbf{x}) - \ln\varepsilon \tag{39}$$

where $\varepsilon = \frac{p(\mathbf{x})}{1 - p(\mathbf{x})}$ is the ratio between the prior probabilities $P(C(\mathbf{x}) \in \mathcal{R}) = p(\mathbf{x})$ and $P(C(\mathbf{x}) \notin \mathcal{R}) = 1 - p(\mathbf{x})$ at $\mathbf{x} \in \mathcal{W}$.
The proof of Theorem 1 is provided in Appendix A.1. According to Theorem 1, the following corollary can be obtained.
Corollary 1.
Assume that the prior probability is uniform, i.e., $p(\mathbf{x}) = 1/2$. Then, the ratio between the prior probabilities is

$$\varepsilon = \frac{p(\mathbf{x})}{1 - p(\mathbf{x})} = 1 \tag{40}$$

and the fusion rule can be rewritten as

$$f_F(\mathbf{x}) = f_1(\mathbf{x}) + f_2(\mathbf{x}) \tag{41}$$
Because the prior concentration distribution is unknown, the Hilbert functions are fused according to (41) in Corollary 1, unless otherwise stated. Furthermore, assume that all the robots use the same collocation points $\mathcal{X}^c$. The information fusion can then be implemented by fusing the Hilbert maps among neighboring robots,

$$\mathbf{f}_{H,F} = \mathbf{f}_{H,1} + \mathbf{f}_{H,2} \tag{42}$$

where $\mathbf{f}_{H,1}$, $\mathbf{f}_{H,2}$, and $\mathbf{f}_{H,F}$ are the Hilbert maps associated with the Hilbert functions $f_1(\mathbf{x})$, $f_2(\mathbf{x})$, and $f_F(\mathbf{x})$, respectively.
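Under the uniform-prior assumption of Corollary 1, the fusion step is a single element-wise addition; a short sketch is given below, together with the conversion of the fused map back to the probability in (38).

```python
import numpy as np

def fuse_hilbert_maps(f_map_1, f_map_2):
    """Eq. (42): fuse two Hilbert maps defined on the same collocation grid,
    assuming a uniform prior so that ln(epsilon) = 0 in Eq. (39)."""
    return f_map_1 + f_map_2

f_fused = fuse_hilbert_maps(np.array([0.5, -1.0]), np.array([0.2, 0.3]))
p_fused = 1.0 / (1.0 + np.exp(-f_fused))   # fused probability, Eq. (38)
print(p_fused)
```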

4.2. Communication Strategy

Any pair of robots in the network can share and update their Hilbert maps efficiently according to (42). To implement data fusion for a very large number of robots, however, a communication strategy is also required to determine how data are shared between robots with active communication links. In this paper, gas measurement information is communicated every $T_c$ time steps. The communication protocol requires four steps at every communication time $k$, as follows. In the first step, the $N$ robots are partitioned into smaller communication networks according to their positions and the communication range $r_c$, where $\mathcal{G}_\imath$ denotes the index set of the robots in the $\imath$th communication network. Then, for any robot $a_n$ with $n \in \mathcal{G}_\imath$, there exists another robot $a_{n'}$ such that

$$d_{n,n'} = \|\mathbf{x}_n - \mathbf{x}_{n'}\| \le r_c, \quad n, n' \in \mathcal{G}_\imath \tag{43}$$

where $d_{n,n'}$ is the distance between robots $a_n$ and $a_{n'}$, and $\|\cdot\|$ denotes the Euclidean norm.

In the second step, one robot in every communication network is selected as a temporary data center (TDC), denoted by $a_{n_\imath^*}$ with $n_\imath^* \in \mathcal{G}_\imath$. The other robots in $\mathcal{G}_\imath$ send their Hilbert map changes $\Delta\mathbf{f}_{n,k}$ to robot $a_{n_\imath^*}$. The Hilbert map change $\Delta\mathbf{f}_{n,k}$ is defined as the change between the $n$th robot's Hilbert map at the current time step $k$ and its Hilbert map at the previous communication time step, $k - T_c$, such that

$$\Delta\mathbf{f}_{n,k} = \mathbf{f}_{n,k} - \mathbf{f}_{n,k-T_c}, \quad n \in \mathcal{G}_\imath \tag{44}$$

where $\mathbf{f}_{n,k}$ and $\mathbf{f}_{n,k-T_c}$ denote the Hilbert maps of the $n$th robot at the $k$th and $(k - T_c)$th time steps, respectively.

In the third step, the sum of the Hilbert map changes obtained from all robots in $\mathcal{G}_\imath$, defined as

$$\Delta\mathbf{f}_{\mathcal{G}_\imath,k} \triangleq \sum_{n \in \mathcal{G}_\imath}\Delta\mathbf{f}_{n,k} \tag{45}$$

is computed by the TDC robot $a_{n_\imath^*}$ and communicated back to the other robots in $\mathcal{G}_\imath$. Finally, in the fourth step, all the robots update their own Hilbert maps by adding the received total Hilbert map change $\Delta\mathbf{f}_{\mathcal{G}_\imath,k}$ to their current Hilbert maps, such that

$$\mathbf{f}_{n,k}^{+} = \mathbf{f}_{n,k} + \Delta\mathbf{f}_{\mathcal{G}_\imath,k}, \quad n \in \mathcal{G}_\imath \tag{46}$$

where $\mathbf{f}_{n,k}^{+}$ represents the Hilbert map after the data fusion process.
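A schematic sketch of one fusion round is shown below, assuming an idealized synchronous network. The partition into communication networks is computed here with a simple connected-components pass over the pairwise-distance graph, which is one possible realization of step one rather than the protocol implementation used by the authors.

```python
import numpy as np

def communication_round(maps, prev_maps, positions, r_c):
    """One decentralized fusion round, Eqs. (44)-(46).
    maps, prev_maps: (N, Q) Hilbert maps at step k and at step k - T_c.
    positions: (N, 2) robot positions; r_c: communication range."""
    N = len(maps)
    # Step 1: partition the robots into communication networks.
    adj = np.linalg.norm(positions[:, None] - positions[None, :], axis=-1) <= r_c
    labels = np.arange(N)
    for _ in range(N):                              # label propagation
        labels = np.array([labels[adj[i]].min() for i in range(N)])
    fused = maps.copy()
    for g in np.unique(labels):
        members = np.where(labels == g)[0]
        # Steps 2-3: the members' map changes are summed at a TDC.   Eq. (45)
        delta_G = (maps[members] - prev_maps[members]).sum(axis=0)
        # Step 4: every member adds the total change to its own map.  Eq. (46)
        fused[members] += delta_G
    return fused
```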

5. Path-Planning Algorithms

In the previous sections, the Hilbert maps $\mathbf{f}_{l,n,k}$, $l = 1, \ldots, L$, $n = 1, \ldots, N$, and $k = 1, \ldots, T_f$, are approximated and updated by the robots. The corresponding binary probabilities $\mathbf{p}_{l,n,k}$ can be calculated efficiently as follows,

$$\mathbf{p}_{l,n,k} = \left[p_{l,n,k}(\mathbf{x}_1^c), \ldots, p_{l,n,k}(\mathbf{x}_Q^c)\right]^{\mathsf{T}} = \left[\frac{e^{f_{l,n,k}(\mathbf{x}_1^c)}}{1 + e^{f_{l,n,k}(\mathbf{x}_1^c)}}, \ldots, \frac{e^{f_{l,n,k}(\mathbf{x}_Q^c)}}{1 + e^{f_{l,n,k}(\mathbf{x}_Q^c)}}\right]^{\mathsf{T}} \tag{47}$$

and the entropy at each collocation point $\mathbf{x}_q^c \in \mathcal{X}^c$, $q = 1, \ldots, Q$, is obtained by

$$h_{l,n,k}(\mathbf{x}_q^c) = -p_{l,n,k}(\mathbf{x}_q^c)\log\left[p_{l,n,k}(\mathbf{x}_q^c)\right] - \left[1 - p_{l,n,k}(\mathbf{x}_q^c)\right]\log\left[1 - p_{l,n,k}(\mathbf{x}_q^c)\right] \tag{48}$$

Then, an entropy map $\mathbf{h}_{n,k}$ is obtained from the vector

$$\mathbf{h}_{n,k} = \left[h_{n,k}(\mathbf{x}_1^c), \ldots, h_{n,k}(\mathbf{x}_Q^c)\right]^{\mathsf{T}} \tag{49}$$

where $h_{n,k}(\mathbf{x}_q^c) = \sum_{l=1}^{L}h_{l,n,k}(\mathbf{x}_q^c)$ denotes the total entropy at the collocation point $\mathbf{x}_q^c$.

According to the cost function in (5), the objective of the $n$th robot is to minimize the uncertainty of the concentration distribution, which can be implemented by minimizing the entropy at all the collocation points, i.e.,

$$\tilde{J}_n(k) = \sum_{q=1}^{Q}h_{n,k}(\mathbf{x}_q^c) \tag{50}$$

Therefore, $\tilde{J}_n(T_f)$ can be treated as an approximation of the term $\int_{\mathcal{W}} H\!\left[C_n\!\left(\mathbf{x} \mid \mathcal{M}_n^{(T_f)}\right)\right] d\mathbf{x}$ in (5). In the rest of the paper, $\tilde{J}_n(T_f)$ is used as the approximation of the final cost in (5) unless otherwise stated. In other words, the $n$th robot should visit the areas around the collocation points $\mathbf{x}_q^c$ with the highest values of $h_{n,k}(\mathbf{x}_q^c)$ at the $k$th time step. Based on this idea, two entropy-based path-planning algorithms are proposed in the following subsections to control the robots such that the information value of the concentration measurements obtained in the ROI is optimized over time.
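The entropy map and the approximate cost in (47)-(50) reduce to a few array operations; the sketch below takes the $L$ Hilbert maps of one robot at one time step and returns both the entropy map and $\tilde{J}_n(k)$.

```python
import numpy as np

def entropy_map(f_maps, eps=1e-12):
    """f_maps: (L, Q) Hilbert-map evaluations f_{l,n,k}(x_q^c).
    Returns (h, J): the (Q,) entropy map of Eq. (49) and the cost of Eq. (50)."""
    p = np.clip(1.0 / (1.0 + np.exp(-f_maps)), eps, 1.0 - eps)   # Eq. (47)
    h_l = -p * np.log(p) - (1.0 - p) * np.log(1.0 - p)           # Eq. (48)
    h = h_l.sum(axis=0)                                          # Eq. (49)
    return h, h.sum()                                            # Eq. (50)

h, J_tilde = entropy_map(np.zeros((3, 4)))
print(J_tilde)   # 3 * 4 * ln(2) for completely uncertain maps
```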

5.1. Entropy-Based Artificial Potential Field

An information-driven approach is developed by planning the paths of the robots such that they move towards collocation points with higher entropy. The collocation points are treated as goal positions characterized by attractive artificial potential fields defined as

$$U^{att}(\mathbf{x}_n, \mathbf{x}_q^c) = -\frac{h_{n,k}(\mathbf{x}_q^c)}{\|\mathbf{x}_n - \mathbf{x}_q^c\|^2}, \quad q = 1, \ldots, Q \text{ and } n = 1, \ldots, N \tag{51}$$

where the superscript "$att$" indicates the attractive field. The corresponding attractive gradient is expressed as

$$\mathbf{g}^{att}(\mathbf{x}_n, \mathbf{x}_q^c) = \frac{\partial U^{att}(\mathbf{x}_n, \mathbf{x}_q^c)}{\partial \mathbf{x}_n} = \frac{2\,h_{n,k}(\mathbf{x}_q^c)}{\|\mathbf{x}_n - \mathbf{x}_q^c\|^4}\left(\mathbf{x}_n - \mathbf{x}_q^c\right), \quad q = 1, \ldots, Q \tag{52}$$
Similarly to classic artificial potential field methods, the repulsive potential functions $U^{rep}$ generated by the other robots are also considered, such that

$$U^{rep}(\mathbf{x}_n, \mathbf{x}_{n'}) = \begin{cases} \dfrac{1}{2}\left(\dfrac{1}{\|\mathbf{x}_n - \mathbf{x}_{n'}\|} - \dfrac{1}{r_{rep}}\right)^2, & \|\mathbf{x}_n - \mathbf{x}_{n'}\| \le r_{rep} \\ 0, & \|\mathbf{x}_n - \mathbf{x}_{n'}\| > r_{rep} \end{cases}, \quad 1 \le n' \ne n \le N \tag{53}$$

where $r_{rep}$ is the distance threshold that creates a repulsion effect on the robot. The repulsive gradient is expressed as

$$\mathbf{g}^{rep}(\mathbf{x}_n, \mathbf{x}_{n'}) = \begin{cases} -\left(\dfrac{1}{\|\mathbf{x}_n - \mathbf{x}_{n'}\|} - \dfrac{1}{r_{rep}}\right)\dfrac{\mathbf{x}_n - \mathbf{x}_{n'}}{\|\mathbf{x}_n - \mathbf{x}_{n'}\|^3}, & \|\mathbf{x}_n - \mathbf{x}_{n'}\| \le r_{rep} \\ \mathbf{0}, & \|\mathbf{x}_n - \mathbf{x}_{n'}\| > r_{rep} \end{cases}, \quad 1 \le n' \ne n \le N \tag{54}$$

Using (52) and (54), the total potential gradient for the $n$th robot is expressed as

$$\mathbf{g}^{tol}(\mathbf{x}_n) = \xi\sum_{q=1}^{Q}\mathbf{g}^{att}(\mathbf{x}_n, \mathbf{x}_q^c) + \eta\sum_{1 \le n' \ne n \le N}\mathbf{g}^{rep}(\mathbf{x}_n, \mathbf{x}_{n'}) \tag{55}$$

where $\xi$ and $\eta$ are user-defined coefficients.
Based on the total gradient $\mathbf{g}^{tol}(\mathbf{x}_n)$, the $n$th robot can be controlled to descend the total potential, visiting the collocation points with higher uncertainty while avoiding collisions with the other robots. Because the algorithm is driven by entropy-based attraction, it is referred to as the entropy-based artificial potential field (EAPF) algorithm.
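A sketch of one EAPF control step based on (51)-(55) is given below; the robot is assumed to take a bounded step along the negative total gradient (descending the potential), with the step size capped at the 2 m/min speed limit used in Section 6. The gain and threshold values are placeholders.

```python
import numpy as np

def eapf_step(x_n, others, Xc, h, xi=1.0, eta=1.0, r_rep=2.0, v_max=2.0):
    """One EAPF step for a robot at x_n (2,), given the other robots' positions
    others (N-1, 2), collocation points Xc (Q, 2), and entropy map h (Q,)."""
    d = x_n - Xc                                         # x_n - x_q^c
    dist2 = np.maximum((d ** 2).sum(-1), 1e-9)
    g = xi * ((2.0 * h / dist2**2)[:, None] * d).sum(0)  # Eq. (52), attraction
    for x_o in others:                                   # Eq. (54), repulsion
        diff = x_n - x_o
        rho = np.linalg.norm(diff)
        if 1e-9 < rho <= r_rep:
            g += eta * (-(1.0 / rho - 1.0 / r_rep) * diff / rho**3)
    step = -g                                            # descend the total potential
    speed = np.linalg.norm(step)
    if speed > v_max:
        step *= v_max / speed                            # respect the speed limit
    return x_n + step
```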

5.2. Entropy-Based Particle Swarm Optimization

The particle swarm optimization (PSO) algorithm proposed by Clerc and Kennedy in [33] and its variants use a "constriction coefficient" to prevent the "explosion behavior" of the particles, and they have been successfully applied to GDM and gas source localization problems [12,13]. In the original PSO algorithm and its variants, the concentration measurements are used to update the robot controls. Considering the different objective function in (5), an entropy-based PSO (EPSO) algorithm is proposed here.
At the $k$th time step, the update of the $n$th robot position $\mathbf{x}_{n,k}$ can be described as

$$\begin{aligned}
\boldsymbol{\nu}_{n,k} &= \chi\,\boldsymbol{\nu}_{n,k-1} + \psi_1\,\mathbf{g}^{att}\!\left(\mathbf{x}_{n,k}, \mathbf{b}_{n,k}^{neig}\right) + \psi_2\,\mathbf{g}^{att}\!\left(\mathbf{x}_{n,k}, \mathbf{b}_{n,k}^{glob}\right) + \eta\sum_{1 \le n' \ne n \le N}\mathbf{g}^{rep}\!\left(\mathbf{x}_{n,k}, \mathbf{x}_{n',k}\right) \\
\mathbf{x}_{n,k+1} &= \mathbf{x}_{n,k} + \boldsymbol{\nu}_{n,k}
\end{aligned} \tag{56}$$

where $\boldsymbol{\nu}_{n,k}$ represents the velocity of the $n$th robot at the $k$th time step, and $\mathbf{b}_{n,k}^{neig}$ and $\mathbf{b}_{n,k}^{glob}$ are the best neighboring and global collocation points, respectively. The best neighboring collocation point, $\mathbf{b}_{n,k}^{neig}$, is determined by

$$\mathbf{b}_{n,k}^{neig} = \underset{\mathbf{x} \in \mathcal{X}^c,\ \|\mathbf{x}_{n,k} - \mathbf{x}\| \le r_{neig}}{\arg\max}\ h_{n,k}(\mathbf{x}) \tag{57}$$

where $r_{neig}$ is a coefficient that specifies the neighborhood around $\mathbf{x}_{n,k}$. The best global collocation point, $\mathbf{b}_{n,k}^{glob}$, is determined by

$$\mathbf{b}_{n,k}^{glob} = \underset{\mathbf{x} \in \mathcal{X}^c}{\arg\max}\ h_{n,k}(\mathbf{x}) \tag{58}$$

The learning coefficients $\psi_1 \in [0, \bar{\psi}_1]$ and $\psi_2 \in [0, \bar{\psi}_2]$ are two uniform random variables, and the constant constriction parameter $\chi > 0$ prevents the explosion behavior. For efficient performance and prevention of the explosion behavior in (56), the parameter settings of the learning coefficients proposed in [12,13,33] are applied. The constriction parameter $\chi$ is calculated by (refer to [33])

$$\chi = \begin{cases} \dfrac{2\kappa}{\left|\psi - 2 + \sqrt{\psi^2 - 4\psi}\right|}, & \text{if } \psi > 4 \\ \kappa, & \text{otherwise} \end{cases} \tag{59}$$

where $\psi = \bar{\psi}_1 + \bar{\psi}_2$ and $\kappa \in [0, 1]$.
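A sketch of one EPSO update per (56)-(59) is given below. For simplicity, the attraction toward the best neighboring and global collocation points is implemented as a displacement toward those points (a common PSO convention), while the paper expresses these terms through the potential-field gradients of Section 5.1; the repulsion and speed limit mirror the EAPF sketch, and the parameter values are placeholders.

```python
import numpy as np

def constriction(psi_bar_1, psi_bar_2, kappa=1.0):
    """Clerc-Kennedy constriction coefficient, Eq. (59)."""
    psi = psi_bar_1 + psi_bar_2
    if psi > 4.0:
        return 2.0 * kappa / abs(psi - 2.0 + np.sqrt(psi**2 - 4.0 * psi))
    return kappa

def epso_step(x, v, others, Xc, h, r_neig=5.0, psi_bar=(2.05, 2.05),
              eta=1.0, r_rep=2.0, v_max=2.0, rng=None):
    """One EPSO velocity/position update, Eq. (56)."""
    rng = np.random.default_rng() if rng is None else rng
    chi = constriction(*psi_bar)
    dist = np.linalg.norm(Xc - x, axis=1)
    near = dist <= r_neig
    b_neig = Xc[near][h[near].argmax()] if near.any() else Xc[h.argmax()]  # Eq. (57)
    b_glob = Xc[h.argmax()]                                                # Eq. (58)
    psi1, psi2 = rng.uniform(0, psi_bar[0]), rng.uniform(0, psi_bar[1])
    v_new = chi * v + psi1 * (b_neig - x) + psi2 * (b_glob - x)
    for x_o in others:                                    # repulsion from other robots
        diff, rho = x - x_o, np.linalg.norm(x - x_o)
        if 1e-9 < rho <= r_rep:
            v_new += eta * (1.0 / rho - 1.0 / r_rep) * diff / rho**3
    speed = np.linalg.norm(v_new)
    if speed > v_max:
        v_new *= v_max / speed
    return x + v_new, v_new
```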

6. Simulations and Results

The performance of the decentralized GDM methods presented in this paper is demonstrated on a gas sensing application in which information about the gas concentration obtained by a large network of robots is fused and used for information-driven path planning in a decentralized fashion. Simulated indoor and outdoor environments are used to test the proposed methods. The new entropy-based EAPF and EPSO path-planning algorithms are compared to the existing algorithms known as random walk (RW) and classical particle swarm optimization (CPSO) [12,13,33].

6.1. Indoor GDM Sensing

The decentralized sensing system consists of a network of $N = 100$ robots characterized by single integrator dynamics,

$$\dot{\mathbf{x}}_i = \mathbf{u}_i, \quad i = 1, \ldots, N \tag{60}$$

where $\mathbf{x}_i = [x, y]^{\mathsf{T}}$ is the robot state vector, $x, y$ are the inertial coordinates, and $\mathbf{u}_i = [u_1, u_2]^{\mathsf{T}}$ is the robot control vector comprised of the $x$- and $y$-velocity components. The robot state/position is assumed to be observable by a built-in GPS with zero-mean measurement noise $\mathbf{w}$, where $\mathbf{w}$ is Gaussian white noise $\mathcal{N}(\mathbf{0}, \boldsymbol{\Sigma})$ with $\boldsymbol{\Sigma} = 0.05 \times \mathbf{I}_2$. The distributed sensing system is tasked with mapping an unknown gas distribution (shown in Figure 1) in an ROI $\mathcal{W} = [0, L_x] \times [0, L_y]$, where $L_x = 200$ m and $L_y = 160$ m, over a fixed time interval $[0, T_f]$ with $T_f = 500$ min. The normalized gas concentration range $\mathcal{R} = [0, 100]$ (Figure 1) is divided into $L = 3$ intervals: $\mathcal{R}_1 = [0, 30)$, $\mathcal{R}_2 = [30, 70)$, and $\mathcal{R}_3 = [70, 100]$, representing low-hazard, medium-hazard, and high-hazard concentrations, respectively.

The robots are initially deployed near the four corners of the ROI by sampling from a Gaussian mixture model with four components, where $\boldsymbol{\mu}_1 = [20, 20]^{\mathsf{T}}$, $\boldsymbol{\mu}_2 = [20, 140]^{\mathsf{T}}$, $\boldsymbol{\mu}_3 = [180, 20]^{\mathsf{T}}$, and $\boldsymbol{\mu}_4 = [180, 140]^{\mathsf{T}}$, and all components have the identical covariance matrix $\boldsymbol{\Sigma} = \mathrm{diag}(10, 10)$. The initial robot positions are shown as red points in Figure 1. Each robot is equipped with a small metal oxide sensor array comprised of $M = 5 \times 5 = 25$ gas sensors, with a spatial interval of 10 cm between adjacent sensors. The FOV of each sensor array covers an area of $40 \times 40$ cm$^2$ in the ROI, which is very small relative to the whole ROI. The measurement noise of the gas concentration, $V(\mathbf{x})$, is also assumed to be Gaussian white noise, $\mathcal{N}(\mathbf{0}, \boldsymbol{\Sigma}_c(\mathbf{x}))$, with covariance matrix $\boldsymbol{\Sigma}_c(\mathbf{x}) = 0.5 \times \mathbf{I}_2$ everywhere in $\mathcal{W}$. All robots communicate synchronously with a communication period of $T_c = 10$ min and a communication range of $r_c = 30$ m. In addition, $Q = 100 \times 100$ virtual collocation points are evenly deployed in the ROI to generate the Hilbert maps, where a Gaussian kernel with kernel size $\sigma = 2$ m is used for Hilbert function learning, and the parameter $\tau$ is set to 3.
The EPSO and CPSO neighbor range coefficient, $r_{neig} = 5$ m, is used to determine the best neighboring collocation points. The maximum velocity of each robot is 2 m/min in all simulations. The four path-planning algorithms, referred to as EAPF, EPSO, CPSO, and RW, are tested for mapping the gas distribution. The approximated cost function $\tilde{J}_n(k)$ defined in (50) is calculated by each robot at every time step, as shown in Figure 2, where the solid line represents the mean of the approximated cost values over all the robots at the $k$th time step, and the dashed lines indicate one standard deviation above and below the mean. These cost histories reflect how effective the robots are at mapping the gas distribution: a lower value of $\tilde{J}_n(k)$ indicates that more information about the gas distribution has been obtained by the robot at the $k$th step. It can be observed that the EAPF and EPSO algorithms achieve significantly better performance than the others. They map the gas distribution more rapidly and completely, while the CPSO and RW algorithms perform poorly because they do not use prior measurement data for planning.

The means and standard deviations of the approximated cost function values of all the robots at the final time $k = T_f = 500$ are tabulated in Table 1 for all the algorithms. It can be seen that the EPSO algorithm achieves the lowest approximated cost in the indoor environment. Let $n^*$ denote the index of the robot that attains the lowest value of $\tilde{J}_n(T_f)$ at the final step, so that $\tilde{J}_{n^*}(k)$ represents the best performance among all the robots. The estimates of the gas concentration in the ROI are calculated from the learned Hilbert maps according to (35), where $\bar{\mathbf{c}} = [15, 50, 85]^{\mathsf{T}}$ is calculated from the cutoff coefficients. For each path-planning algorithm, three snapshots of the estimated GDM generated by the $n^*$th robot are presented in Figure 3, taken at the $k = 20$th, $k = 100$th, and $k = 500$th time steps, respectively. It can be seen that the robots controlled by the EAPF and EPSO algorithms obtain accurate GDMs by the final time step, while the robots controlled by the two existing algorithms cannot complete the mapping task in the given time period.

The normalized mean square error (NMSE) between the estimated gas distribution and the actual gas distribution is calculated for each robot. The means and corresponding standard deviations of the NMSE over all the robots for the different planning algorithms are reported in Table 2, which shows that the EPSO algorithm outperforms the other algorithms in estimating the GDMs.

6.2. Outdoor GDM

To verify the robustness and versatility of the proposed approaches, a gas distribution originally presented in [34] and shown in Figure 4 is used to represent the gas distribution in an outdoor environment. The gas concentrations are normalized to the range $\mathcal{R} = [0, 100]$ for comparison, as in the indoor simulations. In this case, the intervals are chosen as $\mathcal{R}_1 = [0, 60)$, $\mathcal{R}_2 = [60, 80)$, and $\mathcal{R}_3 = [80, 100]$ to capture the plume shapes. All other parameters, including the initial robot positions, are the same as those used in the previous subsection.

The approximated cost function $\tilde{J}_n(k)$ obtained by all four algorithms is plotted in Figure 5, and the approximated cost values at the final step are tabulated in Table 3. Similarly to the indoor simulations, the gas concentration is estimated according to (35), where $\bar{\mathbf{c}} = [30, 70, 90]^{\mathsf{T}}$ is calculated from the cutoff coefficients. The evolution of the estimated GDM obtained by the different algorithms is presented in Figure 6. Furthermore, the statistical results of the NMSE between the estimated GDMs and the actual GDM are reported in Table 4.
As expected, the results presented in Figure 5 and Figure 6 and Table 3 and Table 4 all show that the proposed EAPF and EPSO algorithms work well in the outdoor GDM problem, while the CPSO and RW algorithms cannot map the gas concentration in the entire workspace in the given time period. In addition, the EPSO algorithm significantly outperforms the other algorithms in all simulations.

7. Conclusions

This paper presents a decentralized framework for GDM and information-driven path planning in distributed sensing systems. GDM is performed using a probabilistic representation known as a Hilbert map, and a novel Hilbert map fusion method is presented that quickly and efficiently combines information from many neighboring robots. In addition, two entropy-based path-planning algorithms, namely the EAPF and EPSO algorithms, are proposed to efficiently control all the robots to obtain gas concentration measurements in the ROI. The proposed approaches are demonstrated on simulated systems of one hundred robots that must map a gas distribution collaboratively over a large ROI using on-board metal oxide sensor arrays and no prior information. The results show that, through fusion and decentralized processing, the entropy of the gas map decreases over time, the robot paths remain safe (avoiding mutual collisions), and the entropy-based methods far outperform both traditional and random approaches.

Author Contributions

Conceptualization, P.Z., J.M. and S.F.; Formal analysis, P.Z. and J.M.; Funding acquisition, S.F.; Methodology, P.Z.; Software, P.Z.; Project administration, S.F.; Validation, B.D., R.L. and S.F.; Writing—original draft, J.M. and P.Z.; Writing—review and editing, all authors.

Funding

This research was partially funded by National Science Foundation grant ECCS-1556900 and the Office of Naval Research, Code 321.

Conflicts of Interest

The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Appendix A

Appendix A.1. Proof of Theorem 1

Given two Hilbert functions $f_1(\mathbf{x})$ and $f_2(\mathbf{x})$, the conditional probabilities that the concentration at the position $\mathbf{x}$ belongs to the range $\mathcal{R}$ are expressed as

$$p_1(\mathbf{x} \mid \mathcal{M}_1) = P\big(C(\mathbf{x}) \in \mathcal{R} \mid \mathcal{M}_1\big) \tag{A1}$$

$$p_2(\mathbf{x} \mid \mathcal{M}_2) = P\big(C(\mathbf{x}) \in \mathcal{R} \mid \mathcal{M}_2\big) \tag{A2}$$

For convenience, the event $C(\mathbf{x}) \in \mathcal{R}$ and its complement event $C(\mathbf{x}) \notin \mathcal{R}$ are denoted by $e$ and $e^C$, respectively. Assuming that the measurement data sets $\mathcal{M}_1$ and $\mathcal{M}_2$ are conditionally independent given each event, the fused conditional probability $p_F(\mathbf{x} \mid \mathcal{M}_1, \mathcal{M}_2)$ can be expressed as

$$\begin{aligned}
p_F(\mathbf{x} \mid \mathcal{M}_1, \mathcal{M}_2) &= P(e \mid \mathcal{M}_1, \mathcal{M}_2) = \frac{P(e, \mathcal{M}_1, \mathcal{M}_2)}{P(\mathcal{M}_1, \mathcal{M}_2)} \\
&= \frac{P(e)P(\mathcal{M}_1 \mid e)P(\mathcal{M}_2 \mid e)}{P(e)P(\mathcal{M}_1 \mid e)P(\mathcal{M}_2 \mid e) + P(e^C)P(\mathcal{M}_1 \mid e^C)P(\mathcal{M}_2 \mid e^C)} \\
&= \frac{P(e)P(\mathcal{M}_1 \mid e)\,P(e)P(\mathcal{M}_2 \mid e)}{P(e)^2 P(\mathcal{M}_1 \mid e)P(\mathcal{M}_2 \mid e) + \varepsilon\, P(e^C)^2 P(\mathcal{M}_1 \mid e^C)P(\mathcal{M}_2 \mid e^C)} \\
&= \frac{P(e, \mathcal{M}_1)P(e, \mathcal{M}_2)}{P(\mathcal{M}_1)P(e \mid \mathcal{M}_1)P(\mathcal{M}_2)P(e \mid \mathcal{M}_2) + \varepsilon\, P(\mathcal{M}_1)P(e^C \mid \mathcal{M}_1)P(\mathcal{M}_2)P(e^C \mid \mathcal{M}_2)} \\
&= \frac{P(e \mid \mathcal{M}_1)P(e \mid \mathcal{M}_2)}{P(e \mid \mathcal{M}_1)P(e \mid \mathcal{M}_2) + \varepsilon\, P(e^C \mid \mathcal{M}_1)P(e^C \mid \mathcal{M}_2)} \\
&= \frac{p_1(\mathbf{x} \mid \mathcal{M}_1)\,p_2(\mathbf{x} \mid \mathcal{M}_2)}{p_1(\mathbf{x} \mid \mathcal{M}_1)\,p_2(\mathbf{x} \mid \mathcal{M}_2) + \varepsilon\left[1 - p_1(\mathbf{x} \mid \mathcal{M}_1)\right]\left[1 - p_2(\mathbf{x} \mid \mathcal{M}_2)\right]} \\
&= \frac{\frac{e^{f_1(\mathbf{x})}}{1 + e^{f_1(\mathbf{x})}}\frac{e^{f_2(\mathbf{x})}}{1 + e^{f_2(\mathbf{x})}}}{\frac{e^{f_1(\mathbf{x})}}{1 + e^{f_1(\mathbf{x})}}\frac{e^{f_2(\mathbf{x})}}{1 + e^{f_2(\mathbf{x})}} + \varepsilon\frac{1}{1 + e^{f_1(\mathbf{x})}}\frac{1}{1 + e^{f_2(\mathbf{x})}}} \\
&= \frac{e^{f_1(\mathbf{x}) + f_2(\mathbf{x})}}{e^{f_1(\mathbf{x}) + f_2(\mathbf{x})} + \varepsilon} = \frac{e^{f_1(\mathbf{x}) + f_2(\mathbf{x}) - \ln\varepsilon}}{e^{f_1(\mathbf{x}) + f_2(\mathbf{x}) - \ln\varepsilon} + 1} = \frac{e^{f_F(\mathbf{x})}}{e^{f_F(\mathbf{x})} + 1}
\end{aligned} \tag{A3}$$

where

$$f_F(\mathbf{x}) = f_1(\mathbf{x}) + f_2(\mathbf{x}) - \ln\varepsilon \tag{A4}$$

 □

References

  1. Neumann, P.P.; Asadi, S.; Lilienthal, A.J.; Bartholmai, M.; Schiller, J.H. Autonomous gas-sensitive microdrone: Wind vector estimation and gas distribution mapping. IEEE Robot. Autom. Mag. 2012, 19, 50–61. [Google Scholar] [CrossRef]
  2. Rossi, M.; Brunelli, D. Autonomous gas detection and mapping with unmanned aerial vehicles. IEEE Trans. Instrum. Meas. 2016, 65, 765–775. [Google Scholar] [CrossRef]
  3. Lilienthal, A.; Loutfi, A.; Blanco, J.L.; Galindo, C.; Gonzalez, J. Integrating SLAM into gas distribution mapping. In Proceedings of the ICRA Workshop on Robotic Olfaction–Towards Real Applications, Freiburg, Germany, 19–21 September 2007. [Google Scholar]
  4. Bayat, B.; Crasta, N.; Crespi, A.; Pascoal, A.M.; Ijspeert, A. Environmental monitoring using autonomous vehicles: A survey of recent searching techniques. Curr. Opin. Biotechnol. 2017, 45, 76–84. [Google Scholar] [CrossRef]
  5. Jelicic, V.; Magno, M.; Brunelli, D.; Paci, G.; Benini, L. Context-adaptive multimodal wireless sensor network for energy-efficient gas monitoring. IEEE Sens. J. 2013, 13, 328–338. [Google Scholar] [CrossRef]
  6. Ishida, H.; Nakamoto, T.; Moriizumi, T. Remote sensing of gas/odor source location and concentration distribution using mobile system. Sens. Actuators B Chem. 1998, 49, 52–57. [Google Scholar] [CrossRef]
  7. Lilienthal, A.; Duckett, T. Building gas concentration gridmaps with a mobile robot. Robot. Auton. Syst. 2004, 48, 3–16. [Google Scholar] [CrossRef]
  8. Stachniss, C.; Plagemann, C.; Lilienthal, A.J.; Burgard, W. Gas distribution modeling using sparse Gaussian process mixture models. In Proceedings of the Robotics: Science and Systems Conference 2008, Zürich, Switzerland, 25–28 June 2008; pp. 310–317. [Google Scholar]
  9. Albertson, J.D.; Harvey, T.; Foderaro, G.; Zhu, P.; Zhou, X.; Ferrari, S.; Amin, M.S.; Modrak, M.; Brantley, H.; Thoma, E.D. A mobile sensing approach for regional surveillance of fugitive methane emissions in oil and gas production. Environ. Sci. Technol. 2016, 50, 2487–2497. [Google Scholar] [CrossRef]
  10. Hayes, A.T.; Martinoli, A.; Goodman, R.M. Distributed odor source localization. IEEE Sens. J. 2002, 2, 260–271. [Google Scholar] [CrossRef]
  11. Jatmiko, W.; Ikemoto, Y.; Matsuno, T.; Fukuda, T.; Sekiyama, K. Distributed odor source localization in dynamic environment. In Proceedings of the 2005 IEEE Sensors, Irvine, CA, USA, 30 October–3 November 2005. [Google Scholar] [CrossRef]
  12. Akat, S.B.; Gazi, V.; Marques, L. Asynchronous particle swarm optimization-based search with a multi-robot system: Simulation and implementation on a real robotic system. Turk. J. Electr. Eng. Comput. Sci. 2010, 18, 749–764. [Google Scholar]
  13. Turduev, M.; Cabrita, G.; Kırtay, M.; Gazi, V.; Marques, L. Experimental studies on chemical concentration map building by a multi-robot system using bio-inspired algorithms. Auton. Agents Multi-Agent Syst. 2014, 28, 72–100. [Google Scholar] [CrossRef]
  14. Sinha, A.; Kaur, R.; Kumar, R.; Bhondekar, A.P. Cooperative control of multi-agent systems to locate source of an odor. arXiv, 2017; arXiv:1711.03819. [Google Scholar]
  15. Rudd, K.; Foderaro, G.; Ferrari, S. A generalized reduced gradient method for the optimal control of multiscale dynamical systems. In Proceedings of the 52nd IEEE Conference on Decision and Control, Florence, Italy, 10–13 December 2013; pp. 3857–3863. [Google Scholar]
  16. Foderaro, G.; Ferrari, S.; Wettergren, T.A. Distributed optimal control for multi-agent trajectory optimization. Automatica 2014, 50, 149–154. [Google Scholar] [CrossRef]
  17. Ferrari, S.; Foderaro, G.; Zhu, P.; Wettergren, T.A. Distributed optimal control of multiscale dynamical systems: A tutorial. IEEE Control Syst. Mag. 2016, 36, 102–116. [Google Scholar]
  18. Rudd, K.; Foderaro, G.; Zhu, P.; Ferrari, S. A Generalized Reduced Gradient Method for the Optimal Control of Very-Large-Scale Robotic Systems. IEEE Trans. Robot. 2017, 33, 1226–1232. [Google Scholar] [CrossRef]
  19. Foderaro, G.; Zhu, P.; Wei, H.; Wettergren, T.A.; Ferrari, S. Distributed optimal control of sensor networks for dynamic target tracking. IEEE Trans. Control Netw. Syst. 2018, 5, 142–153. [Google Scholar] [CrossRef]
  20. Doerr, B.; Linares, R.; Zhu, P.; Ferrari, S. Random Finite Set Theory and Optimal Control for Large Spacecraft Swarms. arXiv, 2018; arXiv:1810.00696. [Google Scholar]
  21. Jiménez, A.; García-Díaz, V.; Bolaños, S. A decentralized framework for multi-agent robotic systems. Sensors 2018, 18, 417. [Google Scholar] [CrossRef]
  22. Lilienthal, A.; Duckett, T. Creating gas concentration gridmaps with a mobile robot. In Proceedings of the 2003 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2003) (Cat. No. 03CH37453), Las Vegas, NV, USA, 27–31 October 2003; Volume 1, pp. 118–123. [Google Scholar]
  23. Moravec, H.; Elfes, A. High resolution maps from wide angle sonar. In Proceedings of the 1985 IEEE International Conference on Robotics and Automation, St. Louis, MO, USA, 25–28 March 1985; Volume 2, pp. 116–121. [Google Scholar]
  24. Rosenblatt, M. Remarks on some nonparametric estimates of a density function. Ann. Math. Stat. 1956, 27, 832–837. [Google Scholar] [CrossRef]
  25. Parzen, E. On estimation of a probability density function and mode. Ann. Math. Stat. 1962, 33, 1065–1076. [Google Scholar] [CrossRef]
  26. Williams, C.K.; Rasmussen, C.E. Gaussian Processes for Machine Learning; MIT Press: Cambridge, MA, USA, 2006; Volume 2. [Google Scholar]
  27. Ramos, F.; Ott, L. Hilbert maps: Scalable continuous occupancy mapping with stochastic gradient descent. Int. J. Robot. Res. 2015, 35, 1717–1730. [Google Scholar] [CrossRef]
  28. Zhu, P.; Chen, B.; Príncipe, J.C. Extended Kalman filter using a kernel recursive least squares observer. In Proceedings of the 2011 International Joint Conference on Neural Networks, San Jose, CA, USA, 31 July–5 August 2011; pp. 1402–1408. [Google Scholar]
  29. Zhu, P.; Chen, B.; Príncipe, J.C. A novel extended kernel recursive least squares algorithm. Neural Netw. 2012, 32, 349–357. [Google Scholar] [CrossRef] [PubMed]
  30. Zhu, P. Kalman Filtering in Reproducing Kernel Hilbert Spaces. Ph.D. Thesis, University of Florida, Gainesville, FL, USA, 2013. [Google Scholar]
  31. Zhu, P.; Chen, B.; Príncipe, J. Learning nonlinear generative models of time series with a Kalman filter in RKHS. IEEE Trans. Signal Process. 2014, 62, 141–155. [Google Scholar] [CrossRef]
  32. Zhu, P.; Wei, H.; Lu, W.; Ferrari, S. Multi-kernel probability distribution regressions. In Proceedings of the 2015 International Joint Conference on Neural Networks (IJCNN), Killarney, Ireland, 12–17 July 2015; pp. 1–7. [Google Scholar]
  33. Clerc, M.; Kennedy, J. The particle swarm-explosion, stability, and convergence in a multidimensional complex space. IEEE Trans. Evolut. Comput. 2002, 6, 58–73. [Google Scholar] [CrossRef]
  34. Gemerek, J.R.; Ferrari, S.; Albertson, J.D. Fugitive gas emission rate estimation using multiple heterogeneous mobile sensors. In Proceedings of the 2017 ISOCS/IEEE International Symposium on Olfaction and Electronic Nose (ISOEN), Montreal, QC, Canada, 28–31 May 2017; pp. 1–3. [Google Scholar]
Figure 1. Gas concentration distribution in an indoor environment and initial robot deployment in the ROI (shown as red points).
Figure 2. Approximated cost functions for different path-planning algorithms in the indoor environment, where the solid lines represent the mean of the approximated cost values over all the robots at the kth time step and the dashed lines indicate one standard deviation above and below the solid lines.
Figure 3. Evolution of the estimated gas distribution map in the indoor environment generated by the $n^*$th robot in each simulation at three instants in time, where the red point indicates the position of the $n^*$th robot and the white points indicate the other robots. The sub-figures in the first row (a–c) show the evolution of the estimated gas distribution obtained by the EAPF algorithm; the second row (d–f), by the EPSO algorithm; the third row (g–i), by the CPSO algorithm; and the fourth row (j–l), by the RW algorithm.
Figure 4. Gas concentration distribution in an outdoor environment and initial robot deployment in the ROI, where the red points represent the robots.
Figure 5. Approximated cost functions for different path-planning algorithms in the outdoor environment, where the solid lines represent the mean of the approximated cost values over all the robots at the kth time step and the dashed lines indicate one standard deviation above and below the solid lines.
Figure 6. Evolution of the estimated gas distribution map in the outdoor environment generated by the $n^*$th robot in each simulation at three instants in time, where the red point indicates the position of the $n^*$th robot and the white points indicate the other robots. The sub-figures in the first row (a–c) show the evolution of the estimated gas distribution obtained by the EAPF algorithm; the second row (d–f), by the EPSO algorithm; the third row (g–i), by the CPSO algorithm; and the fourth row (j–l), by the RW algorithm.
Table 1. Statistical results of the approximated cost function at the final time step in the indoor environment.
Algorithm    Mean of $\tilde{J}_n(T_f)$    Std. of $\tilde{J}_n(T_f)$
EAPF         72.01                         66.64
EPSO         28.00                         336.56
CPSO         19650.11                      459.56
RW           15442.17                      555.88
Table 2. Statistical results of the NMSE between the estimated GDMs and the actual GDMs at the final time step in the indoor environment.
Algorithm    Mean of NMSE    Std. of NMSE
EAPF         0.17521         0.00934
EPSO         0.17022         0.00432
CPSO         1.72230         0.03225
RW           1.20700         0.12516
Table 3. Statistical results of the approximated cost function at the final time step in the outdoor environment.
Algorithm    Mean of $\tilde{J}_n(T_f)$    Std. of $\tilde{J}_n(T_f)$
EAPF         243.9842                      263.3285
EPSO         19.7781                       55.7657
CPSO         20232.4784                    147.2573
RW           16510.9489                    380.798
Table 4. Statistical results of the NMSE between the estimated GDMs and the actual GDM at the final time step in the outdoor environment.
Algorithm    Mean of NMSE    Std. of NMSE
EAPF         0.05368         0.00506
EPSO         0.04272         0.00123
CPSO         0.51741         0.00402
RW           0.41734         0.01094

Share and Cite

MDPI and ACS Style

Zhu, P.; Ferrari, S.; Morelli, J.; Linares, R.; Doerr, B. Scalable Gas Sensing, Mapping, and Path Planning via Decentralized Hilbert Maps. Sensors 2019, 19, 1524. https://doi.org/10.3390/s19071524

AMA Style

Zhu P, Ferrari S, Morelli J, Linares R, Doerr B. Scalable Gas Sensing, Mapping, and Path Planning via Decentralized Hilbert Maps. Sensors. 2019; 19(7):1524. https://doi.org/10.3390/s19071524

Chicago/Turabian Style

Zhu, Pingping, Silvia Ferrari, Julian Morelli, Richard Linares, and Bryce Doerr. 2019. "Scalable Gas Sensing, Mapping, and Path Planning via Decentralized Hilbert Maps" Sensors 19, no. 7: 1524. https://doi.org/10.3390/s19071524
