Sensors 2019, 19(1), 189; https://doi.org/10.3390/s19010189

Article
Multi-Type Sensor Placements in Gaussian Spatial Fields for Environmental Monitoring
Department of Electrical and Electronic Engineering, The University of Hong Kong, Hong Kong, China
* Correspondence: [email protected]; Tel.: +852-2857-8425
This paper is an extended version of Sun, C.; Yu, Y.; Li, V.O.; Lam, J.C. Optimal Multi-type Sensor Placements in Gaussian Spatial Fields for Environmental Monitoring. In Proceedings of the 4th International Conference on Smart Cities, Kansas City, MO, USA, 16–19 September 2018; pp. 420–429.
Received: 30 November 2018 / Accepted: 2 January 2019 / Published: 7 January 2019

Abstract
As citizens are increasingly concerned about the surrounding environment, it is important for modern cities to provide sufficient and accurate environmental information to the public for decision making in the era of smart cities. Because budgets are limited, we often need to optimize the sensor placement in order to maximize the overall information gain according to certain criteria. Existing work is primarily concerned with single-type sensor placement; however, environmental monitoring usually requires accurate measurements of multiple types of environmental characteristics. In this paper, we focus on optimal multi-type sensor placement in Gaussian spatial fields for environmental monitoring. We study two representative cases: the one-with-all case, in which each station is equipped with all types of sensors, and the general case, in which each station is equipped with at least one type of sensor. We propose two greedy algorithms accordingly, each with a provable approximation guarantee. We evaluate the proposed approach in an air quality monitoring scenario in Hong Kong, and the experimental results demonstrate its effectiveness.
Keywords:
multi-type sensor placement; submodular optimization; Gaussian process

1. Introduction

Environmental monitoring plays an essential role in the era of smart cities [1,2]. It not only provides sufficient information for citizens' decision making, such as whether the air quality is suitable for exercising outdoors, but also serves as the primary data source for many longitudinal environment and health studies that aim to better understand and assess the environment dynamics and their impact on public health over time [3,4,5]. However, deploying fixed-location sensors or monitoring stations that provide accurate and calibrated measurements is costly: the cost includes not only the sensor cost, which can be up to €30,000 per device [6], but also the operation cost and site construction cost [7]. Recent studies have shown that low-cost sensors are not yet ready for providing accurate measurements [6] despite their recent popularity. Therefore, there is usually a fixed budget for deploying the sensors and locations should be chosen carefully according to certain objectives, leading to the sensor placement problem.
The general sensor placement problem has been studied in many environmental monitoring applications, for example, temperature monitoring [8], water contamination [9], wind monitoring [10], and soil moisture [11]. In these previous works, the underlying spatial field, sometimes after a preprocessing step such as time series segmentation [10] or log-transformation [8], is modeled by a Gaussian Process (GP), which is a powerful probabilistic framework for modeling spatial phenomena and allows information-theoretic criteria such as minimum conditional entropy and maximum mutual information to be applied for finding the most informative locations.
Unfortunately, most of the existing work focuses on the single-type sensor placement problem under cardinality/budget constraints, while some complex environmental phenomena require measurements of multiple types of spatial fields simultaneously. A motivating scenario is monitoring the air quality of a region, which requires measurements of multiple types of atmospheric pollutants. Taking China as an example, six types of pollutants are measured [7], namely nitrogen dioxide ($\mathrm{NO_2}$), carbon monoxide ($\mathrm{CO}$), sulfur dioxide ($\mathrm{SO_2}$), fine particulate matter ($\mathrm{PM_{2.5}}$), respirable suspended particulates ($\mathrm{PM_{10}}$) and ground level ozone ($\mathrm{O_3}$). The air quality metric varies across countries. In China, these six pollutants are measured to calculate the Air Quality Index (AQI) level. In the United States, $\mathrm{PM_{2.5}}$ and $\mathrm{PM_{10}}$ are considered together as particulate matter. Since each field is likely to exhibit different spatial patterns, it is not cost effective to apply a single-type sensor placement strategy to each field separately, because a station that hosts only one type of sensor still incurs the full site construction cost (around 200k USD) and maintenance cost (around 30k USD per year) [12]. Figure 1 further illustrates the challenge of the multi-type sensor placement problem. Suppose that there are $T$ types of fields of interest in total. For each field, the five colored grids denote the optimal single-type sensor placement result (the five most informative locations). In particular, Location B is selected for all types of fields, while Location A is selected only for the second field. Therefore, when considering all the fields at the same time, selecting Location A for deploying a station may not be a good choice when the budget is limited, and a careful design of the placement scheme is required to balance the trade-off between information gain and cost.
Some studies have investigated the multi-type sensor placement problem. Singh et al. [13] studied the optimal placement of two types of sensors that differ in cost and coverage under a total budget constraint. Ohsaka and Yoshida [14] gave a formulation for the joint placement of $k$ types of sensors under individual set size constraints and a total size constraint. However, both studies assume that each location can only host one type of sensor, yet in reality different types of sensors can be integrated together as a modular sensor box [15] and monitoring stations are often equipped with more than one type of sensor [16]. Yuen and Kuok [17] proposed a Bayesian sequential multi-type sensor placement algorithm for structural health monitoring and Lin et al. [18] proposed a non-dominated sorting genetic algorithm for solving the optimal multi-type sensor placement for structural damage detection. However, both approaches are heuristic optimization methods with no approximation guarantee. Furthermore, they both require explicit bounds on the number of sensors of each type.
In terms of the sensor placement problem for air quality monitoring, Hsieh et al. [12] proposed an entropy-minimization model to recommend the best locations for establishing new monitoring stations. However, the placement result depends on the accuracy of their proposed inference model, and the model is only able to predict the AQI value of one type of pollutant (either $\mathrm{PM_{2.5}}$ or $\mathrm{PM_{10}}$) rather than the general AQI, which better reflects the air quality.
In this paper, we study the optimal budgeted multi-type sensor placement problem without the disjoint assumption. Our major contributions are as follows. Firstly, we formulate the optimal multi-type sensor placement problem for environmental monitoring under a general budget constraint. Next, we investigate two representative scenarios: the one-with-all case, which installs all types of sensors at each placement, and the general case, which only requires at least one type of sensor to be installed at each location. By exploiting the monotonicity and submodularity of the objective functions, we propose two greedy algorithms for the corresponding scenarios with provable approximation guarantees. We adapt the lazy evaluation approach to the two proposed greedy algorithms, which further reduces their running time without hurting the approximation guarantee. Finally, we perform a case study using the air quality measurements from 2017 from the official government stations of Hong Kong to demonstrate the proposed approach. This formulation provides guidance for city planners in designing multi-type sensor networks for environmental monitoring.
The rest of the paper is organized as follows. In Section 2, we review the Gaussian Process (GP) model and optimal design criteria for a single spatial field, and introduce the optimal multi-type sensor placement problem. In Section 3, we study the problem in the one-with-all case and the general case, and provide greedy algorithms with approximation guarantees. In Section 4, we provide the simulation results on the Hong Kong air quality monitoring data. We conclude in Section 5.

2. Problem Formulation

In this section, we review the relevant background, including the Gaussian Process and informative locations for a single spatial field, and formally formulate the optimal multi-type sensor placement problem. The major notations are summarized in Table 1.

2.1. Gaussian Process

To quantify the information gain of a placement, we adopt the Gaussian Process (GP) representation for the spatial fields. The Gaussian Process is a powerful framework for making probabilistic predictions of spatial phenomena [19]. Intuitively, it generalizes the multivariate Gaussian distribution to an infinite number of random variables such that the joint distribution over every finite subset of random variables follows a Gaussian distribution.
Each GP, denoted $\mathcal{GP}(m(\cdot), k(\cdot,\cdot))$, is fully specified by a mean function $m(\cdot)$ and a symmetric positive-definite covariance function (also known as a kernel) $k(\cdot,\cdot)$. An important property of a GP is that, for every finite subset $A$ of the index set $V$, the joint distribution over the random variables $X_A$ is Gaussian. For the random variable with index $u$, its mean $\mu_u$ is given by $m(u)$. For each pair of random variables with indexes $u, v$, their covariance $\sigma_{u,v}$ is given by $k(u,v)$. We denote the mean vector of the set of random variables $X_A$ by $\mu_A$ and their covariance matrix by $\Sigma_{AA}$.
Let $V := \{1, 2, \dots, n\}$ denote a finite set of indexes, each corresponding to a location (square grid) of the city region. Let $\mathcal{GP}(m_i(\cdot), k_i(\cdot,\cdot))$ denote the Gaussian Process for the $i$th spatial field and $T$ denote the number of types of spatial fields of interest. Let $[T] := \{1, 2, \dots, T\}$. Suppose that, for the $i$th spatial field, e.g., $\mathrm{NO_2}$, a set of observations $X_A = x_A$ corresponding to the finite subset $A \subseteq V$ can be obtained either through pre-deployments or mathematical simulations. We can then predict the value at any point $y \in V$. By definition, the distribution of $X_y$ given these observations is a Gaussian whose conditional mean $\mu_{y|A}$ and variance $\sigma^2_{y|A}$ are given by:
$$\mu_{y|A} = m_i(y) + \Sigma_{yA} \Sigma_{AA}^{-1} (x_A - \mu_A), \tag{1}$$
$$\sigma^2_{y|A} = k_i(y, y) - \Sigma_{yA} \Sigma_{AA}^{-1} \Sigma_{Ay}, \tag{2}$$
where $\Sigma_{yA}$ is a covariance vector with one entry for each $u \in A$ with value $k_i(y, u)$ and $\Sigma_{Ay} = \Sigma_{yA}^{\top}$.
To compute the predictive distribution above, we need to know the mean and covariance functions. The mean function can be estimated by regression. The covariance function, depending on the specific scenario, can be obtained either by learning the hyperparameters of some existing family of kernel functions [19], such as the Gaussian kernel $k_\theta(d) = \exp(-d^2/\theta^2)$ with hyperparameter $\theta$, or by learning complex nonstationary kernels from sensory data collected by pre-deployment [20].
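As a minimal illustration of Equations (1) and (2), the following Python sketch computes the conditional mean and variance at one location under the Gaussian kernel above. It is only a sketch under stated assumptions: locations are represented as coordinate tuples, the prior mean defaults to zero, and the helper names (`gaussian_kernel`, `solve`, `gp_predict`) are hypothetical and not part of the paper.

```python
import math

def gaussian_kernel(u, v, theta=1.0):
    """Gaussian kernel k_theta(d) = exp(-d^2 / theta^2), d = Euclidean distance."""
    d2 = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-d2 / theta ** 2)

def solve(M, b):
    """Solve M z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(M)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    z = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(aug[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (aug[r][n] - tail) / aug[r][r]
    return z

def gp_predict(y, locs, x_obs, mean=lambda u: 0.0, theta=1.0):
    """Conditional mean and variance of X_y given X_A = x_obs (Equations (1)-(2))."""
    S_AA = [[gaussian_kernel(u, v, theta) for v in locs] for u in locs]
    S_yA = [gaussian_kernel(y, u, theta) for u in locs]
    resid = [x - mean(u) for x, u in zip(x_obs, locs)]
    alpha = solve(S_AA, resid)   # Sigma_AA^{-1} (x_A - mu_A)
    beta = solve(S_AA, S_yA)     # Sigma_AA^{-1} Sigma_Ay
    mu = mean(y) + sum(a * b for a, b in zip(S_yA, alpha))
    var = gaussian_kernel(y, y, theta) - sum(a * b for a, b in zip(S_yA, beta))
    return mu, var

# Hypothetical observations at two grid locations.
locs, x_obs = [(0.0, 0.0), (1.0, 0.0)], [2.0, 3.0]
mu, var = gp_predict((0.0, 0.0), locs, x_obs)
```

Predicting at an already observed location returns the observed value with (numerically) zero variance, while a location far from all observations falls back to the prior mean and prior variance.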

2.2. Informative Locations for Single Spatial Field

There are two common criteria for deciding what a good design is for placing single-type sensors in a Gaussian Process: entropy [21] and mutual information [8]. The entropy criterion seeks to place sensors at the most uncertain locations so as to minimize the conditional entropy of the unobserved locations $V \setminus A$ after placing sensors at locations $A$. Specifically, if the budget allows for $k$ sensors in total, then we aim to find
$$A^* = \arg\min_{A \subseteq V : |A| = k} H(X_{V \setminus A} \mid X_A) \tag{3}$$
Here, $H(X_{V \setminus A} \mid X_A)$ is the conditional differential entropy given by:
$$H(X_{V \setminus A} \mid X_A) = -\int p(x_{V \setminus A}, x_A) \log p(x_{V \setminus A} \mid x_A) \, dx_{V \setminus A} \, dx_A \tag{4}$$
where $p(x_{V \setminus A}, x_A)$ is the joint probability density function. Since $H(X_{V \setminus A} \mid X_A) = H(X_V) - H(X_A)$, where $H(X_V)$ is invariant with respect to the choice of $A$, the optimization in Equation (3) is equivalent to
$$A^* = \arg\max_{A \subseteq V : |A| = k} H(X_A) \tag{5}$$
The mutual information criterion, on the other hand, seeks to place sensors at locations $A$ that most significantly reduce the uncertainty about the estimates in the rest of the space $V \setminus A$. Specifically, if the budget allows for $k$ sensors in total, then we aim to find
$$A^* = \arg\max_{A \subseteq V : |A| = k} I(X_{V \setminus A}; X_A) \tag{6}$$
where $I(X_{V \setminus A}; X_A)$ is the mutual information between the unknown locations $X_{V \setminus A}$ and the known locations $X_A$, which is given by
$$I(X_{V \setminus A}; X_A) = H(X_{V \setminus A}) - H(X_{V \setminus A} \mid X_A) \tag{7}$$
Finding the optimal solution of either optimization in Equations (5) and (6) has been shown to be NP-hard [8,22]. Fortunately, both objective functions are monotone and submodular (the mutual information is usually monotone and submodular under the assumption that $k \ll n$).
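For jointly Gaussian variables, both criteria can be evaluated in closed form from the covariance matrix. The identities below are standard properties of the multivariate Gaussian distribution rather than results specific to this paper; $\bar{A}$ is shorthand introduced here for $V \setminus A$, and $y \notin A$.

```latex
% Differential entropy of the Gaussian vector X_A with covariance \Sigma_{AA}
H(X_A) = \tfrac{1}{2} \log \det\!\left( 2 \pi e \, \Sigma_{AA} \right)

% Chain rule: marginal gain of adding location y under the entropy criterion,
% expressed through the conditional variance of Equation (2)
H(X_{A \cup \{y\}}) - H(X_A) = H(X_y \mid X_A)
    = \tfrac{1}{2} \log\!\left( 2 \pi e \, \sigma^2_{y|A} \right)

% Mutual information of Equation (7), using H(X_{\bar A} \mid X_A) = H(X_V) - H(X_A)
I(X_{\bar A}; X_A) = H(X_{\bar A}) + H(X_A) - H(X_V)
    = \tfrac{1}{2} \log \frac{\det \Sigma_{AA} \, \det \Sigma_{\bar A \bar A}}{\det \Sigma_{VV}}
```

The second identity shows why the entropy criterion favors the location with the largest conditional variance at each step.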
Definition 1
(Non-decreasing). A set function $f : 2^V \to \mathbb{R}$ is called non-decreasing if for all $A \subseteq B \subseteq V$, we have
$$f(B) \ge f(A)$$
Definition 2
(Submodularity). A set function $f : 2^V \to \mathbb{R}$ is called submodular if for all $A \subseteq B \subseteq V$ and $s \in V \setminus B$, we have
$$f(A \cup \{s\}) - f(A) \ge f(B \cup \{s\}) - f(B)$$
Submodularity is also known as the diminishing returns property. Intuitively, the more sensors that have been placed, the less information gain we obtain by deploying a new sensor. An equivalent definition is as follows. A set function $f : 2^V \to \mathbb{R}$ is called submodular if for all $A, B \subseteq V$, we have
$$f(A) + f(B) \ge f(A \cup B) + f(A \cap B)$$
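The two definitions can be checked by brute force on small ground sets. The sketch below is illustrative only: the coverage function and the squared-cardinality function are hypothetical toy objectives chosen here (not the entropy or mutual information of the paper), and the check is exponential in $|V|$.

```python
from itertools import combinations

def all_subsets(V):
    """Yield every subset of V as a frozenset."""
    items = list(V)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)

def is_monotone_submodular(f, V):
    """Brute-force check of Definitions 1 and 2 (toy sizes only)."""
    subsets = list(all_subsets(V))
    for A in subsets:
        for B in subsets:
            if not A <= B:
                continue
            if f(B) < f(A):  # violates the non-decreasing property
                return False
            for s in set(V) - B:
                # diminishing returns: the gain on the smaller set must not be smaller
                if f(A | {s}) - f(A) < f(B | {s}) - f(B):
                    return False
    return True

# Hypothetical toy data: each location "covers" a set of grid cells.
covers = {1: {"a", "b"}, 2: {"b", "c"}, 3: {"c"}}
coverage = lambda A: len(set().union(*(covers[s] for s in A)))
squared = lambda A: len(A) ** 2  # monotone but not submodular
```

Here `is_monotone_submodular(coverage, covers)` holds, while the squared-cardinality function fails the diminishing returns test because its marginal gains grow with the set size.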
The choice of the information criterion depends on the actual scenario. For now, we denote the general information gain of choosing the set of indexes $A \subseteq V$ for deploying a certain type of sensor (using either criterion) as $f(A)$. Then, both optimizations in Equations (5) and (6) can be written as the following submodular function maximization:
$$A^* = \arg\max_{A \subseteq V : |A| = K} f(A) \tag{8}$$
where $K$ is the subset size constraint (also known as the cardinality constraint), i.e., the total number of sensors we can place under the total budget constraint.
Despite the hardness of the optimization above, the monotone and submodular properties allow us to solve the problem via a simple greedy algorithm with a provable approximation guarantee. The algorithm starts with the empty set and, at each iteration, adds to the current set $A$ the index $s$ that maximizes the incremental information gain $f(A \cup \{s\}) - f(A)$. It stops once $|A| = K$, i.e., when adding another index would violate the subset size constraint.
Theorem 1
([23]). If the submodular set function $f$ is monotone and $f(\emptyset) = 0$, then the greedy algorithm finds a solution $A$ such that $f(A) \ge (1 - 1/e) f(A^*)$, where $A^*$ is the optimal solution, with at most $O(K|V|)$ evaluations of $f$.
It is easy to verify that $f(\emptyset) = 0$. Hence, by this theorem, the greedy algorithm gives an approximation ratio of $1 - 1/e \approx 0.632$ for the single-type sensor placement problem.
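The greedy procedure described above can be sketched in a few lines of Python. The objective `f` is passed in as an arbitrary set function; the coverage function used in the example is a hypothetical stand-in for the entropy or mutual information gain, chosen only because it is monotone and submodular with $f(\emptyset) = 0$.

```python
def greedy(f, V, K):
    """Greedy maximization of a monotone submodular f under |A| <= K."""
    A = set()
    while len(A) < K:
        candidates = [s for s in V if s not in A]
        if not candidates:
            break
        # pick the index with the largest incremental gain f(A + s) - f(A)
        best = max(candidates, key=lambda s: f(A | {s}) - f(A))
        A.add(best)
    return A

# Hypothetical toy objective: number of grid cells covered by the chosen locations.
covers = {1: {"a", "b", "c"}, 2: {"c", "d"}, 3: {"d", "e"}, 4: {"a"}}
f = lambda A: len(set().union(*(covers[s] for s in A)))

chosen = greedy(f, [1, 2, 3, 4], K=2)
```

In this toy instance the first pick is location 1 (gain 3) and the second is location 3 (gain 2), so `chosen` is `{1, 3}`. Each iteration evaluates `f` for every remaining candidate, which matches the $O(K|V|)$ evaluation count in Theorem 1.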

2.3. Optimal Multi-Type Sensor Placement

In many cases, we would like to place multiple types of sensors simultaneously for monitoring a complex spatial phenomenon given a fixed budget. A motivating scenario is monitoring the air quality of a region, which requires measurements of six types of atmospheric pollutants [7], namely nitrogen dioxide ($\mathrm{NO_2}$), carbon monoxide ($\mathrm{CO}$), sulfur dioxide ($\mathrm{SO_2}$), fine particulate matter ($\mathrm{PM_{2.5}}$), respirable suspended particulates ($\mathrm{PM_{10}}$) and ground level ozone ($\mathrm{O_3}$).
Figure 2 shows an example of the multi-type sensor placement scheme. The red rectangles denote the selected locations for deploying stations. For each station, one or multiple types of sensors are installed to monitor the spatial fields. Specifically, the optimal multi-type sensor placement problem aims to determine the best locations for deploying the stations and which types of sensors should be installed at each location in order to achieve certain objectives.
Let $f_i(A_i)$ denote the information gain of choosing the set of indexes (locations) $A_i \subseteq V$ for deploying the $i$th type of sensor. For simplicity of notation, let $\mathcal{A} := \{A_1, A_2, \dots, A_T\}$ denote the multi-type sensor placement scheme.
Now, we proceed to discuss the cost function for the multi-type sensor placement case. An important observation is that, when multiple types of sensors are placed together, the total cost is smaller than the sum of the individual costs because the site construction cost and site operation cost are shared (http://aqicn.org/products/monitoring-stations/).
Let $c_i$ denote the equipment cost for the $i$th type, which includes both the initial cost and the sensor-specific operation cost (e.g., calibration cost, sampling cost, etc.). Then the total equipment cost $c_e(\mathcal{A})$ can be expressed as
$$c_e(\mathcal{A}) = c_1 \cdot |A_1| + c_2 \cdot |A_2| + \dots + c_T \cdot |A_T| \tag{9}$$
Let $c_s(\mathcal{A})$ denote the site cost, which includes the construction cost and site operation cost. If we assume that the site cost does not vary with the location, i.e., it equals $c_{site}$ for each station, $c_s(\mathcal{A})$ can be expressed as
$$c_s(\mathcal{A}) = c_{site} \cdot |A_1 \cup A_2 \cup \dots \cup A_T| \tag{10}$$
Then, the cost function for $\mathcal{A}$ can be expressed as
$$c(\mathcal{A}) = c_s(\mathcal{A}) + c_e(\mathcal{A}) \tag{11}$$
and we aim to find
$$\mathcal{A}^* = \arg\max_{c(\mathcal{A}) \le B} \left( f_1(A_1), f_2(A_2), \dots, f_T(A_T) \right) \tag{12}$$
where $B$ is the total budget constraint.
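The cost model can be sketched as follows. The representation of a scheme as a list of Python sets (one per sensor type) and the numeric costs are assumptions made for illustration only.

```python
def total_cost(scheme, sensor_cost, site_cost):
    """c(A) = c_s(A) + c_e(A) for a scheme given as a list of sets A_1..A_T."""
    # equipment cost: c_1 |A_1| + ... + c_T |A_T|
    equipment = sum(c_i * len(A_i) for c_i, A_i in zip(sensor_cost, scheme))
    # site cost: one site per distinct location in the union of all A_i
    stations = set().union(*scheme)
    return site_cost * len(stations) + equipment

# Hypothetical costs: two sensor types costing 10 and 20, site cost 100.
shared = total_cost([{1, 2}, {2, 3}], sensor_cost=[10, 20], site_cost=100)
disjoint = total_cost([{1, 2}, {3, 4}], sensor_cost=[10, 20], site_cost=100)
```

With these numbers the scheme that shares location 2 costs 360, while the scheme with four distinct locations costs 460: the same sensors are cheaper to deploy when they share a site, which is the coupling that makes the multi-type problem different from $T$ independent single-type problems.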

3. Solution Approach

In this section, we investigate two reasonable scenarios of the optimal multi-type sensor placement problem and propose two greedy approaches with provable approximation guarantees.

3.1. One-with-All Case

We start with the simplest condition, in which each station is equipped with all types of sensors. Let $c_{all} := c_{site} + \sum_{i=1}^{T} c_i$ denote the cost of a station (with all types of sensors). In this case, we have $A_1 = A_2 = \dots = A_T$ and the general cost constraint reduces to the cardinality constraint $|A_1| \le \lfloor B / c_{all} \rfloor$, where $\lfloor x \rfloor$ denotes the floor function mapping $x$ to the greatest integer less than or equal to $x$. Let $A := A_1 = A_2 = \dots = A_T$ denote the placement scheme and $K := \lfloor B / c_{all} \rfloor$ denote the total number of stations we can deploy. Then, the problem becomes:
$$A^* = \arg\max_{|A| \le K} \left( f_1(A), f_2(A), \dots, f_T(A) \right) \tag{13}$$
However, a solution that simultaneously maximizes all objectives of the above multi-objective optimization generally does not exist. The reason is that different pollutant fields are likely to exhibit different spatial patterns due to different generation processes.
Figure 3 visualizes the different spatial characteristics of $\mathrm{NO_2}$ and $\mathrm{PM_{2.5}}$ in Hong Kong. The variance of the random variable at each monitored location is estimated from the hourly measurement data of 2017. As shown in Figure 3, the high-variance locations with respect to $\mathrm{PM_{2.5}}$ are not always the locations where the variances are high with respect to $\mathrm{NO_2}$.
Instead, we can always find Pareto-optimal solutions [24] of the above multi-objective problem. We say a placement scheme $A$ is Pareto-optimal if there is no other scheme $A'$ such that $f_i(A') \ge f_i(A)$ for all $i$ and $f_j(A') > f_j(A)$ for some $j$. In other words, $A$ is Pareto-optimal if there is no other placement scheme that is no worse than $A$ in all objectives and is strictly better than $A$ in at least one objective $f_j$.
One standard approach to find such solutions is the weighted sum transformation/scalarization [24]:
$$A^* = \arg\max_{|A| \le K} \sum_{i=1}^{T} w_i f_i(A) \tag{14}$$
where $w_i > 0$ denotes the weight parameter of the $i$th objective function and $\sum_{i=1}^{T} w_i = 1$. By default, we can choose $w_1 = w_2 = \dots = w_T = \frac{1}{T}$. Since monotonicity and submodularity are preserved under non-negative linear combinations, the new objective function is also monotone and submodular. Hence, we can use the greedy approach to solve the problem for this case. The details are summarized in Algorithm 1.
Algorithm 1 Multi-type sensor deployment algorithm for the one-with-all case
  • Input: Station number constraint $K$, a set of grids $V$, objective functions $f_1, f_2, \dots, f_T$, weights $w_1, w_2, \dots, w_T$
  • Output: A subset of locations $A \subseteq V$
  • $A = \emptyset$
  • while $|A| < K$ do
  •   select the location $s \in V \setminus A$ that has the highest incremental gain $\sum_{i=1}^{T} w_i \left( f_i(A \cup \{s\}) - f_i(A) \right)$
  •   add $s$ to location set $A$
  • end while
  • return $A$
Proposition 1.
Algorithm 1 finds a solution $A$ such that $\sum_{i=1}^{T} w_i f_i(A) \ge (1 - 1/e) \sum_{i=1}^{T} w_i f_i(A^*)$, where $A^*$ is the Pareto-optimal solution with weight parameters $w_1, w_2, \dots, w_T$.
Since the weighted objective function is monotone and submodular, the result follows directly from Theorem 1.
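A compact sketch of the weighted-sum greedy selection for the one-with-all case is given below. The two coverage-style objectives and the equal weights are hypothetical toy inputs; in the paper the $f_i$ would be the entropy or mutual information gains of the individual fields.

```python
def one_with_all_greedy(objectives, weights, V, K):
    """Greedy selection on the scalarized objective sum_i w_i f_i(A)."""
    def F(A):
        return sum(w * f(A) for w, f in zip(weights, objectives))

    A = set()
    while len(A) < K and len(A) < len(V):
        # location with the highest weighted incremental gain
        s = max((s for s in V if s not in A), key=lambda s: F(A | {s}) - F(A))
        A.add(s)
    return A

# Hypothetical toy fields: location 1 is informative for field 1 only,
# location 3 for field 2 only, location 2 slightly for both.
cover1 = {1: {"a", "b"}, 2: {"c"}, 3: set()}
cover2 = {1: set(), 2: {"x"}, 3: {"x", "y", "z"}}
f1 = lambda A: len(set().union(*(cover1[s] for s in A)))
f2 = lambda A: len(set().union(*(cover2[s] for s in A)))

stations = one_with_all_greedy([f1, f2], [0.5, 0.5], [1, 2, 3], K=2)
```

With equal weights the first pick is location 3 (weighted gain 1.5) and the second is location 1 (weighted gain 1.0), giving `{1, 3}`. Changing the weights moves the solution along the Pareto front.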

3.2. General Case

A more general scenario is when each station is only required to install at least one sensor. Taking the weather monitoring stations in Hong Kong as an example, many stations only contain some of the sensors, such as temperature, pressure, rainfall, etc. In this case, increasing the information gain of one type of sensor comes at the expense of the information gain of another because of the total budget constraint. Hence, we adopt a similar weighted sum transformation approach and aim to solve the following optimization:
$$\mathcal{A}^* = \arg\max_{c(\mathcal{A}) \le B} \sum_{i=1}^{T} w_i f_i(A_i) \tag{15}$$
where $w_i > 0$ denotes the weight parameter of the $i$th objective function and $\sum_{i=1}^{T} w_i = 1$.
To understand the difficulty of the optimization, we first investigate the structure of the cost constraint. Let $k_i \in \mathbb{N}$ denote the number of sensors of the $i$th type and $K \in \mathbb{N}^+$ denote the total number of stations of the placement scheme $\mathcal{A}$. Then, the cost constraint can be rewritten as:
$$|A_1 \cup A_2 \cup \dots \cup A_T| = K \in \mathbb{N}^+ \tag{16}$$
$$|A_i| = k_i \in \mathbb{N}, \quad \forall i \in [T] \tag{17}$$
$$c_{site} \cdot K + \sum_{i=1}^{T} c_i \cdot k_i \le B \tag{18}$$
Proposition 2.
Let $\mathcal{A}^*$ denote the optimal placement scheme. If $K \ge T$ and $k_i \ge 1 \ \forall i \in [T]$, then when $\left\lfloor \frac{B - \sum_{i=1}^{T} (c_i - \min_i c_i)}{c_{site} + \min_i c_i} \right\rfloor = \left\lfloor \frac{B}{c_{all}} \right\rfloor$, the cost constraints in Equations (16)–(18) can be reduced to the cardinality constraint $k_1 = k_2 = \dots = k_T = K = \left\lfloor \frac{B}{c_{all}} \right\rfloor$ for the optimal multi-type sensor placement.
Proof. 
Since the objective function $f_i$ is non-decreasing for each $i$, we aim to find the optimal integer solutions $(k_1, k_2, \dots, k_T)$ subject to the budget constraint in order to maximize the overall information gain.
Since the total sensor cost $\sum_{i=1}^{T} c_i \cdot k_i$ when there are $K$ stations is at most $\sum_{i=1}^{T} c_i \cdot K$, we know that the achievable number of stations $K$ is at least $k_{\min} := \left\lfloor \frac{B}{c_{all}} \right\rfloor$, and equality is achieved when each station is equipped with all types of sensors (one-with-all case).
In the meantime, since $\sum_{i=1}^{T} c_i \cdot k_i$ is at least $\min_i c_i \cdot (K - T) + \sum_{i=1}^{T} c_i$ if we assume that there is at least one sensor of each type, we know that the achievable number of stations $K$ is at most $k_{\max} := \left\lfloor \frac{B - \sum_{i=1}^{T} (c_i - \min_i c_i)}{c_{site} + \min_i c_i} \right\rfloor$, and equality is achieved when each station hosts exactly one sensor, with one sensor of every type except the cheapest type, which has $(K - T + 1)$ sensors.
Therefore, when $k_{\max} = k_{\min}$, $K$ is unique and $k_1 = k_2 = \dots = k_T = K$ can be achieved. Then, the cost constraints can be reduced to the cardinality constraint $k_1 = k_2 = \dots = k_T = K = \left\lfloor \frac{B}{c_{all}} \right\rfloor$. ☐
Remark 1.
The assumptions that $K \ge T$ and $k_i \ge 1 \ \forall i \in [T]$ are usually satisfied naturally with a reasonable budget that allows placing at least one sensor for each type of pollutant. Otherwise, there would be no measurement at all for some types of field, leaving the total uncertainty quite high.
Proposition 2 gives the condition under which the general case reduces to the one-with-all case. This usually happens when $c_{site} \gg c_i$ for all $i \in [T]$. In other words, when the sensor costs are negligible compared with the site construction costs, each station should be equipped with all types of sensors, and the budget constraint only limits the number of stations we can deploy.
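The condition of Proposition 2 is easy to evaluate numerically. The sketch below computes $k_{\min}$ and $k_{\max}$ from the proof; the function name and the budget and cost figures are hypothetical values chosen for illustration.

```python
import math

def station_count_bounds(B, site_cost, sensor_costs):
    """Return (k_min, k_max) as defined in the proof of Proposition 2."""
    c_all = site_cost + sum(sensor_costs)
    c_min = min(sensor_costs)
    k_min = math.floor(B / c_all)
    k_max = math.floor(
        (B - sum(c - c_min for c in sensor_costs)) / (site_cost + c_min)
    )
    return k_min, k_max

# Site cost dominates: the bounds coincide, so the problem reduces to one-with-all.
dominant = station_count_bounds(B=1000, site_cost=200, sensor_costs=[1, 1, 1])
# Sensor costs comparable to the site cost: the bounds differ.
comparable = station_count_bounds(B=1000, site_cost=50, sensor_costs=[50, 100])
```

In the first setting both bounds equal 4, so every station should carry all sensor types. In the second setting the bounds are 5 and 9, so the number of stations is not determined by the budget alone and the general-case algorithm is needed.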
In the following, we focus on the case in which the cost constraint is not reducible to the one-with-all case. One might consider greedily placing sensors until the cost constraint can no longer be satisfied, i.e., at each step, we consider the location $s$ at which to place a type $i$ sensor such that
$$(s^*, i^*) = \arg\max_{i, s} \ w_i \left( f_i(A_i \cup \{s\}) - f_i(A_i) \right) \tag{19}$$
and confirm the selection if its cost is acceptable. Here, $A_i$ refers to the currently selected location set for the $i$th field. However, the solution can be arbitrarily bad, as a sensor providing information gain $g$ will always be preferred over a sensor providing information gain $g - \epsilon$ despite a much higher cost. Alternatively, we can consider greedily assigning sensors based on information gain per cost, i.e., at each step, we consider the location $s$ at which to place a type $i$ sensor such that
$$(s^*, i^*) = \arg\max_{i, s} \ \frac{w_i \left( f_i(A_i \cup \{s\}) - f_i(A_i) \right)}{\Delta_{i,s} c(\mathcal{A})} \tag{20}$$
and confirm the selection if its cost is acceptable. Here, $\Delta_{i,s} c(\mathcal{A})$ denotes the incremental cost of the candidate under the current scheme. However, the solution can still be arbitrarily bad: a cheap sensor with cost $\epsilon$ and information gain $2\epsilon$, and hence a higher information gain per cost ($2\epsilon/\epsilon = 2$), will always be preferred over an expensive sensor with cost $B$ and a much higher information gain $B$ (gain per cost $B/B = 1$), even though the budget $B$ only allows one of them to be selected and the better solution is to choose the expensive sensor.
Fortunately, the following theorem shows that the two solutions cannot be bad at the same time.
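Theorem 2.
Let $\mathcal{A}_G$ denote the solution obtained by greedy selection with the criterion in Equation (19) and $\mathcal{A}_{CG}$ denote the solution obtained by cost-effective greedy selection with the criterion in Equation (20). Write $F(\mathcal{A}) := \sum_{i=1}^{T} w_i f_i(A_i)$ for the weighted objective value of a scheme. If the submodular functions $f_1, f_2, \dots, f_T$ are monotone and $f_i(\emptyset) = 0$ for all $i$, then we have
$$\max\{F(\mathcal{A}_G), F(\mathcal{A}_{CG})\} \ge \frac{1}{2}(1 - 1/e) \max_{\mathcal{A} : c(\mathcal{A}) \le B} \sum_{i=1}^{T} w_i f_i(A_i)$$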
Proof. 
Let $\hat{V} := \{1, 2, \dots, nT\}$ denote a new set with $|\hat{V}| = T \cdot |V|$. Then, each placement scheme $\mathcal{A} := \{A_1, A_2, \dots, A_T\}$ corresponds to exactly one subset $\hat{A} \subseteq \hat{V}$ such that $\hat{A} = \{s + (i - 1) \times n : s \in A_i, \ i \in [T]\}$. Let $f : 2^{\hat{V}} \to \mathbb{R}$ denote a set function such that $f(\hat{A}) = \sum_{i=1}^{T} w_i f_i(A_i)$. Clearly, $f(\emptyset) = 0$.
We first show that $f$ is non-decreasing. Let $\hat{B} \subseteq \hat{V}$ denote the corresponding set for placement scheme $\mathcal{B}$. For all $\hat{A} \subseteq \hat{B} \subseteq \hat{V}$, we know that $A_i \subseteq B_i$ for all $i \in [T]$. Then $f(\hat{B}) = \sum_{i=1}^{T} w_i f_i(B_i) \ge \sum_{i=1}^{T} w_i f_i(A_i) = f(\hat{A})$ and hence $f$ is non-decreasing.
We then show that $f$ is submodular. For $\hat{s} \in \hat{V} \setminus \hat{B}$, let $j \in [T]$ denote its corresponding type and $s \in V$ denote its corresponding location. Then, $f(\hat{B} \cup \{\hat{s}\}) - f(\hat{B}) = w_j \left( f_j(B_j \cup \{s\}) - f_j(B_j) \right) \le w_j \left( f_j(A_j \cup \{s\}) - f_j(A_j) \right) = f(\hat{A} \cup \{\hat{s}\}) - f(\hat{A})$ and hence $f$ is submodular.
Therefore, $f$ is non-decreasing and submodular with $f(\emptyset) = 0$. By Theorem 3 in [9], which is a generalization of the theorem in [25] for the special case of the budgeted max-cover problem, we know that $\max\{f(\hat{A}_G), f(\hat{A}_{CG})\} \ge \frac{1}{2}(1 - 1/e) \max_{\mathcal{A} : c(\mathcal{A}) \le B} \sum_{i=1}^{T} w_i f_i(A_i)$, where $\hat{A}_G$ and $\hat{A}_{CG}$ are the subsets of $\hat{V}$ corresponding to $\mathcal{A}_G$ and $\mathcal{A}_{CG}$. The proof is now complete. ☐
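Let $\mathcal{V} := \{V, V, \dots, V\}$ consist of $T$ copies of $V$, one search space $V_i$ per sensor type, and let $\Delta_{i,s} c(\mathcal{A})$ denote the incremental cost of adding a type $i$ sensor at location $s$ when the existing placement scheme is $\mathcal{A}$. Then, we know that
$$\Delta_{i,s} c(\mathcal{A}) = \begin{cases} c_{site} + c_i & \text{if } s \notin A_j \ \forall j \in [T] \\ c_i & \text{otherwise} \end{cases}$$
With the help of the above additional notation, we now summarize the proposed hybrid greedy selection approach for the general multi-type sensor placement in Algorithm 2.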
Algorithm 2 Multi-type sensor deployment algorithm for the general cost case
  • Input: Budget $B$, a set of grid indexes $V$, objective functions $f_1, f_2, \dots, f_T$, weights $w_1, w_2, \dots, w_T$, site construction cost $c_{site}$, sensor costs $c_1, c_2, \dots, c_T$
  • Output: A placement scheme $\mathcal{A} \subseteq \mathcal{V}$
  • $\mathcal{A}' = \mathcal{A}'' = \emptyset$, $\mathcal{V}' = \mathcal{V}'' = \mathcal{V}$, $B' = B'' = B$
  • while the search space $\mathcal{V}'$ is not empty do
  •   Select the location $s$ for placing the $i$th type of sensor that has the highest incremental gain $w_i \left( f_i(A'_i \cup \{s\}) - f_i(A'_i) \right)$
  •   if the incremental cost $\Delta_{i,s} c(\mathcal{A}')$ does not exceed the remaining budget $B'$ then
  •    Update the remaining budget $B' = B' - \Delta_{i,s} c(\mathcal{A}')$ and add location $s$ to the location set $A'_i$
  •   end if
  •   Remove location $s$ from search space $V'_i$
  • end while
  • while the search space $\mathcal{V}''$ is not empty do
  •   Select the location $s$ for placing the $i$th type of sensor that has the highest cost-effective gain $\frac{w_i \left( f_i(A''_i \cup \{s\}) - f_i(A''_i) \right)}{\Delta_{i,s} c(\mathcal{A}'')}$
  •   if the cost $\Delta_{i,s} c(\mathcal{A}'')$ does not exceed the remaining budget $B''$ then
  •    Update the remaining budget $B'' = B'' - \Delta_{i,s} c(\mathcal{A}'')$ and add location $s$ to the location set $A''_i$
  •   end if
  •   Remove location $s$ from search space $V''_i$
  • end while
  • Select the better scheme $\mathcal{A} = \arg\max_{\mathcal{A} \in \{\mathcal{A}', \mathcal{A}''\}} \sum_{i=1}^{T} w_i f_i(A_i)$
  • return $\mathcal{A}$
Proposition 3.
The time complexity of Algorithm 2 is $O\left( \frac{B}{\min_i c_i} T |V| \right)$.
Sviridenko [26] showed that it is even possible to achieve the approximation ratio of $1 - 1/e$ for the general cost case; however, the algorithm requires an enumeration over all feasible sets of cardinality three and hence its complexity for our problem is $O\left( \frac{B}{\min_i c_i} T^4 |V|^4 \right)$, which is impractical.
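The hybrid selection of Algorithm 2 can be sketched as follows. This is a simplified illustration, not the authors' implementation: schemes are lists of sets, the objectives are hypothetical modular toy functions, ties are broken by sorted order, and a candidate is accepted when its incremental cost does not exceed the remaining budget.

```python
def incremental_cost(scheme, i, s, sensor_cost, site_cost):
    """c_i, plus c_site when no sensor of any type is placed at s yet."""
    occupied = any(s in A_j for A_j in scheme)
    return sensor_cost[i] + (0 if occupied else site_cost)

def budgeted_greedy(objectives, weights, V, B, sensor_cost, site_cost, per_cost):
    """One pass of Algorithm 2: plain gain (per_cost=False) or gain per cost."""
    T = len(objectives)
    scheme = [set() for _ in range(T)]
    remaining = B
    search = {(i, s) for i in range(T) for s in V}

    def score(pair):
        i, s = pair
        f, A = objectives[i], scheme[i]
        gain = weights[i] * (f(A | {s}) - f(A))
        if per_cost:
            gain /= incremental_cost(scheme, i, s, sensor_cost, site_cost)
        return gain

    while search:
        i, s = max(sorted(search), key=score)  # sorted() gives deterministic ties
        cost = incremental_cost(scheme, i, s, sensor_cost, site_cost)
        if cost <= remaining:
            scheme[i].add(s)
            remaining -= cost
        search.discard((i, s))
    return scheme

def hybrid_greedy(objectives, weights, V, B, sensor_cost, site_cost):
    """Return the better of the greedy and cost-effective greedy schemes."""
    F = lambda sch: sum(w * f(A) for w, f, A in zip(weights, objectives, sch))
    args = (objectives, weights, V, B, sensor_cost, site_cost)
    return max(budgeted_greedy(*args, per_cost=False),
               budgeted_greedy(*args, per_cost=True), key=F)

# Hypothetical toy instance: two sensor types, two locations.
val0, val1 = {1: 4, 2: 4}, {1: 10, 2: 0}
objs = [lambda A: sum(val0[s] for s in A), lambda A: sum(val1[s] for s in A)]
best = hybrid_greedy(objs, [0.5, 0.5], [1, 2], B=12, sensor_cost=[1, 10], site_cost=1)
```

In this instance the cost-effective pass spends the budget on two cheap type-0 sensors (weighted value 4) and can no longer afford the informative type-1 sensor, while the plain greedy pass places both sensor types at location 1 (weighted value 7). The hybrid rule therefore returns `[{1}, {1}]`, illustrating why the two passes are combined.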
In many real cases, the bound is not tight. Hence, we also provide a tighter online bound for an arbitrary placement scheme, derived from the submodularity property.
Theorem 3
(Online bound). For a given placement scheme $\tilde{\mathcal{A}} = \{\tilde{A}_1, \tilde{A}_2, \dots, \tilde{A}_T\}$, each type $i \in [T]$ and each $s \in V \setminus \tilde{A}_i$, let $\delta_{i,s} = w_i \left( f_i(\tilde{A}_i \cup \{s\}) - f_i(\tilde{A}_i) \right)$. Let $r_{i,s} = \delta_{i,s} / c_{i,s}$, where $c_{i,s}$ denotes the incremental cost of adding the $i$th type of sensor to location $s$. Let $s_1, s_2, \dots, s_m$ be the sequence of locations with $r_{i,s}$ in descending order and $p_1, p_2, \dots, p_m$ be the corresponding sequence of selected types. Let $k$ be such that $C = \sum_{i=1}^{k-1} c_{p_i, s_i} \le B$ and $\sum_{i=1}^{k} c_{p_i, s_i} > B$. Let $\lambda = (B - C) / c_{p_k, s_k}$. Then
$$\max_{\mathcal{A} : c(\mathcal{A}) \le B} \sum_{i=1}^{T} w_i f_i(A_i) \le \sum_{i=1}^{T} w_i f_i(\tilde{A}_i) + \sum_{i=1}^{k-1} \delta_{p_i, s_i} + \lambda \, \delta_{p_k, s_k}$$
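The online bound can be computed by sorting the remaining candidates by gain per cost and filling the budget fractionally, as in the theorem. The sketch below is an illustration under assumptions: schemes are lists of sets, the incremental cost is taken relative to the given scheme, and the toy objectives and costs are hypothetical.

```python
def online_bound(objectives, weights, V, scheme, B, sensor_cost, site_cost):
    """Upper bound of Theorem 3 on the optimal weighted objective value."""
    bound = sum(w * f(A) for w, f, A in zip(weights, objectives, scheme))
    items = []
    for i, (w, f, A) in enumerate(zip(weights, objectives, scheme)):
        for s in V:
            if s in A:
                continue
            delta = w * (f(A | {s}) - f(A))            # delta_{i,s}
            occupied = any(s in A_j for A_j in scheme)
            cost = sensor_cost[i] + (0 if occupied else site_cost)  # c_{i,s}
            items.append((delta / cost, delta, cost))  # ratio r_{i,s} first
    items.sort(reverse=True)                            # descending r_{i,s}
    spent = 0.0
    for _, delta, cost in items:
        if spent + cost <= B:
            bound += delta
            spent += cost
        else:
            bound += (B - spent) / cost * delta         # fractional term lambda * delta
            break
    return bound

# Hypothetical toy instance: two sensor types, two locations.
val0, val1 = {1: 4, 2: 4}, {1: 10, 2: 0}
objs = [lambda A: sum(val0[s] for s in A), lambda A: sum(val1[s] for s in A)]
ub = online_bound(objs, [0.5, 0.5], [1, 2], [{1}, {1}], B=12,
                  sensor_cost=[1, 10], site_cost=1)
```

For the scheme `[{1}, {1}]`, whose weighted value is 7, the bound evaluates to 9. Enumerating the feasible schemes of this toy instance shows that the true optimum is 7, so the bound certifies that the scheme is within a factor 7/9 of optimal, which is tighter than the worst-case guarantee of Theorem 2.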

3.3. Assessing the Trade Off

After we obtain the placement scheme $\mathcal{A}$ for the general case, we know the station number $K$ and the number of sensors $k_i$ for the $i$th spatial field. If we run the simple greedy algorithm for the $i$th type of spatial field with subset size constraint $k_i$, we can find the uncoupled placement result $A'_i$ with $|A'_i| = k_i$ and obtain the individual sacrifice $f_i(A'_i) - f_i(A_i)$ due to the budget constraint. The total information loss is $\sum_{i=1}^{T} \left( f_i(A'_i) - f_i(A_i) \right)$, with a total saving of $c_{site} \cdot \left( \left| \bigcup_{i=1}^{T} A'_i \right| - \left| \bigcup_{i=1}^{T} A_i \right| \right)$, i.e., the site cost of the stations that the uncoupled placements would require beyond the $K$ stations of $\mathcal{A}$. This information is useful for assessing whether additional budget should be allocated to a certain field to provide further information.

3.4. Speeding up the Algorithms

Krause et al. [8] developed a lazy evaluation technique to speed up the greedy selection algorithm for the single-type sensor placement problem. In this section, we adapt this approach to the multi-type sensor placement problem to speed up the two algorithms proposed above.
We start with the one-with-all case. The key idea is that, at each iteration, some calculations of the information gain can be saved by utilizing the submodular property, i.e., the information gain of adding a given sensor can never increase as the selected set grows. Therefore, we can maintain an ordered list of the information gains and only update a value when necessary. The lazy greedy algorithm for this case is presented in Algorithm 3.
Algorithm 3 Lazy greedy algorithm for one-with-all case
  • Input: Station number constraint $K$, a set of grids $V$, objective functions $f_1, f_2, \ldots, f_T$, weights $w_1, w_2, \ldots, w_T$
  • Output: A subset of locations $A \subseteq V$
  • $A = \emptyset$.
  •  for each location $s$, calculate the initial incremental gain $\delta_s = \sum_{i=1}^{T} w_i f_i(\{s\})$
  • $s^* = \arg\max_s \delta_s$, add $s^*$ to location set $A$.
  •  remove $\delta_{s^*}$ from the gain list and sort the remaining gains in descending order $[\delta_{s_1}, \delta_{s_2}, \ldots, \delta_{s_{n-1}}]$.
  • for $j = 2$ to $K$ do
  •   while $\sum_{i=1}^{T} w_i \left( f_i(A \cup \{s_1\}) - f_i(A) \right) < \delta_{s_2}$ do
  •    update $\delta_{s_1} = \sum_{i=1}^{T} w_i \left( f_i(A \cup \{s_1\}) - f_i(A) \right)$.
  •    sort the gain list in descending order $[\delta_{s_1}, \delta_{s_2}, \ldots, \delta_{s_{n-1}}]$.
  •   end while
  •   add $s_1$ to location set $A$ and remove $\delta_{s_1}$ from the gain list.
  • end for
  • return $A$
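The lazy evaluation in Algorithm 3 can be sketched in Python with a priority queue in place of the repeatedly sorted list. This is a minimal illustration under stated assumptions, not the authors' implementation: `lazy_greedy` and `marginal_gain` are hypothetical names, and `marginal_gain(A, s)` is assumed to return the weighted gain $\sum_{i=1}^{T} w_i (f_i(A \cup \{s\}) - f_i(A))$ of a monotone submodular objective.

```python
import heapq


def lazy_greedy(V, K, marginal_gain):
    """Select up to K locations from V by lazy greedy (illustrative sketch)."""
    A = []
    # heapq is a min-heap, so gains are stored negated to pop the largest.
    heap = [(-marginal_gain([], s), s) for s in V]
    heapq.heapify(heap)
    while heap and len(A) < K:
        _, s = heapq.heappop(heap)
        fresh = marginal_gain(A, s)  # re-evaluate only the top candidate
        if not heap or fresh >= -heap[0][0]:
            # By submodularity the stored gains are upper bounds, so no
            # other candidate can beat the refreshed top one.
            A.append(s)
        else:
            # The stored value was stale: re-queue with the updated gain.
            heapq.heappush(heap, (-fresh, s))
    return A
```

With a toy coverage objective, for example, the first pick needs no extra evaluations and later picks re-evaluate only the candidates near the top of the queue.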
The idea is similar for the general case. Specifically, for the cost-effective greedy selection, we maintain an ordered list of the information gain per cost of adding a certain type of sensor to a specific location. Notice that the cost also depends on the existing selection; hence, a cost update after an iteration (if any) requires the list to be reordered. The lazy greedy algorithm for the general case is presented in Algorithm 4.
Algorithm 4 Lazy greedy algorithm for the general cost case
  • Input: Budget $B$, a set of grid indexes $V$, objective functions $f_1, f_2, \ldots, f_T$, weights $w_1, w_2, \ldots, w_T$, site construction cost $c_{site}$, sensor costs $c_1, c_2, \ldots, c_T$
  • Output: A placement scheme $\mathcal{A} \subseteq V$
  • $\mathcal{A}' = \mathcal{A}'' = \emptyset$, $V' = V'' = V$, $B' = B'' = B$
  •  for each location $s$, calculate the initial incremental gain of a type $i$ sensor $\delta_{i,s} = w_i f_i(\{s\})$
  • $(i^*, s^*) = \arg\max \delta_{i,s}$, add $s^*$ to location set $A'_{i^*}$.
  •  remove $\delta_{i^*,s^*}$ from the gain list and sort it in descending order $[\delta_{i_1,s_1}, \delta_{i_2,s_2}, \ldots, \delta_{i_{nT-1},s_{nT-1}}]$.
  • while the search space $V'$ is not empty do
  •   while $w_{i_1} \left( f_{i_1}(\{s_1\} \cup A'_{i_1}) - f_{i_1}(A'_{i_1}) \right) < \delta_{i_2,s_2}$ do
  •    update $\delta_{i_1,s_1} = w_{i_1} \left( f_{i_1}(\{s_1\} \cup A'_{i_1}) - f_{i_1}(A'_{i_1}) \right)$ and sort the gain list in descending order.
  •   end while
  •   if the incremental cost $\Delta_{i_1,s_1} c(\mathcal{A}')$ is less than the remaining budget $B'$ then
  •    Update the remaining budget $B' = B' - \Delta_{i_1,s_1} c(\mathcal{A}')$
  •    Add location $s_1$ to the location set $A'_{i_1}$
  •   end if
  •   Remove location $s_1$ from search space $V'_{i_1}$ and $\delta_{i_1,s_1}$ from the list
  • end while
  •  for each location $s$, calculate the initial incremental gain per cost of a type $i$ sensor $\delta_{i,s} = w_i f_i(\{s\}) / \Delta_{i,s} c(\mathcal{A}'')$
  • $(i^*, s^*) = \arg\max \delta_{i,s}$, add $s^*$ to location set $A''_{i^*}$.
  •  remove $\delta_{i^*,s^*}$ from the list and sort it in descending order $[\delta_{i_1,s_1}, \delta_{i_2,s_2}, \ldots, \delta_{i_{nT-1},s_{nT-1}}]$.
  • while the search space $V''$ is not empty do
  •   while $w_{i_1} \left( f_{i_1}(\{s_1\} \cup A''_{i_1}) - f_{i_1}(A''_{i_1}) \right) / \Delta_{i_1,s_1} c(\mathcal{A}'') < \delta_{i_2,s_2}$ do
  •    update $\delta_{i_1,s_1} = w_{i_1} \left( f_{i_1}(\{s_1\} \cup A''_{i_1}) - f_{i_1}(A''_{i_1}) \right) / \Delta_{i_1,s_1} c(\mathcal{A}'')$ and sort the list in descending order.
  •   end while
  •   if the cost $\Delta_{i_1,s_1} c(\mathcal{A}'')$ is less than the remaining budget $B''$ then
  •    Update the remaining budget $B'' = B'' - \Delta_{i_1,s_1} c(\mathcal{A}'')$
  •    Add location $s_1$ to the location set $A''_{i_1}$
  •    Update $\delta_{i,s_1}$ of the other types of sensors at location $s_1$ and sort the list
  •   end if
  •   Remove location $s_1$ from search space $V''_{i_1}$ and $\delta_{i_1,s_1}$ from the list
  • end while
  •  Select the better scheme in terms of the objective value
  • $\mathcal{A} = \arg\max_{\mathcal{A} \in \{\mathcal{A}', \mathcal{A}''\}} \sum_{i=1}^{T} w_i f_i(A_i)$
  • return $\mathcal{A}$
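The two passes of Algorithm 4 can be sketched in Python as one routine that is run twice, once ranking by gain and once by gain per cost. This is a simplified illustration rather than the authors' implementation, and it makes several assumptions: the names `incremental_cost`, `lazy_pass` and `hybrid_lazy_greedy` are hypothetical, `gain(i, A_i, s)` is assumed to return $w_i (f_i(A_i \cup \{s\}) - f_i(A_i))$, a candidate is accepted when its cost is at most the remaining budget, and the first selection is handled inside the loop rather than before it.

```python
import heapq


def incremental_cost(scheme, i, s, c_sensor, c_site):
    """Cost of adding a type-i sensor at s; the site cost is paid only once."""
    site_exists = any(s in A for A in scheme)
    return c_sensor[i] + (0 if site_exists else c_site)


def lazy_pass(V, T, budget, gain, c_sensor, c_site, per_cost):
    """One lazy greedy pass over all (type, location) candidates."""
    scheme = [set() for _ in range(T)]

    def score(i, s):
        g = gain(i, scheme[i], s)
        if per_cost:
            return g / incremental_cost(scheme, i, s, c_sensor, c_site)
        return g

    heap = [(-score(i, s), i, s) for i in range(T) for s in V]
    heapq.heapify(heap)
    while heap:
        _, i, s = heapq.heappop(heap)
        fresh = score(i, s)
        if heap and fresh < -heap[0][0]:
            heapq.heappush(heap, (-fresh, i, s))  # stale value, re-queue
            continue
        cost = incremental_cost(scheme, i, s, c_sensor, c_site)
        if cost <= budget:
            budget -= cost
            scheme[i].add(s)
            if per_cost:
                # The site now exists, so other types at s became cheaper:
                # refresh their scores so stored values stay upper bounds.
                heap = [(-score(j, t), j, t) if t == s else (v, j, t)
                        for v, j, t in heap]
                heapq.heapify(heap)
        # Accepted or unaffordable, candidate (i, s) leaves the search space.
    return scheme


def hybrid_lazy_greedy(V, T, budget, gain, value, c_sensor, c_site):
    """Run both passes and keep the scheme with the larger objective value."""
    simple = lazy_pass(V, T, budget, gain, c_sensor, c_site, per_cost=False)
    cost_eff = lazy_pass(V, T, budget, gain, c_sensor, c_site, per_cost=True)
    return max((simple, cost_eff), key=value)
```

The refresh step after an accepted candidate corresponds to the "update the other types of sensors at location $s_1$" line of the cost-effective pass; without it, a stored gain per cost could underestimate the current value and the lazy check would no longer be safe.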

4. Simulations

We evaluated the proposed multi-type placement scheme on the hourly air quality monitoring data for 2017 provided by the Hong Kong Environmental Protection Department (EPD) website [16]. There are 16 monitoring stations in total, three of which are roadside stations. Here, we only considered PM2.5, PM10, NO2, O3 and SO2, as they are measured by all of the stations. As shown in Figure 4, the distribution of the normalized one-hour difference of the pollutants at the Tung Chung monitoring station is approximately normal (due to space limits, we only show one station as an example; the statement holds for every station), and hence satisfies the GP assumption. We estimated the empirical covariance matrix from the data, which represents the spatial process accurately. Here, we chose the entropy criterion, as it directly satisfies the monotone and submodular properties without the further requirement that the number of available sensors be much smaller than the total number of possible locations. Let $X_s^{(i)}$ denote the random variable of the $i$th field at location $s$. Then, the incremental information gain of adding a sensor of type $i$ to location $s$ is
$$\delta_{i,s} = H\left( X_s^{(i)} \mid X_A^{(i)} \right) = \frac{1}{2} \log\left( 2 \pi e\, \sigma^2_{X_s^{(i)} \mid X_A^{(i)}} \right)$$
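For a Gaussian field, the conditional variance in this expression follows from the covariance matrix by conditioning on the selected locations. The sketch below is a minimal pure-Python illustration, assuming a positive definite covariance matrix given as a list of lists; the function names are hypothetical, and the conditioning is done one location at a time, which is equivalent to the usual Schur complement formula.

```python
import math


def conditional_variance(cov, s, A):
    """Variance of X_s given X_A for a Gaussian with covariance matrix cov."""
    c = [row[:] for row in cov]  # work on a copy
    n = len(c)
    for a in A:
        pivot = c[a][a]
        col = [c[r][a] for r in range(n)]  # covariances with X_a
        # Rank-one update: covariance of the remaining variables given X_a.
        for r in range(n):
            for q in range(n):
                c[r][q] -= col[r] * col[q] / pivot
    return c[s][s]


def entropy_gain(cov, s, A):
    """Conditional entropy H(X_s | X_A) = 0.5 * log(2 * pi * e * variance)."""
    var = conditional_variance(cov, s, A)
    return 0.5 * math.log(2 * math.pi * math.e * var)
```

For two locations with unit variance and covariance 0.5, observing one reduces the variance of the other from 1 to 0.75, so the gain of the second sensor is smaller than that of the first.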
Figure 5 compares the information gain of different spatial fields using simple greedy selection. The blue line represents the objective function and the orange line is a straight line whose slope is the maximum individual entropy. As can be seen, all objective functions exhibit the diminishing returns property, and it is natural to expect a more prominent effect when the number of sensors is much larger. The name of the first location selected is displayed in each panel. Specifically, Causeway Bay is the most uncertain location in terms of NO2, which is likely due to car emissions, and Sha Tin is the most uncertain location in terms of PM10 and O3. We also chose Tung Chung as a representative and display when it is selected in each field, which clearly indicates that each spatial field has different properties.
Figure 6 shows the placement results for the one-with-all case when $k = 10$. The weights $w_i$, $i = 1, 2, \ldots, 5$, were set to be identical for each field. Different from the single best location for each field, Central was selected first here, as it is the most uncertain location with respect to the sum of individual objectives.
Figure 7 shows the placement results for the general case when the budget is just enough to deploy five stations, each with all types of sensors. The weights $w_i$, $i = 1, 2, \ldots, 5$, were set to be identical for each field. As shown in the figure, the cost-effective greedy approach is able to make full use of the budget and selects five locations for deploying stations with all types of sensors. The greedy approach, however, picks six locations for deploying stations, yet none of them has all types of sensors, because cost is not considered during the selection. Since many more sensors are deployed with the cost-effective greedy approach than with the greedy approach (25 vs. 9), the total information gain of the cost-effective greedy approach is larger. Therefore, the final placement result of the hybrid greedy approach in this case is the one produced by the cost-effective greedy selection.
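As a quick check of the sensor counts quoted above, the spending of the two schemes can be worked out from the costs given in the caption of Figure 7 ($B = 100$, $c_1 = \cdots = c_5 = 1$, $c_{site} = 15$). The greedy total below is inferred from the reported six stations and nine sensors, not taken from a reported figure.

```latex
\begin{align*}
\text{cost-effective greedy:}\quad & 5 \times (c_{site} + 5 \times 1) = 5 \times 20 = 100 = B
  && \text{(5 stations, 25 sensors)} \\
\text{greedy:}\quad & 6 \times c_{site} + 9 \times 1 = 90 + 9 = 99 \le B
  && \text{(6 stations, 9 sensors)}
\end{align*}
```

Under these numbers, the sixth station alone absorbs 15 units of budget that the cost-effective scheme spends on sensors at existing stations.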
Figure 8 shows the performance of the proposed hybrid greedy approach for the general case. We used random selection as the baseline for comparison. The performance of simple greedy selection and cost-effective greedy selection is also shown. Here, we set $c_1 = 2$, $c_2 = 2$, $c_3 = 1$, $c_4 = 1$, $c_5 = 1$, $c_{site} = 15$, as the cost of PM sensors is generally higher, and varied $B$ from 15 to 200 at a step size of 5. The weights $w_i$, $i = 1, 2, \ldots, 5$, were set to be identical for each field.
As can be seen from the plot, as the budget goes up, the information gain of the hybrid greedy approach shows diminishing returns, in line with the submodular property. The flat regions in the curve correspond to the scenarios in which the general cost constraint reduces to the simple cardinality constraint. For example, when the budget is 25 to 30, $k_{\max} = k_{\min} = 1$ and the strategy is simply to deploy one station with all types of sensors.
For the cost-effective greedy selection, the increase is steady, as the placement strategy in this scenario will first select a location for one type and then deploy other types at that location until the budget allows for a new station. The reason is that the site cost is quite high compared to the sensor cost, and it is not cost effective to deploy a new station when sensors can still be added to the existing stations. This also explains the flat region even when $k_{\min} < k_{\max}$.
The simple greedy selection, however, has a sudden drop when the budget first allows for a new station. The reason is that this approach prefers exploring a new location with a larger information gain to spending the budget on adding other sensors, each with a smaller information gain, to existing stations; hence, fewer sensors can be added.
Figure 9 shows the performance of the proposed hybrid greedy approach for the general case with a special emphasis on PM2.5, as it is the most health-harmful air pollutant [27]. Here, we set the weight for PM2.5 to 0.6 and the other weights to 0.1. The other settings remain the same. As can be seen from the plot, the performance of simple greedy and cost-effective greedy is comparable due to the high weight of the objective function for PM2.5. This is because adding PM2.5 sensors greedily without considering the cost will still increase the total information gain. Furthermore, the flat regions for cost-effective greedy disappear, indicating that it is preferable to deploy new stations for PM2.5 rather than to add sensors to existing stations.
Both plots show that the proposed hybrid greedy approach always performs much better than random selection in terms of total information gain.
Figure 10 and Figure 11 compare the running time of the greedy approach with that of the lazy greedy approach. Lazy greedy is faster than greedy while achieving the same approximation guarantee. For the general case, the improvement is more significant. The reason is that, at each iteration, the candidate pool of possible selections is larger, and hence more function evaluations can be saved with a lazy approach.

5. Conclusions

In this paper, we formulate the multi-type sensor placement problem in Gaussian spatial fields for environmental monitoring. We analyze two cases with different assumptions on the station requirement and propose two greedy algorithms with approximation guarantees. We then introduce a lazy approach for speeding up the greedy algorithms while achieving the same performance guarantee. We evaluated the proposed approach via an application in an air quality monitoring scenario in Hong Kong, and the experimental results demonstrate its effectiveness. This formulation can provide guidance for designing a citywide multi-type sensor network for environmental monitoring in a cost-effective way.

Author Contributions

C.S. developed the methodology and drafted the manuscript; Y.Y. provided important comments on the methodology and experiment design; V.O.K.L. and J.C.K.L. guided the research direction and revised the manuscript.

Funding

This research was supported in part by the Theme-based Research Scheme of the Research Grants Council of Hong Kong, under Grant No. T41-709/17-N.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Sun, C.; Yu, Y.; Li, V.O.; Lam, J.C. Optimal Multi-type Sensor Placements in Gaussian Spatial Fields for Environmental Monitoring. In Proceedings of the 4th International Conference on Smart Cities, Kansas City, MO, USA, 16–19 September 2018; pp. 420–429.
  2. Bacco, M.; Delmastro, F.; Ferro, E.; Gotta, A. Environmental Monitoring for Smart Cities. IEEE Sens. J. 2017, 17, 7767–7774.
  3. Fischer, P.H.; Marra, M.; Ameling, C.B.; Hoek, G.; Beelen, R.; de Hoogh, K.; Breugelmans, O.; Kruize, H.; Janssen, N.A.; Houthuijs, D. Air pollution and mortality in seven million adults: The Dutch Environmental Longitudinal Study (DUELS). Environ. Health Perspect. 2015, 123, 697–704.
  4. Kioumourtzoglou, M.A.; Schwartz, J.D.; Weisskopf, M.G.; Melly, S.J.; Wang, Y.; Dominici, F.; Zanobetti, A. Long-term PM2.5 exposure and neurological hospital admissions in the northeastern United States. Environ. Health Perspect. 2016, 124, 23–29.
  5. Watts, N.; Adger, W.N.; Agnolucci, P.; Blackstock, J.; Byass, P.; Cai, W.; Chaytor, S.; Colbourn, T.; Collins, M.; Cooper, A.; et al. Health and climate change: Policy responses to protect public health. Lancet 2015, 386, 1861–1914.
  6. Castell, N.; Dauge, F.R.; Schneider, P.; Vogt, M.; Lerner, U.; Fishbain, B.; Broday, D.; Bartonova, A. Can commercial low-cost sensor platforms contribute to air quality monitoring and exposure estimates? Environ. Int. 2017, 99, 293–302.
  7. Beijing Air Pollution: Real-Time Air Quality Index (AQI). Available online: https://aqicn.org (accessed on 5 May 2018).
  8. Krause, A.; Singh, A.; Guestrin, C. Near-optimal sensor placements in Gaussian processes: Theory, efficient algorithms and empirical studies. J. Mach. Learn. Res. 2008, 9, 235–284.
  9. Leskovec, J.; Krause, A.; Guestrin, C.; Faloutsos, C.; VanBriesen, J.; Glance, N. Cost-effective outbreak detection in networks. In Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Jose, CA, USA, 12–15 August 2007; pp. 420–429.
  10. Du, W.; Xing, Z.; Li, M.; He, B.; Chua, L.H.C.; Miao, H. Optimal sensor placement and measurement of wind for water quality studies in urban reservoirs. In Proceedings of the 13th International Symposium on Information Processing in Sensor Networks, Berlin, Germany, 15–17 April 2014; pp. 167–178.
  11. Wu, X.; Liu, M.; Wu, Y. In-situ soil moisture sensing: Optimal sensor placement and field estimation. ACM Trans. Sens. Netw. 2012, 8, 33.
  12. Hsieh, H.P.; Lin, S.D.; Zheng, Y. Inferring air quality for station location recommendation based on urban big data. In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Sydney, Australia, 10–13 August 2015; pp. 437–446.
  13. Singh, A.; Guillory, A.; Bilmes, J. On bisubmodular maximization. In Proceedings of the Artificial Intelligence and Statistics, Canary Islands, Spain, 21–23 April 2012; pp. 1055–1063.
  14. Ohsaka, N.; Yoshida, Y. Monotone k-submodular function maximization with size constraints. In Proceedings of the Advances in Neural Information Processing Systems, Montreal, QC, Canada, 7–12 December 2015; pp. 694–702.
  15. Array of Things: A Networked Urban Sensor Project in Chicago. Available online: https://arrayofthings.github.io/ (accessed on 5 May 2018).
  16. Hong Kong Air Quality Monitoring Data. Available online: http://www.aqhi.gov.hk/en.html (accessed on 5 May 2018).
  17. Yuen, K.V.; Kuok, S.C. Efficient Bayesian sensor placement algorithm for structural identification: A general approach for multi-type sensory systems. Earthq. Eng. Struct. Dyn. 2015, 44, 757–774.
  18. Lin, J.F.; Xu, Y.L.; Law, S.S. Structural damage detection-oriented multi-type sensor placement with multi-objective optimization. J. Sound Vib. 2018, 422, 568–589.
  19. Rasmussen, C.E. Gaussian processes in machine learning. In Advanced Lectures on Machine Learning; Springer: Berlin, Germany, 2004; pp. 63–71.
  20. Nott, D.J.; Dunsmuir, W.T. Estimation of nonstationary spatial covariance structure. Biometrika 2002, 89, 819–829.
  21. Cressie, N. Statistics for Spatial Data; John Wiley & Sons: Hoboken, NJ, USA, 2015.
  22. Ko, C.W.; Lee, J.; Queyranne, M. An exact algorithm for maximum entropy sampling. Oper. Res. 1995, 43, 684–691.
  23. Nemhauser, G.L.; Wolsey, L.A.; Fisher, M.L. An analysis of approximations for maximizing submodular set functions I. Math. Program. 1978, 14, 265–294.
  24. Grant, M.; Boyd, S.; Ye, Y. CVX: Matlab Software for Disciplined Convex Programming. Available online: http://cvxr.com/cvx (accessed on 5 May 2018).
  25. Khuller, S.; Moss, A.; Naor, J.S. The budgeted maximum coverage problem. Inf. Process. Lett. 1999, 70, 39–45.
  26. Sviridenko, M. A note on maximizing a submodular set function subject to a knapsack constraint. Oper. Res. Lett. 2004, 32, 41–43.
  27. Air Pollution. Available online: https://www.who.int/sustainable-development/cities/health-risks/air-pollution/en/ (accessed on 5 May 2018).
Figure 1. Illustration of the challenge of multi-type sensor placement.
Figure 2. An example of the multi-type sensor placement scheme.
Figure 3. Spatial variations of the air quality measurements at the 16 official monitoring stations in Hong Kong in 2017: the blue circles denote NO2 and the red circles denote PM2.5. The size of a circle represents the magnitude of the variance of the corresponding random variable.
Figure 4. Histogram of the normalized one-hour difference of the hourly measurements at Tung Chung station over the year 2017.
Figure 5. Comparison of information gain of different spatial fields with simple greedy selection.
Figure 6. Placement results of 10 sensors in Hong Kong for one-with-all case.
Figure 7. Placement results for the general case when the budget is 100 and the costs are $c_1 = c_2 = c_3 = c_4 = c_5 = 1$, $c_{site} = 15$. The left figure is the placement result with the cost-effective greedy selection. The right figure is the placement result with the greedy selection.
Figure 8. Performance of the hybrid greedy approach with equal weights $w_i$.
Figure 9. Performance of the hybrid greedy approach with a higher weight for PM2.5.
Figure 10. Greedy vs. lazy greedy for one-with-all case.
Figure 11. Greedy vs. lazy greedy for general case.
Table 1. Notations.
| Notation | Definition |
| --- | --- |
| $m(\cdot)$ | The mean function of the Gaussian process |
| $k(\cdot, \cdot)$ | The kernel function of the Gaussian process |
| $X_A$ | The random variables over the location index set $A$ |
| $T$ | The total number of types of interest |
| $[T]$ | The abbreviation for the set $\{1, 2, \ldots, T\}$ |
| $V$ | The set of all indexes, each corresponding to a location/grid |
| $\lvert V \rvert$ | The number of indexes in the set $V$ |
| $s$ | An index in the set $V$ |
| $A_i$ | The set of the indexes of the selected locations for the $i$th type |
| $\mathcal{A}$ | The placement scheme $\{A_1, A_2, \ldots, A_T\}$ |
| $f_i$ | The $i$th objective function |
| $w_i$ | The weight parameter of the $i$th objective function |
| $c_i$ | The unit cost for the $i$th type |
| $c_{site}$ | The site construction cost |
| $B$ | The total budget constraint |
| $K$ | The subset size constraint |
| $k_i$ | The total number of sensors for the $i$th type |
| $\lfloor x \rfloor$ | The floor function mapping $x$ to the greatest integer less than or equal to $x$ |
| $\delta_{i,s}$ | The information gain of adding location index $s$ of type $i$ |

© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Sensors, EISSN 1424-8220. Published by MDPI AG, Basel, Switzerland.