Article

Enhance the Quality of Crowdsensing for Fine-Grained Urban Environment Monitoring via Data Correlation

Beijing Key Laboratory of Intelligent Telecommunications Software and Multimedia, Beijing University of Posts and Telecommunications, Beijing 100876, China
* Author to whom correspondence should be addressed.
Sensors 2017, 17(1), 88; https://doi.org/10.3390/s17010088
Submission received: 1 November 2016 / Revised: 19 December 2016 / Accepted: 20 December 2016 / Published: 4 January 2017
(This article belongs to the Special Issue New Paradigms in Cyber-Physical Social Sensing)

Abstract

Monitoring the status of urban environments, which provides fundamental information about a city, yields crucial insights into various fields of urban research. Recently, with the popularity of smartphones and vehicles equipped with onboard sensors, a people-centric scheme for city-scale environment monitoring, namely “crowdsensing”, has been emerging. This paper proposes a data correlation based crowdsensing approach for fine-grained urban environment monitoring. To demonstrate urban status, we generate sensing images via a crowdsensing network, and then enhance the quality of the sensing images via data correlation. Specifically, to achieve a higher quality of sensing images, we not only utilize the temporal correlation of mobile sensing nodes but also fuse the sensory data with correlated environment data by introducing a collaborative tensor decomposition approach. Finally, we conduct a series of numerical simulations and a case study based on a real dataset. The results validate that our approach outperforms the traditional spatial interpolation-based method.

1. Introduction

Rapid urbanization makes urban environmental issues more serious than ever before in major cities, especially in developing countries. Urban environment monitoring, which provides crucial information for scientific city management, is of great importance for solving urban environmental problems. Urban Remote Sensing (URS) [1] and specialized Wireless Sensor Networks (WSNs) [2,3] are commonly used approaches for acquiring a wide range of environmental monitoring data. However, the principle of URS limits its precision for fine-grained urban sensing, while the sensing nodes of a WSN face limited energy [4] and instability [5] problems, which restrict the scalability and utility of WSNs for urban sensing. Therefore, it is necessary to develop innovative technologies that can precisely and ubiquitously sense urban dynamics.
In recent years, with the popularity of smartphones/vehicles equipped with onboard sensors and the development of wireless networks (such as 4G/Wi-Fi), a people-centric sensing mode has been emerging [6,7]. Researchers characterize this sensing mode as crowdsensing [8]. For urban monitoring with a crowdsensing network, mobile smartphones/vehicles equipped with sensing modules in a region send their measurements (sensory data) to a data center via a wireless network, and the data center aggregates these measurements to estimate the phenomenon values at all points in the region. Similar to a digital image, the distribution of an environmental phenomenon (such as the PM2.5 or CO2 concentration) over a region can be expressed as a two-dimensional (2D) signal. To illustrate fine-grained urban status, the crowdsensing network uses the sensory data to generate a sensing image according to spatial interpolation methods [9]. As shown in Figure 1a,b, the crowdsensing network works like an “urban camera”, which captures the urban status.
Generally, the geographical positions of crowdsensing participants are dynamic and unevenly distributed over an urban area, which makes the quality of the sensing image vary significantly with time and location. In our previous work [10], we first defined a metric, called urban resolution, for measuring the quality of a sensing image. Borrowing from the concept of resolution in conventional imaging systems, a higher/lower urban resolution means that the sensing image holds more/fewer details. In fact, this metric is a measurement for evaluating the sensing ability (sensitivity) of a crowdsensing network. Furthermore, we also revealed a linear relationship between the urban resolution r and the number of crowdsensing participants s from a statistical perspective.
Intuitively, a huge number of crowdsensing participants would guarantee a high urban resolution. In practice, regardless of the number of crowdsensing participants, their spatial distribution is uneven, i.e., hot zones and blank zones exist simultaneously in the urban area. This inhomogeneity is determined by the underlying human mobility features, which have been summarized from real mobility traces [11]. Based on our observation, the emergence of blank regions obviously reduces the sensing ability of the crowdsensing network. Thus, how to increase the sensing ability for blank zones becomes the main challenge.
Recently, many researchers have devoted themselves to breaking the primary bottleneck of crowdsensing, i.e., sensory data sparsity, which is mainly caused by the limited number or uneven distribution of crowdsensing participants [12,13]. A powerful and generic technique, Compressive Sensing (CS), is extensively used for inferring the missing sensory data. So far, the application of CS has led to significant advances in reconstructing network traffic [14], improving urban traffic sensing [15] and refining localization [16], among others. These CS-based methods share the same prerequisite: the sensory data must have an inherent structure and be sparsely representable. However, CS-based methods cannot achieve good performance for some common types of crowdsensing data (e.g., environment data), because real environment data rarely satisfy the aforementioned technical conditions [17,18,19]. Fortunately, several works [17,20,21] have revealed both the temporal-spatial correlation and the category correlation of environmental phenomena through extensive real dataset analysis. These works motivate us to enhance the estimation accuracy of missing values for environment crowdsensing via a data correlation based approach. Here, data correlation mainly refers to the following two aspects:
  • Temporal correlation of sensory data. The authors of [17] revealed the pervasive time-stability feature of environmental phenomena such as temperature, humidity and light. This feature indicates that most environmental phenomena do not change dramatically and remain stable for a while. On the other hand, the frequency at which participants send their measurements is much higher than the frequency at which the environment changes. Utilizing temporal correlation means leveraging measurement data over a correlated time period rather than at a single moment. Due to the dynamic feature of crowdsensing participants, a participant's discrete position at one moment becomes a sensing trajectory over the correlated time period (see Figure 1c), which decreases the area of the blank zones.
  • Category correlation of sensory data. Many existing studies [20,21,22] show a strong correlation among some categories of sensory data (see Figure 1d). Taking air quality data as an example, the concentrations of three atmospheric pollutants of major concern, PM2.5, PM10 and NO2, are clearly correlated. Therefore, if some correlated sensory data exist in the blank zones, the correlated information can be used to recover the target environmental phenomenon.
In this paper, we first express a sensing image in matrix form (a discrete signal matrix) and utilize a Delaunay triangulation based interpolation to complete the matrix as a baseline. Then, we build a three-dimensional tensor (whose dimensions denote region, signal category and time slot, respectively) to model the measurements collected by participants, and summarize a signal correlation matrix to measure the correlation between different categories of environmental phenomena. After that, we propose a collaborative tensor decomposition approach to supplement the missing entries on the basis of the temporal and category correlation of the sensory data. To verify the enhancement of crowdsensing ability, we utilize the Self-similar Least Action Walk (SLAW) model [11], a commonly used human/vehicle mobility model, to generate the trajectories of crowdsensing participants. Then, we reconstruct a CO2 concentration signal as the ground truth of the target environment phenomenon, and further generate correlated signals through linear/non-linear functions. By conducting a series of simulations, we verify the improvement in crowdsensing ability. In addition, we also utilize real urban air pollutant data to examine our enhanced crowdsensing approach.
The rest of the paper is organized as follows. Section 2 presents the method for generating a sensing image and defines the crowdsensing resolution. Section 3 proposes the enhanced crowdsensing approach. Section 4 uses numerical simulations to investigate the resolution improvement of the enhanced crowdsensing approach. Section 5 illustrates a case study using a real dataset of urban air pollutants. Section 6 concludes the paper.

2. Urban Monitoring via Crowdsensing and Sensing Restriction of Crowdsensing

In this section, we first briefly introduce the process of generating a sensing image via a crowdsensing network. Then, we give the definition of crowdsensing resolution, which is used to quantify the sensing ability of crowdsensing networks. Finally, we present the linear restriction on crowdsensing resolution, which limits the sensing ability of crowdsensing networks.

2.1. Generation of Sensing Image via Crowdsensing Network

For crowdsensing based environmental monitoring applications, the participants, i.e., mobile smartphones/vehicles equipped with sensing modules, are the sensing nodes. We refer to this kind of sensing node as a crowdsensing node.
Definition 1.
Crowdsensing Node. A smartphone/vehicle with sensing and communication abilities is called a crowdsensing node, denoted by $u_i$, if the smartphone/vehicle senses the natural phenomenon. The value of $u_i$ is represented as $u_i(x_i, y_i, v_i, t_i)$, where $(x_i, y_i)$, $v_i$ and $t_i$ denote the position, the measurement and the corresponding time of $u_i$, respectively.
The natural phenomenon in a city is dynamic, but for a small time interval, denoted by T, it can be expressed as a static two-dimensional (2D) signal. Mathematically, the signal is a function of two independent variables, i.e., the spatial position $(x, y)$, and the phenomenon value is $v = F(x, y)$. Figure 2a shows a case of a 2D continuous-space signal (the distribution of CO2 concentration over a region) in T. Assume that the urban region is a square R and that s crowdsensing nodes are dynamically distributed in R. Figure 2b illustrates the distribution of 100 crowdsensing nodes at time $t_i \in T$. Let $U = \{u_i, 1 \le i \le s\}$ be the set of crowdsensing nodes in R, and $V = \{v_i, 1 \le i \le s\}$ be the set of measurements. In other words, V is the set of sampling points of $F(x, y)$ (see Figure 2c). Generating the sensing image means estimating the phenomenon values of all points in R from V (see Figure 2d).
To implement this estimation, we transform $F(x, y)$ from a 2D continuous-space signal into a 2D discrete-space signal by dividing the unit square R into $m \times m$ grids (see Figure 3a). Each grid has a uniform signal value, so the 2D signal can be expressed by a matrix $\mathbf{Z}_{m \times m}$ for time period T. Because of the uneven distribution of crowdsensing nodes, which is determined by human mobility, there exist three situations for a given grid $g_{i,j}$:
  • There is only one crowdsensing node in $g_{i,j}$. In this case, the corresponding entry $z_{i,j}$ in matrix $\mathbf{Z}$ equals the sensory data provided by that crowdsensing node.
  • There is more than one crowdsensing node in $g_{i,j}$. The corresponding $z_{i,j}$ is calculated as a weighted sum of all the sensory data generated in $g_{i,j}$. Considering the distribution of crowdsensing nodes within the grid, we build a Voronoi diagram according to their locations (see Figure 3b) and then compute the weighted sum of the sensory data, where the areas of the resulting polygons serve as the weights (a numerical sketch of this weighting follows the list).
  • There is no crowdsensing node in $g_{i,j}$. The corresponding $z_{i,j}$ is set to null.
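The Voronoi-based weights in the second case can be approximated without constructing an explicit Voronoi diagram: sample the grid cell with a fine lattice of points, assign each lattice point to its nearest crowdsensing node, and use the fraction of lattice points owned by each node as an estimate of the relative area of its Voronoi polygon. The sketch below illustrates this idea; the cell size, lattice density and all variable names are our own illustrative assumptions, not taken from the paper.

```python
import numpy as np

def grid_cell_value(node_xy, node_values, cell_origin, cell_size, lattice=50):
    """Approximate the area-weighted sum of measurements inside one grid cell.

    node_xy     : (k, 2) positions of the crowdsensing nodes inside the cell
    node_values : (k,)   their measurements
    Each node's weight approximates the relative area of its Voronoi polygon
    clipped to the cell, estimated by nearest-node assignment on a fine lattice.
    """
    xs = np.linspace(cell_origin[0], cell_origin[0] + cell_size, lattice)
    ys = np.linspace(cell_origin[1], cell_origin[1] + cell_size, lattice)
    px, py = np.meshgrid(xs, ys)
    pts = np.column_stack([px.ravel(), py.ravel()])            # lattice sample points
    d2 = ((pts[:, None, :] - node_xy[None, :, :]) ** 2).sum(axis=2)
    owner = d2.argmin(axis=1)                                   # nearest node per sample
    weights = np.bincount(owner, minlength=len(node_values)) / len(pts)
    return float(np.dot(weights, node_values))                  # ~ sum_k v_k * a_k / a_g

# Example: three nodes in a 100 m x 100 m cell measuring, e.g., CO2 in ppm
nodes = np.array([[10.0, 20.0], [60.0, 70.0], [80.0, 30.0]])
vals = np.array([410.0, 395.0, 402.0])
print(grid_cell_value(nodes, vals, cell_origin=(0.0, 0.0), cell_size=100.0))
```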
A commonly used way of filling the null entries in $\mathbf{Z}$ is spatial interpolation; more details about spatial interpolation can be found in [9]. Based on the interpolation method, we give the definition of the sensing image as follows:
Definition 2.
Sensing Image. For a given time interval T, the sensing image generated by the set of crowdsensing nodes U is defined as the 2D matrix
$$
\mathbf{Z}' = \begin{bmatrix} z'_{1,1}(T) & z'_{1,2}(T) & \cdots & z'_{1,m}(T) \\ z'_{2,1}(T) & z'_{2,2}(T) & \cdots & z'_{2,m}(T) \\ \vdots & \vdots & \ddots & \vdots \\ z'_{m,1}(T) & z'_{m,2}(T) & \cdots & z'_{m,m}(T) \end{bmatrix}_{m \times m},
$$
where
$$
z'_{i,j}(T) = \begin{cases} \sum_{u_k \in U_{i,j}} v_k \dfrac{a_k}{a_g}, & |U_{i,j}| > 0; \\[4pt] In(V', g_{i,j}), & |U_{i,j}| = 0. \end{cases}
$$
In Equation (1), $U_{i,j}$ denotes the set of crowdsensing nodes in grid $g_{i,j}$ within T. Moreover, $a_g$ and $a_k$ are the areas of the whole grid $g_{i,j}$ and of the Voronoi polygon centered on $u_k$, respectively. $In(\cdot, \cdot)$ denotes the interpolation method and $V' \triangleq \{z'_{i,j}(T) : |U_{i,j}| > 0\}$.
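To make this pipeline concrete, the sketch below maps scattered measurements onto an m × m matrix (simply averaging when several nodes fall into the same grid, as a simplification of the Voronoi weighting above) and fills the empty grids with Delaunay-based linear interpolation via SciPy's griddata, falling back to nearest-neighbour values for grids outside the convex hull of the samples. The function name, grid size and test data are illustrative assumptions.

```python
import numpy as np
from scipy.interpolate import griddata

def sensing_image(node_xy, node_values, m=32):
    """Generate an m x m sensing image Z' from scattered measurements in the unit square."""
    Z = np.full((m, m), np.nan)
    cnt = np.zeros((m, m))
    # map each node to its grid and average the measurements per grid (simplified weighting)
    idx = np.clip((np.asarray(node_xy) * m).astype(int), 0, m - 1)
    for (i, j), v in zip(idx, node_values):
        Z[i, j] = v if cnt[i, j] == 0 else (Z[i, j] * cnt[i, j] + v) / (cnt[i, j] + 1)
        cnt[i, j] += 1
    # fill the null grids from the observed ones (Delaunay-based linear interpolation)
    obs = ~np.isnan(Z)
    gi, gj = np.meshgrid(np.arange(m), np.arange(m), indexing="ij")
    pts, vals = np.column_stack([gi[obs], gj[obs]]), Z[obs]
    targets = np.column_stack([gi[~obs], gj[~obs]])
    linear = griddata(pts, vals, targets, method="linear")
    nearest = griddata(pts, vals, targets, method="nearest")    # fallback outside the hull
    Z[~obs] = np.where(np.isnan(linear), nearest, linear)
    return Z

# Example: 200 nodes uniformly scattered in the unit square sensing a smooth field
rng = np.random.default_rng(0)
xy = rng.random((200, 2))
v = np.sin(3 * xy[:, 0]) + np.cos(2 * xy[:, 1])
img = sensing_image(xy, v, m=32)
```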

2.2. Resolution of Crowdsensing

As mentioned in Section 1, measuring the ability of crowdsensing amounts to estimating the quality of the sensing image, i.e., its QoI. First, we define the QoI as follows:
Definition 3.
Quality of Sensing Image. The quality of a sensing image for a specific time interval T is defined as the similarity between the sensing image $\mathbf{Z}'$ and the static 2D discrete-space raw signal $\mathbf{Z}$ of the target environment phenomenon. The correlation coefficient is used to measure this similarity, i.e.,
$$
C(\mathbf{Z}', \mathbf{Z}) = \frac{\sum_{i=1}^{m}\sum_{j=1}^{m} \left(z'_{i,j}(T) - \bar{z}'(T)\right)\left(z_{i,j}(T) - \bar{z}(T)\right)}{\sqrt{\sum_{i=1}^{m}\sum_{j=1}^{m} \left(z'_{i,j}(T) - \bar{z}'(T)\right)^2} \sqrt{\sum_{i=1}^{m}\sum_{j=1}^{m} \left(z_{i,j}(T) - \bar{z}(T)\right)^2}},
$$
where $\bar{z}'(T)$ and $\bar{z}(T)$ are the mean values of $\mathbf{Z}'$ and $\mathbf{Z}$, respectively.
A large correlation value means a strong similarity between $\mathbf{Z}'$ and $\mathbf{Z}$, i.e., the QoI of $\mathbf{Z}'$ is high. However, in most cases the raw signal $\mathbf{Z}$ is unknown in advance, so users cannot calculate the QoI directly from this equation. We therefore turn to a new metric, the sensing image resolution (i.e., the urban resolution proposed in [10]), to measure the quality of the sensing image $\mathbf{Z}'$.
In conventional imaging systems, resolution, defined as the number of gridded pixels, is a metric for evaluating the quality of an image. Similarly, resolution can also be used to measure the quality of a sensing image. The challenge, however, is that the resolution of a sensing image is not simply the count of crowdsensing nodes, because the crowdsensing nodes are dispersedly and dynamically distributed rather than strictly gridded.
Indirectly, if we can find out how many gridded sensing nodes are needed to generate a sensing image, denoted by $\mathbf{Z}''$, whose QoI is similar to that of the image $\mathbf{Z}'$ generated by s crowdsensing nodes, then the number of gridded sensing nodes, $n \times n$, can be regarded as the resolution of the sensing image $\mathbf{Z}'$. At the same time, this $n \times n$ is also an estimate of the sensing ability of the crowdsensing network. Therefore, we redefine this metric as the crowdsensing resolution.
Definition 4.
Resolution of Crowdsensing. The resolution, denoted by r, of a crowdsensing network with s sensing nodes is defined as $n_l \times n_l$, where
$$
n_l = \arg\max_n \left\{ C(\mathbf{Z}''_n, \mathbf{Z}) \le C(\mathbf{Z}', \mathbf{Z}) \right\}.
$$
In Equation (2), $\mathbf{Z}'$ denotes the sensing image generated by the crowdsensing network, and $\mathbf{Z}''_n$ denotes a sensing image generated by $n \times n$ gridded sensing nodes.
Obviously, when $n_l$ becomes large enough, $\mathbf{Z}'$ approximately equals $\mathbf{Z}$, i.e., $C(\mathbf{Z}', \mathbf{Z}) \approx 1$; that is, the crowdsensing network has a high sensing ability.
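In a simulation where the raw signal $\mathbf{Z}$ is known, $n_l$ can be estimated by scanning n: generate gridded sensing images $\mathbf{Z}''_n$ for increasing n and keep the largest n whose image is still no more accurate than the crowdsensing image (the paper reports fractional $n_l$ values, obtained by locating the crossing point on the curve of Figure 10; the integer scan below is a simplification). The sketch reuses the hypothetical sensing_image helper from Section 2.1 and assumes the raw signal is given as an m × m array.

```python
import numpy as np

def correlation(A, B):
    """Pearson correlation between two images of the same shape (the QoI measure)."""
    a, b = A.ravel() - A.mean(), B.ravel() - B.mean()
    return float(np.dot(a, b) / np.sqrt(np.dot(a, a) * np.dot(b, b)))

def crowdsensing_resolution(Z_raw, Z_cs, m=32, max_n=40):
    """Largest n such that the n x n gridded image is no better than the crowdsensing image Z_cs."""
    c_cs = correlation(Z_cs, Z_raw)
    n_l = 1
    for n in range(2, max_n + 1):
        # n x n gridded sampling points over the unit square
        g = (np.arange(n) + 0.5) / n
        gx, gy = np.meshgrid(g, g, indexing="ij")
        xy = np.column_stack([gx.ravel(), gy.ravel()])
        pix = np.clip((xy * m).astype(int), 0, m - 1)
        vals = Z_raw[pix[:, 0], pix[:, 1]]
        Z_grid = sensing_image(xy, vals, m=m)   # hypothetical helper from Section 2.1
        if correlation(Z_grid, Z_raw) <= c_cs:
            n_l = n
        else:
            break
    return n_l
```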

2.3. Linear Restriction

In [10], we utilized three 2D signals with different degrees of variation to analyze the relationship between the crowdsensing resolution r and the number of sensing nodes s via Monte Carlo simulations. An approximately linear growth relationship between r and s was revealed.
For s crowdsensing nodes in a unit area, the relationship between $n_l$ and s follows
$$
n_l = \alpha \sqrt{s},
$$
where the slope α has a reference value in $[0.5, 0.6]$.
Therefore, given the number of crowdsensing participants s, we can easily infer the resolution r of the crowdsensing network via this relationship. On the other hand, the relationship acts as a restriction: it limits the sensing ability that a crowdsensing network can achieve for a given scale of crowdsensing participants.
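For illustration (with numbers chosen by us rather than taken from the paper): with $s = 100$ crowdsensing nodes in a unit grid and a slope of $\alpha = 0.55$, the achievable resolution is roughly $n_l = 0.55\sqrt{100} = 5.5$, i.e., $r = n_l^2 \approx 30$ gridded pixels, no matter how accurate each individual measurement is.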

3. Enhanced Crowdsensing Approach

To enhance the crowdsensing ability and break the linear restriction, we utilize both the signal (category) correlation and the temporal correlation of sensory data to generate sensing images more precisely. First, we introduce how the sensing data acquired by crowdsensing participants are modeled. Then, we propose a collaborative tensor decomposition approach to infer the missing entries of the target signal through the signals' correlation. Finally, we illustrate the correlated time slot combination approach, which further infers the remaining missing entries via the temporal correlation of the environmental signal.

3.1. Data Modeling

As shown in the left part of Figure 4, we model the sensory data and the correlated sensory data using a tensor $\mathcal{S} \in \mathbb{R}^{N \times Q \times K}$ with three dimensions:
  • Region dimension, $r = \{r_1, r_2, \ldots, r_N\}$, denotes the N regions transferred from the $m \times m$ grids, one region per grid ($g_{i,j} = r_{im+j}$).
  • Category dimension, $c = \{c_1, c_2, \ldots, c_Q\}$, denotes the Q signal categories, where $c_1$ is the target signal and the others are correlated signals.
  • Time dimension, $t = \{t_1, t_2, \ldots, t_K\}$, denotes K time slots. Here, we divide the monitoring time period into K time slots, and the span of each time slot is decided by the sending interval of the crowdsensing nodes.
An entry $\mathcal{S}_{i,j,k}$ stores the sensory data acquired in region $r_i$ at time slot $t_k$ for signal category $c_j$. Likewise, when mapping data into the tensor $\mathcal{S}$, the three situations mentioned in Section 2.1 also arise for a specific region, and we adopt the same strategy to calculate the sensory data.
As the tensor $\mathcal{S}$ is sparse, we try to “borrow” more information from the correlation between different categories of signals to infer the missing entries. Although the tensor $\mathcal{S}$ can capture the correlation between different categories of signals to some extent, a specialized matrix can further intensify this correlation. As shown in the right part of Figure 4, we formulate a matrix $\mathbf{C} \in \mathbb{R}^{Q \times Q}$ to model the signals' correlation, where an entry $C_{i,j}$ denotes the correlation between signals $c_i$ and $c_j$. The correlation is quantified by
$$
C_{i,j} = \frac{\sum_{t}\sum_{r} \left(\mathcal{S}_{r,i,t} - \bar{\mathcal{S}}_{i}\right)\left(\mathcal{S}_{r,j,t} - \bar{\mathcal{S}}_{j}\right)}{\sqrt{\sum_{t}\sum_{r} \left(\mathcal{S}_{r,i,t} - \bar{\mathcal{S}}_{i}\right)^2} \sqrt{\sum_{t}\sum_{r} \left(\mathcal{S}_{r,j,t} - \bar{\mathcal{S}}_{j}\right)^2}},
$$
where $\sum_t$ runs over $t = 1, \ldots, K$, $\sum_r$ runs over the regions $r \in \{1, \ldots, N\}$ for which both $\mathcal{S}_{r,i,t} \neq \mathrm{null}$ and $\mathcal{S}_{r,j,t} \neq \mathrm{null}$, and $\bar{\mathcal{S}}_{i}$, $\bar{\mathcal{S}}_{j}$ are the mean values of signals $c_i$ and $c_j$ over these entries.
After data modeling, we obtain a sparse tensor $\mathcal{S} \in \mathbb{R}^{N \times Q \times K}$ and a signal correlation matrix $\mathbf{C} \in \mathbb{R}^{Q \times Q}$.
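A minimal sketch of this data modeling step is given below: missing entries are stored as NaN, and each pair of signal categories is correlated only over the (region, time) cells in which both categories were observed, mirroring the definition of $C_{i,j}$. The tensor sizes, observation rate and helper name are illustrative assumptions.

```python
import numpy as np

def signal_correlation_matrix(S):
    """S: tensor of shape (N, Q, K) with NaN marking missing entries; returns the Q x Q matrix C."""
    N, Q, K = S.shape
    C = np.eye(Q)
    for i in range(Q):
        for j in range(i + 1, Q):
            a, b = S[:, i, :].ravel(), S[:, j, :].ravel()
            mask = ~np.isnan(a) & ~np.isnan(b)          # cells observed for both categories
            if mask.sum() < 2:
                continue
            x, y = a[mask] - a[mask].mean(), b[mask] - b[mask].mean()
            C[i, j] = C[j, i] = np.dot(x, y) / np.sqrt(np.dot(x, x) * np.dot(y, y))
    return C

# Example: N = 1024 regions, Q = 3 categories, K = 6 time slots, ~10% of the cells observed
rng = np.random.default_rng(1)
base = rng.random((1024, 1, 6))
full = np.concatenate([base, 0.9 * base + 0.1, base ** 2], axis=1)   # three correlated signals
S = np.where(rng.random(full.shape) < 0.1, full, np.nan)
print(signal_correlation_matrix(S))
```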

3.2. Collaborative Tensor Decomposition

To achieve a higher accuracy in filling the missing entries of the tensor $\mathcal{S}$, we exploit a collaborative tensor decomposition method. As shown in the middle part of Figure 4, the tensor $\mathcal{S}$ is factorized with the Tucker decomposition model [23], i.e., as the product of a core tensor $\mathcal{A} \in \mathbb{R}^{d_R \times d_H \times d_T}$ with three matrices $\mathbf{R} \in \mathbb{R}^{N \times d_R}$, $\mathbf{H} \in \mathbb{R}^{Q \times d_H}$ and $\mathbf{T} \in \mathbb{R}^{K \times d_T}$. Here, $d_R$, $d_H$ and $d_T$ are very small, denoting the numbers of latent factors. Moreover, the signal correlation matrix $\mathbf{C}$ is decomposed as the self product $\mathbf{H}\mathbf{H}^{\mathsf{T}}$ of the matrix $\mathbf{H} \in \mathbb{R}^{Q \times d_H}$, where $d_H < Q$, by low-rank approximation. By requiring the two decompositions to share the same low-rank matrix $\mathbf{H}$, we can propagate information between the tensor $\mathcal{S}$ and the matrix $\mathbf{C}$. We then define a loss function to quantify the error of the collaborative tensor decomposition as
$$
\Gamma(\mathcal{A}, \mathbf{R}, \mathbf{H}, \mathbf{T}) = \frac{1}{2}\left\| \mathcal{I} \circ \left( \mathcal{S} - \mathcal{A} \times_R \mathbf{R} \times_H \mathbf{H} \times_T \mathbf{T} \right) \right\|^2 + \frac{\lambda_1}{2}\left\| \mathbf{C} - \mathbf{H}\mathbf{H}^{\mathsf{T}} \right\|^2 + \frac{\lambda_2}{2}\left( \|\mathcal{A}\|^2 + \|\mathbf{R}\|^2 + \|\mathbf{H}\|^2 + \|\mathbf{T}\|^2 \right),
$$
where $\|\cdot\|$ denotes the Frobenius norm, $\mathcal{I}$ is an indicator tensor, and the operator “∘” denotes the entry-wise product. The entry $\mathcal{I}_{i,j,k} = 0$ if $\mathcal{S}_{i,j,k}$ is missing, and $\mathcal{I}_{i,j,k} = 1$ otherwise. In Equation (4), $\| \mathcal{I} \circ ( \mathcal{S} - \mathcal{A} \times_R \mathbf{R} \times_H \mathbf{H} \times_T \mathbf{T} ) \|^2$ and $\| \mathbf{C} - \mathbf{H}\mathbf{H}^{\mathsf{T}} \|^2$ measure the errors of decomposing $\mathcal{S}$ and $\mathbf{C}$, respectively, and the last part, $\|\mathcal{A}\|^2 + \|\mathbf{R}\|^2 + \|\mathbf{H}\|^2 + \|\mathbf{T}\|^2$, is a regularization penalty to avoid over-fitting. The parameters $\lambda_1$ and $\lambda_2$ control the contribution of each part to the collaborative decomposition.
Minimizing the loss function above yields an optimal result for inferring the missing entries of the tensor $\mathcal{S}$ with data correlation. However, this loss function is not jointly convex in $\mathcal{A}$, $\mathbf{R}$, $\mathbf{H}$ and $\mathbf{T}$, so in general we cannot obtain closed-form solutions for the minimization. Therefore, we utilize a gradient descent technique [21] to obtain a local optimum by minimizing the loss function iteratively. Finally, we recover $\mathcal{S}$ by $\mathcal{S}_{rec} = \mathcal{A} \times_R \mathbf{R} \times_H \mathbf{H} \times_T \mathbf{T}$. Here, the multiplication symbol with a subscript denotes the tensor mode product, e.g., $\mathcal{X} = \mathcal{A} \times_R \mathbf{R}$ means $\mathcal{X}_{n,j,k} = \sum_{i=1}^{d_R} \mathcal{A}_{i,j,k} R_{n,i}$.
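The sketch below implements one plausible form of this optimization with plain gradient descent, using numpy.einsum for the Tucker reconstruction and the analytic gradients of the loss in Equation (4). The ranks, learning rate and iteration count are our own illustrative choices, not the values used in the paper.

```python
import numpy as np

def collaborative_tucker(S, C, ranks=(8, 2, 3), lam1=0.1, lam2=0.01,
                         lr=0.01, iters=2000, seed=0):
    """Recover a sparse tensor S (NaN = missing) guided by the signal correlation matrix C.

    S : (N, Q, K) observations; C : (Q, Q) signal correlation matrix.
    Returns the reconstructed dense tensor S_rec.
    """
    N, Q, K = S.shape
    dR, dH, dT = ranks
    rng = np.random.default_rng(seed)
    A = 0.1 * rng.standard_normal((dR, dH, dT))      # core tensor
    R = 0.1 * rng.standard_normal((N, dR))           # region factors
    H = 0.1 * rng.standard_normal((Q, dH))           # category factors (shared with C)
    T = 0.1 * rng.standard_normal((K, dT))           # time factors
    I = (~np.isnan(S)).astype(float)                 # indicator tensor of observed entries
    S0 = np.nan_to_num(S)                            # missing entries as 0 (masked by I)

    for _ in range(iters):
        S_hat = np.einsum('abc,ia,jb,kc->ijk', A, R, H, T)
        E = I * (S_hat - S0)                         # residual on observed entries only
        D = H @ H.T - C                              # residual of the correlation factorization
        gA = np.einsum('ijk,ia,jb,kc->abc', E, R, H, T) + lam2 * A
        gR = np.einsum('ijk,abc,jb,kc->ia', E, A, H, T) + lam2 * R
        gH = np.einsum('ijk,abc,ia,kc->jb', E, A, R, T) + 2 * lam1 * D @ H + lam2 * H
        gT = np.einsum('ijk,abc,ia,jb->kc', E, A, R, H) + lam2 * T
        A, R, H, T = A - lr * gA, R - lr * gR, H - lr * gH, T - lr * gT

    return np.einsum('abc,ia,jb,kc->ijk', A, R, H, T)
```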
The mobility features of human crowds result in an uneven distribution of crowdsensing nodes, which forms several hot/blank regions in the urban area. In hot regions, many crowdsensing nodes provide measurements; as a result, different categories of sensory data co-exist there, which helps in mining the signal correlation. On the other hand, there also exist blank regions without any sensory data for either the target signal or the correlated signals, which means that the target signal in such regions cannot be precisely recovered by collaborative tensor decomposition alone.

3.3. Correlated Time Slots Combination

As mentioned in Section 1, the frequency at which crowdsensing nodes send their measurements is much higher than the frequency at which the environment changes, and the environment phenomenon keeps steady over a relatively long period. To further supplement the missing entries, we combine several continuous time slots to generate one sensing image. For a specific crowdsensing node, correlated time slot combination is an extension along the time dimension. Due to the dynamic feature of crowdsensing nodes, over such a period the scattered crowdsensing nodes are likely to form several sensing traces and therefore cover more regions.
Given a time period $[t_s, t_e]$, we extract the corresponding slices of the tensor $\mathcal{S}$ and make an entry-wise combination (see Figure 5). Here, the slice extracted for time slot $t_i$ is a region-signal matrix. We then formulate a matrix $\mathbf{M}$ (see the right part of Figure 5), whose entry $m_{i,j}$ is computed by
$$
m_{i,j} = \begin{cases} \dfrac{1}{N_{i,j}} \sum_{t_s \le t \le t_e} \mathcal{S}_{i,j,t}, & N_{i,j} \neq 0; \\[4pt] \mathrm{null}, & N_{i,j} = 0, \end{cases}
$$
where $N_{i,j}$ denotes the number of entries with $\mathcal{S}_{i,j,t} \neq \mathrm{null}$ for $t \in [t_s, t_e]$, and the sum runs over these non-null entries. The first column of $\mathbf{M}$, denoted by $V_1 = (m_{1,1}, m_{2,1}, \ldots, m_{N,1})^{\mathsf{T}}$, contains the sensory data of the target signal in the N regions. By converting $V_1$ back into an $m \times m$ matrix and using the spatial interpolation method to recover the remaining missing entries, we obtain the optimized sensing image.
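Assuming the recovered tensor stores still-missing entries as NaN, the combination step reduces to a NaN-aware mean along the time axis, after which the first column (the target signal) is reshaped to an m × m matrix and the remaining holes are filled by spatial interpolation (for instance with the hypothetical sensing-image interpolation sketched in Section 2.1). A minimal sketch:

```python
import numpy as np

def combine_time_slots(S_rec, t_s, t_e, m):
    """Entry-wise combination of the tensor slices for time slots t_s..t_e (inclusive).

    S_rec : (N, Q, K) tensor with NaN for missing entries, N = m * m regions.
    Returns the target-signal matrix of shape (m, m); grids with no data in the whole
    period remain NaN and are filled by spatial interpolation afterwards.
    """
    M = np.nanmean(S_rec[:, :, t_s:t_e + 1], axis=2)   # (N, Q); all-NaN cells stay NaN
    V1 = M[:, 0]                                        # first column: the target signal
    return V1.reshape(m, m)
```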

4. Numerical Simulation

In this section, we conduct numerical simulations to examine the improvement achieved by our crowdsensing enhancement approach. First, we describe the models used to generate the mobile traces and the environment signals. Then, we demonstrate an instance of the sensing images generated by the enhanced crowdsensing approach. Finally, we verify the improvement of our approach statistically.

4.1. Basic Models

(1) Mobility model of crowdsensing nodes: Existing works on human/vehicle mobility have summarized the following statistical features from real mobility traces: (F1) heterogeneously bounded mobility areas; (F2) truncated power-law flights and pause-times; (F3) truncated power-law inter-contact times; and (F4) fractal waypoints. In this paper, we utilize the Self-similar Least Action Walk (SLAW) model [11], which produces synthetic walk trajectories exhibiting all of the above features, to simulate the mobility traces of crowdsensing nodes. The parameter settings are summarized in Table 1. More details about the SLAW model and the interpretation of these parameters can be found in [11].
After executing the SLAW model, we obtain the positions of 6000 crowdsensing nodes for every minute over 10 h. Figure 6 illustrates the distribution of the crowdsensing nodes over the whole simulation region at a given instant. From Figure 6, we observe that hot zones and blank zones exist simultaneously. This implies that the sensing ability of the crowdsensing network may differ considerably between urban regions. On the other hand, it is extremely complex to generate a sensing image of the whole city from all crowdsensing nodes in the city. Therefore, we divide the simulation region into 25 unit grids of 1 × 1 km² and study the crowdsensing resolution of each grid.
(2) Target and correlated signal models: We utilize the 2D signal shown in Figure 2a as the ground truth of the target signal. This is a CO2 concentration signal generated from the sensory data of 100 CO2 sensor nodes in a square region (about 1 km²) of Wuxi City, China [24]. To generate the correlated signals, we utilize the following linear/non-linear transform functions:
$$
f_1(z_{i,j}) = a\, z_{i,j} + b + e, \qquad f_2(z_{i,j}) = a\, z_{i,j}^2 + b\, z_{i,j} + c + e,
$$
where $e \sim \mathcal{N}(0, \delta)$ is additive noise. By adjusting the parameters a, b, c and δ, we generate linear/non-linear signals with a correlation of 0.9 to the target signal (as shown in Figure 7).
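A small sketch of this signal generation, with the parameter values reported in the caption of Figure 7 (the target signal and the interpretation of the noise parameter as a standard deviation are our own placeholders):

```python
import numpy as np

rng = np.random.default_rng(2)
Z = rng.random((32, 32))                      # placeholder target signal in [0, 1]

# linear correlated signal:     f1(z) = 0.95 z + 0.05 + N(0, 0.06)
Z_lin = 0.95 * Z + 0.05 + rng.normal(0.0, 0.06, Z.shape)
# non-linear correlated signal: f2(z) = 0.9 z^2 + 0.9 z + 0.1 + N(0, 0.11)
Z_non = 0.9 * Z**2 + 0.9 * Z + 0.1 + rng.normal(0.0, 0.11, Z.shape)

corr = lambda A, B: np.corrcoef(A.ravel(), B.ravel())[0, 1]
print(corr(Z, Z_lin), corr(Z, Z_non))         # check the resulting correlations
```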

4.2. Instance Illustration

We randomly divide the 6000 crowdsensing nodes into 3 groups of 2000 nodes each (see Figure 8a). Group I senses the target signal, while Groups II and III sense the two correlated signals, respectively. We assume that the correlated time period of the environment phenomenon is one hour. To estimate how the frequency at which crowdsensing nodes send their measurements affects the crowdsensing improvement, we conduct two experiments with sending intervals of 10 and 20 min. Here, we extract the positions of the sensing nodes from the SLAW traces every 10/20 min to simulate the different sending intervals. Therefore, for a duration of one hour, we have 6 and 3 correlated time slots, respectively; we name these two experiments S6 and S3.
As an instance, we choose a unit grid (the red box in Figure 8a) and illustrate in Figure 9 the recovered target signal after applying the enhanced crowdsensing approach. For comparison, we also select one time slot within this one-hour period and generate a sensing image with the interpolation method (as shown in Figure 9a).
The qualities of Figure 9a–c are 0.653, 0.884 and 0.929, respectively. In this unit grid, the average number of crowdsensing nodes (Group I) is 247. We then use $n \times n$ gridded sampling points to generate sensing images $\mathbf{Z}''_n$ and compute $C(\mathbf{Z}''_n, \mathbf{Z})$ against different values of n (see Figure 10). Setting $C(\mathbf{Z}''_n, \mathbf{Z}) = C(\mathbf{Z}', \mathbf{Z})$, we obtain the corresponding $n_l$ values of 7.72, 12.57 and 13.48, respectively.

4.3. Statistical Results

In this section, we analyze the crowdsensing improvement from a statistical perspective. We count the number of Group I nodes in each grid at every time slot and calculate the average number of nodes for every hour. For each grid, we recover the target signal for every hour with the crowdsensing enhancement approach under the different sending intervals (10/20 min). We then calculate the corresponding $n_l$ for each sensing image according to Equation (2), which yields 25 grids × 10 h = 250 $n_l$ values for each setting. We calculate the mean of the $n_l$ values that correspond to the same $s$. Figure 11 plots the resulting $n_l$ against $\sqrt{s}$; for comparison, the figure also includes the $n_l$ results of the interpolation-based method.
According to the linear regression results of the simulations, we obtain the linear function $n_l = 0.583\sqrt{s}$ for the interpolation method. The slope value α = 0.583 is consistent with the result reported in [10]. We also obtain the linear functions $n_l = 0.774\sqrt{s}$ and $n_l = 0.831\sqrt{s}$ for enhanced crowdsensing with S3 and S6, respectively. The higher the value of α, the higher the resolution achieved for a given s. This statistical result validates that our enhanced crowdsensing approach outperforms the interpolation-based method.
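The slope can be estimated with a least-squares fit through the origin on the $(\sqrt{s}, n_l)$ pairs; a sketch with made-up numbers (not the simulation results of the paper) follows.

```python
import numpy as np

def fit_alpha(s_values, n_l_values):
    """Least-squares slope of n_l = alpha * sqrt(s) (regression through the origin)."""
    x = np.sqrt(np.asarray(s_values, dtype=float))
    y = np.asarray(n_l_values, dtype=float)
    return float(np.dot(x, y) / np.dot(x, x))

s = [50, 100, 150, 200, 250]          # illustrative node counts
n_l = [4.1, 5.9, 7.0, 8.3, 9.1]       # illustrative resolutions
print(fit_alpha(s, n_l))              # estimated slope alpha
```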

5. Case Study: Air Quality Monitoring in Beijing

The U-Air project [20], led by Microsoft Research, aims to infer fine-grained air quality index (AQI) values in Beijing from ubiquitous data. The U-Air researchers have opened two of the data sources they used: the locations of the air quality monitoring stations in Beijing, and the AQI (including the composition of air pollutants) reported by each station. In this section, we utilize a portion of these data to verify the effectiveness of our crowdsensing enhancement approach.

5.1. Data Correlation Analysis

In [20], the researchers reveal, through analysis of the information reported by the monitoring stations, that the concentration of air pollutants is notably influenced by meteorology (e.g., temperature, humidity). Figure 12a shows the locations of the 22 monitoring stations in Beijing; these stations report their AQI every hour. We extract the PM2.5, PM10 and air humidity readings of monitoring station S9 for the period from 21 to 27 April 2013, and quantify their correlation by correlation coefficients, denoted by C(·,·).
Figure 13 and Figure 14 show the variation tendencies of PM2.5 versus PM10 and versus air humidity, respectively. Both figures reflect a clear correlation among PM2.5, PM10 and humidity. Specifically, C(PM2.5, PM10) and C(PM2.5, Humidity) are 0.91 and 0.75, respectively. Moreover, we also analyse the correlation among PM2.5, PM10 and air humidity for the remaining 21 monitoring stations. The statistical results are given in Table 2.
From Table 2, we observe that, among all 22 monitoring stations, 20 have C(PM2.5, PM10) above 0.8 and 20 have C(PM2.5, Humidity) above 0.7. This statistical result reveals a strong correlation among PM2.5, PM10 and air humidity.

5.2. Data Modelling

Due to the sparsity of monitoring stations (Beijing has only 22 stations covering a 50 × 50 km² area, i.e., about 113 km² per station [20]), it is not possible to acquire a fine-grained air pollutant distribution image for the whole urban area simply by spatial interpolation. We choose the area within the Second Ring Road of Beijing (the red box in Figure 12b), which has the highest concentration of monitoring stations, as the target monitoring area. We map 3 categories of data (PM2.5, PM10 and air humidity) onto the selected area according to the locations of their corresponding monitoring stations, and recover the signals by spatial interpolation for the time period around 12:00 on 24 April 2013 (see Figure 15).
From Figure 15, we observe a similar distribution status among PM2.5, PM10 and air humidity. We take the PM2.5 signal as the target signal, while the others serve as correlated signals.

5.3. Comparison of Generated Sensing Images

Microsoft Research has shared a collection of taxicab GPS traces gathered in Beijing [25], and these traces exhibit the statistical features of human mobility. In our crowdsensing network, we regard these ordinary taxicabs as crowdsensing nodes and randomly select 100 different taxi traces within the target zone. Then, we extract an instant of the taxis' GPS locations and sense the PM2.5 signal in Figure 15a. The sensing image generated by spatial interpolation is shown in Figure 16a. As in Section 4.2, we extract the GPS locations of the selected traces every 20 min to simulate the sensing cycle, so we have 3 time slots of sensing data within the one-hour monitoring period. Moreover, we also select 100 traces to sense the PM10 signal and 100 traces to sense the air humidity signal. By applying our crowdsensing enhancement approach, we acquire the optimized sensing image (see Figure 16b).
Compared with Figure 16a, the optimized sensing image in Figure 16b holds more details: the correlation coefficient between Figure 15a and Figure 16a is 0.58, while that between Figure 15a and Figure 16b is 0.91. This result validates the utility of our method.

6. Conclusions

This paper studied how to promote the sensing ability of a crowdsensing network for fine-grained environmental monitoring. In practice, the sensing ability of crowdsensing is limited by the number and spatial distribution of crowdsensing participants. To enhance this ability, we proposed a multi-source data driven approach and utilized a novel metric called “crowdsensing resolution” to quantify the enhancement. The kernel of our enhanced crowdsensing approach is to leverage temporally and categorically correlated data to help recover the sensing image in blank zones. By improving the sensing ability for blank zones, the crowdsensing network can generate sensing images more precisely, i.e., achieve a higher crowdsensing resolution. More specifically, we built a tensor to model the sensory data and summarized a signal correlation matrix to quantify the correlation between different categories of sensory data. By conducting collaborative tensor decomposition and correlated time slot combination, we obtained the optimized sensing images. The numerical simulations and a case study based on a real dataset verified the resolution improvement of our enhanced crowdsensing approach over the traditional interpolation-based approach.

Acknowledgments

This work is supported in part by the National Natural Science Foundation of China under Grant 61332005, Grant 61272517 and Grant 61632008, the Funds for Creative Research Groups of China under Grant No. 61421061, the Cosponsored Project of Beijing Committee of Education and Beijing Training Project for The Leading Talents in S&T (ljrc201502).

Author Contributions

The work presented here was carried out in collaboration between all authors. Xu Kang, Liang Liu, and Huadong Ma contributed the conception and developed the idea. Xu Kang carried out the experiments and data analysis. Xu Kang and Liang Liu wrote the paper. Huadong Ma provided useful suggestions and helped revise the manuscript.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Yang, X. Urban Remote Sensing: Monitoring, Synthesis and Modeling in the Urban Environment; John Wiley & Sons: Hoboken, NJ, USA, 2011.
  2. Dong, M.; Ota, K.; Liu, A. RMER: Reliable and Energy Efficient Data Collection for Large-scale Wireless Sensor Networks. IEEE Internet Things J. 2016, 3, 511–519.
  3. Hu, Y.; Dong, M.; Ota, K.; Liu, A. Mobile Target Detection in Wireless Sensor Networks With Adjustable Sensing Frequency. IEEE Syst. J. 2016, 10, 1160–1171.
  4. Liu, X.; Wei, T.; Liu, A. Fast Program Codes Dissemination for Smart Wireless Software Defined Networks. Sci. Program. 2016, 2016, 1–21.
  5. Liu, Y.; Dong, M.; Ota, K.; Liu, A. ActiveTrust: Secure and Trustable Routing in Wireless Sensor Networks. IEEE Trans. Inf. Forensics Secur. 2016, 11, 2013–2027.
  6. Campbell, A.T.; Eisenman, S.B.; Lane, N.D.; Miluzzo, E.; Peterson, R.A.; Lu, H.; Zheng, X.; Musolesi, M.; Fodor, K.; Ahn, G.S. The rise of people-centric sensing. IEEE Internet Comput. 2008, 12, 12–21.
  7. Tang, Z.; Liu, A.; Li, Z.; Choi, Y.J.; Sekiya, H.; Li, J. A Trust-Based Model for Security Cooperating in Vehicular Cloud Computing. Mobile Inf. Syst. 2016, 2016, 1–22.
  8. Ganti, R.K.; Ye, F.; Lei, H. Mobile crowdsensing: Current state and future challenges. IEEE Commun. Mag. 2011, 49, 32–39.
  9. Mitas, L.; Mitasova, H. Spatial interpolation. Geogr. Inf. Syst. Princ. Tech. Manag. Appl. 1999, 1, 481–492.
  10. Liu, L.; Wei, W.; Zhao, D.; Ma, H. Urban resolution: New metric for measuring the quality of urban sensing. IEEE Trans. Mob. Comput. 2015, 14, 2560–2575.
  11. Lee, K.; Hong, S.; Kim, S.J.; Rhee, I. SLAW: Self-Similar Least-Action Human Walk. IEEE ACM Trans. Netw. 2012, 20, 515–529.
  12. Wang, L.; Zhang, D.; Wang, Y.; Chen, C. Sparse mobile crowdsensing: Challenges and opportunities. IEEE Commun. Mag. 2016, 54, 161–167.
  13. Wang, L.; Zhang, D.; Pathak, A.; Chen, C.; Xiong, H.; Yang, D.; Wang, Y. CCS-TA: Quality-guaranteed online task allocation in compressive crowdsensing. In Proceedings of the ACM International Joint Conference on Pervasive and Ubiquitous Computing, Osaka, Japan, 7–11 September 2015; pp. 683–694.
  14. Zhang, Y.; Roughan, M.; Willinger, W.; Qiu, L. Spatio-temporal compressive sensing and internet traffic matrices. ACM SIGCOMM Comput. Commun. Rev. 2009, 39, 267–278.
  15. Li, Z.; Zhu, Y.; Zhu, H.; Li, M. Compressive Sensing Approach to Urban Traffic Sensing. In Proceedings of the International Conference on Distributed Computing Systems, Minneapolis, MN, USA, 20–24 June 2011; pp. 889–898.
  16. Rallapalli, S.; Qiu, L.; Zhang, Y.; Chen, Y.C. Exploiting temporal stability and low-rank structure for localization in mobile networks. In Proceedings of the Sixteenth Annual International Conference on Mobile Computing and Networking, Chicago, IL, USA, 20–24 September 2010; pp. 161–172.
  17. Kong, L.; Xia, M.; Liu, X.Y.; Chen, G.; Gu, Y.; Wu, M.Y.; Liu, X. Data Loss and Reconstruction in Wireless Sensor Networks. IEEE Trans. Parallel Distrib. Syst. 2014, 25, 2818–2828.
  18. Xu, L.; Hao, X.; Lane, N.D.; Liu, X.; Moscibroda, T. More with less: Lowering user burden in mobile crowdsourcing through compressive sensing. In Proceedings of the ACM International Joint Conference on Pervasive and Ubiquitous Computing, Osaka, Japan, 7–11 September 2015; pp. 659–670.
  19. Zhang, Y.; Roughan, M.; Willinger, W.; Qiu, L. Spatio-temporal Compressive Sensing and Internet Traffic Matrices. In Proceedings of the ACM SIGCOMM 2009 Conference on Data Communication, Barcelona, Spain, 16–21 August 2009; pp. 267–278.
  20. Zheng, Y.; Liu, F.; Hsieh, H.P. U-Air: When urban air quality inference meets big data. In Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Chicago, IL, USA, 11–14 August 2013; pp. 1436–1444.
  21. Zheng, Y.; Liu, T.; Wang, Y.; Zhu, Y.; Liu, Y.; Chang, E. Diagnosing New York city’s noises with ubiquitous data. In Proceedings of the 2014 ACM International Joint Conference on Pervasive and Ubiquitous Computing, Seattle, WA, USA, 13–17 September 2014; pp. 715–725.
  22. Vardoulakis, S.; Fisher, B.E.; Pericleous, K.; Gonzalez-Flesca, N. Modelling air quality in street canyons: A review. Atmos. Environ. 2003, 37, 155–182.
  23. Kolda, T.G.; Bader, B.W. Tensor decompositions and applications. SIAM Rev. 2009, 51, 455–500.
  24. Mao, X.; Miao, X.; He, Y.; Li, X.Y. Citysee: Urban CO2 monitoring with sensors. In Proceedings of the IEEE INFOCOM International Conference on Computer Communications, Orlando, FL, USA, 25–30 March 2012; pp. 1611–1619.
  25. Yuan, J.; Zheng, Y.; Xie, X.; Sun, G. Driving with knowledge from the physical world. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Diego, CA, USA, 21–24 August 2011; pp. 316–324.
Figure 1. Crowdsensing network based urban environment monitoring. (a) The crowdsensing network, consisting of numerous smartphones/vehicles, works as an urban camera; (b) Sensing image of the target phenomenon generated by the crowdsensing network via the interpolation method; (c) Sensing trajectories; (d) Correlated signals; (e) Sensing image after using the enhanced crowdsensing approach.
Figure 2. The process of generating a sensing image via crowdsensing. (a) The 2D signal of CO2 concentration; (b) The distribution of 100 nodes in R at sending time $t_i \in T$; (c) The sampling points of the 2D signal; (d) Sensing image generated from V.
Figure 3. (a) Division of the unit square R into m × m uniform grids; (b) A Voronoi diagram based on the locations of crowdsensing nodes.
Figure 4. Collaborative tensor decomposition.
Figure 5. Correlated time slots combination.
Figure 6. An instant of 6000 crowdsensing nodes generated by SLAW model.
Figure 7. Correlated signals. (a) Linear correlated signal: $f_1(z_{i,j}) = 0.95 z_{i,j} + 0.05 + \mathcal{N}(0, 0.06)$; (b) Non-linear correlated signal: $f_2(z_{i,j}) = 0.9 z_{i,j}^2 + 0.9 z_{i,j} + 0.1 + \mathcal{N}(0, 0.11)$.
Figure 8. (a) Distribution of 6000 nodes (Groups I, II, and III) over the whole 5 × 5 km² region, which is divided into 25 unit squares; (b) Distribution of crowdsensing nodes over the unit area indicated by the red box in (a).
Figure 9. Recovered signals (generated sensing images). (a) Interpolation method; (b) Enhanced crowdsensing approach with 3 time slots combination; (c) Enhanced crowdsensing approach with 6 time slots combination.
Figure 10. Correlation coefficients $C(\mathbf{Z}''_n, \mathbf{Z})$ against different values of n.
Figure 11. Relationship between the crowdsensing resolution r and the number of crowdsensing nodes s in a unit area, generated by the SLAW model. The x-axis denotes $\sqrt{s}$, and the y-axis denotes $n_l$, i.e., $\sqrt{r}$.
Figure 12. (a) Air quality monitoring stations in Beijing viewed from Google Earth; (b) The selected monitoring area within the 2nd Ring Road of Beijing.
Figure 13. Variation tendency of PM 2.5 and PM10 in monitoring station S 9 during the time period of 21–27 April 2013.
Figure 14. Variation tendency of PM 2.5 and air humidity in monitoring station S 9 during the time period of 21–27 April 2013.
Figure 15. The constructed signals within the red box zones of Figure 12b. (a) Signal of PM 2.5 ; (b) Signal of PM10; (c) Signal of air humidity.
Figure 16. Recovered signals (sensing images). (a) The generated sensing image based on interpolation method; (b) The generated sensing image based on enhanced crowdsensing approach.
Table 1. Parameter settings of SLAW model.
Parameter                          Value
Distance alpha                     3
Number of mobile nodes             6000
Simulation area                    5000 × 5000 m²
Number of waypoints                6000
Hurst parameter                    0.75
Time duration                      10 h
Clustering range                   100 m
Levy exponent for pause time       1
Minimum/maximum pause time         30 s / 1800 s
Table 2. Statistical result of correlation analysis.
Value Range of C(·,·)      Number of Stations, C(PM2.5, PM10)      Number of Stations, C(PM2.5, Humidity)
[0.9, 1.0)                 13                                       0
[0.8, 0.9)                 7                                        1
[0.7, 0.8)                 1                                        19
[0.5, 0.7)                 1                                        2
