Integration of Remote Sensing and Social Sensing Data in a Deep Learning Framework for Hourly Urban PM2.5 Mapping

Fine spatiotemporal mapping of PM2.5 concentration in urban areas is of great significance in epidemiologic research. However, both the diversity and the complex nonlinear relationships of PM2.5 influencing factors pose challenges for accurate mapping. To address these issues, we innovatively combined social sensing data with remote sensing data and other auxiliary variables, which can bring both natural and social factors into the modeling; meanwhile, we used a deep learning method to learn the nonlinear relationships. The geospatial analysis methods were applied to realize effective feature extraction of the social sensing data and a grid matching process was carried out to integrate the spatiotemporal multi-source heterogeneous data. Based on this research strategy, we finally generated hourly PM2.5 concentration data at a spatial resolution of 0.01°. This method was successfully applied to the central urban area of Wuhan in China, which the optimal result of the 10-fold cross-validation R2 was 0.832. Our work indicated that the real-time check-in and traffic index variables can improve both quantitative and mapping results. The mapping results could be potentially applied for urban environmental monitoring, pollution exposure assessment, and health risk research.


Introduction
Fine particles with an aerodynamic diameter of less than 2.5 micrometers (PM 2.5 ), which correspond to the "high-risk respirable convention", as defined in [1], have aroused worldwide concern [2]. The troubling thing is that 92% of the world population are exposed to PM 2.5 air pollution concentration that is above the annual mean World Health Organization Air Quality Guidelines (WHO AQG) level of 10 µg/m 3 [3]. In addition to the health effects, PM 2.5 also has significant impacts on climate change, agricultural production and ecological environment [4].
To our knowledge, a number of studies have explored the effects of PM 2.5 on health and evaluated population exposure to PM 2.5 , based on a continuous distribution of PM 2.5 [5,6]. And the results showed that the accuracy of the PM 2.5 concentration estimation has a great impact on the research conclusions, and the spatiotemporal PM 2.5 distribution data are very important basic data. However, ground monitoring stations for PM 2.5 are limited up to now, because these static and expensive facilities are often sparsely and unevenly distributed in study area [7]. Hence, generating accurate fine spatiotemporal mapping of PM 2.5 concentration is important to meet the practical demand.
The study region shown in Figure 1 is the central urban area of Wuhan, which is the largest city in central China. According to the statistics [4], the PM 2.5 concentration level in Wuhan is above the average level in China. In Wuhan, approximately 60% of the permanent population lives on about 10% of the land in the central urban area [41], so that we need to pay attention to the effect of PM 2.5 pollution on human and environmental health [42]. Meanwhile, Wuhan is the core city of the Yangtze River Economic Belt, and also a comprehensive transportation hub for China. Thus, the ecological construction of Wuhan is particularly important, and the spatiotemporally continuous mapping of PM 2.5 is essential for environment monitoring. Given all of the above, we chose this area as a case study, as the existing studies have rarely considered this region.

Data Sets
In this study, we took five categories of data (from 24 January 2018 to 31 July 2018) into consideration: ground station PM2.5 data, social sensing data, remote sensing data, meteorological data and terrain data.

Data Sets
In this study, we took five categories of data (from 24 January 2018 to 31 July 2018) into consideration: ground station PM 2.5 data, social sensing data, remote sensing data, meteorological data and terrain data.
2.2.1. Ground Station PM 2.5 Hourly PM 2.5 data for the study period were obtained from the China National Environmental Monitoring Center (CNEMC) web site (http://www.cnemc.cn/) and the Hubei Environmental Monitoring Center (http://hbt.hubei.gov.cn/hjxx/). There are 11 ground monitoring stations in the central area of Wuhan. We also took the neighboring sites into consideration in the modeling process, so that 20 stations, in total, were considered in our experiment. About 90,000 PM 2.5 concentration records were collected for the study period.

Social Sensing Data Remote Sensing Data
We collected four kinds of social sensing data: real-time check-in data, traffic index data, road network data and POIs. The first two are dynamic real-time data, and the last two are relatively static data.
Firstly, real-time check-in data were obtained from the "Tencent Location Big Data" service (https://heat.qq.com/). This service updates every 5 min with a spatial resolution of approximately 0.01 • for about 6000 location points in Wuhan. The anonymized and passively collected geolocation data allow the analysis of population activity and mobile patterns.
Secondly, traffic index data were gathered from the NavInfo Traffic Index platform (http://www. nitrafficindex.com/). The traffic index is a quantitative indicator which has six levels on the basis of the actual road speed and road conditions, and also the subjective feeling degree about traffic congestion of people is added to describe the road traffic operation status. We obtained the traffic index data of 502 Roads in Wuhan in each hour which can be used as a factor to reflect the real-time traffic influence on the atmosphere.
Finally, POIs were obtained from the Amap developer platform (https://lbs.amap.com/), including company enterprises, traffic facilities, road facilities, scenic spots, and other types.

Remote Sensing Data
Two kinds of products were used in our study, one closely corresponding to the ground station PM 2.5 and the other reflecting the land-cover information. We chose the Himawari-8 Level 3 hourly AOT product based on the method developed by Yoshida, et al. [43] and subsequently improved by Kikuchi, et al. [44], with strict cloud screening using the differences in the spatiotemporal variability characteristic of aerosol and cloud. This product has a good quality but a poor spatial coverage. Hourly AOT data for the study period with a spatial resolution of 5-km were downloaded from the Japan Aerospace Exploration Agency (JAXA) P-Tree System (http://www.eorc.jaxa.jp/ptree/). In this study, only aerosol retrievals with the highest confidence level ("very good") were adopted for the estimation of PM 2.5 . The MODIS normalized difference vegetation index (NDVI) was used for the presentation of land use types. The 16-day synthetic product with a spatial resolution of 1-km (MOD13Q1) was downloaded from the Level-1 and Atmosphere Archive & Distribution System (LAADS) Web site (http://ladsweb.nascom.nasa.gov/).

Terrain Data
We used the NASA Shuttle Radar Topographic Mission (SRTM) digital elevation model (DEM) product as terrain data, which has a resolution of 90 m at the equator. The data were obtained from the Consultative Group on International Agricultural Research-Consortium for Spatial Information (CGIAR-CSI) (http://srtm.csi.cgiar.org/). Figure 2 shows the data and modeling processes. We performed data preprocessing on the original multi-data, then employed geospatial analysis methods and image processing means to construct and extract the input variables (The abbreviations for data sets and variables are summarized in Supplementary Material: Table S1). When all the variables were converted to the raster format, we matched the grids of multiple variables on a specific hour scale. Then the multivariate vector of each labeled grid (including the site-based PM 2.5 observation) could be obtained, which is also the form of the model input sample. Finally, a spatiotemporally uniform multi-source feature set could be used for modeling. We used the deep belief network (DBN) model [45] in the study, which is one of the most classic deep learning models. The quantitative evaluation and mapping feedback were both considered for obtaining more reliable and accurate results. The feature extraction and DBN model are explained in detail in the following. 1 "IDW" means inverse distance weighting method; "POI" means points of interest; "AOT" means aerosol optical thickness product; "NDVI" means normalized difference vegetation index product; "DEM" means digital elevation model data; "RBM" means restricted Boltzmann machine; "BP" means the back-propagation neural network.

Feature Extraction
While there are many ways to construct and extract the input variables, some of them fairly Figure 2. Data and modeling processes. "IDW" means inverse distance weighting method; "POI" means points of interest; "AOT" means aerosol optical thickness product; "NDVI" means normalized difference vegetation index product; "DEM" means digital elevation model data; "RBM" means restricted Boltzmann machine; "BP" means the back-propagation neural network.

Feature Extraction
While there are many ways to construct and extract the input variables, some of them fairly involved, we attempted to use geospatial analysis, geostatistics, and image processing methods to extract features from multi-source heterogeneous data effectively and easily. Then the features with different spatiotemporal scales and different data formats were unified into the raster format on the same scale, which could be easily used to generate sample data for modeling.
3.1.1. Spatiotemporal Features of PM 2.5 The spatiotemporal distribution of PM 2.5 concentration follows Tobler's First Law of Geography [46], which means that near things are more related to each other. According to the spatiotemporal autocorrelation of PM 2.5 , we calculated the characteristic variables of the initial distribution of concentration. Figure 3 shows the spatial correlation and time dependence of the unlabeled grid which can be inferred by the adjacent labeled grids that have observed PM 2.5 concentrations. The inverse distance weighting (IDW) method was used to calculate the spatial feature of PM 2.5 (PM s ) and the temporal feature of PM 2.5 (PM t ). . Figure 3. Schematic diagram of calculating the spatiotemporal features of PM2.5. The ti, ti-1, ti-2 represent one hour, two hours, and three hours before the current moment respectively. The labeled grids (s1, s2, s3, s4) represent the places that include ground monitoring stations, while the unlabeled grid (s0) represents the space that needs to be estimated.

Social Sensing Features
(1) Real-Time Check-In Compared with traditional demographic data, Tencent real-time check-in data can dynamically reflect the spatial distribution and temporal variation characteristics of population. The high correlation between the check-in density of social media data and human density distribution has been revealed by many studies [47][48][49]. The raw data are in JavaScript Object Notation (JSON) format. We transcoded and vectorized the data to get the point data with a 0.01° spatial resolution. Considering that the instability and uncertainty of the acquisition of social sensing data would cause the absence of data in some regions or at some points, we used the IDW spatial interpolation method to fill the gaps and also get the raster data. The numbers of check-ins of each grid represent the distribution of population. Finally, the hourly averaged real-time check-in (RTCI) feature was adopted in our model.
(2) Traffic Index Density Real-time traffic index data can reflect traffic flow information. Studies such as Forehead and Huynh [38] have proved that automobile exhaust emissions have a great impact on PM2.5 pollution, where PM2.5 concentration often rises in times of traffic congestion or at rush hour. The raw data were converted to the line features with traffic index attribute. Then the kernel density analysis (KDA) method [50] was used to estimate the kernel density of the traffic index in study area, so as to obtain the hourly raster data of the traffic index density feature (TID) at a 0.01° spatial resolution. The KDA could capture the spot of traffic congestion and could quickly calculate the spatial distribution of the line features. . The t i , t i-1 , t i-2 represent one hour, two hours, and three hours before the current moment respectively. The labeled grids (s 1 , s 2 , s 3 , s 4 ) represent the places that include ground monitoring stations, while the unlabeled grid (s 0 ) represents the space that needs to be estimated.

Social Sensing Features
(1) Real-Time Check-In Compared with traditional demographic data, Tencent real-time check-in data can dynamically reflect the spatial distribution and temporal variation characteristics of population. The high correlation between the check-in density of social media data and human density distribution has been revealed by many studies [47][48][49]. The raw data are in JavaScript Object Notation (JSON) format. We transcoded and vectorized the data to get the point data with a 0.01 • spatial resolution. Considering that the instability and uncertainty of the acquisition of social sensing data would cause the absence of data in some regions or at some points, we used the IDW spatial interpolation method to fill the gaps and also get the raster data. The numbers of check-ins of each grid represent the distribution of population. Finally, the hourly averaged real-time check-in (RTCI) feature was adopted in our model.
(2) Traffic Index Density Real-time traffic index data can reflect traffic flow information. Studies such as Forehead and Huynh [38] have proved that automobile exhaust emissions have a great impact on PM 2.5 pollution, where PM 2.5 concentration often rises in times of traffic congestion or at rush hour. The raw data were converted to the line features with traffic index attribute. Then the kernel density analysis (KDA) method [50] was used to estimate the kernel density of the traffic index in study area, so as to obtain the hourly raster data of the traffic index density feature (TID) at a 0.01 • spatial resolution. The KDA could capture the spot of traffic congestion and could quickly calculate the spatial distribution of the line features.

(3) Road Network Density
The road network can reflect the spatial pattern of a city. Its form and layout often divide an urban system into blocks of different sizes and different functional areas. Four levels of roads were concluded in this study, i.e., highways, main roads, secondary roads and branch roads. The ratio of the total length of the roads to a certain area was regarded as the density of the road network (ROAD). ROAD was a static variable during study period.

(4) POIs
POIs can be regarded as a mass of points of interest abstracted from various entities in a city, including infrastructures, business districts, catering and entertainment places, office buildings, industrial enterprises, scenic spots, etc. POIs are a portrait of the whole city and reflect the appearance of urban development. Allowing for the fact that not all POIs and PM 2.5 are relevant, we filtered two groups of representative characteristic variables from all the POIs: the potential sources of pollution type (PS), including chemical plants, steel mills, textile factories, printworks, and others; and the cleaner location type (Scen), including water bodies, parks, and scenic spots. The buffer analysis method was used to calculate the POI numbers within the buffer range. Finally, each grid was given the value of the number of each type of POIs.

Other Raster Features
The remote sensing data (AOT and NDVI), meteorological data, and DEM data are all organized as raster data originally. We resampled the raster data at a 0.01 • spatial resolution. For the NDVI product with a revisit period of 16 days, multi-scene data sets corresponding to each 16-day period during the research period were generated and successively arranged; and the DEM data remained unchanged during the study period. Finally, we obtained features of AOT, NDVI, DEM, RH, TEM, EWS, NWS, SP, and PBLH.

Deep Learning Model for PM 2.5 Estimation
Deep learning is able to mine complex and nonlinear relationships between many variables, so as to provide the prospect that effectively predicts the spatial and temporal distribution of PM 2.5 . We used the DBN model that outperforms the other traditional algorithms on PM 2.5 estimation [27]. It is an alternatively a class of simple, unsupervised networks such as restricted Boltzmann machines (RBMs), composed of multiple layers of latent variables, with connections between the layers but not between units within each layer [45]. As shown in Figure 4, our training model has 3 RBM layers and selects a back-propagation (BP) neural network as the prediction method. The input is labeled samples that each sample involves the ground truth monitoring values cooperating with the multi-source variables, shown that each grid X corresponds to a multivariate vector. In order to eliminate dimensionality and accelerate the convergence speed of this model, the Min-Max Normalization method was adopted before training. The output layer contains the learned weights of each neuron. Finally, the general structure used to estimate PM 2.5 is: where SSD means the social sensing variables, including RTCI, TID, POIs, and ROAD; and RSD includes AOT and NDVI; and the meteorological variable Wea contains WS, RH, PBLH, TEMP, and SP.
All the input variables were explained earlier in Section 3.1. The three main parts of the model process are as follows.
order to eliminate dimensionality and accelerate the convergence speed of this model, the Min-Max Normalization method was adopted before training. The output layer contains the learned weights of each neuron. Finally, the general structure used to estimate PM2.5 is: where SSD means the social sensing variables, including RTCI , TID , POIs , and ROAD ; and RSD includes AOT and NDVI ; and the meteorological variable Wea contains WS , RH , PBLH , TEMP , and SP . All the input variables were explained earlier in Section 3.1. The three main parts of the model process are as follows.  (1) Pre-Training The pre-training process was performed by a series of RBM layers, as shown in Figure 4, which is a method of generating model weights by unsupervised learning from layer to layer. One RBM is a two-layer network, including a visible layer (v) of m neurons and a hidden layer (h) of n neurons, both of which are connected by weights (W) [51]. A training method called contrastive divergence (CD) [52] was used to get the weight updated. The activation probability of each neuron in the hidden layer was calculated as shown in (2). Similarly, the conditional distribution probability of reconstructing the visible layer with the hidden layer was calculated as shown in (3).
where b i and c j are the bias of the ith visible neuron v i and the jth hidden neuron h j , respectively; w ij is the weight between the two neurons. The log sig indicates the activation function log sig(x) = 1 1+exp(−x) , which introduced the nonlinear characteristics into our network.
By calculating the activation probability of the hidden layer neuron inferred from the real visible layer as p(h j v idata ) and that inferred from the visible layer reconstructed from the hidden layer as p(h j v ireconstruction ) , the weights and bias parameters were updated as where λ is the learning rate. After the CD training process, the hidden layer can not only accurately express the characteristics of the input features of the visible layer, but it can also reconstruct the visible layer. We carried out many experiments to adjust the optimum network parameters, and finally chose 3 hidden layers, with the neuron number of each layer being 12, 24 and 36, so that the multi-layer RBM could realize deep feature extraction.
(2) Fine-Tuning We selected the BP neural network to achieve the fine-tuning of the entire network, which can reverse the PM 2.5 estimation error to each RBM, layer by layer. The mean-squared normalized error performance function (MSE) was used to measure the network error as where n is the total number of samples; and y(p), ∧ y(p) are the target value and the output value of the p th input sample respectively. Whether to reverse the error information or not depends on whether the condition is satisfied. When the error meets the preset accuracy or the number of iterations reaches the upper limit, the algorithm is terminated.
In the BP network, the weight of the output of the RBMs was used as the input, which overcame the shortcoming of the BP falling into local optima due to the random initialization of the weight parameters. The weight of each layer was updated by where η is the learning rate, and ∇E(w) is the partial derivative of the network error, with respect to the weight of a certain layer. We used the Levenberg-Marquardt (L-M) backpropagation method [53] as the training function for achieving the optimal solution of the minimized error. The L-M can accelerate the speed of convergence and avoid getting trapped in local optima. (

3) Prediction
This step was based on the reiterative validations of the DBN model. We selected the most appropriate network parameters in the training process with the labeled grid samples. The trained DBN network net and the setting of the normalized parameters were saved for the prediction as where X input_reg is the unlabeled grid vectors, which were normalized using the same normalized parameters as the training input data. Thus, the spatially continuous PM 2.5 concentration can be obtained by using the simulation function sim.

Validation
We carried out both the quantitative verification and the mapping test to obtain more reliable network parameters and more accurate estimation results. A 10-fold cross-validation (CV) method [54] was used to evaluate the overall estimation capability of the DBN model. Specifically, the sample set was randomly divided into 10 parts, with one part as the validation set and the others as the training set. The 10 data parts were then successively verified and, finally, the average value of the results of the 10 parts was adopted as the modeling accuracy. We adopted the statistical indicators of the coefficient of determination (R 2 ), the root-mean-square error (RMSE, µg/m 3 ), the mean prediction error (MPE, µg/m 3 ), and the relative prediction error (RPE, %) to evaluate the model performance. Meanwhile, the mapping effect of the PM 2.5 spatial distribution as a feedback mechanism was also applied, to assist with the validation. The variables that caused obvious anomalies in mapping the continuous distribution of PM 2.5 and that did not improve the estimation accuracy obviously were removed. Based on the multi-validation, the optimal combination of variables and the optimum network parameters could be selected.

Results and Discussion
Due to the cloud cover and bright surface with high reflectance [34], there are a lot of the missing data in the hourly AOT data for the central urban area of Wuhan during the research period. Therefore, we extracted two sample sets: (1) One set (Sample set A) containing approximately 80,000 sample pairs which are not matched with AOT; and (2) the other set (Sample set B) containing only about 1600 sample pairs that are matched with AOT.

Descriptive Statistics
According to the statistics on data collected during the study period, the PM 2.5 concentration, ranged from 2 µg/m 3 to 209 µg/m 3 , with an average of 53.88 µg/m 3 and a large variation on the spatiotemporal scale. Taking the Sample set B as an example shown in Figure 5. The diagonal line of the matrix visualizes the distribution of some variables. Excluding the NDVI, other variables are non-normally distributed (skewed distributed, multi-peak distributed, etc.). The bivariate scatter plots in the lower triangle show that PM 2.5 exhibits a nonlinear relationship with most other variables. In the sample sets, the values of some variables are very unevenly distributed, which means that these features are less representative when training the model (Scen, DEM, etc.). From the upper triangle, we find that: (1) PM 2.5 is negatively correlated with RH and TEM, which is exactly in line with a previous study in Wuhan [55]; (2) PBLH and NDVI also presents the negative relationships with PM 2.5 , in that a low atmospheric boundary layer height is not conducive to the diffusion and dilution of PM 2.5 , while vegetation can clean and purify the atmospheric environment; and (3) there are almost no linear correlations between social sensing variables and PM 2.5 . These results indicate that the traditional methods based on the assumption of linear relationships between variables would not be suitable to mine and explain these complex nonlinear relationships between variables.   Figure S1, Figure S2).

Model Accuracy Evaluation
Both the model validation results and the mapping continuity of spatial distribution are  Figures S1 and S2). "RTCI" means real-time check-in variable; "PBLH" means planetary boundary layer height; "TEM" means air temperature at a 2-m height; "RH" means specific relative humidity; "Scen" means the cleaner location type POI.

Model Accuracy Evaluation
Both the model validation results and the mapping continuity of spatial distribution are important in applications. Sample set A for modeling, with an R 2 of 0.832 (Table 1), obtained a higher accuracy than using Sample set B, which had an R 2 of 0.742 (Table 2). And a more continuous spatiotemporal distribution could be obtained by using Sample set A. Accordingly, Sample set A was preferred for the modeling and evaluation. More details about the selection of sample sets are provided in the discussion in Section 4.4. 1 "RMSE" means the root-mean-square error; 2 "MPE" means the mean prediction error; 3 "RPE" mans the relative prediction error; 4 "optimal variables A" means the optimal variable combination based the Sample set A (Time, NDVI, PM s , PM t , RTCI, TID, RH, TEM, EWS, NWS, SP, PBLH); 5 "RTCI" means real-time check-in variable; 6 "TID" means traffic index variable; 7 "PM s " means the spatial feature of PM 2.5 ; 8 "PM t " means the temporal feature of PM 2.5 ; 9 "Wea" means the meteorological variables. The effective and optimal variables appropriate for the DBN model were selected according to the modeling framework designed in this study and constrained during the entire study period. Finally, based on the experiments, a certain combination of variables (Time, NDVI, PM s , PM t , RTCI, TID, RH, TEM, EWS, NWS, SP, PBLH) resulted in the most optimized model accuracy, as well as the mapping results. Specially, the selection of social sensing variables is explained in more detail in the discussion in Section 4.3. Table 1 lists the evaluation results generated by 10-fold CV. The model fitting R 2 is 0.850, and the RMSE is 9.303 µg/m 3 . The CV results of R 2 , RMSE, MPE, and RPE are 0.832, 9.864 µg/m 3 , 6.961 µg/m 3 and 23.764% respectively, which indicates that the model explains 83.2% of variability of the ground measured PM 2.5 . There is no over-fitting phenomenon by comparing the results of model fitting and CV, which indicates the reliability of the trained model.

Quantitative Evaluation Results
For further exploration, we also investigated the effects of each kind of variable in the optimal variable combination A. As shown in Table 1, the accuracy of the model is reduced to varying degrees when any category of variables is removed. In particular, if the variables of RTCI and TID are removed, the R 2 lowered 0.045 and the RMSE increased 1.22 µg/m 3 . Real-time check-in variable reflects the population activity and mobile patterns. Previous study found that particulate concentration hot spots are mainly distributed in urban centers where human activities accumulate, resulting in increased PM 2.5 concentrations [39]. The automobile exhaust emission is one of the main sources of PM 2.5 pollution in urban area [4]. Thus, both the RTCI and TID have great impacts on the model accuracy. As for NDVI, likely due to the potential filtering and absorption function of the vegetation [36], NDVI affected the model accuracy to a certain extent. The results demonstrated that with the integration of the remote sensing data NDVI, the dynamic social sensing data and other auxiliary data, the modeling result can be effectively promoted.

Mapping Results of PM 2.5 Concentration
Taking one day (17 April 2018) as an example, Figure 6 shows the 24 h spatial distribution of PM 2.5 in the central urban area of Wuhan. Comparing with most studies using the AOT to predict PM 2.5 , we could also predict the PM 2.5 distribution during the nighttime and obtain continuous mapping of hourly spatial distribution of PM 2.5 , since the input variables of the model were almost completely covered over time and space. Int. J. Environ. Res. Public Health 2019, 16, x 13 of 20 residential areas is higher than others [55]. What is more, the higher concentration spots in the low concentration areas can be identified as pollution emergencies. The PM2.5 distributions of larger temporal scales (daily, monthly, seasonal, etc.) can be generated by the hourly spatial mapping results. Figure 7 displays a map of the average concentration of estimated PM2.5 overlaid with the average concentration for each monitoring station during the study period. Satisfactorily, the two sets of data show good consistency. It can be seen that the spatial distribution of averaged PM2.5 concentration differ significantly in downtown Wuhan. PM2.5 pollution is serious in the Qingshan district, where station 1329A is located, since the Wuhan Iron and Steel Group Corporation is located in this area. The industrial waste gas emission has a bad effect on the air quality of the surrounding area. On the whole, most of the mapping grids of average concentration are higher than the annual average standard (35 3 μg/m ) set by the Air Quality Guidelines of China [56], and far from the standard set by the World Health Organization (10 3 μg/m ) [3].  The mapping result reflects the spatiotemporal characteristics of PM 2.5 . Temporally, the hourly changes of PM 2.5 concentration are obvious. Nevertheless, previous research has mostly considered daily, monthly, or even coarser temporal scales [10,15,16,18,19], which may hide the details of the hourly changes. The mapping results of this model provide more details of hourly PM 2.5 concentration, which can be applied for urban real-time monitoring and air pollution prevention. With regard to the spatial scale, the precision of the 0.01 • resolution can more precisely reflect the spatial details. The PM 2.5 concentration in the Qingshan district, where heavy industries gather, shows a relatively high level, which is followed by the Hongshan district and the Wuchang district, where traffic and population is more concentrated. The literature on the spatiotemporal distribution of PM 2.5 in Wuhan has also shown that the PM 2.5 concentration in industrial areas, traffic areas and residential areas is higher than others [55]. What is more, the higher concentration spots in the low concentration areas can be identified as pollution emergencies.
The PM 2.5 distributions of larger temporal scales (daily, monthly, seasonal, etc.) can be generated by the hourly spatial mapping results. Figure 7 displays a map of the average concentration of estimated PM 2.5 overlaid with the average concentration for each monitoring station during the study period. Satisfactorily, the two sets of data show good consistency. It can be seen that the spatial distribution of averaged PM 2.5 concentration differ significantly in downtown Wuhan. PM 2.5 pollution is serious in the Qingshan district, where station 1329A is located, since the Wuhan Iron and Steel Group Corporation is located in this area. The industrial waste gas emission has a bad effect on the air quality of the surrounding area. On the whole, most of the mapping grids of average concentration are higher than the annual average standard (35 µg/m 3 ) set by the Air Quality Guidelines of China [56], and far from the standard set by the World Health Organization (10 µg/m 3 ) [3].

The Effects of the Social Sensing Variables
In order to explore the effects of all kinds of social sensing variables, we conducted experiments on each social sensing variable and then verified the results. Figure 8 and Figure 9 show the quantitative and mapping results of the four models, respectively, corresponding to (a), (b), (c), and (d). As a contrast, (a) is the result using optimal variable combination A.
Analyzing the quantitative results, as shown in Figure 8(b), without RTCI and TID in optimal variable combination A, R 2 drops from 0.832 to 0.787, and the slope of the blue fit line (0.790) in Figure 8(b) is less than the slope (0.838) in Figure 8(a). This means that RTCI and TID play positive effects in the model, in that they improve the model accuracy and reduce the extent of the underestimation. Figure 8(c) and Figure 8(d) respectively show the validation results of adding ROAD and POIs (PS, Scen) into optimal variable combination A, where it can be seen that both

The Effects of the Social Sensing Variables
In order to explore the effects of all kinds of social sensing variables, we conducted experiments on each social sensing variable and then verified the results. Figures 8 and 9 show the quantitative and mapping results of the four models, respectively, corresponding to (a), (b), (c), and (d). As a contrast, (a) is the result using optimal variable combination A. time result a better performance than the relatively static data (POIs, ROAD) when the distribution of samples is sparse and heterogeneous.   time result a better performance than the relatively static data (POIs, ROAD) when the distribution of samples is sparse and heterogeneous.   Analyzing the quantitative results, as shown in Figure 8b, without RTCI and TID in optimal variable combination A, R 2 drops from 0.832 to 0.787, and the slope of the blue fit line (0.790) in Figure 8b is less than the slope (0.838) in Figure 8a. This means that RTCI and TID play positive effects in the model, in that they improve the model accuracy and reduce the extent of the underestimation. Figure 8c,d respectively show the validation results of adding ROAD and POIs (PS, Scen) into optimal variable combination A, where it can be seen that both variables can improve the model accuracy slightly (0.837, 0.848).

The Dialectical Selection of the AOT Variable
From the mapping results, Figure 9a presents the spatial distribution of PM 2.5 for a given hour. The distribution of PM 2.5 is generally consistent with the heterogeneity of the spatial distribution, and the transition is smooth. Figure 9b lacks more spatial details compared with Figure 9a, and the performance is insufficient in the high-value area. This illustrates that RTCI and TID are beneficial for the mapping of PM 2.5 concentration. Figure 9c,d show that when adding ROAD and POIs to estimate PM 2.5 concentration, the mapping results contain more outliers, and the spatial distribution of PM 2.5 concentration also shows some differences with the results shown in Figure 9a. Obviously, we can see the low-value anomaly in the northeastern part of the downtown shown in Figure 9c, and the high-value anomaly in the southwestern part of the downtown shown in Figure 9d. This is probably because of the greater spatiotemporal heterogeneity of these relatively static variables. Furthermore, the representative samples from the monitoring stations in the study area are insufficient.
Taking both the quantitative evaluation and the mapping results into consideration, although ROAD and POIs can improve the estimation accuracy slightly, they bring obvious anomalies when mapping the continuous distribution of PM 2.5 . Thus, we finally removed these variables from our modeling process. Overall, the dynamic social sensing variables (RTCI, TID) that change in real time result a better performance than the relatively static data (POIs, ROAD) when the distribution of samples is sparse and heterogeneous.

The Dialectical Selection of the AOT Variable
Considering the wide used of AOT products in PM 2.5 concentration estimation, we discuss the selection of the AOT variable in a practical scenario from two aspects. On the one hand, there are large gaps in the AOT data due to the restriction of conditions, as mentioned above. The intuitive performance is that the great quantity gap of the two sample sets (set A: 80,000 and set B: 1600) used for modeling. We compared the model results (Tables 1 and 2) of the two sample sets and found that the validation result using sample set A for modeling obtains a higher accuracy, with an R 2 of 0.832, than using sample set B, with an R 2 of 0.742, which can be interpreted as the deep learning model needing a large amount of data to obtain more stable and accurate results. What is more, if we included AOT during the study period, there would be lots of gaps in the spatiotemporal mapping, and the coverage would be significantly reduced. Therefore, considering the limited temporal and spatial scope of the application scenarios, it is feasible to exclude the AOT variable, in order to obtain sufficient sample data and a more continuous mapping result.
On the other hand, the AOT variable can have a positive effect on the model accuracy. As shown in Table 2, which is based on Sample set B, R 2 decreases by 0.033, when the AOT variable is excluded from the model. Overall, the dialectical selection of the AOT variable could be based on the practical application, considering the temporal and spatial conditions.

Conclusions
Mapping the hourly and continuously distributed PM 2.5 is meaningful for air quality monitoring and health risk research. In this study, we mainly considered remote sensing data, social sensing data, meteorological data and the spatiotemporal features of PM 2.5 to estimate hourly PM 2.5 in the central area of Wuhan. A spatiotemporal grid matching framework was proposed to unify the multi-source heterogeneous data, and the DBN method was introduced to learn the complex relationships among variables. By exploring the effects of PM 2.5 influencing factors in the estimation accuracy and mapping results of PM 2.5 concentration, we came to the following conclusions: (1) The real-time check-in data and traffic index data have a positive influence on fine-scale air pollution studies, which dynamic characteristics can help to identify hot events; (2) when the relatively static variables vary widely in spatiotemporal scale and the representative samples are insufficient, these variables usually bring anomalies in the process of estimating PM 2.5 ; (3) the AOT variable should be dialectically selected, such as considering the limited modeling conditions, whether the PM 2.5 concentration at night is needed and whether the full-coverage mapping results can be obtained.
Further study will focus on improving the model ability to learn rare features, and we will look to explore the intensive observations data for monitoring the air pollutants [57,58]. It is worthwhile to integrate remote sensing and social sensing for spatiotemporal estimation of air pollution based on advanced statistical methods. We hope that this paper will inspire researchers to study the integration of multi-source data using advanced methods for urban ecological applications.
Supplementary Materials: The following are available online at http://www.mdpi.com/1660-4601/16/21/4102/s1, Table S1: Abbreviations for data sets and variables, Figure S1: Diagram of the composite analysis matrix based on Sample set A, Figure S2