Real-Time Estimation of Population Exposure to PM2.5 Using Mobile- and Station-Based Big Data

Extremely high fine particulate matter (PM2.5) concentration has been a topic of special concern in recent years because of its important and sensitive relation with health risks. However, many previous PM2.5 exposure assessments have practical limitations, due to the assumption that population distribution or air pollution levels are spatially stationary and temporally constant and people move within regions of generally the same air quality throughout a day or other time periods. To deal with this challenge, we propose a novel method to achieve the real-time estimation of population exposure to PM2.5 in China by integrating mobile-phone locating-request (MPL) big data and station-based PM2.5 observations. Nationwide experiments show that the proposed method can yield the estimation of population exposure to PM2.5 concentrations and cumulative inhaled PM2.5 masses with a 3-h updating frequency. Compared with the census-based method, it introduced the dynamics of population distribution into the exposure estimation, thereby providing an improved way to better assess the population exposure to PM2.5 at different temporal scales. Additionally, the proposed method and dataset can be easily extended to estimate other ambient pollutant exposures such as PM10, O3, SO2, and NO2, and may hold potential utilities in supporting the environmental exposure assessment and related policy-driven environmental actions.


Introduction
Air pollutants, especially fine particulate matter such as PM2.5 (particles with an aerodynamic diameter less than 2.5 µm), have been the focus of increasing public concern because of their strong relation with health risks [1,2]. Numerous epidemiologic studies have established robust associations between long-term exposure to PM2.5 and premature mortality associated with various health conditions (such as heart disease, cardiovascular and respiratory diseases, and lung cancer) that substantially reduce life expectancy [2][3][4][5][6][7]. With the unprecedented economic development and urbanization over the past three decades, severe and widespread PM2.5 pollution has become one of the biggest health threats in China [8,9]. The Ministry of Environmental Protection reported that only eight of the 74 monitored cities met China's ambient air quality standards (annual mean: 35 µg/m³; 24-h mean: 75 µg/m³) in 2014 [10], and only three did so in 2013 [11].
people move within regions of generally the same air quality throughout a day or other time periods. Thus, real-time estimations of population exposure to PM2.5 concentrations are urgently needed for instant or short-time assessments (e.g., hourly or short-term PM2.5 concentrations are more relevant to vulnerable population groups than daily or monthly average concentrations [14]) and for cumulative exposure effects (the aggregation of short-term assessments is more robust than the monthly or annual average).
Addressing these ubiquitous challenges requires more information on human space-time location. Some previous studies have used survey data, such as travel questionnaire surveys, personal GPS, or smart-sensor-based devices [14,31,32], to delineate how an individual moves in the city during daily life. For example, Lu and Fang [32] used a GPS-equipped portable air sensor to measure air pollutant intakes in individuals' immediate surroundings and their space-time movement trajectories in Houston, Texas. However, the high expense and limited samples within local areas restrict data availability. An alternative approach is to use mathematical models to simulate population mobility patterns, such as the gravity model [33] and the radiation model [34]. Such methods allow more quantitative conclusions to be drawn from a larger population, but their results are only valid for situations with similar initial parameters in the simulation process [29]. Recently, Park and Kwan [14] simulated 80 possible daily movement trajectories based on daily trip distribution data from the Congestion Management Program Report to reflect the actual commuting tendency of Los Angeles County (USA) residents, and estimated exposure risks by considering the interactions between air pollution and individuals' locations. However, such studies are still constrained to limited spatial and temporal scales. With the rapid growth of the mobile internet, especially the location-based services of smartphone applications (apps), it has become possible to access direct spatiotemporal records of human activities [35,36]. Additionally, many studies have revealed a high correlation between mobile-phone locating-request records and the spatiotemporal characteristics of human activities [37][38][39]. A growing number of studies have started to use mobile phone data in environmental exposure assessments [29,30,40]. For example, Dewulf et al. [29] collected mobile phone data of approximately five million mobile users in Belgium to calculate the daily exposure to NO2. Gariazzo et al. [30] conducted a dynamic city-wide air pollution (NO2, O3, and PM2.5) exposure assessment by using time-resolved population distributions derived from mobile phone traffic data and modelled air pollutant concentrations. Yu et al. [40] used cell phone location data from 9886 SIM card IDs in Shenzhen, China to assess the misclassification errors in air pollution exposure estimation. Although all these pioneering studies highlight the promising advantages of incorporating population dynamics in estimating air pollution exposure, the available datasets are still limited in sample size and spatiotemporal scale due to the cost and time for collecting fine-resolution data, data privacy and confidentiality issues, and computational complexities [41].
To investigate the nationwide PM2.5 concentration risks for the population in China, spatially explicit and temporally continuous studies are needed to detect hotspots, estimate vulnerability, and assess population exposure at finer temporal scales. In this paper, we propose a novel approach to achieve the real-time estimation of population exposure to PM2.5 by integrating mobile-phone locating-request (MPL) big data and station-based PM2.5 observations. Compared with previous studies of ambient pollution exposure assessment, it has the following highlights. First, the proposed method introduces the dynamics of population distribution into the nationwide exposure estimation, thereby providing an improved way to assess the actual exposure risk to PM2.5 at different temporal scales. Second, to the best of our knowledge, this is the first real-time estimation of nationwide population exposure to PM2.5 at the pixel level (~1.2 km) in China. Third, the proposed method and dataset can be easily extended to estimate other ambient pollutant exposures such as PM10, O3, SO2, and NO2, and may hold potential utility in supporting environmental exposure assessments and related policy-driven environmental actions.

Ground-Station PM2.5 Measurements
Hourly ground-station PM2.5 measurements from 1 March to 31 March 2016 were collected from the official website of the China Environmental Monitoring Center (http://113.108.142.147:20035/emcpublish/). According to the Chinese National Ambient Air Quality Standard (CNAAQS), the station-based PM2.5 data in China were obtained using the tapered element oscillating microbalance method (TEOM) or the beta-attenuation method, combined with periodic calibration. In this study, we used a total of 1465 monitoring stations (Figure 1) that have been established in all provinces for monitoring ambient air quality.

Ground-Station Meteorological Measurements
Ground-station meteorological variables, including air temperature (AT), surface wind speed (WS), and horizontal visibility (VIS), were obtained from the Global Telecommunication System (GTS) established by the World Meteorological Organization (https://rda.ucar.edu/datasets/ds461.0/). In this study, the 3-h measurements (from 02:00 to 23:00 local time) from 411 stations in China and 128 stations within the 0.01-degree buffer zones around the boundary of China (Figure 1) were collected from 1 March to 31 March 2016.

Mobile Phone Locating-Request Big Data
The mobile phone locating-request (MPL) data, retrieved from the real-time locating requests generated by mobile phone users' activities in apps, were used in this study to monitor human movement. The MPL data come from the Tencent big data platform in China; Tencent is one of the largest Internet service providers both nationwide and worldwide. All of the MPL data are produced by active smartphone users using apps that have been enabled to report real-time locations from the mobile devices. Due to the widespread usage of Tencent apps (e.g., WeChat, QQ, and Tencent Map) and their location-based services, the daily locating records reached 36 billion from more than 450 million users globally in 2016 [42]. Thus, the MPL big data can be used as an indicator to characterize human activities and population distribution at a fine spatiotemporal scale. The Tencent MPL dataset used in this study was collected from 1 March to 31 March 2016 via the application program interface (API) of the Tencent big data platform (http://heat.qq.com). The original Tencent MPL dataset was recorded by aggregating the real-time locations of active app users every five minutes within a mesh grid at a spatial resolution of 30 arc-second (~1.2 km). All information regarding users' identities and privacy was removed in this publicly available dataset.

Population Census Data
The latest city-level population census of China (2014), obtained from the national scientific data sharing platform for population and health (http://www.ncmi.cn/), was used in this study. This dataset was established and is maintained by the infectious disease network reporting system, and it was derived from the population census released by the State Statistics Bureau. It has collected population census data, including permanent residents and registered residents, at the county level by gender and age group since 2004.

Estimation of Spatiotemporal Continuous PM2.5 Concentrations
Due to the difference in geographic locations between PM2.5 monitoring stations and meteorological stations, all datasets were processed to be consistent in the spatial and temporal domains. The meteorological variables were first interpolated by the ordinary Kriging method [43] to obtain data covering the entire study area with a spatial resolution of 30 arc-second (~1.2 km). To mitigate the interpolation biases, we averaged all meteorological observations within a 30 arc-second search radius around each PM2.5 monitoring station, and then assigned the result to the corresponding PM2.5 monitoring station. In addition, the widely used Geographically Weighted Regression (GWR) model [44] with an adaptive Gaussian bandwidth was adopted to build the statistical relationship between meteorological variables and PM2.5 concentrations. Specifically, we grouped all variables within the month into 8 time points (i.e., 02:00, 05:00, ..., 23:00), and then developed 8 GWR models, one for each time point, as follows:

$$\mathrm{PM}_{2.5,i,t} = \beta_{0,i,t} + \beta_{1,i,t}\,\mathrm{VIS}_{i,t} + \beta_{2,i,t}\,\mathrm{AT}_{i,t} + \beta_{3,i,t}\,\mathrm{WS}_{i,t} \tag{1}$$

where $\mathrm{PM}_{2.5,i,t}$ denotes the PM2.5 concentration at location $i$ at time $t$; $\mathrm{VIS}_{i,t}$, $\mathrm{AT}_{i,t}$, and $\mathrm{WS}_{i,t}$ denote the visibility (m), air temperature (°C), and surface wind speed (m/s), respectively, at location $i$ at time $t$; and $\beta_{0,i,t}$, $\beta_{1,i,t}$, $\beta_{2,i,t}$, and $\beta_{3,i,t}$ are the corresponding regression coefficients at location $i$ at time $t$. A 10-fold cross-validation analysis [45] was adopted to evaluate the modeling performance by comparing the estimated and measured PM2.5 concentrations (details can be found in Supplementary Materials). With the iterative cross-validations, the optimal coefficients at each time point were retrieved and interpolated over the entire study area with a spatial resolution of 30 arc-second (~1.2 km), and then were used to estimate gridded PM2.5 concentrations.
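The local regression described above can be illustrated with a minimal sketch, which is not the implementation used in this study. It assumes a Gaussian kernel of the form exp(-0.5 (d/b)^2), an adaptive bandwidth b equal to the distance to the k-th nearest station, Euclidean distances on toy coordinates, and synthetic predictor values; the function names `solve_linear` and `gwr_coefficients` are hypothetical.

```python
import math

def solve_linear(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x

def gwr_coefficients(target, coords, predictors, pm25, k):
    """Local coefficients [b0, b1, b2, b3] at `target` by weighted least squares.

    Assumption: Gaussian kernel with an adaptive bandwidth equal to the
    distance from `target` to its k-th nearest station (must be > 0).
    """
    distances = [math.dist(target, c) for c in coords]
    bandwidth = sorted(distances)[k - 1]
    weights = [math.exp(-0.5 * (d / bandwidth) ** 2) for d in distances]
    rows = [[1.0, *x] for x in predictors]  # intercept + (VIS, AT, WS)
    p = len(rows[0])
    # Normal equations of weighted least squares: (X^T W X) beta = X^T W y
    xtwx = [[sum(w * r[a] * r[b] for w, r in zip(weights, rows))
             for b in range(p)] for a in range(p)]
    xtwy = [sum(w * r[a] * y for w, r, y in zip(weights, rows, pm25))
            for a in range(p)]
    return solve_linear(xtwx, xtwy)

# Toy stations with synthetic (VIS, AT, WS) values; PM2.5 follows a known linear law.
coords = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1), (1, 2)]
predictors = [(1, 2, 3), (2, 1, 0), (0, 1, 1), (3, 3, 1), (1, 0, 2), (2, 2, 2)]
pm25 = [10 + 2 * v - 3 * a + 0.5 * w for v, a, w in predictors]
beta = gwr_coefficients((0.5, 0.5), coords, predictors, pm25, k=4)
```

Because the synthetic PM2.5 values follow one global linear law exactly, the local fit recovers the same coefficients at any target location; with real station data the coefficients would vary over space, which is the point of fitting one such model per pixel and per time point.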

Estimation of Real-Time Population Distribution by Integrating MPL and Census Data
The mobile phone locating-request (MPL) data can serve as an indicator to delineate the spatiotemporal pattern of population distribution; however, the MPL data do not represent actual population sizes. In this study, we first aggregated the 5-min MPL data into 3-h MPL data, making its temporal resolution consistent with that of the estimated PM2.5 concentrations, then calculated the pixel-based population density using the MPL data, and finally applied the MPL-based population density map to downscale the census data. Consequently, we can obtain 3-h pixel-based population approximations. Given the differences in physical environment and socio-economic development across China, downscaling the MPL data with the population census at the national scale would result in the underestimation of population in under- and less-developed areas and the overestimation of population in developed areas. To solve this problem, we estimated the real-time population distribution by integrating MPL and census data at the city level. The 3-h MPL map was used to redistribute the census data for each city by Equations (2) and (3), under the assumption that inter-city mobility will not dramatically influence the total population of a city within a short time window. Finally, we obtained the 3-h pixel-based population approximation for each city, and then mosaicked the city-level results to produce the 3-h national-scale population distribution map of China.

$$W_{i,j} = \frac{p_{i,j}}{\sum_{k=1}^{n} p_{k,j}} \tag{2}$$

$$\mathrm{Pop}_{i,j} = TR \times W_{i,j} \tag{3}$$

where $p_{i,j}$ is the number of locating requests within the $i$-th pixel at hour $j$, $n$ is the total number of pixels within a city, $W_{i,j}$ is the weight for redistributing the population, and $TR$ is the total population of the city from the census data. $\mathrm{Pop}_{i,j}$ denotes the population approximation in the $i$-th pixel at hour $j$.
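As an illustration of the redistribution step, the following minimal sketch aggregates 5-min request counts into one 3-h window and then applies the weights of Equations (2) and (3) within a single city. The function names and the toy numbers are hypothetical, and summing the 36 snapshots is an assumption: the text does not state whether the 5-min data were summed or averaged.

```python
SNAPSHOTS_PER_WINDOW = 36  # 3 h divided into 5-min snapshots

def aggregate_window(snapshots):
    """Sum per-pixel request counts over the 5-min snapshots of one 3-h window.

    Assumption: aggregation by summation (not stated in the text).
    """
    return [sum(pixel_counts) for pixel_counts in zip(*snapshots)]

def redistribute(census_total, requests):
    """Spread a city's census total over its pixels in proportion to MPL requests."""
    total_requests = sum(requests)
    if total_requests <= 0:
        raise ValueError("no locating requests in this city and time window")
    # Pop_i = TR * p_i / sum_k p_k
    return [census_total * p / total_requests for p in requests]

# Toy city with three pixels and a constant request pattern in every snapshot.
snapshots = [[1, 3, 6]] * SNAPSHOTS_PER_WINDOW
requests_3h = aggregate_window(snapshots)      # [36, 108, 216]
population = redistribute(1000, requests_3h)   # [100.0, 300.0, 600.0]
```

Since the weights are ratios, summing or averaging the 5-min snapshots gives the same redistributed population, and the pixel values always add up to the census total of the city.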

Real-Time Estimation of Population Exposure to PM2.5
Since the levels of PM2.5 concentration and population distribution vary spatially and temporally, we adopted the population-weighted metric (Equation (4)) to estimate the real-time exposure risks to PM2.5 concentrations, which is likely to be more representative of population exposure to PM2.5 across different temporal scales [46]:

$$PWP = \frac{\sum_{i=1}^{N} pop_i \times pm_i}{\sum_{i=1}^{N} pop_i} \tag{4}$$

where $pop_i$ and $pm_i$ denote the population and PM2.5 concentration level in the $i$-th pixel, $N$ is the total number of pixels within the corresponding administrative unit, and $PWP$ is the population-weighted PM2.5 concentration level for the targeted administrative unit. With the PM2.5 concentrations and population distribution estimated in the previous sections, we could integrate them based on Equation (4) to estimate population exposure to PM2.5 with a 3-h updating frequency, and thereby track the real-time dynamics of exposure risks by considering the spatiotemporal variation of PM2.5 concentration and population distribution.
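A short worked example may help to show what the population-weighted metric of Equation (4) does. The sketch below uses hypothetical pixel values and a hypothetical function name; it is only meant to contrast the weighted value with a plain spatial mean.

```python
def population_weighted_pm25(pop, pm):
    """Population-weighted PM2.5 level over the pixels of one administrative unit."""
    total_pop = sum(pop)
    if total_pop <= 0:
        raise ValueError("administrative unit has no population")
    return sum(p * c for p, c in zip(pop, pm)) / total_pop

# Three pixels: most people are located in the most polluted pixel.
pop = [100.0, 300.0, 600.0]   # persons per pixel
pm = [50.0, 80.0, 120.0]      # PM2.5 in µg/m³ per pixel
pwp = population_weighted_pm25(pop, pm)   # (5000 + 24000 + 72000) / 1000
plain_mean = sum(pm) / len(pm)
```

In this toy case the population-weighted level is 101 µg/m³, while the unweighted spatial mean is about 83.3 µg/m³, because the metric gives more weight to pixels where more people are located.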

Estimation of Cumulative Inhaled PM2.5
PM2.5 causes acute and chronic adverse effects on human health mainly by means of inhalation exposure. To our understanding, deriving estimations of cumulative inhaled PM2.5 masses is one of the most important prerequisites for accurately modeling the relationship between PM2.5 exposure and human health [47][48][49]. Thus, we proposed to incorporate human respiratory volume and the spatiotemporal variation of PM2.5 concentration and population density to present a better estimation of cumulative inhaled PM2.5 (Equation (5)), where p i and h i denote the population and the inhaled volume of air for the i-th age group, N is the total number of age groups, t denotes the time (hours in this study), m(t) denotes the PM2.5 concentration level at time t, T is the target temporal period, d i is the percentage of outdoor population, and α is the outdoor-indoor ratio of PM2.5 concentration.
However, recent advances regarding the outdoor-indoor ratio of PM2.5 concentrations are all limited to local scales for the purpose of experimental tests [50], as it is difficult to acquire valid observations of this ratio on a large scale. More importantly, the outdoor-indoor ratio is influenced by several factors such as geographic location, building structures, and living habits. In addition, the inhaled volume of air differs not only with age but also with physical activity, gender, and size [51,52]. Thus, we have to simplify the ideal model in Equation (5) to make it suitable for nationwide estimates of cumulative inhaled PM2.5 masses, by neglecting the difference between outdoor and indoor PM2.5 exposure and the differences in the inhaled volume of air among age groups, genders, and other related factors. In this way, we can directly obtain the estimation of cumulative inhaled PM2.5 masses using a simplified equation, where InPM2.5 denotes the cumulative inhaled PM2.5 mass from the simplified model, and h denotes the empirical inhaled volume of air. A measurement conducted by Adams [51] based on 200 individuals showed that the hourly average volume of air breathed by adults when sitting or resting ranged from 0.42 to 0.63 m³ (i.e., 10.08 to 15.12 m³/day), while the volumes for walking ranged from 1.20 to 1.44 m³/h and those for running from 3.10 to 3.48 m³/h. Thus, the average inhaled volume of air for an individual is assumed to be 15 m³/day in this study [52].
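The arithmetic behind the simplified model can be sketched as follows. This is a sketch under stated assumptions rather than the equation used in the study: the cumulative mass in a pixel is taken as the inhaled volume per hour multiplied by the sum over 3-h steps of population times PM2.5 concentration times step length, with the concentration treated as constant within each step. Only the 15 m³/day value comes from the text; the function name and sample values are hypothetical.

```python
DAILY_INHALED_VOLUME_M3 = 15.0                               # m³/day, value assumed in the text
HOURLY_INHALED_VOLUME_M3 = DAILY_INHALED_VOLUME_M3 / 24.0    # 0.625 m³/h

def cumulative_inhaled_mass(pop_series, pm25_series, step_hours=3.0):
    """Cumulative inhaled PM2.5 mass (µg) in one pixel over a sequence of time steps.

    Assumption: discrete sum over 3-h steps, no indoor/outdoor distinction,
    one inhaled volume for everybody.
    """
    person_exposure = sum(p * c * step_hours
                          for p, c in zip(pop_series, pm25_series))  # person·µg·h/m³
    return HOURLY_INHALED_VOLUME_M3 * person_exposure

# One person breathing a constant 80 µg/m³ for 24 h (eight 3-h steps).
one_person_day = cumulative_inhaled_mass([1] * 8, [80.0] * 8)
```

For one person at a constant 80 µg/m³ this gives 80 µg/m³ × 15 m³ = 1200 µg per day. With the MPL-based population, `pop_series` changes every 3 h, whereas a census-based calculation would keep it fixed.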

Comparison of Exposure Assessments from the MPL-Based and Census-Based Methods
In order to investigate whether incorporating dynamic population distributions makes a difference in the exposure assessment, we compared the MPL-based and census-based calculations of cumulative inhaled PM2.5 masses and population-weighted PM2.5 exposure concentrations in China's 359 cities across different temporal scales (i.e., 3-h, 1-day, 1-week, and 1-month). For each city, the population from the census data was directly used in the census-based method, while the redistributed population dynamics were used in the MPL-based method.

Different Facets of Population Exposure to PM2.5
The spatiotemporal integration of PM2.5 concentration and population density was used to produce thematic information that documents different facets of population exposure to PM2.5. Figure 2 shows an extracted example from the time-series analysis of population exposure to PM2.5 in China. Figure 2a shows the real-time nationwide estimation of population distribution (11:00 a.m.) on 1 March 2016, which is derived by integrating MPL and census data at the city level as described in Section 2.6. The intensity represents the population in each gridded pixel, with stretched colors from blue to red denoting varied population size. Figure 2b shows the real-time nationwide estimation of PM2.5 concentrations (11:00 a.m.), which is derived by incorporating ground-station PM2.5 measurements and meteorological variables in the GWR models of Section 2.5. Figure 2c shows the nationwide estimation of 24-h cumulative inhaled PM2.5 masses. Figure 2d shows the estimation of 24-h cumulative inhaled PM2.5 masses based on the census data.

Temporal Dynamics of Population Exposure to PM2.5
In the form of Figure 2a

Comparison of Exposure Assessment Methods
Visual inspection of Figure 2c,d shows that the MPL-based method yields gridded cumulative inhaled PM2.5 masses, whereas the census-based assessments are only based on administrative units (cities in this study); that is, the MPL-based method improves the spatial resolution of the basic cells from administrative units to gridded pixels in exposure assessments. In addition, by comparing the cumulative inhaled PM2.5 masses and population-weighted PM2.5 exposure concentrations in China's 359 cities across different temporal scales, the results in Figure 4 show that, without introducing the dynamics of population distribution into the exposure assessment, the maximum biases (over- or under-estimation) of cumulative inhaled PM2.5 mass exceed 100% across different temporal scales. Meanwhile, the maximum biases of population-weighted PM 2.

Discussion
Compared with previous methods for air pollution exposure assessment, the proposed method considers the spatiotemporal variability of both population distribution and PM2.5 concentration levels, thereby contributing to a better exposure assessment. The relative reasonability of our method may be due to the following strengths. First, the spatiotemporal variability of PM2.5 concentrations and population distribution is incorporated in air pollution exposure assessments. Given that the level of PM2.5 concentrations is continuously changing over space and time and human beings are also mobile across spatiotemporal scales [14], both of these dynamic characteristics and their interactions at finer spatiotemporal scales should be considered to estimate population exposure risks. However, many previous studies used census data with the assumption that people are non-mobile or move within regions of generally the same air quality throughout a day or other time periods, thus leading to considerable biases in actual air pollution exposure assessments. In reality, people in different areas experience different levels of PM2.5 concentrations across different temporal scales. In order to characterize the interaction between population dynamics and PM2.5 concentrations, we used the mobile-phone locating-request (MPL) big data to quantify the dynamics of population distribution. By integrating the MPL and census data, we then derived real-time pixel-based population dynamics at the nationwide scale. Combining this nationwide population dynamic information with surface-based PM2.5 concentrations is of great importance for assessing the actual population exposure to PM2.5 at different temporal scales. Second, the characterized dynamics of PM2.5 concentrations and population in the proposed method keep a consistent spatiotemporal scale.
The MPL data used in this study were initially retrieved at a 5-min temporal resolution from the Tencent big data platform. We further aggregated the 5-min MPL data into 3-h synthetic data, making it temporally comparable to the updating frequency of the nationwide surface-based PM2.5 concentrations. Meanwhile, the spatial resolution of the PM2.5 concentrations is also set to 30 arc-second (~1.2 km), which is the same as that of the MPL data. These efforts contribute much to achieving near real-time (3-h) estimates of national population exposure to PM2.5 at the pixel level in China. Third, the presented model incorporates human respiratory volume and the spatiotemporal variation of PM2.5 concentration and population density to estimate cumulative inhaled PM2.5 masses. It will help advance the quantitative modelling of the relationship between PM2.5 exposures, health risks, and life expectancies.
Besides PM2.5, the ground monitoring stations are also equipped with sensors measuring other air pollutants such as PM10, SO2, NO2, and O3. With a similar framework integrating mobile phone big data and air pollutant concentrations, the proposed method can also be customized to estimate population exposure risks to these ambient pollutants in China. Compared with the census-based method, the MPL-based method can yield near real-time estimations of population exposure to ambient pollutants. That is, we can estimate air pollution exposure risks at any specific location and time on a large scale by combining the spatiotemporal variability of population distribution and air pollutant concentrations. By aggregating the short-term exposure assessments into longer temporal scales, we can also derive more robust and reliable estimations related to the chronic effects of air pollutants. Additionally, the proposed framework can also be applied to estimate the real-time number of people exposed to poor air quality by updating the population distribution and air pollutant concentrations.
Meanwhile, some potential concerns regarding the implementation of the proposed method should be pointed out. First, in order to redistribute the census data to derive real-time population dynamics using the MPL data, we assume that the total population of each administrative unit (359 cities in this study) is constant, since inter-city mobility (the trade-off of inflow and outflow population) will not dramatically influence the total population of a city within a short time window. Thus, human movements and migrations across administrative units are neglected in this study. Second, volunteer-produced geospatial big data, such as the MPL records in this study, tend to leave out some population groups because children, the elderly, and the poor are less frequently active users. Nevertheless, such data can still quantify actual population distribution patterns well [35,37,38] because of the massive volume of data records. Taking the MPL records in China on 1 March 2016 as an example, the total number of locating-request records reaches 1.71 billion. Aggregating all MPL records from 1 March to 31 March 2016 gives a total of approximately 60 billion locating-request records, thereby providing a robust measurement of population dynamics. Third, although the nationwide PM2.5 concentrations used in this study are estimated by incorporating the meteorological variables and ground-based PM2.5 measurements with the GWR models, the spatial interpolations still limit the estimation accuracy in areas without sufficient station-based inputs. As a result, even if there is much greater spatial variation in the population data, there will be relatively less spatial variation in PM2.5 concentrations, which may lead to no significant impact on the exposure assessments.
However, the comparison of exposure assessments between the MPL-based and the census-based methods still reveals considerable differences. Thus, if we can further improve the estimation of PM2.5 concentrations, for example by developing a spatial-temporal integrated method that combines satellite-based and station-based observations guided by the diurnal change pattern of PM2.5 concentrations, land cover/use types, landscape topography, and related meteorological variables, the combination of the mobile phone big data and the improved air pollutant concentrations will contribute to a more reliable exposure assessment. Finally, the simplified model, which does not consider the outdoor-indoor ratio of PM2.5 concentrations or the differences in inhaled volumes of air among population groups, may bias the assessment of actual cumulative inhaled PM2.5 masses. As the Tencent-based MPL dataset was recorded by aggregating the real-time locations of active app users within a mesh grid at a spatial resolution of 30 arc-second (~1.2 km) without differentiating individuals' moving trajectories and population groups, it was impractical to apply empirical parameters in the exposure assessment at a nationwide scale, since the outdoor-indoor ratio of PM2.5 concentrations is influenced by several factors such as geographical locations, building materials, living habits, and so on. Similarly, the gridded MPL data without individual trajectories also prevented us from considering commuting patterns or choices of transport. However, the MPL dataset is the data source with the best spatial resolution and real-time updated population distribution that we can currently access. Meanwhile, the estimates in the experimental test also represent a trade-off between over- and under-estimated cumulative inhaled PM2.5 masses.
On the one hand, these estimates are the highest estimates of cumulative inhaled PM2.5 masses, since we do not consider situations in which people are indoors or in commuting vehicles. On the other hand, the cumulative inhaled PM2.5 masses could be even higher, because we use a constant value representing a low level of inhaled air volume for an adult without considering factors such as physical activity, gender, and size [51]. Thus, these over- and under-estimates help balance each other out in terms of cumulative inhaled PM2.5 masses to provide a general assessment at large scales.

Conclusions
This study sought to combine mobile phone big data and station-based PM2.5 measurements to achieve real-time estimations of population exposure to PM2.5 concentrations in China. The results showed that the proposed method can quantify the dynamics of the real-time population distribution and yield estimations of population exposure to PM2.5 concentrations and cumulative inhaled PM2.5 masses with a 3-h updating frequency. This study provides a novel framework for environmental exposure assessments by considering the spatiotemporal variability of both population distribution and PM2.5 concentrations, which can also be customized to estimate other ambient pollutant exposure risks. These findings and methods may hold potential utility in supporting environmental exposure assessment and related policy-driven environmental actions.