Estimation of High-Resolution Daily Ground-Level PM 2.5 Concentration in Beijing 2013–2017 Using 1 km MAIAC AOT Data

Featured Application: This work has several applications: (a) to reveal spatiotemporal variations of PM2.5 across urban landscapes; (b) to assess the long-term impacts of PM2.5 on human health; (c) to support government policy decision-making; and (d) to provide valuable information to the public. Abstract: High-spatiotemporal-resolution PM 2.5 data are critical to assessing the impacts of prolonged exposure to PM 2.5 on human health, especially for urban areas. Satellite-derived aerosol optical thickness (AOT) is highly correlated to ground-level PM 2.5 , providing an effective way to reveal spatiotemporal variations of PM 2.5 across urban landscapes. In this paper, Multi-Angle Implementation of Atmospheric Correction (MAIAC) AOT and ground-based PM 2.5 measurements were fused to estimate daily ground-level PM 2.5 concentrations in Beijing for 2013–2017 at 1 km resolution through a linear mixed effect model (LMEM). The results showed a good agreement between the estimated and measured PM 2.5 and effectively revealed the characteristics of its spatiotemporal variations across Beijing: (1) the PM 2.5 level is higher in the central and southern areas, while it is lower in the northern and northwestern areas; (2) the PM 2.5 level is higher in autumn and winter, while it is lower in spring and summer. Moreover, annual PM 2.5 concentrations decreased by 24.03% for the whole of Beijing and 31.46% for the downtown area from 2013 to 2017. The PM 2.5 data products we generated can be used to assess the long-term impacts of PM 2.5 on human health and support relevant government policy decision-making, and the methodology can be applied to other heavily polluted urban areas.


Introduction
PM 2.5 refers to atmospheric particulate matter (PM) with a diameter of less than 2.5 micrometers, which remains suspended for a longer time in the air. Numerous studies have demonstrated that exposure to PM 2.5 is associated with adverse health effects including respiratory and cardiovascular diseases [1][2][3]. In urban areas, PM 2.5 concentrations vary sharply over short distances caused by uneven Appl. Sci

Ground-Level PM2.5 Data Sets
Ground-level PM2.5 data sets were used for model parameterizing and accuracy assessment in this study. PM2.5 data are collected hourly in Beijing from totally 35 monitoring sites located in the metropolitan area ( Figure 1, Table S1). The data sets are managed by the Beijing Municipal Environmental Monitoring Center (BMEMC). In this study, the hourly ground-level PM2.5 concentrations were averaged to get two PM2.5 datasets at each monitoring site for five years from 1 January 2013 through 31 December 2017: (1) the PM2.5 24 h average between 00:00 a.m. and 23:00 p.m. and (2) the PM2.5 period average between 10:00 a.m. and 2:00 p.m.

AOT Data Sets
The major AOT data sets used in this study were the MAIAC AOT data sets, which are produced using a generic algorithm specifically developed for MODIS, performing aerosol retrievals and atmospheric correction over both vegetated surfaces and bright deserts at 1 km spatial resolution based on time series analysis and image processing [26,27]. The MAIAC algorithm also derives column water vapor (CWV), the surface reflectance (SR) bidirectional reflectance distribution

Ground-Level PM 2.5 Data Sets
Ground-level PM 2.5 data sets were used for model parameterizing and accuracy assessment in this study. PM 2.5 data are collected hourly in Beijing from totally 35 monitoring sites located in the metropolitan area ( Figure 1, Table S1). The data sets are managed by the Beijing Municipal Environmental Monitoring Center (BMEMC). In this study, the hourly ground-level PM 2.5 concentrations were averaged to get two PM 2.5 datasets at each monitoring site for five years from 1 January 2013 through 31 December 2017: (1) the PM 2.5 24 h average between 00:00 a.m. and 23:00 p.m. and (2) the PM 2.5 period average between 10:00 a.m. and 2:00 p.m.

AOT Data Sets
The major AOT data sets used in this study were the MAIAC AOT data sets, which are produced using a generic algorithm specifically developed for MODIS, performing aerosol retrievals and atmospheric correction over both vegetated surfaces and bright deserts at 1 km spatial resolution based on time series analysis and image processing [26,27]. The MAIAC algorithm also derives column water vapor (CWV), the surface reflectance (SR) bidirectional reflectance distribution function (BRDF), and spectral regression coefficients (SRC). MAIAC collects 5 (over poles) to 16 (over equator) days of MODIS observations depending on locations on the Earth using a sliding window approach. These MODIS observations are then gridded to 1 km spatial resolution with specific projection coordinate systems. When the land surface conditions remain stable during a data collection period, the surface BRDF is retrieved using the regional background aerosol model. The 2.1 µm MODIS band is used to retrieve AOT and surface reflectance over dark and moderately bright surfaces. Four or more low AOT days are used to calculate the SRC correlating surface reflectance in the blue and shortwave infrared bands. Once the SRC is obtained, the AOT is retrieved with the last MODIS observation. AOT retrieval is conducted with the regional background aerosol model in clear conditions, but with surface BRDF in hazy conditions. Aerosol Robotic Network (AERONET) [28] validation shows that the MAIAC and the MODIS Level-2 atmospheric aerosol product (MOD04 ) algorithms have similar accuracy over dark and vegetated surfaces and that MAIAC AOT generally improves in accuracy over brighter surfaces due to SRC retrieval and explicit BRDF factor characterization, as demonstrated for several U.S. West Coast AERONET sites [26].
MAIAC AOT data sets derived from Terra (at~10:30 a.m. local time) and Aqua (at~1:30 p.m. local time) were merged to improve its spatial coverage in urban areas [14]. For a given day, due to meteorological condition changes and other factors, the two MAIAC AOT measurements might vary slightly. When merging the AOT data for a given area and a given day, there are two possible cases regarding the availability of AOT data: (1) both MAIAC-Aqua AOT and MAIAC-Terra AOT are available, or (2) only one of the two MAIAC AOTs is available. In the second case, the missing AOT values were estimated through a regression model. As a result, the daily AOT average represents the average between 10:00 a.m. and 2:00 p.m. for each day. As the ratio of morning AOT and afternoon AOT varies by season, we reorganized all the AOT data sets into two seasons for each year: warm season (15 April-14 October) and cold season . For the missing AOT estimation, the R 2 values of the regression model in warm and cold seasons were 0.88 and 0.87 respectively, indicating a good regression-based AOT interpolation.
For comparison purposes, MODIS Collection 6 Level 2 aerosol data products with 3 km spatial resolution (MOD-3K AOT) for 2014 were also collected. The algorithm for generating MOD-3K AOT is similar to the algorithm used in the previous 10 km AOT product, but the MOD-3K AOT algorithm averages 6 × 6 pixels in a single retrieval box rather than 20 × 20 pixels after cloud screening and other surface mask processes [29]. The MOD-3K AOT products have been validated by ground-based measurements from six AERONET sites in China, and its spatiotemporal variations show good agreement with the AERONET AOT, with R 2 values of 0.80-0.97 at the six sites [30].

Data Preprocessing
To establish the PM 2.5 -AOT relationship model (parameterization), the ground-based PM 2.5 measurements and satellite-derived MAIAC AOT values must be collected at the closest spatial location and time. For each monitoring site (PM 2.5 ), the 1 km pixel of MAIAC AOT where the monitoring site locates was selected and its AOT value was extracted to form a PM 2.5 -AOT data pair. PM 2.5 -AOT data pairs with either PM 2.5 values less than 3.0 µg/m 3 or missing PM 2.5 /AOT measurements were omitted. In addition, the days with less than two PM 2.5 -AOT data pairs available were excluded as a regression slope cannot be estimated from only a single data pair. MAIAC AOT pixels of water areas were removed through flagging because the high moisture content around water areas tends to be misidentified as PM 2.5 by estimation models.

LMEM Model Fitting and Validation
Lee et al. [13] developed the LMEM algorithm to estimate PM 2.5 from MODIS MOD04 AOT at 10 km resolution; it takes into account the day-to-day variability of the PM 2.5 -AOT relationship based on mixing height, relative humidity, PM 2.5 composition, and PM 2.5 vertical profile with the assumption that the PM 2.5 -AOT relationship varies largely day to day but minimally spatially on a given day at the study area. The LMEM calculates day-specific random intercepts and slopes for the PM 2.5 -AOT relationship and incorporates both fixed-effects terms and random-effects terms. The LMEM equation is where PM ij is the daily average PM 2.5 concentration at site i on day j; AOT ij is the daily average AOT value corresponding to site i on day j; α and µ j are the fixed and random intercepts on day j, respectively; β and υ j are the fixed and random slopes on day j, respectively; s i is the random intercept of site i; and ε ij is the error term at site i on day j. In this model, the fixed parameter β for AOT represents the average PM 2.5 -AOT relationship for the entire space and time in the specific urban area, while the random parameter υ j represents daily variation of the PM 2.5 -AOT relationship. We trained (fitting and validating) the LMEM using a large number of PM 2.5 -AOT sample data pairs for PM 2.5 estimation from 2013 to 2017. As an AOT pixel value represents the average aerosol level in a given 1 km × 1 km area while a ground-based PM 2.5 measurement only represents the aerosol level at a point, theoretically, there may be systematic bias between AOT-derived PM 2.5 and ground-measured PM 2.5 . The site bias may also vary by surface cover type, topography, distance to pollution source, and other anthropogenic and natural environmental conditions. It should be noted that site random effect (s i ) was not considered when estimating PM 2.5 concentrations for our study period because AOT values are unbiased representative of the corresponding grid cells; in other words, PM 2.5 values derived from AOT reflect the overall PM 2.5 levels in the grid cell, but the unadjusted predicted PM 2.5 levels with site effects maybe be more representative of the average population exposures to PM 2.5 [13].
After the model training, the PM 2.5 estimates over our study area for the whole time period were generated using LMEM and 1 km MAIAC AOT. Further, the model performance using 1 km AOT data and 3 km AOT data was compared using the 1 km MAIAC AOT and 3 km MOD-3K AOT data sets for 2014 as inputs. In addition, as the PM 2.5 24 h average and the PM 2.5 period average may differ, we also established different LMEM models and compared the model performance with these two PM 2.5 data sets.
Tenfold cross-validation (CV) was applied to test the potential model overfitting through randomly splitting the entire samples (PM 2.5 -AOT pairs) into 10 subsets with each subset containing approximately 10% of the whole sample dataset. One sample subset was used for model testing, while the remaining nine sample subsets were used for model fitting. This procedure was repeated 10 times until every subset was tested. The site-specific (each site) R 2 between estimated PM 2.5 and ground-based PM 2.5 measurements, mean prediction error (MPE), and root-mean-squared prediction error (RMSPE) were calculated for model fitting and cross validation results: where y i and y i are the estimated PM 2.5 and the measured PM 2.5 of the ith sample, respectively, and n is the total number of PM 2.5 -AOT sample pairs.

Data Descriptive Statistics
In this study, there were 33,785 PM 2.5 -AOT sample pairs in total for the 35 (Table S2), and the overall mean of the PM 2.5 period average between 10:00 a.m. and 2:00 p.m. is 65.82 µg/m 3 , varying from 39.39 µg/m 3 (SD = 47.49 µg/m 3 ) to 92.33 µg/m 3 (SD = 87.36 µg/m 3 ) by site (Table S3). The overall mean MAIAC AOT is 0.49, varying from 0.30 (SD = 0.32) to 0.61 (SD = 0.66) by site (Table S4). As shown in Figure 1, the study area is surrounded by mountains in the west, north, and northeast, while it is quite flat in the south and southeast. PM 2.5 levels have an increasing gradient trend from north to south, and the spatial distribution of the MAIAC AOT average and PM 2.5 levels for the whole study time period has a similar spatial pattern. The consistency of the spatial trends between the MAIACT AOT average and PM 2.5 levels indicates the correlation of AOT and PM 2.5 .

Results of Model Fitting and Validation
We fitted the LMEM PM 2.5 estimation model for each year using the two different daily PM 2.5 data sets-the PM 2.   The model fitting performance for each year can be seen in Figure 3a and Table 1: the year-specific R 2 of Model-I ranges from 0.90 to 0.94 and is 0.90 for the whole study period, while the year-specific R 2 of Model-II ranges from 0.91 to 0.93 and is 0.92 for the whole study period. Therefore, both Model-I and Model-II have high R 2 values (more than 0.90), demonstrating that both the PM 2.5 24 h average and the PM 2.5 period average between 10:00 a.m. and 2:00 p.m. can be used to fit the estimation model with high estimation performance. However, the other model fitting statistics indicate that Model-I performs slightly better than Model-II. In addition, model cross validation performance statistics are shown in Figure 2b for each site, and in Figure 3b and Table 1 for each year: Figure 2b shows that the site-specific R 2 for both Model-I and Model-II decreases, while RMSPE and MPE increase from model fitting to model validation with small differences. The yearly model cross validation results shown in Figure 3b and Table 1 indicate that the year-specific R 2 of Model-I ranges from 0.87 to 0.93 and is 0.90 for the whole study period, while the year-specific R 2 of Model-II ranges from 0.89 to 0.92 and is 0.90 for the whole study period. Overall, although both Model-I and Model-II have good estimation performance with high R 2 values, Model-I is relatively better as it has lower RMSPE and MPE values.
We fitted the model for each year and estimated the daily PM 2.5 concentrations for the period from 2013 through 2017. The annual mean PM 2.5 concentrations and five-year (2013-2017) mean PM 2.5 concentrations were generated for different seasons (warm and cold) and for specific urban areas (the whole urban area and the central city area) using estimated daily PM 2.5 concentrations in order to reveal the spatiotemporal variations of PM 2.5 levels.  To better assess the model performance, we also experimented on the 3 km MOD-3K AOT for the year 2014, and the results are shown in Table 2. The model performance results for the 3 km AOT are given in Table 2 and in Figures 4-6. Based on comparing the R 2 , RMSPE, and MPE, slightly better performance of Model-I than Model-II is also observed at 3 km resolution. As a result, we employed Model-I to estimate the daily PM2.5 concentrations for the study area during the period 2013-2017.   To better assess the model performance, we also experimented on the 3 km MOD-3K AOT for the year 2014, and the results are shown in Table 2. The model performance results for the 3 km AOT are given in Table 2 and in Figures 4-6. Based on comparing the R 2 , RMSPE, and MPE, slightly better performance of Model-I than Model-II is also observed at 3 km resolution. As a result, we employed Model-I to estimate the daily PM 2.5 concentrations for the study area during the period 2013-2017.

Comparing between Estimated and Measured PM2.5 Concentrations
The differences between the Model-I estimated and measured PM2.5 concentrations are shown in Figure 7. It can be observed that the PM2.5 concentrations are slightly overestimated in lightly polluted areas, while they are slightly underestimated in heavily polluted areas. However, the differences are within ±10.90 μg/m 3 , and a relative difference of 15.67% for 85% of all the monitoring sites indicates a good agreement between them.

Comparing between Estimated and Measured PM 2.5 Concentrations
The differences between the Model-I estimated and measured PM 2.5 concentrations are shown in Figure 7. It can be observed that the PM 2.5 concentrations are slightly overestimated in lightly polluted areas, while they are slightly underestimated in heavily polluted areas. However, the differences are within ±10.90 µg/m 3 , and a relative difference of 15.67% for 85% of all the monitoring sites indicates a good agreement between them.
The differences between the Model-I estimated and measured PM2.5 concentrations are shown in Figure 7. It can be observed that the PM2.5 concentrations are slightly overestimated in lightly polluted areas, while they are slightly underestimated in heavily polluted areas. However, the differences are within ±10.90 μg/m 3 , and a relative difference of 15.67% for 85% of all the monitoring sites indicates a good agreement between them.

Spatiotemporal Trends of PM2.5 Concentrations
Using the above Model-I, we can create daily, weekly, monthly, seasonal, and annual mean PM2.5 maps which can be applied in health effect analysis, air pollution management, and mitigation policy decision-making support. Figure 8 illustrates the annual mean maps of MAIAC AOT-derived PM2.5 estimations at 1 km spatial resolution in the study area from 2013 to 2017. The annual mean estimated PM2.5 concentrations are 63.93, 69.09, 65.28, 59.17, and 48.56 μg/m 3 from 2013 to 2017, respectively. These maps revealed the significantly similar spatial patterns of PM2.5 concentrations across the Beijing metropolitan area during the study period: high PM2.5 levels are located in the southern parts which are closer to the major pollution sources from the Hebei Province and in the upwind direction of Hebei, while low PM2.5 levels appear in rural and mountainous areas in the northern parts. This corresponds well with the spatial patterns demonstrated by the ground-based monitoring shown in Figure 1c.

Spatiotemporal Trends of PM 2.5 Concentrations
Using the above Model-I, we can create daily, weekly, monthly, seasonal, and annual mean PM 2.5 maps which can be applied in health effect analysis, air pollution management, and mitigation policy decision-making support. Figure 8 illustrates the annual mean maps of MAIAC AOT-derived PM 2.5 estimations at 1 km spatial resolution in the study area from 2013 to 2017. The annual mean estimated PM 2.5 concentrations are 63.93, 69.09, 65.28, 59.17, and 48.56 µg/m 3 from 2013 to 2017, respectively. These maps revealed the significantly similar spatial patterns of PM 2.5 concentrations across the Beijing metropolitan area during the study period: high PM 2.5 levels are located in the southern parts which are closer to the major pollution sources from the Hebei Province and in the upwind direction of Hebei, while low PM 2.5 levels appear in rural and mountainous areas in the northern parts. This corresponds well with the spatial patterns demonstrated by the ground-based monitoring shown in Figure 1c.
These maps revealed the significantly similar spatial patterns of PM2.5 concentrations across the Beijing metropolitan area during the study period: high PM2.5 levels are located in the southern parts which are closer to the major pollution sources from the Hebei Province and in the upwind direction of Hebei, while low PM2.5 levels appear in rural and mountainous areas in the northern parts. This corresponds well with the spatial patterns demonstrated by the ground-based monitoring shown in Figure 1c. The seasonal mean PM2.5 estimates (Figure 10) for the entire Beijing metropolitan area are 66.10, 45.33, 60.17, and 75.59 μg/m 3 for spring (March, April, and May), summer (June, July, and August), autumn (September, October, and November), and winter (December, January, and February), respectively, indicating that PM2.5 levels exhibit distinct seasonal variations, with the highest PM2.5 levels in winter and the lowest in summer. Such seasonal variations can be attributed to the mixed contributions from meteorological conditions (planetary boundary layer height (PBLH), relative humidity, seasonal wind, etc.) and local pollution sources such as coal combustion for domestic heating in winter. Moreover, the seasonal mean PM2.5 has similar spatial patterns with the annual mean PM2.5; that is, higher concentrations occur in the southern part and lower concentrations occur in the northern part. The seasonal mean PM 2.5 estimates (Figure 10) for the entire Beijing metropolitan area are 66.10, 45.33, 60.17, and 75.59 µg/m 3 for spring (March, April, and May), summer (June, July, and August), autumn (September, October, and November), and winter (December, January, and February), respectively, indicating that PM 2.5 levels exhibit distinct seasonal variations, with the highest PM 2.5 levels in winter and the lowest in summer. Such seasonal variations can be attributed to the mixed contributions from meteorological conditions (planetary boundary layer height (PBLH), relative humidity, seasonal wind, etc.) and local pollution sources such as coal combustion for domestic heating in winter. Moreover, the seasonal mean PM 2.5 has similar spatial patterns with the annual mean PM 2.5 ; that is, higher concentrations occur in the southern part and lower concentrations occur in the northern part. 2017 with much lower PM2.5 levels. The annual mean PM2.5 estimates in this downtown area are 81.56, 77.50, 80.04, 67.38, and 55.90 μg/m 3 from 2013 to 2017, respectively. Within the downtown area, PM2.5 levels in the northern part are much lower than those in the southern part.
The seasonal mean PM2.5 estimates (Figure 10) for the entire Beijing metropolitan area are 66.10, 45.33, 60.17, and 75.59 μg/m 3 for spring (March, April, and May), summer (June, July, and August), autumn (September, October, and November), and winter (December, January, and February), respectively, indicating that PM2.5 levels exhibit distinct seasonal variations, with the highest PM2.5 levels in winter and the lowest in summer. Such seasonal variations can be attributed to the mixed contributions from meteorological conditions (planetary boundary layer height (PBLH), relative humidity, seasonal wind, etc.) and local pollution sources such as coal combustion for domestic heating in winter. Moreover, the seasonal mean PM2.5 has similar spatial patterns with the annual mean PM2.5; that is, higher concentrations occur in the southern part and lower concentrations occur in the northern part.

Discussion
In this study, we investigated daily ground-level PM2.5 estimates at 1 km spatial resolution in the Beijing metropolitan area using the LMEM by incorporating the MAIAC AOT data and ground-based PM2.5 measurements. The results show that both model fitting and cross validation can generate higher R 2 values for monitoring sites. We also analyzed the model performance for each year in the five-year study period, indicating that both Model-I and Model-II give higher model fitting R 2 values and cross validation R 2 values relative to a linear regression model based on the same data inputs ( Figure S1), and can be hugely improved when considering that daily variations of PM2.5-AOT relationship are correlated to mixing height, relative humidity, PM2.5 composition, and PM2.5 vertical profile. Both Model-I and Model-II have higher R 2 values, indicating that the PM2.5 24 h average and the PM2.5 period average between 10:00 a.m. and 2:00 p.m. can be used to fit LMEM models with high confidence. Overall, Model-I has slightly better performance as demonstrated by its lower RMSPE and MPE values and similar R 2 values to Model-II.
AOT represents the amount of aerosols in the vertical column of atmosphere from the ground to satellite sensors (not only limited to the vertical column segment close to the monitoring sites). The PM2.5 24 h average captures more of the aerosols which have subsided down to ground level at monitoring sites than does the PM2.5 period average between 10:00 a.m. and 2:00 p.m. This might explain why the LMEM model fitted with the PM2.5 24 h average performs a little better than the LMEM model fitted with the PM2.5 period average.
The LMEM in our study generated higher model fitting R 2 values of 0.90-0.94 and cross validation R 2 values of 0.87-0.93 than those in previous studies. For example, the GWR model applied to the whole of China with an overall cross validation R 2 value of 0.64 [31] and to the Pearl River Delta region with an R 2 value of 0.74 [21]; the empirical nonlinear model applied for the Xian metropolitan area with an R 2 value of 0.67 [32]; the new LMEM model incorporating VIIRS night light data applied to the Beijing-Tianjin-Hebei Region with an R 2 value of 0.75; the improved LMEM model applied to three megalopolises in China, namely, Beijing-Tianjin-Hebei, with an R 2 value of

Discussion
In this study, we investigated daily ground-level PM 2.5 estimates at 1 km spatial resolution in the Beijing metropolitan area using the LMEM by incorporating the MAIAC AOT data and ground-based PM 2.5 measurements. The results show that both model fitting and cross validation can generate higher R 2 values for monitoring sites. We also analyzed the model performance for each year in the five-year study period, indicating that both Model-I and Model-II give higher model fitting R 2 values and cross validation R 2 values relative to a linear regression model based on the same data inputs ( Figure S1), and can be hugely improved when considering that daily variations of PM 2.5 -AOT relationship are correlated to mixing height, relative humidity, PM 2.5 composition, and PM 2.5 vertical profile. Both Model-I and Model-II have higher R 2 values, indicating that the PM 2.5 24 h average and the PM 2.5 period average between 10:00 a.m. and 2:00 p.m. can be used to fit LMEM models with high confidence. Overall, Model-I has slightly better performance as demonstrated by its lower RMSPE and MPE values and similar R 2 values to Model-II.
AOT represents the amount of aerosols in the vertical column of atmosphere from the ground to satellite sensors (not only limited to the vertical column segment close to the monitoring sites). The PM 2.5 24 h average captures more of the aerosols which have subsided down to ground level at monitoring sites than does the PM 2.5 period average between 10:00 a.m. and 2:00 p.m. This might explain why the LMEM model fitted with the PM 2.5 24 h average performs a little better than the LMEM model fitted with the PM 2.5 period average.
The LMEM in our study generated higher model fitting R 2 values of 0.90-0.94 and cross validation R 2 values of 0.87-0.93 than those in previous studies. For example, the GWR model applied to the whole of China with an overall cross validation R 2 value of 0.64 [31] and to the Pearl River Delta region with an R 2 value of 0.74 [21]; the empirical nonlinear model applied for the Xian metropolitan area with an R 2 value of 0.67 [32]; the new LMEM model incorporating VIIRS night light data applied to the Beijing-Tianjin-Hebei Region with an R 2 value of 0.75; the improved LMEM model applied to three megalopolises in China, namely, Beijing-Tianjin-Hebei, with an R 2 value of 0.77, to the Yangtze River Delta region with an R 2 value of 0.80, and to the Pearl River Delta region with an R 2 value of 0.80 [23]; the observation-based model using MAIAC AOT applied to the Beijing-Tianjin region with an R 2 value of 0.70, to the Yangtze River Delta region with an R 2 value of 0.77, and to the Pearl River Delta region with an R 2 value of 0.83 [33]; and the LMEM models using MOD-3K AOT applied to Beijing with R 2 values of 0.796 and 0.81 [24,25]. The model performance for Beijing in this study is also comparable with those studies conducted in the U.S. with MAIAC AOT data as inputs. For example, model fitting over the southeastern U.S. achieved an R 2 value of 0.83, while cross validation achieved an R 2 value of 0.67 [14]; annual model fitting gets an R 2 value of 0.71-0.85, while cross validation gets an R 2 value of 0.62-0.78 during 2001-2010 [15], and cross validation in the New England area achieved an R 2 value of 0.89 [34].
Our overall RMSPE of 21.67 µg/m 3 from model cross validation is higher than those in studies on areas in the United States (<9.0 µg/m 3 ) [9,[13][14][15][16][17][35][36][37], but much lower than other results over the whole of China (32.98 µg/m 3 ), which might be mainly due to the much denser ground monitoring sites in our study area. Our method has a slightly higher overall RMSPE than do those studies in the same study area (Beijing) using the coarser 3 km spatial resolution for PM 2.5 estimation by Li (16.04 µg/m 3 ) [24] and Xie (17.85 µg/m 3 ) [25], but with higher R 2 . This is possibly due to the smoothing effect of satellite data at coarser resolutions, representing more coverage for air quality conditions. The higher RMSPE in Beijing than in the United States is more likely due to the higher PM 2.5 levels in Beijing ranging from 3.0 to 773.6 µg/m 3 , which is up to 10 or more times those in the United States, and the larger variations in the PM 2.5 levels may possibly be overlooked due to the missing satellite AOT values. Another possible reason is that the correlation of PM 2.5 -AOT was negative instead of positive under heavy pollution conditions, causing the dense aerosol layer (high AOT) with low PM 2.5 levels or much lower PBLH (low AOT) with high PM 2.5 levels to occur in northern China [38]. The model also may underestimate the PM 2.5 concentrations at the high PM 2.5 levels shown in Figure 6. Furthermore, the air pollution sources and location of the monitoring sites may also causing high RMSPE values [13].
The annual mean PM 2.5 concentrations decreased to 48.56 µg/m 3 from 63.93 µg/m 3 (~24.04%) over the whole Beijing area and to 55.90 µg/m 3 from 81.56 µg/m 3 (~31.46%) for the downtown area from 2013 to 2017, showing that the air quality has been dramatically improved since 2013. The annual mean PM 2.5 concentrations averaged from all monitoring sites decreased to 55.73 µg/m 3 from 79.84 µg/m 3 (~30.20%) over the whole Beijing area and to 57.63 µg/m 3 from 85.63 µg/m 3 (~32.70%) for the downtown area from 2013 to 2017. The trends in the annual mean PM 2.5 concentrations revealed by our model estimation and by the ground measurements are consistent, indicating that the PM 2.5 levels have really been lowered from 2013 to 2017. While the PM 2.5 concentrations across Beijing remain high, substantial improvements have occurred in recent years due to government policies and mitigation measures implemented to improve air quality. In September 2013, the Chinese government released the "Plan of Action for Preventing and Controlling of Atmospheric Pollution" [39] promoting ten control measures, including restriction on the number of vehicles in megacities, reducing coal combustion, and promoting the use of clean energy such as water, gas, geothermal energy, wind power, solar energy, and bioenergy. The Plan also set the goal for Beijing to reduce its annual mean PM 2.5 concentration to 60 µg/m 3 by 2017. To achieve this, the Beijing Municipal Government issued a 5-year Clean Air Action Plan (2013-2017) that includes policies to control population growth, reduce the number of vehicles, control emissions for some major air pollutants including industrial emissions, and promote the use of clean energy and new energy by public vehicles [40]. The results of our study indicate that the annual PM 2.5 concentration goal set for Beijing was achieved in 2017.

Conclusions
Through integrating ground-based PM 2.5 measurements and MAIAC AOT, this study has preliminarily demonstrated that the LMEM model can be used effectively for estimating surface PM 2.5 at high spatial resolution (1 km) for heavily polluted urban areas with high accuracy. The results indicated that LMEM models fitted using the PM 2.5 24 h average have better performance than those using the PM 2.5 period average. The high spatial resolution of MAIAC AOT data sets has a substantial advantage over other satellite AOT data sets at coarser resolutions for estimating surface PM 2.5 by providing more spatial variation details of PM 2.5 in large urban areas, which is valuable for pollution warning and health impact analysis. The spatial patterns and trends revealed by the annual mean PM 2.5 concentrations generated by our models show good consistency with those revealed by the ground-based measurements, and both have indicated that the surface PM 2.5 levels have decreased from 2013 to 2017, thereby verifying the effectiveness of government air quality control measures in recent years. This study generated 1 km PM 2.5 data sets at different temporal resolutions (daily, weekly, monthly, seasonal, and annual) for the Beijing metropolitan area. These data products can be provided to the government environmental management agencies and urban air quality and public health research institutions for applications in relevant research and policy decision-making support.
The PM 2.5 data products at 1 km spatial resolution in this study may work well to estimate the incidence of specific diseases or relevant mortalities in large areas but are still not adequate to estimate PM 2.5 effects at community levels. Hotspots like urban street canyons are not detectable at such a resolution. For better health impact studies, AOT data products at higher spatial resolutions still need to be incorporated with the models in the future.
MAIAC AOT data have been available twice per day since 2000 and therefore have the potential to be applied in health impact analysis for long-term exposure to PM 2.5 . Given the higher R 2 values for both site-specific and year-specific model estimation statistics, other explanatory factors are unlikely to play a dominant role in the models, but it is worth future investigation to improve the models. The methodology can be applied to other large and heavily polluted urban areas in China or other regions of the world based on further tests.