A Review on Predicting Ground PM 2.5 Concentration Using Satellite Aerosol Optical Depth

This study reviewed the prediction of fine particulate matter (PM 2.5) from satellite aerosol optical depth (AOD) and summarized the advantages and limitations of these predicting models. A total of 116 articles were included from 1436 records retrieved. The number of such studies has been increasing since 2003. Among these studies, four predicting models were widely used: Multiple Linear Regression (MLR) (25 articles), Mixed-Effect Model (MEM) (23 articles), Chemical Transport Model (CTM) (16 articles) and Geographically Weighted Regression (GWR) (10 articles). We found that there is no so-called best model among them and each has both advantages and limitations. Regarding the prediction accuracy, MEM performs the best, while MLR performs worst. CTM predicts PM 2.5 better on a global scale, while GWR tends to perform well on a regional level. Moreover, prediction performance can be significantly improved by combining meteorological variables with land use factors of each region, instead of only considering meteorological variables. In addition, MEM has advantages in dealing with the AOD data with missing values. We recommend that with the help of higher resolution AOD data, future works could be focused on developing satellite-based predicting models for the prediction of historical PM 2.5 and other air pollutants.


Introduction
According to the World Health Organization's report in 2014, 3.7 million premature deaths related to ambient air pollution occurred around the world in 2012 [1]. Ambient air pollutants include particulate matter, ozone, nitrogen dioxide, sulfur dioxide, and other contaminants. Fine particulate matter with aerodynamic diameters smaller than 2.5 µm (PM 2.5) is the most problematic of these pollutants. PM 2.5 particles can enter the alveoli and be retained in the lung parenchyma [2]. Due to the toxicological effects of the resulting inflammation and oxidative stress [3], PM 2.5 can cause severe cardiovascular diseases, respiratory diseases and even lung cancer [4,5]. The Global Burden of Disease study for 1990-2010 ranked ambient PM 2.5 concentrations ninth out of all health risk factors [6]. PM 2.5 has therefore become a major focus of research on air pollution and environmental health [7][8][9][10].
However, most pollutant concentration information has been obtained from ground monitoring stations, which have many limitations. These stations are limited in number, unequally distributed [7,11] and differ in measurement frequency [12]. These limitations may restrict the geographical and demographical range of studies, resulting in an information bias and reducing the confidence in the results of exposure-response studies [13]. Furthermore, the temporal and spatial variation of PM 2.5 is complex, and continuous monitoring of PM 2.5 is absent in many countries and regions [14]. For example, PM 2.5 was not included in China's national monitoring system until 2013. Remote sensing techniques could therefore allow the collection of long-term continuous PM 2.5 data on large spatial scales over China [15].
Numerous researchers have attempted to estimate ground PM 2.5 levels using satellite-derived atmospheric aerosol optical depth (AOD) [16], which is the aerosol extinction coefficient integrated over the vertical column of the atmosphere [4,16,17]. Satellite-derived AOD research began in the mid-1970s, and, in 2003, Wang et al. [16] initiated the use of Moderate Resolution Imaging Spectroradiometer (MODIS) AOD in the prediction of ground-level PM 2.5 through linear correlation. Liu et al. [18] introduced the Chemical Transport Model (CTM) approach in 2004, and, in 2011, Lee et al. [19] created the day-specific Mixed-Effect Model (MEM) using MODIS AOD. In recent years, PM 2.5 levels have been estimated using a variety of satellite sensors, including MODIS [20,21], the Multi-angle Imaging SpectroRadiometer (MISR) [4,20,22], the Geostationary Operational Environmental Satellite (GOES) [23,24], Polarization and Directionality of the Earth's Reflectances (POLDER) [25,26], the Sea-viewing Wide Field-of-view Sensor (SeaWiFS) [27,28], the Ozone Monitoring Instrument (OMI) [29] and the Cloud-Aerosol Lidar with Orthogonal Polarization (CALIOP) [29,30]. Although studies of this kind are becoming more common, prediction results have been unstable and have varied significantly between regions [31,32]. Additionally, different studies have used different methods of dealing with missing AOD data [7,[33][34][35][36]. The objective of this study is to review previous studies in order to compare existing PM 2.5 predicting models based on satellite AOD and illustrate their advantages and limitations. This could provide a helpful reference for future satellite-based PM 2.5 predicting studies.

Subject of This Review
What is the relationship between PM 2.5 concentrations predicted from aerosol optical depth retrieval and PM 2.5 concentrations measured on the ground?

Search Criteria
We searched the following electronic databases prior to 30 June 2016: Web of Science (WOS), PubMed, Engineering Index (EI), Nature, Elsevier Science Direct, Wiley, Springer, and Taylor and Francis. Keywords used in the searches included: aerosol optical depth (AOD, aerosol optical thickness, AOT), fine particulate matter (PM 2.5), satellite data, satellite remote sensing, satellite derived, and satellite retrieved. These keywords were searched under the categories of "subject", "title", and "keywords" respectively, connected through logical combinations of "and" and "or". When searching in Web of Science, for example, we used the following combination of keywords: (("aerosol optical depth") OR ("AOD") OR ("aerosol optical thickness") OR ("AOT") OR ("satellite data") OR ("satellite remote sensing") OR ("satellite derived") OR ("satellite retrieved")) AND (("fine particulate matter") OR ("PM 2.5")).

Inclusion and Exclusion Criteria
The inclusion criteria were as follows: (1) papers published in peer-reviewed journals before 30 June 2016; (2) empirical research utilizing both satellite AOD data and ground PM 2.5 data; and (3) papers incorporating PM 2.5 predicting models based on satellite-derived AOD and model evaluation. During the review of abstracts and full texts, studies were excluded according to these criteria: (1) abstracts and conference papers only; (2) studies using AOD data only or PM 2.5 data only, and studies without R² values; and (3) satellite-based PM 2.5 predicting studies conducted over the ocean or special terrains (such as mountains), or during the following natural and anthropogenic events: land (forest) fires, dust storms, volcanic eruptions, and fuel combustion events. We reviewed all the selected studies in detail and summarized their main features.

Results
After screening 1436 identified studies and assessing the eligibility of the remaining studies, we selected 116 articles for our review that are primarily relevant to the satellite-based PM 2.5 predicting model (Figure 1). The study areas, results, models used and other basic characteristics of all included studies are summarized in Table 1.

Of these 116 studies, 25 used Multiple Linear Regression (MLR), 23 used the Mixed-Effect Model (MEM), 16 used the Chemical Transport Model (CTM), and 10 used Geographically Weighted Regression (GWR), while Linear Correlations (LC), the Generalized Additive Model (GAM), Land Use Regression (LUR) and other models were found in 12 studies, six studies, seven studies and 27 studies, respectively (Figures 2 and 3).

Discussion
Satellite remote sensing technology plays an essential role in the field of meteorology because of its highly accurate prediction of meteorological disasters. Recently, this technology has also been used in the prediction of daily air pollution (PM 2.5) levels. Although PM 2.5 data can be obtained from AOD measured by ground-based remote sensing equipment [129], it is more meaningful to predict PM 2.5 levels from satellite observations. From our study, we concluded that MLR, MEM, CTM, and GWR were the models most commonly used to predict PM 2.5 levels.

Theory Background and Application
MLR has been used to predict PM 2.5 from satellite AOD since 2005. In this model, PM 2.5 measured at ground level was set as the dependent variable, and AOD was set as the independent variable. Several factors were also included in the model as covariates, including humidity, temperature, wind speed, wind direction, aerosol type, and height of the boundary layer. MLR was often used in earlier studies to predict PM 2.5 levels. For instance, Liu et al. [38] used this model to analyze three area types (city, suburb and countryside) in the eastern United States in 2001. They reported that the coefficients of determination were fairly low and varied greatly between regions; R² values were 0.420, 0.490, 0.590 and 0.430 in the city, suburban, countryside and whole areas, respectively. The low R² values shown above indicate that the inclusion of covariates (such as relative humidity, height of the boundary layer, season variable, etc.) in MLR models requires further discussion [33]. In contrast, the R² value reached 0.960 in Gupta's study when certain conditions (weather conditions, boundary layer heights and other conditions) were met [41].
More recently, in order to improve model performance, some studies have explored covariate factors in the MLR model under different conditions [17,20,21,[24][25][26][39][40][41]46,47,49,50,54,75]. A few covariate factors, such as relative humidity and height of the boundary layer, were regarded as significant enough to affect and even invert the relationships between AOD and PM 2.5. In 2013, Cordero et al. [73] predicted PM 2.5 levels by applying both the satellite-based MLR method and the Community Multi-Scale Air Quality (CMAQ) model. Results showed that the satellite-based MLR method performed better than the CMAQ model during summer: R² values ≥0.423 (MODIS), R² values ≥0.137 (CMAQ). However, the R² value increased to 0.740 when the two models were combined [73]. In 2015, Han et al. analyzed the factors affecting the relationship between AOD and PM 2.5 in Nanjing [93]. The authors found that aerosol type and height of the boundary layer were significant factors in the prediction of PM 2.5 levels. They also stated that the R² value was 0.624 with only aerosol type adjusted, and 0.548 when both aforementioned significant factors were adjusted [93].
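The MLR setup described above (ground PM 2.5 as the dependent variable, AOD as the predictor, meteorological covariates added) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the data are synthetic, the covariates (relative humidity and boundary layer height) and their coefficients are arbitrary, and the helper names `fit_mlr`, `predict` and `r_squared` are hypothetical. It does not reproduce any reviewed study.

```python
import random


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_mlr(rows, y):
    """Ordinary least squares via the normal equations; an intercept is added."""
    design = [[1.0] + list(r) for r in rows]
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)


def predict(coef, row):
    return coef[0] + sum(c * v for c, v in zip(coef[1:], row))


def r_squared(y, y_hat):
    mean_y = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot


# Synthetic data (assumed): PM 2.5 depends on AOD, relative humidity (%)
# and boundary layer height (km), plus Gaussian noise.
rng = random.Random(0)
rows, pm25 = [], []
for _ in range(200):
    aod = rng.uniform(0.05, 1.5)
    rh = rng.uniform(20.0, 90.0)
    pblh = rng.uniform(0.3, 2.5)
    rows.append((aod, rh, pblh))
    pm25.append(5.0 + 40.0 * aod + 0.1 * rh - 6.0 * pblh + rng.gauss(0.0, 5.0))

coef_aod_only = fit_mlr([(r[0],) for r in rows], pm25)
coef_full = fit_mlr(rows, pm25)
r2_aod_only = r_squared(pm25, [predict(coef_aod_only, (r[0],)) for r in rows])
r2_full = r_squared(pm25, [predict(coef_full, r) for r in rows])
print(f"R2, AOD only: {r2_aod_only:.3f}; R2, AOD + covariates: {r2_full:.3f}")
```

Because the AOD-only model is nested in the full model, the in-sample R² of the full model cannot be lower. This is one reason the reviewed studies also report cross-validated R² values.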

Advantages and Disadvantages
In summary, the determination coefficients of MLR were higher than those of the linear correlation model, and confounding bias could be reduced by including relevant covariates in the model. However, there are several limitations. Some important covariates, such as seasonal variation of the aerosol, regional variation, and land use information, were missing from the models [93]. Additionally, the accuracy and resolution of the satellite-derived AOD and meteorological data were low [38], which can lead to an information bias.

Theory Background and Application
In early research, missing AOD data were a key issue in the estimation of PM 2.5 from AOD, and the method used to compensate for missing AOD data strongly affects the precision and accuracy of the derivation. Kloog et al. [33], from Harvard School of Public Health, first proposed that satellite-derived AOD could be included in the three-stage MEM and they applied this approach in New England in 2011. Based on the AOD day-specific correction mixed-effect model of Lee et al. [19], they incorporated meteorological variables and classic land use variables into the MEM [34]. The MEM also used the inverse distance weight method (IDW), cluster analysis, GAM and the generalized additive mixed model (GAMM) to deal with missing AOD values so that daily ground PM 2.5 levels could be predicted over a wide area [34]. If missing AOD values were not randomly distributed, AOD data needed to be corrected by meteorological factors using the inverse probability weight method (IPW) [82].
MEM has been applied in many regions. In New England, Kloog et al. [33] constructed their own MEM, based on the MEM of Lee et al., in 2011 (CV-R² value = 0.830 for days with available AOD data; 0.810 for days without AOD data). The distinctive feature of the model of Kloog et al. is its inclusion of meteorological variables (such as temperature, wind speed and visibility) and land use variables (such as elevation, percentage of open spaces, area emissions, point emissions and distance to major roads), which makes it appropriate for studying acute and chronic health effects. Since then, many researchers, including Kloog, Madrigano, Chiu and others, have used MEM to study acute and chronic health effects [79,96,[130][131][132][133], and it has performed well. In 2012, by using GOES AOD data and adding a surface reflection variable into MEM, Chudnovsky et al. [35] showed a high predictive value of CV-R² = 0.920. This study also indicated that high-resolution GOES AOD may be a better predictor of urban PM 2.5 than coarse-resolution MODIS AOD [35]. Lee et al. [64] found that the R² value of MEM could reach 0.830 if missing AOD values were filled using a combination of cluster analysis and generalized additive models. In the mid-Atlantic region, Kloog et al. [34] improved the MEM by adopting IPW for non-random missing AOD data, and obtained a cross-validated R² value of 0.850. Kloog et al. [34] also established PM 2.5 predictive models in different regions by adding traffic density, population density and distance to the point emission variables.
In 2013, in order to take advantage of high-resolution AOD products, Chudnovsky et al. [56] developed a MODIS-based Multi-Angle Implementation of Atmospheric Correction algorithm (MAIAC) and used this new algorithm to improve the inversion resolution of MODIS AOD products (from 10 × 10 km to 1 × 1 km). In their results, the R² value reached 0.500 in New England and 0.860 in the Boston area [72]. The MAIAC algorithm has since been widely applied in MEM studies [7,82,83,94,120,134]. Kloog et al. [83] obtained a CV-R² value of 0.810 in the mid-western United States in 2000-2006. In a later study based on an early MEM [33,34], Kloog et al. used a GAM to address missing AOD values, obtaining an R² value of 0.880 in the northeastern United States (New England, New York and New Jersey) [82]. In New England, Alexeeff et al. [135] further employed the MEM [34,131] together with Kriging and land use regression to describe an epidemiological relationship between AOD and predicted PM 2.5 in 2003. The following year, Shi et al. [120] used MEM to predict PM 2.5 using MODIS AOD data collected between 2003 and 2008, and they obtained consistent results (R² value = 0.890) for days with and without available AOD data. This method was also successfully applied in studies on the relationship between low PM 2.5 exposure and mortality.
In recognition of regional geographical differences, Lee et al. [7] predicted PM 2.5 concentrations using IPW in seven southeastern states of the United States in 2016, and they obtained three coefficients of determination (0.770, 0.810, and 0.700) from three different geographical area types. They suggested that their PM 2.5 estimation methods could be extended from urban areas to rural areas. Just et al. [94] analyzed the geographical distribution of PM 2.5 in Mexico in 2004-2014. They obtained an R² value of 0.724 using MEM and showed that precipitation and height of the boundary layer are both important factors influencing the relationship between AOD and PM 2.5 [94]. Furthermore, with AOD derived from Medium Resolution Imaging Spectrometer (MERIS) and Advanced Along-Track Scanning Radiometer (AATSR) synergistic observations, Beloconi et al. [108] applied MEM to the evaluation of the day-specific and site-specific random effects in London. Their results showed a CV-R² value of 0.846 between 2002 and 2012. Ma et al. [136] provided an improved MEM to address data missing from satellite observations as well as ground-level measurements.
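For orientation, the core of a day-specific mixed-effect model can be written in one line. The notation below is assumed here rather than quoted from any of the cited papers: $i$ indexes monitoring sites, $j$ indexes days, and the random intercept $u_j$ and random slope $v_j$ let the AOD-PM 2.5 relationship vary from day to day.

```latex
\mathrm{PM}_{2.5,ij} = (\alpha + u_j) + (\beta + v_j)\,\mathrm{AOD}_{ij} + \varepsilon_{ij},
\qquad
(u_j, v_j)^{\top} \sim N(\mathbf{0}, \Sigma),
\qquad
\varepsilon_{ij} \sim N(0, \sigma^{2})
```

Here $\alpha$ and $\beta$ are the fixed intercept and slope shared by all days. Meteorological and land use covariates, or site-level random effects, enter as further additive terms; the exact specification differs between the studies reviewed.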
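Inverse distance weighting, one of the methods listed above for filling missing AOD values, is simple enough to sketch directly. The function name `idw_fill`, the grid coordinates and the AOD values below are hypothetical, and the power parameter of 2 is a common default rather than a value taken from the reviewed studies.

```python
import math


def idw_fill(target, known, power=2.0):
    """Estimate AOD at target (x, y) from known cells [(x, y, aod), ...].

    Each known value is weighted by 1 / distance**power, so nearby cells
    dominate the estimate.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for x, y, aod in known:
        dist = math.hypot(target[0] - x, target[1] - y)
        if dist == 0.0:
            return aod  # the target coincides with a known cell
        weight = 1.0 / dist ** power
        weighted_sum += weight * aod
        weight_total += weight
    return weighted_sum / weight_total


# Hypothetical grid cells (coordinates in km) with valid AOD retrievals.
known_cells = [(0.0, 0.0, 0.20), (10.0, 0.0, 0.40), (0.0, 10.0, 0.60)]
filled = idw_fill((5.0, 0.0), known_cells)
print(f"IDW-filled AOD at (5, 0): {filled:.3f}")
```

The estimate always lies between the smallest and largest neighboring values, which is why IDW smooths the field and cannot reproduce local extremes in the missing region.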

Advantages and Disadvantages
To sum up, MEM has the following advantages: (1) It has a relatively high coefficient of determination. The R² value could generally reach 0.800 or higher. Temporal and spatial R² values were also high; they could reach 0.700 or higher among different regions. In addition, the R² value could be greatly enhanced through the use of MAIAC algorithms [7,72,82]; (2) MEM can be widely applied to the prediction of PM 2.5 at a regional level by using different land use and meteorological variables for model calibration; (3) MEM can be used to predict daily PM 2.5 concentrations, and has been applied in studies on the acute and chronic health effects of PM 2.5 exposure in New England, the Mid-Atlantic and other regions of the United States. These studies can be extended to other regions in the future [15]. The model may also be used to explore the difference between satellite-derived AOD-based PM 2.5 data and ground-based PM 2.5 data in health effect studies.
MEM has the following disadvantages: (1) Due to the lack of ground-level PM 2.5 monitoring data in certain areas, the PM 2.5 monitoring data could not meet the requirements of Kriging in MEM, which affected the accuracy of the results [72]; (2) The correlation between AOD and PM 2.5 may decrease when only total AOD is applied. It is not clear which of the aerosols influencing AOD (such as sulfate, nitrate, ammonium, carbonaceous, mineral dust, and sea salt) plays a major role in the total AOD, or how much other air pollutants affect this correlation [33,94,137]; (3) Land use and traffic pollution information is hard to collect.

Theory Background and Application
Based on the characteristics of vertical distribution and transmission of AOD, Liu et al. [18] proposed the Global atmospheric chemistry model (GEOS-CHEM), which is a prediction model of PM 2.5 based on satellite AOD. Following Liu and coworkers' study [138], van Donkelaar et al. [43] developed the CTM, which calibrates the height of the boundary layer and the humidity of air. Considering the composition and distribution of AOD and utilizing emission inventory data as well as daily emission patterns published in European and other countries, van Donkelaar et al. built a precise CTM formula in 2006. In 2010, they simplified the CTM by redefining the association between AOD and PM 2.5 as a conversion factor. CTM can now be used on a global as well as a local scale [6,57], and has attracted extensive interest [139,140].
This model was employed in different regions between 2010 and 2012. Di Nicolantonio and Cacciari [55] applied the method in North Italy and obtained different satellite-based PM 2.5 predicting results: R² values of 0.680 (Terra MODIS), 0.590 (Aqua MODIS) and 0.700 (Terra and Aqua MODIS). Hystad et al. [62] were the first to add land use variables in Canada using CTM [57], obtaining an R² value of 0.410. Additionally, in a comparison between IDW-adjusted CTM and MLR, IDW-adjusted CTM (R² value = 0.510 per year) performed better than MLR (R² value = 0.330 per year) [62,67]. Lee et al. [63] made a comparison between the Kriging method and CTM in the United States. Although both methods gave consistent results, CTM had better applicability and higher accuracy, especially in areas with few ground-level monitoring sites. Further studies by van Donkelaar et al. have shown that meteorological factors can calibrate and reduce the systematic error, and spatial smoothing with the IDW method can reduce the random error, eventually extending the spatiotemporal prediction scale [67]. Crouse [141] not only obtained a high R² value (0.792) in 11 Canadian cities in 1987-2001, but also successfully applied the results to the study of long-term health effects of PM 2.5 exposure. Following van Donkelaar's study [57], other studies conducted by Villeneuve, Chen, To and Brauer [142][143][144][145][146] focused on acute and chronic health effects and on the global burden of disease.
In addition, the estimates of PM 2.5 from MODIS AOD in the above studies were somewhat varied. In 2013, van Donkelaar et al. [147] added land use type data, which were used to quantify the weight of AOD data, and proposed Optimal Estimation (OE) in order to improve the predictive ability of AOD. More recently, Wang et al. [124] have provided an improved AOD retrieval algorithm for MODIS at 1 km resolution that can retrieve AOD at high spatial resolution at intra-urban scales. These MODIS-retrieved AODs are used to predict ground-level PM 2.5 using aerosol vertical profiles and local scale factors obtained from the CTM simulation. A daily R² value of 0.860 and a monthly R² value of 0.930 were obtained from data collected over the city of Montreal, Canada [124].
At the global level, in a study similar to van Donkelaar's 2010 study, Boys and Martin [148] completed a global ground-level prediction of PM 2.5 in 2014, which integrated global AOD data (1 km × 1 km) collected from the MISR and SeaWiFS satellite sensors between 1998 and 2012. They also included a few influencing factors in the CTM, such as the vertical structure of aerosol extinction, relative humidity, aerosol size and aerosol composition. Their results showed that PM 2.5 levels in East America, the Arabian Peninsula, and eastern and southern Asia were relatively consistent [148]. In a different study, van Donkelaar et al. [101] combined GWR with CTM, and obtained a higher CV-R² value (0.780) with high resolution in North America. In the same year, van Donkelaar et al. [27] improved the CTM approach to the prediction of PM 2.5 concentrations at a global level. Their research integrated AOD data from three satellites in order to avoid negative effects from the source variations of AOD. The study obtained a relatively high R² value (0.656) for North America in 2001-2010, indicating that PM 2.5 prediction could be feasible at the global level.
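The conversion-factor idea mentioned above can be stated compactly. In the notation assumed here, $\eta$ is the ratio of the surface PM 2.5 concentration to the column AOD, both simulated by the chemical transport model for the same grid cell and time, and it is applied to the satellite-retrieved AOD:

```latex
\widehat{\mathrm{PM}}_{2.5} = \eta \times \mathrm{AOD}_{\mathrm{sat}},
\qquad
\eta = \frac{\mathrm{PM}_{2.5}^{\mathrm{CTM}}}{\mathrm{AOD}^{\mathrm{CTM}}}
```

As a purely hypothetical example, if the model simulates 20 µg/m³ of surface PM 2.5 together with an AOD of 0.4, then $\eta = 50$ µg/m³ per unit AOD, and a satellite AOD of 0.3 in that cell gives an estimate of 15 µg/m³. No ground monitor enters this calculation, which is why the approach remains usable in regions without monitoring networks.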

Advantages and Disadvantages
Based on the above studies, the advantages of CTM are: (1) it can predict PM 2.5 concentrations at ground level without PM 2.5 data from ground monitors [127]; and (2) it takes the components of AOD and the effects of other pollutants into account, and has been widely used in Canada, North America and South America, as well as for prediction on a global scale [27,28,149,150]. CTM is currently central to health effect analysis related to PM 2.5 components [109]. The disadvantages of CTM are: (1) its predictive performance was relatively low and varied among regions, and the lower R² values can lead to a high exposure bias in health effect studies; (2) collecting the necessary chemical and physical information on PM 2.5 consumes time, energy and financial resources [57]; (3) due to the lack of emission type and emission inventory data in developing countries, it is hard to meet the conditions for applying CTM in China, India and other such countries [27]; and (4) other pollutants (SO 2, O 3, etc.) have different inversion resolutions compared with PM 2.5 [143].

Theory Background and Application
Geographically Weighted Regression (GWR) was first proposed on the assumption that "regression coefficient is a function of the observation point's spatial position in linear regression", with spatial weights assigned according to the distance between observation points [151,152]. This spatial regression technique reflects spatial variability and non-stationarity, and can provide a regional-level regression model [151][152][153]. In 2009, Hu et al. [32] introduced AOD into GWR and carried out a prediction of PM 2.5 levels in the United States. After that, Ma [87] further optimized GWR in 2014 by taking AOD and land use variables as the independent variables, and PM 2.5 concentrations as the dependent variable. Meanwhile, based on the differences between regions in PM 2.5 ground monitoring, spatial weight assignment was developed and applied to each region according to the quantity of AOD data. If a large proportion of AOD data was missing, buffer areas could be selected for each spatial observation point and the gaps filled according to the corrected Akaike Information Criterion. Thus, the spatial distribution of the regression parameters was obtained, and the GWR model could explain the effects of the spatial autocorrelation within a certain area when spatial aggregation occurred for a certain variable [87,105,107,128].
Hu's initial investigation of GWR found that it had a low R² value compared with MEM and CTM, probably because not all studies took meteorological factors and land use factors into account [32]. Based on regional differences, Hu et al. [59] brought meteorological variables and land use variables into the GWR to predict PM 2.5 concentrations in North America. Results showed that R² values improved significantly when these variables were considered (R² = 0.672 with North American Regional Reanalysis data, and R² = 0.706 with North American Land Data Assimilation System data). However, large spatial variability and instability occurred in these variables. Further studies showed that PM 2.5 concentrations were higher in urban areas, and lower in rural villages or mountain areas.
To compensate for the lack of cross-validation in earlier work, Ma et al. [87] expanded the national-scale GWR model with data from the newly built national monitoring network to predict PM 2.5 levels in China, reporting a CV-R² value of 0.640. This result indicated that it was feasible to estimate PM 2.5 levels in China using satellite AOD combined with meteorological and land use data. The model obtained similar results to those obtained by the CTM used by van Donkelaar in 2010, but GWR found higher PM 2.5 concentrations in rural areas. Similar results for national PM 2.5 levels were found by You et al. [126], with CV-R² values of 0.760 and 0.810 for MODIS and MISR, respectively, in China. Additionally, using 3-km resolution MODIS AOD in 2014, You et al. [125] confirmed that this GWR approach is useful for estimating large-scale ground-level PM 2.5 distributions in China.
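The central idea of GWR, a separate regression at each location with nearby observations weighted more heavily, can be sketched with a single predictor. The Gaussian kernel, the fixed bandwidth, the function name `gwr_local_fit` and the six synthetic observations are all assumptions made for illustration; the reviewed studies use more covariates and data-driven bandwidth selection.

```python
import math


def gwr_local_fit(site, observations, bandwidth):
    """Fit PM2.5 = b0 + b1 * AOD at one location by weighted least squares.

    observations: [(x, y, aod, pm25), ...]; each weight decays with distance
    from `site` following a Gaussian kernel with the given bandwidth.
    """
    sw = swx = swy = swxx = swxy = 0.0
    for x, y, aod, pm25 in observations:
        dist = math.hypot(site[0] - x, site[1] - y)
        w = math.exp(-0.5 * (dist / bandwidth) ** 2)
        sw += w
        swx += w * aod
        swy += w * pm25
        swxx += w * aod * aod
        swxy += w * aod * pm25
    slope = (sw * swxy - swx * swy) / (sw * swxx - swx ** 2)
    intercept = (swy - slope * swx) / sw
    return intercept, slope


# Synthetic monitors (assumed): a western cluster where PM2.5 = 10 + 20 * AOD
# and an eastern cluster, 100 km away, where PM2.5 = 10 + 50 * AOD.
observations = [
    (0.0, 0.0, 0.2, 14.0), (1.0, 0.0, 0.5, 20.0), (0.0, 1.0, 0.8, 26.0),
    (100.0, 0.0, 0.2, 20.0), (101.0, 0.0, 0.5, 35.0), (100.0, 1.0, 0.8, 50.0),
]
west_intercept, west_slope = gwr_local_fit((0.0, 0.0), observations, bandwidth=10.0)
east_intercept, east_slope = gwr_local_fit((100.0, 0.0), observations, bandwidth=10.0)
print(f"local AOD slope: west {west_slope:.2f}, east {east_slope:.2f}")
```

A single global regression over the same six points would return one compromise slope, whereas the local fits recover the two regional relationships. This is the spatial variability that GWR is designed to capture.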
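Since many of the figures quoted in this review are cross-validated R² values, a minimal sketch of how such a value can be computed may help. It assumes 10-fold cross-validation, a one-predictor linear model and synthetic data. It also uses one common definition of CV-R² (1 minus the ratio of held-out residual to total sum of squares); individual studies may define it differently, for example as the squared correlation between held-out predictions and observations. The function names are hypothetical.

```python
import random


def fit_simple(xs, ys):
    """Least-squares intercept and slope for y = b0 + b1 * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def cv_r2(xs, ys, k=10, seed=0):
    """k-fold cross-validated R2: every point is predicted by a model
    fitted without the fold that contains it."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    preds = [0.0] * len(xs)
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        b0, b1 = fit_simple([xs[i] for i in train], [ys[i] for i in train])
        for i in fold:
            preds[i] = b0 + b1 * xs[i]
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


# Synthetic AOD / PM2.5 pairs (assumed linear relationship plus noise).
rng = random.Random(1)
aod = [rng.uniform(0.05, 1.5) for _ in range(100)]
pm25 = [8.0 + 35.0 * a + rng.gauss(0.0, 4.0) for a in aod]
score = cv_r2(aod, pm25)
print(f"10-fold CV-R2: {score:.3f}")
```

Because every prediction comes from a model that never saw the corresponding observation, this score is a more honest indicator of out-of-sample performance than the in-sample R².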

Advantages and Disadvantages
From the studies above, the advantages of this model are: (1) PM 2.5 estimation requires only small amounts of data. For example, this model can work with the daily, monthly or yearly averages of either PM 2.5 data or AOD data. Determination coefficients were also less affected. Studies have shown that, compared with CTM, GWR had a higher R² value [87]; (2) Similar to MEM, GWR used ground-monitored PM 2.5 values for AOD calibration, and it had better model performance than MLR. The disadvantages are: (1) since model construction depends on ground monitoring data, model performance may be much less reliable in areas lacking ground monitoring data; and (2) to our knowledge, GWR has only been employed in a limited number of PM 2.5 prediction studies in combination with satellite data [74,87,100,107,125,126,128], so the feasibility of applying it widely in other regions needs to be investigated in further research.

Other Models
In addition to the models mentioned above, other researchers used linear correlations [16,30,31,37,42,58,71,113,115,117], GAM [23,24,53,65,77], LUR [66,69,70,78,91,122], Kriging [88,90,108] or the nonlinear regression model. Those PM 2.5 estimating models all regard AOD as the primary independent variable. As a result, the predictability of these models was limited. Their R² values were generally lower, and varied between different areas. However, these listed models have been gradually optimized or integrated into other models, as with artificial neural networks (ANN, which incorporate LUR in the CTM) [52,61,68,110,111] and the two-stage model (TSM, which combines the GWR with MEM) [80,81,119,121]. In recent years, with the development of AOD-based mathematical models, many new methods have been developed, such as geographically and temporally weighted regression (GTWR) [107], support vector regression methods (SVR) [99] and machine learning regression (which is a combination of SVR, Gauss neural network processes, Decision trees, and Random forests) [28]. Although these new methods have been proposed, their reliability and validity need to be investigated in further studies.

Summary
In terms of the accuracy of PM 2.5 prediction, though no single model can replace all others, some existing models have advantages in the following areas. (1) Model predictability: MLR was commonly used in early studies [17,20,21,[24][25][26][39][40][41]46,47,49,50,54,75], whereas MEM and CTM gradually became the dominant methods and replaced MLR after 2010. However, GWR has developed at a slower pace, with a limited number of studies to date, and had moderate performance [32,74,125,126]. Included studies showed that the R² value of MEM was higher than those of the other three models in the same area [17,87,104,136]. Moreover, MAIAC algorithms, which yield highly accurate AOD, were mostly used in MEM, significantly improving the R² value of the model [7,35,83,120,135]. On the global scale, CTM has proven efficient because it can complete the prediction from partial AOD data through AOD component analysis [57]; (2) Adjusting factors: The number of these factors has increased with the development of prediction models. Moreover, factors such as atmospheric boundary layer height and relative humidity have become a permanent part of the adjustment process. In early LC and MLR studies, adjusting factors were limited in number and scope, and were mainly focused on meteorological factors (atmospheric boundary layer height, humidity, temperature, wind speed, etc.) [38,39,41,42]. Later on, GAM took both meteorological factors and land use factors into account, which improved performance [23,77]. MEM and CTM also incorporated more meteorological factors and land use factors; their R² values proved to be satisfactory; (3) Missing AOD: Although the prediction of PM 2.5 with satellite AOD has become a hotspot in the remote sensing field, missing AOD values cannot be ignored, because the reliability of predicted PM 2.5 could be affected when the percentage of missing AOD values reaches 60%. Among the four models, MEM systematically and comprehensively described methods of dealing with missing AOD [137]; results of each method can be found in different studies. CTM, on the other hand, filled in the gaps by establishing "buffer areas" or avoided the problem of missing AOD by assigning different weights to each area according to the amount of AOD data. For the MLR, missing AOD was not processed.

Conclusions
The review showed that MEM performed best. CTM had strengths in the prediction of PM 2.5 on a global scale. GWR was suitable for PM 2.5 prediction on a regional scale. MLR was relatively weak in terms of predictability. When land use information was included as an adjustment factor in addition to meteorological factors, the accuracy of predictions greatly improved. Other models, such as ANN, TSM and SVR, need to be further validated. We therefore suggest that the following possibilities be considered in future studies: (1) the use of AOD data with higher resolution for more accurate estimation of PM 2.5 in relatively small areas; (2) the use of satellite-based predicting models for historical PM 2.5 prediction and retrospective study in areas lacking historical PM 2.5 data; and (3) the development of prediction models not only for PM but also for other air pollutants (SO 2, NO 2), to extend the applicability of predicting models.

Figure 2. The frequency distribution of seven models.

Figure 3. Constituent ratio of seven models.

Table 1. Characteristics of included studies.