Spatio-Temporal Variation of PM2.5 Concentrations and Their Relationship with Geographic and Socioeconomic Factors in China

The air quality in China, particularly the PM2.5 (particles less than 2.5 μm in aerodynamic diameter) level, has become an increasing public concern because of its relation to health risks. The distribution of PM2.5 concentrations has a close relationship with multiple geographic and socioeconomic factors, but the lack of reliable data has been the main obstacle to studying this topic. Based on the newly published Annual Average PM2.5 gridded data, together with land use data, gridded population data and Gross Domestic Product (GDP) data, this paper explored the spatial-temporal characteristics of PM2.5 concentrations and the factors impacting those concentrations in China for the years of 2001–2010. The contributions of urban areas, high population and economic development to PM2.5 concentrations were analyzed using the Geographically Weighted Regression (GWR) model. The results indicated that the spatial pattern of PM2.5 concentrations in China remained stable during the period 2001–2010; high concentrations of PM2.5 are mostly found in regions with high populations and rapid urban expansion, including the Beijing-Tianjin-Hebei region in North China, East China (including the Shandong, Anhui and Jiangsu provinces) and Henan province. Increasing populations, local economic growth and urban expansion are the three main driving forces impacting PM2.5 concentrations.


Introduction
Fine particulate matter (PM 2.5 , i.e., particles less than 2.5 μm in aerodynamic diameter) is rich in organic toxic components and has a strong association with many adverse health effects [1][2][3]. Epidemiological studies have reported associations between PM 2.5 and a variety of medical diseases such as asthma, cardiovascular problems, respiratory infections, lung cancer [4] and breast cancer. The air quality in China, particularly the PM 2.5 level, has become an increasing public concern because of its connection to such health risks. For example, Hu suggested that exposure to high PM levels may have deleterious effects on the duration of survival after a breast cancer diagnosis among females [5]. Zhang et al. estimated the number of people living in high-exposure areas in Beijing during the autumn of 2012 [6].
Accurate modeling of fine-scale spatial variation in PM 2.5 concentrations is critical for environmental and epidemiological studies. Land use regression (LUR) models are widely employed to extend in situ measurements of PM 2.5 concentrations to large areas. LUR is essentially an interpolation technique that uses the PM 2.5 concentration as the dependent variable, with proximate land use, traffic and physical environmental variables as independent predictors [7,8]. Some literature has suggested that PM emissions in urban areas come from road traffic, household activities, energy production, building work, (inland) shipping and (small-scale) industry [9]. However, the lack of long-term monitoring data is the primary obstacle in developing countries, where PM 2.5 monitoring sites are sparse and have only been established in recent years. Fortunately, recent studies have indicated that satellite-observed total-column aerosol optical depth (AOD) can offer spatially continuous information about PM 2.5 concentrations at the global scale [10]. Mao et al. developed an AOD-enhanced space-time LUR model to predict the PM 2.5 concentrations in the state of Florida in the USA [11]. Van Donkelaar et al. presented a global estimate of PM 2.5 concentrations at 10 km × 10 km resolution for six years (2001-2006) by combining satellite-derived AOD with in situ measurements [12]. In June 2013, the Global Annual PM 2.5 Grids were published by Battelle Memorial Institute and the Center for International Earth Science Information Network (CIESIN)/Columbia University; this data set represents a series of annual average grids (2001-2010) [13]. The global 0.5° × 0.5° grid of estimated PM 2.5 concentrations was developed using monthly AOD data from MODIS and MISR for the period 2001-2010; these estimates took the AOD/PM 2.5 surface-level conversion factors calculated by van Donkelaar [12] and applied them to the gridded remote sensing data. The gridded product provides a continuous surface of PM 2.5 concentrations in micrograms per cubic meter for health and environmental research.
The objective of this paper is to explore the spatio-temporal characteristics and driving forces of PM 2.5 concentrations in China based on long-term newly refined data. Annual Average PM 2.5 gridded data, land use data, gridded population data and Gross Domestic Product (GDP) data for the period 2001-2010 were used in the analysis. The contributions of urban areas, high population and economic development to PM 2.5 concentrations were analyzed using the Geographically Weighted Regression (GWR) model.

PM 2.5 Data
The Global Annual Average PM 2.5 Grids are a series of annual average grids (2001-2010) of PM 2.5 . These data were obtained from the Battelle Memorial Institute and the Center for International Earth Science Information Network (CIESIN)/Columbia University, and each file obtained from Battelle/CIESIN contains integer values for a global 0.5° × 0.5° grid of estimated PM 2.5 concentrations. The average annual PM 2.5 concentration for each grid cell was calculated by multiplying the MODIS and MISR mean AOD for each month by the monthly conversion factor and averaging over the months of the year, as in Equation (1):
$$E_i = \frac{1}{12}\sum_{m=1}^{12} AOD_{i,m}\,\eta_{i,m} \qquad (1)$$
where $E_i$ stands for the annual-average estimated PM 2.5 concentration for grid cell $i$; $AOD_{i,m}$ stands for the MODIS and MISR mean AOD for month $m$; and $\eta_{i,m}$ stands for the monthly conversion factor.
The PM 2.5 dataset offers spatially continuous information about PM 2.5 concentrations at the global scale, but it also has some uncertainties. The PM 2.5 concentrations derived by Battelle/CIESIN may be biased: they can be higher or lower than those of van Donkelaar et al. in some regions, such as largely arid and semi-arid countries with large desert areas [13]. The reasons for these differences are unclear; uncertainties or limitations of the AOD data, the computing methods or other factors could all cause these biases.
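The per-cell calculation described above can be illustrated with a minimal NumPy sketch. It assumes that the monthly AOD and conversion-factor grids are already stacked as arrays of shape (12, rows, cols) and that months without a valid retrieval are stored as NaN and simply skipped when averaging; the function name `annual_pm25` and the NaN handling are illustrative assumptions, not the Battelle/CIESIN processing chain.

```python
import numpy as np


def annual_pm25(aod_monthly, factor_monthly):
    """Annual-average PM2.5 per grid cell from monthly AOD and conversion factors.

    Both inputs are expected to have shape (12, rows, cols).
    NaN marks a month without a valid value (an assumption of this sketch).
    """
    aod = np.asarray(aod_monthly, dtype=float)
    factor = np.asarray(factor_monthly, dtype=float)
    if aod.shape != factor.shape:
        raise ValueError("AOD and conversion-factor stacks must have the same shape")
    # Monthly surface PM2.5 estimate: AOD times the monthly conversion factor.
    monthly_pm25 = aod * factor
    # Average over the months that are available for each cell.
    return np.nanmean(monthly_pm25, axis=0)


# Hypothetical values: AOD of 0.5 and a factor of 60 in every month.
aod = np.full((12, 2, 2), 0.5)
factor = np.full((12, 2, 2), 60.0)
annual = annual_pm25(aod, factor)
```

With constant hypothetical inputs such as these, every cell of `annual` equals the monthly product, which is a quick sanity check before real grids are used.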
Data for China were extracted from the global dataset using the ArcGIS software and were transformed to the same coordinate system as the other datasets, specifically the Albers Equal Area projection system, Beijing 1954 geodetic datum and Krassovsky ellipsoid. Figure 1 shows the estimated distribution of PM 2.5 concentrations in China from 2001 to 2010.

Population Data
Gridded population data for China with a spatial resolution of 1 km from 2001 to 2010 were provided by the Resources and Environmental Scientific Data Center (RESDC), Chinese Academy of Sciences (CAS) [14]. These gridded data were derived from census data: based on the relationship between demographic data and land use types, the population was redistributed onto 1 km × 1 km grids [14].

GDP Data
GDP is a commonly used indicator of economic development. The gridded GDP dataset for China from 2001 to 2010 was adopted in this study. The dataset was obtained from RESDC, CAS. The statistical GDP data at the county level were transformed into gridded data at a resolution of 1 km × 1 km based on the relationship between the GDP data and the land use types [15].
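The idea of spreading a statistical total over grid cells according to land use, as described for both the population and the GDP grids, can be sketched as a simple proportional allocation. This is only an illustration: the function name `redistribute_total`, the land-use codes and the weights below are hypothetical, and the actual RESDC models [14,15] are not specified in this paper.

```python
import numpy as np


def redistribute_total(county_total, landuse_codes, class_weights):
    """Spread one county total over its grid cells in proportion to land-use weights.

    landuse_codes: array of integer land-use classes, one per grid cell.
    class_weights: dict mapping class code to a relative weight (hypothetical values).
    """
    codes = np.asarray(landuse_codes)
    lookup = np.vectorize(lambda c: class_weights.get(int(c), 0.0), otypes=[float])
    weights = lookup(codes)
    total_weight = weights.sum()
    if total_weight == 0:
        # No weighted land use in the county: fall back to a uniform split.
        return np.full(codes.shape, county_total / codes.size)
    return county_total * weights / total_weight


# Hypothetical county with four 1 km cells; class 5 = settlements, 1 = cultivated, 2 = woodland.
cells = np.array([[5, 1], [1, 2]])
weights = {5: 10.0, 1: 1.0, 2: 0.0}
grid = redistribute_total(1200.0, cells, weights)
```

The allocation preserves the county total by construction, which is the property that makes the gridded values comparable with the original statistics.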

Land-use Data
Land-use data for China at a scale of 1:100,000 for the years 2001 and 2010 were used in this study. The datasets were derived from Landsat TM (Thematic Mapper) and China-Brazil Earth Resources Satellite (CBERS-2) images and were interpreted by experts at RESDC, Chinese Academy of Sciences. The following six land use types were identified: (1) cultivated land; (2) woodland; (3) grassland; (4) water; (5) urban and rural settlements; and (6) barren land. Areas of urban sprawl were derived from the land use data. A set of field survey data was used to ensure the accuracy of the land use classification, and the result is the most accurate land use dataset at this scale in China [16]. Before further processing, all of the source data were resampled onto a raster dataset with 1 km spatial resolution and transformed into the same coordinate system.
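Once the land-use raster is on a 1 km grid, the share of built-up land inside a coarser analysis cell can be obtained by block averaging. The sketch below assumes a rectangular raster whose dimensions are exact multiples of the block size and uses class code 5 for settlements, following the numbering of the list above; note that this class also contains rural settlements, so treating it as "urban" is a simplification. The function name `urban_fraction` is hypothetical.

```python
import numpy as np


def urban_fraction(landuse, block, urban_code=5):
    """Fraction of cells with the settlement class inside each block x block window."""
    lu = np.asarray(landuse)
    rows, cols = lu.shape
    if rows % block or cols % block:
        raise ValueError("raster dimensions must be multiples of the block size")
    is_urban = (lu == urban_code).astype(float)
    # Split the raster into non-overlapping windows and average each one.
    windows = is_urban.reshape(rows // block, block, cols // block, block)
    return windows.mean(axis=(1, 3))


# Hypothetical 4 x 4 raster: settlements in the top-left corner, cultivated land elsewhere.
raster = np.ones((4, 4), dtype=int)
raster[:2, :2] = 5
fractions = urban_fraction(raster, block=2)
```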

Methodology
The spatio-temporal variations of the PM 2.5 concentrations and their relationships with socioeconomic factors were evaluated using the following steps:
Step 1: Evaluate the spatio-temporal variation of PM 2.5 concentrations in China from 2001 to 2010 based on annual average PM 2.5 grids.
Step 2: Compare the distribution of PM 2.5 concentrations with each of the following factors: urban areas, population and GDP. The impact of each factor on the PM2.5 concentrations was analyzed and compared.
Step 3: Use the GWR method to evaluate the relationships between the PM 2.5 concentrations and the urban areas, population and GDP.
A conventional regression method, such as ordinary least squares (OLS), is a global statistic: it assumes that the relationship under study is constant over space, so the parameters are the same for the entire study area [17]. The GWR model extends the traditional regression framework to estimate local, rather than global, parameters [18]. It is a local statistic that produces a set of local parameter estimates showing how a relationship varies over space. Mapping these local statistics makes it possible to examine their spatial pattern and to gain a better understanding of possible hidden causes of that pattern [19]. The global regression model can be expressed as follows:
$$y_i = \beta_0 + \sum_{k} \beta_k x_{ik} + \varepsilon_i$$
We can obtain the vector estimates of the parameters using OLS:
$$\hat{\beta} = (X^{T}X)^{-1}X^{T}Y$$
where $\hat{\beta}$ is the vector estimate of the parameters; $X$ is the matrix composed of the observed values of the independent variables, with every element of its first column equal to 1; and $Y$ is composed of the observed values of the dependent variable [20].
The GWR method considers local estimates of the parameters, and the model is extended to:
$$\hat{\beta}(U_i,V_i) = \left(X^{T}W(U_i,V_i)X\right)^{-1}X^{T}W(U_i,V_i)Y$$
where $(U_i,V_i)$ are the coordinates of location $i$ and $W(U_i,V_i)$ is the distance-based weight matrix for that location [20,21].
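The difference between the global OLS estimate and the local GWR estimate can be made concrete with a short NumPy sketch. It is a minimal illustration, not the software used in this study: it assumes a Gaussian kernel whose bandwidth at each location is the distance to the k-th nearest neighbour (one common form of adaptive kernel), and the function names `ols` and `gwr` and the parameter `n_neighbors` are hypothetical.

```python
import numpy as np


def ols(X, y):
    """Global OLS estimate; X must already contain a leading column of ones."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    return np.linalg.solve(X.T @ X, X.T @ y)


def gwr(coords, X, y, n_neighbors):
    """Local parameter estimates at every observation (weighted least squares).

    The weight of observation j at location i is exp(-0.5 * (d_ij / b_i)^2),
    where b_i is the distance from i to its n_neighbors-th nearest neighbour.
    """
    coords = np.asarray(coords, dtype=float)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    if not 1 <= n_neighbors < n:
        raise ValueError("n_neighbors must be between 1 and n - 1")
    betas = np.empty((n, p))
    for i in range(n):
        d = np.linalg.norm(coords - coords[i], axis=1)
        bandwidth = np.sort(d)[n_neighbors]  # index 0 is the point itself
        if bandwidth == 0:
            raise ValueError("duplicate coordinates give a zero bandwidth")
        w = np.exp(-0.5 * (d / bandwidth) ** 2)
        XtW = X.T * w  # same as X.T @ diag(w)
        betas[i] = np.linalg.solve(XtW @ X, XtW @ y)
    return betas


# Synthetic example: one predictor, an exact global relationship y = 1 + 2x.
rng = np.random.default_rng(0)
coords = rng.uniform(0, 10, size=(30, 2))
x = rng.normal(size=30)
X = np.column_stack([np.ones(30), x])
y = 1.0 + 2.0 * x
global_beta = ols(X, y)
local_betas = gwr(coords, X, y, n_neighbors=10)
```

When the relationship really is constant over space, as in this synthetic example, every row of `local_betas` coincides with `global_beta`; spatial variation in the rows is what GWR is designed to reveal.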

Spatio-temporal Variation of PM 2.5 Concentrations in China
The World Health Organization (WHO) defined the standard for the annual average PM 2.5 concentration to be less than 10 μg/m³ [22]. The human illness rate will increase immensely when the annual average concentrations reach 35 μg/m³.
We hypothesize that higher populations and GDP levels may cause higher PM 2.5 concentrations and that a larger urban area results in higher PM 2.5 concentrations. Thus, we build the regression model to study the correlation between the PM 2.5 concentrations and these factors.
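The two thresholds mentioned above (10 and 35 μg/m³) lend themselves to a simple summary of a gridded field: the share of valid cells that exceed each value. The sketch below is illustrative only; the function name `exceedance_share` and the sample values are hypothetical, and cells without data are assumed to be stored as NaN.

```python
import numpy as np


def exceedance_share(pm25, threshold):
    """Share of valid (non-NaN) grid cells whose PM2.5 value exceeds the threshold."""
    values = np.asarray(pm25, dtype=float)
    valid = values[~np.isnan(values)]
    if valid.size == 0:
        raise ValueError("grid contains no valid cells")
    return float((valid > threshold).mean())


# Hypothetical 2 x 2 grid in ug/m^3 with one missing cell.
grid = np.array([[5.0, 12.0], [40.0, np.nan]])
share_above_who = exceedance_share(grid, 10.0)
share_above_35 = exceedance_share(grid, 35.0)
```

Because the share is computed over cells, it describes area rather than population; weighting by the gridded population would be needed for an exposure estimate.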

Correlation between PM 2.5 Concentrations and Socioeconomic Issues
Before fitting the geographically weighted regression, an initial statistical analysis was used to determine the characteristics of each of the variables proposed for the model. We use linear regression models to examine the correlation between PM 2.5 and each variable. The summary statistical results are shown in Table 1. All of the associations with PM 2.5 are in the expected direction.
The GWR model produces a set of local regression results, including local parameter estimates and local residuals, which can be mapped to show their spatial variability. In this study, we chose an adaptive kernel whose bandwidth is found by minimizing the corrected Akaike Information Criterion (AICc) value.
Not surprisingly, some unusually high or low residuals can be observed (Figure 7). Regions with some desert area have very large residuals (StdResid > 2). For example, the PM 2.5 concentrations in the northwest region of Xinjiang are high because of the desert. Xinjiang is a zone with a high incidence of dust storms. The concentrations of dust aerosol at altitude are closely related to the surface conditions below; the concentrations of particles above desert areas are greater than those above vegetation-covered areas [24]. Therefore, the high PM 2.5 concentrations in desert regions are mainly related to dusty weather. Southern Hebei province, northern Henan province and northwest Shandong province also have much higher residuals because they are high pollution emission regions of northern China. For example, the Shijiazhuang Iron and Steel Company discharges an average of more than 2000 t/a of PM 2.5 . There are also many polluting enterprises in the urban areas [25]. The pollution in these regions is consistently more serious than the pollution in other regions. In addition, the Sichuan Basin has high residuals because of its high aerosol optical depth values. The optical depth of the Sichuan Basin is higher than that of its surrounding areas because of its geographical and climatic characteristics; its annual average optical depth is approximately 0.7 [26]. PM 2.5 has a strong positive correlation with AOD [27], so the Sichuan Basin has high PM 2.5 concentrations. Regions that are rich in marine salt can also have high PM 2.5 concentrations [28]. Regions with large negative residuals show a noticeable over-prediction of PM 2.5 concentrations, whereas in the regions with large positive residuals discussed above the model under-predicts the PM 2.5 concentrations [18]; both cases warrant closer inspection to discover possible explanations.
However, the regions with StdResid values in the range of −2 to 2 account for 94.8% and 94.6% of the whole country, which indicates that the relations between PM 2.5 and each of the three factors are stable. Moreover, the results show that the regions with positive local coefficients for urban areas, population and GDP account for 92.72%, 90.52% and 95.62% of the whole country in 2001 and for 92.01%, 95.29% and 90.50% in 2010, respectively. This agrees with our expectation on the direction of the influence of those variables.
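The two summary figures used above, the share of cells with StdResid between −2 and 2 and the share of cells with a positive local coefficient, are straightforward to compute once the GWR output is available as arrays. The sketch below assumes the standardized residuals and local coefficients have already been exported from the GWR software; the function names `share_within` and `share_positive` and the sample numbers are hypothetical.

```python
import numpy as np


def share_within(std_resid, limit=2.0):
    """Share of locations whose standardized residual lies in [-limit, limit]."""
    r = np.asarray(std_resid, dtype=float)
    return float((np.abs(r) <= limit).mean())


def share_positive(local_coefs):
    """Share of locations with a positive local coefficient, one value per variable.

    local_coefs has one row per location and one column per explanatory variable.
    """
    c = np.asarray(local_coefs, dtype=float)
    return (c > 0).mean(axis=0)


# Hypothetical output for five locations and two variables.
std_resid = np.array([-3.0, -1.0, 0.0, 1.0, 2.5])
coefs = np.array([[1.0, -1.0], [2.0, 3.0], [0.5, -2.0], [-1.0, 4.0], [0.2, 0.1]])
stable_share = share_within(std_resid)
positive_shares = share_positive(coefs)
```

These shares count locations equally; if the regression points represent cells of unequal area, an area-weighted mean would be the closer analogue of a percentage "of the whole country".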

Discussion
Most of the associations between the PM 2.5 concentrations and the other variables considered are in the expected direction. PM 2.5 is correlated with population, GDP and urban area. Therefore, we have sufficient reason to believe that regions with large populations, high GDP values and large urban areas would have high values of PM 2.5 .
In China, PM 2.5 mainly comes from human activities (motor vehicle exhaust dust and coal dust) and the karaburan; the ground dust and secondary particles of the karaburan also contribute to the PM 2.5 concentrations [29,30]. Human activities that have a strong impact on air quality are often sources of PM 2.5 ; therefore, cities with poor air quality tend to have high levels of PM 2.5 . Metropolitan areas have large populations, high GDP values, high levels of urbanization and industries that produce contaminants [31]. From Figure 2, we can see that the high PM 2.5 values are mainly concentrated in the regions with large populations, high GDP values and high levels of urbanization. For example, in Beijing, the values of PM 2.5 have a linear relation with motor vehicle exhaust dust, coal dust and karaburan [32][33][34]. Population growth and economic development are accelerating the environmental deterioration in Beijing [31]. In addition, some research shows that, in Tianjin and Chongqing, the PM 2.5 concentrations depend on motor vehicle exhaust dust, coal dust and karaburan [35][36][37]. In some large cities, pollution from coal, other fuels and industry affects particulate matter concentrations [38]. PM 2.5 concentrations also depend on temperature, humidity and rainfall in some regions [39]. Each area has its own leading factor that influences PM 2.5 concentrations. Future work might include partitioning the different factors impacting PM 2.5 concentrations.

Conclusions
In this study, spatio-temporal characteristics and factors impacting PM 2.5 concentrations in China for the years 2001-2010 were evaluated based on newly refined long-term data. The following main conclusions are reached: This paper, for the first time, presents a comprehensive insight into the spatio-temporal characteristics of PM 2.5 concentrations in China at national scale. However, the problem is complex and needs further attention.