Socioeconomic Drivers of PM2.5 in the Accumulation Phase of Air Pollution Episodes in the Yangtze River Delta of China

Recent studies on PM2.5 sources show that anthropogenic emissions are the main contributors to haze pollution. Because they are essential for establishing air quality policies, the socioeconomic drivers of PM2.5 levels have attracted increasing attention. Unlike previous studies focusing on the annual PM2.5 concentration (Cyear), this paper focuses on the accumulation phase of PM2.5 during pollution episodes (PMAE) in the Yangtze River Delta (YRD) of China. It explores the spatial variation of PMAE and its links to socioeconomic factors using a geographical detector and simple linear regression. The results indicate that PM2.5 was more likely to accumulate in more developed cities, such as Nanjing and Shanghai. Compared with Cyear, PMAE was more sensitive to socioeconomic influences. Among the twelve indicators chosen for this study, population density was the most critical factor: it strongly affected the accumulation of PM2.5 and accounted for the regional differences. A 1% increase in population density was associated with a 0.167% rise in the maximum increment and a 0.214% rise in the daily increase rate of PM2.5. Industry, energy consumption, and vehicles were also significantly associated with PM2.5 accumulation. These conclusions could help remediate the severe PM2.5 pollution in China.


Introduction
In past decades, with rapid industrialization and urbanization, air pollution has become increasingly severe around the world. In particular, fine particulate matter (PM2.5) has been shown to harm humans and the living environment: it can cause lung cancer and respiratory and cardiovascular diseases, affect transportation, and increase mortality [1][2][3][4]. PM2.5 pollution has recently become the most serious environmental problem in China and is attracting growing public concern. Since 2013, heavy PM2.5 pollution events have frequently occurred in many Chinese cities, as observed from monitoring-site data [5] and satellite imagery [6,7]. For instance, in the Yangtze River Delta (YRD), a developed region of China, Shanghai recorded a maximum hourly PM2.5 concentration of 602 µg/m3 [8] in December 2013. In addition, Nanjing experienced a daily PM2.5 concentration (Cday, the 24-h average concentration) of 369 µg/m3, far above 75 µg/m3 (the 24-h standard of the Chinese Ambient Air Quality Standards, CAAQS, GB3095-2012). In 2014, 86 million people in some cities of the YRD were exposed to more [...]

For the aforementioned reasons, we selected the YRD as the study area and focused only on the relationship of SEFs with PM2.5 in the accumulation stage of pollution episodes, referred to as PMAE in this study. Two main aspects were considered. First, each pollution episode was split at its peak concentration into an accumulation stage and a diminishing stage. Unlike the diminishing stage, which is controlled by wind and precipitation [24], the accumulation stage generally occurs under stagnant weather conditions [13,25]. Further, cities in the YRD share similar synoptic conditions during a given episode, which typically lasts only a few days. Focusing on PMAE therefore reduces, to some extent, the interference from differing weather conditions.
Second, although the YRD is often treated as a whole in many studies, both socioeconomic development and atmospheric pollution differ among its cities. For example, in 2014 Shanghai had a population of 24 million, more than 34 times that of Zhoushan. This gap brings about large differences in other SEFs between the two cities: Shanghai's total energy consumption and number of vehicles were 98 and 27 times those of Zhoushan, respectively. Given the small differences in synoptic conditions across the YRD, the spatial variation of PMAE may be dominated by socioeconomic drivers. However, to our knowledge, few existing studies have examined the relationship between PMAE and socioeconomic factors, especially at a regional scale.
Regarding analysis methods, various models, such as ordinary least squares (OLS) regression, land use regression [26][27][28], geographically weighted regression (GWR) [16], panel data models [29], and spatial lag and spatial error models [5], have been used to quantify the relationship between SEFs and PM2.5. However, because of collinearity among SEFs, many of them had to be removed from these models. To compare the influence of each indicator on PM2.5 at the regional level, we used a geographical detector to explore the spatial associations between PMAE and SEFs.
Overall, we first selected 10 typical pollution episodes in 2014 and averaged them to characterize PMAE in the YRD. Then, using the geographical detector and a linear regression model, we explored the relationship between PMAE and SEFs and identified the socioeconomic indicators driving the accumulation of PM2.5, as presented in the following sections. The main findings are discussed at the end of the paper. These results should give policy-makers insight into PM2.5 accumulation during pollution episodes and help them formulate appropriate air quality regulations.

Sample Area and Cities
In this study, the YRD consists of Shanghai, southern Jiangsu Province, and northeastern Zhejiang Province, comprising 16 cities (Figure 1a,b). Although this developed region covers only 1.1% of China's area, it accounts for 7.47% of the national population (based on data for the 16 cities in 2014). Owing to its dense urban clusters and increasing coal consumption in recent years, the region has frequently suffered severe PM2.5 pollution events. Because of its adverse effects on human health, PM2.5 concentration has been monitored automatically in many Chinese cities since 2013. Since most monitoring sites are located in urban built-up areas, we chose as sample cities the urban areas of 16 prefecture-level cities and 14 counties, each containing at least one air-monitoring site (Figure 1c).

Ambient PM2.5 Concentration
Hourly PM2.5 concentrations are published on the air quality platform of the National Environmental Monitoring Centre of China. We obtained hourly PM2.5 concentrations from 120 monitoring sites in the 30 sample cities in the YRD from 1 January to 31 December 2014. Following the data validity requirements for air pollutant concentrations in GB3095-2012, we first deleted values ≤0 and abnormal concentrations from the raw data. Second, the daily concentration (Cday) was calculated as the average of the 24 hourly values from 0:00 to 23:00; if more than 4 hourly values were missing on a day, that day's Cday was considered invalid and excluded. Finally, the mean of all valid daily concentrations at each monitoring site was taken as its Cyear, and the average Cyear of all sites in a city represented that city's urban pollution level.
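The screening and averaging steps above can be sketched in Python as follows. This is a minimal illustration, not the authors' processing code: it assumes each day's hourly record is a list of 24 values with `None` for missing hours, and it applies only the ≤0 screen, because the criteria for other "abnormal concentrations" are not specified. The function names are hypothetical.

```python
from statistics import mean

MAX_MISSING_HOURS = 4  # more than 4 missing hours makes a day invalid


def daily_mean(hourly):
    """Cday from one day's hourly values (0:00-23:00).

    hourly -- 24 values; None marks a missing hour. Values <= 0 are
              treated as missing, following the screening step.
    Returns None when the day is invalid.
    """
    valid = [v for v in hourly if v is not None and v > 0]
    if 24 - len(valid) > MAX_MISSING_HOURS:
        return None
    return mean(valid)


def site_cyear(daily_values):
    """Cyear of one site: mean of its valid daily concentrations."""
    valid = [d for d in daily_values if d is not None]
    return mean(valid) if valid else None


def city_cyear(sites):
    """City pollution level: mean Cyear over its monitoring sites.

    sites -- dict mapping a (hypothetical) site id to its list of Cday values.
    """
    per_site = [site_cyear(days) for days in sites.values()]
    return mean(c for c in per_site if c is not None)
```

For example, a day with 20 valid readings of 50 µg/m3 and 4 missing hours yields a Cday of 50, whereas a day with 5 missing hours is discarded.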

Pollution Day and PM2.5 Episodes
To characterize pollution episodes, we defined a pollution day as a day with Cday ≥ 75 µg/m3, and a pollution episode as a period of more than two consecutive pollution days. To reduce the influence of anomalous and chance events, ten pollution episodes were carefully selected and averaged. The selection criteria were: (1) no daily data were missing during the episode; (2) there was no precipitation, and wind force was below 3 on the Beaufort scale; and (3) the episodes were distributed across different seasons. Although Cday in some cities may not have reached the pollution threshold defined above, the same PM2.5 period was used for all cities to allow comparison. As a result, ten pollution events, EP1-EP10 (1/1-1/6, 1/15-1/21, 2/19-2/23, 3/8-3/11, 5/26-5/30, 10/13-10/17, 11/9-11/13, 11/16-11/21, 12/20-12/26, and 12/27-12/31, respectively), were selected for this paper (Figure 2). Because of frequent rainfall, no episode from June to September met the criteria.
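The pollution-day and pollution-episode definitions above amount to a run-length scan over a city's daily series. The sketch below is illustrative only: it applies the concentration rule (Cday ≥ 75 µg/m3 on at least three consecutive days) but not the weather, data-completeness, and seasonal criteria that were additionally used to choose EP1-EP10. `find_episodes` is a hypothetical helper.

```python
POLLUTION_THRESHOLD = 75.0  # µg/m3, CAAQS 24-h standard
MIN_RUN = 3                 # "more than two consecutive pollution days"


def find_episodes(cday):
    """Return (start, end) day indices, end inclusive, of every run of at
    least MIN_RUN consecutive pollution days. None marks an invalid day,
    which breaks a run."""
    episodes = []
    start = None
    for i, c in enumerate(list(cday) + [None]):  # sentinel closes a final run
        polluted = c is not None and c >= POLLUTION_THRESHOLD
        if polluted and start is None:
            start = i
        elif not polluted and start is not None:
            if i - start >= MIN_RUN:
                episodes.append((start, i - 1))
            start = None
    return episodes
```

In the series `[40, 80, 90, 100, 60, 76, 77, 30, 80, 81, 82, 83]`, days 1-3 and 8-11 qualify as episodes, while the two-day run on days 5-6 does not.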

PM2.5 Episode Indexes
To characterize PMAE, we defined three episode indexes, MIep, DIep, and ADep, as follows:

$$MI_{ep} = \frac{1}{n}\sum_{i=1}^{n}\left(C_{(\max,i)} - C_{(0,i)}\right) \quad (1)$$

$$DI_{ep} = \frac{1}{n}\sum_{i=1}^{n}\frac{C_{(\max,i)} - C_{(0,i)}}{d_i} \quad (2)$$

$$AD_{ep} = W_1 \cdot MI_{ep}' + W_2 \cdot DI_{ep}' \quad (3)$$

where MIep, DIep, and ADep are the maximum increment, the daily increase rate, and the accumulation degree of PM2.5 during a pollution episode, respectively; i = 1, 2, ..., n (n = 10); $C_{(0,i)}$ and $C_{(\max,i)}$ are the initial and peak PM2.5 concentrations in EPi; and $d_i$ is the number of days before the concentration peaked in EPi. In Equation (3), $W_1$ and $W_2$ are the weights of MIep and DIep, and $MI_{ep}'$ and $DI_{ep}'$ are their values after normalization to the range (0, 1). Because MIep and DIep were significantly positively correlated (R2 = 0.79), we set $W_1 = W_2 = 0.5$. A higher ADep indicates that PM2.5 accumulates more easily in that city.
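The episode indexes can be computed as below. This sketch assumes that MIep and DIep are averages over the n episodes of the per-episode increment and increase rate, and that the normalization used for ADep is min-max scaling across cities; the paper does not specify the normalization, and min-max scaling maps to the closed interval [0, 1] rather than (0, 1). All function names are hypothetical.

```python
def episode_indexes(episodes):
    """Return (MIep, DIep) for one city.

    episodes -- list of (c0, cmax, d) per episode: initial concentration,
                peak concentration (µg/m3), and days from start to peak.
    """
    n = len(episodes)
    mi = sum(cmax - c0 for c0, cmax, _ in episodes) / n
    di = sum((cmax - c0) / d for c0, cmax, d in episodes) / n
    return mi, di


def min_max(values):
    """Rescale values to [0, 1] (assumed normalization)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot normalize constant values")
    return [(v - lo) / (hi - lo) for v in values]


def accumulation_degree(mi_by_city, di_by_city, w1=0.5, w2=0.5):
    """ADep = w1 * normalized MIep + w2 * normalized DIep, one value per city."""
    return [w1 * a + w2 * b
            for a, b in zip(min_max(mi_by_city), min_max(di_by_city))]
```

For a city with two episodes, rising from 50 to 150 µg/m3 over 4 days and from 60 to 120 µg/m3 over 2 days, this gives MIep = 80 µg/m3 and DIep = 27.5 µg/(m3·d).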

Geographical Detector Model
The geographical detector proposed by Wang et al. [30] is a spatial analysis method for detecting the driving forces behind geographic and environmental phenomena, and it has been applied in many fields in recent years [31][32][33]. In this study, we used the factor detector, one module of the geographical detector, to determine the impact of socioeconomic indicators on PMAE. Let D be an SEF that is stratified into sub-regions $D_i$ (i = 1, 2, ..., m, where m is the number of sub-regions; m = 5 in this study), and let H be an episode index (MIep, DIep, or ADep). The determinant power of factor D on H ($PD_{D,H}$) can then be expressed as [30,31,33]:

$$PD_{D,H} = 1 - \frac{1}{n\sigma^{2}}\sum_{i=1}^{m} n_{D,i}\,\sigma_{D,i}^{2} \quad (4)$$

where n and $n_{D,i}$ are the numbers of samples in the whole study area and in sub-region $D_i$, respectively, and $\sigma^2$ and $\sigma_{D,i}^2$ are the variances of H in the whole study area and in sub-region $D_i$. $PD_{D,H}$ ranges from 0 to 1; a higher value indicates a stronger impact of factor D on PM2.5.
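The factor detector statistic is straightforward to compute once each city has been assigned to a stratum of the factor. The Python sketch below is a minimal illustration with hypothetical names; it uses population variances, so that PD equals 0 when the stratification explains nothing and approaches 1 as within-stratum variance vanishes.

```python
from statistics import pvariance


def factor_detector_pd(h, strata):
    """PD_{D,H} = 1 - (1 / (n * var)) * sum_i n_{D,i} * var_{D,i}.

    h      -- values of an episode index (e.g. ADep), one per city
    strata -- the class label of factor D for the same cities (m classes)
    """
    h = [float(v) for v in h]
    n = len(h)
    groups = {}
    for value, label in zip(h, strata):
        groups.setdefault(label, []).append(value)
    # Pooled within-stratum sum of squares: sum of n_i * sigma_i^2.
    within = sum(len(g) * pvariance(g) for g in groups.values())
    return 1 - within / (n * pvariance(h))
```

With `h = [1, 2, 3, 10, 11, 12]` split into the strata `[0, 0, 0, 1, 1, 1]`, almost all variance lies between strata and PD is about 0.97; putting all cities in one stratum gives PD = 0.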

Factor Detector Indicators
The sources of PM2.5 have been a focus of previous studies. First, several papers identified industry, coal combustion, private vehicles, gas combustion, iron and steel manufacturing, and biomass burning as the main sources of fine particulate matter in the YRD [13,34]. Second, some studies have suggested that city size affects air quality. For instance, Stone [35] and Martins [36] suggested that urban sprawl changes population, energy consumption, and air emissions, ultimately worsening air quality. Third, population density has been considered an influencing factor, but it is unclear whether it plays a positive or negative role with regard to PM2.5. On the one hand, higher density in a large city facilitates centralized pollution treatment, which improves the environment [35]. On the other hand, a denser population leads to more energy consumption and more emissions. As many studies have reported, ambient PM2.5 concentrations are higher in urban than in rural areas [6,37], suggesting that higher population density can aggravate PM2.5 pollution. To further explore the effects of social and economic factors on PMAE, we chose 12 indicators, grouped into five source categories, as detector factors (Table 1). Because the monitoring sites are located in urban built-up districts, each variable was considered at the urban-area scale to match the PM2.5 data. All SEF data, recorded as annual values, came from the Statistical Yearbook (2014) [38][39][40] or the National Economic and Social Development Statistics Bulletins of the 30 cities (http://www.stats.gov.cn/tjgz/wzlj/dftjwz/). Figure 3 shows the spatial distributions of the twelve SEFs.
In the remainder of this paper, the indicators are abbreviated as follows: X11, Sec_indu; X12, Indu_L; X21, Energy; X22, Elec_tot; X23, Elec_indu; X31, Pop_tot; X32, Den_pop; X33, Area; X41, Vehicles; X42, Road; X51, Prim_indu; X52, Sown.

Univariate Linear Regression
Note that a positive or negative correlation does not by itself imply causality. To quantify the effect of each SEF on PMAE, we used univariate linear regression. Linear regression works best when the variables are not strongly skewed (strictly, it assumes normally distributed residuals). Because many socioeconomic data are skewed (Figure 3), we applied a log transformation to the raw indicators. Compared with SK (the skewness of the raw data), SK-Log (the skewness of the log-transformed data), shown in Figure 3, decreased markedly from 2.68 ± 1.38 to 0.24 ± 0.93. The log transformation was therefore considered more effective for processing the data and was adopted in this study. The linear fitting formula is:

$$\ln Y = a + b \ln X \quad (5)$$

where Y is an episode index (MIep, DIep, or ADep), X is a socioeconomic indicator from Table 1, a is a constant, and b is the slope of the regression line. We used the slope b as the elasticity of Y with respect to X, i.e., the percentage change in Y associated with a 1% increase in X.
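The skewness check and the log-log fit can be illustrated as follows. This sketch assumes the biased (Fisher-Pearson) skewness estimator, since the paper does not state which estimator was used, and it assumes both X and Y are log-transformed so that the slope b can be read as an elasticity. The helper names are hypothetical.

```python
import math


def skewness(xs):
    """Biased (Fisher-Pearson) sample skewness g1 = m3 / m2**1.5."""
    n = len(xs)
    m = sum(xs) / n
    m2 = sum((x - m) ** 2 for x in xs) / n
    m3 = sum((x - m) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5


def ols(x, y):
    """Ordinary least squares fit y = a + b x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((u - mx) * (v - my) for u, v in zip(x, y))
    sxx = sum((u - mx) ** 2 for u in x)
    b = sxy / sxx
    return my - b * mx, b


def log_log_fit(x, y):
    """Fit ln Y = a + b ln X; b is then the elasticity of Y with respect to X."""
    return ols([math.log(v) for v in x], [math.log(v) for v in y])
```

On synthetic data generated as Y = 2·X^0.2, `log_log_fit` recovers b = 0.2 and a = ln 2, and for a right-skewed sample such as `[1, 2, 3, 4, 100]` the skewness drops after taking logarithms.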

Characteristics of PM2.5 Pollution in the YRD
In 2014, the YRD experienced prolonged PM2.5 pollution, as shown in Figure 2. The regional Cyear was 62.57 µg/m3, higher than the national average (61 µg/m3) [16] and well above the CAAQS annual standard of 35 µg/m3. PM2.5 concentration was highest in winter (100.52 µg/m3), followed by spring (62.28 µg/m3) and fall (52.49 µg/m3), and lowest in summer (49.61 µg/m3). Figure 4a,b show the spatial variation of Cyear and pollution days, respectively. Cities with higher Cyear were mainly located in the northwest of the region. For example, the top four cities, Nanjing, Taizhou, Jurong, and Jiangyin, all located along the Yangtze River, had Cyear values of 74.63, 73.53, 73.17, and 73.17 µg/m3, respectively. Only one of the 30 sample cities, Zhoushan, had a Cyear below 35 µg/m3, and 16 cities had more than 100 pollution days. Figure 5 shows the statistics of Cday. On highly polluted days, Cday peaked at 454 µg/m3 in Zhuji, 449 µg/m3 in Lin'an, and 143-352 µg/m3 in other cities, far above 75 µg/m3. Spatially, both Cyear and the number of pollution days decreased from northwest to southeast, which may be related to proximity to the coast.

Characteristics of PMAE
To present the results clearly, the 30 cities were divided into five groups based on their 2014 GDP. The ten selected pollution episodes in these cities are shown in detail in Figure 6, and Table 2 gives the statistics of the pollution indicators during the ten episodes. Severe episodes occurred mainly in January; during EP1 and EP2, Cday peaked at 143-307 µg/m3 across the sample cities. Over the ten episodes, the maximum MIep and DIep reached 217 µg/m3 and 55.42 µg/(m3·d), respectively (observed in Nanjing). A notable peak also occurred in May: during EP5, for instance, the maximum Cday reached 84-273 µg/m3 across all cities.
Figure 6 shows that PM2.5 episodes followed similar, but not identical, trajectories in every YRD city: the upward and downward trends were similar, while peak concentrations and their timing differed. Unlike the spatial pattern of Cyear (decreasing from northwest to southeast), high values of MIep, DIep, and ADep (averaged over the ten episodes) were concentrated on both sides of the Yangtze River and decreased from north to south (Figure 7a-c). For instance, in six of the ten episodes, PM2.5 in Shanghai peaked faster than in the other cities. Although Shanghai had a relatively low Cyear of 51.97 µg/m3, it had the highest MIep (88.73 µg/m3), DIep (23.78 µg/(m3·d)), and ADep (0.97) among the 30 cities. This is probably because its dense population and extensive industrial activity generate more emissions than in the other cities. Nanjing is another example: it not only had the highest Cyear (74.64 µg/m3) but also a high MIep (88.43 µg/m3), DIep (21.33 µg/(m3·d)), and ADep (0.87), making it a typical city where severe pollution accumulates easily. This may be related to high humidity, multiple local emission sources, and unfavorable dispersion conditions [13,20,41]. PM2.5 also accumulated easily in Nantong, Kunshan, Suzhou, and Taicang, but accumulation was weak in cities such as Zhoushan and Chun'an. In the remaining cities, the degree of PM2.5 accumulation was moderate.

(In this figure, H, M, and S before "-" denote heavy, moderate, and slight pollution, respectively; H, M, and L after "-" denote high, moderate, and low accumulation degree, respectively. Cities are numbered as in Figure 1. The Yangtze River is shown in blue.)

Overall, more heavily polluted cities had a higher degree of PM2.5 accumulation, although a less polluted city was not necessarily one where PM2.5 accumulated with difficulty.
In other words, severe pollution was dominated by how easily PM2.5 accumulated in certain places. Even so, cities with a relatively low Cyear, such as Shanghai, still experienced highly polluted periods, during which PM2.5 rose and fell rapidly.
In summary, the differences in PM2.5 accumulation across the YRD, like those in Cyear, are significant. Unlike the "higher in the northwest, lower in the southeast" pattern of Cyear, the degree of PM2.5 accumulation is "higher in the north, lower in the south".

Table 3 shows the Pearson correlation coefficients between SEFs and the PM2.5 indexes. No SEF was significantly correlated with Cyear (R ranging from 0.13 to 0.34), indicating weak correlations. For PMAE, however, the coefficients increased markedly and were significantly positive. Taken together, three source categories (industrial factors, energy consumption, and population and city) had a stronger influence on PMAE than transportation and agricultural factors. Specifically, nine of the twelve SEFs (all except road, X42, and the agricultural factors, X51 and X52) were significantly correlated with MIep, DIep, and ADep at p < 0.01 or p < 0.05. Among these nine indicators, population density (X32) had the highest coefficients with MIep (0.68), DIep (0.67), and ADep (0.64), suggesting that, compared with other factors, population density may be an important driver of the rapid increase of PM2.5 when pollution occurs. Although the role of population density was unclear in previous literature, our results suggest that a higher population density may cause PM2.5 to increase rapidly within a short period. Next, secondary industry (X11), industry above designated size (X12), energy consumption (X21), total electricity consumption (X22), and industrial electricity consumption (X23) had high coefficients of 0.53-0.56 with MIep, 0.60-0.62 with DIep, and 0.55-0.58 with ADep, indicating that these five factors also significantly influence PM2.5 accumulation.
Overall, compared with Cyear, PMAE was more sensitive to the influence of socioeconomic indicators.
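The significance levels quoted for the Pearson coefficients can be checked with the usual t-test for a correlation coefficient, t = r·sqrt(n-2)/sqrt(1-r²). The paper does not name its test, so this is an assumption; the sketch hard-codes the two-tailed critical t values for n = 30 cities (df = 28): 2.048 at p = 0.05 and 2.763 at p = 0.01. The function names are hypothetical.

```python
import math

# Two-tailed critical t values for df = n - 2 = 28 (n = 30 cities).
T_CRIT_P01 = 2.763
T_CRIT_P05 = 2.048


def pearson_r(x, y):
    """Pearson product-moment correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def significance(r, n=30):
    """Classify |r| (< 1) using t = r*sqrt(n-2)/sqrt(1-r^2).

    The critical values above are only valid for n = 30.
    """
    if n != 30:
        raise ValueError("critical values are tabulated for n = 30 only")
    t = abs(r) * math.sqrt(n - 2) / math.sqrt(1 - r * r)
    if t > T_CRIT_P01:
        return "p < 0.01"
    if t > T_CRIT_P05:
        return "p < 0.05"
    return "not significant"
```

Under this test, the largest coefficient with Cyear (r = 0.34, t ≈ 1.91) is not significant, while the population-density coefficients with the episode indexes (r = 0.64-0.68, t ≈ 4.4-4.9) are significant at p < 0.01.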

PD of SEFs to PMAE
The factor detector module was used to quantify the PD of SEFs on PMAE in three steps: (1) the SEFs were discretized into five categories (Figure 3); (2) the spatial maps of SEFs were overlaid on the maps of the episode indexes (Figure 7) in ArcGIS; and (3) the PD of each factor was calculated with Equation (4). We compared four discretization methods: hierarchical (system) clustering, equal-interval breaks, quantile breaks, and natural breaks. Because hierarchical clustering yielded larger PD values than the other methods, we considered it the most suitable for our data. We also compared four and five clusters and found no significant difference between them. Accordingly, we used hierarchical clustering to discretize the quantitative data into five clusters (Figure 3).
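The comparison of discretization methods can be sketched as follows. The sketch is illustrative, with hypothetical function names: it implements equal-interval and quantile breaks, plus a simplified bottom-up (agglomerative) clustering that merges only adjacent groups of sorted values using a Ward-style cost, as a stand-in for the hierarchical ("system") clustering used in the study. Natural breaks (Jenks) are omitted. Each scheme is scored by the factor detector PD, and the scheme with the largest PD would be preferred.

```python
from statistics import pvariance


def q_stat(h, labels):
    """Factor-detector PD: 1 - sum(n_i * var_i) / (n * var), population variances."""
    h = [float(v) for v in h]
    groups = {}
    for value, label in zip(h, labels):
        groups.setdefault(label, []).append(value)
    within = sum(len(g) * pvariance(g) for g in groups.values())
    return 1 - within / (len(h) * pvariance(h))


def equal_interval(x, k):
    """k bins of equal width between min(x) and max(x); assumes x is not constant."""
    lo, hi = min(x), max(x)
    width = (hi - lo) / k
    return [min(int((v - lo) / width), k - 1) for v in x]


def quantile_break(x, k):
    """k bins holding (roughly) equal numbers of observations, by rank."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    labels = [0] * len(x)
    for rank, i in enumerate(order):
        labels[i] = rank * k // len(x)
    return labels


def agglomerative_1d(x, k):
    """Bottom-up clustering of sorted values: repeatedly merge the adjacent
    pair of clusters with the smallest Ward cost until k clusters remain."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    clusters = [[i] for i in order]

    def ward_cost(a, b):
        ma = sum(x[i] for i in a) / len(a)
        mb = sum(x[i] for i in b) / len(b)
        return len(a) * len(b) / (len(a) + len(b)) * (ma - mb) ** 2

    while len(clusters) > k:
        j = min(range(len(clusters) - 1),
                key=lambda p: ward_cost(clusters[p], clusters[p + 1]))
        clusters[j:j + 2] = [clusters[j] + clusters[j + 1]]
    labels = [0] * len(x)
    for c, members in enumerate(clusters):
        for i in members:
            labels[i] = c
    return labels


def compare_discretizations(x, h, k=5):
    """PD of indicator x on episode index h under each discretization scheme."""
    methods = {
        "equal_interval": equal_interval,
        "quantile": quantile_break,
        "agglomerative": agglomerative_1d,
    }
    return {name: q_stat(h, f(x, k)) for name, f in methods.items()}
```

Clustering-based breaks tend to score well here because they place class boundaries in the gaps of the data, which minimizes within-class variance and therefore maximizes PD.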
As shown in Figure 3, the twelve SEFs differed greatly among the 30 sample cities. The values of CA ranged from 64 to 167, indicating a real socioeconomic gap within the YRD. On average, a city in the region had a population of 2.64 million, secondary industry output of 1257 billion yuan, an urban built-up area of 222 km2, and 44 million vehicles in 2014. Driven by its population of 24 million, Shanghai's values for most other SEFs, such as secondary industry, built-up area, vehicles, and energy consumption, were more than 100 times those of Chun'an, the city with the lowest SEF values. Spatially, cities with high secondary industry, energy consumption, population, and population density were clearly concentrated on both sides of the Yangtze River.
According to Table 3, the PD of SEFs for Cyear was 0.09-0.19, whereas for PMAE it was 0.18-0.37, a marked increase. Consistent with the Pearson correlation analysis, the factor detector showed that SEFs had greater explanatory power for the spatial variation of PMAE than for Cyear. For MIep, population density had the highest PD (0.36), followed by built-up area (0.33) and population (0.32), suggesting that city size and population explain the spatial differences in PM2.5 increment better than the other SEFs; that is, PM2.5 accumulates more easily in larger and more populous cities. Similarly, for DIep, secondary industry (0.37), population density (0.33), energy consumption (0.32), industry above designated size (0.31), and population (0.30) had higher PD values than the other factors, indicating that industry and energy consumption, as well as population, strongly influenced the rate of PM2.5 increase. For ADep, which combines both indexes, the PD values ranked X32 (0.39) > X11 (0.36) > X23 [...]

The close correlations between PMAE and industrial factors, population and city, and energy consumption are also evident in Figure 9. The two upward-sloping fitted lines in each panel (except the last three) indicate positive relationships. Consistent with the Pearson analysis and the factor detector, population density had the highest slope among the twelve factors, meaning that MIep and DIep rose more steeply with population density than with any other factor. The slopes of 0.167 for MIep and 0.214 for DIep indicate that a 1% increase in population density is associated with increases of 0.167% in MIep and 0.214% in DIep.
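As a worked check on this elasticity reading, assume the log-log form ln Y = a + b ln X, so that scaling X by (1 + p/100) scales Y by (1 + p/100)^b. The hypothetical helper below computes the exact percentage response. For a 1% change it is close to b itself (about 0.166% for b = 0.167 and 0.213% for b = 0.214), which is why the slope can be read directly as an elasticity; for larger changes the linear reading slightly overstates the response (a 10% rise with b = 0.167 gives about 1.60% rather than 1.67%).

```python
def response_to_change(b, pct_change_x):
    """Percentage change in Y for a pct_change_x % change in X, assuming the
    log-log model ln Y = a + b ln X (so Y scales with X**b)."""
    return ((1 + pct_change_x / 100) ** b - 1) * 100
```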
The next-largest slopes belonged to the industrial factors, energy consumption, and vehicles, in that order, showing the importance of these SEFs for the PM2.5 response during episodes. Figure 9 also shows no significant linear trend between the episode indexes and the road or agricultural factors. In short, our statistical results indicated the following. Firstly, PMAE reflected the socioeconomic influence on PM2.5 across the YRD better than the annual average concentration did. Secondly, population density and total population played important roles in promoting PM2.5 accumulation during pollution episodes and in causing its spatial diversity across the YRD. Thirdly, industrial factors and energy consumption also drove the sharp increase in PM2.5, especially its daily rise rate. Fourthly, PMAE was significantly related to vehicles, the indicator of the traffic factor, suggesting that the impact of vehicles cannot be ignored. Lastly, although straw burning and other agricultural biomass burning have been identified as key sources in the YRD, the effect of agricultural factors on PMAE was not statistically significant.
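The slopes discussed above read as elasticities if each regression is fitted on log-transformed variables, as the logarithmic form mentioned in the Discussion suggests. Assuming ordinary least squares on ln(x) and ln(y), the sketch below recovers a known exponent from synthetic data and converts an elasticity into the implied percentage response. For a 1% change the exact response is very close to the slope itself (0.167 gives about 0.166%), which is why the slopes can be quoted directly as percentages. Function names and data are illustrative only.

```python
import math

def log_log_slope(x, y):
    """OLS slope b in ln(y) = a + b * ln(x); x and y must be positive."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need at least two paired observations")
    lx = [math.log(v) for v in x]
    ly = [math.log(v) for v in y]
    mx = sum(lx) / len(lx)
    my = sum(ly) / len(ly)
    sxx = sum((a - mx) ** 2 for a in lx)
    if sxx == 0:
        raise ValueError("x has no variation")
    sxy = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    return sxy / sxx

def percent_response(elasticity, percent_increase):
    """Exact % change in y implied by a % change in x under the log-log model."""
    return ((1 + percent_increase / 100) ** elasticity - 1) * 100

# Synthetic data that follow y = 5 * x**0.2 exactly (illustration only).
x = [100, 250, 800, 1500, 3000]
y = [5 * v ** 0.2 for v in x]
print(round(log_log_slope(x, y), 6))          # recovers the exponent 0.2
print(round(percent_response(0.167, 1), 4))   # response to a 1% rise
print(round(percent_response(0.214, 10), 2))  # response to a 10% rise
```

For larger changes the linear reading drifts slightly: with an elasticity of 0.214, a 10% rise in population density implies roughly a 2.06% rise rather than exactly 2.14%.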

Discussion
During the last few decades, a growing number of scientific papers and reports have explored the socioeconomic drivers of air pollutant emissions, which are important for developing pollution control strategies. To better capture the socioeconomic impact on haze pollution, this study focused only on PM2.5 in the accumulation stage (PMAE), unlike previous studies that focused on Cyear. We reasoned that PMAE may reflect the spatial differentiation of the socioeconomic impact more reliably than Cyear.
Two considerations support this choice. On the one hand, besides socioeconomic drivers, PM2.5 is strongly affected by natural factors such as climate, topography, and surface vegetation [42][43][44], and Cyear reflects the combined effect of all of them. In particular, reductions in PM2.5 concentration often depend on precipitation or strong winds, so it is very difficult to isolate the influence of any single factor on Cyear. On the other hand, several features of the study limit the role of weather. Firstly, the cities of the YRD share similar climate conditions: in 2014, the annual average temperature in the 30 cities ranged from 14 to 19 °C and the annual relative humidity from 71% to 76%. The CA of temperature (6%) and relative humidity (3%) was far smaller than that of the SEFs, indicating that inter-city differences in weather were much smaller than those in SEFs. Secondly, a pollution episode lasts only several days, and during its accumulation phase in particular, synoptic conditions are relatively stagnant, which is one of the prerequisites for PM2.5 pollution to occur. Lastly, the ten pollution episodes were selected strictly to exclude interference from inclement weather (rainfall, winds, etc.). Thus, we believe that focusing on PMAE at the regional scale reduces the interference from natural factors as far as possible.
Even so, differences in synoptic conditions across the YRD could not be removed completely. In addition, although the marked differences in topography and landscape structure within the study area also contributed to the spatial variation of PMAE, these factors had to be excluded owing to a lack of data. Furthermore, we selected the urban district as the assessment unit to better match the spatial coverage of the PM2.5 monitoring data; nevertheless, using data from a small number of fixed monitoring stations to represent PM2.5 pollution over a large area remains a limitation. Among existing studies on the relationship between SEFs and PM2.5, many have also related socioeconomic factors to monitoring-site PM2.5 [16,29], and the influence of natural factors is rarely included [5,6]. Moreover, using log-transformed data in our study helped reduce potential heteroscedasticity. Therefore, we consider that excluding the influence of natural factors is unlikely to cause a serious bias in the evaluation.
On this basis, our results showed that SEFs were more closely related to the episode indexes than to Cyear. Regarding the individual socioeconomic factors, firstly, although existing studies have not yet reached a conclusion on whether population density has a positive or negative effect on PM2.5, our statistical results indicated that population and population density significantly influence PMAE in the study region. This may be because population is the leading factor driving the growth of other social and economic factors. For example, cities with a higher population density require more energy, more private vehicles, and more industrial activity to meet the needs of their residents, which in turn generates more emissions and degrades air quality. When fine particles accumulate to a certain level under stagnant synoptic conditions, PM2.5 pollution occurs or is aggravated. The finding of existing research [6,38] that PM2.5 concentrations are higher in urban than in rural areas also suggests that population and its density may contribute to the deterioration of urban air quality. Secondly, industrial indicators, such as secondary industry, industry above designated size, and energy consumption, showed an important impact on PMAE in our study. Industry, especially heavy industry, and its energy consumption discharge large amounts of pollutants. As Zhao et al. [45] pointed out, a 1% increase in industrial added value results in an increase of nearly 0.847% in relative pollution density. Therefore, differences in industrial activity were also responsible for the spatial diversity of PMAE in the YRD. Thirdly, the number of motor vehicles, particularly private vehicles, has increased dramatically in recent years. As many studies have suggested, vehicular emissions contain major components of PM2.5, including primary particles and precursor gases [46][47][48].
Our statistical results likewise showed a significant impact of vehicles on PMAE. Lastly, agriculture has been identified as a vital source of PM2.5 in previous studies [49,50]; however, its influence was not strong enough to be statistically significant in our study. We infer that the agricultural effect is relatively stable and therefore does not drive extreme increases in PM2.5, except during the harvest season, and only a few episodes in that season were suitable for our study.
In brief, socioeconomic factors strongly influenced PMAE, and, as the analysis above shows, these contributors are closely interrelated. This study makes two main contributions. Firstly, we showed that the socioeconomic influence on pollution episodes is more significant than that on the annual pollution level, a point rarely addressed in previous studies. Secondly, we quantified the influence of each factor on PM2.5 episodes using suitable techniques, namely a geographical detector and linear regression. This information could be useful for developing policies to improve air quality, especially during periods of serious haze.

Conclusions
Using PM2.5 concentrations and twelve socioeconomic indicators for 2014, we explored, for the first time, the characteristics of PM2.5 accumulation during pollution episodes and its links with socioeconomic factors in the YRD from a statistical perspective. For comparison, we used Cyear as the annual average pollution level and defined the episode indexes MIep, DIep, and ADep to represent the degree of PM2.5 accumulation (each averaged over the ten selected episodes). In general, Cyear followed a "higher in the northwest, lower in the southeast" spatial pattern, whereas high degrees of PM2.5 accumulation were mainly found in the northern cities along the Yangtze River. Pearson coefficients and PD values led to a similar conclusion: the variation in PM2.5 during the episodes was more sensitive to socioeconomic impact than Cyear. Scatter plots and linear regression further confirmed the role of population density in the variation of PMAE across the YRD.
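The episode indexes can be sketched in code. The exact formulas for MIep, DIep, and ADep are not given in this section, so the following assumes hypothetical definitions: the accumulation stage runs from the lowest daily concentration before the peak to the peak day; MI is the peak minus that lowest value; DI is MI divided by the number of days between them; and each city's index is the mean over its episodes. ADep is omitted because its definition is not available here, and the function names and concentrations are illustrative.

```python
from statistics import mean

def accumulation_indexes(daily_pm25):
    """Return (maximal increment, daily increase rate) for one episode.

    Hypothetical definitions: accumulation starts at the lowest daily value
    before the peak and ends on the (first) peak day.
    """
    if len(daily_pm25) < 2:
        raise ValueError("an episode needs at least two daily values")
    peak_day = max(range(len(daily_pm25)), key=daily_pm25.__getitem__)
    if peak_day == 0:
        raise ValueError("peak on the first day: no accumulation stage")
    start_day = min(range(peak_day), key=daily_pm25.__getitem__)
    mi = daily_pm25[peak_day] - daily_pm25[start_day]
    di = mi / (peak_day - start_day)  # concentration rise per day
    return mi, di

def city_episode_indexes(episodes):
    """Average MI and DI over all episodes recorded for one city."""
    pairs = [accumulation_indexes(e) for e in episodes]
    return mean(p[0] for p in pairs), mean(p[1] for p in pairs)

# Two hypothetical episodes of daily PM2.5 (ug/m3) for one city.
episodes = [[40, 60, 120, 200, 150], [80, 50, 90, 170, 100]]
print(city_episode_indexes(episodes))
```

Splitting each episode at its peak keeps the diminishing stage, which the paper attributes mainly to wind and precipitation, out of the indexes.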
Overall, larger and more populous cities generate more emissions, and PM2.5 accumulates more easily there than in smaller cities. It may be time to consider controlling city expansion and population density in urban planning. China should also accelerate industrial restructuring, reduce coal consumption, and develop new, cleaner energy sources. At the same time, motor vehicles had a significant effect on PM2.5 accumulation and should not be overlooked. Although the explanatory power of agriculture was relatively weak in this study, particles from biomass burning, identified as important by many other studies, should be strictly controlled. Furthermore, air pollution is driven by the full range of social and economic activities and their interactions. Further studies of the underlying mechanisms by which socioeconomic factors act are needed to improve environmental quality in the future.