Comparison of Hourly PM2.5 Observations Between Urban and Suburban Areas in Beijing, China

Hourly PM2.5 observations collected at 12 stations over a 1-year period are used to identify variations between urban and suburban areas in Beijing. The data demonstrate a monthly variation pattern that is unique compared with other major cities. Urban areas suffer higher PM2.5 concentration (about 92 μg/m3) than suburban areas (about 77 μg/m3), and the average PM2.5 concentration in the cold season (about 105 μg/m3) is higher than in the warm season (about 78 μg/m3). Hourly PM2.5 observations exhibit distinct seasonal, diurnal and day-of-week variations. The diurnal variation of PM2.5 shows higher concentrations at night and lower values in the daytime, and the cumulative growth of nighttime PM2.5 concentration (from 22:00 in winter) may be due to atmospheric stability. Moreover, annual average PM2.5 concentrations are about 18 μg/m3 higher on weekends than on weekdays, consistent with driving restrictions on weekdays. Additionally, the nighttime peak on weekdays (21:00) is one hour later than on weekends (20:00), which is further evidence of the influence of human activity. These observations indicate that the variations of PM2.5 concentration between urban and suburban areas in Beijing are influenced by complex meteorological factors and human activities.

The rest of the paper is organized as follows: Materials and methods used are introduced in Section 2. Results and discussion of interesting spatial and temporal variations of PM2.5 concentration between urban and suburban areas are presented in Section 3. Finally, we present our conclusions in Section 4.

Data Description
Hourly PM2.5 observations from 12 air pollution monitoring stations within Beijing City were obtained from the China Environmental Monitoring Center (CEMC, http://113.108.142.147:20035/emcpublish/) for the period 1 March 2013 to 28 February 2014. These observations are measured by continuous particulate monitors using Tapered Element Oscillating Microbalances (TEOM) or the beta-attenuation method [31,32]. Additionally, PM2.5 concentrations measured during the same period at the U.S. embassy station are used to verify the monthly variation. Since the U.S. embassy station uses a different sampling instrument and method from the CEMC stations, PM2.5 concentration data collected at the U.S. embassy station are used as a reference value.
CEMC defines the 12 stations as eight urban assessment stations and four suburban assessment stations (Table 1). Because the spatial distributions and the surrounding environments of these stations are different, the observations reflect different elevations and a mix of station surroundings [20]. Since the monitoring network has been deployed incrementally over the last three years, the period of record varies from station to station. Station 1012 was established approximately one month later than the other stations and lacks data for March, so in subsequent analyses station 1012 was excluded from the station clustering step, while analyses of temporal variation used all available data.

K-Means Clustering Method
The k-means method is used because it can map the original data to a higher-dimensional feature space so that the data can be easily separated linearly [17]. K-means is a clustering method that aims to partition nonlinearly separated, high-dimensional observations into appropriate clusters. For clustering a given set of PM2.5 observations (x1, x2, …, xn) into k sets (k ≤ n), that is, S = {s1, s2, …, sk}, the most appropriate partition is the one with the minimum within-cluster sum of squares (WCSS), that is, the sum of the squared Euclidean distances between each observation and the corresponding cluster center [17]. WCSS can be described as Equation (1):
$$\mathrm{WCSS} = \sum_{i=1}^{k} \sum_{x_j \in s_i} \lVert x_j - \mu_i \rVert^2 \qquad (1)$$
where μi is the mean (center) of the observations in cluster si, and xj stands for the PM2.5 observations at each station. For a given cluster number k, the algorithm proceeds by alternating between the following two steps [33]:
a. assign each PM2.5 observation to the cluster whose mean yields the smallest squared Euclidean distance:
$$s_i^{(t)} = \left\{ x_p : \lVert x_p - \mu_i^{(t)} \rVert^2 \le \lVert x_p - \mu_l^{(t)} \rVert^2 \ \ \forall\, l,\ 1 \le l \le k \right\}$$
b. calculate the new means to be the centroids of the PM2.5 observations in the new clusters:
$$\mu_i^{(t+1)} = \frac{1}{\lvert s_i^{(t)} \rvert} \sum_{x_j \in s_i^{(t)}} x_j$$
c. repeat steps a and b until WCSS is stable, which means that the difference between adjacent iterations is less than a small threshold.
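The alternation of steps a to c can be sketched in a few lines of Python. This is only a minimal illustration, not the implementation used in the study: the function names (`kmeans`, `wcss`), the random choice of initial centers from the data, the tolerance `tol`, and the toy two-dimensional points are all assumptions made for the example.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_vector(points):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(points)
    return [sum(column) / n for column in zip(*points)]


def wcss(points, labels, centers):
    """Within-cluster sum of squares for a given assignment."""
    return sum(squared_distance(p, centers[lab]) for p, lab in zip(points, labels))


def kmeans(points, k, seed=0, tol=1e-9, max_iter=100):
    """Alternate the assignment and update steps until WCSS is stable."""
    rng = random.Random(seed)
    # Assumption: initial centers are k distinct observations drawn at random.
    centers = [list(p) for p in rng.sample(points, k)]
    previous = float("inf")
    for _ in range(max_iter):
        # Step a: assign each observation to its nearest center.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centers[c]))
            for p in points
        ]
        # Step b: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = mean_vector(members)
        # Step c: stop once WCSS changes by less than the threshold.
        current = wcss(points, labels, centers)
        if previous - current < tol:
            break
        previous = current
    return labels, centers, current


# Toy data (not PM2.5 measurements): two well-separated groups.
toy_points = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [10.0, 10.0], [10.1, 10.0], [10.0, 10.1]]
toy_labels, toy_centers, toy_wcss = kmeans(toy_points, k=2, seed=1)
```

In an application like the one described here, each vector would hold one station's PM2.5 series rather than a two-dimensional toy point.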
The correct cluster number k is determined with the silhouette method, which was first described by Rousseeuw [34]. Suppose the number of samples in cluster r is nr (nr = |Cr|). The silhouette coefficient is defined as:
$$S(i) = \frac{b(i) - a(i)}{\max\{a(i),\, b(i)\}}$$
where S(i) is the silhouette of PM2.5 observation i; a(i) is the average dissimilarity of sample i to all other stations in cluster r; and b(i) is the least average dissimilarity of PM2.5 observation i to the stations within a cluster different from cluster r. Thus, a smaller a(i) value indicates better similarity among stations within the same cluster, and an S(i) value close to 1 indicates that observation i is well matched to its own cluster. The overall quality of a clustering distribution can then be measured using the average silhouette width for the entire PM2.5 concentration data set, which is defined as:
$$SC = \frac{1}{n} \sum_{i=1}^{n} S(i)$$
where n is the total number of PM2.5 observations. A higher value of SC indicates better discrimination among the clusters of a result. The k value that maximizes SC was selected as the final PM2.5 cluster number. However, k-means may converge to a local minimum, depending on the starting points. We therefore repeat the clustering a number of times, each with a new set of initial cluster centroid positions. K-means returns the solution with the lowest value of SumD, the total within-cluster sum of point-to-centroid distances (Equation (8)), and this solution is selected as the final PM2.5 cluster result:
$$\mathrm{SumD} = \sum_{i=1}^{k} \sum_{x_j \in s_i} \lVert x_j - \mu_i \rVert^2 \qquad (8)$$
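As an illustration of how S(i) and SC behave, the following sketch computes both for two candidate labelings of the same toy points. It is a hedged example rather than the study's code: the function names, the use of Euclidean distance as the dissimilarity, the convention that a singleton cluster receives S(i) = 0, and the toy data are assumptions. Note that the silhouette needs at least two clusters, so it cannot be evaluated for k = 1.

```python
import math


def euclidean(a, b):
    """Euclidean distance, used here as the dissimilarity measure (assumption)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def silhouette_values(points, labels):
    """Return S(i) = (b(i) - a(i)) / max(a(i), b(i)) for every observation."""
    clusters = {}
    for index, label in enumerate(labels):
        clusters.setdefault(label, []).append(index)
    if len(clusters) < 2:
        raise ValueError("the silhouette needs at least two clusters")
    values = []
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own:  # singleton cluster: S(i) is set to 0 by convention
            values.append(0.0)
            continue
        # a(i): mean dissimilarity to the other members of the same cluster.
        a = sum(euclidean(points[i], points[j]) for j in own) / len(own)
        # b(i): smallest mean dissimilarity to the members of any other cluster.
        b = min(
            sum(euclidean(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denominator = max(a, b)
        values.append((b - a) / denominator if denominator > 0 else 0.0)
    return values


def average_silhouette(points, labels):
    """SC: the mean of S(i) over all observations."""
    values = silhouette_values(points, labels)
    return sum(values) / len(values)


# Toy data (not PM2.5 measurements): two compact groups.
toy_points = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
two_clusters = [0, 0, 0, 1, 1, 1]    # matches the visible structure
three_clusters = [0, 0, 2, 1, 1, 1]  # splits one group unnecessarily
sc_two = average_silhouette(toy_points, two_clusters)
sc_three = average_silhouette(toy_points, three_clusters)
```

Here `sc_two` is larger than `sc_three`, so the two-cluster labeling would be preferred, which mirrors the rule of selecting the k that maximizes SC.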

Annual PM2.5 Concentration
Taking all 12 stations into account, the annual arithmetic mean PM2.5 concentration during the research period is about 87 μg/m3 (Table 2). A large difference is found between the average concentrations for the cold and warm seasons, 105 and 77 μg/m3, respectively. In this study, it is estimated that daily PM2.5 concentrations exceeded the daily PM2.5 standard of NAQS (75 μg/m3) on approximately 174 days during the 1-year research period.
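A count of exceedance days of this kind can be derived from hourly records by first averaging each calendar day and then comparing the daily mean with the 75 μg/m3 limit. The sketch below shows one way to do that; the function names, the handling of missing hours, and the synthetic three-day series are assumptions for illustration and do not reproduce the study's data or its exact procedure.

```python
from collections import defaultdict
from datetime import datetime, timedelta

DAILY_STANDARD = 75.0  # daily limit in micrograms per cubic metre, as quoted in the text


def daily_means(hourly):
    """Map each calendar date to the mean of its available hourly values."""
    buckets = defaultdict(list)
    for stamp, value in hourly:
        if value is not None:  # assumption: missing hours are simply skipped
            buckets[stamp.date()].append(value)
    return {day: sum(values) / len(values) for day, values in buckets.items()}


def count_exceedance_days(hourly, limit=DAILY_STANDARD):
    """Number of days whose mean concentration is above the limit."""
    return sum(1 for mean in daily_means(hourly).values() if mean > limit)


# Synthetic series: one polluted day (100) followed by two cleaner days (50).
start = datetime(2013, 3, 1)
synthetic_hours = [
    (start + timedelta(hours=h), 100.0 if h < 24 else 50.0) for h in range(72)
]
exceedance_days = count_exceedance_days(synthetic_hours)
```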

Seasonal Patterns of PM2.5 Concentration
A seasonal pattern is evident at each sampling station, with very little variability among stations at the monthly-average scale (Figure 1). Furthermore, sampling data from the U.S. embassy station for the same period indicate a similar monthly variation. For individual months, Beijing has four significant peaks, in March 2013, June 2013, October 2013, and February 2014. The highest level (around 140 μg/m3) appears in February 2014, and the minimum occurs in November 2013. Station 1007 is a special case: because the traffic flow around this station is always larger than that around the surrounding stations, it has higher PM2.5 concentrations than the surrounding stations in every month, and its lowest value appears in August 2013 (about 65 μg/m3). Likewise, a similar seasonal cycle is evident for individual days of the week, based on the daily PM2.5 concentration (not shown).
To our knowledge, such a seasonal pattern of PM2.5 concentration in Beijing is unique as compared with other major cities. For instance, New York shows a marked peak in July, with minimum PM2.5 concentrations occurring in February [20], while Athens (Greece) demonstrates a bimodal pattern (March and December) [19].

Stations Variations
The Pearson correlation coefficient (R) and the distance between every pair of stations are investigated to verify the inter-correlations of all 12 stations (Table 3). Most R values between station pairs are greater than 0.9; in particular, when the distance between two stations is less than 5 kilometers (stations 1003, 1004, 1005, and 1006), R exceeds 0.95. However, correlations between urban and suburban stations are mostly less than 0.85. Consequently, it is necessary to cluster the stations before studying the spatial distribution pattern of PM2.5 concentration in Beijing.
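For readers who want to reproduce a table of this kind, the sketch below computes R for two concentration series and the great-circle distance between two stations. The text does not say how distance was calculated, so the haversine formula and the Earth radius of 6371 km are assumptions, and the short series and coordinates are synthetic rather than taken from the monitoring network.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariance / (spread_x * spread_y)


def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance in kilometres (assumed distance measure)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    delta_phi = phi2 - phi1
    delta_lambda = math.radians(lon2 - lon1)
    h = (math.sin(delta_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(delta_lambda / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(h))


# Synthetic hourly values for two hypothetical stations.
station_a = [80.0, 95.0, 120.0, 60.0, 45.0, 110.0]
station_b = [75.0, 90.0, 130.0, 55.0, 50.0, 100.0]
r_ab = pearson_r(station_a, station_b)

# Hypothetical coordinates, one degree of longitude apart on the equator.
one_degree_km = haversine_km(0.0, 0.0, 0.0, 1.0)
```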

Spatial Distribution of Station Clustering
The k-means method is used to cluster the 11 stations (station 1012 being excluded) into suitable categories, with the intention of filtering the strong seasonal, diurnal and, to a lesser degree, day-of-the-week effects from the subsequent spatial analysis. We calculate SC indexes for different cluster numbers from 1 to 7, and find that SC reaches a maximum when the cluster number is 4, indicating that the between-cluster dissimilarity is large while the within-cluster dissimilarity is small. The 11 stations are therefore partitioned into four clusters (Figure 2).

Diurnal Patterns of PM2.5 Concentration
As shown in Figure 3, a pronounced diurnal cycle in PM2.5 concentrations is evident among all the clusters. On the whole, PM2.5 concentrations are higher at night than in the daytime, and the minimum concentration generally appears in the early afternoon. This finding is consistent with prior research [28]. It is worth noting that there is a significant minimum in the early afternoon (14:00), because meteorological conditions at this time are unfavorable for the formation of a thermal inversion layer. Urban areas suffer higher PM2.5 concentration (about 92 μg/m3) than suburban areas (about 77 μg/m3). The PM2.5 concentration of cluster 1 is always much higher than that of cluster 2, while cluster 3 has higher PM2.5 concentration than cluster 4. That is, within the urban area, the north-west has higher PM2.5 concentration than the east. Among suburban areas, the north-east area (in Shunyi District) near the urban area suffers higher PM2.5 concentration than other suburban areas. These facts indicate the influence of complex human activities on PM2.5 concentration.
Nevertheless, the diurnal pattern of PM2.5 for all clusters presents systematic seasonal variations. In spring, the evening rush hour peak appears between 19:00 and 21:00, followed by a 22:00-24:00 peak, while in winter the concentration starts to rise at 18:00. In summer, the evening rush hour peak occurs at 17:00-19:00 and is followed by a notable decline. These observations indicate the mobile-source influence on PM2.5. The major difference is that PM2.5 concentrations rise at 22:00 in winter, while the opposite situation appears in summer and autumn. This may be related to the subsidence inversion at night in winter, as a result of which the stability of the atmosphere increases and pollutants cannot easily disperse. This weather condition hardly ever appears in summer or autumn.
In both urban and suburban areas, the morning rush hour peak is observed in spring (8:00 a.m.-10:00 a.m.) and winter (7:00 a.m.-9:00 a.m.), but is not apparent in summer and autumn. The evening rush hour peak is notable in spring (19:00-21:00), autumn (17:00-20:00), and winter (18:00-21:00). Summer is a special case: urban areas show an evening rush hour peak (18:00-20:00) while suburban areas do not. This could suggest that enhanced anthropogenic activity is not solely responsible for the rush hour PM2.5 peak, and that the cumulative growth of nighttime PM2.5 concentration in summer may be due to atmospheric stability. It also indicates that the rush hour peak does not occur near the time when atmospheric stability is normally at a maximum. Theoretically, if the rush hour peak always occurred near the time when atmospheric stability is normally at a maximum, the nighttime peak in winter should occur earlier than that in summer, but in fact the opposite is observed.
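Seasonal diurnal profiles such as those discussed above can be obtained by averaging the hourly observations by season and hour of day, and then locating the peak within a time window. The following sketch shows the idea under stated assumptions: the month-to-season mapping (spring as March to May, and so on) is not given in the text and is assumed here, the function names are hypothetical, and the input is synthetic.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Assumed season definition; the text does not state the exact months.
SEASON_BY_MONTH = {
    12: "winter", 1: "winter", 2: "winter",
    3: "spring", 4: "spring", 5: "spring",
    6: "summer", 7: "summer", 8: "summer",
    9: "autumn", 10: "autumn", 11: "autumn",
}


def diurnal_profiles(hourly):
    """Return {season: [mean concentration for hours 0..23]}."""
    sums = defaultdict(lambda: [0.0] * 24)
    counts = defaultdict(lambda: [0] * 24)
    for stamp, value in hourly:
        season = SEASON_BY_MONTH[stamp.month]
        sums[season][stamp.hour] += value
        counts[season][stamp.hour] += 1
    profiles = {}
    for season, totals in sums.items():
        profiles[season] = [
            totals[h] / counts[season][h] if counts[season][h] else None
            for h in range(24)
        ]
    return profiles


def peak_hour(profile, first_hour, last_hour):
    """Hour with the highest mean within the inclusive window."""
    hours = [h for h in range(first_hour, last_hour + 1) if profile[h] is not None]
    return max(hours, key=lambda h: profile[h])


# Synthetic winter data: two days with a spike at 21:00 and a flat level otherwise.
start = datetime(2014, 1, 6)
synthetic_hours = [
    (start + timedelta(hours=h), 100.0 if h % 24 == 21 else 50.0)
    for h in range(48)
]
winter_profile = diurnal_profiles(synthetic_hours)["winter"]
evening_peak = peak_hour(winter_profile, 17, 23)
```

Running the same grouping separately for urban and suburban clusters would give the kind of per-area comparison described in this section.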

Day-of-Week Pattern
In general, the yearly average PM2.5 concentration is about 18 μg/m3 (about 21% of the annual mean concentration) higher on weekends (Saturdays and Sundays) than on weekdays (Figure 4). This phenomenon is probably due to the driving restriction in Beijing, under which about 20% of cars have to stay off the road on each weekday. Since vehicle ownership amounts to nearly 5.35 million units, this means that there are about 1.07 million more cars on the road on weekends than on weekdays. Supposing that other emissions are constant, the sudden increase of cars on the road might be the main cause of the higher PM2.5 concentration on weekends.
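The two figures quoted in this paragraph follow from simple arithmetic, shown here as a check. The only inputs are values already given in the text: the 20% restriction share, the 5.35 million vehicles, the 18 μg/m3 weekend excess, and the annual mean of about 87 μg/m3.

```latex
\begin{align*}
0.20 \times 5.35 \times 10^{6} &= 1.07 \times 10^{6}\ \text{cars} \\
\frac{18\ \mu\text{g/m}^3}{87\ \mu\text{g/m}^3} &\approx 0.207 \approx 21\%
\end{align*}
```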

Figure 4. Day-of-week pattern in urban and suburban areas. Though the urban concentration is higher than the suburban concentration on both weekdays and weekends, the day-of-week patterns are similar. Interestingly, the evening peak is an hour later on weekdays than on weekends, reflecting human activities.
Furthermore, urban PM2.5 concentrations are higher than suburban PM2.5 concentrations on both weekdays and weekends. The nighttime peak on weekdays (21:00) is one hour later than on weekends (20:00) in both urban and suburban areas, which also reflects human activity. By contrast, the situation in other big cities is completely different. In New York, for example, PM2.5 concentrations across the city are significantly lower on weekends and uniformly high on weekdays [20].
Besides the facts above, we also find some phenomena that cannot be easily explained by atmospheric conditions, emissions, or mobile-source influence. Examples are the unique seasonal pattern of PM2.5 concentration in Beijing, and the fact that PM2.5 concentrations rise from 2:00 a.m. until 11:00 a.m. in summer, whereas they fall from 0:00 a.m. until 5:00 a.m. in other seasons. Therefore, further studies should be carried out with multi-source data, such as pollution source information, real-time population grid data, meteorological data and traffic data, to provide reasonable interpretations of these unexplained phenomena.
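A weekend-versus-weekday comparison of this kind reduces to splitting daily means by day of the week. The sketch below illustrates that split; the function name is hypothetical and the concentrations are synthetic placeholders chosen only to make the behaviour visible, not values from the study.

```python
from datetime import date, timedelta


def weekend_weekday_means(daily):
    """Return (weekend mean, weekday mean) from a {date: concentration} mapping."""
    weekend = [v for d, v in daily.items() if d.weekday() >= 5]  # Saturday=5, Sunday=6
    weekday = [v for d, v in daily.items() if d.weekday() < 5]
    return sum(weekend) / len(weekend), sum(weekday) / len(weekday)


# Synthetic fortnight: 60 on weekend days, 40 on weekdays.
first_day = date(2013, 3, 4)
synthetic_daily = {}
for offset in range(14):
    day = first_day + timedelta(days=offset)
    synthetic_daily[day] = 60.0 if day.weekday() >= 5 else 40.0

weekend_mean, weekday_mean = weekend_weekday_means(synthetic_daily)
difference = weekend_mean - weekday_mean
```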

Conclusions
The spatial and temporal characteristics of PM2.5 concentration in Beijing City between March 1, 2013 and February 28, 2014 are analyzed based on the hourly observations at 12 stations. On the whole, the annual mean PM2.5 concentration shows large differences between urban and suburban areas as well as between cold and warm seasons in Beijing.
The k-means method is used to cluster the stations into four categories for the analysis of diurnal and day-of-week patterns. The diurnal variation of PM2.5 shows higher concentrations at night and lower values in the daytime, and the cumulative growth of nighttime PM2.5 concentration may be due to atmospheric stability. We also find that PM2.5 concentrations are about 18 μg/m3 (about 21% of the annual mean concentration) higher on weekends than on the other days of the week, which might be related to the driving restrictions implemented in Beijing. The nighttime peak on weekdays (21:00) is one hour later than on weekends (20:00), which is further evidence of the influence of human activity.
There are still some phenomena that cannot be easily explained by atmospheric conditions, emissions, or mobile-source influence, but they may help air quality models to develop a deeper understanding of pollution mechanisms and to improve their simulation accuracy.