Spatial and Temporal Distribution of PM2.5 Pollution in Xi’an City, China

The monitoring data from the 13 stations in Xi’an city for the whole of 2013 and 2014 were collected and analyzed, with the goal of obtaining the spatial and temporal distribution characteristics of PM2.5. Cluster analysis and the wavelet transform were used to examine the regional distribution of PM2.5 concentration (ρ(PM2.5)) and the main features of its yearly changes and sudden changes. In addition, relevant factors were considered to interpret these changes. The results show that ρ(PM2.5) in Xi’an was generally higher in 2013 than in 2014; it is high in winter and low in summer, and the high-concentration centers are around the People’s Stadium and Caotan monitoring sites. For the regional distribution, the 13 sites can be divided into three clusters: Textile city forms Cluster 1, High-tech Western forms Cluster 2, and Cluster 3 contains the remaining 11 monitoring sites. The coefficient of goodness (cophenetic correlation coefficient) of the cluster analysis is 0.6761, which indicates that the result is acceptable. As for the yearly change, the monthly mean ρ(PM2.5) exceeded the concentration limit of the Chinese National Standard (50 μg/m3) in every month except June and July; cloudy weather and low wind speed are the major meteorological factors behind the sudden changes of ρ(PM2.5).


Data
The locations of the 13 monitoring sites are shown in Figure 1. Apart from the Chang'an district, Yanliang district and Lintong district sites, which are relatively scattered, the remaining 10 sites are concentrated in the city center. The sites belong to different districts, and there are subtle differences in their local climates and industrial structures. These differences can have a significant impact on the monitoring result, namely the concentration of PM2.5. For instance, the High-voltage Switch Factory site is located in Lianhu district in northwestern Xi'an city, where the population density is high, northeasterly winds are common, and the main industry is the production of electrotechnical instruments.
The AQI data of Xi'an city from 1 January 2013 to 31 December 2014 were analyzed statistically, and the basic statistics of the 13 monitoring sites and the mean value of the whole city over the two years are summarized in Table 1. The original data come from the website of the Xi'an Environmental Monitoring Station, a platform provided by the city government to inform the public about ρ(PM2.5). Generally, the PM2.5 concentration in 2013 was higher than that in 2014. The standard deviation is relatively high at all monitoring sites, ranging from 63.695 to 109.554, which shows that the concentration of PM2.5 fluctuates strongly. A possible reason is that the concentration is sensitive to many factors, such as temperature and wind speed, which can change it substantially. The mean values show no large differences in concentration among the monitoring sites.
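As a minimal sketch of how per-site statistics of the kind collected in Table 1 can be computed, the Python snippet below uses only the standard library. The site names and daily values are hypothetical placeholders, not the Xi'an data, and pooling all site-days into one city-wide mean is an assumption about how the city mean was formed.

```python
import statistics

# Hypothetical daily rho(PM2.5) values in ug/m3 (placeholders, not real data).
daily = {
    "Site A": [62.0, 148.0, 301.0, 95.0, 40.0],
    "Site B": [55.0, 120.0, 260.0, 88.0, 47.0],
}

def summarize(values):
    """Return (mean, sample standard deviation) for one site's series."""
    # statistics.stdev normalizes by N - 1, as MATLAB's std does by default.
    return statistics.mean(values), statistics.stdev(values)

table = {site: summarize(values) for site, values in daily.items()}

# Assumption: the city-wide mean pools every site-day into one average.
city_mean = statistics.mean(x for values in daily.values() for x in values)

for site, (mean, sd) in table.items():
    print(f"{site}: mean = {mean:.1f}, sd = {sd:.1f}")
print(f"City mean = {city_mean:.1f}")
```

A standard deviation that is large relative to the mean, as in Table 1, is what signals strong day-to-day fluctuation.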

Methodology
The main methods used in this paper to study the distribution of PM2.5 and its influencing factors in Xi'an city are cluster analysis and wavelet analysis, and the main tool was MATLAB-R2012b.

Cluster Analysis
Cluster analysis, also called group analysis, is a method for classifying samples or indicators according to their own features [24]. A "cluster" is a collection of similar elements. Cluster analysis compares the features of the elements directly, placing elements with similar characteristics in the same class and elements with different characteristics in different classes. Suppose there are n elements (13 sites in this paper), each described by m indicators (365 × 2 daily values in this paper). The basic steps of hierarchical cluster analysis are as follows. First, define the distance or similarity coefficient between elements; initially each element forms its own class, so the distance between classes equals the distance between elements. Second, merge the two classes with the shortest distance into a new class, and calculate the distance between the new class and each of the other classes. This step is then repeated until all elements belong to a single class, at which point the analysis stops. The whole process can be illustrated by a hierarchical clustering map. In this paper, we carry out in-depth data mining based on cluster analysis of the 13 sites, to obtain the regional distribution characteristics of PM2.5 and its industrial influencing factors. The main steps of the cluster analysis in MATLAB-R2012b are as follows:
Step 1: Measure the similarities and dissimilarities between the sites in the data set (the monitoring data of the 13 sites in this paper) by calculating the distance (Euclidean distance in this paper) with the "pdist" function;
Step 2: Define the linkage between clusters with the "linkage" function;
Step 3: Evaluate the resulting cluster tree with the "cophenet" function (cophenetic correlation coefficient);
Step 4: Form the clusters with the "cluster" function;
Step 5: Draw the hierarchical clustering map with "plot".
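The pdist, linkage and cluster steps can be mirrored in plain Python to make the mechanics concrete. This is a minimal sketch, not the authors' code: it assumes single linkage (MATLAB's default method for linkage; the paper does not state which method was used), uses toy two-dimensional points in place of the 730-value site vectors, and the function names are hypothetical stand-ins for the MATLAB functions.

```python
import math
from itertools import combinations

def pairwise_distances(X):
    """Euclidean distance for every pair of rows (analogue of pdist)."""
    return {(i, j): math.dist(X[i], X[j])
            for i, j in combinations(range(len(X)), 2)}

def single_linkage(X):
    """Agglomerative clustering; returns merges as (members_a, members_b, height)."""
    d = pairwise_distances(X)
    clusters = [frozenset([i]) for i in range(len(X))]  # each element starts alone
    merges = []
    while len(clusters) > 1:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            # Single linkage: cluster distance = closest pair of members.
            h = min(d[min(i, j), max(i, j)]
                    for i in clusters[a] for j in clusters[b])
            if best is None or h < best[0]:
                best = (h, a, b)
        h, a, b = best
        merges.append((clusters[a], clusters[b], h))
        merged = clusters[a] | clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return merges

def cut_tree(merges, n, cutoff):
    """Label elements by applying only the merges lower than the cutoff height."""
    label = list(range(n))
    for members_a, members_b, h in merges:
        if h < cutoff:  # single-linkage heights never decrease, so order is safe
            rep = label[min(members_a)]
            for i in members_a | members_b:
                label[i] = rep
    renumber = {}
    return [renumber.setdefault(lab, len(renumber) + 1) for lab in label]

# Toy data standing in for the site vectors.
X = [[0, 0], [0, 1], [10, 0], [10, 2], [30, 0]]
merges = single_linkage(X)
print([h for _, _, h in merges])    # [1.0, 2.0, 10.0, 20.0]
print(cut_tree(merges, len(X), 5))  # [1, 1, 2, 2, 3]
```

Cutting the same tree at a larger height (for example 18) merges more elements, so the cutoff directly controls how many clusters are reported.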

Wavelet Transform
The wavelet transform is a mathematical method that has developed rapidly in recent decades and is now used in many fields. It represents a signal using a finite-length or rapidly decaying oscillating waveform, called a mother wavelet [25]. The basic idea is to decompose the original signal into a series of elementary signals that are well localized in frequency, and to use the characteristics of these elementary signals to represent local features of the original signal, thereby achieving joint time and frequency localization. The wavelet transform procedure consists of constructing the wavelet function, performing the wavelet transform, and carrying out wavelet decomposition and reconstruction.
The wavelet function is an oscillating function that decays rapidly, defined by the condition:
$$\int_{-\infty}^{+\infty} \phi(t)\,dt = 0 \qquad (1)$$
where $\phi$ is also called the mother (primitive) wavelet. Its dilations and translations form the function set:
$$\phi_{a,b}(t) = |a|^{-1/2}\,\phi\!\left(\frac{t-b}{a}\right), \qquad a, b \in \mathbb{R},\ a \neq 0 \qquad (2)$$
where $\phi_{a,b}(t)$ is the branch (daughter) wavelet, $a$ is the scaling factor, which reflects the period, and $b$ is the time factor, which describes the translation in time. For a time series $f(t) \in L^2(\mathbb{R})$, the transform with the branch wavelet in Equation (2) is:
$$W_f(a,b) = |a|^{-1/2} \int_{-\infty}^{+\infty} f(t)\,\overline{\phi}\!\left(\frac{t-b}{a}\right) dt \qquad (3)$$
where $W_f(a,b)$ is called the wavelet coefficient. It is the output of $f(t)$ through a filter whose unit impulse response is the wavelet, and it contains information on both $a$ and $b$. The wavelet decomposition can be described by the tree map shown in Figure 2. We adopted the Mallat algorithm to decompose the original signal $f(t)$: if the signal has length N, the level-1 decomposition yields the low-frequency signal a1 (length N/2) and the high-frequency signal d1 (length N/2). The low-frequency signal a1 is then decomposed at level 2, giving the low-frequency signal a2 (N/4) and the high-frequency signal d2 (N/4). Continuing in this way, the last level of decomposition leaves one low-frequency signal and several high-frequency signals, as Figure 2 shows. The original signal can then be written as:
$$f(t) = a_n + d_n + d_{n-1} + \cdots + d_1 \qquad (4)$$
where n is the number of decomposition levels and each term denotes the signal reconstructed from the corresponding coefficients. In this process, the low-frequency signals carry the information about periodic changes, and the high-frequency signals carry the information about sudden changes. Thus, wavelet analysis can be used to identify the features of ρ(PM2.5) in both yearly (periodic) change and sudden change.
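The Mallat decomposition described above can be illustrated with the simplest Daubechies wavelet, db1 (Haar), whose filters are short enough to write out by hand. This is a hedged Python sketch rather than the MATLAB toolbox routine: it works on the coefficient sequences (which is where the N/2, N/4 lengths come from), requires the length to be even at every level, and the function names and the 16-day series are hypothetical. It shows that each level halves the length and that inverting the steps recovers the original signal.

```python
import math

S = 1 / math.sqrt(2)  # Haar (db1) filter coefficient

def haar_step(x):
    """One Mallat level: low-pass -> approximation, high-pass -> detail."""
    if len(x) % 2:
        raise ValueError("length must be even at every level")
    a = [(x[2 * k] + x[2 * k + 1]) * S for k in range(len(x) // 2)]
    d = [(x[2 * k] - x[2 * k + 1]) * S for k in range(len(x) // 2)]
    return a, d

def haar_inverse_step(a, d):
    """Undo one level by interleaving the two half-length signals."""
    x = []
    for ak, dk in zip(a, d):
        x += [(ak + dk) * S, (ak - dk) * S]
    return x

def decompose(x, levels):
    """Return [a_n, d_n, ..., d_1]: approximation first, finest detail last."""
    a, details = list(x), []
    for _ in range(levels):
        a, d = haar_step(a)  # only the approximation is split again
        details.append(d)
    return [a] + details[::-1]

def reconstruct(coeffs):
    """Invert decompose(), working from the coarsest detail to the finest."""
    a = coeffs[0]
    for d in coeffs[1:]:
        a = haar_inverse_step(a, d)
    return a

# Hypothetical 16-day rho(PM2.5) series in ug/m3.
f = [80, 95, 120, 300, 260, 90, 70, 65, 60, 58, 75, 110, 150, 210, 240, 230]
coeffs = decompose(f, 2)
print([len(c) for c in coeffs])  # [4, 4, 8] -> a2, d2, d1
```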
For the monitoring data of Xi'an city, analyzing the original data directly would make the yearly change characteristics hard to identify, because of the large volume of data from 13 sites over 24 months (2013 and 2014); using only the mean values would discard specific information, such as important sudden change points. Similarly, the Fourier Transform (FT) can reveal the frequency features of PM2.5 concentration, but it cannot give the exact times at which these frequencies occur. As a joint time-frequency analysis tool, wavelet analysis can provide reliable results, as shown by previous studies [20]. Wavelet analysis is therefore a suitable tool for characterizing the PM2.5 concentration time series.
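The argument about the Fourier Transform can be checked numerically. In the hedged Python sketch below (toy data, not the Xi'an records), the same hypothetical one-day spike is placed on day 6 and on day 21 of a 32-day series: the DFT magnitude spectra are identical, so they carry no information about when the spike happened, whereas the level-1 Haar (db1) detail coefficients peak at different positions. A direct O(N²) DFT is used to keep the example self-contained.

```python
import cmath
import math

def dft_magnitude(x):
    """|X_k| for a direct discrete Fourier transform of x."""
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n)))
            for k in range(n)]

def haar_detail(x):
    """Level-1 Haar (db1) detail coefficients, one per pair of samples."""
    return [(x[2 * k] - x[2 * k + 1]) / math.sqrt(2) for k in range(len(x) // 2)]

def peak(values):
    """Index of the largest absolute value."""
    return max(range(len(values)), key=lambda k: abs(values[k]))

background = [50.0] * 32
early = list(background)
early[5] = 300.0   # hypothetical spike on day 6 (index 5)
late = list(background)
late[20] = 300.0   # the same spike on day 21 (index 20)

spectrum_gap = max(abs(p - q) for p, q in zip(dft_magnitude(early), dft_magnitude(late)))
print(f"largest spectrum difference: {spectrum_gap:.1e}")  # close to zero: timing is lost
print(peak(haar_detail(early)), peak(haar_detail(late)))   # 2 10
```

The Haar peaks sit at coefficient 2 (days 5 and 6) and coefficient 10 (days 21 and 22), so the wavelet output keeps the timing that the magnitude spectrum discards.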
The Daubechies (db) wavelets are a family of compactly supported orthogonal wavelets that perform well in time and frequency analysis. In this paper, we use db wavelets to explore the yearly change and sudden change features of PM2.5 concentration. The db6 wavelet is used at level 4 to find the yearly change features (because of its good performance in periodic change analysis), the db1 wavelet is used at level 3 to illustrate sudden changes (because of its strong performance in sudden change analysis), and the reconstructed coefficients at levels 1 and 2 are examined to locate the sudden change points.
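How the reconstructed detail signals expose sudden changes can be sketched with db1 (Haar). In this hypothetical Python example, D_j is obtained by zeroing every coefficient except the level-j detail and inverting the transform, so that D_j has the same length as the input; days whose |D_j| exceeds a threshold are flagged. The threshold is an assumed parameter (the paper identifies the large-amplitude points by inspecting the figures), the series is toy data, and days are reported 1-based as in the paper.

```python
import math

S = 1 / math.sqrt(2)  # Haar (db1) filter coefficient

def dwt(x):
    """One Haar (db1) analysis step: (approximation, detail)."""
    half = range(len(x) // 2)
    return ([(x[2 * k] + x[2 * k + 1]) * S for k in half],
            [(x[2 * k] - x[2 * k + 1]) * S for k in half])

def idwt(a, d):
    """One Haar synthesis step."""
    out = []
    for ak, dk in zip(a, d):
        out += [(ak + dk) * S, (ak - dk) * S]
    return out

def reconstructed_detail(x, level):
    """D_level: the part of x carried only by the level-`level` detail."""
    a = list(x)
    for _ in range(level):
        a, d = dwt(a)
    signal = idwt([0.0] * len(d), d)                # drop the approximation
    for _ in range(level - 1):
        signal = idwt(signal, [0.0] * len(signal))  # drop the finer details
    return signal

def sudden_change_days(x, level, threshold):
    """1-based day numbers where |D_level| exceeds the (assumed) threshold."""
    D = reconstructed_detail(x, level)
    return [i + 1 for i, v in enumerate(D) if abs(v) > threshold]

# Hypothetical 16-day series with a one-day pollution spike on day 10.
series = [60.0] * 16
series[9] = 300.0
print(sudden_change_days(series, 1, 60.0))  # [9, 10]
print(sudden_change_days(series, 2, 50.0))  # [9, 10, 11, 12]
```

Because the Haar transform works on dyadic blocks, a step change that falls exactly on a block boundary at one level only shows up at a coarser level, which is one reason to inspect both the level-1 and level-2 reconstructions.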
We use the Wavelet Toolbox in MATLAB-R2012b to conduct the analysis. Figure 3 shows the analysis interface for the yearly change using the db6 wavelet at level 4, where different wavelets and decomposition levels can be selected to obtain different results.

Distribution of ρ(PM2.5) in Time and Space
According to the seasonal and climatic characteristics of Xi'an city, the seasons are defined as follows: Spring comprises March, April, and May; Summer comprises June, July, and August; Autumn runs from September to November; and Winter runs from December to February of the following year. The distribution of PM2.5 concentrations in the different seasons in Xi'an city is illustrated in Figure 4. Figure 4 shows that the concentration at all sites was higher in 2013 than in 2014, which agrees with Table 1. Across seasons, the concentration is highest in Winter, with a mean value of 260 μg/m3; Autumn is the second worst season, with a mean concentration of 170 μg/m3; and the concentration of PM2.5 is lowest in Summer, with a mean value of 75 μg/m3.
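The seasonal grouping above, in which Winter spans two calendar years, is easy to get wrong when monthly data are aggregated. The hypothetical Python helper below is one way to encode it. Labeling each Winter by the year of its December is an assumption, since the text does not say how a winter straddling two calendar years was labeled, and the monthly values in the usage example are placeholders.

```python
def season_of(year, month):
    """Return (season, season_year) for a calendar month under the paper's scheme.

    Assumption: a Winter is labeled by the year of its December, so
    January and February are assigned to the previous year's Winter.
    """
    if month in (3, 4, 5):
        return "Spring", year
    if month in (6, 7, 8):
        return "Summer", year
    if month in (9, 10, 11):
        return "Autumn", year
    if month == 12:
        return "Winter", year
    if month in (1, 2):
        return "Winter", year - 1
    raise ValueError(f"invalid month: {month}")

def seasonal_means(monthly):
    """Average monthly means {(year, month): value} by (season, season_year)."""
    groups = {}
    for (year, month), value in monthly.items():
        groups.setdefault(season_of(year, month), []).append(value)
    return {key: sum(vals) / len(vals) for key, vals in groups.items()}

# Hypothetical monthly means in ug/m3.
monthly = {(2013, 12): 240.0, (2014, 1): 270.0, (2014, 2): 255.0, (2014, 6): 70.0}
print(seasonal_means(monthly))  # {('Winter', 2013): 255.0, ('Summer', 2014): 70.0}
```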
Over the two years, the standard deviation in 2013 is clearly higher than in 2014, showing that the concentration of PM2.5 fluctuated much more in 2013. Across seasons, the standard deviation is highest in Winter, nearly three times that in Summer, with Autumn and Spring lying between these two extremes; this pattern mirrors the seasonal distribution of PM2.5 concentration.
In 2013 (Figure 5), the concentration of PM2.5 was higher in January and February than in any other month, and the highest center was located in Cao Tan district. The concentration decreased steadily in March, April and May, and the highest center was located at the High-voltage Switch Factory in March and April. The concentration in June was the lowest of the whole year: the mean value for downtown was 60.36 μg/m3, which is below the national standard (75 μg/m3). The situation in July was similar to that in June, with a slightly higher mean value. The concentration increased steadily in August, September and October, and the spatial pattern in September differed markedly from that in other months, with the highest center located in Lintong district. The spatial distribution of PM2.5 was nearly the same in November and December, and the value in December again approached the highest concentration of the year. The standard deviation shows that the concentration in June was the most stable of the whole year.
In 2014 (Figure 6), the overall conditions were similar to those in 2013, with some differences: the month with the lowest concentration was July, and the month with the lowest standard deviation was September.

Regional Distribution Characteristic of ρ(PM2.5)
Cluster analysis is a versatile tool in this study, as it can reveal the underlying relationships in the monitoring data of the 13 sites. Figure 7 shows the hierarchical clustering map of the 13 sites, from which we can identify the different clusters and examine why the sites fall into them. With a Euclidean distance of 18 as the cutoff, the PM2.5 concentration monitoring data divide the 13 sites into three clusters. We chose 18 as the optimal distance because, under this condition, the coefficient of goodness of the analysis is 0.6761, the best value we obtained. The map shows that Textile city is Cluster 1, High-tech Western is Cluster 2, and the rest of the sites belong to Cluster 3, which includes the High-voltage Switch Factory district, Xingqing district, Xiaozhai, People's Stadium, Economic development district, Chang'an district, Yanliang district, Lintong district, Qujiang cultural group, Guangyun Tan, and Cao Tan. Interestingly, Figure 1 shows that Yanliang district and Lintong district are located far from the other sites, yet the cluster analysis still places them in the same cluster as the central sites, whereas Textile city is very close to the other sites but forms a cluster on its own. This result suggests that the regional distribution of PM2.5 is not strongly related to geographical location, so the influence of climate on the regional distribution is not obvious. The most direct influencing factor could be the regional industrial production mode and activities. Based on the cluster analysis result, we collected relevant information and summarized the main representative industrial activities of the three clusters in Table 2.
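Step 3 of the procedure evaluates the tree with a cophenetic correlation coefficient (MATLAB's cophenet function computes this quantity). As a hedged illustration of what such a coefficient measures, the Python sketch below computes it as the Pearson correlation between the original pairwise distances and the cophenetic distances, i.e., the tree height at which each pair of elements is first joined. It assumes single linkage (built Kruskal-style by merging pairs in order of increasing distance) and toy two-dimensional points; it is not the authors' computation, and the 0.6761 value cannot be reproduced without the original data.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def cophenetic_correlation(X):
    """Return (c, cophenetic distances) for single-linkage clustering of X."""
    pairs = list(combinations(range(len(X)), 2))
    dist = {p: math.dist(X[p[0]], X[p[1]]) for p in pairs}
    root = list(range(len(X)))                  # cluster id of each element
    members = {i: {i} for i in range(len(X))}   # cluster id -> its elements
    coph = {}
    for i, j in sorted(pairs, key=dist.get):    # shortest distances first
        ri, rj = root[i], root[j]
        if ri == rj:
            continue                            # already in the same cluster
        # Every pair split across the two clusters is first joined at this height.
        for a in members[ri]:
            for b in members[rj]:
                coph[min(a, b), max(a, b)] = dist[i, j]
        members[ri] |= members.pop(rj)
        for k in members[ri]:
            root[k] = ri
    c = pearson([dist[p] for p in pairs], [coph[p] for p in pairs])
    return c, coph

X = [[0, 0], [0, 1], [10, 0], [10, 2], [30, 0]]  # toy points, not site data
c, coph = cophenetic_correlation(X)
print(round(c, 2))  # 0.94
```

A value close to 1 means the tree preserves the original distances well. Computed this way, the coefficient depends on the distance metric and the linkage method and describes the whole tree.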
Clusters 1 and 2 are often the highest centers of ρ(PM2.5), which is attributed to their industrial activities and production structure. Cluster 3 is usually not the highest center of ρ(PM2.5), possibly because agriculture and tourism account for a large proportion of activity in this cluster. However, at some specific sites in this cluster, such as Xiaozhai, where commercial activities are well developed, ρ(PM2.5) may often be the highest. The specific degree to which industrial activities affect the regional clusters of ρ(PM2.5) still needs further study.

Time Series Analysis of PM2.5 Concentration
Wavelet analysis is regarded as an effective method for time series analysis. Figures 8 and 9 show the results of the db6 wavelet analysis at level 4, in which the low-frequency signal contains the yearly change characteristics. The 2013 results are shown in Figure 8(a), and the reconstructed low-frequency signal is illustrated in Figure 8(b). The yearly change characteristics can be read directly from the curve: the highest-concentration months are January, February and December, and the lowest month is June. Figure 8 clearly shows the trend of PM2.5 concentration over the whole year, as well as the amplitude of its fluctuations, which is relatively small from April to July.
The results for 2014 are shown in Figure 9: the overall yearly change characteristics for 2014 are similar to those for 2013, although the highest-concentration months in 2014 are January and February, slightly different from 2013. This result is consistent with the information revealed in Figures 3 and 4, which indicates that wavelet analysis is applicable and suitable for studying the yearly change of PM2.5 concentration.
The sudden change results for 2013 and 2014, based on the db1 wavelet, are shown in Figures 10 and 11, respectively. The reconstructed signal at the 2nd level is shown in Figure 10(a), and the reconstructed signal at the 1st level in Figure 10(b). The 2nd-level curve shows seven points where the amplitude of the reconstructed coefficients is very large, indicating sudden changes. Zooming in on the figure gives their positions (day of the year): 12th, 42nd, 57th, 71st, 300th, 351st, and 358th, corresponding to 12 January, 11 February, 26 February, 12 March, 27 October, 17 December, and 24 December. Figure 11 shows two sudden change points, on 16 January and 2 February.
Many factors can induce sudden changes in ρ(PM2.5), for instance fireworks, gas heating, and dust explosions. Fireworks can cause an instant increase of ρ(PM2.5) at a single site, but not across the whole city. Similarly, heating (for example, burning coal) can raise ρ(PM2.5) over a period of time, but not on a single day. Dust explosions are accidents that happen only occasionally, and our search of the historical records of Xi'an city for the two years found that no dust explosions occurred. For these reasons, human-induced factors are not considered in this paper; instead we focus on the climatic factors that may cause sudden changes in PM2.5 concentration, such as temperature, wind speed, and barometric pressure.
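The conversion from the sample positions read off Figure 10 to calendar dates can be checked with Python's standard datetime module. The sketch assumes the positions are 1-based day-of-year indices in 2013 (a non-leap year); the helper name is hypothetical.

```python
from datetime import date, timedelta

def day_to_date(year, day_number):
    """Convert a 1-based day-of-year index into a calendar date."""
    return date(year, 1, 1) + timedelta(days=day_number - 1)

peaks_2013 = [12, 42, 57, 71, 300, 351, 358]
dates = [day_to_date(2013, n) for n in peaks_2013]
print([d.isoformat() for d in dates])
# ['2013-01-12', '2013-02-11', '2013-02-26', '2013-03-12',
#  '2013-10-27', '2013-12-17', '2013-12-24']

# Round trip: timetuple().tm_yday recovers the day-of-year index.
print([d.timetuple().tm_yday for d in dates])
```

The round trip gives a quick consistency check when reading positions off a zoomed figure, since an off-by-one reading shows up immediately as a shifted date.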
The meteorological data for the sudden change points, obtained from the historical climate records system of Xi'an city, are listed in Table 3. The pollution level at all nine sudden change points is severe. The meteorological data suggest that the impact of temperature on PM2.5 concentration is not obvious and needs further study. What the data do show is that cloudy days and low wind (<3) are the main meteorological factors behind the sudden changes of ρ(PM2.5).

Discussion
We try to explain the temporal and spatial distribution of PM2.5 in Xi'an city in terms of its geographical location and meteorological features. Xi'an city is located in central China and belongs to the semi-humid continental monsoon climate region. The dry and dusty Spring may account for the high concentration of PM2.5 in Spring. In Summer, frequent rainy and windy days enhance the deposition of particles and help PM2.5 to disperse. In Autumn, low temperatures and weak winds are very common, both of which hinder the dispersion of PM2.5. Coal burned for heating in Winter may be the leading factor for the high concentration of PM2.5 in December.
Several scholars have studied the distribution of PM2.5 in China. One example is a study of the distribution characteristics of PM2.5 and PM10 in Beijing [26], which found that the monthly mean concentrations of PM2.5 in Beijing, from high to low, were ordered April, February, March, and January, whereas in this study the order is January, February, March, and April. This difference may be due to the different geographical locations and meteorological factors. Studies of the spatial distribution of PM2.5 are rare. ArcGIS has been used to map the distribution of PM10 at the national level in China [23]. The advantage of this method is that concentration contours can be drawn directly on the map; its disadvantage is that the layers do not always match well, which leaves some blank areas. The strength of the method in this paper is that it makes the fullest use of the data even though the monitoring sites are distant from each other; its weakness is that it cannot produce a concentration contour map of the city.
To explore the regional distribution of PM2.5, we adopt cluster analysis. For convenience of analysis, we divide the 13 sites into three clusters and analyze the relationship between ρ(PM2.5) and the industrial activities in each cluster. The results show that Clusters 1 and 2, which have the highest population and diverse industrial activities, are often the centers of highest ρ(PM2.5).
The yearly changes and sudden changes of ρ(PM2.5) have been studied with the wavelet transform, and some relevant factors have been analyzed. The results show that ρ(PM2.5) is high in Winter and Spring and low in Summer and Autumn, which supports the credibility of this method. Meteorological factors are considered to explain the sudden changes, and cloudy weather and low wind are the main causes. The wavelet transform has previously been used in the time series analysis of ρ(PM10) in Xi'an city in 2001 and 2002 [27]. That study pointed out that the high ρ(PM10) resulted from the combined effect of dusty weather, city construction (low wind, temperature inversion, etc.), and meteorological conditions. The result of this study is similar to that conclusion on PM10, which suggests that the wavelet transform is a suitable tool for the analysis. In addition, because AQI has been adopted by the Chinese government only recently and PM2.5 monitoring is not yet comprehensive in time or space, knowledge about PM2.5 is still limited and relevant materials are scarce. All of these impose limitations on this study. Future studies should perform a regression analysis between ρ(PM2.5) and temperature, industrial production value, and wind speed, to determine the specific correlation of each factor with ρ(PM2.5). With sufficient data, the cycle of ρ(PM2.5) changes over time could also be studied.

Conclusions
This study analyzes the PM2.5 concentration monitoring data of Xi'an city in 2013 and 2014, and the main conclusions are as follows: (a) Temporally, the concentration of PM2.5 is highest in Winter, followed by Autumn and then Spring, and lowest in Summer. Spatially, the highest concentration center is often located at Caotan, High-tech Western, and Textile city.
(b) Cluster analysis reveals that the clustering of PM2.5 distribution is not closely related to geographical location; the concentration of PM2.5 in the different clusters reflects industrial activities and the proportion of agriculture.
(c) The wavelet transform can capture both the yearly changes and the sudden changes of PM2.5, and the main meteorological factors behind the sudden changes are low wind and cloudy weather.