Identification of regional air pollution characteristic and the correlation with public health in Taiwan.

This study classifies regions of Taiwan with different air pollution characteristics into groups, and then evaluates and compares the air quality of those groups. A multivariate analysis technique, cluster analysis, is applied to a pollution monitoring dataset that includes PM10, SO2, NO2, CO and O3. The results show that regions with similar air pollution characteristics can be appropriately grouped by cluster analysis. All 22 regions are classified into six groups, and the pollution pattern of each group is characterized as: Group 1 (high SO2/NO2; low PM10), Group 2 (high PM10), Group 3 (high SO2/PM10), Group 4 (low SO2/NO2/CO; high O3), Group 5 (high CO/NO2; low O3) and Group 6 (low PM10/SO2/NO2/O3/CO). The air quality evaluation indicates that the regions in Group 6 (Ilan, Hualien and Taitung) have the best air quality, while the regions in Group 3 (Kaohsiung County and Kaohsiung City) have the worst air quality in Taiwan. The correlation analysis reveals that the incidence of respiratory system disease is significantly and positively correlated with NO2 and CO pollution at the 99% confidence level.


Introduction
Owing to the complexity and large variance of environmental data sets, common statistical methods are not sufficient for assessing the state of pollution [1]. Multivariate statistical methods for the classification, modeling and interpretation of large datasets from environmental monitoring programs allow the dimensionality of the data to be reduced and information to be extracted [2]. Multivariate analysis methods such as cluster analysis, principal component analysis and factor analysis are therefore recommended, and are becoming popular in environmental studies dealing with measurements and monitoring [3]. With regard to the application of cluster analysis in environmental studies, Chen et al. employed hierarchical cluster analysis (HCA) to classify the Cu, Ni, Pb and Zn concentrations in soil samples collected from 30 urban parks in Beijing City [4]. Their results indicated that the location of the parks appears to greatly affect the heavy metal concentrations in the soil samples. Zhang used multivariate analyses and GIS to classify the chemical elements in urban soils and to identify elements influenced by human activities in Ireland. Cluster analysis (CA) and principal component analysis (PCA) were applied to classify the elements into two groups: the first predominantly derived from natural sources, the second influenced by human activities [5]. Owega et al. identified long-range aerosol transport patterns to Toronto by classifying back trajectories with cluster analysis and neural network techniques [6]. They found that both techniques illustrate the cleaner nature of northerly and northwesterly transport patterns in comparison to southerly and southwesterly ones, as well as the effect of near-stagnant air masses. Li and Shue applied data mining to uncover the hidden knowledge of air pollution distribution in the voluminous data retrieved from monitoring stations in TAQMN [7]. Cluster analysis was used in their study for data pattern identification. Simeonov et al. presented the application of different multivariate statistical approaches to the interpretation of a large and complex data matrix obtained during a monitoring program of surface waters in Northern Greece. In that study, CA was used for site similarity analysis [8]. Facchinelli et al. used PCA and CA to predict potential non-point sources of heavy metals in soil on the regional scale [9].
This study aims to identify the air pollution characteristics of all 22 cities (counties) in Taiwan and to evaluate the correlation between air pollution and public health. A multivariate analysis technique, cluster analysis, is applied to a pollution monitoring dataset that includes PM10, SO2, NO2, CO and O3. The results make it possible to determine groups of cities (counties) with similar pollution characteristics and to compare air quality among these groups. Moreover, correlation analysis is used to examine the relationship between air pollution and disease of the respiratory system.

Materials
The Taiwan area comprises 7 cities and 15 counties, as shown in Fig. 1. The population and human activities are mainly concentrated in the west of Taiwan. The dataset, which contains the yearly average concentrations of five selected air pollutants (PM10, SO2, NO2, CO and O3) in Taiwan's 22 cities (counties), is taken from the "Air Quality Annual Report Taiwan Area in 2004" [10]. The report is based on data from the Taiwan Air Quality Monitoring Network (TAQMN), which is operated by the Environmental Protection Administration, Taiwan.

Cluster Analysis
Cluster analysis is an exploratory data analysis technique for solving classification problems. It comprises an unsupervised classification procedure that involves measuring either the distance or the similarity between the objects to be clustered. The information obtained from the measured variables is used to reveal the natural clusters existing among the studied samples. Objects are grouped into clusters in terms of their similarity, so that the degree of association is strong between members of the same cluster and weak between members of different clusters. The initial assumption is that the nearness of objects in the space defined by the variables reflects the similarity of their properties [2,11,12]. In our study, hierarchical clustering was performed with the complete linkage method as the amalgamation rule and the squared Euclidean distance as the metric. Statistical calculations were performed using the SPSS® software package, and the map of clustering results was produced using ArcView® software.
The similarities in this case were quantified through the squared Euclidean distance. The distance between two objects (regional air pollution characteristics), i and j, is given as [3]:

\[ d_{ij}^{2} = \sum_{k=1}^{m} \left( Z_{ik} - Z_{jk} \right)^{2} \]

where d_ij is the Euclidean distance, Z_ik is the standardized value of X_ik, Z_jk is the standardized value of X_jk, and m is the number of pollutant kinds.
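The distance measure described above can be illustrated with a minimal Python sketch. It is not the SPSS procedure used in the study; the function name `squared_euclidean` and the two standardized profiles are hypothetical values chosen only to show the arithmetic.

```python
def squared_euclidean(z_i, z_j):
    """Squared Euclidean distance between two standardized pollution profiles."""
    if len(z_i) != len(z_j):
        raise ValueError("profiles must cover the same m pollutants")
    # Sum over the m pollutants of (Z_ik - Z_jk)^2
    return sum((a - b) ** 2 for a, b in zip(z_i, z_j))


# Hypothetical standardized profiles, ordered as (PM10, SO2, NO2, CO, O3)
region_a = [0.5, 1.0, 0.8, 0.2, -0.3]
region_b = [-1.5, -1.0, -1.2, -0.8, -0.3]

d2 = squared_euclidean(region_a, region_b)  # 4 + 4 + 4 + 1 + 0, about 13.0
```

Because the differences are squared, a large gap in a single pollutant dominates the distance, which is one reason the variables are standardized before clustering.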

Pollution Level
In this study, prior to cluster analysis, the descriptor variables (concentrations of PM10, SO2, NO2, CO and O3) were standardized by means of z-scores to avoid any effect of the units scale on the distance measurements, by applying the equation [3]:

\[ Z_{ik} = \frac{X_{ik} - \mu_{k}}{\sigma_{k}} \]

where X_ik is the concentration of pollutant k in region i, μ_k is the average value of pollutant k over all regions, and σ_k is the corresponding standard deviation. The standardized value (Z_ik) of pollutant concentration in this case can be defined as "the pollution level of pollutant k in region i". Z_ik > 0 means that the pollution level is higher than the average value of all regions (μ_k), and the pollution state is relatively poor. Conversely, Z_ik < 0 means that the pollution level is comparatively low. Z_ik = 0 means that the pollution state is at the average level.
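As a rough illustration of the standardization step, the following Python sketch computes z-scores for one pollutant across regions. The concentrations are hypothetical, and the use of the population standard deviation (`pstdev`) is an assumption; the study may have used the sample standard deviation instead.

```python
from statistics import mean, pstdev


def standardize(values):
    """Return z-scores (x - mu) / sigma for one pollutant across all regions."""
    mu = mean(values)
    # Assumption: population standard deviation; swap in statistics.stdev
    # if the sample standard deviation is wanted.
    sigma = pstdev(values)
    return [(x - mu) / sigma for x in values]


# Hypothetical yearly average concentrations of one pollutant in four regions
concentrations = [2.0, 4.0, 4.0, 6.0]
z = standardize(concentrations)
# Regions below the mean get negative levels, regions above it positive ones
```

After standardization every pollutant has mean 0 across regions, so a pollutant measured in ppm (such as CO) no longer carries a different weight than one measured in ppb or μg/m3.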

Air Pollution Characteristic Analysis
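In this study, the air pollution characteristic is constituted by the pollution levels of five air pollutants, i.e. PM10, SO2, NO2, CO and O3. The air pollution characteristic for region i (C_i) consequently can be represented as:

\[ C_{i} = \left[ Z_{i,\mathrm{PM_{10}}},\ Z_{i,\mathrm{SO_{2}}},\ Z_{i,\mathrm{NO_{2}}},\ Z_{i,\mathrm{CO}},\ Z_{i,\mathrm{O_{3}}} \right] \]

Furthermore, the air pollution characteristic for group t is defined as:

\[ C_{t} = \left[ Z_{t,\mathrm{PM_{10}}},\ Z_{t,\mathrm{SO_{2}}},\ Z_{t,\mathrm{NO_{2}}},\ Z_{t,\mathrm{CO}},\ Z_{t,\mathrm{O_{3}}} \right], \qquad Z_{tk} = \frac{1}{n_{t}} \sum_{i \in t} Z_{ik} \]

where Z_tk is the mean pollution level of pollutant k over the n_t regions in group t.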
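The group characteristic can be sketched in Python as the per-pollutant mean of the member regions' pollution levels. This assumes that the group level is a simple unweighted mean; the region names and values below are hypothetical.

```python
from statistics import mean

POLLUTANTS = ["PM10", "SO2", "NO2", "CO", "O3"]


def group_characteristic(region_profiles, members):
    """Mean standardized level of each pollutant over the regions in a group."""
    return [
        mean(region_profiles[region][k] for region in members)
        for k in range(len(POLLUTANTS))
    ]


# Hypothetical standardized profiles for two regions in the same group
region_profiles = {
    "Region A": [1.0, 2.0, 1.5, 0.5, 0.0],
    "Region B": [1.5, 3.0, 0.5, 0.5, 1.0],
}
c_t = group_characteristic(region_profiles, ["Region A", "Region B"])
# c_t holds one mean level per pollutant, in the order given by POLLUTANTS
```

A group mean can hide differences between its members: in this example the two regions differ noticeably in NO2 and O3 even though they share one group profile.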

Cluster Analysis Results
The dendrogram in Fig. 2 shows the results obtained with the hierarchical complete linkage clustering method, using the squared Euclidean distance as the criterion of similarity. All 22 regions in Taiwan can be classified into two major groups: from Taoyuan to Taipei City and from Ilan to Hualien, as presented in Fig. 2. Note that the second major group, which is formed by Ilan, Taitung and Hualien, is characterized by the largest Euclidean distance to the other groups. This group corresponds to the cleanest area of Taiwan. The first major group is composed of the other 19 regions. The associations among regions in this group are quite complex, and these regions can be further classified into five subgroups. Thus, the 22 regions can be classified into six groups in total:
• Group 1: Taoyuan, Hsinchu City, Keelung and Taipei County.
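The grouping procedure can be approximated with a small pure-Python sketch of agglomerative clustering under complete linkage. It is an illustrative stand-in for the SPSS routine used in the study, not a reproduction of it; the function names, the two-variable profiles and the region labels R1 to R5 are all hypothetical.

```python
def squared_euclidean(a, b):
    """Squared Euclidean distance between two standardized profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def complete_linkage(profiles, n_groups):
    """Merge the two closest clusters repeatedly until n_groups remain.

    Under complete linkage, the distance between two clusters is the
    largest pairwise distance between their members.
    """
    clusters = [[name] for name in profiles]
    while len(clusters) > n_groups:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = max(
                    squared_euclidean(profiles[p], profiles[q])
                    for p in clusters[a]
                    for q in clusters[b]
                )
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


# Hypothetical standardized profiles with two variables per region
profiles = {
    "R1": [1.0, 1.2],
    "R2": [0.9, 1.0],
    "R3": [-1.0, -1.1],
    "R4": [-1.2, -0.9],
    "R5": [0.1, -2.5],
}
groups = complete_linkage(profiles, 3)
# groups: [["R1", "R2"], ["R3", "R4"], ["R5"]]
```

Stopping the merging at a chosen number of groups corresponds to cutting the dendrogram at a given height. An isolated region such as R5 stays on its own until late in the process, much as the eastern regions separate from the rest in Fig. 2.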

Air Pollution Characteristics
The air pollution characteristics of all six groups are shown in Fig. 4 and are discussed as follows. The scale in the figure denotes the pollution level of pollutant k for group t (Z_tk).

Group 1
For group 1, the pollution levels of SO2 and NO2 are greater than 0 (i.e. the average level of all regions), while that of PM10 is far lower than 0. Hence, this group may be characterized as high SO2/NO2 pollution and low PM10 pollution.

Group 2
For group 2, the pollution levels of PM10, NO2, CO and O3 all lie in the range 0 to 1. The high PM10 pollution state in this group is especially noticeable, so the group may be characterized as high PM10 pollution.

Group 3
Fig. 4 indicates that the pollution levels of all five pollutants for group 3 are greater than 0, and that the pollution states of SO2 (Z_tk = 2.4) and PM10 (Z_tk = 1.2) are the highest among all six groups. Note that the Z_tk values of NO2 and O3 are also greater than those of most other groups. However, they result from an especially high NO2 pollution state in Kaohsiung City (Z_ik = 1.4) and a high O3 pollution state in Kaohsiung County (Z_ik = 1.2), respectively. This group is characterized as high SO2 and high PM10 pollution.

Group 4
The pollution levels of SO2, NO2 and CO are below 0, while those of PM10 and O3 are both greater than 0. As presented in Fig. 4, the Z_tk value of CO is the smallest among the six groups. This group may be characterized as low SO2/NO2/CO pollution and high O3 pollution.

Group 5
For group 5, the pollution levels of CO (Z_tk = 1.8) and NO2 (Z_tk = 1.1) are both the highest among the six groups. The pollution level of O3, however, is relatively low (Z_tk = -0.8). Hence, this group may be characterized as high CO/NO2 pollution and low O3 pollution.

Group 6
This group has the smallest Z_tk values of PM10 (-1.5), SO2 (-1.5), NO2 (-1.6) and O3 (-1.2). The Z_tk value of CO is also lower than that of every other group except group 4. The air pollution characteristic of this group may be described as low PM10/SO2/NO2/O3/CO.
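The group descriptions above read the sign of each Z_tk as "high" or "low". A minimal Python sketch of that reading follows. The helper `characterize` and its `threshold` parameter are hypothetical; a plain sign rule is a simplification, since the descriptions in the text also weigh how far a level is from 0. The values are the Z_tk figures quoted for group 6, and CO is omitted because no number is quoted for it.

```python
def characterize(group_levels, threshold=0.0):
    """Label pollutants as high or low from the sign of their group level."""
    high = [p for p, z in group_levels.items() if z > threshold]
    low = [p for p, z in group_levels.items() if z < -threshold]
    parts = []
    if high:
        parts.append("high " + "/".join(high))
    if low:
        parts.append("low " + "/".join(low))
    return "; ".join(parts)


# Z_tk values quoted in the text for group 6 (no CO value is quoted)
group6 = {"PM10": -1.5, "SO2": -1.5, "NO2": -1.6, "O3": -1.2}
label = characterize(group6)  # "low PM10/SO2/NO2/O3"
```

Raising `threshold` above 0 would restrict the labels to pollutants that deviate clearly from the all-region average.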

Correlation Analysis Results
The Pearson correlation coefficient measures the strength of a linear relationship between two quantitative variables. Pearson's correlation coefficients between air pollution and public health in Taiwan are given in Table 1. The results reveal that the incidence of respiratory system disease is significantly and positively correlated with NO2 and CO pollution at the 99% confidence level, while it shows only weak positive correlations with PM10 and SO2 pollution. Relevant studies by Guo [13,14] also indicated that higher outdoor air pollution levels, especially of the traffic-related pollutants NOx and CO, were associated with asthma prevalence in school children.
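The correlation step can be sketched in Python as follows. The sketch computes Pearson's r from paired observations and then applies the usual t-test to the coefficients reported for Table 1. Two points are assumptions rather than statements from the study: that each coefficient is based on n = 22 regional observations, and that significance was judged with a two-tailed test.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic for testing r against zero, with n - 2 degrees of freedom."""
    return r * sqrt((n - 2) / (1 - r * r))


# Assumption: one observation per region, so n = 22 and 20 degrees of freedom
N_REGIONS = 22
T_CRIT_99 = 2.845  # two-tailed critical value of t at the 99% level, 20 d.f.

# Coefficients reported for respiratory system disease incidence
reported = {"PM10": 0.391, "SO2": 0.368, "NO2": 0.685, "CO": 0.566, "O3": -0.100}
significant = [
    pollutant
    for pollutant, r in reported.items()
    if abs(t_statistic(r, N_REGIONS)) > T_CRIT_99
]
# significant: ["NO2", "CO"]
```

Under these assumptions only NO2 and CO exceed the critical value, which agrees with the significance pattern stated in the text.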

Conclusions
This study aims to classify regions of Taiwan with different air pollution characteristics into groups, and further to evaluate and compare the air quality of those groups. The results show that regions with similar air pollution characteristics can be appropriately grouped by applying cluster analysis. All 22 regions in Taiwan are classified into two major groups. The first major group is formed by 19 regions, all of which are located in western Taiwan. Owing to the complex associations among regions in this group, the first major group may be further subdivided into five groups. The other three regions, which are all located in eastern Taiwan, form the second major group and correspond to the cleanest area in Taiwan. By calculating group mean pollution levels, the air pollution states of all six groups are characterized individually. Group 1 is characterized as high SO2/NO2 pollution and low PM10 pollution. Group 2 is high PM10 pollution. Group 3 is high SO2 and high PM10 pollution. Group 4 is low SO2/NO2/CO pollution and high O3 pollution. Group 5 is high CO/NO2 pollution and low O3 pollution. Group 6 is low PM10/SO2/NO2/O3/CO pollution. In addition, the air quality evaluation finds that group 6 (Ilan, Hualien and Taitung) has the best air quality, while group 3 (Kaohsiung City and Kaohsiung County) has the worst air quality in Taiwan. The areas with better air quality are located in the east of Taiwan, and those with worse air quality are located in the south.
The results from correlation analysis reveal that the incidence of respiratory system disease is significantly and positively correlated with NO2 and CO pollution at the 99% confidence level.
Because the similarities in this study were quantified through the squared Euclidean distance, the results of the air pollution characteristic analysis indicate that some differences in characteristics remain among regions in the same group. This may be improved by applying different clustering methods in future work.

Table 1. Pearson's correlation coefficients between air pollution and public health in Taiwan

                                               PM10    SO2     NO2      CO       O3
Incidence of the respiratory system disease a  0.391   0.368   0.685*   0.566*   -0.100

* Significant at the 99% confidence level.