Long-Term River Water Quality Trends and Pollution Source Apportionment in Taiwan

Abstract: Multivariate statistical techniques, including cluster analysis and principal component analysis-multiple linear regression (PCA-MLR), were applied to classify river pollution levels in Taiwan and identify possible pollution sources. Water quality and heavy metal monitoring data from the Taiwan Environmental Protection Administration (EPA) were evaluated for 14 major rivers in four regions of Taiwan, with the Erren River classified as the most polluted river in the country. The biochemical oxygen demand (6.1 ± 2.38 mg/L), ammonia (3.48 ± 3.23 mg/L), and total phosphate (0.65 ± 0.38 mg/L) concentrations in this river were the highest of the 14 rivers evaluated. In addition, heavy metal levels in the following rivers exceeded the Taiwan EPA standard limits (lead: 0.01 mg/L, copper: 0.03 mg/L, and manganese: 0.03 mg/L): lead in the Dongshan (0.02 ± 0.09 mg/L), Jhuoshuei (0.03 ± 0.03 mg/L), and Xinhuwei (0.02 ± 0.02 mg/L) Rivers; copper in the Dahan (0.036 ± 0.097 mg/L), Laojie (0.06 ± 1.77 mg/L), and Erren (0.05 ± 0.158 mg/L) Rivers; and manganese in all rivers. A total of 72% of the water pollution in the Erren River was estimated to originate from industrial sources, 16% from domestic black water, and 12% from natural sources and runoff from other tributaries. Our research demonstrated that applying PCA-MLR and cluster analysis to long-term water quality monitoring data can provide integrated information for river water pollution management and future policy making.


Introduction
Surface water quality is a matter of critical concern in developing countries because of growing population, rapid industrialization, urbanization, and agricultural modernization [1]. Of all water bodies, rivers are the most vulnerable to pollution because of their role in carrying agricultural run-off and municipal and industrial wastewater [2]. Water quality experts and decision makers are confronted with significant challenges in their efforts to manage surface water resources due to these complex issues [3]. Spatial variation and source apportionment characterization of water quality parameters can provide a detailed understanding of environmental conditions and help researchers to establish priorities for sustainable water management [4].
In recent years, the national income and standard of living in Taiwan have considerably improved following the nation's focus on economic development [5]. Rapid industrial development in Taiwan, including increased vehicle use, electrical power generation, and manufacturing of food, beverages, textiles, plastic, and metal, has affected pollution levels and other environmental problems, specifically water pollution [6]. In 1998, the Taiwan Environmental Protection Administration (EPA) reported that 16% (2088 km) of the total length of Taiwan's 21 major rivers was ranked as severely polluted, while another 22% were considered lightly and moderately polluted [7].
The Taiwan EPA uses the river pollution index (RPI) to explore monitoring trends for both long-term planning and day-to-day management of surface water quality. The RPI involves four parameters: dissolved oxygen (DO), biochemical oxygen demand (BOD), suspended solids (SS), and ammonia nitrogen (NH3-N). The overall index is divided into four pollution levels as follows: non-polluted, lightly polluted, moderately polluted, and severely polluted [8]. Previous research has used the RPI to evaluate the pollution levels of the following rivers: the Tanshui River [9,10], Kaoping River [11], Chuo-shui River, Beigang River, Jishui River, Agongdian River, and Sichong River [12] in Taiwan, and the Mahmoudia Canal in Egypt [13].
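To make the index concrete, the following is a minimal sketch of how an RPI value could be computed from the four parameters. The scoring breakpoints and class boundaries are not given in this paper; they are assumed here from commonly published versions of the Taiwan EPA table and should be checked against the official documentation before any real use. The function names are hypothetical.

```python
# Hedged sketch of the River Pollution Index (RPI).
# ASSUMPTION: the breakpoints below follow commonly published Taiwan EPA
# tables; they are not taken from this paper and must be verified.

POINTS = (1, 3, 6, 10)  # sub-score for each of the four pollution bands


def _score(value, limits, descending=False):
    """Return the 1/3/6/10 sub-score for one parameter."""
    for limit, point in zip(limits, POINTS):
        if (value >= limit) if descending else (value <= limit):
            return point
    return POINTS[-1]  # worst band


def rpi(do, bod, ss, nh3n):
    """Average of the four sub-scores (all inputs in mg/L)."""
    scores = (
        _score(do, (6.5, 4.6, 2.0), descending=True),  # higher DO is cleaner
        _score(bod, (3.0, 4.9, 15.0)),
        _score(ss, (20.0, 49.9, 100.0)),
        _score(nh3n, (0.50, 0.99, 3.0)),
    )
    return sum(scores) / len(scores)


def pollution_level(index):
    """Map an RPI value to one of the four pollution levels."""
    if index <= 2.0:
        return "non-polluted"
    if index <= 3.0:
        return "lightly polluted"
    if index <= 6.0:
        return "moderately polluted"
    return "severely polluted"


# BOD and ammonia are the Erren River means quoted in the abstract;
# the DO and SS values are hypothetical placeholders.
example = rpi(do=5.0, bod=6.1, ss=30.0, nh3n=3.48)
print(example, pollution_level(example))
```

Because the index is an average of banded scores, a single sample can fall into a different class than the long-term classification of the same river.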
The application of multivariate statistical analysis, namely cluster analysis, principal component analysis (PCA), and source apportionment by multiple regression on principal components, provides a detailed understanding of water quality and the ecological status of the studied systems for improved interpretation of these complex data matrices [14,15]. Such analyses also facilitate the identification of possible pollution sources that affect the water systems and offer a valuable tool for reliable management of water resources and the determination of potential solutions to pollution problems.
As shown in Figure 1, this study was conducted in three phases with three main objectives: (1) evaluate and compare water quality and heavy metal data of 14 major Taiwan rivers with the Taiwan EPA standards; (2) classify the contamination level of those 14 major Taiwan rivers then determine the most polluted river; and (3) identify the major possible pollution source apportionment affecting water quality in the most polluted river.

Study Area and Data Collection
The subtropical island of Taiwan has 151 major and minor rivers with a combined total length of 3717 km. Most rivers flow down from high mountains in short and steep courses [16]. Figure 2 displays the 14 major rivers that were selected for analysis of their water quality and heavy metal concentrations: Dahan, Danshuei, Jilong, and Laojie rivers in Northern Taiwan; Dongshan River in Eastern Taiwan; Jhuoshuei, Wu, and Xinhuwei rivers in Central Taiwan; and Erren, Gaoping, Jishuei, Puzi, Yanshuei, and Zengwun rivers in Southern Taiwan. These 14 rivers were selected because they are the largest in each region and the most polluted according to the previous Taiwan EPA report. Table 1 shows information on the 14 major rivers and monitoring stations. Water quality data from 2002 to 2016 were provided by the Taiwan EPA for each monitoring site in winter (December-February), spring (March-May), summer (June-August), and fall (September-November). Sampling was conducted according to the standard operational procedures summarized in Supplementary Table S1.

Statistical Methods
Cluster analysis was used to classify the rivers based on the RPI data. Pearson's correlation analysis was used to test for linear correlation between the RPI and water quality parameters (water temperature, air temperature, conductivity, nitrate, SS, DO, BOD, COD, ammonia, TP, and TOC). PCA-MLR was used to determine source apportionment. These three multivariate analyses were performed using SPSS 22.

Cluster Analysis
Hierarchical agglomerative cluster analysis was performed on the normalized RPI data set by Ward's method, using squared Euclidean distances as a measure of similarity. Ward's method uses an analysis of variance approach to evaluate the distances between clusters, attempting to minimize the sum of squares of any two clusters that can be formed at each step. To standardize the linkage distance represented on the y-axis, the linkage distance is reported as (D_link/D_max) × 100, that is, the quotient between the linkage distance for a particular case (D_link) and the maximal distance (D_max), multiplied by 100 [18-20]. The data used for cluster analysis are the water quality data of the 14 rivers from 2002 to 2016.
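The procedure described above can be sketched in a few lines. This is a naive illustration of Ward's criterion and of the (D_link/D_max) × 100 rescaling, not the SPSS implementation used in the study; the function names and the RPI values in the example are hypothetical.

```python
from itertools import combinations


def ward_linkage(points):
    """Naive Ward agglomeration.

    Returns one (members_a, members_b, cost) tuple per merge, where cost is
    the increase in within-cluster sum of squares caused by the merge.
    """
    clusters = {i: [i] for i in range(len(points))}
    dim = len(points[0])

    def centroid(members):
        return [sum(points[m][d] for m in members) / len(members)
                for d in range(dim)]

    def cost(a, b):
        ca, cb = centroid(clusters[a]), centroid(clusters[b])
        squared = sum((x - y) ** 2 for x, y in zip(ca, cb))
        na, nb = len(clusters[a]), len(clusters[b])
        return na * nb / (na + nb) * squared  # Ward merge cost

    merges, next_id = [], len(points)
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda pair: cost(*pair))
        merges.append((sorted(clusters[a]), sorted(clusters[b]), cost(a, b)))
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        next_id += 1
    return merges


def rescale(merges):
    """Express each linkage distance as (D_link / D_max) * 100."""
    d_max = max(c for _, _, c in merges)
    return [100.0 * c / d_max for _, _, c in merges]


# Hypothetical mean RPI values for five rivers (one feature per river).
rivers = [[2.0], [2.5], [4.0], [5.0], [9.0]]
merges = ward_linkage(rivers)
for (a, b, _), scaled in zip(merges, rescale(merges)):
    print(a, b, round(scaled, 1))
```

In this toy input the river with the highest RPI joins the others only at the final merge (rescaled distance 100), which mirrors how a single severely polluted river can form its own cluster.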

Source Apportionment Analysis
Principal component analysis (PCA) is a dimension-reduction technique that provides information on the most significant factors with a simple representation of the data. It is generally used for data structure determination and for obtaining qualitative information about potential pollution sources. However, if used alone, PCA cannot determine the quantitative contributions of the identified pollution sources in each variable [21]. Pearson's correlation analysis was thus employed to select the water quality parameters that were significantly correlated with the RPI (|r| > 0.3 with p < 0.01) as inputs to PCA. Here, |r| > 0.3 is taken to indicate a strong correlation among parameters and p < 0.01 denotes a highly significant result [22].
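The screening step can be illustrated with the short sketch below. It is only an approximation of what a statistics package reports: the p-value is obtained from the Fisher z-transformation with a normal approximation rather than from the exact t-test, and the function names and sample data are hypothetical.

```python
from math import atanh, sqrt
from statistics import NormalDist


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def screen(rpi, parameters, r_min=0.3, alpha=0.01):
    """Keep parameters with |r| > r_min and p < alpha against the RPI.

    ASSUMPTION: two-sided p-value from the Fisher z approximation,
    which is adequate for illustration but not identical to the t-test.
    """
    n = len(rpi)
    kept = {}
    for name, values in parameters.items():
        r = pearson(rpi, values)
        z = atanh(r) * sqrt(n - 3)
        p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
        if abs(r) > r_min and p < alpha:
            kept[name] = (r, p)
    return kept


# Hypothetical data: DO falls as RPI rises; "noise" is unrelated.
rpi_values = list(range(1, 21))
wiggle = [1 if i % 2 == 0 else -1 for i in range(20)]
data = {
    "DO": [10 - 0.4 * v + 0.3 * w for v, w in zip(rpi_values, wiggle)],
    "noise": wiggle,
}
print(sorted(screen(rpi_values, data)))
```

Using the absolute value of r matters here because parameters such as DO are expected to correlate negatively with the RPI.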
Kolmogorov-Smirnov (K-S) statistics were used to test the goodness-of-fit of the data to a log-normal distribution. To examine the suitability of the data for PCA, the Kaiser-Meyer-Olkin (KMO) and Bartlett's sphericity tests were applied to the prepared dataset. In the KMO test, a value closer to 1 indicates high validity, while a value <0.7 indicates an invalid analysis. Bartlett's sphericity test was used to check the null hypothesis that the inter-correlation matrix comes from a population in which the variables are uncorrelated. For this study, the null hypothesis was rejected at a significance level <0.01 [23]. The selected water quality data had to pass both tests before PCA interpretation. Once both requirements were fulfilled, the total variance explained was considered, with a value >60% deemed acceptable. Rotated variables with factor loadings >0.7 are considered relevant and indicate a possible emission source. Next, multiple linear regression (MLR) was applied to determine the percentage contribution of each pollution source [21,24]. In the linear regression, the sum of the standardized parameters was defined as the dependent variable and the absolute principal component scores as the independent variables.
For the four rivers in Northern Taiwan, on average 15% were severely polluted, 60% were moderately polluted, and 25% were lightly polluted. The average percentages for the rivers in Eastern Taiwan were 14% moderately polluted and 86% lightly polluted. The average percentages for the three rivers in Central Taiwan were 49% moderately and 51% lightly polluted. The average percentages for the six rivers in Southern Taiwan were 18% severely polluted, 59% moderately polluted, and 23% lightly polluted. In 2016, 65% of the total number of rivers in Taiwan were classified as moderately polluted, while the remaining 35% were classified as lightly polluted.
Therefore, according to the average percentages above, from 2002 to 2016, the rivers in Northern, Eastern, Central, and Southern Taiwan had moderate, highest, moderate, and lowest water quality levels, respectively. For each river, water quality and heavy metal data were evaluated using the average across that river's monitoring stations. Figure 4 displays boxplots of the river water quality variables for the 14 rivers. Each boxplot displays the five-number summary of a data set (minimum, first quartile, median, third quartile, and maximum) together with any outliers. Overall, the central and southern rivers in Taiwan had the highest levels of pH, DO, and SS, while BOD, ammonia, and TP were the highest in the Erren River.
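As a small illustration of what each boxplot encodes, the sketch below computes the five-number summary and flags outliers. The 1.5 × IQR fence (Tukey's rule) is an assumption, since the paper does not state how outliers were defined, and the sample values are hypothetical.

```python
from statistics import quantiles


def box_summary(values, whisker=1.5):
    """Five-number summary plus outliers.

    ASSUMPTION: outliers lie beyond whisker * IQR from the quartiles
    (Tukey's rule); min and max are the most extreme non-outlier values.
    """
    data = sorted(values)
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    spread = whisker * (q3 - q1)
    low, high = q1 - spread, q3 + spread
    inside = [v for v in data if low <= v <= high]
    return {
        "min": inside[0],
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": inside[-1],
        "outliers": [v for v in data if v < low or v > high],
    }


# Hypothetical BOD readings (mg/L) with one extreme value.
print(box_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 50]))
```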

Evaluation of River Water Quality and Heavy Metal Data
Heavy metal concentrations for the 14 rivers are summarized in Table 2 which displays total sample (N), long-term average value (mean), and standard deviation (SD). The overall mean concentrations of heavy metals were found in the following order: Hg < As < Cd < Se < Ag < Cr < Pb < Cu < Zn < Mn. In addition, the highest concentrations of heavy metals, all of which exceeded the Taiwan EPA standard limit (Table S2), were found in the following rivers: Pb in the Dongshan, Jhuoshuei, and Xinhuwei Rivers; Cu in the Dahan, Laojie, and Erren Rivers; and Mn in all rivers.

Figure 5 shows the result of cluster analysis of the water quality variation tendencies among the targeted monitoring sites. Three significant groups (p < 0.01) were classified comprehensively based on the RPI value of each river. Cluster 1, classified as lightly polluted, included the Dongshan, Wu, and Zengwun rivers. Cluster 2, classified as moderately polluted, included the Dahan, Danshuei, Jilong, Jhuoshuei, Xinhuwei, Gaoping, Jishuei, Puzi, and Yanshuei rivers. Cluster 3, classified as severely polluted, included the Erren River. These results were consistent with the RPI value of each river presented in Supplementary Table S3.

Source Identification and Apportionment
This study analyzed source apportionment for the most polluted river, namely, the Erren River. The result of Pearson's correlation analysis is shown in Table 3. The significant correlation coefficients between the RPI and the water quality parameters were −0.376 for conductivity, −0.719 for DO, 0.621 for BOD, 0.339 for COD, 0.512 for SS, 0.533 for coliform, 0.587 for ammonia, 0.402 for TP, 0.383 for TOC, −0.301 for nitrate, and 0.308 for nitrite. Biochemical oxygen demand, DO, SS, and ammonia were correlated with RPI because the RPI value is calculated from the concentrations of these four parameters. However, other parameters, including conductivity, COD, coliform, TP, TOC, nitrate, and nitrite, were also strongly correlated with RPI. Therefore, these water quality parameters can be assumed to have a significant impact on river pollution levels, and they were selected for further PCA. Conductivity likewise showed a high correlation with nitrate (0.564, p < 0.01), indicating that nitrate acts as an electrolyte along with organic matter in water [25].
The result indicated that PCA could be applied because the KMO value was 0.76 and the significance of Bartlett's test was 0.00. Figure 6 shows the scree plot, in which three components stand out as the most dominant among all components. These three components, indicated by the dash mark, were defined as the three factors of the principal components and together account for 62.3% of the cumulative variance (Table 4). That is, 62.3% of the variability in the water quality data was modelled by the extracted factors, which indicates that the model is acceptable for the next step. The varimax-rotated principal components extracted from the parameters selected by correlation analysis are shown in Table 5. The bold and marked values indicate the dominant parameters in each factor.
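The cumulative-variance check described above can be sketched as follows. The eigenvalues are invented for illustration and are not those of Table 4, and the eigenvalue > 1 retention rule (Kaiser criterion) is an assumption standing in for the scree-plot judgment used in the study.

```python
def explained_variance(eigenvalues):
    """Return (rank, eigenvalue, percent, cumulative percent) rows."""
    total = sum(eigenvalues)
    running, rows = 0.0, []
    for rank, value in enumerate(sorted(eigenvalues, reverse=True), start=1):
        percent = 100.0 * value / total
        running += percent
        rows.append((rank, value, percent, running))
    return rows


def retained(rows, min_eigenvalue=1.0):
    """ASSUMPTION: keep components whose eigenvalue exceeds 1 (Kaiser rule)."""
    return [row for row in rows if row[1] > min_eigenvalue]


# Hypothetical eigenvalues of an 11 x 11 correlation matrix (they sum to 11).
eigenvalues = [3.6, 1.9, 1.4, 0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.25, 0.15]
kept = retained(explained_variance(eigenvalues))
print(len(kept), round(kept[-1][3], 1))  # components kept, cumulative %
print(kept[-1][3] > 60.0)                # passes the >60% threshold?
```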
The MLR results in Table 6 show that all regression results were significant (p < 0.01). Factor 1, with an estimated contribution of 72%, had strong and positive loadings on ammonia and TP. High concentrations of TP and ammonia in surface water can come from various sources, including municipal and industrial effluent [14]. Factor 2, with an estimated contribution of 16%, possibly originates from domestic black water, given its high loadings on coliform, BOD, COD, TOC, and nitrite [26,27]. Factor 3, with an estimated contribution of 12%, had high and positive loadings on conductivity, nitrate, and SS, and thus was interpreted as a mineral component of river water or runoff from precipitation [28]. The source apportionment of this study is considered reliable because the PCA-MLR receptor model fitted the data well (R² > 0.74). The contribution percentage of each factor is shown in Figure 7.
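One common convention in PCA-MLR studies is to express each factor's contribution as its regression coefficient divided by the sum of all coefficients. The sketch below assumes that convention; the paper does not spell out its exact formula. The coefficients are hypothetical and were chosen only so that the shares reproduce the reported 72/16/12 split; the actual values are those in Table 6.

```python
def contribution_shares(coefficients):
    """Percentage contribution of each factor: 100 * |B_i| / sum(|B_j|).

    ASSUMPTION: contributions are taken as proportional to the regression
    coefficients of the factor scores, a common PCA-MLR convention.
    """
    total = sum(abs(b) for b in coefficients.values())
    return {name: 100.0 * abs(b) / total for name, b in coefficients.items()}


# Hypothetical standardized coefficients (not the values from Table 6).
shares = contribution_shares({
    "Factor 1 (industrial)": 0.90,
    "Factor 2 (domestic black water)": 0.20,
    "Factor 3 (natural/runoff)": 0.15,
})
for name, share in shares.items():
    print(f"{name}: {share:.0f}%")
```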



Discussion
The increasing population density in Taiwan is a significant source of domestic water pollution. Wastewater from agriculture, farming, and urban activities can also be a major pollution source, causing diverse problems such as toxic algal blooms, loss of oxygen, fish kills, loss of biodiversity (including species important for commerce and recreation), and loss of aquatic plant beds and coral reefs [26]. In addition, despite the decreasing number of domestic swine farms in Taiwan after it joined the World Trade Organization, approximately 7 million swine are still being raised in the country and their waste must be disposed of. Aside from domestic pollution sources, industrial wastewater is a major water pollution source as well. During the last three decades, Taiwan developed into a large trading economy with nearly 11,000 manufacturing plants disposing of various contaminants [29]. Moreover, industrial areas and densely populated areas in Taiwan are scattered. In other words, industrial and high population areas around Taiwan rivers are located not only in the downstream areas but also in the upstream areas (see Supplementary Figure S1). Thus, water pollution in Taiwan rivers is spread along the river (see Supplementary Figure S2). We interpreted the water quality and heavy metal characteristics of each river, which the Taiwan government could use to plan a proper river management strategy.
Our study used cluster analysis on large-scale data in one country to classify the pollution level of major rivers. Previous research has shown that cluster analysis is useful for classifying rivers that have similar water quality characteristics. For example, Shrestha and Kazama 2007 [18] reported that cluster analysis results represent the influence of land use, residential sewage, agricultural activities, and industrialization, which can have major impacts on water quality. Another study grouped monitoring sites in rivers in South Florida into three groups (low, moderate, and high pollution) on the basis of their similar water quality characteristics [30].
In the current study, the Erren River was determined to be the most polluted of the major rivers in Taiwan. The most significant water pollutants were identified as originating from industrial activity, domestic black water, runoff from other rivers, and natural sources, including climate conditions. Given that the Erren River is located in an industrialized and urbanized area [31], the level of water pollutants in the river is very high due to large amounts of nutrient salts, including organic pollutants, ammonia, and total phosphate. These pollutants are associated with possible pollution sources in Factor 1 of our study. Aneja et al. 2008 [14] reported that ammonia is found in industrial gas emissions or natural sources that evaporate and become particulate matter, and then descend with precipitation and enter surface water. Urbanized areas with high population density in Taiwan also show that domestic wastewater is a major contributor to river water pollution because of the levels of BOD and DO, which show a strong correlation with coliform levels that are associated with domestic wastewater [27]. Runoff from other rivers can be due to flash floods that often occur in Southern Taiwan following typhoons throughout the year. For example, in August 2009, Taiwan experienced the worst floods in 50 years after Typhoon Morakot struck almost the entire southern region. Yang et al. 2012 [32] analyzed the impact of climate change on river water quality in the southern area of Taiwan. High amounts of sediments and debris flowed into the Erren River basin because of the high concentration of suspended sediments in the river, which in turn caused the failure of wastewater treatment plants. Therefore, the river received significantly higher SS, BOD, and ammonia loads from farms and domestic wastewaters. During the dry season, the evapotranspiration rate increases, which may contribute to increased water salinity. However, during the wet season, precipitation increases and runoff from other tributaries brings SS or nitrate content to the river. Therefore, we assume that climate conditions are one of the factors affecting water quality in rivers.
This study only explored pollution sources that were identified and considered using multivariate statistical analysis of water quality data for all seasons. However, pollution levels vary every season. Therefore, further study is necessary to analyze in detail how different seasons can affect the water quality level. In addition, some water quality variables might be affected by soil types, geological conditions, terrain, and anthropogenic pollution sources [33]. Further work is necessary to determine if these potential sources do significantly impact the rivers in Taiwan.
Our study found that the levels of heavy metal contamination in the Erren River are among the highest in Taiwan. Since the 1970s, the development of a scrap metal industry along the Sanyegong River (a tributary of the Erren River) has severely polluted the river sediment with metals [34]. Chen et al. 2004 [35] determined that concentrations of Fe, As, Cd, Zn, Hg, and Cu in the Erren River were higher than those in other rivers, and that Cu levels exceeded the standard limit. This heavy metal contamination has affected the river ecology and biota. They reported that the highest concentrations of Fe, Zn, Cu, and Mn in muscles were found in tilapia, striped mullet, large-scaled mullet, and milkfish. The highest concentrations of As and Hg were found in striped mullet and Indo-Pacific tarpon. The highest concentrations of Fe, Hg, and Cd were found in the livers of large-scaled mullet, while striped mullet had the highest concentrations of Zn, Cu, and As. Our 2002 data revealed that As, Cu, Hg, and Zn levels in the Erren River were the highest compared to the other years, indicating that the high levels of these heavy metals may have affected the biota. However, heavy metal levels in the Erren River have been decreasing since the Taiwanese government started a river restoration program in 2002. The restoration program formed an implementation team that united the Water Resources Agency, Industrial Development Bureau, Construction and Planning Agency, Council of Agriculture, Tainan City Government, Kaohsiung City Government, the river patrol team, and other units to make joint efforts toward improving the water quality of the Erren River. Through the combined efforts of the government and private entities over the long term, the Erren River's water quality is continuing to improve.

Conclusions
In this study, cluster analysis was successfully utilized to classify the water quality of 14 Taiwan rivers and PCA-MLR was conducted to determine the possible pollution sources for the most polluted river in Taiwan. According to the cluster analysis, the most severe water quality pollution problem can be found in the Erren River in Southern Taiwan. According to the PCA-MLR results, three factors explained 62.3% of the variance in the Erren River water quality data: ammonia and TP as the first factor; DO, BOD, COD, nitrite, and coliform as the second factor; and conductivity, nitrate, and SS as the third factor. An estimated 72% of the pollution was attributed to industrial emission (the first factor), 16% to domestic black water, and 12% to natural sources and runoff from other tributaries.
Water quality monitoring programs generate complex multidimensional data that require multivariate statistical treatment for analysis and interpretation to obtain better information about the quality of surface water. Such information can help environmental managers make better decisions regarding action plans. The management of domestic and industrial wastes should strive for low accumulation in rivers to minimize environmental degradation. This objective can be achieved by installing proper treatment methods for municipal and industrial wastewater before being released to the environment.

Supplementary Materials:
The following are available online at http://www.mdpi.com/2073-4441/10/10/1394/s1, Table S1: Water Quality and Heavy Metal Monitoring Methods, Table S2: Taiwan EPA water quality standards for heavy metal content in surface water, Table S3: River Pollution Index Descriptive Statistic Table, Figure S1: Laojie river map with industrial area and total population data, Figure S2: Heavy metal distribution in Laojie River (sampling locations S1-S7 are in order from upstream to downstream).