Long-Term River Water Quality and Pollution Source Apportionment in Taiwan

This study applied multivariate statistical techniques to river water quality in Taiwan: cluster analysis to evaluate and classify river pollution levels, and principal component analysis-multiple linear regression (PCA-MLR) to identify possible pollution sources. Water quality and heavy metal monitoring data from the Taiwan Environmental Protection Administration (EPA) were evaluated for 14 rivers in the four regions of Taiwan. The Erren River was classified as the most polluted river in Taiwan. Biochemical oxygen demand, ammonia, and total phosphate concentrations in this river were the highest of the 14 rivers evaluated. In addition, heavy metal levels in the following rivers exceeded the Taiwan EPA standard limits: lead in the Dongshan, Jhuoshuei, and Xinhuwei Rivers; copper in the Dahan, Laojie, and Erren Rivers; and manganese in all rivers. Water pollution in the Erren River was estimated to originate 72% from industrial sources, 16% from domestic black water, and 12% from natural sources and runoff from other tributaries. Our research showed that PCA-MLR and cluster analysis accomplished the study objectives and are helpful tools for evaluating river water quality. We suggest that continuous monitoring be conducted to track water pollution from anthropogenic activities.


Introduction
Surface water quality is a matter of critical concern in developing countries due to growing population, rapid industrialization, urbanization, and agricultural modernization [1]. Rivers are the most vulnerable of all water bodies to pollution because of their role in carrying agricultural run-off and municipal and industrial wastewater [2]. Water quality experts and decision makers are confronted with significant challenges in their efforts to manage surface water resources due to these complex issues [3]. Spatial variation and source apportionment characterization of water quality parameters can provide a detailed understanding of environmental conditions and help researchers to establish priorities for sustainable water management [4].
In recent years, Taiwan's national income and standard of living have improved greatly following a focus on economic development [5]. Rapid industrial development in Taiwan, including increased use of vehicles, electrical power generation, and the manufacturing of food, beverages, textiles, plastic, and metal, has increased pollution and other environmental problems, particularly water pollution [6]. In 1999, the Taiwan Environmental Protection Administration (EPA) reported that 16 percent (2088 kilometers) of the total length of Taiwan's 21 major rivers was ranked as "seriously polluted," while another 22 percent was considered lightly or moderately polluted. The River Pollution Index (RPI) is used by the Taiwan EPA to explore monitoring trends for both long-term planning and day-to-day management of surface water quality. The RPI involves four parameters: dissolved oxygen (DO), biochemical oxygen demand (BOD), suspended solids (SS), and ammonia nitrogen (NH3-N). The overall index is divided into four pollution levels (non-polluted, lightly polluted, moderately polluted, and severely polluted) [7]. Previous research has used the RPI to evaluate the pollution levels of the following rivers: the Tanshui River [8,9], Kaoping River [10], Chuo-shui River, Beigang River, Jishui River, Agongdian River, and Sichong River [11] in Taiwan, and the Mahmoudia Canal in Egypt [12].
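To illustrate how the four parameters combine into one index, the following minimal sketch scores each parameter on a 1/3/6/10 scale and averages the scores. The class boundaries are not given in this paper; they are assumed here from the commonly published Taiwan EPA RPI table and should be checked against the official source before use. All function names are hypothetical.

```python
def _score_high_is_bad(value, limits):
    """Score a parameter for which larger concentrations mean worse quality."""
    low, mid, high = limits
    if value <= low:
        return 1
    if value < mid:
        return 3
    if value <= high:
        return 6
    return 10


def _score_do(do_mg_l):
    """Score dissolved oxygen, for which smaller values mean worse quality."""
    if do_mg_l >= 6.5:
        return 1
    if do_mg_l >= 4.6:
        return 3
    if do_mg_l >= 2.0:
        return 6
    return 10


def rpi(do_mg_l, bod_mg_l, ss_mg_l, nh3n_mg_l):
    """Average of the four parameter scores (all inputs in mg/L).

    Thresholds are assumptions taken from the commonly published RPI table.
    """
    scores = [
        _score_do(do_mg_l),
        _score_high_is_bad(bod_mg_l, (3.0, 5.0, 15.0)),
        _score_high_is_bad(ss_mg_l, (20.0, 50.0, 100.0)),
        _score_high_is_bad(nh3n_mg_l, (0.5, 1.0, 3.0)),
    ]
    return sum(scores) / 4.0


def pollution_level(index):
    """Map an RPI value to the four pollution levels named in the text."""
    if index <= 2.0:
        return "non-polluted"
    if index <= 3.0:
        return "lightly polluted"
    if index <= 6.0:
        return "moderately polluted"
    return "severely polluted"


# Hypothetical sample, not taken from the monitoring data.
example_index = rpi(do_mg_l=3.0, bod_mg_l=10.0, ss_mg_l=60.0, nh3n_mg_l=2.0)
example_level = pollution_level(example_index)
```

Because the index is a plain average of class scores, a single badly degraded parameter can move a river by at most one or two levels, which is one reason the study complements the RPI with multivariate analysis of additional parameters.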
The application of multivariate statistical analysis, such as cluster analysis, principal component analysis (PCA), and source apportionment by multiple regression on principal components, to the interpretation of these complex data matrices provides a detailed understanding of water quality and the ecological status of the studied systems [13,14]. In addition, this analysis identifies possible pollution sources that affect the water systems and offers a valuable tool for reliable management of water resources and determination of potential solutions to pollution problems.
As shown in Fig. 1, this study was conducted in three phases. First, water quality and heavy metal data for 14 representative Taiwan rivers from 2002 to 2016 were evaluated and compared with the Taiwan EPA standards. Second, the contamination levels of the 14 rivers were classified and the most polluted river was determined. Third, the possible pollution source apportionment was identified for the major factors affecting water quality in the most polluted river.

Study Area and Data Collection
The subtropical island of Taiwan has 151 major and minor rivers with a total length of 3,717 kilometers. Most rivers flow down from high mountains in short and steep courses [15]. Fig. 2 displays the 14 major rivers that were selected for analysis of their water quality and heavy metal concentrations: in Northern Taiwan (Dahan, Danshuei, Jilong, and Laojie Rivers), Eastern Taiwan (Dongshan River), Central Taiwan (Jhuoshuei, Wu, and Xinhuwei Rivers), and Southern Taiwan (Erren, Gaoping, Jishuei, Puzi, Yanshuei, and Zengwun Rivers). Table 1 shows information on the 14 representative rivers, with 142 total monitoring stations in the major tributaries and main river in each area. Water quality data were provided by the Taiwan EPA for each monitoring site from 2002 to 2016 for all four seasons. The sampling procedures were conducted according to the standard operational procedures summarized in Supplementary Table S1.

Statistical Methods
Multivariate statistical methods were used: cluster analysis to classify the rivers based on the RPI data; Pearson's correlation analysis between the RPI and water quality parameters (water temperature, air temperature, conductivity, nitrate, SS, DO, BOD, COD, ammonia, TP, and TOC) to select the parameters for source apportionment; and PCA-MLR for source apportionment analysis. These three analyses were performed using SPSS 22.0 for Windows (IBM Corp., Armonk, NY, USA, 2013).
Hierarchical agglomerative cluster analysis was performed on the normalized RPI data set by Ward's method, using squared Euclidean distances as a measure of similarity. Ward's method uses an analysis of variance approach to evaluate the distances between clusters, attempting to minimize the sum of squares of any two clusters that can be formed at each step. The linkage distance is reported as (Dlink/Dmax) × 100, that is, the linkage distance for a particular case divided by the maximal distance and multiplied by 100, which standardizes the linkage distance represented on the y-axis [16][17][18]. The data used for cluster analysis were the water quality data of the 14 rivers from 2002 to 2016.
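The clustering step described above can be sketched outside SPSS as follows. This is a minimal illustration, not the study's procedure: it uses SciPy's `ward` linkage (which works on Euclidean distances and minimizes the increase in within-cluster sum of squares), z-score normalization is assumed, and the RPI matrix is synthetic rather than the monitoring data.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster


def ward_clusters(rpi_matrix, n_clusters=3):
    """Cluster rivers (rows) by their RPI series (columns) with Ward's method.

    Returns the cluster label of each river and the merge distances rescaled
    as (Dlink / Dmax) * 100.
    """
    X = np.asarray(rpi_matrix, dtype=float)
    std = X.std(axis=0, ddof=1)
    std[std == 0] = 1.0                      # guard against constant columns
    z = (X - X.mean(axis=0)) / std           # assumed normalization: z-scores
    link = linkage(z, method="ward")
    rescaled = 100.0 * link[:, 2] / link[:, 2].max()
    labels = fcluster(link, t=n_clusters, criterion="maxclust")
    return labels, rescaled


# Synthetic example: 14 hypothetical rivers x 15 yearly mean RPI values,
# built as 3 low, 10 intermediate, and 1 high series.
rng = np.random.default_rng(0)
rpi_data = np.vstack([
    rng.normal(2.5, 0.2, size=(3, 15)),
    rng.normal(4.5, 0.3, size=(10, 15)),
    rng.normal(7.5, 0.3, size=(1, 15)),
])
labels, rescaled = ward_clusters(rpi_data, n_clusters=3)
```

Rescaling the merge heights to a 0 to 100 range only changes the y-axis of the dendrogram; the cluster memberships are determined by the linkage itself.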

Source Apportionment Analysis
PCA is a dimension-reduction technique that provides information on the most significant factors with a simpler representation of the data. It is generally used to determine data structure and to provide qualitative information about potential pollution sources. However, when used alone, this method cannot determine the quantitative contributions of the identified pollution sources to each variable [19]. Pearson's correlation analysis was used to select the water quality parameters that were highly correlated with the RPI as inputs to the PCA.
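The screening step can be sketched as below. The selection rule (keep a parameter when its correlation with the RPI is significant at a chosen level) is an assumption made for illustration; the function name and the example series are hypothetical and deterministic rather than drawn from the monitoring data.

```python
import numpy as np
from scipy.stats import pearsonr


def screen_parameters(rpi_values, parameters, alpha=0.01):
    """Return {name: r} for parameters significantly correlated with the RPI."""
    selected = {}
    for name, values in parameters.items():
        r, p_value = pearsonr(rpi_values, values)
        if p_value < alpha:
            selected[name] = float(r)
    return selected


# Hypothetical series: DO falls as RPI rises; "unrelated" is built to be
# uncorrelated with RPI (symmetric about the RPI mean).
rpi_series = np.linspace(1.0, 10.0, 20)
wiggle = 0.3 * np.where(np.arange(20) % 2 == 0, 1.0, -1.0)
example_parameters = {
    "DO": 9.0 - 0.8 * rpi_series + wiggle,
    "unrelated": (rpi_series - rpi_series.mean()) ** 2,
}
selected = screen_parameters(rpi_series, example_parameters)
```

Only the parameters that pass this screen would be carried forward into the PCA, which keeps variables unrelated to the pollution index from diluting the extracted components.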
Kolmogorov-Smirnov (K-S) statistics were used to test the goodness-of-fit of the data to a lognormal distribution. To examine the suitability of the data for PCA, the Kaiser-Meyer-Olkin (KMO) and Bartlett's sphericity tests were applied to the prepared dataset. In the KMO test, a value closer to 1 indicates high validity, while a value < 0.7 indicates an invalid analysis. Bartlett's sphericity test was used to check the null hypothesis that the inter-correlation matrix comes from a population in which the variables are uncorrelated. For this study, the null hypothesis was rejected because the significance level was < 0.05 [20]. The selected water quality data were required to pass these two tests before PCA interpretation. In addition, rotated variables with factor loadings > 0.7 are considered relevant and indicate a possible emission source. Next, MLR was applied to determine the percentage contribution of each pollution source [19,21]. In the linear regression, the sum of the standardized values of the parameters was defined as the dependent variable, and the absolute principal component scores as the independent variables.
Figure 3 shows the RPI dataset stacked plot from 2002 to 2016 for the 14 rivers. Between 2002 and 2016, of the total length of the four rivers in northern Taiwan, on average 15% was severely polluted, 60% was moderately polluted, and 25% was lightly polluted; of the total length of the one river in eastern Taiwan, 14% was moderately polluted and 86% lightly polluted; of the total length of the three rivers in central Taiwan, 49% was moderately and 51% lightly polluted; and of the total length of the six rivers in southern Taiwan, 18% was severely, 59% moderately, and 23% lightly polluted. In 2016, the most recent year, 65% of the total length of the rivers in Taiwan was classified as moderately polluted and 35% as lightly polluted.
Heavy metal concentrations for the 14 rivers are summarized in Table 2. The overall mean concentrations of heavy metals were found in the following order: Hg<As<Cd<Se<Ag<Cr<Pb<Cu<Zn<Mn. In addition, the following heavy metal concentrations exceeded the Taiwan EPA limit: Pb in the Dongshan, Jhuoshuei, and Xinhuwei Rivers; Cu in the Dahan, Laojie, and Erren Rivers; and Mn in all rivers.

Cluster Analysis
Fig. 5 shows the result of the cluster analysis of the water quality variation tendencies among the target monitoring sites. Three significant groups (p<0.01) were identified based on the RPI value of each river. Cluster 1 was classified as lightly polluted and included the Dongshan River, Wu River, and Zengwun River. Cluster 2 was classified as moderately polluted and included the Dahan River, Danshuei River, Jilong River, Jhuoshuei River, Xinhuwei River, Gaoping River, Jishuei River, Puzi River, and Yanshuei River. The Erren River was classified in Cluster 3 as severely polluted. These results were consistent with the RPI value of each river in Supplementary Table S3.
Preprints (www.preprints.org) | NOT PEER-REVIEWED | Posted: 23 August 2018 doi:10.20944/preprints201808.0415.v1
Figure 5. Dendrogram of RPI for 14 Taiwan representative rivers (see Table 1 for the names associated with the river codes)

Source Identification and Apportionment
This study analyzed source apportionment for the most polluted river, the Erren River. The result of Pearson's correlation analysis is shown in Table 3. The correlation coefficients of the significant water quality parameters with RPI were -0.376 for conductivity, -0.719 for DO, 0.621 for BOD, 0.339 for COD, 0.512 for SS, 0.533 for coliform, 0.587 for ammonia, 0.402 for TP, 0.383 for TOC, -0.301 for nitrate, and 0.308 for nitrite. BOD, DO, SS, and ammonia are correlated with RPI because the RPI value is calculated from the concentrations of these four parameters. However, other parameters, including conductivity, COD, coliform, TP, TOC, nitrate, and nitrite, were also significantly correlated with RPI. Therefore, it can be assumed that these water quality parameters significantly affect river pollution levels. These significant factors were selected for further PCA. In addition, conductivity shows a high correlation with nitrate (0.564, p<0.01), indicating that nitrate acts as an electrolyte along with organic matter in water [22].
The KMO and Bartlett's test results indicated that PCA was applicable. The scree plot in Fig. 6 shows three dominant components. Table 4 shows that the three selected components together explain 62.3% of the total variance. Table 5 shows the principal components extracted from the correlation matrix of the parameters after varimax rotation. Bold values indicate the dominant parameters in each factor.
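For readers who want to reproduce the two suitability checks without SPSS, the following sketch uses the standard textbook formulas: Bartlett's statistic is -(n - 1 - (2p + 5)/6) ln|R| with p(p - 1)/2 degrees of freedom, and the overall KMO compares squared correlations with squared partial correlations. The function names and the synthetic data are assumptions for illustration only.

```python
import numpy as np
from scipy.stats import chi2


def bartlett_sphericity(X):
    """Bartlett's test that the correlation matrix is an identity matrix."""
    n, p = X.shape
    R = np.corrcoef(X, rowvar=False)
    _, logdet = np.linalg.slogdet(R)
    statistic = -(n - 1 - (2 * p + 5) / 6.0) * logdet
    dof = p * (p - 1) / 2.0
    return statistic, chi2.sf(statistic, dof)


def kmo_overall(X):
    """Overall Kaiser-Meyer-Olkin measure of sampling adequacy."""
    R = np.corrcoef(X, rowvar=False)
    P = np.linalg.inv(R)
    scale = np.sqrt(np.outer(np.diag(P), np.diag(P)))
    partial = -P / scale                      # partial correlations
    off_diag = ~np.eye(R.shape[0], dtype=bool)
    r2 = (R[off_diag] ** 2).sum()
    a2 = (partial[off_diag] ** 2).sum()
    return r2 / (r2 + a2)


# Synthetic data: 200 samples of 5 variables sharing one common factor.
rng = np.random.default_rng(1)
factor = rng.normal(size=(200, 1))
X_demo = factor + 0.5 * rng.normal(size=(200, 5))
bartlett_stat, bartlett_p = bartlett_sphericity(X_demo)
kmo_value = kmo_overall(X_demo)
```

A small Bartlett p-value together with a KMO value near 1 indicates that the variables share enough common variance for PCA to be meaningful.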
The MLR results in Table 6 show that all regression results were significant (p<0.01). Factor 1, with an estimated contribution of 72%, had strong and positive loadings on ammonia and TP. High concentrations of TP and ammonia in surface water could come from various sources, including municipal and industrial effluent [13]. Factor 2, with an estimated contribution of 16%, possibly originates from domestic black water, given its high loadings on coliform, BOD, COD, TOC, and nitrite [23,24]. Factor 3, with an estimated contribution of 12%, had high and positive loadings on conductivity, nitrate, and SS, and thus was interpreted as a mineral component of river water or precipitation runoff [25]. The source apportionment of this study is considered reliable because the PCA-MLR receptor model fitted well (R2>0.7). The contribution percentage of each factor is shown in Fig. 7.
Table 3. Correlation matrix between RPI and other water quality data in the Erren River
** Correlation is significant at the 0.01 level (2-tailed).
* Correlation is significant at the 0.05 level (2-tailed).
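The apportionment step can be sketched as follows, under several stated assumptions. The components are left unrotated for brevity (the study applied varimax rotation), absolute principal component scores are obtained by subtracting the score of an artificial zero-concentration sample, and each factor's percentage share is taken from the magnitude of its standardized regression coefficient. The paper does not state how coefficients were converted to percentages, so that rule, the function name, and the synthetic data are all hypothetical.

```python
import numpy as np


def pca_mlr_shares(concentrations, n_components=3):
    """Estimate percentage shares of PCA factors via regression on APCS."""
    C = np.asarray(concentrations, dtype=float)
    mean = C.mean(axis=0)
    sd = C.std(axis=0, ddof=1)
    Z = (C - mean) / sd                        # standardized parameters

    # PCA on the correlation matrix (unrotated; an assumption of this sketch).
    R = np.corrcoef(Z, rowvar=False)
    eigval, eigvec = np.linalg.eigh(R)
    order = np.argsort(eigval)[::-1][:n_components]
    W = eigvec[:, order] / np.sqrt(eigval[order])   # unit-variance scores
    scores = Z @ W

    # Absolute scores: subtract the score of a zero-concentration sample.
    z0 = (0.0 - mean) / sd
    apcs = scores - z0 @ W

    # Regress the sum of standardized parameters on the APCS.
    y = Z.sum(axis=1)
    A = np.column_stack([np.ones(len(y)), apcs])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    residual = y - A @ coef
    r2 = 1.0 - (residual ** 2).sum() / ((y - y.mean()) ** 2).sum()

    # Standardized coefficients converted to shares (assumed rule).
    beta = coef[1:] * apcs.std(axis=0, ddof=1) / y.std(ddof=1)
    shares = 100.0 * np.abs(beta) / np.abs(beta).sum()
    return shares, r2


# Synthetic data: 60 samples of 6 parameters driven by 3 latent sources.
rng = np.random.default_rng(2)
sources = rng.lognormal(mean=0.0, sigma=0.5, size=(60, 3))
profile = rng.uniform(0.2, 1.0, size=(3, 6))
demo = sources @ profile + 0.05 * rng.normal(size=(60, 6)) + 1.0
shares, r2 = pca_mlr_shares(demo, n_components=3)
```

Because the shares are normalized to sum to 100%, they describe how the explained pollution signal is divided among the retained factors, not the fraction of total variance each factor explains.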

Discussion
Higher population density in Taiwan has been demonstrated to be a significant source of domestic water pollution. Wastewater from agriculture, farming, and urban activities can also be a major pollution source, causing diverse problems such as toxic algal blooms, loss of oxygen, fish kills, loss of biodiversity (including species important for commerce and recreation), and loss of aquatic plant beds and coral reefs (Carpenter et al., 1998). In addition, the waste from approximately 7 million swine being raised in Taiwan must be disposed of, even though domestic swine farms have gradually been reduced in size since Taiwan joined the World Trade Organization. Besides domestic pollution sources, industrial wastewater is also a major water pollution source. During the last three decades, Taiwan developed into a large trading economy with nearly 11,000 manufacturing plants that need to dispose of various contaminants (Chinn, 1979).
Our study used cluster analysis to classify the pollution level of major rivers using a large, country-wide dataset. Previous research has shown that cluster analysis is useful for classifying rivers that have similar water quality characteristics. For example, Shrestha and Kazama (2007) reported that cluster analysis results reflected the influence of land use, residential sewage, agricultural activities, and industrialization, which can have a major impact on water quality. Another study grouped monitoring sites in rivers in South Florida into three groups (low, moderate, and high pollution) based on their similar water quality characteristics (Hajigholizadeh & Melesse, 2017).
In this study, the Erren River was determined to be the most polluted of the major rivers in Taiwan. We identified that the most significant water pollutants originated from industrial activity, domestic black water, runoff from other rivers, and natural sources, including climate conditions. Because the Erren River is located in an industrialized and urbanized area (Lee et al., 2013), the level of water pollutants is very high due to large amounts of nutrient salts, including organic pollutants, ammonia, and total phosphate, which are associated with the possible pollution sources in Factor 1 of our study. Aneja et al. (2008) reported that ammonia from industrial gas emissions or natural sources evaporates, becomes particulate matter, and then descends with precipitation and enters surface water. In addition, in the urbanized areas of Tainan with high population density, domestic wastewater is a major contributor to river water pollution, as indicated by the levels of BOD and DO, which show a strong correlation with the coliform levels associated with domestic wastewater (Vega et al., 1998). Runoff from other rivers can be due to the flash floods that often occur in southern Taiwan following typhoons. For example, in August 2009, Taiwan experienced the worst floods in 50 years after Typhoon Morakot struck almost the entire southern region. C. P. Yang, Yu, and Kao (2012) analyzed the impact of climate change on river water quality in the southern area of Taiwan. Huge amounts of sediment and debris flowed into the Erren River basin, causing high concentrations of suspended sediment in the river, which in turn caused wastewater treatment plants to fail. Therefore, the river received significantly higher SS, BOD, and ammonia loads from farms and domestic wastewater. During the dry season, the evapotranspiration rate increases, which may contribute to increased water salinity. During the wet season, in contrast, precipitation increases and runoff from other tributaries brings SS or nitrate into the river. Therefore, we assume that climate conditions may be one factor affecting river water quality.
This study identified pollution sources only through multivariate statistical analysis of water quality data pooled across all seasons. However, pollution levels vary between seasons; therefore, further study is necessary to analyze each season in detail, because the season can affect water quality. In addition, some water quality variables might be affected by soil types, geological conditions, terrain, and anthropogenic pollution sources (Wu et al., 2010); further work is necessary to determine whether these potential sources significantly impact the rivers in Taiwan.
Our study found that the levels of heavy metal contamination in the Erren River were among the highest in Taiwan. Since the 1970s, the development of a scrap-metal industry along the Sanyegong River (a tributary of the Erren River) has caused the river sediment to be severely polluted with metals (EPA, 2001). Y. C. Chen et al. (2004) determined that concentrations of As, Cd, Zn, Hg, and Cu in the Erren River were higher than in other rivers, and that Cu levels exceeded the standard limit. This heavy metal contamination has affected the river ecology and biota. Another study in 2002 reported that the highest concentrations of Fe, Zn, Cu, and Mn in muscle were found in tilapia, striped mullet, large-scaled mullet, and milkfish. The highest concentrations of As and Hg were found in striped mullet and Indo-Pacific tarpon. The highest concentrations of Fe, Hg, and Cd were found in the livers of large-scaled mullet, while striped mullet had the highest concentrations of Zn, Cu, and As. In our data, the As, Cu, Hg, and Zn levels in the Erren River were higher in 2002 than in other years, which suggests that the high levels of these heavy metals may have affected the biota. However, heavy metal concentrations in the Erren River have been decreasing since the Taiwanese government started a river restoration program in 2002. The restoration program formed an implementation team, uniting the Water Resources Agency, Industrial Development Bureau, Construction and Planning Agency, Council of Agriculture, Tainan City Government, Kaohsiung City Government, the river patrol team, and other units in a joint effort to improve the Erren River's water quality. Through combined governmental and private efforts over the long term, the Erren River's water quality continues to improve.

Conclusions
In this study, cluster analysis successfully classified the water quality of 14 of Taiwan's rivers, and PCA-MLR was used to determine possible pollution sources for the most polluted river in Taiwan. According to the cluster analysis, the most severe water pollution problem was in the Erren River in southern Taiwan. According to the PCA-MLR results, three factors explained 62.3% of the variance in the Erren River water quality data: ammonia and TP as the first factor; DO, BOD, COD, nitrite, and coliform as the second factor; and conductivity, nitrate, and SS as the third factor. The contributions were estimated at approximately 72% from industrial emissions, 16% from domestic black water, and 12% from natural sources and runoff from other tributaries.
Water quality monitoring programs generate complex multidimensional data that need multivariate statistical treatment for their analysis and interpretation. Such treatment yields better information about the quality of surface water, which can help environmental managers make better decisions regarding action plans. Domestic and industrial wastes must be managed so that their accumulation in rivers remains low and environmental degradation is minimized. This should be achieved by properly treating municipal and industrial wastewater before it is released to the environment.
Supplementary Materials: The following are available online at www.mdpi.com/xxx/s1, Table S1: Water Quality and Heavy Metal Monitoring Methods, Table S2: Taiwan EPA water quality standards for heavy metal content in surface water, Table S3: River Pollution Index Descriptive Statistic Table

Author Contributions