Surface Water Quality Assessment and Contamination Source Identification Using Multivariate Statistical Techniques: A Case Study of the Nanxi River in the Taihu Watershed, China

Abstract: Understanding the spatiotemporal patterns of water quality is crucial because it provides essential information for water pollution control. In this study, the spatiotemporal variations in the water quality of the Nanxi River in the Taihu watershed of China were evaluated using a water quality index (WQI) and multivariate statistical techniques, and the potential sources of contamination were identified. The data set comprised 22 water quality parameters measured at 14 monitoring stations from 2015 to 2020. The WQI assessment revealed that approximately 85% of the monitoring stations were classified as having “medium-low” water quality, and most showed continuous improvement. Cluster analysis divided the 14 monitoring stations into three clusters (low, medium and high contamination). Discriminant analysis identified pH, petroleum, volatile phenol, chemical oxygen demand, total phosphorus, F, S, fecal coliform, SO4, Cl, NO3-N, total hardness, NO2-N and NH3 as the important parameters affecting spatial variations. Factor analysis identified four potential contamination source types: nutrients, organics, feces and oil. This study demonstrates the usefulness of multivariate statistical techniques for assessing large data sets, identifying contamination source types, and understanding spatiotemporal variations in water quality, thereby supporting the restoration and protection of water resources.


Introduction
Deterioration of the water environment is a prominent problem in watershed management worldwide and seriously threatens the security of the water ecological environment [1]. Natural factors (such as climate, topography, geology and soil) and human activities (such as urbanization, industrial production and agricultural practices) affect the surface water quality of an area [2][3][4][5]. Seasonal changes in precipitation, hydrological conditions and runoff have marked effects on stream flow and, consequently, on pollutant concentrations in surface water [1,6-8]. Dynamic spatiotemporal assessment of water quality can be used to analyze water contamination problems, identify potential contamination source types, and provide information to support effective water resource management [3].
To effectively prevent and control surface water contamination, reliable water quality data are needed for in-depth research. Given the spatiotemporal variation in the physicochemical and biological characteristics of surface water, a long-term monitoring plan should be developed to assess water quality accurately [9]. Environmental protection departments in China have established sound water quality monitoring networks and continuous monitoring procedures covering physical properties (e.g., temperature, pH and electrical conductivity), total organic components, nutrients and inorganic components, as well as biological and microbial conditions. In water quality assessments, multiple parameters are typically measured at multiple monitoring stations over different monitoring periods, which generates a complex data matrix [10]. Because of potential multivariate correlations among monitoring stations, monitoring periods and water quality parameters, such data sets are often difficult to analyze and interpret [11][12][13]. In a comprehensive water quality assessment, the challenge is to determine whether changes in water quality should be attributed to contamination of rivers by human activities or to biogeochemical changes in natural processes [14]. Furthermore, the water quality parameters that best describe the spatiotemporal changes and identify contamination source types must be determined.
The water quality index (WQI) is a useful method for evaluating changes and trends in water environment quality by synthesizing multiple original parameters into a single index [10]. As a water quality assessment model, WQI assigns each parameter a relative weight based on its importance for water environment protection and integrates multiple variables into a single dimensionless value representing the overall water quality status and grade [15][16][17][18]. WQI has played an increasingly important role in the water quality assessment of rivers, lakes and groundwater [19][20][21][22][23][24].
As the number and dimensionality of measured parameters increase, assigning unknown samples and mining valuable information become increasingly complex. Therefore, multivariate statistical techniques and data reduction need to be used together to obtain satisfactory results [10]. Multivariate statistical techniques, such as cluster analysis (CA), discriminant analysis (DA), principal component analysis (PCA) and factor analysis (FA), have been widely applied to evaluate water quality; they can reduce the dimensionality of complex water quality data matrices and remove redundant information without losing valuable information [8,25-29]. These techniques can identify spatiotemporal patterns of water quality and help analyze the factors that cause spatiotemporal variations in water quality and affect the health of aquatic ecosystems [1,30].
The Nanxi River in the Taihu watershed, a rapidly urbanizing area of China, is experiencing intense disturbance from human activities and serious water contamination problems [31]. Characterizing the spatiotemporal variations in surface water quality and identifying contamination source types are critical for sustainable watershed water quality management. However, studies identifying contamination source types in the Nanxi River are limited. The primary aims of this study are as follows: (1) to evaluate the contamination levels at different monitoring stations and periods using WQI and thereby examine the spatiotemporal distribution of water quality; (2) to extract clustering information on the monitoring stations and determine the most important variables for classifying spatial variations in water quality; and (3) to analyze the potential factors affecting water quality in three regions with different contamination levels and explore possible contamination source types (natural processes or human activities).

Study Area
The Nanxi River (119°08′-119°36′ E, 31°1′-31°41′ N) is the main river in the western part of the Taihu watershed in China (Figure 1). The study area covers 1535.87 km² and consists of 39% farmland, 23% water area, 22% forestland and 16% built-up land. It lies in the subtropical monsoon climate zone, with an average annual temperature of 16 °C and an average annual precipitation of 1147 mm, 70% of which falls in the rainy season from May to October. The area comprises low mountains, hills, plain polders and other landform types, with elevations of 1-702 m. The main soil types are paddy soils, yellow-brown soils and yellow cinnamon soils [32]. The zonal vegetation is evergreen and deciduous broad-leaved mixed forest. The main agricultural products include rice, rape and tea, and sericulture is also practiced [31]. This area is relatively developed within the Taihu watershed, with a population of approximately 763,000. The area hosts many chemical, synthetic material, mechanical, electronic and cement factories. The basin has fertile paddy soil, which is well suited to various agricultural activities. The main contamination source types in this area include municipal and industrial wastewater, livestock and poultry breeding, crop planting and aquaculture.
In 2019, industrial and domestic sewage discharges totaled 2248.8 × 10⁴ and 3244.9 × 10⁴ tons, respectively, containing 3043.5 tons of chemical oxygen demand (COD) and 538.5 tons of ammonia nitrogen (NH3-N), and the industrial sewage treatment rate was 14.8%; the COD, total nitrogen (TN) and total phosphorus (TP) loads from agricultural contamination sources were 3567.3, 338.2 and 202 tons, respectively. Thirteen sewage treatment plants (STPs) treat domestic and industrial wastewater in the area. The maximum daily treatment capacity of these STPs is 15 × 10⁴ m³ [33].

Monitoring Stations and Water Quality Data
Water samples were collected from 14 monitoring stations (Figure 1) in the Nanxi River every month from January 2015 to November 2020. All the water samples were  [35] to ensure the quality of the data. The final data were from the Liyang Environmental Protection Bureau.
These data come from routine monthly sampling, which reflects the typical water quality status of each monitoring station but cannot capture the dynamics of pollutants generated by episodic events (e.g., storms and pollution leakage accidents). Therefore, where possible, additional sampling before and after episodic events will be needed in future work.
We selected 22 water quality parameters for this study. Their abbreviations, units and descriptive statistics are summarized in Table 1, where Mean is the mean value, S.D. is the standard deviation and C.V. is the coefficient of variation.
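As a minimal sketch of how the Table 1 statistics relate to one another, the Python snippet below computes the mean, standard deviation and coefficient of variation for one parameter. It assumes that S.D. uses the sample (n - 1) denominator and that C.V. is expressed as a percentage of the mean; neither convention is stated in the text, and the concentrations are hypothetical.

```python
import statistics

def describe(values):
    """Return (mean, sample S.D., C.V. in %) for one water quality parameter."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)   # sample standard deviation, n - 1 denominator
    cv = 100.0 * sd / mean          # coefficient of variation as % of the mean
    return mean, sd, cv

# Hypothetical monthly NH3-N concentrations (mg/L) at one station
nh3n = [0.42, 0.88, 1.35, 0.61, 2.10, 0.74]
mean, sd, cv = describe(nh3n)
print(f"Mean = {mean:.3f} mg/L, S.D. = {sd:.3f} mg/L, C.V. = {cv:.1f}%")
```

A high C.V. flags a parameter whose concentrations vary strongly relative to their mean across stations and months.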

Water Quality Index
WQI is an effective method for water quality assessment [36]. Different weights were assigned to the parameters (Table 2) [23] according to the impact of each parameter on human health and its relative importance for aquatic organisms [15]. Eighteen parameters (Temp, pH, EC, DO, CODMn, COD, BOD5, NH3-N, TP, TN, Petrol, F. Coli, SO4, Cl, NO2-N, NO3-N, TSS and T-Hard) were used, and their measured values were normalized.
The WQI was calculated as follows:

$$\mathrm{WQI} = \frac{\sum_{i=1}^{n} C_i P_i}{\sum_{i=1}^{n} P_i}$$

where n is the total number of parameters included in the calculation, C_i is the normalization factor of parameter i, and P_i is the relative weight of parameter i. P_i ranges from a minimum of 1 to a maximum of 4; these weights were determined based on previous studies [23,37-39].
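The weighted-average form of the WQI can be illustrated with a short Python function. This is a sketch only: the normalization factors C_i (assumed here to lie on a 0-100 scale) and the weights P_i in the example are hypothetical placeholders, not the values in Table 2.

```python
def wqi(c, p):
    """WQI = sum(C_i * P_i) / sum(P_i) over the n parameters used."""
    if len(c) != len(p) or not c:
        raise ValueError("C and P must be non-empty sequences of equal length")
    if any(not 1 <= w <= 4 for w in p):
        raise ValueError("weights P_i are expected to lie between 1 and 4")
    return sum(ci * pi for ci, pi in zip(c, p)) / sum(p)

# Hypothetical normalization factors (0-100 scale assumed) and weights
params = {            # parameter: (C_i, P_i)
    "DO":    (80, 4),
    "COD":   (60, 3),
    "NH3-N": (50, 3),
    "TP":    (40, 1),
}
c_vals = [c for c, _ in params.values()]
p_vals = [p for _, p in params.values()]
print(f"WQI = {wqi(c_vals, p_vals):.1f}")
```

Because the index is a weighted mean, a heavily weighted parameter with a low normalization factor pulls the WQI down more than a lightly weighted one.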

Statistical Analysis
The water quality data sets for the six years (2015-2020) were checked for missing and abnormal values [40]. Missing values of F. Coli were replaced by the series mean. Most multivariate statistical techniques require normally distributed data; therefore, kurtosis and skewness statistics were used to test whether each water quality parameter followed a normal distribution [5,13]. For the original data, kurtosis ranged from −1.888 to 357.738 and skewness from −1.215 to 16.671, indicating that the raw data were far from normally distributed. Because most kurtosis and skewness values were greater than 0 (i.e., the distributions were mostly right-skewed and heavy-tailed), the raw data were log-transformed (x′ = log10 x) [41]. After transformation, kurtosis and skewness ranged from −0.640 to 4.577 and from −2.279 to 0.096, respectively. To minimize the influence of different units and variances on the parameters, the data were then z-score standardized (mean 0, variance 1).
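The preprocessing sequence described above (series-mean imputation, normality check, log10 transform, z-score standardization) can be sketched with NumPy as follows. This is an illustrative sketch, not the STATISTICA workflow used in the study: the skewness and excess kurtosis here are simple moment estimates without small-sample correction, so they may differ slightly from values reported by packages that apply such corrections, and the data matrix is hypothetical.

```python
import numpy as np

def skew_kurt(x):
    """Moment-based skewness and excess kurtosis (no small-sample correction)."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    m2 = np.mean(d**2)
    return np.mean(d**3) / m2**1.5, np.mean(d**4) / m2**2 - 3.0

def preprocess(X):
    """X: samples x parameters, NaN marks a missing value.
    Returns the log10-transformed matrix and its z-scores."""
    X = np.array(X, dtype=float)                 # copy, so the input is untouched
    # 1. Replace missing values with the column (series) mean
    col_mean = np.nanmean(X, axis=0)
    rows, cols = np.where(np.isnan(X))
    X[rows, cols] = col_mean[cols]
    # 2. log10 transform; only defined for strictly positive values
    if np.any(X <= 0):
        raise ValueError("log10 transform requires strictly positive values")
    L = np.log10(X)
    # 3. Z-score standardization: each column gets mean 0 and variance 1
    Z = (L - L.mean(axis=0)) / L.std(axis=0, ddof=1)
    return L, Z

# Hypothetical data: 4 samples x 2 parameters, one missing value
X = np.array([[1.0, 10.0], [10.0, np.nan], [100.0, 1000.0], [10.0, 100.0]])
L, Z = preprocess(X)
print("skewness, kurtosis of log10 data:", [skew_kurt(L[:, j]) for j in range(L.shape[1])])
print(Z.round(3))
```

Note that a log transform cannot handle zero or negative values; parameters such as temperature in colder climates would need a shift or a different transform.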
Cluster analysis (CA) was performed on the standardized data to explore the spatial similarity and clustering information on water quality. Principal component analysis/factor analysis (PCA/FA) was performed on the standardized data to explore possible contamination source types [9,28]. Discriminant analysis (DA) was performed on the raw data to extract the important variables reflecting the variations between groups [5]. STATISTICA 10 was used for statistical analysis.
Details of the above multivariate statistical methods are provided in the Supplementary Materials.

Results and Discussion
The descriptive statistics of 22 water quality parameters are summarized in Table 1.
The pH values ranged from 5.74 to 8.91 and were mostly within the 6-9 range allowed by GB3838-2002. The concentrations of F, S, F. Coli and VP in most water samples were far below the class III standard (GB3838-2002), whereas the mean Petrol concentration (0.058 mg/L) slightly exceeded the class III standard (0.05 mg/L). Among nutrients, the mean TN concentration (2.87 mg/L) was far higher than the class III standard (1.0 mg/L); the mean NO3-N and NO2-N concentrations (0.294 and 0.063 mg/L, respectively) were far below the class III standard (10 mg/L); and the mean NH3-N concentration (0.916 mg/L) was below the class III standard (1.0 mg/L). TN, the main indicator of water eutrophication, is the sum of NO3-N, NO2-N, NH3-N and organic nitrogen. Because the three inorganic forms together averaged only about 1.27 mg/L, organic nitrogen (about 1.6 mg/L by difference) was the dominant nitrogen form in the study area. The concentrations of CODMn, BOD5 and COD deserve attention because these parameters indicate organic, biological and chemical oxygen-demanding contamination in surface water, respectively. Their maximum values were 15.7, 9.0 and 87.9 mg/L, respectively, all exceeding the class III standards (6, 4 and 20 mg/L, respectively), indicating a relatively high contamination level in the study area. The coefficients of variation for NH3-N, NH3, Petrol, S, TN and NO2-N were relatively high, indicating large temporal and spatial differences in the distributions of these parameters.
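The organic nitrogen estimate follows from simple subtraction of the inorganic forms from TN. The sketch below uses the mean concentrations reported above; it assumes that all four means come from the same set of samples, so that the mean of the differences equals the difference of the means.

```python
def organic_nitrogen(tn, no3_n, no2_n, nh3_n):
    """Organic N by difference: TN minus the inorganic N forms (all in mg/L).
    Returns the organic N concentration and its share of TN."""
    org = tn - (no3_n + no2_n + nh3_n)
    return org, org / tn

# Mean concentrations reported for the study area (mg/L)
org, share = organic_nitrogen(tn=2.87, no3_n=0.294, no2_n=0.063, nh3_n=0.916)
print(f"Organic N = {org:.3f} mg/L ({share:.0%} of TN)")
```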

Water Quality Assessment Using WQI
The water quality at most monitoring stations was classified as "medium-low", accounting for approximately 84.52% of the assessments ("medium" 64.29% and "low" 20.24%). Additionally, 13.10% was "good" and only 2.38% was "excellent" (Figure 2). The water quality at S1 and S2 was always "good" or better, and that at S2 was "excellent" in 2015 and 2018. These two monitoring stations are in the Daxi Reservoir and Shahe Reservoir within the urban centralized drinking water protection area, where a good natural ecological environment and strict contamination control measures have kept water quality in good condition. The other monitoring stations are in urbanized or agricultural areas.
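The reported percentages can be checked for internal consistency. Assuming (this is not stated in the text) that they refer to 84 annual WQI values (14 stations × 6 years), they correspond to whole-number counts; the counts in the sketch below are inferred from the percentages, not taken from the source data.

```python
def grade_shares(counts):
    """Percentage of WQI assessments in each grade, rounded to 2 decimals."""
    total = sum(counts.values())
    return {grade: round(100 * n / total, 2) for grade, n in counts.items()}

# Counts inferred (assumption) from the reported percentages, 14 stations x 6 years
counts = {"excellent": 2, "good": 11, "medium": 54, "low": 17}
assert sum(counts.values()) == 14 * 6
shares = grade_shares(counts)
medium_low = round(100 * (counts["medium"] + counts["low"]) / sum(counts.values()), 2)
print(shares, "medium-low:", medium_low)
```

The two "excellent" assessments would then correspond to S2 in 2015 and 2018.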
Regarding the interannual WQI trend (Figure 3), about half of the monitoring stations showed an increasing trend, and most stations were generally stable. The water quality at S1 and S2 decreased slightly: with rapid urbanization and population growth, water environment security is under increasing pressure, and the protection of water sources should be further strengthened. The water quality at the other monitoring stations improved continuously, especially from 2016 to 2019. Since 2017, Changzhou city has implemented special actions: "two reductions" (reducing total coal consumption and outdated chemical production capacity), "six governance" (governing the Taihu Lake water environment, domestic garbage, black and odorous water bodies, livestock and poultry breeding contamination, volatile organic compound contamination, and hidden environmental hazards), and "three improvements" (improving the level of ecological protection, environmental-economic policy regulation, and environmental law enforcement and supervision) [42]. Environmental quality has since improved significantly, the total discharge of major pollutants has been markedly reduced, and environmental risks have been effectively controlled.

Spatial Similarities and Clustering
Spatial CA generated a dendrogram that grouped the 14 monitoring stations into three clusters at (Dlink/Dmax) × 100 < 40 (Figure 4). Each cluster was assigned a contamination category according to the physical, chemical and microbiological characteristics of its water quality. Cluster A included stations S1 and S2 and corresponded to low contamination. Cluster B contained six monitoring stations (S5, S6, S8, S9, S11 and S13) and was classified as medium contamination. Cluster C comprised six monitoring stations (S3, S4, S7, S10, S12 and S14) and was classified as high contamination.
In cluster A, S1 and S2 are in the Daxi Reservoir and Shahe Reservoir. The contamination of the six monitoring stations in cluster B mainly derives from nonpoint source contamination, such as agricultural runoff, livestock and poultry breeding, and fishpond drainage. The monitoring stations of cluster C are mainly located in urban areas and downstream reaches, and the possibility of water contamination is higher because of the comprehensive impacts of domestic sewage, industrial wastewater and upstream inflow water [5,7,10].
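The (Dlink/Dmax) × 100 < 40 cut can be sketched with SciPy's hierarchical clustering. The linkage method is not stated in this section, so Ward's method on Euclidean distances is assumed here, and each row is taken to be a station profile (for example, station means of the z-scored parameters); the six-station matrix is hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def cluster_stations(Z, cut_fraction=0.40):
    """Ward hierarchical clustering of standardized station profiles.

    The tree is cut at cut_fraction * Dmax, where Dmax is the largest linkage
    distance, i.e. (Dlink/Dmax) x 100 < 40 when cut_fraction = 0.40.
    """
    tree = linkage(Z, method="ward")       # Ward's criterion, Euclidean distances
    d_max = tree[:, 2].max()               # largest linkage distance (Dmax)
    return fcluster(tree, t=cut_fraction * d_max, criterion="distance")

# Hypothetical station profiles (rows = stations, columns = z-scored parameters)
stations = np.array([
    [-1.2, -1.0, -1.1],   # low-contamination-like profiles
    [-1.1, -0.9, -1.2],
    [ 0.1,  0.0,  0.2],   # medium
    [ 0.0,  0.2,  0.1],
    [ 1.1,  1.0,  1.2],   # high
    [ 1.2,  0.9,  1.0],
])
labels = cluster_stations(stations)
print(labels)
```

Raising cut_fraction merges more stations into fewer clusters; the 40% threshold is the value reported for this study.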
These spatial CA results were consistent with the average WQI of the monitoring stations: the WQI values of S1 and S2 were the highest, those of S3, S4, S7, S10, S12 and S14 were relatively low, and those of S5, S6, S8, S9, S11 and S13 were intermediate (Figure 5). Thus, CA can provide a reliable water quality classification of the monitoring stations; however, optimal spatial sampling strategies should be designed in the future [10,28,41,43].
Water 2022, 14, 778 7 of 17

Spatial Variations in Water Quality
Using the CA groupings, discriminant analysis was applied to test the significance of the discriminant functions and to identify the important variables reflecting the variation between clusters. Wilks' lambda and chi-square values for all discriminant functions ranged from 0.036 to 0.509 and from 504.269 to 2479.317, respectively, and all p values were below 0.01 (Table 3), indicating that the spatial DA was valid [13]. Tables 4 and 5 show the discriminant functions and classification matrices obtained with the standard, forward stepwise and backward stepwise modes of DA. The standard and forward stepwise modes used 22 and 21 discriminant variables, respectively, and correctly assigned approximately 88% of cases. In contrast, the backward stepwise mode correctly assigned nearly 87% of cases using only 14 discriminant parameters. Spatial DA thus showed that pH, Petrol, VP, COD, TP, F, S, F. Coli, SO4, Cl, NO3-N, T-Hard, NO2-N and NH3 were the critical variables distinguishing the water quality of the three spatial clusters and explained most of the spatial variation in water quality. Based on these discriminant parameters, box-and-whisker plots were constructed for the three clusters (A, B and C) to evaluate spatial variations in water quality (Figure 6). Most parameters differed significantly between clusters. Overall, average concentrations in cluster A were much lower than those in clusters B and C, and those in cluster C were slightly higher than those in cluster B. Higher Petrol, COD and TP values in cluster C indicate that organic contamination and eutrophication were the most serious water environment problems in this cluster.
Additionally, pH values were lower at the cluster C stations, likely because of acidic substances such as ammonium (through hydrolysis) and organic acids [5]. In conclusion, water contamination in cluster C was more serious than in the other two clusters. Thus, contamination source control and point source treatment capacity must be strengthened, for example by building more STPs and expanding their treatment capacity.
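The significance test reported in Table 3 rests on Wilks' lambda, Λ = det(W)/det(T), where W is the pooled within-group and T the total sums-of-squares-and-cross-products matrix. The NumPy sketch below computes Λ and Bartlett's chi-square approximation χ² = −(N − 1 − (p + k)/2) ln Λ with p(k − 1) degrees of freedom. It is a simplified illustration: it gives only the overall test across all discriminant functions, whereas statistical packages typically also report tests for successive roots, and the data are synthetic.

```python
import numpy as np

def wilks_lambda(X, groups):
    """Wilks' lambda = det(W) / det(T) for a one-way grouping of multivariate data."""
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]                         # treat a 1-D input as one variable
    groups = np.asarray(groups)
    centered = X - X.mean(axis=0)
    T = centered.T @ centered                  # total SSCP matrix
    W = np.zeros_like(T)
    for g in np.unique(groups):
        Xg = X[groups == g]
        dg = Xg - Xg.mean(axis=0)
        W += dg.T @ dg                         # pooled within-group SSCP matrix
    return np.linalg.det(W) / np.linalg.det(T)

def bartlett_chi2(lam, n, p, k):
    """Bartlett's chi-square approximation for Wilks' lambda (n samples,
    p variables, k groups). Returns (chi-square, degrees of freedom)."""
    chi2 = -(n - 1 - (p + k) / 2) * np.log(lam)
    return chi2, p * (k - 1)

# Synthetic example: three groups of 30 samples, 4 variables, shifted means
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(m, 1.0, size=(30, 4)) for m in (0.0, 1.0, 2.0)])
g = np.repeat(["A", "B", "C"], 30)
lam = wilks_lambda(X, g)
chi2, df = bartlett_chi2(lam, n=len(X), p=X.shape[1], k=3)
print(f"Wilks' lambda = {lam:.3f}, chi-square = {chi2:.1f} (df = {df})")
```

Values of Λ near 0 indicate well-separated groups, and values near 1 indicate little separation; the p value can then be taken from the upper tail of the chi-square distribution with df degrees of freedom.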

Principal Component Determination and Contamination Source Identification
Because the contamination levels of the three spatial clusters (A, B and C) differed significantly, PCA/FA was applied separately to the standardized data set of each cluster to identify water contamination source types.
PCA/FA of the three data matrices yielded six, eight and seven variance factors (VFs) with eigenvalues ≥ 1, explaining 71.5%, 66.8% and 67.9% of the total variance in the corresponding data sets, respectively (Tables 6-8). Factor loadings were categorized as "high", "medium" and "low" for absolute values of > 0.75, 0.75-0.50 and 0.50-0.30, respectively [44]. Among the six VFs of cluster A, VF1 explained 20.7% of the total variance and had high positive loadings on Petrol, VP, TN and F. This factor indicates toxic organic contamination from farmland drainage, oily sewage discharged by ship operations, domestic sewage, industrial wastewater, atmospheric deposition and precipitation leaching. VF2 (15.3% of the total variance) had high negative loadings on SO4 and TSS and a high positive loading on NO2-N. The presence of nitrite in water indicates that organic matter is still decomposing, so the risk of organic contamination persists. VF3 (13.9%) had high positive loadings on NO3-N, Cl and S, indicating nutrients from agricultural runoff and atmospheric deposition as well as natural inputs of salt ions (Cl, S) from soil erosion in the watershed [45]. VF4 (10.3%) had high positive loadings on BOD5 and CODMn, representing organic contamination in sewage [6]. VF5 (6.3%) had high positive loadings on NH3, pH and EC. EC generally reflects natural contributions, which may be due to soil erosion or an increase in salt ions in the water [44]. Finally, VF6 (only 5.0%) had a medium negative loading on NH3-N. For cluster B, VF1 of the eight VFs accounted for 12.9% of the total variance and had a high negative loading on F but medium positive loadings on F. Coli and S, indicating microbial contamination from municipal sewage and livestock and poultry breeding.
VF2 (11.1% of the total variance) had high negative loadings on BOD5 and CODMn, indicating organic contamination from urban sewage and industrial wastewater. VF3 (9.3%) had a high positive loading on DO but a high negative loading on Temp, consistent with the lower solubility of oxygen in warmer water. VF4 (9.0%) had only a moderate positive loading on TP, revealing nutrient (P) contamination, especially from sewage containing detergents, industrial wastewater and fertilizer. Point source P contamination (such as wastewater from the phosphorus chemical industry) and nonpoint source P contamination (such as animal breeding and agricultural fertilizer) are common causes of eutrophication in this area [46]. VF5 (7.0%) had only a moderate positive loading on NH3-N, representing contamination from animal feces and agricultural fertilizers. VF6 (only 6.5%) had a high positive loading only on EC, likely reflecting the mineral composition of the river water [6]. VF7 (only 5.9%) had a high positive loading on SO4 and a medium positive loading on TN, representing industrial wastewater containing sulfate or sulfuric acid. Finally, VF8 (only 5.2%) had high positive loadings on NH3 and pH, likely because of industrial wastewater containing alkaline substances such as NH3.
For the seven VFs of cluster C, VF1 (20.5% of the total variance) had high positive loadings on NH3-N and TN, representing nutrient (N) contamination from agricultural runoff, municipal sewage and fertilizer plant wastewater. VF2 (12.2%) had a high positive loading on F, representing industrial wastewater containing fluoride. VF3 (10.2%) had a high positive loading on pH. VF4 (7.5%) had a high positive loading on Temp and a moderate negative loading on DO; the signs are reversed relative to VF3 of cluster B, but both factors express the same inverse relationship between Temp and DO. VF5 (6.7%) had a high positive loading on Petrol, representing contamination from oily sewage discharged by ship operations and wastewater from the petrochemical industry. VF6 (5.7%) had moderate positive loadings on TSS and EC; agricultural runoff, wastewater discharge, solid waste disposal and irrigation return flows increase the suspended solids load in streams [45]. VF7 (5.0%) had a high positive loading on SO4, similar to VF7 of cluster B.
Overall, four contamination source types were identified: nutrients, organics, feces and oil. First, nutrients represented point source contamination, such as urban domestic wastewater and industrial wastewater from chemical fertilizer plants, and nonpoint source contamination related to agricultural activities and aquaculture. Second, organics were mainly derived from oxygen-consuming and toxic organic matter in municipal and industrial sewage. Third, feces were mainly derived from animal fecal drainage in the fishery and livestock breeding industries. Finally, oil represented contamination from the petrochemical industry and oily sewage discharged by ship operations.
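The factor extraction steps used above (retaining factors with eigenvalues ≥ 1 and classifying loadings as high, medium or low following [44]) can be sketched in Python with NumPy. This sketch rests on assumptions: the text does not state the extraction or rotation method, so PCA on the correlation matrix followed by varimax rotation, a common choice in water quality studies, is assumed, and the data are synthetic, with two hypothetical latent sources driving six variables.

```python
import numpy as np

def varimax(L, gamma=1.0, max_iter=100, tol=1e-6):
    """Kaiser varimax rotation of a loading matrix L (variables x factors)."""
    p, k = L.shape
    R = np.eye(k)
    d = 0.0
    for _ in range(max_iter):
        Lr = L @ R
        # Varimax criterion gradient, turned into an orthogonal rotation via SVD
        target = Lr**3 - (gamma / p) * Lr @ np.diag(np.sum(Lr**2, axis=0))
        u, s, vt = np.linalg.svd(L.T @ target)
        R = u @ vt
        d_new = np.sum(s)
        if d != 0.0 and d_new < d * (1.0 + tol):
            break
        d = d_new
    return L @ R

def pca_fa(Z):
    """PCA on the correlation matrix of Z (samples x variables), Kaiser criterion
    (eigenvalue >= 1), then varimax rotation. Returns rotated loadings and the
    share of total variance carried by each rotated factor."""
    R = np.corrcoef(Z, rowvar=False)
    eigval, eigvec = np.linalg.eigh(R)           # ascending order for symmetric R
    order = np.argsort(eigval)[::-1]
    eigval, eigvec = eigval[order], eigvec[:, order]
    k = int(np.sum(eigval >= 1.0))
    loadings = eigvec[:, :k] * np.sqrt(eigval[:k])
    if k > 1:
        loadings = varimax(loadings)
    share = np.sum(loadings**2, axis=0) / R.shape[0]
    return loadings, share

def loading_class(x):
    """Classify a loading by absolute value: >0.75 high, 0.50-0.75 medium, 0.30-0.50 low."""
    a = abs(x)
    if a > 0.75:
        return "high"
    if a >= 0.50:
        return "medium"
    if a >= 0.30:
        return "low"
    return "-"

# Synthetic data: two hypothetical sources, each driving three variables
rng = np.random.default_rng(0)
f = rng.normal(size=(200, 2))
Z = np.column_stack([f[:, 0]] * 3 + [f[:, 1]] * 3) + 0.3 * rng.normal(size=(200, 6))
Z = (Z - Z.mean(axis=0)) / Z.std(axis=0, ddof=1)
loadings, share = pca_fa(Z)
for j, row in enumerate(loadings):
    print(j, [f"{v:+.2f} ({loading_class(v)})" for v in row])
print("variance explained per factor:", share.round(3))
```

The sign of a rotated factor is arbitrary, which is why a factor with a positive Temp loading and a negative DO loading expresses the same relationship as one with the opposite signs.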

Conclusions
In the Nanxi River of the Taihu watershed in China, WQI and multivariate statistical techniques were used to assess the spatiotemporal variations in water quality and to identify contamination source types.
(1) The WQI results indicated that the water quality at most monitoring stations was classified as "medium-low" and showed a continuous improvement trend. The water quality at S1 and S2 was always "good" or better, and that at S2 was "excellent" in 2015 and 2018.
(2) Cluster analysis divided the 14 monitoring stations into 3 clusters of low contamination, medium contamination and high contamination.
(3) Discriminant analysis achieved substantial data reduction, using only 14 parameters (pH, Petrol, VP, COD, TP, F, S, F. Coli, SO4, Cl, NO3-N, T-Hard, NO2-N and NH3) while correctly assigning about 87% of cases to the 3 spatial clusters.
(4) PCA/FA of the data sets of the three spatial clusters yielded six, eight and seven potential factors, respectively. The results showed that water contamination was mainly related to nutrients (livestock and poultry breeding, agricultural activities), salt ions (natural) and toxic organic contamination (urban sewage, industrial wastewater and ship operations) in cluster A; fecal coliform (livestock and poultry breeding), organic contamination (industrial and domestic sewage), temperature (natural) and nutrients (point sources: industrial wastewater and domestic sewage; nonpoint sources: livestock and poultry breeding, agricultural fertilizer) in cluster B; and fluoride (industrial wastewater), pH and temperature (natural), and petroleum (ship operations and industrial wastewater) in cluster C.