3.1. Temporal Similarity and Variation
Temporal hierarchical CA generated a dendrogram that grouped the 12 months into either a two-cluster system at (Dlink/Dmax) × 100 < 20 or a three-cluster system at (Dlink/Dmax) × 100 < 16 (the “scale tree to (Dlink/Dmax) × 100” option was selected so that the tree plot used a standardized scale). The temporal difference in water quality between the two systems was significant (Figure 2; p < 0.01). In the two-cluster system, Cluster 1 (first period) covered May to November, while Cluster 2 (second period) ran from December to the following April. In the three-cluster system, Cluster 1 remained unchanged, while Cluster 2 was further divided into two clusters: Cluster 2 for March, April and December and Cluster 3 for January and February (Figure 2a). This grouping differed from that of the Lake Dianchi Watershed in Southwest China [33], because the wet season in the Lake Taihu Watershed (May to October) is much longer than that in the Lake Dianchi Watershed (July to September). In both the two- and three-cluster systems, the temporal variations in the river water quality data from Lake Taihu were determined by both hydrological condition (i.e., wet or dry season) and water pollution characteristics (Figure 2b). For example, NH4+–N in the first period (May to November) was much higher than in the other clusters (Figure 2c), owing to the greater contribution of agricultural runoff in the wet season. However, the mean concentrations of BOD5 in the first period were close to those in the other two periods (Figure 2c), because the relatively larger amount of domestic sewage was offset by the greater streamflow in the rivers.
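The dendrogram cut described above can be sketched in Python. This is only an illustration under stated assumptions: the monthly values are synthetic (not the study data), Ward linkage on Euclidean distances of z-scored variables is an assumed choice, and Dmax is taken to be the largest linkage distance in the tree, so that the rescaled heights are 100 × Dlink/Dmax. The helper name `clusters_below` is hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)

# Synthetic monthly means for three hypothetical parameters (NOT the study data):
# months 5-11 form a "wet" group, the remaining months a "dry" group.
months = np.arange(1, 13)
wet = (months >= 5) & (months <= 11)
X = np.where(wet[:, None], [25.0, 6.0, 1.5], [8.0, 9.5, 0.8])
X = X + rng.normal(scale=[0.5, 0.2, 0.03], size=X.shape)

# z-score standardization so that no single parameter dominates the distance.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

# Ward linkage on Euclidean distances (an assumed, commonly used choice).
tree = linkage(Z, method="ward")

# Rescale the linkage distances to 100 * Dlink / Dmax.
scaled = tree.copy()
scaled[:, 2] = tree[:, 2] / tree[:, 2].max() * 100.0


def clusters_below(scaled_tree, threshold):
    """Flat cluster labels obtained by cutting the rescaled tree at `threshold`."""
    return fcluster(scaled_tree, t=threshold, criterion="distance")


labels = clusters_below(scaled, 20.0)
for month, label in zip(months, labels):
    print(f"month {month:2d} -> cluster {label}")
```

Lowering the threshold passed to `clusters_below` (for example to 16, as for the three-cluster system) splits the tree further only if some merge height lies between the two thresholds, which depends on the data.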
DA was then applied to evaluate the cluster systems generated by the temporal CA. The objectives of the DA were to test the significance of the discriminant functions and to select the most significant variables contributing to the differences among clusters. For each discriminant function, Wilks’ lambda and the chi-square statistic ranged from 0.11 to 0.63 and from 668 to 1972, respectively, with p < 0.001, suggesting that the temporal DA was reliable and effective [29]. In the two-cluster scenario, DA produced two classification matrices (CMs) with a classification accuracy of 94% using three discriminant variables: water temperature, DO and BOD5 (Figure 2c). In the three-cluster scenario, DA produced three classification matrices (CMs) with an accuracy of 86.5% using two discriminant variables (i.e.
–N), which differed significantly among the three clusters (Figure 2c). For instance, the means of the three variables (BOD5
–N) in the third period were 3.4%, 8.4% and 12.5% higher, respectively, than those in the second period (Figure 2c). The discrepancy between the first and third periods was similar to that between the first and second periods, although the gaps were much larger for NH4+–N (Figure 2c). The two-cluster system divided the 12 months into wet and dry seasons, while the three-cluster system divided them into wet (May to November), moderately dry (March, April and December) and severely dry (January to February) seasons, according to local meteorological characteristics. Together, the backward stepwise DA results suggested that both the two- and three-cluster systems explained the temporal similarities well. Water temperature, DO and BOD5 were the three most significant variables for discriminating water quality conditions among seasons in both systems. Although the coefficient of variation (CV; Table 1) reflects the numerical variation of the samples, it is not appropriate for evaluating the temporal variation among seasons in either system.
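To make the significance test concrete, the following Python sketch computes an overall Wilks’ lambda and Bartlett’s chi-square approximation for a grouping. It rests on assumptions: the data are synthetic, the function name `wilks_lambda` is hypothetical, and the test shown is the overall test across all discriminant functions rather than the per-function (sequential) test reported above. The formulas used are Λ = det(W)/det(T) and χ² = −(n − 1 − (p + g)/2) ln Λ with p(g − 1) degrees of freedom, where W and T are the within-group and total sums-of-squares-and-cross-products matrices.

```python
import numpy as np
from scipy.stats import chi2


def wilks_lambda(X, groups):
    """Overall Wilks' lambda with Bartlett's chi-square approximation.

    X: (n, p) array of observations; groups: length-n array of cluster labels.
    Returns (lambda, chi-square statistic, p-value).
    """
    X = np.asarray(X, dtype=float)
    groups = np.asarray(groups)
    n, p = X.shape
    labels = np.unique(groups)
    g = len(labels)

    # Total sums-of-squares-and-cross-products matrix.
    centered = X - X.mean(axis=0)
    T = centered.T @ centered

    # Within-group matrix, pooled over all clusters.
    W = np.zeros((p, p))
    for lab in labels:
        Xg = X[groups == lab]
        d = Xg - Xg.mean(axis=0)
        W += d.T @ d

    lam = np.linalg.det(W) / np.linalg.det(T)
    stat = -(n - 1 - (p + g) / 2.0) * np.log(lam)
    df = p * (g - 1)
    return lam, stat, chi2.sf(stat, df)


# Synthetic example (NOT the study data): two well-separated groups, 3 variables.
rng = np.random.default_rng(1)
groups = np.repeat([0, 1], 30)
X = rng.normal(size=(60, 3))
X[groups == 1] += 2.0

lam, stat, pval = wilks_lambda(X, groups)
print(f"Wilks' lambda = {lam:.3f}, chi-square = {stat:.1f}, p = {pval:.2e}")
```

Values of lambda close to 0 indicate that most of the total variation lies between the clusters, while values close to 1 indicate little separation.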
3.2. Spatial Similarity and Variation
Through spatial CA, we identified clusters of similar monitoring sites while accounting for the effects of temporal differences. Spatial similarity analysis was conducted for each individual temporal cluster, as well as for all of the samples combined. The results showed no significant differences among them, so the discussion below focuses on the spatial CA of the combined data. Spatial CA produced two dendrograms, with two clusters at (Dlink/Dmax) × 100 < 20 and three clusters at (Dlink/Dmax) × 100 < 8, respectively. In the two-cluster dendrogram, Cluster A covered sites S2, S8, S13, S19 and S20, while Cluster B covered sites S1, S3 to S7, S9 to S12 and S14 to S18 (Figure 2b). In the three-cluster dendrogram, Cluster B of the two-cluster dendrogram was further split into two clusters: a new Cluster B with sites S4, S7, S9, S11, S15 and S18 and a new Cluster C with the remaining sites (Figure 2b). All classifications were significant at p < 0.01, which met our expectation, since sites in the same cluster share similar natural backgrounds and are affected by similar sources in a similar way.
The spatial DA was performed in the same manner as the temporal DA (Table 2). Wilks’ lambda and the chi-square statistic were computed for each discriminant function. The values ranged from 0.50 to 0.61 and from 523 to 689, respectively, suggesting that the spatial DA had a discriminatory ability similar to that of the temporal DA. The spatial DA was performed on the original dataset with 11 variables after classification into the two major groups (A and B) obtained from the spatial CA. Sites were the dependent variables, and the measured parameters were the independent variables. The backward-mode discriminant functions correctly assigned >88.6% and >81.9% of the cases in the two- and three-cluster systems, respectively (Table 2). Moreover, the backward stepwise DA showed that Petro, V-ArOH, DO, NH4+–N and TP were also significant discriminant variables for spatial variation (Figure 2).
The sites in Cluster A were situated in highly developed areas (i.e., Wujin District of Changzhou City or Xishan District of Wuxi City) (Figure 3), where most of the industrial effluents and domestic sewage flow directly into the rivers. Most of the sites in Cluster B were located in the Tiaoxi River Basin (Figure 3). The Tiaoxi River is the largest tributary of Lake Taihu, and it originates in mountainous areas and moderately developed rural regions. The sites in Cluster C were located in northwestern and eastern Lake Taihu (Figure 3), in Yixing City and Wuzhong District of Suzhou City, where the major pollution sources include both point and non-point sources.
3.3. Identification of Potential Pollution Sources
Because the two- and three-cluster systems were similar (see Supplementary Material, Table S2), pollution source identification is illustrated only for the two-cluster system. Before the PCA, the Kaiser-Meyer-Olkin (KMO) and Bartlett’s sphericity tests were performed on the parameter correlation matrix. The KMO values for Clusters A and B were 0.56 and 0.52, respectively, and the Bartlett’s sphericity statistics were 861 and 812 (p < 0.05), indicating that PCA could be used for dimensionality reduction. PCA was applied to the standardized, log-transformed datasets (11 variables) to examine the differences between Clusters A and B and to identify the latent factors. PCA with VARIMAX rotation explained 75.6% and 67.0% of the total variance in Clusters A and B, respectively (Table 3). This source identification performance was close to that reported for Lake Dianchi and Lake Chaohu in China [33].
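The two suitability checks can be sketched in Python as follows. This is a minimal illustration on synthetic data, not the study’s computation; the function names `kmo` and `bartlett_sphericity` are hypothetical, and the input is assumed to be already log-transformed and standardized. The sketch uses the usual definitions: Bartlett’s statistic is χ² = −(n − 1 − (2p + 5)/6) ln det(R) with p(p − 1)/2 degrees of freedom, and KMO compares the squared off-diagonal correlations with the squared off-diagonal partial correlations.

```python
import numpy as np
from scipy.stats import chi2


def bartlett_sphericity(X):
    """Bartlett's test that the correlation matrix is an identity matrix."""
    n, p = X.shape
    R = np.corrcoef(X, rowvar=False)
    stat = -(n - 1 - (2 * p + 5) / 6.0) * np.log(np.linalg.det(R))
    df = p * (p - 1) / 2
    return stat, chi2.sf(stat, df)


def kmo(X):
    """Overall Kaiser-Meyer-Olkin measure of sampling adequacy."""
    R = np.corrcoef(X, rowvar=False)
    inv = np.linalg.inv(R)
    # Partial correlations from the inverse correlation matrix.
    scale = np.sqrt(np.outer(np.diag(inv), np.diag(inv)))
    partial = -inv / scale
    off = ~np.eye(R.shape[0], dtype=bool)  # mask selecting off-diagonal entries
    r2 = np.sum(R[off] ** 2)
    a2 = np.sum(partial[off] ** 2)
    return r2 / (r2 + a2)


# Synthetic example (NOT the study data): four variables sharing one common factor.
rng = np.random.default_rng(2)
factor = rng.normal(size=(200, 1))
X = factor + 0.5 * rng.normal(size=(200, 4))

k = kmo(X)
stat, pval = bartlett_sphericity(X)
print(f"KMO = {k:.2f}, Bartlett chi-square = {stat:.0f}, p = {pval:.2e}")
```

A KMO value above roughly 0.5 together with a significant Bartlett test is the usual argument that the correlation structure is strong enough for PCA, which is how the values reported above are interpreted.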
For Cluster A, the first varifactor (VF1), which explained 18.6% of the total variance, had strong positive loadings on Petro and Pb and a moderate loading on V-ArOH (Table 3). Pb comes mainly from the electronic manufacturing and chemical industries; V-ArOH from the paper-making and chemical industries; and Petro from the equipment manufacturing, metal smelting and chemical industries. VF1 therefore represented chemical pollution originating from industrial wastewater discharged into the rivers. VF2 represented domestic pollution, explained 17.09% of the total variance and had strong positive loadings on TP, CODMn
. VF3 could be interpreted as N-related industrial pollution; it accounted for 16.7% of the total variance and had strong positive loadings on conductivity (Cond) and NH4+–N. VF4 and VF5 explained 13.29% and 9.97% of the total variance, respectively. VF4 had a strong positive loading on water temperature but a strong negative loading on DO, while VF5 had a strong positive loading only on pH. VF5 was attributed to variability from the physicochemical source and represented natural sources affected by seasonality.
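How varifactors and their loadings arise can be illustrated with a short Python sketch of PCA followed by varimax rotation. Several things here are assumptions rather than details taken from the text: the data are synthetic, components are retained by the eigenvalue > 1 (Kaiser) criterion, the 0.75 cut-off for a “strong” loading is a common convention that the text does not state, and the function names `pca_loadings` and `varimax` are hypothetical.

```python
import numpy as np


def pca_loadings(X, min_eigenvalue=1.0):
    """Unrotated PCA loadings from the correlation matrix (Kaiser criterion assumed)."""
    R = np.corrcoef(X, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(R)
    order = np.argsort(eigvals)[::-1]  # sort components by decreasing eigenvalue
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    keep = eigvals > min_eigenvalue
    return eigvecs[:, keep] * np.sqrt(eigvals[keep])


def varimax(loadings, max_iter=100, tol=1e-8):
    """Orthogonal varimax rotation of a (variables x factors) loading matrix."""
    L = np.asarray(loadings, dtype=float)
    p, k = L.shape
    rotation = np.eye(k)
    previous = 0.0
    for _ in range(max_iter):
        rotated = L @ rotation
        target = rotated**3 - rotated * (np.sum(rotated**2, axis=0) / p)
        u, s, vt = np.linalg.svd(L.T @ target)
        rotation = u @ vt
        current = s.sum()
        if previous > 0 and current / previous < 1 + tol:
            break  # the varimax criterion has stopped improving
        previous = current
    return L @ rotation


# Synthetic example (NOT the study data): two blocks of three variables,
# each block driven by its own latent factor.
rng = np.random.default_rng(3)
f = rng.normal(size=(300, 2))
X = np.hstack([
    f[:, [0]] + 0.5 * rng.normal(size=(300, 3)),
    f[:, [1]] + 0.5 * rng.normal(size=(300, 3)),
])

raw = pca_loadings(X)
vf = varimax(raw)

# Percentage of total variance explained by each varifactor.
explained = (vf**2).sum(axis=0) / X.shape[1] * 100.0
strong = np.abs(vf) > 0.75  # assumed cut-off for a "strong" loading
print(np.round(vf, 2))
print(np.round(explained, 1))
```

Because the rotation is orthogonal, it redistributes variance among the retained factors without changing the total variance they explain or the communality of any variable; only the interpretability of the loadings changes.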
According to the Pollution Source Census Survey of Jiangsu and Zhejiang provinces, V-ArOH, identified in VF1, is generated mainly by chemical manufacturers in the Lake Taihu area, and approximately 35% of V-ArOH comes from the Wujin District of Changzhou City (Figure 3), a Cluster A catchment area. Major chemical manufacturers in Wujin District are tightly regulated by the local government, but the total V-ArOH discharge is still massive. Moreover, 89% of the Pb discharge came from communication and electronic manufacturers, which are widespread in the Cluster A catchment areas in Wujin District of Changzhou City and Xishan District of Wuxi City. Similarly, petroleum-related emissions also come from the electronic manufacturing and chemical industries.
Because urban expansion was accompanied by population growth, domestic pollution became the primary source and is represented by VF2 (Table 3), with the major factors of TP, CODMn
. High population density can lead to massive organic pollution without proper regulation. In the Cluster A catchment area, the population density reached 1148 persons/km2, and the industrial wastewater discharge density was close to 90,000 tons/km2 annually (Figure 3). Moreover, agricultural runoff of TN was 2.5 tons/km2 annually. Therefore, domestic pollution in this area is becoming a serious issue and should be carefully considered in future pollution control plans.
Further analysis of VF3 using historical statistical data suggests that nitrogen (N) emissions came primarily from industrial wastewater. In the Cluster A catchment area, 55% of NH4+–N came from industrial wastewater (which is moderately correlated with V-ArOH), 31% from agricultural runoff and 14% from domestic sewage. Because the combined N emissions affect electrical conductivity, VF3 could also be interpreted as N-related industrial pollution.
For Cluster B, VF1 (accounting for 27.14% of the total variance) had strong positive loadings on CODMn, TP, NH4+–N, conductivity and BOD5, which represent a combination of point and non-point sources. For instance, domestic wastewater discharge per unit area in Cluster B was only 34,000 tons/km2 annually, less than half of that in Cluster A (Figure 3). However, agricultural runoff of TN was up to 2.9 tons/km2 annually, close to the intensity in Cluster A (Figure 3). As previously mentioned, VF2 explained 14.3% of the total variance and had a positive loading on water temperature but a strongly negative loading on DO. It represented natural sources affected by seasonal change and hydrological conditions (Singh et al.; Zhou et al.). VF3 (13.77% of the total variance) was positively weighted by petroleum-related pollutants and had a negative loading on pH. As shown above, petroleum-related emissions come from multiple industries and represent the intensity of industrial development in an area. VF4 explained 11.7% of the total variance and had strong positive loadings on Pb and V-ArOH. As in the Cluster A area, VF4 was categorized as a chemical-related industrial pollution factor. Pb and V-ArOH discharges came mainly from electronic and chemical manufacturers. However, it was still considered an independent impact factor owing to the continuing economic growth and industrial development in this area.
Analysis of the major factors and main pollution patterns in the highly polluted area (Cluster A) and the moderately polluted area (Cluster B) revealed significant differences between the two groups. The Cluster A area was severely affected by heavy chemical industries. Recently, pollution control in the Cluster A area has improved under strong regulation, and Pb and V-ArOH levels were generally below the detection limits. Nevertheless, given the huge total discharge, stronger regulations and pollution controls than those currently in place would still be required to reduce emissions from chemical, electronic and communication manufacturers. The most prominent source in the Cluster B area was domestic pollution, although industrial pollution, as an independent major factor, could not be ignored either. Moreover, since natural conditions are the second most important pollution factor in this area, seasons with distinct precipitation would result in different lake water quality.