Integrated Application of Multivariate Statistical Methods to Source Apportionment of Watercourses in the Liao River Basin, Northeast China

Source apportionment of river water pollution is critical in water resource management and aquatic conservation. Comprehensive application of various GIS-based multivariate statistical methods was performed to analyze datasets (2009–2011) on water quality in the Liao River system (China). Cluster analysis (CA) classified the 12 months of the year into three groups (May–October, February–April and November–January) and the 66 sampling sites into three groups (groups A, B and C) based on similarities in water quality characteristics. Discriminant analysis (DA) determined that temperature, dissolved oxygen (DO), pH, chemical oxygen demand (CODMn), 5-day biochemical oxygen demand (BOD5), NH4+–N, total phosphorus (TP) and volatile phenols were significant variables affecting temporal variations, with 81.2% correct assignments. Principal component analysis (PCA) and positive matrix factorization (PMF) identified eight potential pollution factors for each part of the data structure, explaining more than 61% of the total variance. Oxygen-consuming organics from cropland and woodland runoff were the main latent pollution factor for group A. For group B, the main pollutants were oxygen-consuming organics, oil, nutrients and fecal matter. For group C, the evaluated pollutants primarily included oxygen-consuming organics, oil and toxic organics.


Introduction
Surface water quality and aquatic ecosystems have been seriously impacted by complex human activities and natural processes at both river and basin scales, including domestic wastewater, industrial sewage, runoff, land reclamation, oil development, mining exploitation, atmospheric deposition and climate change [1][2][3][4]. Because of the complexity of river water environments [5,6] and obvious differences in regional pollution characteristics [2,4,5], regulators and experts of environmental protection face severe challenges in preventing and controlling water pollution. Accordingly, understanding the spatial and temporal patterns in the hydrochemistry of river water [1,7], extracting the most useful information from complicated monitoring data [2] and identifying the major sources of regional water pollution [3,5] can aid regulators in establishing priority measures for the efficient conservation and restoration of river water resources and aquatic ecosystems.

Study Area and Monitoring Sites
The Liao River basin, including the majority of Liaoning Province and parts of the Inner Mongolia Autonomous Region and Jilin and Hebei Provinces, covers an area of approximately 21.96 × 10 4 km 2 , extending from 40 • 30 to 45 • 10 N and 117 • 00' to 125 • 30' E [21]. The Liao River basin is located in the temperate and warm temperate belt and is the monsoon climate [9]. The mean annual temperature in this area is 9.4 • C; January is the coldest month during the year, and July is the hottest. Total annual rainfall is approximately 628 cm, and the amount of rainfall during monsoon months (May-September) accounts for more than 80% of the total annual rainfall. The Liao River system comprises two main rivers (the Liao River and Daliao River). The Liao River system exhibits a main stream of more 513 km in length and over 40 tributaries. The water in the middle reaches of the Liao River mainly comes from the East Liao River (the main tributary of the Liao River), where there is sufficient precipitation and a high percentage of vegetation coverage (more than 60% of the East Liao River watershed area, Figure A1) [9,22,23]. The Daliao River has two main tributaries (the Hun River and the Taizi River), and its basin has been affected by the rapid development of heavy industry in Northeast China. The Daliao River basin encompasses several large and mid-sized cities, such as Shenyang city, which is a super city and is the capital of Liaoning Province [24,25]. The upstream areas of the Liao River basin mainly consist of woodland and grassland (more than 80% of the upstream watershed area of the Liao River basin, Figure A1). The middle and downstream areas of the Liao River basin mainly consist of cropland (more than 85% of the middle and downstream watershed area, Figure A1), with scattered urban land (less than 10% of the middle and downstream watershed area, Figure A1) [9,25,26]. In 2005, the population of the Liao River basin was approximately 3500 × 10 4 , and gross domestic product (GDP) production was approximately 6000 billion Yuan. The majority of the population and GDP production is centered in towns and cities in this area. The GDP per capita in the Liao River basin is higher than the national average [9]. The 66 water quality sampling sites examined in this study covered a wide range of key areas of the Liao River system in Northeast China to reasonably represent river water quality.

Data Sources
Datasets for 66 sampling sites ( Figure 1) including 13 typical variables analyzed monthly for three years (2009)(2010)(2011) were provided by the Environmental Protection Bureau of Liaoning Province. The samples were collected once a month between 9:00 am and 16:00 pm. The chemical analysis was performed in the laboratory within 24 h of collecting the water samples. The monitored parameters included temperature, dissolved oxygen (DO), pH, chemical oxygen demand (COD Mn ), 5-day biochemical oxygen demand (BOD 5 ), ammonical nitrogen (NH 4 + -N), total phosphorus (TP), mercury (Hg), lead (Pb), volatile phenols, petroleum, fecal coliforms (E. coli) and electrical conductivity (EC). The sampling, preservation and analytical procedures were performed according to national standard methods for China [27]. Analytical methods for water quality parameters are listed in Table A1. Hydrological data (streamflow discharge) for 10 years (2000 and 2002-2010, Liaozhong Gauging Station) were obtained from the Hydrological Bureau of Liaoning Province. The land use data set was provided by Data Center for Resources and Environmental Sciences, Chinese Academy of Sciences (RESDC) [28].

Data Treatment
The following data pretreatment measures were applied: (1) Missing data were evaluated based on average values from the corresponding datasets [16,20]; (2) When water quality parameter values (<1%) were below the minimum detection limits, the values were set to the detection limits [29]; (3) The normality of the distribution of each water quality parameter was checked through analysis of kurtosis and skewness before applying the multivariate statistical analyses [6,16,30]. After logtransformation of the data, the their skewness and kurtosis were significantly reduced; these variables (with the exception of DO, CODMn, NH4 + -N, TP, petrol, Hg, Pb and EC) showed values ranging from −0.723 to 0.44 and from −1.002 to 1.252, respectively; (4) Datasets were standardized (mean = 0, variance = 1) when using CA and PCA to minimize the effects of dimension and differences in the variance of water quality parameters [1,16,30]; and (5) the Kaiser-Meyer-Olkin (KMO) measure and Bartlett's sphericity tests were used to evaluate the suitability of the datasets prior to PCA [20].
PMF analysis of the datasets was performed using the EPA PMF 5.0 program (Environmental Protection Agency, Washington, DC, USA). The other statistical computations were performed with SPSS 19.0 (IBM SPSS, Chicago, IL, USA) for Windows. GIS maps were generated using ArcGIS 10.0 (ESRI, San Diego, CA, USA).

Analysis of Variance (ANOVA)
ANOVA was performed to analyze the significant spatial and temporal differences (p < 0.05).

Data Treatment
The following data pretreatment measures were applied: (1) Missing data were evaluated based on average values from the corresponding datasets [16,20]; (2) When water quality parameter values (<1%) were below the minimum detection limits, the values were set to the detection limits [29]; (3) The normality of the distribution of each water quality parameter was checked through analysis of kurtosis and skewness before applying the multivariate statistical analyses [6,16,30]. After log-transformation of the data, the their skewness and kurtosis were significantly reduced; these variables (with the exception of DO, COD Mn , NH 4 + -N, TP, petrol, Hg, Pb and EC) showed values ranging from −0.723 to 0.44 and from −1.002 to 1.252, respectively; (4) Datasets were standardized (mean = 0, variance = 1) when using CA and PCA to minimize the effects of dimension and differences in the variance of water quality parameters [1,16,30]; and (5) the Kaiser-Meyer-Olkin (KMO) measure and Bartlett's sphericity tests were used to evaluate the suitability of the datasets prior to PCA [20]. PMF analysis of the datasets was performed using the EPA PMF 5.0 program (Environmental Protection Agency, Washington, DC, USA). The other statistical computations were performed with SPSS 19.0 (IBM SPSS, Chicago, IL, USA) for Windows. GIS maps were generated using ArcGIS 10.0 (ESRI, San Diego, CA, USA).

Analysis of Variance (ANOVA)
ANOVA was performed to analyze the significant spatial and temporal differences (p < 0.05).

Pearson Correlation
Pearson correlations are currently widely employed in evaluating the relationships among river water quality parameters [5]. Relationships among the considered water quality parameters were tested using Pearson's coefficient with statistical significance set at p < 0.05.

Cluster Analysis (CA)
CA is a multivariate statistical analysis technique that classifies all dissimilar objects into different groups with an unsupervised pattern based on the characteristics they possess [2,[30][31][32]. High internal (within-group) homogeneity and external (between-group) heterogeneity should be observable in the resulting groups of objects [19,33]. CA was used to analyze our dataset to determine the temporal and spatial similarity of river water quality [1,16,20]. We performed hierarchical CA on the standardized dataset using Ward's method, with squared Euclidean distances as a similarity measure, to present an illustrated dendrogram [1,6,33]. The temporal and spatial variability of water quality in the Liao River basin was evaluated based on hierarchical CA using linkage distance [6,20], and the (D link /D max ) ratio between the linkage distance for a particular case (D link ) divided by the maximal linkage distance (D max ) was used to standardize the linkage distance [1,33,34].

Discriminant Analysis (DA)
DA was performed to classify samples exhibiting similar properties with prior knowledge of objects and to identify the most significant discriminant variables for several naturally occurring groups compared with CA [1,33,35]. If DA is effective for a specific data source, the table of classification matrices (including correct and incorrect estimates) will provide a high correct percentage [16,33]. DA was applied in stepwise mode to confirm the groups obtained via CA and to estimate both temporal and spatial variations on the basis of the discriminant variables [16,36]; the sampling periods (temporal variation) and sites (spatial variation) were the clustering (dependent) variables; and all the analyzed water quality parameters were the independent variables [16,20,36].

Principal Component Analysis (PCA)
PCA was used to extract eigenvalues and eigenvectors (loadings or weightings) from the covariance matrix of the original variables to generate new orthogonal (uncorrelated) variables referred to as varifactors (VFs) through VARIMAX rotation; VFs are linear combinations of the original variables [14,19,20,33,[37][38][39], and a VF can comprise both potential and hypothetical variables [1,4,39,40]. PCA is usually applied to obtain the minimal number of factors accounting for the maximal variance in the dataset [19,21]. Finally, the few identified factors will usually explain the vast majority of the entire original information [1,33]. PCA was applied to obtain composite variables identified as latent water pollution factors for the Liao River basin in Northeast China.

Positive Matrix Factorization (PMF)
PMF is a multivariate factorization model based on a least squares approach, using a data point weighting method [17,18]. The model can be written as follows in Equation (1): (1) where X ij represents the elements of the input data matrix of i (number of samples) by j (chemical species) dimensions; g ik represents the elements of the factor scores; f kj represents the factor-loading matrices; e ij is the residual for each sample/species; and p is the number of factors.
The task of PMF is to minimize the objective function, Q (Equation (2)), based on the uncertainties [17]. (2) where u ij is the uncertainty in the jth species for sample number i.

Temporal/Spatial Grouping
Hierarchical CA was applied to group the water quality dataset based on the temporal and spatial variation (using sampling sites in key areas of the Liao River basin) in river water quality in the resulting dendrogram. There is a seasonal flow change law applying to most rivers in the world, so the flow period division is in accordance with the seasonal flow change of the river [41]. Temporal CA generated a dendrogram ( Figure 2) that clearly separated the 12 months of the year into three groups at (D link /D max ) × 100 < 50, with significant differences between the three groups. Group 1 included May-October, which approximately corresponded to the high flow period (HF period) in the Liao River basin. More than 80% of the annual total precipitation falls in this period according to ten years of hydrology data. Group 2 consisted of February-April, which closely corresponded to the low flow period (LF period). Finally, Group 3 comprised November-January, which approximately corresponded to the normal flow period (NF period). A statistical description of discharge that coincides with each type of flow period is listed in Table A2. The spatial CA rendered a dendrogram that grouped all 66 monitoring sites into three different groups at (D link /D max ) × 100 < 80 (Figure 3), similar to the temporal cluster analysis. Group A contained sites S1-S3, S9-S11, S16-S23, S27-S29, S31, S44-S53, S58-S61, and S66; group B comprised sites S4-S8, S12-S13, S24-S26, S32-S33, and S35-S43; and group C contained sites S54-S55, S62-S65, S14-S15, S30 and S34. The spatial CA generated three groups of sampling sites with similar water pollution characteristics in a very convincing manner. In group A, seven sites (S1-S3, S27-S29, and S31) were situated in the upper and middle reaches of the Liao River; 13 sites (S9-S11 and S44-S53) were situated in the upper reaches of the Hun River; and 13 sites were situated in the Taizi River (S16-S23, S58-S61 and S66). In group B, two sites (S30 and S34) were situated in the middle reaches of the Liao River; four sites (S14-S15 and S54-S55) were situated in the lower reaches of the Hun River; and four sites (S62-S65) were situated in the lower reaches of the Taizi River. In group C, sixteen sites (S4-S8, S32-S33 and S35-S43) were situated in the middle and lower reaches of the Liao River; four sites (S12-S13 and S56-S57) were situated in the lower reaches of the Hun River; and three sites (S24-S26) were situated in the lower reaches of the Daliao River.

Temporal/Spatial Grouping
Hierarchical CA was applied to group the water quality dataset based on the temporal and spatial variation (using sampling sites in key areas of the Liao River basin) in river water quality in the resulting dendrogram. There is a seasonal flow change law applying to most rivers in the world, so the flow period division is in accordance with the seasonal flow change of the river [41]. Temporal CA generated a dendrogram ( Figure 2) that clearly separated the 12 months of the year into three groups at (Dlink/Dmax) × 100 < 50, with significant differences between the three groups. Group 1 included May-October, which approximately corresponded to the high flow period (HF period) in the Liao River basin. More than 80% of the annual total precipitation falls in this period according to ten years of hydrology data. Group 2 consisted of February-April, which closely corresponded to the low flow period (LF period). Finally, Group 3 comprised November-January, which approximately corresponded to the normal flow period (NF period). A statistical description of discharge that coincides with each type of flow period is listed in Table A2. The spatial CA rendered a dendrogram that grouped all 66 monitoring sites into three different groups at (Dlink/Dmax) × 100 < 80 (Figure 3), similar to the temporal cluster analysis. Group A contained sites S1-S3, S9-S11, S16-S23, S27-S29, S31, S44-S53, S58-S61, and S66; group B comprised sites S4-S8, S12-S13, S24-S26, S32-S33, and S35-S43; and group C contained sites S54-S55, S62-S65, S14-S15, S30 and S34. The spatial CA generated three groups of sampling sites with similar water pollution characteristics in a very convincing manner. In group A, seven sites (S1-S3, S27-S29, and S31) were situated in the upper and middle reaches of the Liao River; 13 sites (S9-S11 and S44-S53) were situated in the upper reaches of the Hun River; and 13 sites were situated in the Taizi River (S16-S23, S58-S61 and S66). In group B, two sites (S30 and S34) were situated in the middle reaches of the Liao River; four sites (S14-S15 and S54-S55) were situated in the lower reaches of the Hun River; and four sites (S62-S65) were situated in the lower reaches of the Taizi River. In group C, 16 sites (S4-S8, S32-S33 and S35-S43) were situated in the middle and lower reaches of the Liao River; four sites (S12-S13 and S56-S57) were situated in the lower reaches of the Hun River; and three sites (S24-S26) were situated in the lower reaches of the Daliao River.

Temporal/Spatial Variations in River Water Quality
Temporal variations in river water quality were estimated through DA after separating all data for key areas of Liao River basin into three seasonal groups. Temporal DA produced classification matrices (CMs) with 81.2% correct assignments using only eight discriminant parameters (Table 1). Thus, the temporal DA results showed that temperature, DO, pH, CODMn, BOD5, NH4 + -N, total phosphorus (TP) and volatile phenols were the most significant variables for discriminating between the three periods and that these eight parameters explained most of the temporal variations in the water quality of the Liao River system. Table 1. Classification matrices for stepwise discriminant analysis of temporal and spatial variations. Note: HF represents high flow period, NF represents normal flow period, LF represents low flow period.

Number of Clusters
Box and whisker plots of the selected water quality variables supporting the temporal variations identified through temporal DA are given in Figure 4. The results showed that temperature and pH were generally higher in the HF season than the other seasons, whereas higher values for CODMn, BOD5, NH4 + -N, TP and volatile phenols were observed in the LF season than in the other seasons.

Temporal/Spatial Variations in River Water Quality
Temporal variations in river water quality were estimated through DA after separating all data for key areas of Liao River basin into three seasonal groups. Temporal DA produced classification matrices (CMs) with 81.2% correct assignments using only eight discriminant parameters (Table 1). Thus, the temporal DA results showed that temperature, DO, pH, COD Mn, BOD 5 , NH 4 + -N, total phosphorus (TP) and volatile phenols were the most significant variables for discriminating between the three periods and that these eight parameters explained most of the temporal variations in the water quality of the Liao River system. Table 1. Classification matrices for stepwise discriminant analysis of temporal and spatial variations. Box and whisker plots of the selected water quality variables supporting the temporal variations identified through temporal DA are given in Figure 4. The results showed that temperature and pH were generally higher in the HF season than the other seasons, whereas higher values for COD Mn , BOD 5 , NH 4 + -N, TP and volatile phenols were observed in the LF season than in the other seasons.

Number of Clusters
Spatial variations in water quality were evaluated through DA after classifying the data for the study area into three spatial groups. DA also produced CMs with approximately 81.2% correct assignments for the three groups identified by spatial CA (Table 1). The stepwise DA showed that all river water quality parameters were discriminant variables of spatial variation.
Box and whisker plots of the discriminant parameters supporting the spatial variations determined through spatial DA are included in Figure 5. Figures 5 and 6 show that most of the parameters (apart from T, pH, volatile phenols, petrol and EC) presented higher values (DO exhibited an inverse pattern) in group B than in the other two groups. Petroleum and EC were higher in group C than in the other groups, and volatile phenols were higher in group A than in the other groups. COD Mn , BOD 5 and NH 4 + -N were higher in urban areas than in nearby rural areas, while DO displayed an inverse trend with the urbanization level [5,37]. Hg was higher in the lower reaches of the Taizi River, whereas Pb was higher in the middle reaches of the Hun River and Liao River. TP was higher in the lower reaches of the Liao River, and E. coli was higher in the middle and lower sections of the Hun and Taizi Rivers. The effects of volatile phenols in the Fushun section of the Hun River and the Benxi section of the Taizi River (sites 17, 51 and 58) were greatest in the Liao River system. The EC values were higher in the Liao River Estuary than the other areas ( Figure 6). Spatial variations in water quality were evaluated through DA after classifying the data for the study area into three spatial groups. DA also produced CMs with approximately 81.2% correct assignments for the three groups identified by spatial CA (Table 1). The stepwise DA showed that all river water quality parameters were discriminant variables of spatial variation.
Box and whisker plots of the discriminant parameters supporting the spatial variations determined through spatial DA are included in Figure 5. Figures 5 and 6 show that most of the parameters (apart from T, pH, volatile phenols, petrol and EC) presented higher values (DO exhibited an inverse pattern) in group B than in the other two groups. Petroleum and EC were higher in group C than in the other groups, and volatile phenols were higher in group A than in the other groups. CODMn, BOD5 and NH4 + -N were higher in urban areas than in nearby rural areas, while DO displayed an inverse trend with the urbanization level [5,37]. Hg was higher in the lower reaches of the Taizi River, whereas Pb was higher in the middle reaches of the Hun River and Liao River. TP was higher in the lower reaches of the Liao River, and E. coli was higher in the middle and lower sections of the Hun and Taizi Rivers. The effects of volatile phenols in the Fushun section of the Hun River and the Benxi section of the Taizi River (sites 17, 51 and 58) were greatest in the Liao River system. The EC values were higher in the Liao River Estuary than the other areas ( Figure 6).

Identification of Latent Pollution Factors
The 66 monitoring sites were applied to evaluate the correlation matrix of the 13 measured parameters (Table 2). CODMn was highly correlated with BOD5, TP and E. coli in group A and C (r = 0.566-0.902, p < 0.01). High positive correlations were observed between E. coli and NH4 + -N in the all three groups (r = 0.374-0.664, p < 0.1).

Identification of Latent Pollution Factors
The 66 monitoring sites were applied to evaluate the correlation matrix of the 13 measured parameters ( Table 2). COD Mn was highly correlated with BOD 5 , TP and E. coli in groups A and C (r = 0.566-0.902, p < 0.01). High positive correlations were observed between E. coli and NH 4 + -N in the all three groups (r = 0.374-0.664, p < 0.1).
PCA was used to evaluate the latent pollution factors based on the standardized datasets separately for three different groups (groups A, B and C), as determined via CA (Table 3 and Figure 7). The KMO values for the three groups (groups A, B and C) were 0.709, 0.610 and 0.647, respectively, and the significance levels determined by Bartlett's sphericity test were all less than 0.001, which showed that the PCA was useful for significantly reducing the dimensionality of the data [1,16,20,33]. The PCA with VARIMAX rotation produced 5-6 VFs (eigenvalues equal or greater than 1) and explained 61.217%, 69.645% and 63.57% of the total variance in groups A, B and C, respectively. PCA results (including the loading of the 13 water quality parameters, the variance contribution rate of each VF and the accumulated variance contribution rate) for the three groups are listed in Table 3. Some studies [6,33] classify factor loading values of 0.50-0.30, 0.75-0.50 and >0.75 as "weak", "moderate" and "strong", respectively, corresponding to the absolute loading. The VF loading plot (Figure 7) of the three different groups (groups A, B and C) revealed relationships among the river water quality variables, with a shorter distance corresponding to a stronger correlation between the parameters [6,16,20,29,33]. PCA was used to evaluate the latent pollution factors based on the standardized datasets separately for three different groups (groups A, B and C), as determined via CA (Table 3 and Figure  7). The KMO values for the three groups (groups A, B and C) were 0.709, 0.610 and 0.647, respectively, and the significance levels determined by Bartlett's sphericity test were all less than 0.001, which showed that the PCA was useful for significantly reducing the dimensionality of the data [1,16,20,33]. The PCA with VARIMAX rotation produced 5-6 VFs (eigenvalues equal or greater than 1) and explained 61.217%, 69.645% and 63.57% of the total variance in groups A, B and C, respectively. PCA results (including the loading of the 13 water quality parameters, the variance contribution rate of each VF and the accumulated variance contribution rate) for the three groups are listed in Table 3. Some studies [6,33] classify factor loading values of 0.50-0.30, 0.75-0.50 and >0.75 as "weak", "moderate" and "strong", respectively, corresponding to the absolute loading. The VF loading plot (Figure 7) of the three different groups (groups A, B and C) revealed relationships among the river water quality variables, with a shorter distance corresponding to a stronger correlation between the parameters [6,16,20,29,33].   For group A (Table 3 and Figure 7), VF1, which explained 24.036% of the total variance, exhibited strong positive loading on only COD Mn , BOD 5 and NH 4 + -N and moderately positive loading on EC. Thus, VF1 represented oxygen-consuming organic pollution from non-point pollution caused by nutrient runoff from cropland and woodland [6,25]. VF2 (accounting for 10.786% of the total variance) displayed moderately positive loading on temperature and moderately negative loading on DO and was attributed to seasonal changes [14,16,20]. Additionally, VF3 (explaining 9.817% of the total variance) presented strong positive loading on E. coli and moderately positive loading on TP and may be interpreted as fecal and nutrient (TP) pollution originating from local livestock farms and domestic wastewater [10]. VF4, accounting for 8.608% of the total variance, exhibited strong positive loading on pH and moderate positive loading on temperature and may be interpreted as the physicochemical source of the variability [16,20,33,34]. VF5 (explaining 7.971% of the total variance) showed strong positive loading on Pb and moderately positive loading on Hg and was subject to heavy metal pollution originating from mining activity [10]. For group B (Table 3 and Figure 7), VF1 (explaining 21.781% of the total variance) presented strong positive loading on EC, moderately positive loading on NH 4 + -N and TP, and moderately negative loading on DO, likely representing nutrient pollution from domestic wastewater and sewage treatment works [23,26,42]. VF2 (accounting for 12.599% of the total variance) exhibited strong positive loading on Petrol and BOD 5 and, thus, represented oil pollution originating from the petroleum chemical industry [2,12,42,43]. Additionally, VF3 (explaining 9.62% of the total variance) showed strong negative loading on temperature, similar to VF2 of group A, representing natural source impacted by seasonal changes. VF4 (accounting for 9.331% of the total variance) presented strong positive loading on E. coli and moderately positive loading on TP, similar to VF3 of group A. VF5 (explaining 8.28% of the total variance) exhibited strong positive loading on Pb and moderately positive loading on Hg and was attributed to heavy metal pollution from industrial sewage [42,44]. For group C (Table 3 and Figure 7), VF1 (accounting for 20.494% of the total variance) showed strong positive loading on COD Mn and moderately positive loading on BOD 5 , NH 4 + -N, Petrol and Volatile phenols, which represented mixed pollution, including oil.
Pollution, oxygen-consuming organic pollution and toxic organic pollution. Oil pollutants originated from oil production and the petroleum chemical industry, whereas oxygen-consuming/toxic organics mainly originated from steel-making, gas-firing, cooking water, industry, domestic wastewater, garbage produced by humans and bilge water [10,12,42,44]. VF2 (explaining 15.835% of the total variance) presented strong positive loading on DO and moderately positive loading on pH, similar to VF4 of group A (physicochemical sources). VF3 (accounting for 10.523% of the total variance) exhibited strong positive loading on temperature and moderately positive loading on Hg and was attributed to heavy mental pollution originating from industrial sewage during different flow periods [1,2,6,44]. VF4 (explaining 8.623% of the total variance) showed strong positive loading on E. coli and TP, similar to VF3 of group A.     To identify the spatial patterns in latent pollution factors, the loadings and scores of the VFs were plotted [1,6,16,45] for three different group (groups A, B and C) of monitoring sites to illustrate spatial differences (Figure 8). The larger VF scores presented a greater effect [2,6,16,20,33]. In group A (Figure 8a,b), some sites (e.g., 58, 50, 51, 17 and 61) were strongly influenced by organic pollution, whereas other sites (e.g., 2, 1, 27, 3, 31, 53 and 28) were primarily influenced by nutrient pollution. In group B (Figure 8c), some sites (e.g., 62, 63 and 64) were predominantly influenced by nutrient pollution. In group C (Figure 8d), some sites (e.g., 38, 39, 40 and 41) were strongly influenced by oil pollution. To identify the spatial patterns in latent pollution factors, the loadings and scores of the VFs were plotted [1,6,16,45] for three different group (groups A, B and C) of monitoring sites to illustrate spatial differences (Figure 8). The larger VF scores presented a greater effect [2,6,16,20,33]. In group A (Figure 8a,b), some sites (e.g., 58, 50, 51, 17 and 61) were strongly influenced by organic pollution, whereas other sites (e.g., 2, 1, 27, 3, 31, 53 and 28) were primarily influenced by nutrient pollution. In group B (Figure 8c), some sites (e.g., 62, 63 and 64) were predominantly influenced by nutrient pollution. In group C (Figure 8d), some sites (e.g., 38, 39, 40 and 41) were strongly influenced by oil pollution. PCA was also applied to the datasets from three different periods (HF, LF and NF) for each group (A, B and C) of sampling sites to consider the influence of temporal variation on the VFs. The results (Table 4) for KMO and Bartlett's test showed that PCA was effective in reducing dimensionality for all datasets from the Liao River system. The statistical analysis procedures were the same as the previous PCA. Table 5 summarizes the results of source identification for the monitoring sites (groups A, B and C) in the different periods. Most sampling sites in group A were not obviously affected by heavy metal pollution during the NF period. In group B, most sampling sites were influenced by toxic organic pollution during HF and NF periods.  PCA was also applied to the datasets from three different periods (HF, LF and NF) for each group (A, B and C) of sampling sites to consider the influence of temporal variation on the VFs. The results (Table 4) for KMO and Bartlett's test showed that PCA was effective in reducing dimensionality for all datasets from the Liao River system. The statistical analysis procedures were the same as the previous PCA. Table 5 summarizes the results of source identification for the monitoring sites (groups A, B and C) in the different periods. Most sampling sites in group A were not obviously affected by heavy metal pollution during the NF period. In group B, most sampling sites were influenced by toxic organic pollution during HF and NF periods.

Temporal/Spatial Similarities and Groupings
The temporal variation in water quality (Figure 2) in the Liao River system was clearly affected by hydrologic conditions (high, normal and low flow periods) and also by seasonal changes and river water pollution characteristics to some degree [2,6,16,20,33]. As shown by the results (Figure 3) of spatial CA, the sites in group A were primarily situated in the upper and middle reaches of the Liao, Hun and Taizi Rivers. The most upstream sites in the study area were located in a timbered mountainous region receiving little influence from human activities [9,25]. In group B, the sites were situated in the middle reaches of the Liao River and the lower reaches of the Hun and Taizi Rivers, which pass through the areas showing the highest population density and greatest industrialization within the Liao River watershed and are subject to serious river water pollution problems [9][10][11][12]26,43,44]. The sites in group B primarily received discharge from industrial sewage, domestic wastewater and sewage treatment works in city areas and non-point source pollution in rural areas [10,11,14,21,25,26]. The sites in group C were situated in the middle and lower reaches of the Liao River and the lower reaches of the Hun and Daliao Rivers, which are primarily influenced by oil production and petrochemical industry pollution [12,43,44,46,47]. The results of temporal and spatial CA showed that the monitoring frequency and number of monitoring sites may be appropriately reduced through selecting monitoring periods in different seasons and sampling sites from different groups [1,2,20].

Temporal/Spatial Variations in River Water Quality
The characterization of seasonal and spatial variations in water quality is important for evaluating river pollution caused by anthropogenic or natural factors [5,6,16,37]. Temperature and pH were generally higher, while DO was generally lower in the HF season (20.0 • C for water temperature, 7.85 for pH and 6.78 mg/L for DO) than the other seasons (6.5 • C for water temperature, 7.72 for pH and 8.03 mg/L for DO), whereas higher COD Mn , BOD 5 (Figure 4 and Table A3). The lower DO values recorded during the HF period were due to many factors. For example, the local climate differed between seasons, with an obviously higher mean temperature occurring during the HF period (May-October) than the other periods ( Figure 4); thus, the lower DO in the river water observed in summer than in the other seasons results from a natural process because warm water shows lower saturation values for dissolved oxygen and is able to hold less dissolved oxygen [1,5,16,20,48]. Additionally, intense rainfall washes continental organic matter (such as agricultural, forestal and municipal wastes) into the surface water, and organic matter consumes a large amount of dissolved oxygen through biodegradation [6,21,25]. Lower NH 4 + -N, volatile phenol and petrol concentrations were detected during the HF period, due to the effect of dilution by rainfall on point source pollution [2,3,6,42,43,46,47]. The average COD Mn , BOD 5 and TP concentrations were all higher during the LF period than the NF and HF periods due to a lower flow (which would dilute oxygen-consuming organics and nutrients). Hg, Pb and E. coli displayed no statistically significant differences (Figure 4) among the three periods, which was attributed to the relatively low values of these variables and the investigation of sampling sites with similar sources during the different periods [21,42]. Among the measured water quality variables, most of the variables in group A (sites 9, 16, 44, 45 and 48) exhibited low values because these areas were nearly pristine, without significant point source pollution ( Figure 6) [10,21,25]. Within the Liao River basin (Figure 6), the average concentrations of volatile phenols were highest in the Fushun section of the Hun River and the Benxi section of the Taizi River (sites 17, 51 and 58), coming from industrial effluents of the organic chemical industry and steel-making and coke plants [2,10,42,46,47]. Higher COD Mn , BOD 5 , NH 4 + -N, TP, and Hg values and lower DO and pH values were found in group B ( Figures 5 and 6 and Table A3). The higher NH 4 + -N, COD Mn , and BOD 5 values and lower pH values were attributed to the fact that most of these monitoring sites were located in watercourses downstream of or near large urban areas in the Liao River basin, where factories are scattered along the low and middle reaches of the Liao and Daliao Rivers [10,12,42,43,46]. Large amounts of incompletely treated domestic and industrial wastewater (domestic wastewater is over 5 × 10 4 t/a and industrial wastewater is over 4 × 10 5 t/a) from urban areas are discharged into the Liao River system [10], exceeding the self-purification ability of the river and deteriorating water quality [2,11]. The hydrolysis of some acidic materials from point sources (industrial wastewater) causes a decrease in water pH values [4,6,46]. The higher TP and E. coli values observed in most group B sites were primarily due to the fact that the region is a rapidly developing area with the highest population density in the Liao River basin and is characterized by large-scale livestock and poultry breeding production and large areas of cropland [10]. The highest average Hg concentration in the Liao River basin was found in the lower reaches of the Taizi River ( Figure 6), due to effluents from industrial wastewater from the cities of Benxi and Anshan [2,10,21,42,44,49,50]. The average Pb concentration in the Liao River basin was higher in the middle reaches of the Hun River and Liao River ( Figure 6) because of the surrounding industrial wastewater discharge and mining activities [10,21,25,42,49]. The average E. coli concentration was highest in the middle and lower sections of the Hun and Taizi Rivers (Figure 6), which show the highest population densities in the Liao River basin and are characterized by large-scale livestock and poultry breeding farms [10,11,26]. The average petroleum concentration in group C was higher due to oil and gas production and the petrochemical industry. The average EC concentrations at most group C sites were higher in the Liao River Estuary, where the river reaches are affected by tides more often than the other areas [5].

Identification of Latent Pollution Factors
The 66 sampling sites were combined to calculate the Pearson correlation matrix of the 13 analyzed variables ( Table 2). The Pearson correlation coefficients should be interpreted with caution due to the simultaneous effects of temporal and spatial variations on river water quality [1,48,51]. However, some hydro-chemical relationships could be inferred [1,26,48]. COD Mn was highly correlated with BOD 5 , TP and E. coli in groups A and C, which were responsible for point source pollution. NH 4 + -N was highly positively related to E. coli in the all three groups, indicating that these pollutants came from similar sources [1,11,16,20,48]. EPA PMF software was further used to identify the source of the watercourses in the Liao River basin of Northeast China. The number of source factors should be calculated before running the PMF model [17,18]. Considering the Q Value, the number of source factors for PMF was set to five for three groups (groups A, B and C).
The concentrations of species and the percentages of each species for the three groups are shown in Figure 9. For group A, factor profile 1 (F1) is dominated by TP, COD and BOD, and it seems reasonable to conclude that F1 represents nutrient and oxygen-consuming organic pollution originating from cropland and woodland runoff [6,25]. Factor profile 2 (F2) was characterized by enrichment with NH 4 + -N, and F2 appears to be associated with domestic waste [10,21]. Factor profile 3 (F3) was characterized by enrichment in Petrol, showing an association with oil production at some sampling sites. Factor profile 4 (F4) was dominated by Temperature, and F4 represents seasonal changes [14,16,20]. Factor profile 5 (F5) is dominated by E. coli, DO, pH, Pb, Hg and volatile phenols, which appear to be associated with wastewater from local livestock farms, mining activity and the industrial wastewater [10,11,13]. For group B, T, Pb, Hg, pH, TP and E. coli dominated in F1, F1 was best suited for industrial sewage and seasonal changes [14,16,20]. F2 was dominated by DO, pH and Pb, which appear to represent physicochemical pollution [16,20,33,34]. F3 was enriched with NH 4 + -N, TP and E. coli, and it is reasonable to conclude that F3 is associated with local livestock farms and domestic wastewater [10,25,26,42]. Volatile phenols, COD Mn and BOD 5 dominated in F4, which appear to represent gas-fired and cooking water from industry [2,12,13]. F5 was enriched with Petrol and BOD 5 and appears to be associated with oil production and petroleum chemical industry [2,12]. For group C, F1 was dominated by NH 4 + -N, which appears to be associated with domestic waste [25,26,42]. T, Hg and Pb dominated in F2, which is also associated with industrial sewage and seasonal changes [12,16,20]. F3 was dominated by COD Mn , BOD 5 , DO, pH, and which appear to represent oxygen-consuming organic pollution from the industrial sewage and physicochemical sources [25,37,42]. F4 was enriched with TP and might be associated with nutrient pollution from wastewater from local livestock farms. F5 was dominated by Petrol, and it therefore seemed reasonable to conclude that F5 is associated with oil production and petroleum chemical industry [2,12]. The source apportionment results of the PCA and PMF methods are listed in Table 6. Most of the PMF results exhibited good agreement with the PCA results qualitatively, except for F3 (oil pollution) of group A and F4 (gas-fired and cooking water from industry) of group B; however, F3 (oil pollution) of group A and F4 (gas-fired and cooking water from industry) of group B represent continual extensions and refinements of the unexplained variance in their own groups. The results from PMF analysis are in close agreement with the results from PCA method. Compared with PCA, PMF could further quantitatively analyze different pollution sources [17]. However, the results obtained via the PMF method might introduce uncertainty into the conclusions [18,52]. The assessment of source apportionment by the PMF model must be confirmed via PCA to improve its reliability; to a certain extent, the PCA model is the foundation of the PMF, and the PMF model provides more details and expands upon the PCA; the combination of the two methods can provide more valuable information [17,18,52,53].
The actual levels and types of river pollution may be determined by many water quality parameters, and each river presents unique characteristics due to the different influence of natural and human activities [3,5,6,16,20,37]. The above 13 variables were selected, and the following eight pollution types were identified in key areas of the Liao River basin: oxygen-consuming organic pollution [2,29] (mainly influenced by non-point sources for group A and point sources for groups B and C, non-point source pollution including agricultural and forestal plant litter and point sources including industrial sewage, domestic wastewater and wastewater treatment plants); toxic organic pollution [2,12,25,33,46,47] (mainly from steel-making, gas-fired and coking plants); nutrients [5,33,54,55] (mainly from non-point sources); fecal pollution [6,12,26] (mainly from livestock and poultry breeding and domestic sewage); heavy metals [2,6,21,49,56,57] (mainly from mining development and industrial sewage); oil pollution [2] (mainly from oil development); physicochemical pollution [16,20,33] (physicochemical sources of the variability); and natural pollution [16,33] (natural sources impacted by seasonal changes). The pollution types at the sampling sites of the three different groups (groups A, B and C) differed markedly during the HF, NF and LF periods ( Table 5). The majority of monitoring sites in group A clearly received more heavy metal pollution during HF period than NF and LF periods. Because the areas surrounding some sites in group A were characterized by mineral exploitation activity [10,21,25,50], heavy rainfall carried heavy metals to the surrounding river (e.g., upstream regions of the Hun and Taizi Rivers). Most sites in groups A and C received more nutrient and fecal pollution during the HF period than in the NF and LF periods. The majority of sites in group B showed the inverse pattern, receiving more fecal pollution during the LF and NF periods than the HF period. The sites in group B showed the highest population density and contained many intensive livestock and poultry breeding farms, where a high flow in the HF period diluted fecal pollution; whereas the sites in groups C and A presented lower population densities and less intensive livestock and poultry breeding than group B, and the abundant rainfall in the HF period carried more fecal pollution from non-point pollution sources [10,11,26,42]. Some sites in group C were subject to more serious heavy metal and toxic organic pollution during the NF period than during the HF and LF periods, suggesting that there were more sources of toxic organics and heavy metals during the NF period [9,11,21,25,41,42]. These results of source apportionment considering different periods may be helpful for the prevention and control of water pollution caused by human activities in different seasons [2,16,20,35].

Conclusions
The comprehensive application of various GIS-based multivariate statistical methods (Pearson correlation, CA, DA, PCA and PMF) was successful in elucidating the spatial and temporal variations of water quality and the source apportionment of water environment pollution in the Liao River system of Liaoning province. The main conclusions were as follows.
(1) In the Liao River basin of Liaoning province, the 12 months of the year could be grouped into three periods (May-October, February-April and November-January), and all sites in the area could be divided into three significantly different groups. It was quite obvious that the CA method was effective in providing a reliable classification of river water in the Liao River basin of Northeast China, and the establishment of an optimal sampling strategy with a lower cost will become possible in the future [2,20]. (2) Temperature, DO, pH, COD Mn , BOD 5 , NH 4 + -N, TP and volatile phenols were discriminant variables showing temporal variations, with 81.2% correct assignments, and all water quality monitoring parameters were discriminant variables showing spatial variations, also with 81.2% correct assignments. (3) The patterns of pollution varied significantly on spatial and temporal scales. The results from PMF analysis are in close agreement with the results from the PCA method. For group A, oxygen-consuming organics from cropland and woodland runoff were the main latent pollution source. The main pollutants were oxygen-consuming organics, oil, nutrients and fecal matter for group B. The evaluated pollutants primarily included oxygen-consuming organics, oil and toxic organics for group C. (4) For group B, the main latent pollution factors were oxygen-consuming organics, oil, nutrients and fecal pollution during the HF and LF periods and oxygen-consuming organics, nutrients, fecal pollution and heavy metals during the NF period. For group C, the main pollutants evaluated mainly consisted of oxygen-consuming organics, oil, and heavy metal during the HF and LF periods and oxygen-consuming organics, toxic organics, oil and heavy metals during the NF period. Author Contributions: Jiabo Chen and Fayun Li designed the study and drafted the manuscript, and Zhiping Fan and Yanjie Wang participated in this work via drafting of the manuscript. All authors read and approved the final manuscript.

Conflicts of Interest:
The authors declare no conflict of interest.
Appendix A Table A1. Analytical methods for water quality parameters of the Liao River system in Liaoning province, China.

Limits of Detection
Temperature   Note: SD is the abbreviation of standard deviation.