Use of Multivariate Statistical Techniques to Study Spatial Variability and Sources Apportionment of Pollution in Rivers Flowing into the Laizhou Bay in Dongying District

: Spatial variability and source apportionment of river pollution ﬂowing into the Bohai Sea are of great signiﬁcance to the pollution liability and development of control strategies to reduce the terrestrial discharge of pollution in the ocean. In this study, ten water quality variables from 14 monitoring sites in rivers ﬂowing into Laizhou Bay were obtained to investigate the spatial variation and pollution sources in Dongying District from 2018–2019. The survey area was divided into a low pollution (LP) zone and a high pollution (HP) zone by cluster analysis based on ten indicators. Principle component analysis/factor analysis with a geographic information system was performed to identify the four main pollution sources in the survey area. Compared with the positive matrix factorization model, the absolute principal component score-multiple linear regression (APCS-MLR) model was more appropriate for the source apportionment of pollution in the surface water of Dongying District. The point source pollution of domestic sewage (23.6%) was the most crucial pollution source of water in the LP zone, followed by non-point pollution from agricultural activity (16.4%). The contribution rate in the HP zone analyzed by the APCS-MLR model followed a decreasing order: point source pollution from domestic sewage (28.5%) > non-point pollution source of overland runoff (14.8%) > point source pollution of hybrid wastewater (12.4%) > point source pollution from industries sewage (10.6%). Therefore, the spatial distribution and sources of pollution in the investigated area should be considered while developing control measures to reduce the discharge of pollution to Laizhou Bay. petrochemical


Introduction
The Bohai Sea, a semi-enclosed sea, is surrounded by land on three sides with poor water exchange conditions. As one of the Bohai Sea bays, Laizhou Bay is of ecological and economic importance because of its biological diversity and economic and environmental value. Numerous water pollution events have been reported in Laizhou Bay [1][2][3]. Spatiotemporal variation of water quality is available in large regions [4], but small-scale water quality variation has rarely been investigated. Little information is available in the literature on spatial variability and source apportionment of pollution in rivers flowing into Laizhou Bay. The current method to determine pollution liability is through the inspection of sewage outlets in China. Because of this, it is impossible to identify the source of pollution if the underground sewage outlets are relatively concealed. This has hindered the responsibility for pollution and the development of control measures to reduce land-based pollution discharged into Laizhou Bay.
The Dongying District is located in the downtown area of Dongying city, Shangdong Province, and lies between latitudes 37 • 14 13 N and 37 • 31 57 N, and between longitudes 118 • 12 42 E and 118 • 59 52 E (Figure 1). Dongying District is 67.5 km in width from east to west and 26.5 km in length from north to south, with a total area of 1155.62 km 2 . Land use types include cropland, forestland, grassland, wetland, water bodies (rivers, canals, ditches), anthropogenic areas, and barelands ( Figure 2). Nine rivers are situated in the Dongying District, including a natural river (Yellow River) and eight drainage rivers or drainage canals. The GuangLi and ZhiMai rivers originate from the Yellow River and flow into the Bohai Sea via the Laizhou Bay. The GuangLi River has several tributaries, including the Laoguangpu, Liugan, and Wugan rivers. The ZhiMai River consists of the Xinguangpu and Wujia rivers. The Dongying River flows into the Yihong River and then into the Guangli River ( Figure 1). Dongying District has a warm temperate continental monsoon climate. The average annual temperature of Dongying District was 14.7 °C. The lowest temperature often occurs in January, and the hottest months are usually July and August. The average annual precipitation is 472.8 mm, and the average annual evaporation is 1900.8 mm.  Dongying District has a warm temperate continental monsoon climate. The average annual temperature of Dongying District was 14.7 • C. The lowest temperature often occurs in January, and the hottest months are usually July and August. The average annual precipitation is 472.8 mm, and the average annual evaporation is 1900.8 mm.

Sample Collection and Analysis
Water samples were collected from 14 monitoring sites along the tributaries of the GuangLi and ZhiMai rivers from May 2018 to July 2019 ( Figure 1 and Table S1). The meteorological conditions at the time of sampling are presented in Figure S1. Twenty-four samples were collected every 15 days (twice a month) from each site on the river axis. A bucket was used to collect 1 L of water samples from 10 to 20 cm at each sampling point. The samples were then placed in a polyethylene bottle and brought back to the laboratory for freezing. The selected chemical variables that could affect the quality of surface water included pH, dissolved oxygen (DO), BOD, COD, NH 3 -N, TN, TP, permanganate index (PI), petroleum hydrocarbon (PHC), and the number of fecal coliforms (F.coli). These variables were determined based on GB3838-2002 [19] and are described briefly as follows. The pH value was measured using the glass electrode method, while DO was determined using the electrochemical probe method, and BOD was determined by dilution and inoculation. The COD was measured using the dichromate method, while PI was determined using the potassium permanganate method. Nessler's reagent spectrophotometry was used to determine the NH 3 -N, TN content was determined by UV spectrophotometry using alkaline potassium persulfate digestion, and TP was determined using ammonium molybdate spectrophotometry. The PHC was determined using infrared spectrophotometry, while F.coli was determined using the multi-tube fermentation method. It should be noted that water temperature and turbidity did not need to be determined in this study. This is because water temperature results from the effect of natural variations on water quality, including the season, and turbidity is related to other water quality indicators monitored in this study, including organic pollutants, inorganic pollutants, and microorganisms. The seasonal variation was not obvious in this study because the pollution of the city was mainly affected by the discharge of wastewater but not by rainfall, thus, these factors were excluded.

Sample Collection and Analysis
Water samples were collected from 14 monitoring sites along the tributaries of the GuangLi and ZhiMai rivers from May 2018 to July 2019 ( Figure 1 and Table S1). The meteorological conditions at the time of sampling are presented in Figure S1. Twenty-four samples were collected every 15 days (twice a month) from each site on the river axis. A  The GIS data include land use types and the distribution of industrial parks, plants, and sewage. Landsat Thematic Mapper images with 30 m resolution for 2015 were used to map land use in the study area ( Figure 2b). Land use data were obtained from Tsinghua University (http://data.ess.tsinghua.edu.cn/fromglc2015_v1.html, accessed on 29 May 2019). The distribution of industrial parks, plants, and sewage in 2019 was obtained by the Environment Protection Ministry (Figure 2a and Table S2). The animal husbandry Ministry of Dongying District provided information on livestock and poultry in 2019 (Table S3) Cluster analysis was used to identify different spatial distribution characteristics based on distance or similar variables [20]. As one of the most common CA methods, hierarchical cluster analysis was adopted to reduce the dimensionality of the original data dramatically in this study [21]. The Ward method was chosen to perform clusters with minimum intra-cluster variance, and squared Euclidean distances were applied to act as similarity indicators.

Determination of the Critical Factors of Water Quality Changes Based on Principal Component Analysis (PCA)/Factor Analysis (FA)
Principal component analysis and FA were conducted to analyze the key factors of water quality changes in each zone divided by CA and identify the primary pollution sources. The principal component analysis is a multivariate statistical method that transforms correlated original variables into new and linearly uncorrelated variables to reduce the dimensions of the original dataset [22]. Factor analysis further reduces the contribution of the less significant variables to simplify the data structure by rotating the axis defined by the PCA. From this, the new variables called varifactors (VF) are constructed to explain unobservable, hypothetical, and latent sources that lead to surface water pollution. An eigenvalue greater than one is regarded as a component.

Estimating the Contribution of Pollution Sources Based on the APCS-MLR Model
As a receptor model, the APCS-MLR model can quantitatively describe the contribution of pollution sources to the water quality indicators, which has been widely used in the analysis of water environmental pollution sources in recent years [16,18], and is expressed as follows in Equation (1): where i is the number of water quality variables in the analysis, k is the number of observations, j is the number of the components influencing the water quality variable, C ik is the concentration of the ith parameter during observation k, r i0 is the constant obtained by the multiple linear regression (MLR) component of the analysis of the water quality variable i, r ij is the regression coefficient of pollution source j to variable i, and APCS jk is the absolute component score for the jth component, which was obtained by Equations (2)-(6) [23].
where (Z k ) i is the standardized value of the ith variable on the kth observation, C i and σ i are the average concentration and standard deviation of the ith variable, respectively; S ij is the score coefficient of the jth component for the ith variable, (Z 0 ) i is the standardized value of absolute zero concentration, (A z ) jk is the component score of the jth component, (A 0 ) i is the component score of absolute zero concentration, and APCS jk is the absolute component score for the jth component. A negative value may be obtained in the APCS-MLR model, which results in the loss of pollution source analysis accuracy. According to Gholizadeh et al. [10], a measure of converting negative to positive values was used to calculate the contribution rate of pollution sources to water quality variables.

Estimating Contribution of Pollution Sources Based on the PMF Model
Similar to the APCS-MLR model, the PMF model, first proposed by Paatero et al. [24] as a receptor model, is another multivariate statistical analysis tool and has been widely used for the identification of pollution sources. Based on a weighted least-squares fit, the PMF model decomposes the original data matrix into a factor score matrix, factor loading matrix, and residual matrix, which was expressed as follows in Equation (7): where X ij is the original data matrix containing the concentration of the jth parameter in the ith sample, G ik is the matrix with the contributions of kth source pollution to the ith sample, F kj is the factor loading matrix for with the kth factor of the jth parameter, and E ij is the residual matrix for the jth parameter in the ith sample. The optimal matrices G and F are obtained by minimizing the objective function Q based on the weighted least-squares method for limiting and iterative calculations (Equation (8)): where u ij is the uncertainty of the jth variable in the ith sample. In addition, an additional uncertainty of 20%, as suggested, was considered in this study to represent the uncertainty associated with field sampling operations, such as sample collection, transport, and storage [25,26]. The number of factors is an important parameter for obtaining optimal results in the PMF model process. In this study, the number of factors was determined by PCA/FA results to compare between the APCS-MLR and PMF models [10]. Further detailed information can be found in the EPA PMF 5.0 User Guide.

Statistical Analysis
Excel 2016 and SPSS 22 were used to process and analyze the data. The non-parameter test, Kruskal-Walls ANOVA, was used to compare different zones divided by CA. SPSS 22 was used to perform CA, PCA, and ACPS-MLR, while EPA PMF 5.0 was used to handle the PMF model. The resulting data are shown as a mean ± standard deviation. ArcGIS 10.6 was used to handle the map of land use and industrial distribution. Table 1 describes the annual mean of ten variables at the 14 monitoring sites (twentyfour samples in each site). The pH ranges from 7.00 to 9.34, while the concentration of DO varied between 3.07 and 9.01 mg/L, and the concentration of BOD ranged from 2.3 to 10 mg/L. The COD concentrations ranged from 13 to 51 mg/L, while the concentration of NH 3 -N varied between 0.038 and 1.98 mg/L and the concentrations of TN and TP ranged from 13-51 and 0.013-0.9 mg/L, respectively. The concentration of PI ranged from 1.88 to 14.9 mg/L, and the concentration of PHC varied between 0.005 and 0.49 mg/L. These results indicate that the surface water in Dongying District was polluted to some degree despite the fact that pH, PI, and PHC in all samples satisfied grade IV according to the GB3838-2002 standard [19]. The concentrations of BOD, COD, NH 3 -N, TP, and TN exceeded grade IV of the national surface water GB 3838-2002 in some samples [19]. Overall, the pollution caused by organic pollutants appeared to be the most serious. Furthermore, the amount of F.coli in some samples was much higher than the 20,000 MPN/L threshold of grade IV [19]. Note: All data is mean ± standard deviation for the annual value of the bi-monthly sampling events.

Characteristics of the Spatial Distribution
The average concentration of the sample site was used to perform CA to gain insights into the distribution characteristics and trace the pollution sources. Figure S2 depicts the dendrogram rendered by the CA, All 14 river sampling sites were divided into two statistically significant clusters: cluster 1, consisting of L1, L2, L3, L4, and L5; and cluster 2, consisting of L6, L7, L8, L9, L10, L11, L12, L13, and L14. The spatial variations of clusters 1 and 2 are displayed in Figure 3. There was a significant difference in water pollution at the monitoring sites between both clusters. Compared to cluster 1, the concentrations of BOD, COD, NH 3 -N, PPI, and F.coli were significantly higher in cluster 2. However, the DO of cluster 1 was significantly superior to that of cluster 2. Thus, clusters 1 and 2 corresponded to relatively low pollution and very high pollution, respectively, and could be defined accordingly as the low pollution (LP) and high pollution (HP) zones. The LP zone is distributed in the southwestern Dongying District, while the HP zone is distributed in the northeastern Dongying District (Figure 1). In addition, Figure S3 displays the maps of spatial variation of the annual mean of ten water quality indicators before standardization in Dongying District. The results shown in Figure S3 are consistent with those of the CA. The wet and dry season variations were not obvious in this study because the pollution of the city was mainly affected by human activities and not by rainfall.
is distributed in the southwestern Dongying District, while the HP zone is distributed in the northeastern Dongying District (Figure 1). In addition, Figure S3 displays the maps of spatial variation of the annual mean of ten water quality indicators before standardization in Dongying District. The results shown in Figure S3 are consistent with those of the CA. The wet and dry season variations were not obvious in this study because the pollution of the city was mainly affected by human activities and not by rainfall.

Number of Pollution Sources Identified Based on the PCA Analysis
For PCA/FA, the Kaiser-Meyer-Olkin (KMO) test was used to verify the sampling adequacy. When KMO value was significantly higher than 0.5 [27], PCA/FA could be used for further analysis. In the two pollution zones, the KMO values were 0.577 (LP zone) and 0.673 (HP zone), with p values close to zero, respectively. Thus, PCA/FA could be used significantly to reduce the initial database dimensionality and was valid for source identification.
Based on the PCA/FC results (Table 2), four factors in the LP zone could explain 63.5% of the total variance. Factor 1 (VF1) contributed approximately 23.6% of the total variance and showed a strongly positive load on PI and PHC, described as follows. The Dongying District is a typical petrochemical city. Figure 2a indicates that the industrial park3 and some petrochemical industries are located in the LP zone, and wastewater from these industries can be discharged into adjacent rivers. Furthermore, PI is an indicator of organic/inorganic matter pollution. Therefore, VF1 represents point source pollution from the discharge of sewage in the petrochemical industry. Factor 2 (VF2, 15.4% of the total variance) showed a moderately positive load on NH3-N and TN. As shown in Figure 2b, cropland was predominantly located in the LP zone, thus nitrogen fertilizer is likely the major source of N in rivers [28][29][30]. According to the Statistical Yearbook in 2019, approximately 2600 tons of nitrogen fertilizers was applied to croplands in Dongying District.

Number of Pollution Sources Identified Based on the PCA Analysis
For PCA/FA, the Kaiser-Meyer-Olkin (KMO) test was used to verify the sampling adequacy. When KMO value was significantly higher than 0.5 [27], PCA/FA could be used for further analysis. In the two pollution zones, the KMO values were 0.577 (LP zone) and 0.673 (HP zone), with p values close to zero, respectively. Thus, PCA/FA could be used significantly to reduce the initial database dimensionality and was valid for source identification.
Based on the PCA/FC results (Table 2), four factors in the LP zone could explain 63.5% of the total variance. Factor 1 (VF1) contributed approximately 23.6% of the total variance and showed a strongly positive load on PI and PHC, described as follows. The Dongying District is a typical petrochemical city. Figure 2a indicates that the industrial park3 and some petrochemical industries are located in the LP zone, and wastewater from these industries can be discharged into adjacent rivers. Furthermore, PI is an indicator of organic/inorganic matter pollution. Therefore, VF1 represents point source pollution from the discharge of sewage in the petrochemical industry. Factor 2 (VF2, 15.4% of the total variance) showed a moderately positive load on NH 3 -N and TN. As shown in Figure 2b, cropland was predominantly located in the LP zone, thus nitrogen fertilizer is likely the major source of N in rivers [28][29][30]. According to the Statistical Yearbook in 2019, approximately 2600 tons of nitrogen fertilizers was applied to croplands in Dongying District. Therefore, VF2 could be explained by the non-point source pollution of agricultural activity due to fertilizers. Factor 3 (VF3, 14.1% of the total variance) showed a moderately positive load on TP and F.coli. According to the information provided by the Animal Husbandry Ministry of Dongying District in 2019 (Table S3), livestock farms were mainly located in the LP zone, thus livestock manure could be the dominant source of bacteria in adjacent rivers. Further, there were no sewage or manure treatment facilities in rural areas, resulting in the direct discharge of pollutants. Moreover, fecal pollution is associated with soil erosion and surface runoff, and non-point sources tend to contribute significantly to the fecal pollution of surface waters [31]. Consequently, VF3 represents the fecal pollution sources with points and non-points. Factor 4 (VF4, 10.3% of the total variance) exhibited a moderately positive load on BOD which is a comprehensive indicator of organic pollutants [18]. As shown in Figure 2b, the rural area was distributed in the LP zone, thus organic matter from untreated domestic wastewater could be discharged into adjacent rivers. From this, VF4 represents the point source of pollution of the domestic sewage discharge. In the HP zone, four significant factors accounted for 60.3% of the total variance in the dataset. Among these factors, VF1 (25.0% of the total variance) showed a strongly positive load on PI and PHC, moderate load on F.coli, and negative load on pH and DO, which represented the point source pollution of hybrid sewage discharged into the river from the outlet (Table S2). VF2 in the HP zone (13.3% of the total variance) showed a moderately positive load on TP and TN, representing the influence of non-point source pollution of soil erosion due to urbanization. Accordingly, many researchers have found that urbanization activities increase N and P in surface water [32][33][34]; furthermore, soil erosion in this area has been severe owing to rapid urban development over the past decade [34,35]. Similar to VF4 in the LP zone, VF3 (11.7% of the total variance) showed a strongly positive load on BOD, which might be attributed to the point source discharge of municipal wastewater. VF4 (10.7% of the total variance) had a moderately positive load on NH 3 -N and negative loading on COD. The COD originated primarily from industrial wastewater. Industrial park 1 (chemical industries) and industrial park 2 (manufacturing) were mainly distributed along the riverbank of Dongying District (Figure 2b), producing a large amount of acid-containing and organic wastewater. The untreated sewage from these plants exhibited high COD. Furthermore, it is known that a large amount of NH 3 -N can arise from the decomposition of organic pollutants [10,17,36]. Thus, VF4 represents the effect of point source pollution from this industrial sewage.

Contributions of Pollution Sources Based on the APCA-MLR Model
The APCA-MLR model was used to quantify the contribution of each pollution source to water quality. According to the results shown in Table S4, the determination coefficient (R 2 ) of MLR was greater than 0.5, indicating a good consistency between the predicted and measured values by line fitting, and that the source apportionment results were reliable [10]. The ratio of the average predicted and measured values was close to 1, which showed that the APCS-MLR receptor model had good applicability for source apportionment. Figures 4 and 5 show the contribution of pollution sources for the ten studied variables and the average contribution of pollution sources based on the APCS-MLR model in the LP and HP zones.
In the LP zone, most contributions were predominately influenced by point source pollution of domestic sewage (S4, 23.6%), shown as DO (56.5%) and BOD (66.6%). In addition, non-point source pollution from agricultural activity (S2) accounted for 16.4% of the total pollution sources as NH 3 -N (38.1%) and TN (26.9%). The average contribution of the pollution source of fecal matter (S3) and point source pollution of petrochemical sewage discharge (S1) accounted for 7.8% and 3.4%, respectively, which were significantly lower than those of S2 and S4. Therefore, the contributions of pollution sources in the LP zone should be ranked in the decreasing order as follows: point source pollution of domestic sewage > non-point source pollution of agricultural activity> pollution source of fecal matter > point source pollution of petrochemical sewage. outlets should be strengthened, and substandard wastewater discharge should be prohibited in urban areas. In the progress of urbanization, the green area should be increased to reduce soil loss.

Contribution of Pollution Sources Based on the PMF Model
The "Factor Figureprints" of water quality variables are depicted in Figure 6 to indicate the contribution of pollution sources based on the PMF model. In the LP zone, the outlets should be strengthened, and substandard wastewater discharge should be prohibited in urban areas. In the progress of urbanization, the green area should be increased to reduce soil loss.

Contribution of Pollution Sources Based on the PMF Model
The "Factor Figureprints" of water quality variables are depicted in Figure 6 to indicate the contribution of pollution sources based on the PMF model. In the LP zone, the In the HP zone, the point source discharges of municipal domestic wastewater (S3) contributed the most at 28.5%, shown as BOD (65.8%). Non-point source pollution from overland runoff (S2) accounted for 14.8% of the total pollution sources, represented as TP (50.9%). The point source pollution of hybrid wastewater discharged to the river by outlets (S1) accounted for 12.4% of the total pollution sources, represented as PI (31.1%) and DO (24.9%). The average contribution of point source pollution from industrial sewage (S4) accounted for 10.6%, shown as COD (58.5%) and NH 3 -N (48.9%). S4 was significantly lower than those of S2 and S3. This could contribute to the strict monitoring and management of sewage plants. The HP zone's pollution sources might follow the decreasing order as follows: point source pollution from domestic sewage > non-point source of pollution of the overland runoff > point source pollution from hybrid wastewater discharged into the river by outlets > point source pollution from industrial sewage.
The APCS-MLR model results showed that point source pollution was the most critical source of surface water pollution in Dongying District, followed by non-point pollution from agriculture and overland runoff. In general, the LP zone was mainly affected by point source pollution from untreated domestic sewage, followed by non-point source pollution from agriculture. Therefore, it is essential to strengthen the infrastructure construction of sewage treatment facilities in rural areas. Additionally, the use of fertilizers and pesticides on croplands needs to be controlled. In the HP zone, industrial and domestic wastewater may be one of the most critical pollutant sources. This study has found that water quality is influenced by non-point source pollution of overland runoff caused by urbanization. Therefore, the monitoring of sewage treatment plants and factory discharge outlets should be strengthened, and substandard wastewater discharge should be prohibited in urban areas. In the progress of urbanization, the green area should be increased to reduce soil loss.

Contribution of Pollution Sources Based on the PMF Model
The "Factor Figureprints" of water quality variables are depicted in Figure 6 to indicate the contribution of pollution sources based on the PMF model. In the LP zone, the contributions of source 1 (S1) to COD and PI were 47.5% and 45.9%, respectively. Therefore, S1 was identified as the source of industrial sewage pollution. Source 2 (S2) was mainly characterized by a PHC of 88.8%, which could be considered a point source of wastewater discharged from petrochemical enterprises. Source 3 (S3) was dominated by F.coli (100%), NH 3 -N (69.1%), and TN (65.0%), which could be considered to represent the non-point source pollution of overland runoff due to agricultural activity. The contribution of source 4 (S4) to TP, DO, BOD, and pH was 72.0%, 56.0%, 53.9%, and 47.5%, respectively. Thus, S4 represents the point source pollution of the domestic sewage. contributions of source 1 (S1) to COD and PI were 47.5% and 45.9%, respectively. Therefore, S1 was identified as the source of industrial sewage pollution. Source 2 (S2) was mainly characterized by a PHC of 88.8%, which could be considered a point source of wastewater discharged from petrochemical enterprises. Source 3 (S3) was dominated by F.coli (100%), NH3-N (69.1%), and TN (65.0%), which could be considered to represent the non-point source pollution of overland runoff due to agricultural activity. The contribution of source 4 (S4) to TP, DO, BOD, and pH was 72.0%, 56.0%, 53.9%, and 47.5%, respectively. Thus, S4 represents the point source pollution of the domestic sewage.
In the HP zone, S1 was dominated by F.coli (86.4%), which is considered to represent the non-point source pollution of overland runoff. The contribution of S2 to the PHC was 75.3%. Therefore, S2 represents the point source pollution of wastewater discharged from petrochemical enterprises. S3 was mainly characterized by a TP of 79.2%, DO of 53.4%, pH of 47.0%, TN of 46.2%, and BOD of 45.7% in the HP zone. Therefore, S3 could be considered as a point source of pollution in domestic sewage. The contributions of S4 to NH3-N and PI were 69.6% and 49.5%, respectively, thus, S4 represents the point source pollution of industrial wastewater.
The suggested average contribution of pollution sources is shown in Figure 7   In the HP zone, S1 was dominated by F.coli (86.4%), which is considered to represent the non-point source pollution of overland runoff. The contribution of S2 to the PHC was 75.3%. Therefore, S2 represents the point source pollution of wastewater discharged from petrochemical enterprises. S3 was mainly characterized by a TP of 79.2%, DO of 53.4%, pH of 47.0%, TN of 46.2%, and BOD of 45.7% in the HP zone. Therefore, S3 could be considered as a point source of pollution in domestic sewage. The contributions of S4 to NH 3 -N and PI were 69.6% and 49.5%, respectively, thus, S4 represents the point source pollution of industrial wastewater.
The suggested average contribution of pollution sources is shown in Figure 7,

Comparison of Pollution Source Contribution Based on the APCS-MLR and PMF Models
The contribution of pollution sources based on the APCS-MLR and PMF models d fered significantly. Therefore, R 2 and the ratio of the predicted and observed values we used to compare the application and accuracy between the APCS-MLR and PMF mod (Table S4). According to Liu et al. and Simeonov et al. [14,37], the source apportionme results would be relatively good when the ratio of predicted and observed values w close to 1.0, and R 2 was greater than 0.5. In this study, the ratio of predicted and observ values in the APCS-MLR model varied between 0.7 to 1.43, and the average R 2 was 0 (Table S4), whereas the ratio of predicted and observed values in the PMF model rang between 1.00 and 1.08, and the average of R 2 was 0.42 (Table S4). In addition, the scat plots between the predicted and observed values from the results of the two PMF a APCS-MLR models are displayed in Figures S4 and S5 for the LP and HP zone, resp tively. From these figures, the APCS-MLR model had better goodness of fit for most of t water quality variables based on the fitting equation and R 2 . On the other hand, the p cedure and dataset involved in the APCS-MLR model were relatively simple compar to the PMF model [10,38], although there was a lower specificity for the APCS-MLR mod to use the concentration data without uncertainty values. Therefore, the accuracy of t APCS-MLR model was superior to that of the PMF model in terms of quantitative analy in this study.

Conclusions
In this study, multivariate statistical methods, including CA and PCA/FA, were us Similar to the APCS-MLR model, the PMF model results showed that point source pollution was the most critical pollution source of surface water in Dongying District, followed by the non-point pollution of overland runoff.

Comparison of Pollution Source Contribution Based on the APCS-MLR and PMF Models
The contribution of pollution sources based on the APCS-MLR and PMF models differed significantly. Therefore, R 2 and the ratio of the predicted and observed values were used to compare the application and accuracy between the APCS-MLR and PMF models (Table S4). According to Liu et al. and Simeonov et al. [14,37], the source apportionment results would be relatively good when the ratio of predicted and observed values was close to 1.0, and R 2 was greater than 0.5. In this study, the ratio of predicted and observed values in the APCS-MLR model varied between 0.7 to 1.43, and the average R 2 was 0.60 (Table S4), whereas the ratio of predicted and observed values in the PMF model ranged between 1.00 and 1.08, and the average of R 2 was 0.42 (Table S4). In addition, the scatter plots between the predicted and observed values from the results of the two PMF and APCS-MLR models are displayed in Figures S4 and S5 for the LP and HP zone, respectively. From these figures, the APCS-MLR model had better goodness of fit for most of the water quality variables based on the fitting equation and R 2 . On the other hand, the procedure and dataset involved in the APCS-MLR model were relatively simple compared to the PMF model [10,38], although there was a lower specificity for the APCS-MLR model to use the concentration data without uncertainty values. Therefore, the accuracy of the APCS-MLR model was superior to that of the PMF model in terms of quantitative analysis in this study.

Conclusions
In this study, multivariate statistical methods, including CA and PCA/FA, were used to survey the spatial variation in river water quality. The APCS-MLR and PMF models identified potential factors influencing the water quality of the river, revealed the potential pollution sources, and quantified their contribution to the pollution of surface water. According to the degree of pollution, the study area was classified into LP and HP zones by CA. The contribution obtained by the APCS-MLR model was more reliable in identifying the pollution source in Dongying District, by comparison, the R 2 and the ratio of predicted and observed values between the APCS-MLR and PMF models. The APCS-MLR model with PCA/FA was used to identify four critical pollution sources in the LP and HP zones. In the LP zone, the contributions of pollution sources at the LP zone should be ranked as point source pollution of domestic sewage > non-point source pollution of agricultural activity > pollution source of fecal matter > point source pollution of petrochemical sewage. The contribution in the HP zone followed the decreasing order as follows: point source pollution from domestic sewage > non-point pollution source of overland runoff > point source pollution of hybrid wastewater > point source pollution from industrial sewage. These findings suggest that the rivers flowing into Laizhou Bay in Dongying District are primarily polluted by the point source pollution of domestic sewage and non-point source pollution of overland off. In considering the limitations of this study, the monitored indicators consisted of only ten water quality variables, and the sampling time period was only one year. Therefore, further work needs to be conducted using more monitored indicators and a longer sampling time period to better identify the undefined sources and understand the spatial variability.  Table S1. Longitude and latitude of monitoring sites; Table S2. River outlets in the LP and HP zones; Table S3. Number of livestock and poultry farms and types of livestock in the LP and HP zones; Table S4. Mean source contributions to different variables by the APCS-MLR and PMF models in the LP and HP zones.